Reinforced Disentangled HTML Representation Learning with Hard-Sample Mining for Phishing Webpage Detection

Jun Ho Yoon; Seok Jun Buu; Hae Jung Kim

Journal Article (Open Access)

Electronics (Switzerland) (2025) 14(6)

DOI: 10.3390/electronics14061080

Abstract

Phishing webpage detection is critical in combating cyber threats, yet distinguishing between benign and phishing webpages remains challenging due to significant feature overlap in the representation space. This study introduces a reinforced Triplet Network to optimize disentangled representation learning tailored for phishing detection. By employing reinforcement learning, the method enhances the sampling of anchor, positive, and negative examples, addressing a core limitation of traditional Triplet Networks. The disentangled representations generated through this approach provide a clear separation between benign and phishing webpages, substantially improving detection accuracy. To achieve comprehensive modeling, the method integrates multimodal features from both URLs and HTML DOM Graph structures. The evaluation leverages a real-world dataset comprising over one million webpages, meticulously collected for diverse and representative phishing scenarios. Experimental results demonstrate a notable improvement, with the proposed method achieving a 6.7% gain in the F1 score over state-of-the-art approaches, highlighting its superior capability and the dataset’s critical role in robust performance.
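
To make the triplet objective mentioned in the abstract concrete, the following is a minimal sketch of a margin-based triplet loss with hard-negative selection. It is not the authors' method: the abstract states that sampling is driven by reinforcement learning, whereas this sketch substitutes a simple greedy rule (pick the negative closest to the anchor). The function names, the margin value, and the 2-D toy embeddings are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 1.0 is an assumed value, not one taken from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def hardest_negative(anchor, negatives):
    """Greedy hard-sample mining (an assumed stand-in for the paper's
    reinforcement-learning sampler): return the negative nearest the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))

# Toy embeddings (hypothetical): anchor and positive share a class,
# negatives come from the other class.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negatives = [[3.0, 4.0], [1.0, 0.0]]

hard = hardest_negative(anchor, negatives)
loss_hard = triplet_loss(anchor, positive, hard)          # hard negative still violates the margin
loss_easy = triplet_loss(anchor, positive, negatives[0])  # far negative already satisfies it
```

In this toy case the distant negative contributes zero loss, while the nearby one yields a positive loss. This illustrates why the choice of triplets matters: easy triplets provide no training signal, which is the general limitation of triplet sampling that the abstract refers to.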

Citation (APA)

Yoon, J. H., Buu, S. J., & Kim, H. J. (2025). Reinforced Disentangled HTML Representation Learning with Hard-Sample Mining for Phishing Webpage Detection. Electronics (Switzerland), 14(6). https://doi.org/10.3390/electronics14061080
