Detection of Phishing Websites Using Machine Learning
DOI:
https://doi.org/10.21467/proceedings.7.6.62
Keywords:
Phishing detection, machine learning, cybersecurity
Abstract
Phishing remains a major cybersecurity threat. Attackers use increasingly sophisticated techniques to replicate legitimate websites convincingly and thereby obtain confidential data. We developed an end-to-end, machine learning-based system for identifying phishing sites that combines several machine learning techniques to exploit their complementary strengths. Its integrated model fuses three methodologies to improve detection accuracy: URL feature analysis, content-based analysis, and visual similarity analysis. Previous work mainly relies on URL features or single-model approaches. Our system instead draws on a wider variety of features and uses a hybrid model in which Gradient Boosting selects features and Deep Neural Networks, optimized by ensemble learning, perform the classification. On a diverse dataset of 10,000 websites, our method reaches 97.3% accuracy, outperforming single-algorithm solutions by a large margin. Because the system operates in real time, it can be deployed as a browser extension or integrated into security software, giving users a strong defense against skillfully crafted phishing attacks.
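The abstract does not specify which URL features the system extracts. The following Python sketch illustrates what URL feature analysis typically involves by computing a handful of lexical cues that are common in the phishing-detection literature. The function `url_features`, the feature names, and the `SUSPICIOUS_TOKENS` list are hypothetical. They are not the authors' feature set or implementation. The sketch uses only the standard library.

```python
from __future__ import annotations

import ipaddress
from urllib.parse import urlsplit

# Hypothetical keyword list for illustration; NOT the paper's feature set.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "update", "secure", "bank")


def url_features(url: str) -> dict[str, int]:
    """Extract simple lexical features from a URL (illustrative sketch only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and excludes user info/port

    # A raw IP address in place of a domain name is a classic phishing cue.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": host_is_ip,
        "uses_https": int(parts.scheme == "https"),
        # Naive count: labels beyond "domain.tld"; ignores suffixes like "co.uk".
        "num_subdomains": (
            max(len(host.split(".")) - 2, 0) if host and not host_is_ip else 0
        ),
        # Number of non-empty path segments, e.g. "/a/b" -> 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "suspicious_token_count": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    print(url_features("http://paypal.com.secure-login.example.net/account/verify"))
```

Taken in a fixed key order, the returned dictionary could serve as one input vector for a downstream classifier, such as the Gradient Boosting and neural-network stages the abstract describes. The content-based and visual-similarity analyses would need separate extractors, which are not sketched here.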
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.