Enhanced Phishing Detection Using Machine Learning and URL Analysis
Main Article Content
Abstract
Phishing is a significant and continually evolving cybersecurity threat that uses malicious URLs to trick users into disclosing sensitive information. Modern phishing is increasingly hard to detect because it relies on complex URL structures that closely resemble those of legitimate sites, so an adaptive and efficient detection method is needed. This study proposes a machine learning-based phishing URL detection system that analyzes lexical, structural, and statistical characteristics of URLs. Its aim is to determine which machine learning model performs best at detecting phishing. Phishing and legitimate URLs were collected from PhishTank and Majestic, then passed through pre-processing, feature extraction, and class-imbalance handling. The features cover URL length, domain and subdomain structure, special characters, the presence of an IP address, HTTPS usage, suspicious keywords, URL shortener services, and Shannon entropy. Four supervised machine learning algorithms were tested: K-Nearest Neighbors, Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost). The models were trained in an integrated pipeline with feature normalization, hyperparameter tuning via grid search, and five-fold cross-validation. Performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. The experiments show that the ensemble models performed best. Random Forest achieved an F1-score of 0.822 and the highest ROC-AUC, 0.916, while XGBoost achieved the highest precision, 0.962, with fewer false positives than the other models. Feature contribution analysis shows that the number of subdomains, the number of dots, and suspicious keywords are the most influential indicators. These results show that an approach based on URL analysis and machine learning is effective for automated phishing detection.
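The abstract lists the URL features but not how they are computed, so the following is only a minimal sketch of what such an extraction step might look like, using the Python standard library. The function names, the keyword list, the shortener list, the special-character set, and the naive subdomain count are all assumptions made for illustration; they are not taken from the study.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lists for illustration only; the study's actual lists are not given.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co"}

# Loose IPv4 shape check (does not validate that each octet is <= 255).
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character: H = -sum(p * log2(p))."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(url: str) -> dict:
    """Map one URL to a dictionary of numeric features."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    has_ip = bool(IPV4_PATTERN.match(host))
    labels = host.split(".") if host else []
    # Naive assumption: everything before "name.tld" is a subdomain.
    # Multi-part suffixes such as "co.uk" are not handled here.
    num_subdomains = 0 if has_ip else max(len(labels) - 2, 0)
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_subdomains": num_subdomains,
        "num_special_chars": sum(url.count(c) for c in "@-_?=&%"),
        "has_ip_address": int(has_ip),
        "uses_https": int(parsed.scheme == "https"),
        "num_suspicious_keywords": sum(k in lowered for k in SUSPICIOUS_KEYWORDS),
        "uses_shortener": int(host in SHORTENER_DOMAINS),
        "entropy": shannon_entropy(url),
    }
```

Calling `extract_features` on each URL in a dataset would give one feature dictionary per URL, which could then be stacked into a table and passed to a classifier such as the ones named in the abstract.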