Fraud Detection Using the Random Forest Classification Method
Abstract
With the rapid growth of online transactions, the security of financial transactions has become increasingly important as the risk of sophisticated fraud rises. This study focuses on using machine learning algorithms, specifically Random Forest, to detect fraud in online transactions. Random Forest is an ensemble learning method that handles large, complex data effectively and can identify fraud patterns that conventional methods struggle to detect. The study applies the SMOTE oversampling technique to address class imbalance and improve model performance. The evaluation shows that the Random Forest model reaches an accuracy of 97.77% on the training data and 97.04% on the test data. Precision is 65.85% on the training data and drops to 52.89% on the test data, while recall remains high at 86.90% on the test data. SMOTE gives more balanced results, with a precision of 65.85%, a recall of 86.90%, and an F1 score of 74.92% on the test data, compared with undersampling, which yields lower precision and higher recall. These findings indicate that SMOTE oversampling significantly improves the stability and accuracy of fraud detection. The results suggest that machine learning techniques such as Random Forest, combined with an appropriate sampling method, can effectively improve a system's ability to detect and prevent fraud in online transactions.
Keywords: Machine Learning, Fraud Detection, Random Forest, Online Transactions.
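The abstract names two ideas that a short example can make concrete: SMOTE oversampling of the minority (fraud) class, and the F1 score as the harmonic mean of precision and recall. The following is a minimal sketch in plain Python, not the study's implementation. The function names `smote` and `f1_score`, the toy `fraud` rows, and the parameter choices are all hypothetical. The abstract does not say which library was used; in practice one would more likely rely on an existing implementation such as the `SMOTE` class in imbalanced-learn together with scikit-learn's `RandomForestClassifier`, and resample the training split only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 3 fraud rows to be balanced against 8 legitimate rows.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
synthetic = smote(fraud, n_new=5, k=2)
balanced_fraud = fraud + synthetic
print(len(balanced_fraud))  # 8

# Recompute F1 from the precision and recall quoted for SMOTE on the test data.
print(f"{f1_score(0.6585, 0.8690):.2%}")  # 74.92%
```

Each synthetic row lies on a line segment between two real fraud rows, so the minority class is filled in rather than merely duplicated. Recomputing F1 from a precision of 65.85% and a recall of 86.90% gives 74.92%, which matches the value reported in the abstract.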