Detection of Threats and Vulnerabilities in Twitter Posts Using the Naive Bayes Algorithm

Authors

  • Paulin Al Imady, Telkom University
  • Casi Setianingsih, Telkom University
  • M. Faris Ruriawan, Telkom University

Abstract

Twitter is a social media platform where many people post all kinds of content, including posts that expose security threats to a system. Publishing a system's security holes is dangerous: a publicly disclosed weakness can be misused by others to the detriment of the system's owner. To anticipate this, a system was built to detect Twitter posts that contain elements of system threats and vulnerabilities. The system applies a text-processing approach that uses the Naïve Bayes method and TF-IDF (Term Frequency-Inverse Document Frequency). This method was chosen because it is considered able to produce good accuracy even with a small amount of training data. The end result of this final-project (Tugas Akhir) research is a system that can distinguish tweets that contain threat or vulnerability elements from those that do not. With the dataset split into training and testing data at ratios of 70%:30% and 80%:20%, both splits achieved an accuracy of 88%, a precision of 88%, a recall of 88%, and an F1 score of 88%.
Keywords: text mining, naïve Bayes, TF-IDF, threat, vulnerabilities, text classification.
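
The abstract names TF-IDF weighting followed by a Naïve Bayes classifier but gives no implementation details, so the following is only a minimal sketch of that general technique, not the authors' system. It assumes whitespace tokenization, a smoothed IDF, a multinomial Naïve Bayes with Laplace smoothing, and a tiny hand-written toy dataset; all function names, the example tweets, and the fixed held-out set (used here instead of a 70%:30% or 80%:20% split) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency from tokenized training documents.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    # TF-IDF weights for one document; terms unseen in training are dropped.
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_nb(vectors, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing.
    class_counts = Counter(labels)
    class_totals = defaultdict(float)
    term_totals = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            term_totals[y][t] += w
            class_totals[y] += w
    model = {}
    for y in class_counts:
        denom = class_totals[y] + alpha * len(vocab)
        log_prior = math.log(class_counts[y] / len(labels))
        log_like = {t: math.log((term_totals[y][t] + alpha) / denom) for t in vocab}
        model[y] = (log_prior, log_like)
    return model

def predict(model, vec):
    # Pick the class with the highest log-posterior score.
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)

def binary_metrics(y_true, y_pred, positive=1):
    # Accuracy, precision, recall, and F1 for the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical toy data: 1 = threat/vulnerability, 0 = other.
train = [
    ("sql injection vulnerability found in login form", 1),
    ("remote code execution exploit for unpatched server", 1),
    ("new exploit leaks admin password vulnerability", 1),
    ("enjoying coffee with friends this morning", 0),
    ("great football match last night", 0),
    ("new coffee shop opened near campus", 0),
]
test = [
    ("exploit for server vulnerability", 1),
    ("coffee with friends after the match", 0),
]

train_docs = [tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_docs)
model = train_nb([tfidf(d, idf) for d in train_docs], train_labels, set(idf))

predictions = [predict(model, tfidf(tokenize(text), idf)) for text, _ in test]
print(predictions, binary_metrics([label for _, label in test], predictions))
```

On real tweets, the preprocessing, vocabulary, and evaluation split would need to follow the paper itself; the toy data here only demonstrates how the pieces fit together.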

Published

2023-03-06

Section

Undergraduate (S1) Computer Engineering Study Program