Retweet Prediction Based on Content and User Features Using the Classifier Selection Method
Abstract
The growth of social media has changed how information spreads, and Twitter plays a major role in that change. This study develops a retweet prediction model for Twitter using content-based and user-based features, together with an oversampling technique to improve model performance. The experiments show that the meta learner without oversampling on content-based features reaches a macro-average F1-score of 0.52, but with very low recall (6%) and an F1-score of 0.11 for the retweet class. In contrast, the meta learner with oversampling on content-based features achieves a precision of 0.86, a recall of 0.77, and an F1-score of 0.80 for the retweet class, with a macro-average F1-score of 0.82, an increase over the meta learner without oversampling. For the user-based model without oversampling, the macro-average F1-score is 0.75, with a good balance between precision and recall on the non-retweet class. After oversampling, the user-based model maintains that balance, with precision, recall, F1-score, and macro-average F1-score each at 0.88 for both the retweet and non-retweet classes. Overall, oversampling improves model performance, especially for content-based features, while the user-based model shows the most consistent and strongest performance.
Keywords - Twitter, classifier selection, user-based, content-based
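The macro-average F1-score reported in the abstract can be read as the unweighted mean of the per-class F1-scores, which is why a weak retweet class pulls the overall score down even when the non-retweet class is predicted well. The sketch below is illustrative only and is not the authors' evaluation code: the function names are hypothetical, and the non-retweet F1 values it prints are implied by the two-class arithmetic, not figures reported in the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(f1_per_class):
    """Unweighted mean of per-class F1-scores (each class counts equally)."""
    return sum(f1_per_class) / len(f1_per_class)


def implied_other_class_f1(macro: float, known_f1: float) -> float:
    """For exactly two classes, solve macro = (known + other) / 2 for 'other'.

    Assumption: the macro average covers only the retweet and
    non-retweet classes, as the abstract suggests.
    """
    return 2 * macro - known_f1


# Content-based features, without oversampling:
# macro F1 = 0.52 and retweet F1 = 0.11 (both from the abstract).
print(round(implied_other_class_f1(0.52, 0.11), 2))

# Content-based features, with oversampling:
# macro F1 = 0.82 and retweet F1 = 0.80 (both from the abstract).
print(round(implied_other_class_f1(0.82, 0.80), 2))

# User-based features, with oversampling: both classes at 0.88.
print(round(macro_f1([0.88, 0.88]), 2))
```

Under the two-class assumption, the first case implies a non-retweet F1 of roughly 0.93, so the macro average of 0.52 is dominated by the near-zero retweet score. After oversampling the two classes sit much closer together.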