Fake News Detection Using RoBERTa with Word Embedding Optimization
Abstract
The spread of fake news (hoaxes) has become a serious
problem that shapes public opinion and polarizes society.
This study aims to detect fake news using a RoBERTa model
optimized with three word embedding techniques: RoBERTa,
Word2Vec, and GloVe. The dataset is "Indonesian fact and
hoax political news", obtained from Kaggle. It requires a
pre-processing stage to clean up inconsistencies in the
data, such as expanding abbreviations into full words and
removing punctuation. The text is then represented with
each of the three word embedding methods. The model is
trained with K-Fold cross-validation to improve model
generalization. The results show that the RoBERTa
embedding achieves the best accuracy, 96%, while the
Word2Vec embedding reaches 94%. The GloVe embedding
performs worst, with an accuracy of 51%. This study shows
that choosing an unsuitable word embedding technique for
the RoBERTa model can reduce its accuracy and
effectiveness in detecting fake news. The findings are
expected to contribute to the improvement of fake news
detection systems in the future.
Keywords: hoax, RoBERTa, GloVe, Word2Vec
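
The pre-processing and K-Fold steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abbreviation table, the function names `preprocess` and `k_fold_indices`, and the choice of plain unshuffled folds are all assumptions made for illustration, since the abstract gives none of those details.

```python
import string

# Hypothetical abbreviation table; the paper does not list its actual entries.
ABBREVIATIONS = {"yg": "yang", "dgn": "dengan", "tdk": "tidak"}

# Map every ASCII punctuation character to a space so that words
# separated only by punctuation are not fused together.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, abbreviations=ABBREVIATIONS):
    """Remove punctuation and expand known abbreviations into full words."""
    tokens = text.translate(_PUNCT_TO_SPACE).split()
    # Look abbreviations up case-insensitively; unknown tokens pass through.
    return " ".join(abbreviations.get(tok.lower(), tok) for tok in tokens)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for plain K-fold splitting.

    Assumption: contiguous, unshuffled folds. The abstract does not say
    whether the folds were shuffled or stratified by label.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


if __name__ == "__main__":
    print(preprocess("Berita yg tdk benar, dgn cepat!"))
    for fold, (train, validation) in enumerate(k_fold_indices(10, 3)):
        print(fold, len(train), validation)
```

Each sample lands in exactly one validation fold, so every article is used for evaluation once and for training in the remaining folds. In practice a library splitter with shuffling or label stratification would likely be preferred for a two-class hoax dataset.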