Fake News Detection Using RoBERTa with Word Embedding Optimization

Authors

  • Adisaputra Nur Arminta, Telkom University
  • Yuliant Sibaroni, Telkom University

Abstract

The spread of fake news (hoaxes) has become a serious
problem that influences public opinion and polarizes
society. This study aims to detect fake news using a
RoBERTa model optimized with three word embedding
techniques: RoBERTa, Word2Vec, and GloVe. The dataset is
"Indonesian fact and hoax political news", taken from
Kaggle. It requires a pre-processing stage to clean up
inconsistencies in the data, such as expanding
abbreviations into full words and removing punctuation.
The text is then represented using each of the three word
embedding methods. The model is trained with K-Fold
cross-validation to improve generalization. The results
show that the RoBERTa embedding achieves the best
accuracy, 96%, while the Word2Vec embedding reaches 94%.
The GloVe embedding performs worst, with an accuracy of
51%. These results indicate that an unsuitable choice of
word embedding for a RoBERTa model can reduce its
accuracy and effectiveness in detecting fake news. The
findings are expected to contribute to improving fake
news detection systems in the future.
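
The pre-processing and K-Fold steps mentioned in the abstract can be
illustrated with a minimal Python sketch. It is not the authors'
implementation: the abbreviation map, the function names, the
lowercasing step, and the use of contiguous, unshuffled folds are all
assumptions made for illustration, and the abstract does not state the
value of K.

```python
import string

# Hypothetical abbreviation map; the paper does not publish its dictionary.
ABBREVIATIONS = {"yg": "yang", "dgn": "dengan", "tdk": "tidak"}

# Map every ASCII punctuation character to a space so that words joined
# by punctuation (e.g. "a,b") are not merged when it is removed.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, remove punctuation, and expand known abbreviations."""
    tokens = text.lower().translate(_PUNCT_TO_SPACE).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous, near-equal folds.

    Assumption: no shuffling or stratification is applied here.
    """
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    print(preprocess("Berita yg tdk benar, dgn judul!"))
    for train_idx, val_idx in k_fold_indices(10, 3):
        print(len(train_idx), val_idx)
```

Each sample appears in exactly one validation fold, so every article
is used for both training and evaluation across the K runs. In
practice the data would usually be shuffled or stratified by label
before splitting, for example with scikit-learn's `StratifiedKFold`.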

Keywords: hoax, RoBERTa, GloVe, Word2Vec

Published

2025-06-23

Issue

Section

Undergraduate Informatics Study Program (Prodi S1 Informatika)