Implementation of Automatic Speech Recognition for Qur'anic Recitation Using the Wav2Vec 2.0 and OpenAI Whisper Methods
References
A. J. Muhammad Yasir, Studi Al-Quran, vol. 53, no. 9. 2016.
I. Sri Maharani, "Pembelajaran Baca Tulis Al-Qur'an Anak Usia Dini," vol. 4, no. 2, pp. 1288–1298, 2020.
D. I. Fitriani and F. Hayati, “Penerapan Metode Tahsin untuk Meningkatkan Kemampuan Membaca Al-Qur’an Siswa Sekolah Menengah Atas,” J. Pendidik. Islam Indones., vol. 5, no. 1, pp. 15–31, 2020, doi: 10.35316/jpii.v4i1.227.
R. Gretter et al., “ETLT 2021: Shared task on automatic speech recognition for non-native children’s speech,” Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 3, pp. 1923–1927, 2021, doi: 10.21437/Interspeech.2021-1237.
S. Chen et al., “Continuous speech separation with conformer,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., vol. 2021–June, pp. 5749–5753, 2021, doi: 10.1109/ICASSP39728.2021.9413423.
N. Kanda et al., “Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings,” Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 2022–September, pp. 521–525, 2022, doi: 10.21437/Interspeech.2022-253.
R. De Mori, “Recent advances in automatic speech recognition,” Signal Processing, vol. 1, no. 2, pp. 95–123, 1979, doi: 10.1016/0165-1684(79)90013-6.
S. Feng, O. Kudina, B. M. Halpern, and O. Scharenborg, “Quantifying Bias in Automatic Speech Recognition,” 2021, [Online]. Available: http://arxiv.org/abs/2103.15122
M. Novela and T. Basaruddin, "Dataset Suara dan Teks Berbahasa Indonesia pada Rekaman," vol. 11, no. 2, pp. 61–66, 2021.
O. Iosifova, I. Iosifov, V. Sokolov, O. Romanovskyi, and I. Sukaylo, “Analysis of automatic speech recognition methods,” CEUR Workshop Proc., vol. 2923, pp. 252–257, 2021.
A. Al Harere and K. Al Jallad, “Quran Recitation Recognition using End-to-End Deep Learning,” pp. 1–22, 2023, [Online]. Available: https://arxiv.org/abs/2305.07034v1
L. R. S. Gris, R. Marcacini, A. C. Junior, E. Casanova, A. Soares, and S. M. Aluísio, “Evaluating OpenAI’s Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person,” 2023, [Online]. Available: http://arxiv.org/abs/2305.14580
S. Wang, C.-H. H. Yang, J. Wu, and C. Zhang, "Can Whisper perform speech-based in-context learning?," 2023, [Online]. Available: https://arxiv.org/abs/2309.07081v1
L. Pepino, P. Riera, and L. Ferrer, “Emotion recognition from speech using wav2vec 2.0 embeddings,” Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 1, pp. 551–555, 2021, doi: 10.21437/Interspeech.2021-703.
S. Schneider, A. Baevski, R. Collobert, and M. Auli, "wav2vec: Unsupervised pre-training for speech recognition," Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 2019–September, pp. 3465–3469, 2019, doi: 10.21437/Interspeech.2019-1873.
A. Baevski, S. Schneider, and M. Auli, “Vq-Wav2Vec: Self-Supervised Learning of Discrete Speech Representations,” 8th Int. Conf. Learn. Represent. ICLR 2020, pp. 1–12, 2020.
S. Siriwardhana, A. Reis, R. Weerasekera, and S. Nanayakkara, “Jointly fine-tuning ‘BERT-like’ self supervised models to improve multimodal speech emotion recognition,” Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 2020–October, pp. 3755–3759, 2020, doi: 10.21437/Interspeech.2020-1212.
M. Macary, M. Tahon, Y. Estève, and A. Rousseau, "On the Use of Self-Supervised Pre-Trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition," 2021 IEEE Spok. Lang. Technol. Work. SLT 2021 - Proc., pp. 373–380, 2021, doi: 10.1109/SLT48900.2021.9383456.
J. Boigne, B. Liyanage, and T. Östrem, “Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning,” 2020, [Online]. Available: http://arxiv.org/abs/2011.05585
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust Speech Recognition via Large-Scale Weak Supervision,” 2022, [Online]. Available: http://arxiv.org/abs/2212.04356
H. Heriyanto, H. Jayadianti, and J. Juwairiah, “The Implementation Of Mfcc Feature Extraction And Selection of Cepstral Coefficient for Qur’an Recitation in TPA (Qur’an Learning Center) Nurul Huda Plus Purbayan,” RSF Conf. Ser. Eng. Technol., vol. 1, no. 1, pp. 453–478, 2021, doi: 10.31098/cset.v1i1.417.
A. Khumaidi and R. L. Pradana, “Identifikasi Penyebab Cacat Pada Hasil Pengelasan Dengan Image Processing Menggunakan Metode Yolo,” J. Tek. Elektro dan Komput. TRIAC, vol. 9, no. 3, pp. 107–112, 2022, [Online]. Available: https://journal.trunojoyo.ac.id/triac/article/view/15997
L. Ou, X. Gu, and Y. Wang, “Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription,” 2022, [Online]. Available: http://arxiv.org/abs/2207.09747
A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” Adv. Neural Inf. Process. Syst., vol. 2020–December, pp. 1–19, 2020.
Q. Xu et al., “Self-training and pre-training are complementary for speech recognition,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., vol. 2021–June, pp. 3030–3034, 2021, doi: 10.1109/ICASSP39728.2021.9414641.
A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 2017–December, no. Nips, pp. 5999–6009, 2017.
DOI: https://doi.org/10.21107/triac.v11i1.24332
Copyright (c) 2024 Jurnal Teknik Elektro dan Komputer TRIAC
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.