Cross-Lingual Transfer Learning for Enhancing Kinyarwanda Automatic Speech Recognition

Igiraneza Lahairoi Bayingana, NTEZIRIZA NKERABAHIZI Josbert, HABIMANA Theodore

Abstract


In today’s voice-activated digital world, millions of people face a profound challenge: their native languages remain invisible to the very technologies that are transforming human-computer interaction. Speakers of major world languages such as English, Mandarin, and Spanish enjoy seamless access to virtual assistants, transcription services, and voice-controlled systems across education, healthcare, and entertainment. Meanwhile, countless indigenous and regional languages have been left behind in this technological revolution. Kinyarwanda, the vibrant national language spoken by over 13,246,394 people across Rwanda, Burundi, Uganda, and the Democratic Republic of Congo, exemplifies this digital divide. Despite the language's crucial role in preserving cultural identity and facilitating daily communication, Kinyarwanda speakers are forced to abandon their mother tongue when interacting with modern speech recognition systems. This study uses cross-lingual transfer learning to develop a new Speech-to-Text system for Kinyarwanda, an under-resourced Bantu language with more than 14,104,965 speakers, mainly in Rwanda, Burundi, Uganda, and the Democratic Republic of Congo. The work merges several open-source Natural Language Processing data models with customized preprocessing algorithms and acoustic feature extraction techniques. Combining Connectionist Temporal Classification (CTC) with attention helped achieve a low Word Error Rate (WER) on standard Kinyarwanda speech corpora. The resulting Automatic Speech Recognition (ASR) system efficiently transcribes spoken Kinyarwanda into text, promoting digital inclusion and helping to preserve linguistic heritage. This research presents a complete speech recognition system built on recent deep learning architectures, including Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and the Transformer architecture.
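The abstract credits the combination of Connectionist Temporal Classification (CTC) and attention for the low Word Error Rate but does not give the formulation. One widely used way to combine the two is the multi-task objective L = λ·L_CTC + (1 - λ)·L_att. Here, L_CTC sums the probability of every frame-level alignment of the transcript, and L_att is the attention decoder's cross-entropy. The pure-Python sketch below illustrates that objective under several assumptions: the blank symbol sits at index 0, both heads share label indices, λ = 0.3 is an illustrative weight, and the function names are hypothetical. It is not the authors' implementation.

```python
import math
from typing import Sequence

BLANK = 0  # assumption: index 0 of the output vocabulary is the CTC blank symbol


def ctc_neg_log_likelihood(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """CTC loss, -log p(labels | x), computed with the forward (alpha) recursion.

    probs[t][k] is the softmax probability of output symbol k at frame t.
    Works in the probability domain for readability; real toolkits use log space.
    """
    # Extended label sequence: blank, l1, blank, l2, ..., lU, blank
    ext = [BLANK]
    for lab in labels:
        ext += [lab, BLANK]
    S, T = len(ext), len(probs)
    # A valid alignment starts with either the leading blank or the first label
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skip a blank only when it separates two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # A valid alignment ends on the last label or on the trailing blank
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return math.inf if p == 0.0 else -math.log(p)


def attention_cross_entropy(dec_probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Teacher-forced decoder loss, -sum_i log p(y_i | y_<i, x).

    dec_probs[i] is the decoder's output distribution at step i. For simplicity this
    sketch reuses the CTC label indices and omits the end-of-sequence token.
    """
    return -sum(math.log(step[y]) for step, y in zip(dec_probs, targets))


def hybrid_ctc_attention_loss(ctc_probs, dec_probs, labels, ctc_weight: float = 0.3) -> float:
    """Multi-task objective L = lambda * L_ctc + (1 - lambda) * L_att (lambda is illustrative)."""
    return (ctc_weight * ctc_neg_log_likelihood(ctc_probs, labels)
            + (1.0 - ctc_weight) * attention_cross_entropy(dec_probs, labels))


if __name__ == "__main__":
    # Toy example: 2 frames, vocabulary {blank, "a"}, transcript "a"
    ctc_probs = [[0.4, 0.6], [0.3, 0.7]]
    dec_probs = [[0.1, 0.9]]
    print(hybrid_ctc_attention_loss(ctc_probs, dec_probs, [1]))
```

Practical toolkits compute the same recursion in log space, or use a built-in CTC loss, to avoid numerical underflow on long utterances. The probability-domain form above is only meant to make the sum over alignments easy to follow.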
The system addresses the particular phonetic features, tonal distinctions, and morphological complexity of Kinyarwanda while operating under the scarcity of training data that is characteristic of low-resource languages.
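The abstract reports results in terms of Word Error Rate (WER). For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions. The sketch below assumes plain whitespace tokenization and uses a hypothetical function name. The authors' evaluation may normalize text (casing, punctuation) differently, and the Kinyarwanda words in the usage line are only an illustrative example.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance (whitespace tokens)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a match at cost 0
            )
        prev = cur
    return prev[-1] / len(ref)


# One deleted word out of four reference words
print(word_error_rate("muraho amakuru ni meza", "muraho amakuru meza"))  # 0.25
```

Because insertions are counted against the reference length, WER can exceed 1.0 when a system produces many spurious words.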


DOI: http://dx.doi.org/10.52155/ijpsat.v55.2.7815




Copyright (c) 2026 Igiraneza Lahairoi Bayingana, NTEZIRIZA NKERABAHIZI Josbert, HABIMANA Theodore

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.