When Lexical Data Speak: Reassessing the Genetic Classification of Indonesia’s Regional Languages
DOI:
https://doi.org/10.26499/li.v44i1.1065
Keywords:
genetic classification, glottochronology, historical comparative linguistics, lexical similarity
Abstract
This study addresses three questions: how lexically similar ten regional languages of Indonesia are to one another and to Indonesian (ind), the national lingua franca; what the genetic status of these regional languages is; and when they separated from one another. The ten languages are Jambi Malay (jax), Kerinci (kvr), Minangkabau (min), Banjar (bjn), Mentawai (mwv), Sasak (sas), Javanese (jav), Toba (tob), Angkola (akb), and Mandailing (btm). Data were collected through field observations at ten locations, with three informants per language, and cover 257 glosses drawn from core (L1), nature (L2), general (L3), and cultural (L4) vocabulary. The analysis proceeded in three stages: first, synchronic lexical similarity was calculated using the Jaccard method; second, genetic relationships were assessed through lexicostatistics based on the L1 and L2 vocabulary; third, glottochronology was used to estimate the separation times among the regional languages. The results show that no language pair reaches high similarity; most pairs fall into the low-to-moderate category. The lexicostatistical analysis indicates that the core languages (jax, kvr, min, and bjn) and the peripheral languages (tob, akb, and btm) form distinct genetic families, while mwv is the most lexically isolated language. The estimated separation times indicate that the core languages diverged more recently, whereas mwv, jav, and tob show earlier divergence. These findings confirm that geographic proximity does not always correlate with linguistic relatedness. They also suggest that the classification of Indonesian languages in online databases needs revision, particularly the position of mwv, which should be reclassified as part of the Barrier Island language group rather than the Sumatran group. Finally, the study highlights the importance of primary data in language documentation for producing a more accurate map of the linguistic evolution of regional languages in the archipelago.
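The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how word forms were compared, how cognates were judged, or which retention rate was used. The sketch assumes that Jaccard similarity is computed over sets of word forms, that cognate judgments are supplied as booleans, and that the separation time follows the classic glottochronological formula t = log C / (2 log r), with r = 0.805 per millennium (the conventional constant for a 200-item list). All function names and the sample tokens are hypothetical placeholders, not data from the study.

```python
import math


def jaccard_similarity(forms_a, forms_b):
    """Jaccard index |A ∩ B| / |A ∪ B| over two collections of word forms."""
    set_a, set_b = set(forms_a), set(forms_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists: treat as no measurable similarity
    return len(set_a & set_b) / len(union)


def cognate_share(cognate_flags):
    """Share of compared glosses judged cognate (lexicostatistic percentage / 100)."""
    if not cognate_flags:
        raise ValueError("at least one compared gloss is required")
    return sum(1 for flag in cognate_flags if flag) / len(cognate_flags)


def separation_time_millennia(share, retention_rate=0.805):
    """Classic glottochronology: t = log C / (2 log r), in millennia.

    retention_rate is an assumption (conventional value for a 200-item list).
    """
    if not 0 < share <= 1:
        raise ValueError("cognate share must be in (0, 1]")
    return math.log(share) / (2 * math.log(retention_rate))


# Placeholder tokens only; they stand in for transcribed word forms.
list_a = ["w1", "w2", "w3"]
list_b = ["w2", "w3", "w4"]
similarity = jaccard_similarity(list_a, list_b)   # 2 shared / 4 distinct forms
share = cognate_share([True, True, False, True])  # 3 of 4 glosses cognate
years = separation_time_millennia(share) * 1000
print(f"Jaccard: {similarity:.2f}, cognate share: {share:.2f}, ~{years:.0f} years")
```

The Jaccard index here only measures overlap of identical forms, so it is a synchronic measure; the cognate share depends on expert judgments, which is why the two stages can rank language pairs differently.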
License
Copyright (c) 2026 Linguistik Indonesia

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.