Chi-Square, Quantitative Corpus Linguistics, R programming language, metaphors, form-meaning pairing, lexical field of PANAS, Indonesian


This contribution discusses the basic concepts of the chi-square (χ²) test as a kind of analytical statistics and illustrates its application to one of the central issues in linguistics, namely the form-meaning relationship. As a case study using the Indonesian Web as Corpus from Sketch Engine, this paper measures the association between the morphosyntactic forms of words in the lexical field of panas ‘hot’ and their (non-)metaphorical usages. The χ² test demonstrates a highly significant and robust association between the morphosyntactic form of words with the root panas ‘hot’ and their preference for (non-)metaphorical usages. The clearest effects are (i) the strong preference of the inchoative form memanas ‘to become hot’ for metaphorical usage, and (ii) the strong dispreference of dipanaskan ‘to be caused to be hot’ and panas ‘hot’ for metaphorical usage. This finding has implications for the predominant semantic trait of words with certain morphosyntactic forms, thus capturing the form-meaning relationship in language.
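The kind of test summarised above can be sketched in a few lines. The following is a minimal illustration in Python (standard library only) rather than the paper's own R workflow. The counts are hypothetical and chosen only to mimic the direction of the reported effects; they are not the study's data. The function name `chi_square` and the table layout are likewise assumptions made for this sketch.

```python
import math

# HYPOTHETICAL counts for illustration only; these are NOT the paper's data.
# Rows: morphosyntactic forms; columns: (metaphorical, non-metaphorical) tokens.
observed = {
    "memanas": (80, 20),
    "dipanaskan": (5, 95),
    "panas": (30, 70),
}


def chi_square(table):
    """Return the chi-square statistic, degrees of freedom, and Pearson residuals."""
    rows = list(table)
    n_cols = len(table[rows[0]])
    row_totals = {r: sum(table[r]) for r in rows}
    col_totals = [sum(table[r][j] for r in rows) for j in range(n_cols)]
    grand_total = sum(col_totals)

    stat = 0.0
    residuals = {}
    for r in rows:
        residuals[r] = []
        for j in range(n_cols):
            # Expected frequency under independence of form and usage.
            expected = row_totals[r] * col_totals[j] / grand_total
            # Pearson residual: signed, standardised deviation from expectation.
            res = (table[r][j] - expected) / math.sqrt(expected)
            residuals[r].append(res)
            stat += res ** 2
    df = (len(rows) - 1) * (n_cols - 1)
    return stat, df, residuals


stat, df, residuals = chi_square(observed)

# 5.991 is the chi-square critical value for df = 2 at alpha = 0.05.
significant = df == 2 and stat > 5.991

print(f"chi-square = {stat:.2f}, df = {df}, significant at 0.05: {significant}")
for form, (metaphorical, literal) in residuals.items():
    print(f"{form}: metaphorical {metaphorical:+.2f}, non-metaphorical {literal:+.2f}")
```

A positive residual in the metaphorical column indicates that a form occurs in metaphorical usage more often than independence would predict, and a negative residual indicates the opposite. Preference and dispreference claims of the kind made above are typically read off such residuals.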





