• Gede Primahadi Wijaya Rajeg, Monash University
• Karlina Denistia, Eberhard Karls University of Tübingen
• I Made Rajeg, Universitas Udayana



R programming language, Quantitative Corpus Linguistics, Distinctive Collexeme Analysis, Indonesian Negating Constructions


This paper demonstrates the use of R for a unified data science workflow in corpus linguistics through a series of corpus-based analyses of Indonesian negating constructions. The data come from an online-news corpus of circa 17 million word tokens, a part of the Indonesian Leipzig Corpora. First, we identified tidak as the most frequent negating form in our corpus. Next, we found that tak has a significantly higher type frequency than tidak for negated predicates with the [ter-X-kan] schema. This finding adds quantitative nuance to the description in an Indonesian reference grammar, which states that (i) in present-day Indonesian tidak is also commonly used to negate ter- predicates, and (ii) the obligatory use of tak to negate ter- predicates is a past usage. Lastly, we refine our second finding by applying Distinctive Collexeme Analysis, which shows that tak, compared to tidak, strongly attracts specific verbs, predominantly those in the [ter-X-kan] schema. This finding offers a deeper characterisation of tidak and tak.
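
The logic behind Distinctive Collexeme Analysis can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the authors' R code. It assumes the common formulation in which each collexeme is tested with a Fisher exact test on a 2×2 table (its frequency with construction A and with construction B, against all other collexemes in each construction), and the association strength is reported as -log10 of the p-value. The function names, the labels verb_1 to verb_3, and all counts are hypothetical and do not come from the paper's data.

```python
from math import comb, log10


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def distinctive_collexemes(counts):
    """counts maps a collexeme to (frequency with A, frequency with B)."""
    total_a = sum(fa for fa, _ in counts.values())
    total_b = sum(fb for _, fb in counts.values())
    rows = []
    for word, (fa, fb) in counts.items():
        p = fisher_two_sided(fa, fb, total_a - fa, total_b - fb)
        # The preferred construction is the one where the word is over-represented.
        expected_a = (fa + fb) * total_a / (total_a + total_b)
        preferred = "A" if fa > expected_a else "B"
        rows.append((word, preferred, -log10(p)))
    # Strongest associations first.
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Hypothetical counts: (frequency after negator A, frequency after negator B).
toy_counts = {"verb_1": (30, 5), "verb_2": (10, 40), "verb_3": (20, 20)}
for word, preferred, strength in distinctive_collexemes(toy_counts):
    print(f"{word}: prefers {preferred}, strength {strength:.2f}")
```

In this sketch a higher strength value means the collexeme is more distinctive for the construction it prefers; a collexeme whose counts are close to the expected split receives a strength near zero.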


Anthony, L. (2014). AntConc (Version 3.4.3). Tokyo, Japan: Waseda University. Retrieved from

Arka, I. W. (2010). Dynamic and stative passives in Indonesian & their computational implementation. Paper presented at the MALINDO Workshop, Jakarta.

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge, UK ; New York: Cambridge University Press.

Biemann, C., Heyer, G., Quasthoff, U., & Richter, M. (2007). The Leipzig Corpora Collection: Monolingual corpora of standard size. In M. Davies, P. Rayson, S. Hunston, & P. Danielsson (Eds.), Proceedings of the Corpus Linguistics Conference. University of Birmingham, UK. Retrieved from

Desagulier, G. (2017). Corpus linguistics and statistics with R: Introduction to quantitative methods in linguistics. New York, NY: Springer Berlin Heidelberg.

Diessel, H. (2015). 14. Usage-based construction grammar. In E. Dąbrowska & D. Divjak (Eds.), Handbook of Cognitive Linguistics (pp. 296–322). Berlin ; Boston: De Gruyter Mouton.

Dinakaramani, A., Rashel, F., Luthfi, A., & Manurung, R. (2014). Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus. In 2014 International Conference on Asian Language Processing (IALP) (pp. 66–69). doi:10.1109/IALP.2014.6973519

Flanagan, J. (2017). Reproducible research: Strategies, tools, and workflows. Studies in Variation, Contacts and Change in English, 19. Retrieved from

Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford ; New York: Oxford University Press.

Goldhahn, D., Eckart, T., & Quasthoff, U. (2012). Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC) 2012 (pp. 759–765). Istanbul. Retrieved from

Gries, S. T. (2009a). Quantitative Corpus Linguistics with R: A Practical Introduction. New York: Routledge.

Gries, S. T. (2009b). Statistics for linguistics with R: A practical introduction. Berlin: Mouton de Gruyter.

Gries, S. T. (2012). Collostructions. In P. J. Robinson (Ed.), The Routledge encyclopedia of second language acquisition (pp. 92–95). London: Routledge.

Gries, S. T. (2013a). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–166. doi:10.1075/ijcl.18.1.09gri

Gries, S. T. (2013b). Corpus linguistics: Quantitative methods. In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics (Vols. 1–10, pp. 1380–1385). Chichester, West Sussex, UK: Blackwell Publishing Ltd. doi:10.1002/9781405198431.wbeal0258

Gries, S. T. (2013c). Statistics for linguistics with R: A practical introduction (2nd ed.). Berlin: Mouton de Gruyter.

Gries, S. T. (2014). Coll.Analysis 3.5. A script for R to compute perform collostructional analysis. Retrieved from

Gries, S. T., & David, C. V. (2007). This is kind of / sort of interesting: Variation in hedging in English. Towards Multimedia in Corpus Studies, 2. Retrieved from

Gries, S. T., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics, 9(1), 97–129.

Gries, S. T., Hampe, B., & Schönefeld, D. (2005). Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics, 16(4), 635–676.

Hilpert, M. (2006). Distinctive collexeme analysis and diachrony. Corpus Linguistics and Linguistic Theory, 2(2), 243–256.

Hilpert, M. (2014). Collostructional analysis: Measuring associations between constructions and lexical elements. In D. Glynn & J. A. Robinson (Eds.), Corpus methods for semantics: Quantitative studies in polysemy and synonymy (pp. 391–404). Amsterdam: John Benjamins Publishing Company.

Janda, L. A. (2013). Quantitative methods in Cognitive Linguistics: An introduction. In L. A. Janda (Ed.), Cognitive Linguistics: The quantitative turn (pp. 1–32). Berlin: Mouton de Gruyter.

Janda, L. A., & Lyashevskaya, O. (2013). Semantic profiles of five Russian prefixes: Po-, s-, Za-, Na-, Pro-. Journal of Slavic Linguistics, 21(2), 211–258. doi:10.1353/jsl.2013.0012

Kroeger, P. (2014). External negation in Malay/Indonesian. Language, 90(1), 137–184. doi:10.1353/lan.2014.0000

Larasati, S. D., Kuboň, V., & Zeman, D. (2011). Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In Systems and Frameworks for Computational Morphology (pp. 119–129). Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-23138-4_8

Levshina, N. (2015). How to do Linguistics with R: Data exploration and statistical analysis. John Benjamins Publishing Company.

R Core Team. (2016). R: A language and environment for statistical computing (Version 3.2.4 – “Very Secure Dishes”). R Foundation for Statistical Computing, Vienna, Austria. Retrieved from

Sanchez, G. (2013). Handling and processing strings in R. Berkeley: Trowchez Editions. Retrieved from and Processing Strings in R.pdf

Sneddon, J. N. (2006). Colloquial Jakartan Indonesian. Canberra, Australia: Pacific Linguistics, Research School of Pacific and Asian Studies, The Australian National University.

Sneddon, J. N., Adelaar, A., Djenar, D. N., & Ewing, M. C. (2010). Indonesian reference grammar (2nd ed.). Crows Nest, New South Wales, Australia: Allen & Unwin.

Stefanowitsch, A. (2005). The function of metaphor: Developing a corpus-based perspective. International Journal of Corpus Linguistics, 10(2), 161–198. doi:10.1075/ijcl.10.2.03ste

Stefanowitsch, A. (2013). Collostructional analysis. In T. Hoffmann & G. Trousdale (Eds.), The Oxford handbook of Construction Grammar. Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780195396683.013.0016

Stefanowitsch, A. (2014). Collostructional analysis: A case study of the English into-causative. In T. Herbst, H.-J. Schmid, & S. Faulhaber (Eds.), Constructions collocations patterns. Berlin ; Boston: Walter De Gruyter, GmbH.

Stefanowitsch, A., & Gries, S. T. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.

Stefanowitsch, A., & Gries, S. T. (2009). Corpora and grammar. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 933–951). Berlin: Mouton de Gruyter.

Wickham, H., & Grolemund, G. (2017). R for Data Science. Canada: O’Reilly. Retrieved from



