Gede Primahadi Wijaya Rajeg, Karlina Denistia, I Made Rajeg


This paper demonstrates the use of R as a unified data science tool in corpus linguistics through a series of corpus-based analyses of Indonesian Negating Constructions. The data come from an online-news corpus of c. 17 million word tokens, a part of the Indonesian Leipzig Corpora. First, we found that tidak is the most frequent negating form in our corpus. Next, we found that tak has a significantly higher type frequency than tidak for negated predicates with the [ter-X-kan] schema. This finding adds quantitative nuance to a description in an Indonesian reference grammar, which states that (i) in present-day Indonesian tidak is also commonly used to negate ter- predicates, while (ii) the obligatory use of tak to negate ter- predicates is a past usage. Lastly, we refine the second finding by applying Distinctive Collexeme Analysis, which shows that, compared to tidak, tak strongly attracts specific verbs, predominantly in the [ter-X-kan] schema. This finding offers a deeper characterisation of tidak and tak.
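
For readers unfamiliar with Distinctive Collexeme Analysis, the following is a minimal sketch of the general technique, not the authors' implementation (the paper itself works in R; Python is used here only for illustration). It assumes the common formulation in which, for each word, a 2x2 table of its frequency in two competing constructions is tested with a Fisher exact test and the association strength is reported as -log10(p). The function names, the labels "A" and "B", and all counts are hypothetical placeholders, not data from the corpus.

```python
from math import comb, log10
import sys


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def distinctive_collexemes(freq_a, freq_b):
    """Return (word, preferred construction, strength), strongest first.

    freq_a and freq_b map each word to its frequency in construction A and B.
    Strength is -log10 of the Fisher exact p-value (an assumed convention).
    """
    total_a, total_b = sum(freq_a.values()), sum(freq_b.values())
    results = []
    for word in sorted(set(freq_a) | set(freq_b)):
        a, b = freq_a.get(word, 0), freq_b.get(word, 0)
        p = fisher_two_sided(a, b, total_a - a, total_b - b)
        # Expected frequency in A if the word had no preference.
        expected_a = (a + b) * total_a / (total_a + total_b)
        preferred = "A" if a > expected_a else "B"
        strength = -log10(max(p, sys.float_info.min))  # guard against p == 0
        results.append((word, preferred, strength))
    return sorted(results, key=lambda r: -r[2])


# Hypothetical counts: construction A stands for tak + verb, B for tidak + verb.
tak_counts = {"verb_x": 30, "verb_y": 5, "verb_z": 10}
tidak_counts = {"verb_x": 5, "verb_y": 40, "verb_z": 12}

results = distinctive_collexemes(tak_counts, tidak_counts)
for word, preferred, strength in results:
    print(f"{word}: prefers {preferred}, strength {strength:.2f}")
```

A word counts as distinctive for a construction when it occurs there more often than its overall frequency would predict; the strength value ranks how unlikely that skew is under independence. Exact binomial coefficients keep the sketch dependency-free, at the cost of speed on very large counts.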


R programming language; Quantitative Corpus Linguistics; Distinctive Collexeme Analysis; Indonesian Negating Constructions




