WORKING WITH A LINGUISTIC CORPUS USING R: AN INTRODUCTORY NOTE WITH INDONESIAN NEGATING CONSTRUCTION

Gede Primahadi Wijaya Rajeg; Karlina Denistia; I Made Rajeg

doi:10.26499/li.v36i1.71

Authors

Gede Primahadi Wijaya Rajeg, Monash University
Karlina Denistia, Eberhard Karls University of Tübingen
I Made Rajeg, Universitas Udayana

Keywords:

R programming language, Quantitative Corpus Linguistics, Distinctive Collexeme Analysis, Indonesian Negating Constructions

Abstract

This paper demonstrates the use of R for a unified data-science workflow in corpus linguistics through a series of corpus-based analyses of the Indonesian negating construction. The data come from an online-news corpus of c. 17 million word tokens, part of the Indonesian Leipzig Corpora. We found that tidak is the most frequent negating form in our corpus. Next, tak turns out to have a significantly higher type frequency than tidak for negated predicates with the [ter-X-kan] schema. This finding adds quantitative nuance to a description in an Indonesian reference grammar which states that (i) in present-day Indonesian, tidak is also commonly used to negate ter-related predicates, and (ii) the obligatory use of tak to negate ter- predicates belongs to past usage. Lastly, we refine our second finding by applying Distinctive Collexeme Analysis, which shows that, compared with tidak, tak strongly attracts specific verbs, predominantly those in the [ter-X-kan] schema; this finding offers a deeper characterisation of tidak and tak.
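The Distinctive Collexeme Analysis step can be illustrated with a minimal sketch. The paper itself works in R, and this Python version is only a language-neutral illustration, not the authors' code. It assumes the common formulation of the method (Gries & Stefanowitsch, 2004): for each verb, a 2×2 table crosses that verb versus all other verbs with the two negators, and a one-tailed Fisher-Yates exact test yields the distinctive collostruction strength as −log10(p). The function names `fisher_log10_p` and `distinctive_collexemes` and all frequencies below are hypothetical toy values, not results from the corpus.

```python
from math import comb, log10


def fisher_log10_p(a, b, c, d):
    """log10 of the one-tailed Fisher-Yates exact p-value for the 2x2 table
    [[a, b], [c, d]], taken in the direction in which cell a deviates
    from its expected frequency."""
    row1 = a + b            # all verb tokens negated by construction 1
    col1 = a + c            # all tokens of the verb under study
    n = a + b + c + d       # all negated-verb tokens in the sample
    expected_a = row1 * col1 / n
    lowest = max(0, row1 - (n - col1))  # smallest value cell a can take
    highest = min(row1, col1)           # largest value cell a can take
    if a >= expected_a:
        tail = range(a, highest + 1)    # over-represented: P(X >= a)
    else:
        tail = range(lowest, a + 1)     # under-represented: P(X <= a)
    # Hypergeometric probabilities summed with exact integer arithmetic;
    # math.log10 accepts arbitrarily large ints, so nothing overflows.
    numerator = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in tail)
    return log10(numerator) - log10(comb(n, row1))


def distinctive_collexemes(freq1, freq2, name1, name2):
    """Rank verbs by how distinctively they prefer one of two constructions.

    freq1, freq2: dicts mapping verb -> frequency in construction 1 / 2.
    Returns (verb, preferred construction, strength) tuples, where
    strength = -log10(p), sorted from most to least distinctive."""
    total1 = sum(freq1.values())
    total2 = sum(freq2.values())
    results = []
    for verb in sorted(set(freq1) | set(freq2)):
        a = freq1.get(verb, 0)          # verb with construction 1
        c = freq2.get(verb, 0)          # verb with construction 2
        b = total1 - a                  # other verbs with construction 1
        d = total2 - c                  # other verbs with construction 2
        expected_a = total1 * (a + c) / (total1 + total2)
        preferred = name1 if a >= expected_a else name2
        strength = -fisher_log10_p(a, b, c, d)
        results.append((verb, preferred, strength))
    results.sort(key=lambda row: row[2], reverse=True)
    return results


# Hypothetical toy counts (NOT corpus results) of verbs negated by each form.
tak_counts = {"terelakkan": 40, "terbayangkan": 25, "ada": 15}
tidak_counts = {"ada": 300, "bisa": 150, "terelakkan": 2}

results = distinctive_collexemes(tak_counts, tidak_counts, "tak", "tidak")
for verb, preferred, strength in results:
    print(f"{verb:<14} {preferred:<6} {strength:6.2f}")
```

With these invented counts, the two ter-X-kan verbs come out as distinctive for tak, while ada and bisa are distinctive for tidak. In a real analysis, the two frequency dictionaries would be compiled from the concordance of negated predicates, and the one-tailed test is chosen so that the sign of the deviation (which negator a verb prefers) is reported separately from its strength.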

References

Anthony, L. (2014). AntConc (Version 3.4.3). Tokyo, Japan: Waseda University. Retrieved from http://www.laurenceanthony.net

Arka, I. W. (2010). Dynamic and stative passives in Indonesian & their computational implementation. Paper presented at the MALINDO Workshop, Jakarta.

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge, UK; New York: Cambridge University Press.

Biemann, C., Heyer, G., Quasthoff, U., & Richter, M. (2007). The Leipzig Corpora Collection: Monolingual corpora of standard size. In M. Davies, P. Rayson, S. Hunston, & P. Danielsson (Eds.), Proceedings of the Corpus Linguistics Conference. University of Birmingham, UK. Retrieved from http://ucrel.lancs.ac.uk/publications/CL2007/paper/190_Paper.pdf

Desagulier, G. (2017). Corpus linguistics and statistics with R: Introduction to quantitative methods in linguistics. New York, NY: Springer Berlin Heidelberg.

Diessel, H. (2015). Usage-based construction grammar. In E. Dabrowska & D. Divjak (Eds.), Handbook of Cognitive Linguistics (pp. 296–322). Berlin; Boston: De Gruyter Mouton.

Dinakaramani, A., Rashel, F., Luthfi, A., & Manurung, R. (2014). Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus. In 2014 International Conference on Asian Language Processing (IALP) (pp. 66–69). doi:10.1109/IALP.2014.6973519

Flanagan, J. (2017). Reproducible research: Strategies, tools, and workflows. Studies in Variation, Contacts and Change in English, 19. Retrieved from http://www.helsinki.fi/varieng/series/volumes/19/flanagan/

Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford; New York: Oxford University Press.

Goldhahn, D., Eckart, T., & Quasthoff, U. (2012). Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC) 2012 (pp. 759–765). Istanbul. Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/327_Paper.pdf

Gries, S. T. (2009a). Quantitative Corpus Linguistics with R: A Practical Introduction. New York: Routledge.

Gries, S. T. (2009b). Statistics for linguistics with R: A practical introduction. Berlin: Mouton de Gruyter.

Gries, S. T. (2012). Collostructions. In P. J. Robinson (Ed.), The Routledge encyclopedia of second language acquisition (pp. 92–95). London: Routledge.

Gries, S. T. (2013a). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–166. doi:10.1075/ijcl.18.1.09gri

Gries, S. T. (2013b). Corpus linguistics: Quantitative methods. In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics (Vols. 1–10, pp. 1380–1385). Chichester, West Sussex, UK: Blackwell Publishing Ltd. doi:10.1002/9781405198431.wbeal0258

Gries, S. T. (2013c). Statistics for linguistics with R: A practical introduction (2nd ed.). Berlin: Mouton de Gruyter.

Gries, S. T. (2014). Coll.Analysis 3.5. A script for R to compute perform collostructional analysis. Retrieved from http://www.linguistics.ucsb.edu/faculty/stgries/teaching/groningen/readme.txt

Gries, S. T., & David, C. V. (2007). This is kind of / sort of interesting: Variation in hedging in English. Towards Multimedia in Corpus Studies, 2. Retrieved from http://www.helsinki.fi/varieng/journal/volumes/02/gries_david/

Gries, S. T., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics, 9(1), 97–129.

Gries, S. T., Hampe, B., & Schönefeld, D. (2005). Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics, 16(4), 635–676.

Hilpert, M. (2006). Distinctive collexeme analysis and diachrony. Corpus Linguistics and Linguistic Theory, 2(2), 243â€“256.

Hilpert, M. (2014). Collostructional analysis: Measuring associations between constructions and lexical elements. In D. Glynn & J. A. Robinson (Eds.), Corpus methods for semantics: Quantitative studies in polysemy and synonymy (pp. 391â€“404). Amsterdam: John Benjamins Publishing Company.

Janda, L. A. (2013). Quantitative methods in Cognitive Linguistics: An introduction. In L. A. Janda (Ed.), Cognitive Linguistics: The quantitative turn (pp. 1–32). Berlin: Mouton de Gruyter.

Janda, L. A., & Lyashevskaya, O. (2013). Semantic profiles of five Russian prefixes: po-, s-, za-, na-, pro-. Journal of Slavic Linguistics, 21(2), 211–258. doi:10.1353/jsl.2013.0012

Kroeger, P. (2014). External negation in Malay/Indonesian. Language, 90(1), 137–184. doi:10.1353/lan.2014.0000

Larasati, S. D., Kuboň, V., & Zeman, D. (2011). Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In Systems and Frameworks for Computational Morphology (pp. 119–129). Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-23138-4_8

Levshina, N. (2015). How to do Linguistics with R: Data exploration and statistical analysis. John Benjamins Publishing Company.

R Core Team. (2016). R: A language and environment for statistical computing (Version 3.2.4 – “Very Secure Dishes”). R Foundation for Statistical Computing, Vienna, Austria. Retrieved from http://www.R-project.org/

Sanchez, G. (2013). Handling and processing strings in R. Berkeley: Trowchez Editions. Retrieved from http://www.gastonsanchez.com/Handling%20and%20Processing%20Strings%20in%20R.pdf

Sneddon, J. N. (2006). Colloquial Jakartan Indonesian. Canberra, Australia: Pacific Linguistics, Research School of Pacific and Asian Studies, The Australian National University.

Sneddon, J. N., Adelaar, A., Djenar, D. N., & Ewing, M. C. (2010). Indonesian reference grammar (2nd ed.). Crows Nest, New South Wales, Australia: Allen & Unwin.

Stefanowitsch, A. (2005). The function of metaphor: Developing a corpus-based perspective. International Journal of Corpus Linguistics, 10(2), 161–198. doi:10.1075/ijcl.10.2.03ste

Stefanowitsch, A. (2013). Collostructional analysis. In T. Hoffmann & G. Trousdale (Eds.), The Oxford handbook of Construction Grammar. Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780195396683.013.0016

Stefanowitsch, A. (2014). Collostructional analysis: A case study of the English into-causative. In T. Herbst, H.-J. Schmid, & S. Faulhaber (Eds.), Constructions collocations patterns. Berlin; Boston: Walter de Gruyter.

Stefanowitsch, A., & Gries, S. T. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8(2), 209â€“243.

Stefanowitsch, A., & Gries, S. T. (2009). Corpora and grammar. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 933–951). Berlin: Mouton de Gruyter.

Wickham, H., & Grolemund, G. (2017). R for Data Science. Canada: O’Reilly. Retrieved from http://r4ds.had.co.nz/