authorship analysis, electronic texts, forensic stylistics, WhatsApp, Indonesian text


Recent changes to the legal criteria for admitting scientific evidence have placed greater emphasis on scientific methods of authorship analysis. This study examined the authorship of electronic texts using a quantitative method based on forensic stylistics and computational techniques. The data comprised 300 digital texts produced by 100 authors: 100 questioned texts (Q-texts) and 200 known texts (K-texts). The electronic texts analyzed were personal WhatsApp messages. Authorship analysis was conducted through n-gram tracing, and all text sets were tested using the Similarity Comparison Method (SCM). In the word 1-gram test, SCM achieved high accuracy, ranging from 85% to 96%. Results on the small set are promising, with the various stylistic features yielding reliable accuracy of 92% to 98.5%. Character-level n-gram tracing emerged as a key feature for authorship attribution.
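The n-gram tracing and similarity comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: a questioned text is attributed to the candidate author whose pooled known texts contain the largest share of the questioned text's n-gram types (the overlap score commonly used in n-gram tracing), and accuracy is the proportion of Q-texts attributed to their true author. The function names (`word_ngrams`, `char_ngrams`, `overlap_score`, `attribute`, `accuracy`), the lowercasing and regex tokenization, and the toy messages are hypothetical; the exact scoring used by SCM in this study may differ.

```python
import re
from collections.abc import Callable

# Hypothetical toy data for illustration only (not from the study's corpus).
KNOWN = {
    "A": ["aku lagi di jalan nih", "nanti aku kabari ya"],
    "B": ["saya sudah sampai kantor", "terima kasih, saya tunggu"],
}
QUESTIONED = [("A", "aku masih di jalan ya"), ("B", "saya tunggu di kantor")]


def word_ngrams(text: str, n: int = 1) -> set[tuple[str, ...]]:
    """Set of word n-gram types; assumes lowercasing and a simple regex word tokenizer."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-gram types (spaces and punctuation included)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap_score(q_grams: set, k_grams: set) -> float:
    """Share of the questioned text's n-gram types that also occur in the known texts."""
    return len(q_grams & k_grams) / len(q_grams) if q_grams else 0.0


def attribute(q_text: str, known: dict[str, list[str]],
              extract: Callable[[str], set]) -> tuple[str, dict[str, float]]:
    """Return the candidate with the highest overlap score, plus all scores."""
    q_grams = extract(q_text)
    scores = {}
    for author, texts in known.items():
        # Pool the n-gram types of all K-texts for this candidate author.
        k_grams = set().union(*(extract(t) for t in texts))
        scores[author] = overlap_score(q_grams, k_grams)
    best = max(scores, key=scores.get)  # ties go to the first-listed author
    return best, scores


def accuracy(questioned: list[tuple[str, str]], known: dict[str, list[str]],
             extract: Callable[[str], set]) -> float:
    """Proportion of Q-texts attributed to their true author."""
    correct = sum(attribute(q, known, extract)[0] == true_author
                  for true_author, q in questioned)
    return correct / len(questioned)


if __name__ == "__main__":
    print(attribute("aku masih di jalan ya", KNOWN, word_ngrams))
    print("word 1-gram accuracy:", accuracy(QUESTIONED, KNOWN, word_ngrams))
    print("char 3-gram accuracy:", accuracy(QUESTIONED, KNOWN, char_ngrams))
```

Passing `word_ngrams` or `char_ngrams` as the `extract` argument lets the same pipeline compare word 1-gram and character-level features. With 100 Q-texts, as in this study, each correct attribution contributes one percentage point to the reported accuracy.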






How to Cite

Puspitasari, D. A., Fakhrurroja, H., & Sutrisno, A. (2024). Authorship analysis in electronic texts using Similarity Comparison Method. Linguistik Indonesia, 42(1), 91–112.