AUTHORSHIP ANALYSIS IN ELECTRONIC TEXTS USING SIMILARITY COMPARISON METHOD

DOI:

https://doi.org/10.26499/li.v42i1.544

Keywords:

authorship analysis, electronic texts, forensic stylistics, WhatsApp, Indonesian text

Abstract

Recent changes to the criteria for scientific evidence in legal proceedings have emphasized scientific methods of authorship analysis. This study examines the authorship of electronic texts using a quantitative method based on forensic stylistics and computational tools. The data comprise 300 digital texts produced by 100 authors: 100 questioned texts (Q-texts) and 200 known texts (K-texts). The electronic texts are personal WhatsApp messages. Authorship analysis was conducted by tracing n-grams and testing all text sets with the Similarity Comparison Method (SCM). In the word 1-gram test, SCM accuracy was high, ranging from 85% to 96%. The results for the small text set are promising: the various stylistic features yield dependable accuracy ranging from 92% to 98.5%. The results indicate that character-level n-grams are a key feature for authorship attribution.
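The general idea of comparing a questioned text against known texts by n-gram overlap can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how SCM scores similarity, so the sketch assumes Jaccard overlap between n-gram sets, and the function names (`char_ngrams`, `word_ngrams`, `jaccard`, `rank_candidates`) and the sample messages are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def char_ngrams(text: str, n: int) -> Set[str]:
    """Return the set of character n-grams in a lowercased text."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams, splitting on whitespace."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all distinct items.

    Assumed here as the similarity score; the source does not specify one.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(
    q_text: str, k_texts: Dict[str, List[str]], n: int = 1
) -> List[Tuple[str, float]]:
    """Rank candidate authors by word n-gram overlap with the questioned text."""
    q_set = word_ngrams(q_text, n)
    scores = {}
    for author, texts in k_texts.items():
        # Pool all known texts of one author into a single n-gram profile.
        profile = set().union(*(word_ngrams(t, n) for t in texts))
        scores[author] = jaccard(q_set, profile)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: two candidate authors with two known texts each.
known = {
    "A": ["ok sip nanti aku kabari", "sip aku otw ya"],
    "B": ["baik terima kasih banyak", "mohon maaf saya terlambat"],
}
ranking = rank_candidates("sip nanti aku otw", known, n=1)
```

Here `ranking` lists the candidate authors from most to least similar to the questioned text. Passing the `char_ngrams` sets to `jaccard` instead gives a character-level comparison of the same kind.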

Published

31-01-2024

How to Cite

Puspitasari, D. A., Fakhrurroja, H., & Sutrisno, A. (2024). AUTHORSHIP ANALYSIS IN ELECTRONIC TEXTS USING SIMILARITY COMPARISON METHOD. Linguistik Indonesia, 42(1), 91–112. https://doi.org/10.26499/li.v42i1.544