AUTHORSHIP ANALYSIS IN ELECTRONIC TEXTS USING SIMILARITY COMPARISON METHOD
DOI: https://doi.org/10.26499/li.v42i1.544
Keywords: authorship analysis, electronic texts, forensic stylistics, WhatsApp, Indonesian text
Abstract
Recent changes to the legal criteria for admitting scientific evidence have placed greater emphasis on scientific methods of authorship analysis. This study examined the authorship of electronic texts using a quantitative method grounded in forensic stylistics and computer technology. It used 300 digital texts produced by 100 authors: 100 questioned texts (Q-texts) and 200 known texts (K-texts). The electronic texts are personal WhatsApp messages. Authorship analysis was carried out by n-gram tracing and by testing all text sets with the Similarity Comparison Method (SCM). In the word 1-gram test, SCM accuracy was fairly high, ranging from 85% to 96%. The results obtained with the tiny set are promising, with the various stylistic features yielding dependable accuracy of 92% to 98.5%. The character-level n-gram tracing results indicate that character n-grams are a key feature for authorship attribution.
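The abstract does not specify how n-gram tracing or the SCM scores are computed. The Python below is a minimal sketch of one common formulation of n-gram tracing: a questioned text is attributed to the candidate whose pooled known texts contain the largest proportion of the questioned text's n-gram types. It assumes lowercasing, whitespace tokenization, and a plain overlap ratio. The function names (`word_ngrams`, `char_ngrams`, `overlap`, `attribute`, `accuracy`) and the toy messages are hypothetical illustrations, not the study's SCM implementation or data.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

# Hypothetical sketch of n-gram tracing; NOT the study's SCM implementation.


def word_ngrams(text: str, n: int = 1) -> Set[Tuple[str, ...]]:
    """Set of word n-gram types, using lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-gram types, spaces included."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap(q_grams: Set[Hashable], k_grams: Set[Hashable]) -> float:
    """Proportion of the questioned text's n-gram types that also occur in the known texts."""
    if not q_grams:
        return 0.0
    return len(q_grams & k_grams) / len(q_grams)


def attribute(
    questioned: str,
    known_by_author: Dict[str, List[str]],
    extract: Callable[[str], Set[Hashable]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate author and return the best one plus all scores."""
    q = extract(questioned)
    scores: Dict[str, float] = {}
    for author, texts in known_by_author.items():
        # Pool the n-gram types of all known texts (K-texts) for this author.
        k: Set[Hashable] = set()
        for t in texts:
            k |= extract(t)
        scores[author] = overlap(q, k)
    # max() returns the first maximal key, so ties go to the earliest author.
    best = max(scores, key=scores.get)
    return best, scores


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Share of questioned texts (Q-texts) attributed to their true author."""
    correct = sum(1 for qid, author in predicted.items() if truth[qid] == author)
    return correct / len(predicted)


if __name__ == "__main__":
    # Toy WhatsApp-style messages, invented for illustration only.
    known = {
        "A": ["aku lagi di rumah nih", "nanti aku kabari ya"],
        "B": ["saya sedang di kantor", "mohon ditunggu sebentar"],
    }
    questioned = "aku di rumah ya"
    print(attribute(questioned, known, word_ngrams))  # → ('A', {'A': 1.0, 'B': 0.25})
    print(attribute(questioned, known, lambda t: char_ngrams(t, 3)))
```

With word 1-grams, the questioned message shares all four of its word types with author A's known messages but only "di" with author B's, so it is attributed to A. Accuracy in this sketch is simply the proportion of Q-texts attributed to their true author; real WhatsApp data would also need decisions about punctuation, emoji, and tokenization that this sketch ignores.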
License
Copyright (c) 2024 Linguistik Indonesia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.