Call for papers for the NUSA special issue
"Linguistic studies using large annotated corpora"
Background
Corpora are widely used in modern linguistic research. Two notable trends in corpus development in recent years are a significant increase in size and the addition of various kinds of annotation. Corpora containing billions of words are no longer uncommon. Efforts have been made to enrich raw texts with linguistic information such as morphology, parts of speech, constituent structure, semantic dependencies, and information-structural and discourse-structural status. However, these developments, which have taken place primarily in the field of natural language processing, have not been fully utilised in linguistic research on the languages of Nusantara.
This special issue of NUSA: Linguistic studies of languages in and around Indonesia is intended to encourage researchers to explore the available resources and share ways of using them to investigate old and new empirical and theoretical topics.
Important dates
10 May 2019 (extended from 15 March) | Manuscript submission deadline |
Mid May 2019 (late June 2019 for articles submitted after 15 March) | Notification of the editorial decision |
1 August 2019 | Final manuscript deadline |
September 2019 | Publication online |
Examples of large annotated corpora
All manuscripts should explicitly state what resource(s) they use and how they utilise the annotations. In addition to the annotated corpora listed below, authors can build their own annotated corpus by annotating a raw corpus with a morphological dictionary (e.g. MALINDO Morph (Nomoto et al. 2018a)), a POS tagger (e.g. MorphInd (Larasati et al. 2011) or Rule-Based POS Tagger Bahasa Indonesia (Rashel et al. 2014)), an HPSG grammar (e.g. INDRA (Moeljadi et al. 2015)), etc.
Use of open resources is recommended to ensure the replicability of the findings and equality amongst researchers from different financial backgrounds.
For authors
- For style files (LaTeX and Microsoft Word) and enquiries, please contact Hiroki Nomoto (nomoto 〈ΑΤ〉 tufs.ac.jp) or David Moeljadi (davidmoeljadi 〈ΑΤ〉 gmail.com).
- We may be able to provide English language proofreading for selected authors of the accepted papers who are not native English speakers and are not affiliated with institutions whose main language of instruction/administration is English.
References
- Goldhahn, Dirk, Thomas Eckart & Uwe Quasthoff. 2012. Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12).
- Kwary, Deny A. 2018. Towards the first online Indonesian National Corpus. The 4th Asia Pacific Corpus Linguistics Conference (APCLC 2018).
- Larasati, Septina Dian, Vladislav Kuboň & Daniel Zeman. 2011. Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In Cerstin Mahlow et al. (eds.) Systems and Frameworks for Computational Morphology, 119-129. Verlag: Springer.
- Moeljadi, David, Francis Bond & Sanghoun Song. 2015. Building an HPSG-based Indonesian Resource Grammar (INDRA). In Proceedings of the Grammar Engineering Across Frameworks (GEAF) Workshop, 53rd Annual Meeting of the ACL and 7th IJCNLP, 9-16.
- Nomoto, Hiroki, Hannah Choi, David Moeljadi & Francis Bond. 2018a. MALINDO Morph: Morphological dictionary and analyser for Malay/Indonesian. In Kiyoaki Shirai (ed.) Proceedings of the LREC 2018 Workshop "The 13th Workshop on Asian Language Resources", 36-43.
- Nomoto, Hiroki, Shiro Akasegawa & Asako Shiohara. 2018b. Building an open online concordancer for Malay/Indonesian. The 22nd International Symposium on Malay/Indonesian Linguistics (ISMIL).
- Nomoto, Hiroki, Shiro Akasegawa & Asako Shiohara. to appear. Reclassification of the Leipzig Corpora Collection for Malay and Indonesian. NUSA 65.
- Rashel, Fam, Andry Luthfi, Arawinda Dinakaramani & Ruli Manurung. 2014. Building an Indonesian rule-based part-of-speech tagger. In International Conference on Asian Language Processing (IALP2014). IEEE.