Call for papers for the NUSA special issue
"Linguistic studies using large annotated corpora"

Editors: Hiroki Nomoto and David Moeljadi


Corpora have been widely used in modern linguistic research. Two notable trends in recent corpus development are a significant increase in size and the addition of various kinds of annotation. Billion-word corpora are no longer uncommon. Efforts have been made to enrich raw texts with linguistic information such as morphology, parts of speech, constituent structure, semantic dependency, information-structural and discourse-structural status, and so on. However, these developments, which have taken place primarily in the field of natural language processing, have not yet been fully utilised in linguistic research on the languages of Nusantara.

This special issue of NUSA: Linguistic studies of languages in and around Indonesia aims to encourage researchers to explore the available resources and to share ways of using them to investigate both old and new empirical and theoretical topics.

Important dates

10 May 2019 (originally 15 March 2019): Manuscript submission deadline
Mid-May 2019 (late June 2019 for articles submitted after 15 March): Notification of the editorial decision
1 August 2019: Final manuscript deadline
September 2019: Online publication

Examples of large annotated corpora

All manuscripts should explicitly state which resource(s) they use and how they make use of the annotations. In addition to the annotated corpora listed below, authors can also build their own annotated corpus by processing a raw corpus with a morphological dictionary (e.g. MALINDO Morph (Nomoto et al. 2018a)), a POS tagger (e.g. Morphind (Larasati et al. 2011), Rule-Based POS Tagger Bahasa Indonesia (Rashel et al. 2014)), an HPSG grammar (e.g. INDRA (Moeljadi et al. 2015)), etc.

The use of open resources is recommended to ensure the replicability of findings and equal access for researchers regardless of their financial resources.

For authors