The Oxford-NINJAL Corpus of Old Japanese (ONCOJ) is an ongoing, long-term collaborative research project between the Research Centre for Japanese Language and Linguistics at the University of Oxford and the National Institute for Japanese Language and Linguistics (NINJAL), Tokyo. The project is developing a lemmatized, parsed, and comprehensively annotated digital corpus of all texts in Japanese from the Old Japanese period.
Old Japanese is the earliest attested stage of the Japanese language, dating mainly from the 8th century AD. The texts from the period are mainly poetry.
The ONCOJ contains the texts in the original script and in a phonemic transcription. It is lemmatized and annotated for mode of writing (phonographic or logographic), morphology, constituency, and grammatical function. This release contains the poetic texts from the period, approximately 90,000 words.
The corpus is searchable through a suite of online search facilities, and both the full corpus data and individual search results can be downloaded for offline use. The data is primarily presented in a bracketed tree format in the style of the Penn Historical Corpora, and will soon also be available in a TEI-convertible XML format.
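For readers who download the bracketed data for offline use, the following is a minimal sketch of how a labelled bracketed tree could be read into a nested structure. It assumes that every node is written as an opening parenthesis, a label, its children, and a closing parenthesis. The sample tree, its labels (IP-MAT, NP-SBJ, N, VB) and its words are invented for illustration and are not taken from the corpus; the actual ONCOJ files may carry additional annotation that this sketch does not handle.

```python
import re


def tokenize(text):
    # Each parenthesis is its own token; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Assumes every node has a label.
    """
    tokens = tokenize(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing parenthesis
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the end of the tree")
    return tree


def leaves(tree):
    """Return the words of a parsed tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Hypothetical sample, not an actual corpus entry.
SAMPLE = "(IP-MAT (NP-SBJ (N pana)) (VB saku))"

if __name__ == "__main__":
    tree = parse(SAMPLE)
    print(tree)
    print(leaves(tree))
```

Nested tuples keep the sketch free of dependencies; an existing treebank reader may be a better choice for real work once the exact file conventions have been checked against the downloaded data.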
The corpus is available through this website: http://oncoj.ninjal.ac.jp/