Description |
A compiled Matukar Panau corpus of 150,740 words produced from newly and previously collected data, including words in context, speaker metadata, file metadata and, where available, parsing, glossing and translations. A subset of this corpus is included in a separate file as a morpheme corpus with parsing and glossing of 20,359 morphemes.
Most files have been standardized for spelling. The spelling standardization script package for ELAN was developed by Jake Farrell, AI Specialist at Appen, for use by CoEDL researchers.
A lexicon from ELAN in XML format is included.
An annotation guideline for clause chains is included. Annotations are in tiers with the ELAN type "chain". | workingLanguages: tpi | subgenre: Corpus | access: O | accessDescription: The material is licensed under Creative Commons Licences with the licence CC BY-NC-ND Attribution-NonCommercial-NoDerivs, which means that others may download the material and share them with others as long as they credit the creators. Others cannot change the materials in any way or use the materials commercially. | status: Incoming | involvement: unspecified | planningType: unspecified | socialContext: unspecified | keyword: Lexicon, Corpus, Morphemes, Words | topic: Aggregated Data |