Item details
Item ID
DGB1-corpus2023_dict
Title Corpus and dictionary files for 2023
Description A compiled Matukar Panau corpus of 150,740 words produced from newly and previously collected data, including words in context, speaker metadata, file metadata and, where available, parsing, glossing and translations. A subset of this corpus is included in a separate file as a morpheme corpus with parsing and glossing of 20,359 morphemes.

Most files have been standardized for spelling. The spelling standardization script package for ELAN was developed by Jake Farrell, AI Specialist at Appen, for use by CoEDL researchers.

A lexicon from ELAN in XML format is included.

An annotation guideline for clause chains is included. Annotations are in tiers with the ELAN type "chain". | workingLanguages: tpi | subgenre: Corpus | access: O | accessDescription: The material is licensed under Creative Commons Licences with the licence CC BY-NC-ND Attribution-NonCommercial-NoDerivs, which means that others may download the material and share it with others as long as they credit the creators. Others cannot change the materials in any way or use the materials commercially. | status: Incoming | involvement: unspecified | planningType: unspecified | socialContext: unspecified | keyword: Lexicon, Corpus, Morphemes, Words | topic: Aggregated Data
Origination date 2023-03-31
Origination date free form
Archive link https://catalog.paradisec.org.au/repository/DGB1/corpus2023_dict
URL
Collector
Danielle Barth
Countries
Language as given
Subject language(s)
Content language(s)
Dialect
Region / village Oceania
Originating university Australian National University
Operator Julia Colleen Miller
Data Categories
Data Types
Discourse type
Roles
DOI
Cite as Danielle Barth (collector), 2023. Corpus and dictionary files for 2023. DGB1-corpus2023_dict at catalog.paradisec.org.au. http://catalog.paradisec.org.au/collections/DGB1/items/corpus2023_dict
Content Files (0)
Filename Type File size Duration File access
no files available

Collection Information
Collection ID DGB1
Collection title Matukar Panau Language Documentation
Description Recordings collected during 2010-2020 for language documentation. Includes traditional stories, descriptions of traditional practices, family stories, songs, myths and procedural texts including narration of videos of typical village activities.
Countries
Languages
Access Information
Edit access
View/Download access
Data access conditions Open (subject to agreeing to PDSC access conditions)
Data access narrative