Nabu - Corpus and dictionary files for 2023

PARADISEC Catalog

Item details

Item ID	DGB1-corpus2023_dict
Title	Corpus and dictionary files for 2023
Description	A compiled Matukar Panau corpus of 150,740 words produced from newly and previously collected data, including words in context, speaker metadata, file metadata and, where available, parsing, glossing and translations. A subset of this corpus is included in a separate file as a morpheme corpus with parsing and glossing of 20,359 morphemes. Most files have been standardized for spelling. The spelling standardization script package for ELAN was developed by Jake Farrell, AI Specialist at Appen, for use by CoEDL researchers. A lexicon from ELAN in XML format is included. An annotation guideline for clause chains is included. Annotations are in tiers with the ELAN type "chain". | workingLanguages: tpi | subgenre: Corpus | access: O | accessDescription: The material is licensed under a Creative Commons licence, CC BY-NC-ND (Attribution-NonCommercial-NoDerivs), which means that others may download the material and share it with others as long as they credit the creators. Others cannot change the materials in any way or use the materials commercially. | status: Incoming | involvement: unspecified | planningType: unspecified | socialContext: unspecified | keyword: Lexicon, Corpus, Morphemes, Words | topic: Aggregated Data
Origination date	2023-03-31
Origination date free form
Archive link	https://catalog.paradisec.org.au/repository/DGB1/corpus2023_dict
URL
Collector	Danielle Barth
Countries	Papua New Guinea - PG
Language as given
Subject language(s)	Matukar - mjk
Content language(s)	Matukar - mjk; Tok Pisin - tpi
Dialect
Region / village	Oceania

Originating university	Australian National University
Operator	Julia Colleen Miller
Data Categories
Data Types
Discourse type
Roles
DOI
Cite as	Danielle Barth (collector), 2023. Corpus and dictionary files for 2023. DGB1-corpus2023_dict at catalog.paradisec.org.au. http://catalog.paradisec.org.au/collections/DGB1/items/corpus2023_dict

Content Files (0)

Filename	Type	File size	Duration	File access
no files available

Collection Information

Collection ID	DGB1
Collection title	Matukar Panau Language Documentation
Description	Recordings collected during 2010-2020 for language documentation. Includes traditional stories, descriptions of traditional practices, family stories, songs, myths and procedural texts including narration of videos of typical village activities.
Countries	Papua New Guinea - PG
Languages	English - eng; Gedaged - gdd; Matukar - mjk; Aruamu - msy; Manam - mva; Takia - tbc; Tok Pisin - tpi

Access Information

Edit access
View/Download access
Data access conditions	Open (subject to agreeing to PDSC access conditions)
Data access narrative

Metadata

RO-Crate Metadata	Live (Public)
