Title: Provenance and Processing of an Inuktitut-English Parallel Corpus Part 1: Inuktitut Data Preparation and Factored Data Format.
Author: Micher, J. C.
Keywords: Machine translation, Linguistics, Morphology (linguistics), Data processing, Translations, Computational linguistics, Analyzers, Polysynthetic languages, Parliamentary proceedings, Inuit languages, Inuktitut
Abstract: We describe the Nunavut Hansard, a parallel English-Inuktitut corpus derived from Nunavut legislative proceedings, and we describe the processing that was carried out to prepare the data for use in morphological analysis and downstream machine translation experiments. We provide all of the scripts and code used to process the data.
Report type: Technical report