Title: Addressing Challenges of Machine Translation of Inuit Languages.
Author: Micher, J. C.
Keywords: Machine translation, Computational linguistics, Artificial neural networks, Natural language processing software, Inuktitut language, Polysynthetic languages, Morphological analysis, Morphemes, Inuit languages
Abstract: Machine translation to and from polysynthetic languages, such as those of the Inuit language family, has largely been overlooked because their complex morphology has been a barrier to research in computational methodologies. Polysynthetic languages pack abundant semantic and grammatical information into single words, so their data sets are inherently extremely sparse, which makes them computationally challenging for typical word-based analysis. Here, we focus on Inuktitut, a polysynthetic language spoken in Canada and one of the official languages of the Nunavut territory, where it is used in all governmental and educational documentation. We discuss Inuktitut, highlighting its polysynthetic typology, word formation, grammatical complexity, morphophonemics, spelling, and dialect variation, and review how this complexity presents challenges for machine translation and morphological processing. We consider the following: improving the performance of a finite-state transducer morphological analyzer using various neural network approaches; using alternate subword units with a neural network architecture to improve over a baseline English-Inuktitut statistical machine translation system and determining which subword unit yields the most improvement; using a pipelined English-Inuktitut translation system, featuring deep-representation morpheme sequences converted to surface forms, to compete with the best subword system; and using hierarchical structures over morphemes in a novel approach to improve over the best subword system.
Report type: Technical report