Papers and Resources for Low Resource Machine Translation

There is a wide variety of techniques to employ when trying to create a new machine translation model for a low resource language or improve an existing baseline. The applicability of these techniques generally depends on the availability of parallel and monolingual corpora for the target language and the availability of parallel corpora for related languages or domains.

Common scenarios

Scenario #1 - The data you have is super noisy (e.g., scraped from the web), and you aren't sure which sentence pairs are "good"

Papers:

Resources/ examples:
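
As a rough illustration of what "filtering out bad pairs" can mean in practice, here is a minimal sketch of heuristic filtering. It is not taken from any specific paper or tool; the function name `filter_pairs`, its thresholds, and the whitespace tokenization are all assumptions chosen for illustration, and whitespace splitting will not suit languages written without spaces.

```python
def filter_pairs(pairs, min_len=1, max_len=100, max_ratio=2.5):
    """Keep (source, target) sentence pairs that pass simple sanity checks.

    All thresholds are illustrative assumptions, not recommended values.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Whitespace tokenization is an assumption; swap in a real tokenizer if needed.
        src_len, tgt_len = len(src.split()), len(tgt.split())

        # Drop pairs with an empty or overly long side.
        if not (min_len <= src_len <= max_len and min_len <= tgt_len <= max_len):
            continue
        # Drop pairs whose lengths differ too much (likely misaligned).
        if max(src_len, tgt_len) / max(1, min(src_len, tgt_len)) > max_ratio:
            continue
        # Drop pairs where the "translation" is just a copy of the source.
        if src.lower() == tgt.lower():
            continue
        # Drop exact duplicates.
        if (src, tgt) in seen:
            continue

        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    noisy = [
        ("Hello world", "Bonjour le monde"),
        ("Hello world", "Bonjour le monde"),  # duplicate
        ("", "x"),                            # empty source
        ("a b c d e f g h", "x"),             # length mismatch
        ("Same text", "same text"),           # copied source
    ]
    for src, tgt in filter_pairs(noisy):
        print(f"{src}\t{tgt}")
```

Heuristics like these only catch the most obvious noise. They are usually a first pass before model-based scoring, not a replacement for it.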

Scenario #2 - You don't have any parallel data for the source-target language pair; you only have monolingual target data

Papers:

Resources/ examples:

Scenario #3 - You only have a small amount of parallel data for the source-target language pair, but you have lots of parallel data for a related source-target language pair

Papers:

Resources/ examples:

Scenario #4 - You only have a small amount of parallel data for the source-target language pair, but you have lots of monolingual data for the target and/or source language

Papers:

Resources/ examples:

Scenario #5 - You have a small amount of parallel data for the source-target language pair, but you also have a lot of parallel data for other language pairs

Papers:

Resources/ examples:

Scenario #6 - You don't have any data for the source-target language pair, not even monolingual data, but you have a linguist or a speaker

Papers:

Resources/ examples:

Miscellaneous other resources

General papers and resources about African languages or African language MT:

Data gathering, corpus creation:

Research languages, language families, populations, known language resources online, etc. via:

  • OLAC
  • Glottolog
  • Ethnologue (free up to a certain number of clicks, but many universities have subscriptions, if you happen to be affiliated with one)