This MLHub package provides a quick introduction to the pre-built Text Translation models provided through Azure's Cognitive Services. This service translates text between multiple languages, also identifying the source language. Many languages are supported. This package is part of the Azure on MLHub repository.
In addition to the demo command, this package provides a collection of commands that turn the service into useful command line tools for translation and transliteration.
A free Azure subscription, the F0 pricing tier, allows up to 2,000,000 characters to be translated per month and is available from https://azure.microsoft.com/free/. After subscribing, visit https://ms.portal.azure.com and create a resource called Text Translations under AI and Machine Learning. Once it is created, the web API subscription key and endpoint are available from the portal. You will be prompted for these when running a command, and they are then saved to a file so that you are not asked for them repeatedly.
Please note that these Azure models, unlike the MLHub models in general, use closed source services which have no guarantee of ongoing availability and do not come with the freedom to modify and share.
Visit the GitHub repository for more details: https://github.com/azure/aztranslate
The Python code is based on the Azure Text Translator Quick Start.
- To install mlhub (Ubuntu 18.04 LTS)
$ pip3 install mlhub
- To install and configure the demo:
$ ml install aztranslate
$ ml configure aztranslate
In addition to the demo presented below, the aztranslate package provides a number of useful command line tools, several of which are demonstrated here. Most commands accept text supplied on the command line, piped in to the command, read from a supplied file, or else entered in an interactive session.
supported
The supported command is useful in checking which languages are supported for translation.
$ ml supported aztranslate
af,ltr,Afrikaans,Afrikaans
ar,rtl,Arabic,العربية
bg,ltr,Bulgarian,Български
bn,ltr,Bangla,বাংলা
bs,ltr,Bosnian,bosanski (latinica)
ca,ltr,Catalan,Català
cs,ltr,Czech,Čeština
cy,ltr,Welsh,Welsh
da,ltr,Danish,Dansk
de,ltr,German,Deutsch
el,ltr,Greek,Ελληνικά
en,ltr,English,English
es,ltr,Spanish,Español
et,ltr,Estonian,Eesti
fa,rtl,Persian,Persian
fi,ltr,Finnish,Suomi
fil,ltr,Filipino,Filipino
fj,ltr,Fijian,Fijian
fr,ltr,French,Français
he,rtl,Hebrew,עברית
hi,ltr,Hindi,हिंदी
hr,ltr,Croatian,Hrvatski
ht,ltr,Haitian Creole,Haitian Creole
hu,ltr,Hungarian,Magyar
id,ltr,Indonesian,Indonesia
is,ltr,Icelandic,Íslenska
it,ltr,Italian,Italiano
ja,ltr,Japanese,日本語
ko,ltr,Korean,한국어
lt,ltr,Lithuanian,Lietuvių
lv,ltr,Latvian,Latviešu
mg,ltr,Malagasy,Malagasy
ms,ltr,Malay,Melayu
mt,ltr,Maltese,Il-Malti
mww,ltr,Hmong Daw,Hmong Daw
nb,ltr,Norwegian,Norsk
nl,ltr,Dutch,Nederlands
otq,ltr,Querétaro Otomi,Querétaro Otomi
pl,ltr,Polish,Polski
pt,ltr,Portuguese,Português
ro,ltr,Romanian,Română
ru,ltr,Russian,Русский
sk,ltr,Slovak,Slovenčina
sl,ltr,Slovenian,Slovenščina
sm,ltr,Samoan,Samoan
sr-Cyrl,ltr,Serbian (Cyrillic),srpski (ćirilica)
sr-Latn,ltr,Serbian (Latin),srpski (latinica)
sv,ltr,Swedish,Svenska
sw,ltr,Kiswahili,Kiswahili
ta,ltr,Tamil,தமிழ்
te,ltr,Telugu,తెలుగు
th,ltr,Thai,ไทย
tlh,ltr,Klingon,Klingon
to,ltr,Tongan,lea fakatonga
tr,ltr,Turkish,Türkçe
ty,ltr,Tahitian,Tahitian
uk,ltr,Ukrainian,Українська
ur,rtl,Urdu,اردو
vi,ltr,Vietnamese,Tiếng Việt
yua,ltr,Yucatec Maya,Yucatec Maya
yue,ltr,Cantonese (Traditional),粵語 (繁體中文)
zh-Hans,ltr,Chinese Simplified,简体中文
zh-Hant,ltr,Chinese Traditional,繁體中文
To check if a specific language is supported:
$ ml supported aztranslate fr
fr,ltr,French,Français
$ ml supported aztranslate ku
Use the --header command line option to list the header row which names the columns:
$ ml supported aztranslate --header fr
code,direction,name,native
fr,ltr,French,Français
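Because the header row names the columns, the output can be loaded directly by a CSV reader. The following is a minimal Python sketch, not part of the package: the function name parse_supported is hypothetical, and the sample text is the header output above extended with two rows copied from the full listing.

```python
import csv
import io

# Sample text: the --header output above plus two rows from the full listing.
SAMPLE = """code,direction,name,native
fr,ltr,French,Français
ar,rtl,Arabic,العربية
sr-Cyrl,ltr,Serbian (Cyrillic),srpski (ćirilica)
"""


def parse_supported(text):
    """Return a dict mapping each language code to its row (itself a dict)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["code"]: row for row in reader}


languages = parse_supported(SAMPLE)

# Codes of the right-to-left languages in the sample.
rtl_codes = sorted(c for c, row in languages.items() if row["direction"] == "rtl")
print(rtl_codes)
print(languages["fr"]["native"])
```

In practice the text would come from capturing the output of the supported command rather than from a literal string.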
The --transliterate option will identify the transliteration pairs available for each language.
$ ml supported aztranslate --transliterate
ar,Arabic,العربية,Arab-Latn Latn-Arab
bn,Bangla,বাংলা,Beng-Latn Latn-Beng
gu,Gujarati,ગુજરાતી,Gujr-Latn Latn-Gujr
he,Hebrew,עברית,Hebr-Latn Latn-Hebr
hi,Hindi,हिंदी,Deva-Latn Latn-Deva
ja,Japanese,日本語,Jpan-Latn Latn-Jpan
kn,Kannada,ಕನ್ನಡ,Knda-Latn Latn-Knda
ml,Malayalam,മലയാളം,Mlym-Latn Latn-Mlym
mr,Marathi,मराठी,Deva-Latn Latn-Deva
or,Oriya,Oriya,Orya-Latn Latn-Orya
pa,Punjabi,ਪੰਜਾਬੀ,Guru-Latn Latn-Guru
sr-Cyrl,Serbian (Cyrillic),srpski (ćirilica),Cyrl-Latn
sr-Latn,Serbian (Latin),srpski (latinica),Latn-Cyrl
ta,Tamil,தமிழ்,Taml-Latn Latn-Taml
te,Telugu,తెలుగు,Telu-Latn Latn-Telu
th,Thai,ไทย,Thai-Latn Latn-Thai
zh-Hans,Chinese Simplified,简体中文,Hans-Latn Hans-Hant Latn-Hans Latn-Hant
zh-Hant,Chinese Traditional,繁體中文,Hant-Latn Hant-Hans Latn-Hans Latn-Hant
$ python3 supported.py --transliteration
ar,Arabic,العربية,Arab:Latn Latn:Arab
bn,Bangla,বাংলা,Beng:Latn Latn:Beng
gu,Gujarati,ગુજરાતી,Gujr:Latn Latn:Gujr
he,Hebrew,עברית,Hebr:Latn Latn:Hebr
hi,Hindi,हिंदी,Deva:Latn Latn:Deva
ja,Japanese,日本語,Jpan:Latn Latn:Jpan
kn,Kannada,ಕನ್ನಡ,Knda:Latn Latn:Knda
ml,Malayalam,മലയാളം,Mlym:Latn Latn:Mlym
mr,Marathi,मराठी,Deva:Latn Latn:Deva
or,Oriya,Oriya,Orya:Latn Latn:Orya
pa,Punjabi,ਪੰਜਾਬੀ,Guru:Latn Latn:Guru
sr-Cyrl,Serbian (Cyrillic),srpski (ćirilica),Cyrl:Latn
sr-Latn,Serbian (Latin),srpski (latinica),Latn:Cyrl
ta,Tamil,தமிழ்,Taml:Latn Latn:Taml
te,Telugu,తెలుగు,Telu:Latn Latn:Telu
th,Thai,ไทย,Thai:Latn Latn:Thai
zh-Hans,Chinese Simplified,简体中文,Hans:Latn Hans:Hant Latn:Hans Latn:Hant
zh-Hant,Chinese Traditional,繁體中文,Hant:Latn Hant:Hans Latn:Hans Latn:Hant
The four-letter script codes are reported in pairs, in from:to order.
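The colon-separated listing can be split into (from, to) script pairs with a few lines of Python. This is only a sketch: the function name parse_transliteration_row is hypothetical, and it assumes the colon form of the output, since the hyphenated form is ambiguous for codes such as sr-Cyrl.

```python
def parse_transliteration_row(line):
    """Split one row into (code, name, native, [(from_script, to_script), ...])."""
    # Only the first three commas separate fields; the rest is the pair list.
    code, name, native, pairs = line.rstrip("\n").split(",", 3)
    scripts = [tuple(pair.split(":")) for pair in pairs.split()]
    return code, name, native, scripts


# Row copied from the colon-separated listing above.
row = "zh-Hans,Chinese Simplified,简体中文,Hans:Latn Hans:Hant Latn:Hans Latn:Hant"
code, name, native, scripts = parse_transliteration_row(row)

# The FROM script of the first pair reported for the language.
first_from = scripts[0][0]
print(code, first_from, scripts)
```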
detect
The detect command will identify the language of a provided text, the confidence of the detection, and whether translation and transliteration are supported for that language.
$ ml detect aztranslate उनकी कविता में प्रकृति के सौंदर्य और कोमलतम मानवीय भावनाओं का उत्कृष्ट चित्रण है.
hi,1.00,True,True
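The four output fields correspond to fields in the JSON returned by the Translator v3 Detect operation. The sketch below shows one way such a result could be rendered as the line above. It is an illustration only: the function name format_detection is hypothetical, the sample dictionary is hand-written rather than captured from the service, and the field names are taken from the public REST documentation rather than from this package's code.

```python
def format_detection(item):
    """Render one Detect result as code,confidence,translatable,transliterable."""
    return "{},{:.2f},{},{}".format(
        item["language"],
        item["score"],
        item["isTranslationSupported"],
        item["isTransliterationSupported"],
    )


# Hand-written sample shaped like one element of a Detect response.
sample = {
    "language": "hi",
    "score": 1.0,
    "isTranslationSupported": True,
    "isTransliterationSupported": True,
}
line = format_detection(sample)
print(line)  # hi,1.00,True,True
```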
translate
The translate command takes the text to be translated and returns the identified source language code, the confidence of that identification, the language code of the target translation, and the resulting translation.
$ ml translate aztranslate मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ
hi,1.00,en,Tell me the most important message this morning
$ ml translate aztranslate उनकी कविता में प्रकृति के सौंदर्य और कोमलतम मानवीय भावनाओं का उत्कृष्ट चित्रण है.
hi,1.00,en,His poetry has excellent depictions of nature's beauty and the softest human emotions.
As a command line tool the text to be translated can be piped into the command:
$ echo मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ | ml translate aztranslate
hi,1.00,en,Tell me the most important message this morning
If a file name is supplied then each line within the file is translated, line by line:
$ ml translate aztranslate thai.txt
th,1.00,en,Congee
th,1.00,en,Rice kan Chin
th,1.00,en,Pork Leg Rice
th,1.00,en,Rice Omelet
th,1.00,en,Fried rice with shrimp paste
th,1.00,en,Kao Mok Chicken
th,1.00,en,Beef Porridge
th,1.00,en,Chicken Rice
th,1.00,en,Crispy pork rice
th,1.00,en,Red Pork crispy Pork rice
Use the --keep command line option to retain the original text:
$ ml translate aztranslate --keep scratch/thai_menu.txt
th,1.00,en,โจ๊ก,Congee
th,1.00,en,ข้าวกั๊นจิ๊น,Rice kan Chin
th,1.00,en,ข้าวขาหมู,Pork Leg Rice
th,1.00,en,ข้าวไข่เจียว,Rice Omelet
th,1.00,en,ข้าวคลุกกะปิ,Fried rice with shrimp paste
th,1.00,en,ข้าวหมกไก่,Kao Mok Chicken
th,1.00,en,ข้าวหมกเนื้อ,Beef Porridge
th,1.00,en,ข้าวมันไก่,Chicken Rice
th,1.00,en,ข้าวหมูกรอบ,Crispy pork rice
th,1.00,en,ข้าวหมูกรอบหมูแดง,Red Pork crispy Pork rice
The --profanity command line option will replace any identified profanities in the translation with asterisks.
If no text is supplied on the command line nor through a pipe nor from a specified file then the program enters an interactive loop:
$ ml translate aztranslate
Enter text to be analysed. Quit with Empty or Ctrl-d.
> मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ?
hi,1.00,en,Tell me the most important message this morning?
> ข้าวคลุกกะปิ
th,1.00,en,Fried rice with shrimp paste
> Di mana toko yang baik untuk membeli ponsel?
id,1.00,en,Where is a good store to buy mobile phones?
>
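The choice between these input sources can be pictured as a short decision function. The Python sketch below is an assumption about how such a tool might order the checks, not the package's actual logic, and the name choose_source is hypothetical.

```python
import os
import tempfile


def choose_source(args, stdin_is_tty):
    """Classify the input source as 'file', 'args', 'pipe' or 'interactive'."""
    if len(args) == 1 and os.path.isfile(args[0]):
        return "file"         # a single argument naming an existing file
    if args:
        return "args"         # text supplied on the command line
    if not stdin_is_tty:
        return "pipe"         # text piped in on standard input
    return "interactive"      # nothing supplied: prompt the user


# Exercise each branch; the temporary file stands in for a file such as thai.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as handle:
    handle.write("ข้าวคลุกกะปิ\n")
file_case = choose_source([handle.name], stdin_is_tty=True)
os.remove(handle.name)

args_case = choose_source(["Wah", "kayak", "artis", "Korea"], stdin_is_tty=True)
pipe_case = choose_source([], stdin_is_tty=False)
interactive_case = choose_source([], stdin_is_tty=True)
print(file_case, args_case, pipe_case, interactive_case)
```

In a real script stdin_is_tty would typically be obtained from sys.stdin.isatty().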
The default is to translate into English (en). Other languages can be chosen:
$ ml translate aztranslate --to=id मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ
hi,1.00,id,Ceritakan pesan yang paling penting pagi ini
$ ml translate aztranslate --to=fr मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ
hi,1.00,fr,Dites-moi le message le plus important ce matin
By default the translator will determine the source language. This can be overridden using --from=.
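For readers curious about what sits behind --to= and --from=, the sketch below builds a request for the Translator v3 translate operation and formats a response item in the style of the output above. It is a sketch under stated assumptions: the endpoint is the public global one (your portal may show a different endpoint), the function names are hypothetical, nothing is sent over the network, and the sample response is hand-written to match the Indonesian example.

```python
import json
from urllib.parse import urlencode

# Assumed global Translator endpoint; use the endpoint shown in your portal.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(text, key, to="en", source=None):
    """Return (url, headers, body) for a Translator v3 translate call."""
    params = {"api-version": "3.0", "to": to}
    if source:
        # Mirrors --from=: the service skips detection of the source language.
        params["from"] = source
    url = ENDPOINT + "/translate?" + urlencode(params)
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}], ensure_ascii=False)
    return url, headers, body


def format_translation(item):
    """Render one response item as source,confidence,target,translation."""
    # detectedLanguage is present only when the source was auto-detected.
    detected = item["detectedLanguage"]
    result = item["translations"][0]
    return "{},{:.2f},{},{}".format(
        detected["language"], detected["score"], result["to"], result["text"]
    )


url, headers, body = build_translate_request(
    "मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ", key="YOUR-KEY", to="id"
)
print(url)

# Hand-written item shaped like one element of a translate response.
sample = {
    "detectedLanguage": {"language": "hi", "score": 1.0},
    "translations": [
        {"text": "Ceritakan pesan yang paling penting pagi ini", "to": "id"}
    ],
}
line = format_translation(sample)
print(line)
```

The request could then be sent with any HTTP client as a POST of body to url with the given headers.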
Translation engines are trained differently and so differ in capability. For example, Google translates the Indonesian Wah kayak artis Korea as Wow, like a Korean artist, whilst Azure translates it as Wah Kayaking Korean artist. Such differences can affect downstream processing such as sentiment analysis.
$ ml translate aztranslate Wah kayak artis Korea
id,1.0,en,Wah Kayaking Korean artist
$ ml translate aztranslate Wah kayak artis Korea | cut -d, -f4 | ml sentiment aztext
0.50
$ ml sentiment aztext Wow, like a Korean artist
0.97
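One caveat when extracting the translation field: a translation can itself contain commas, in which case cut -d, -f4 keeps only the text up to the first of them (cut -d, -f4- keeps the remainder of the line). The Python sketch below splits on the first three commas only. The function name parse_translate_line is hypothetical, and the sample line is constructed for illustration from the Google translation quoted above rather than taken from actual aztranslate output.

```python
def parse_translate_line(line):
    """Split source,confidence,target,translation on the first three commas."""
    source, score, target, text = line.rstrip("\n").split(",", 3)
    return {"from": source, "score": float(score), "to": target, "text": text}


# Constructed sample: the translation itself contains a comma.
row = parse_translate_line("id,1.00,en,Wow, like a Korean artist")
print(row["text"])  # Wow, like a Korean artist
```

This approach does not extend to the --keep output, where both the original text and the translation may contain commas.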
transliterate
The transliterate command takes a text to be transliterated, for example into Latin characters, retaining the phonetics. This command is under development and currently only supports transliteration from Thai script to Latin script for illustrative purposes.
$ ml transliterate aztranslate คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่
khua kling kaeng yot maphrao on sai kai
$ ml translate aztranslate คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่
th,1.00,en,Roasted Coconut curry with chicken
Normally the LANGUAGE is determined automatically, and the first script reported for that language by the supported --transliterate command is the default FROM script. The default TO script is Latin. Command line options can specify the LANGUAGE, the FROM script and the TO script if required. Supplying them also avoids the additional API calls otherwise made for each query to determine the language and the default FROM script.
$ ml transliterate aztranslate -l th -f thai -t latn คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่
th,thai,latn,khua kling kaeng yot maphrao on sai kai
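The -l, -f and -t options correspond naturally to the language, fromScript and toScript query parameters of the Translator v3 transliterate operation. The sketch below builds such a request without sending it. The endpoint is the assumed public global one, the function name is hypothetical, and whether the package passes the script names through unchanged is not confirmed here.

```python
import json
from urllib.parse import urlencode

# Assumed global Translator endpoint; use the endpoint shown in your portal.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_transliterate_request(text, language, from_script, to_script="Latn"):
    """Return (url, body) for a Translator v3 transliterate call."""
    params = {
        "api-version": "3.0",
        "language": language,
        "fromScript": from_script,
        "toScript": to_script,
    }
    url = ENDPOINT + "/transliterate?" + urlencode(params)
    body = json.dumps([{"Text": text}], ensure_ascii=False)
    return url, body


url, body = build_transliterate_request(
    "คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่", language="th", from_script="Thai"
)
print(url)
```

Because all three parameters are given explicitly, no preliminary calls are needed to detect the language or look up its default script.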
$ ml demo aztranslate
======================
Azure Text Translation
======================
Welcome to a demo of the pre-built models for Text Translation provided
through Azure's Cognitive Services. This service translates text between
multiple languages.
The following file has been found and is assumed to contain an Azure Text
Translator subscription key. We will load the file and use this information.
/home/gjw/.mlhub/aztranslate/private.py
Press Enter to continue:
===================
Supported Languages
===================
These are the languages supported by the Azure Translator for translation.
af ltr Afrikaans Afrikaans
ar rtl Arabic العربية
bg ltr Bulgarian Български
bn ltr Bangla বাংলা
bs ltr Bosnian bosanski (latinica)
ca ltr Catalan Català
cs ltr Czech Čeština
cy ltr Welsh Welsh
da ltr Danish Dansk
de ltr German Deutsch
el ltr Greek Ελληνικά
en ltr English English
es ltr Spanish Español
et ltr Estonian Eesti
fa rtl Persian Persian
fi ltr Finnish Suomi
fil ltr Filipino Filipino
fj ltr Fijian Fijian
fr ltr French Français
Press Enter to continue:
he rtl Hebrew עברית
hi ltr Hindi हिंदी
hr ltr Croatian Hrvatski
ht ltr Haitian Creole Haitian Creole
hu ltr Hungarian Magyar
id ltr Indonesian Indonesia
is ltr Icelandic Íslenska
it ltr Italian Italiano
ja ltr Japanese 日本語
ko ltr Korean 한국어
lt ltr Lithuanian Lietuvių
lv ltr Latvian Latviešu
mg ltr Malagasy Malagasy
ms ltr Malay Melayu
mt ltr Maltese Il-Malti
mww ltr Hmong Daw Hmong Daw
nb ltr Norwegian Norsk
nl ltr Dutch Nederlands
otq ltr Querétaro Otomi Querétaro Otomi
pl ltr Polish Polski
Press Enter to continue:
pt ltr Portuguese Português
ro ltr Romanian Română
ru ltr Russian Русский
sk ltr Slovak Slovenčina
sl ltr Slovenian Slovenščina
sm ltr Samoan Samoan
sr-Cyrl ltr Serbian (Cyrillic) srpski (ćirilica)
sr-Latn ltr Serbian (Latin) srpski (latinica)
sv ltr Swedish Svenska
sw ltr Kiswahili Kiswahili
ta ltr Tamil தமிழ்
te ltr Telugu తెలుగు
th ltr Thai ไทย
tlh ltr Klingon Klingon
to ltr Tongan lea fakatonga
tr ltr Turkish Türkçe
ty ltr Tahitian Tahitian
uk ltr Ukrainian Українська
ur rtl Urdu اردو
vi ltr Vietnamese Tiếng Việt
Press Enter to continue:
yua ltr Yucatec Maya Yucatec Maya
yue ltr Cantonese (Traditional) 粵語 (繁體中文)
zh-Hans ltr Chinese Simplified 简体中文
zh-Hant ltr Chinese Traditional 繁體中文
That's 64 languages in total.
Press Enter to continue on to translations from English:
=============================
Text Translation from English
=============================
Below we demonstrate the translation of a variety of common phrases as we might
find when interacting with a voice command system.
Hi Tom, has my parcel arrived yet?
Where is a good shop to buy mobile phones?
Has Frederick replied to my email yet?
We are running late, please start without us.
Tell me the most important message this morning?
When is a good time to meet Susan and Dave?
The supplied text was detected as 'en' with a score of '1.0'.
Press Enter for a translation to German:
Hallo Tom, ist mein Paket schon angekommen?
Wo gibt es einen guten Laden, um Handys zu kaufen?
Hat Frederick meine E-Mail schon beantwortet?
Wir laufen spät, bitte starten wir ohne uns.
Sagen Sie mir heute Morgen die wichtigste Botschaft?
Wann ist ein guter Zeitpunkt, um Susan und Dave zu treffen?
Press Enter for a translation to Italian:
Ciao Tom, il mio pacco è arrivato ancora?
Dove è un buon negozio per comprare telefoni cellulari?
Frederick ha risposto alla mia email ancora?
Siamo in ritardo, per favore iniziate senza di noi.
Dimmi il messaggio più importante di stamattina?
Quando è il momento giusto per incontrare Susan e Dave?
Press Enter for a translation to Indonesian:
Hi Tom, telah paket saya tiba belum?
Di mana toko yang baik untuk membeli ponsel?
Apakah Frederick membalas email saya?
Kami berjalan terlambat, silakan mulai tanpa kami.
Ceritakan pesan yang paling penting pagi ini?
Kapan waktu yang baik untuk bertemu Susan dan Dave?
Press Enter for a translation to Hindi:
हाय टॉम, मेरे पार्सल अभी तक आ गया है?
मोबाइल फोन खरीदने के लिए एक अच्छी दुकान कहां है?
क्या Frederick मेरे ईमेल के लिए अभी तक जवाब दिया?
हम देर से चल रहे हैं, कृपया हमारे बिना शुरू करो ।
मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ?
जब एक अच्छा समय Susan और डेव से मिलने के लिए है?
Press Enter to continue on to translations back to English:
===========================
Translation back to English
===========================
Below we translate each of the above translations back to English. Again the
source language is automatically identified.
Here's a reminder of the original English utterances:
Hi Tom, has my parcel arrived yet?
Where is a good shop to buy mobile phones?
Has Frederick replied to my email yet?
We are running late, please start without us.
Tell me the most important message this morning?
When is a good time to meet Susan and Dave?
Press Enter for the translation from German (language id score=0.98):
HI Tom, has my Package arrived yet?
Where is a good Store to buy Phones?
Has Frederick already answered my email?
We run late, please start without us.
Tell me the most important Message this Morning?
When is a good Time to meet Susan and Dave?
Press Enter for the translation from Italian (language id score=0.94):
Hello Tom, my parcel has arrived yet?
Where is a good store to buy cell phones?
Has Frederick responded to my email yet?
We'Re late, please start without us.
Tell me the most important message this morning?
When is the right time to meet Susan and Dave?
Press Enter for the translation from Indonesian (language id score=0.98):
Hi Tom, have my package arrived yet?
Where is a good store to buy a cell phone?
Did Frederick replied to my email?
We are running late, please start without us.
Tell me the most important message this morning?
When is a good time to meet Susan and Dave?
Press Enter for the translation from Hindi (language id score=0.97):
Hi Tom, has my parcel come yet?
Where is a good shop to buy mobile phones?
Has Frederick responded to my email yet?
We are running late, please start without us.
Tell me the most important message this morning?
When is a good time to meet Susan and Dave?
To use the model to translate user provided text:
$ ml translate aztranslate
We can interact with the model simply. Here we enter a few texts in different languages and have them translated into English. Note how the quality of the translation varies: translation from Indonesian is not as well developed as translation from other languages!
$ ml translate aztranslate
=================================
Azure Text Translation to English
=================================
The following file has been found and is assumed to contain an Azure Text
Translator subscription key. We will load the file and use this information.
/home/gjw/.mlhub/aztranslate/private.py
Enter a line of text in any language and we'll attempt to translate it to English.
Exit when no text supplied.
> सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है। उन्हें
> बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिये।
The text was identified as Hindi with 100% certainty:
English: All human beings have inherent freedom and equality in
terms of pride and rights. They have the wisdom and the conscience,
and they must behave in a spirit of brotherhood.
> C’est l’exception qui confirme la règle.
The text was identified as French with 100% certainty:
English: This is the exception that confirms the rule.
> Dimana ada kemauan, di situ ada jalan
The text was identified as Indonesian with 100% certainty:
English: Where there's a will, there is no way
>
To explore limitations of translations:
$ ml limits aztranslate
Douglas Hofstadter, a professor of cognitive science and comparative literature at Indiana University at Bloomington and author of the book Gödel, Escher, Bach, highlights in a January 2018 article in The Atlantic the limitations of automated language translation. To paraphrase, the translators do not have any deep understanding of the text but have developed a shallower mechanical process to do a decent job for simple communications.
Below we illustrate with one of Hofstadter's examples, which you can replicate with the limits command. See the original article for details:
https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/
$ ml limits aztranslate
[...]
*** Consider this sample text:
In their house, everything comes in pairs. There's his car and her
car, his towels and her towels, and his library and hers.
*** The French translation is:
Dans leur maison, tout se passe par paires. Il y a sa voiture, sa
voiture, ses serviettes, ses serviettes, sa bibliothèque et la sienne.
*** Translating back to English demonstrates a shallow understanding:
In their House, everything happens in pairs. There's his car, his car,
his towels, his towels, his library and hers.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.
Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.
Privacy information can be found at https://privacy.microsoft.com/en-us/
Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.