
Chinese only. You can train an acoustic model (CER about 30%) on open Chinese data sets and do keyword spotting with this tool. The principle is to convert Chinese characters into Pinyin and match the result against the target keywords.


azuredream/Keyword-Spotting-Based-on-Low-Quality-Acoustic-Model-Output


Keyword-Spotting-Based-on-Low-Quality-Acoustic-Model-Output

Chinese only. Do you want to build a Chinese speech keyword spotting program, like I did, but only have a small amount of open Chinese data, which is not enough to train a competitive end-to-end speech recognition/keyword spotting model (CER about 30%)? Try this tool!

The principle is to convert Chinese characters into Pinyin and match the target keywords.
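To illustrate the idea, here is a minimal sketch; it is not the code in spotting.py. It assumes a tiny hand-written character-to-Pinyin table (`PINYIN`, hypothetical; a real program would need a full dictionary or a library such as pypinyin) and uses `difflib` from the Python standard library as the similarity measure, which may differ from what the tool actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical toy table: only the characters used in this example.
PINYIN = {
    "你": "ni", "好": "hao", "泥": "ni", "号": "hao",
    "再": "zai", "见": "jian",
}

def to_pinyin(text):
    """Convert each known character to toneless Pinyin; keep unknown characters as-is."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def similarity(a, b):
    """Similarity of two Pinyin strings on a 0-100 scale."""
    return 100 * SequenceMatcher(None, a, b).ratio()

# "泥号" is a plausible misrecognition of "你好": different characters, same Pinyin.
print(to_pinyin("泥号"))                                  # ni hao
print(similarity(to_pinyin("泥号"), to_pinyin("你好")))   # 100.0
```

Because the comparison happens on Pinyin rather than on characters, a recognizer that picks the wrong homophone still produces a string that matches the keyword.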

Input: the output string of your recognition/spotting program. Output: "keyword1;keyword1;keyword2;keyword4....."

I strongly recommend removing the language model from your speech recognition program so that the output string conveys the raw speech more faithfully. After all, if your acoustic model is only 70 percent accurate, KenLM can do little more than distort the raw pronunciation.

Usage

Demo keywords: 你好;再见;没错; (hello; goodbye; that's right)

Code:

    from spotting import keywordSpoter
    ss = keywordSpoter(["keyword1", "keyword2", "keyword3"], simithreashold=78)
    ss.getWord(inputstr)

Demo:

    $ python spotting.py

Parameters:

simithreshold: 78 by default. If the similarity between the Pinyin string of the input and that of a keyword is greater than simithreshold, the input word is corrected to the target keyword and added to the output.
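The threshold rule can be sketched as follows. This is an illustration under stated assumptions, not the implementation in spotting.py: `spot_keywords` and the `PINYIN` table are hypothetical names, the text is scanned with a sliding window as long as each keyword, and similarity is computed with the standard-library `difflib` on a 0-100 scale, which may not be the measure the tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical toy table; a real system needs a full character-to-Pinyin dictionary.
PINYIN = {
    "你": "ni", "好": "hao", "后": "hou", "啊": "a",
    "在": "zai", "再": "zai", "见": "jian", "没": "mei", "错": "cuo",
}

def to_pinyin(text):
    """Convert each known character to toneless Pinyin; keep unknown characters as-is."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def spot_keywords(text, keywords, threshold=78):
    """Return the keywords spotted in text, in order of appearance, joined by ';'."""
    targets = [(kw, to_pinyin(kw)) for kw in keywords]
    found = []
    for start in range(len(text)):
        for kw, kw_pinyin in targets:
            window = text[start:start + len(kw)]
            if len(window) < len(kw):
                continue  # not enough characters left for this keyword
            score = 100 * SequenceMatcher(None, to_pinyin(window), kw_pinyin).ratio()
            if score > threshold:
                found.append(kw)  # the window is "corrected" to the keyword
    return ";".join(found)

# "你后" (ni hou) is close to "你好" (ni hao); "在见" has the same Pinyin as "再见".
print(spot_keywords("你后啊在见", ["你好", "再见", "没错"]))  # 你好;再见
```

Raising the threshold makes matching stricter: with `threshold=90`, the near miss "你后" no longer counts as "你好", while lowering it accepts noisier recognizer output at the cost of more false hits.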
