
Keeping pinyin from being split into multiple tokens has no effect when converting Chinese characters to pinyin #301

Open
idawwei opened this issue Jul 19, 2024 · 0 comments

idawwei commented Jul 19, 2024

Description

For the input `测试123EDF`, the pinyin should not be split into multiple tokens. Expected result: "ceshi123EDF".

Actual result: the digits are split off, "EDF" is split off, and the pinyin is split into "ce" and "shi".

Steps to reproduce

Index settings:
```
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "my_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "none_chinese_pinyin_tokenize": true
        }
      }
    }
  }
}
```

Analysis test:
```
GET /my_index/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "理财123EDF"
}
```
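To re-run the two reproduction requests above from a script, here is a minimal Python sketch using only the standard library. It assumes a local, unsecured Elasticsearch node at `http://localhost:9200` with analysis-pinyin installed. The helper names (`reproduce`, `token_list`, `kept_together`) are hypothetical and not part of Elasticsearch or the plugin; only the `PUT /<index>` and `_analyze` REST endpoints are real.

```python
import json
import urllib.error
import urllib.request

# Assumption: local, unsecured node with analysis-pinyin installed; adjust as needed.
ES_URL = "http://localhost:9200"
INDEX = "my_index"

# Same index settings as in the reproduction steps.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "pinyin_analyzer": {"tokenizer": "my_pinyin_tokenizer"}
            },
            "tokenizer": {
                "my_pinyin_tokenizer": {
                    "type": "pinyin",
                    "keep_first_letter": False,
                    "keep_separate_first_letter": False,
                    "keep_full_pinyin": True,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "none_chinese_pinyin_tokenize": True,
                }
            },
        }
    }
}


def _request(method, path, body):
    """Send a JSON body to Elasticsearch and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def token_list(analyze_response):
    """Hypothetical helper: token strings, in order, from an _analyze reply."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def kept_together(tokens, expected):
    """Hypothetical helper: True if `expected` is emitted as one token.
    Case-insensitive, because the tokenizer above sets lowercase=true."""
    return expected.lower() in (t.lower() for t in tokens)


def reproduce(text="理财123EDF"):
    """Create the index if it is missing, then return the tokens for `text`."""
    try:
        _request("PUT", "/" + INDEX, INDEX_SETTINGS)
    except urllib.error.HTTPError as err:
        # 400 (resource_already_exists_exception): reuse the existing index.
        if err.code != 400:
            raise
    # POST is accepted by _analyze just like GET and avoids GET-with-body quirks.
    reply = _request(
        "POST",
        f"/{INDEX}/_analyze",
        {"analyzer": "pinyin_analyzer", "text": text},
    )
    return token_list(reply)
```

For example, `kept_together(reproduce("测试123EDF"), "ceshi123EDF")` returns `True` only if the tokenizer emits the joined token the issue expects; printing `reproduce()` shows the actual token list for the `理财123EDF` test.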

Environment

  • Elasticsearch 7.16.2
  • analysis-pinyin 7.16.2