diff --git a/README.md b/README.md index 85c2a7bf..35e5ba11 100644 --- a/README.md +++ b/README.md @@ -19,12 +19,12 @@ Play ChatGPT and other LLM with Xiaomi AI Speaker - [通义千问](https://help.aliyun.com/zh/dashscope/developer-reference/api-details) ## 获取小米音响DID -系统和Shell|Linux *sh|Windows CMD用户|Windows PowerShell用户 --|-|-|- -1、安装包|`pip install miservice_fork`|`pip install miservice_fork`|`pip install miservice_fork` -2、设置变量|`export MI_USER=xxx`
`export MI_PASS=xxx`|`set MI_USER=xxx`
`set MI_PASS=xxx`|`$env:MI_USER="xxx"`
`$env:MI_PASS="xxx"`
-3、取得MI_DID|`micli list` |`micli list` |`micli list`
-4、设置MI_DID|`export MI_DID=xxx`| `set MI_DID=xxx`| `$env:MI_DID="xxx"`
+| 系统和Shell | Linux *sh | Windows CMD用户 | Windows PowerShell用户 |
+| ------------- | ---------------------------------------------- | -------------------------------------- | ---------------------------------------------- |
+| 1、安装包 | `pip install miservice_fork` | `pip install miservice_fork` | `pip install miservice_fork` |
+| 2、设置变量 | `export MI_USER=xxx`
`export MI_PASS=xxx` | `set MI_USER=xxx`
`set MI_PASS=xxx` | `$env:MI_USER="xxx"`
`$env:MI_PASS="xxx"` | +| 3、取得MI_DID | `micli list` | `micli list` | `micli list` | +| 4、设置MI_DID | `export MI_DID=xxx` | `set MI_DID=xxx` | `$env:MI_DID="xxx"` | - 注意不同shell 对环境变量的处理是不同的,尤其是powershell赋值时,可能需要双引号来包括值。 - 如果获取did报错时,请更换一下无线网络,有很大概率解决问题。 @@ -146,42 +146,36 @@ ChatGLM [文档](http://open.bigmodel.cn/doc/api#chatglm_130b) ## 配置项说明 -| 参数 | 说明 | 默认值 | 可选值 | -| ------------------------ | ------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | -| hardware | 设备型号 | | | -| account | 小爱账户 | | | -| password | 小爱账户密码 | | | -| openai_key | openai的apikey | | | -| serpapi_api_key | serpapi的key 参考 [SerpAPI](https://serpapi.com/) | | | -| glm_key | chatglm 的 apikey | | | -| gemini_key | gemini 的 apikey [参考](https://makersuite.google.com/app/apikey) | | | -| qwen_key | qwen 的 apikey [参考](https://help.aliyun.com/zh/dashscope/developer-reference/api-details) | | | -| cookie | 小爱账户cookie (如果用上面密码登录可以不填) | | | -| mi_did | 设备did | | | -| use_command | 使用 MI command 与小爱交互 | `false` | | -| mute_xiaoai | 快速停掉小爱自己的回答 | `true` | | -| verbose | 是否打印详细日志 | `false` | | -| bot | 使用的 bot 类型,目前支持 chatgptapi,newbing, qwen, gemini | `chatgptapi` | | -| tts | 使用的 TTS 类型 | `mi` | `edge`、 `openai`、`azure`、`volc` | -| tts_voice | TTS 的嗓音 | `zh-CN-XiaoxiaoNeural`(edge), `alloy`(openai), `zh-CN-XiaoxiaoMultilingualNeural`(azure) | | -| prompt | 自定义prompt | `请用100字以内回答` | | -| keyword | 自定义请求词列表 | `["请"]` | | -| change_prompt_keyword | 更改提示词触发列表 | `["更改提示词"]` | | -| start_conversation | 开始持续对话关键词 | `开始持续对话` | | -| end_conversation | 结束持续对话关键词 | `结束持续对话` | | -| stream | 使用流式响应,获得更快的响应 | `false` | | -| proxy | 支持 HTTP 代理,传入 http proxy URL | "" | | -| gpt_options | OpenAI API 的参数字典 | `{}` | | -| bing_cookie_path | 
NewBing使用的cookie路径,参考[这里]获取 | 也可通过环境变量 `COOKIE_FILE` 设置 | | -| bing_cookies | NewBing使用的cookie字典,参考[这里]获取 | | | -| deployment_id | Azure OpenAI 服务的 deployment ID | 参考这个[如何找到deployment_id](https://github.com/yihong0618/xiaogpt/issues/347#issuecomment-1784410784) | | -| api_base | 如果需要替换默认的api,或者使用Azure OpenAI 服务 | 例如:`https://abc-def.openai.azure.com/` | | -| azure_tts_speech_key | Azure TTS key | null | | -| azure_tts_service_region | Azure TTS 服务地区 | `eastasia` | [Regions - Speech service - Azure AI services](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions) | -| volc_accesskey | 火山引擎accesskey [参考](https://console.volcengine.com/iam/keymanage/) | | | -| volc_secretkey | 火山引擎secretkey [参考](https://console.volcengine.com/iam/keymanage/) | | | -| volc_tts_app | 火山引擎 TTS app 服务 [参考]( https://console.volcengine.com/sami/) | | | -| volc_tts_speaker | 火山引擎 TTS speaker [参考]( https://www.volcengine.com/docs/6489/93478) | `zh_female_qingxin` | | +| 参数 | 说明 | 默认值 | 可选值 | +| --------------------- | ------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- | +| hardware | 设备型号 | | | +| account | 小爱账户 | | | +| password | 小爱账户密码 | | | +| openai_key | openai的apikey | | | +| serpapi_api_key | serpapi的key 参考 [SerpAPI](https://serpapi.com/) | | | +| glm_key | chatglm 的 apikey | | | +| gemini_key | gemini 的 apikey [参考](https://makersuite.google.com/app/apikey) | | | +| qwen_key | qwen 的 apikey [参考](https://help.aliyun.com/zh/dashscope/developer-reference/api-details) | | | +| cookie | 小爱账户cookie (如果用上面密码登录可以不填) | | | +| mi_did | 设备did | | | +| use_command | 使用 MI command 与小爱交互 | `false` | | +| mute_xiaoai | 快速停掉小爱自己的回答 | `true` | | +| verbose | 是否打印详细日志 | `false` | | +| bot | 使用的 bot 类型,目前支持 chatgptapi,newbing, qwen, gemini | `chatgptapi` | | +| tts | 使用的 TTS 类型 | 
`mi` | `edge`、 `openai`、`azure`、`volc`、`baidu`、`google` | +| tts_options | TTS 参数字典,参考 [tetos](https://github.com/frostming/tetos) 获取可用参数 | | | +| prompt | 自定义prompt | `请用100字以内回答` | | +| keyword | 自定义请求词列表 | `["请"]` | | +| change_prompt_keyword | 更改提示词触发列表 | `["更改提示词"]` | | +| start_conversation | 开始持续对话关键词 | `开始持续对话` | | +| end_conversation | 结束持续对话关键词 | `结束持续对话` | | +| stream | 使用流式响应,获得更快的响应 | `false` | | +| proxy | 支持 HTTP 代理,传入 http proxy URL | "" | | +| gpt_options | OpenAI API 的参数字典 | `{}` | | +| bing_cookie_path | NewBing使用的cookie路径,参考[这里]获取 | 也可通过环境变量 `COOKIE_FILE` 设置 | | +| bing_cookies | NewBing使用的cookie字典,参考[这里]获取 | | | +| deployment_id | Azure OpenAI 服务的 deployment ID | 参考这个[如何找到deployment_id](https://github.com/yihong0618/xiaogpt/issues/347#issuecomment-1784410784) | | +| api_base | 如果需要替换默认的api,或者使用Azure OpenAI 服务 | 例如:`https://abc-def.openai.azure.com/` | | [这里]: https://github.com/acheong08/EdgeGPT#getting-authentication-required @@ -300,6 +294,7 @@ docker run -v :/config -p 9527:9527 -e XIAOGPT_HOSTNAME==1.0.2; python_version < \"3.11\"", "idna>=2.8", "sniffio>=1.1", + "typing-extensions>=4.1; python_version < \"3.11\"", ] files = [ - {file = "anyio-3.7.1-py3-none-any.whl", hash = "sha256:91dee416e570e92c64041bd18b900d1d6fa78dff7048769ce5ac5ddad004fbb5"}, - {file = "anyio-3.7.1.tar.gz", hash = "sha256:44a3c9aba0f5defa43261a8b3efb97891f2bd7d804e0e1f56419befa1adfc780"}, + {file = "anyio-4.3.0-py3-none-any.whl", hash = "sha256:048e05d0f6caeed70d731f3db756d35dcc1f35747c8c403364a8332c630441b8"}, + {file = "anyio-4.3.0.tar.gz", hash = "sha256:f75253795a87df48568485fd18cdd2a3fa5c4f7c5be8e5e36637733fce06fed6"}, ] [[package]] @@ -286,6 +287,20 @@ files = [ {file = "charset_normalizer-3.3.2-py3-none-any.whl", hash = "sha256:3e4d1f6587322d2788836a99c69062fbb091331ec940e02d12d179c1d53e25fc"}, ] +[[package]] +name = "click" +version = "8.1.7" +requires_python = ">=3.7" +summary = "Composable command line interface toolkit" +groups = ["default"] +dependencies 
= [ + "colorama; platform_system == \"Windows\"", +] +files = [ + {file = "click-8.1.7-py3-none-any.whl", hash = "sha256:ae74fb96c20a0277a1d615f1e4d73c8414f5a98db8b799a7931d1582f3390c28"}, + {file = "click-8.1.7.tar.gz", hash = "sha256:ca9853ad459e787e2192211578cc907e7594e294c7ccc834310722b41b9ca6de"}, +] + [[package]] name = "colorama" version = "0.4.6" @@ -327,17 +342,6 @@ files = [ {file = "dataclasses_json-0.6.3.tar.gz", hash = "sha256:35cb40aae824736fdf959801356641836365219cfe14caeb115c39136f775d2a"}, ] -[[package]] -name = "decorator" -version = "5.1.1" -requires_python = ">=3.5" -summary = "Decorators for Humans" -groups = ["default"] -files = [ - {file = "decorator-5.1.1-py3-none-any.whl", hash = "sha256:b8c3f85900b9dc423225913c5aace94729fe1fa9763b38939a95226f02d37186"}, - {file = "decorator-5.1.1.tar.gz", hash = "sha256:637996211036b6385ef91435e4fae22989472f9d571faba8927ba8253acbc330"}, -] - [[package]] name = "distro" version = "1.9.0" @@ -466,19 +470,6 @@ files = [ {file = "frozenlist-1.4.1.tar.gz", hash = "sha256:c037a86e8513059a2613aaba4d817bb90b9d9b6b69aace3ce9c877e8c8ed402b"}, ] -[[package]] -name = "google" -version = "3.0.0" -summary = "Python bindings to the Google search engine." 
-groups = ["default"] -dependencies = [ - "beautifulsoup4", -] -files = [ - {file = "google-3.0.0-py2.py3-none-any.whl", hash = "sha256:889cf695f84e4ae2c55fbc0cfdaf4c1e729417fa52ab1db0485202ba173e4935"}, - {file = "google-3.0.0.tar.gz", hash = "sha256:143530122ee5130509ad5e989f0512f7cb218b2d4eddbafbad40fd10e8d8ccbe"}, -] - [[package]] name = "google-ai-generativelanguage" version = "0.6.2" @@ -580,6 +571,23 @@ files = [ {file = "google_auth_httplib2-0.2.0-py2.py3-none-any.whl", hash = "sha256:b65a0a2123300dd71281a7bf6e64d65a0759287df52729bdd1ae2e47dc311a3d"}, ] +[[package]] +name = "google-cloud-texttospeech" +version = "2.16.3" +requires_python = ">=3.7" +summary = "Google Cloud Texttospeech API client library" +groups = ["default"] +dependencies = [ + "google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1", + "google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1", + "proto-plus<2.0.0dev,>=1.22.3", + "protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5", +] +files = [ + {file = "google-cloud-texttospeech-2.16.3.tar.gz", hash = "sha256:fabc315032d137da0710bb4c268734d336212d8fa8316b23b277dd3a84ce721c"}, + {file = "google_cloud_texttospeech-2.16.3-py2.py3-none-any.whl", hash = "sha256:5d1e23f9270908a5d7ecf2af04105fbd3a7ddde60fe48506e397bd18c1ece499"}, +] + [[package]] name = "google-generativeai" version = "0.5.1" @@ -1306,17 +1314,6 @@ files = [ {file = "protobuf-4.25.1.tar.gz", hash = "sha256:57d65074b4f5baa4ab5da1605c02be90ac20c8b40fb137d6a8df9f416b0d0ce2"}, ] -[[package]] -name = "py" -version = "1.11.0" -requires_python = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" -summary = "library with cross-python path, ini-parsing, io, code, log facilities" -groups = ["default"] -files = [ - {file = "py-1.11.0-py2.py3-none-any.whl", hash = "sha256:607c53218732647dff4acdfcd50cb62615cedf612e72d1724fb1a0cc6405b378"}, - {file = "py-1.11.0.tar.gz", hash 
= "sha256:51c75c4126074b472f746a24399ad32f6053d1b34b68d2fa41e558e6f4a98719"}, -] - [[package]] name = "pyasn1" version = "0.5.1" @@ -1342,21 +1339,6 @@ files = [ {file = "pyasn1_modules-0.3.0.tar.gz", hash = "sha256:5bd01446b736eb9d31512a30d46c1ac3395d676c6f3cafa4c03eb54b9925631c"}, ] -[[package]] -name = "pycryptodome" -version = "3.9.9" -requires_python = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" -summary = "Cryptographic library for Python" -groups = ["default"] -files = [ - {file = "pycryptodome-3.9.9-cp39-cp39-manylinux1_i686.whl", hash = "sha256:2a68df525b387201a43b27b879ce8c08948a430e883a756d6c9e3acdaa7d7bd8"}, - {file = "pycryptodome-3.9.9-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:a4599c0ca0fc027c780c1c45ed996d5bef03e571470b7b1c7171ec1e1a90914c"}, - {file = "pycryptodome-3.9.9-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:b4e6b269a8ddaede774e5c3adbef6bf452ee144e6db8a716d23694953348cd86"}, - {file = "pycryptodome-3.9.9-cp39-cp39-win32.whl", hash = "sha256:a199e9ca46fc6e999e5f47fce342af4b56c7de85fae893c69ab6aa17531fb1e1"}, - {file = "pycryptodome-3.9.9-cp39-cp39-win_amd64.whl", hash = "sha256:6e89bb3826e6f84501e8e3b205c22595d0c5492c2f271cbb9ee1c48eb1866645"}, - {file = "pycryptodome-3.9.9.tar.gz", hash = "sha256:910e202a557e1131b1c1b3f17a63914d57aac55cf9fb9b51644962841c3995c4"}, -] - [[package]] name = "pydantic" version = "2.5.3" @@ -1500,16 +1482,6 @@ files = [ {file = "pyparsing-3.1.2.tar.gz", hash = "sha256:a1bac0ce561155ecc3ed78ca94d3c9378656ad4c94c1270de543f621420f94ad"}, ] -[[package]] -name = "pytz" -version = "2020.5" -summary = "World timezone definitions, modern and historical" -groups = ["default"] -files = [ - {file = "pytz-2020.5-py2.py3-none-any.whl", hash = "sha256:16962c5fb8db4a8f63a26646d8886e9d769b6c511543557bc84e9569fb9a9cb4"}, - {file = "pytz-2020.5.tar.gz", hash = "sha256:180befebb1927b16f6b57101720075a984c019ac16b1b7575673bea42c6c3da5"}, -] - [[package]] name = "pyyaml" version = "6.0.1" @@ -1639,20 +1611,6 @@ files = [ 
{file = "requests-2.31.0.tar.gz", hash = "sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1"}, ] -[[package]] -name = "retry" -version = "0.9.2" -summary = "Easy to use retry decorator." -groups = ["default"] -dependencies = [ - "decorator>=3.4.2", - "py<2.0.0,>=1.4.26", -] -files = [ - {file = "retry-0.9.2-py2.py3-none-any.whl", hash = "sha256:ccddf89761fa2c726ab29391837d4327f819ea14d244c232a1d24c67a2f98606"}, - {file = "retry-0.9.2.tar.gz", hash = "sha256:f8bfa8b99b69c4506d6f5bd3b0aabf77f98cdb17f3c9fc3f5ca820033336fba4"}, -] - [[package]] name = "rich" version = "13.7.1" @@ -1682,17 +1640,6 @@ files = [ {file = "rsa-4.9.tar.gz", hash = "sha256:e38464a49c6c85d7f1351b0126661487a7e0a14a50f1675ec50eb34d4f20ef21"}, ] -[[package]] -name = "six" -version = "1.16.0" -requires_python = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" -summary = "Python 2 and 3 compatibility utilities" -groups = ["default"] -files = [ - {file = "six-1.16.0-py2.py3-none-any.whl", hash = "sha256:8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"}, - {file = "six-1.16.0.tar.gz", hash = "sha256:1e61c37477a1626458e36f7b1d82aa5c9b094fa4802892072e49de9c60c4c926"}, -] - [[package]] name = "sniffio" version = "1.3.0" @@ -1784,6 +1731,26 @@ files = [ {file = "tenacity-8.2.3.tar.gz", hash = "sha256:5398ef0d78e63f40007c1fb4c0bff96e1911394d2fa8d194f77619c05ff6cc8a"}, ] +[[package]] +name = "tetos" +version = "0.1.0" +requires_python = ">=3.8" +summary = "Unified interface for multiple Text-to-Speech (TTS) providers" +groups = ["default"] +dependencies = [ + "anyio>=4.3.0", + "azure-cognitiveservices-speech>=1.37.0", + "click>=8.1.7", + "edge-tts>=6.1.10", + "google-cloud-texttospeech>=2.16.3", + "mutagen>=1.47.0", + "openai>=1.20.0", +] +files = [ + {file = "tetos-0.1.0-py3-none-any.whl", hash = "sha256:3a3d2a2a93c2d22f950fae10c179e88bdc1cd6ee5d812c7dd5bf14751653a667"}, + {file = "tetos-0.1.0.tar.gz", hash = 
"sha256:fa537bb769ff4e54e3f808dd76c5d98373387a0a0c1119502e6998e00b4dcc75"}, +] + [[package]] name = "tqdm" version = "4.66.1" @@ -1845,24 +1812,6 @@ files = [ {file = "urllib3-2.1.0.tar.gz", hash = "sha256:df7aa8afb0148fa78488e7899b2c59b5f4ffcfa82e6c54ccb9dd37c1d7b52d54"}, ] -[[package]] -name = "volcengine" -version = "1.0.136" -summary = "The Volcengine SDK for Python" -groups = ["default"] -dependencies = [ - "google>=3.0.0", - "protobuf>=3.18.3", - "pycryptodome==3.9.9", - "pytz==2020.5", - "requests>=2.25.1", - "retry==0.9.2", - "six>=1.0", -] -files = [ - {file = "volcengine-1.0.136.tar.gz", hash = "sha256:7d88eb54d39b100855880bc830f73ee8769fc7341ae5909c8fe6a20ab438f32e"}, -] - [[package]] name = "wcwidth" version = "0.2.13" diff --git a/pyproject.toml b/pyproject.toml index fa04ea4e..32f9dcaf 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -18,7 +18,6 @@ dependencies = [ "rich", "zhipuai>=2.0.1", "httpx[socks]", - "edge-tts>=6.1.3", "EdgeGPT==0.1.26", "langchain>=0.0.343", "beautifulsoup4>=4.12.0", @@ -26,9 +25,7 @@ dependencies = [ "google-generativeai", "numexpr>=2.8.6", "dashscope>=1.10.0", - "azure-cognitiveservices-speech>=1.37.0", - "multidict>=6.0.5", - "volcengine>=1.0.136", + "tetos>=0.1.0", ] license = {text = "MIT"} dynamic = ["version", "optional-dependencies"] diff --git a/requirements.txt b/requirements.txt index ccc919ec..6a9289b9 100644 --- a/requirements.txt +++ b/requirements.txt @@ -4,7 +4,7 @@ aiohttp==3.9.5 aiosignal==1.3.1 annotated-types==0.6.0 -anyio==3.7.1 +anyio==4.3.0 async-timeout==4.0.3; python_version < "3.11" attrs==23.2.0 azure-cognitiveservices-speech==1.37.0 @@ -13,21 +13,21 @@ bingimagecreator==0.5.0 cachetools==5.3.2 certifi==2024.2.2 charset-normalizer==3.3.2 +click==8.1.7 colorama==0.4.6; platform_system == "Windows" dashscope==1.17.0 dataclasses-json==0.6.3 -decorator==5.1.1 distro==1.9.0 edge-tts==6.1.10 edgegpt==0.1.26 exceptiongroup==1.2.0; python_version < "3.11" frozenlist==1.4.1 -google==3.0.0 
google-ai-generativelanguage==0.6.2 google-api-core==2.15.0 google-api-python-client==2.125.0 google-auth==2.26.1 google-auth-httplib2==0.2.0 +google-cloud-texttospeech==2.16.3 google-generativeai==0.5.1 google-search-results==2.4.2 googleapis-common-protos==1.62.0 @@ -61,34 +61,29 @@ packaging==23.2 prompt-toolkit==3.0.43 proto-plus==1.23.0 protobuf==4.25.1 -py==1.11.0 pyasn1==0.5.1 pyasn1-modules==0.3.0 -pycryptodome==3.9.9 pydantic==2.5.3 pydantic-core==2.14.6 pygments==2.17.2 pyjwt==2.8.0 pyparsing==3.1.2; python_version > "3.0" -pytz==2020.5 pyyaml==6.0.1 regex==2023.12.25 requests==2.31.0 -retry==0.9.2 rich==13.7.1 rsa==4.9 -six==1.16.0 sniffio==1.3.0 socksio==1.0.0 soupsieve==2.5 sqlalchemy==2.0.25 tenacity==8.2.3 +tetos==0.1.0 tqdm==4.66.1 typing-extensions==4.9.0 typing-inspect==0.9.0 uritemplate==4.1.1 urllib3==2.1.0 -volcengine==1.0.136 wcwidth==0.2.13 websockets==12.0 yarl==1.9.4 diff --git a/xiao_config.json.example b/xiao_config.json.example index bbf5ac3c..97c8e057 100644 --- a/xiao_config.json.example +++ b/xiao_config.json.example @@ -14,7 +14,7 @@ "verbose": false, "bot": "chatgptapi", "tts": "mi", - "edge_tts_voice": "zh-CN-XiaoxiaoNeural", + "tts_options": {}, "prompt": "请用100字以内回答", "keyword": ["请"], "change_prompt_keyword": ["更改提示词"], @@ -26,11 +26,5 @@ "bing_cookie_path": "", "bing_cookies": {}, "api_base": "https://abc-def.openai.azure.com/", - "deployment_id": "", - "azure_tts_speech_key": null, - "azure_tts_service_region": "eastasia", - "volc_accesskey": "", - "volc_secretkey": "", - "volc_tts_app": "", - "volc_tts_speaker": "zh_male_chunhou", -} \ No newline at end of file + "deployment_id": "" +} diff --git a/xiaogpt/cli.py b/xiaogpt/cli.py index ed5a87ed..e51ddef3 100644 --- a/xiaogpt/cli.py +++ b/xiaogpt/cli.py @@ -86,27 +86,9 @@ def main(): help="show info", ) parser.add_argument( - "--azure_tts_speech_key", - dest="azure_tts_speech_key", - help="if use azure tts", - ) - parser.add_argument( - "--azure_tts_service_region", - 
dest="azure_tts_service_region",
-        help="if use azure tts",
-    )
-    tts_group = parser.add_mutually_exclusive_group()
-    tts_group.add_argument(
-        "--enable_edge_tts",
-        dest="tts",
-        action="store_const",
-        const="edge",
-        help="if use edge tts",
-    )
-    tts_group.add_argument(
         "--tts",
-        help="tts type",
-        choices=["mi", "edge", "openai", "azure"],
+        help="TTS provider",
+        choices=["mi", "edge", "openai", "azure", "google", "baidu", "volc"],
     )
     bot_group = parser.add_mutually_exclusive_group()
     bot_group.add_argument(
@@ -190,9 +172,15 @@ def main():
     options = parser.parse_args()
     config = Config.from_options(options)
 
-    miboy = MiGPT(config)
+    async def main(config: Config) -> None:
+        miboy = MiGPT(config)
+        try:
+            await miboy.run_forever()
+        finally:
+            await miboy.close()
+
     loop = asyncio.get_event_loop()
-    loop.run_until_complete(miboy.run_forever())
+    loop.run_until_complete(main(config))
 
 
 if __name__ == "__main__":
diff --git a/xiaogpt/config.py b/xiaogpt/config.py
index 141d8360..4a7bf84f 100644
--- a/xiaogpt/config.py
+++ b/xiaogpt/config.py
@@ -33,15 +33,6 @@
     # add more here
 }
 
-EDGE_TTS_DICT = {
-    "用英语": "en-US-AriaNeural",
-    "用日语": "ja-JP-NanamiNeural",
-    "用法语": "fr-BE-CharlineNeural",
-    "用韩语": "ko-KR-SunHiNeural",
-    "用德语": "de-AT-JonasNeural",
-    # add more here
-}
-
 DEFAULT_COMMAND = ("5-1", "5-5")
 
 KEY_WORD = ("帮我", "请")
@@ -80,23 +71,11 @@ class Config:
     start_conversation: str = "开始持续对话"
     end_conversation: str = "结束持续对话"
     stream: bool = False
-    tts: Literal["mi", "edge", "azure", "openai"] = "mi"
-    tts_voice: str | None = None
+    tts: Literal["mi", "edge", "azure", "openai", "baidu", "google", "volc"] = "mi"
+    tts_options: dict[str, Any] = field(default_factory=dict)
     gpt_options: dict[str, Any] = field(default_factory=dict)
     bing_cookie_path: str = ""
     bing_cookies: dict | None = None
-    azure_tts_speech_key: str | None = None
-    azure_tts_service_region: str = "eastasia"
-    volc_accesskey: str = os.getenv(
-        "VOLC_ACCESSKEY", ""
-    ) #
https://console.volcengine.com/iam/keymanage/ - volc_secretkey: str = os.getenv("VOLC_SECRETKEY", "") - volc_tts_app: str = os.getenv( - "VOLC_TTS_APP", "" - ) # https://console.volcengine.com/sami - volc_tts_speaker: str = os.getenv( - "VOLC_TTS_SPEAPER", "zh_female_qingxin" - ) # https://www.volcengine.com/docs/6489/93478 def __post_init__(self) -> None: if self.proxy: @@ -121,8 +100,6 @@ def __post_init__(self) -> None: raise Exception( "Using GPT api needs openai API key, please google how to" ) - if self.tts == "azure" and not self.azure_tts_speech_key: - raise Exception("Using Azure TTS needs azure speech key") @property def tts_command(self) -> str: diff --git a/xiaogpt/tts/__init__.py b/xiaogpt/tts/__init__.py index fd5569b9..82bcc392 100644 --- a/xiaogpt/tts/__init__.py +++ b/xiaogpt/tts/__init__.py @@ -1,7 +1,5 @@ -from xiaogpt.tts.base import TTS as TTS -from xiaogpt.tts.edge import EdgeTTS as EdgeTTS -from xiaogpt.tts.mi import MiTTS as MiTTS -from xiaogpt.tts.volc import VolcTTS as VolcTTS -from xiaogpt.tts.azure import AzureTTS +from xiaogpt.tts.base import TTS +from xiaogpt.tts.mi import MiTTS +from xiaogpt.tts.tetos import TetosTTS -__all__ = ["TTS", "EdgeTTS", "MiTTS", "AzureTTS", "VolcTTS"] +__all__ = ["TTS", "TetosTTS", "MiTTS"] diff --git a/xiaogpt/tts/azure.py b/xiaogpt/tts/azure.py deleted file mode 100644 index 8e356020..00000000 --- a/xiaogpt/tts/azure.py +++ /dev/null @@ -1,98 +0,0 @@ -from __future__ import annotations - -import logging -import tempfile -from pathlib import Path -from typing import Optional - -import azure.cognitiveservices.speech as speechsdk - -from xiaogpt.tts.base import AudioFileTTS -from xiaogpt.utils import calculate_tts_elapse - -logger = logging.getLogger(__name__) - - -class AzureTTS(AudioFileTTS): - voice_name = "zh-CN-XiaoxiaoMultilingualNeural" - - async def make_audio_file(self, query: str, text: str) -> tuple[Path, float]: - output_file = tempfile.NamedTemporaryFile( - suffix=".mp3", mode="wb", delete=False, 
dir=self.dirname.name - ) - - speech_synthesizer = self._build_speech_synthesizer(output_file.name) - result: Optional[speechsdk.SpeechSynthesisResult] = ( - speech_synthesizer.speak_text_async(text).get() - ) - if result is None: - raise RuntimeError( - f"Failed to get tts from azure with voice={self.voice_name}" - ) - # Check result - if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted: - logger.debug("Speech synthesized for text [{}]".format(text)) - - return Path(output_file.name), calculate_tts_elapse(text) - elif result.reason == speechsdk.ResultReason.Canceled: - cancellation_details = result.cancellation_details - logger.warning(f"Speech synthesis canceled: {cancellation_details.reason}") - if cancellation_details.reason == speechsdk.CancellationReason.Error: - errmsg = f"Error details: {cancellation_details.error_details}" - logger.error(errmsg) - raise RuntimeError(errmsg) - raise RuntimeError(f"Failed to get tts from azure with voice={self.voice_name}") - - def _build_speech_synthesizer(self, filename: str): - speech_key = self.config.azure_tts_speech_key - service_region = self.config.azure_tts_service_region - if not speech_key: - raise Exception("Azure tts need speech key") - speech_config = speechsdk.SpeechConfig( - subscription=speech_key, region=service_region - ) - speech_config.set_speech_synthesis_output_format( - speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3 - ) - if self.config.proxy: - host, port, username, password = self._parse_proxy(self.config.proxy) - - if username and password: - speech_config.set_proxy( - hostname=host, port=port, username=username, password=password - ) - else: - speech_config.set_proxy(hostname=host, port=port) - - speech_config.speech_synthesis_voice_name = ( - self.config.tts_voice or self.voice_name - ) - speech_synthesizer = speechsdk.SpeechSynthesizer( - speech_config=speech_config, - audio_config=speechsdk.audio.AudioOutputConfig(filename=filename), # type: ignore - ) - 
return speech_synthesizer - - def _parse_proxy(self, proxy_str: str): - proxy_str = proxy_str - proxy_str_splited = proxy_str.split("://") - proxy_type = proxy_str_splited[0] - proxy_addr = proxy_str_splited[1] - - if proxy_type == "http": - if "@" in proxy_addr: - proxy_addr_splited = proxy_addr.split("@") - proxy_auth = proxy_addr_splited[0] - proxy_addr_netloc = proxy_addr_splited[1] - proxy_auth_splited = proxy_auth.split(":") - username = proxy_auth_splited[0] - password = proxy_auth_splited[1] - else: - proxy_addr_netloc = proxy_addr - username, password = None, None - - proxy_addr_netloc_splited = proxy_addr_netloc.split(":") - host = proxy_addr_netloc_splited[0] - port = int(proxy_addr_netloc_splited[1]) - return host, port, username, password - raise NotImplementedError diff --git a/xiaogpt/tts/base.py b/xiaogpt/tts/base.py index 5c7800de..51ae09af 100644 --- a/xiaogpt/tts/base.py +++ b/xiaogpt/tts/base.py @@ -56,7 +56,7 @@ async def get_if_xiaoai_is_playing(self): return is_playing @abc.abstractmethod - async def synthesize(self, query: str, text_stream: AsyncIterator[str]) -> None: + async def synthesize(self, lang: str, text_stream: AsyncIterator[str]) -> None: """Synthesize speech from a stream of text.""" raise NotImplementedError @@ -87,20 +87,20 @@ def __init__( self._start_http_server() @abc.abstractmethod - async def make_audio_file(self, query: str, text: str) -> tuple[Path, float]: + async def make_audio_file(self, lang: str, text: str) -> tuple[Path, float]: """Synthesize speech from text and save it to a file. Return the file path and the duration of the audio in seconds. The file path must be relative to the self.dirname. 
""" raise NotImplementedError - async def synthesize(self, query: str, text_stream: AsyncIterator[str]) -> None: + async def synthesize(self, lang: str, text_stream: AsyncIterator[str]) -> None: queue: asyncio.Queue[tuple[str, float]] = asyncio.Queue() finished = asyncio.Event() async def worker(): async for text in text_stream: - path, duration = await self.make_audio_file(query, text) + path, duration = await self.make_audio_file(lang, text) url = f"http://{self.hostname}:{self.port}/{path.name}" await queue.put((url, duration)) finished.set() diff --git a/xiaogpt/tts/edge.py b/xiaogpt/tts/edge.py deleted file mode 100644 index 33fb6f63..00000000 --- a/xiaogpt/tts/edge.py +++ /dev/null @@ -1,32 +0,0 @@ -import tempfile -from pathlib import Path - -import edge_tts - -from xiaogpt.config import EDGE_TTS_DICT -from xiaogpt.tts.base import AudioFileTTS -from xiaogpt.utils import find_key_by_partial_string - - -class EdgeTTS(AudioFileTTS): - default_voice = "zh-CN-XiaoxiaoNeural" - - async def make_audio_file(self, query: str, text: str) -> tuple[Path, float]: - voice = ( - find_key_by_partial_string(EDGE_TTS_DICT, query) - or self.config.tts_voice - or self.default_voice - ) - communicate = edge_tts.Communicate(text, voice, proxy=self.config.proxy) - duration = 0 - with tempfile.NamedTemporaryFile( - suffix=".mp3", mode="wb", delete=False, dir=self.dirname.name - ) as f: - async for chunk in communicate.stream(): - if chunk["type"] == "audio": - f.write(chunk["data"]) - elif chunk["type"] == "WordBoundary": - duration = (chunk["offset"] + chunk["duration"]) / 1e7 - if duration == 0: - raise RuntimeError(f"Failed to get tts from edge with voice={voice}") - return (Path(f.name), duration) diff --git a/xiaogpt/tts/mi.py b/xiaogpt/tts/mi.py index e4882437..164091b6 100644 --- a/xiaogpt/tts/mi.py +++ b/xiaogpt/tts/mi.py @@ -27,7 +27,7 @@ async def say(self, text: str) -> None: f"{self.config.tts_command} {text}", ) - async def synthesize(self, query: str, text_stream: 
AsyncIterator[str]) -> None: + async def synthesize(self, lang: str, text_stream: AsyncIterator[str]) -> None: async for text in text_stream: await self.say(text) await self.wait_for_duration(calculate_tts_elapse(text)) diff --git a/xiaogpt/tts/openai.py b/xiaogpt/tts/openai.py deleted file mode 100644 index 54c7ce90..00000000 --- a/xiaogpt/tts/openai.py +++ /dev/null @@ -1,46 +0,0 @@ -from __future__ import annotations - -import tempfile -from pathlib import Path -from typing import TYPE_CHECKING - -import httpx - -from xiaogpt.tts.base import AudioFileTTS -from xiaogpt.utils import calculate_tts_elapse - -if TYPE_CHECKING: - import openai - - -class OpenAITTS(AudioFileTTS): - default_voice = "alloy" - - async def make_audio_file(self, query: str, text: str) -> tuple[Path, float]: - output_file = tempfile.NamedTemporaryFile( - suffix=".mp3", mode="wb", delete=False, dir=self.dirname.name - ) - httpx_kwargs = {} - if self.config.proxy: - httpx_kwargs["proxies"] = self.config.proxy - async with httpx.AsyncClient(trust_env=True, **httpx_kwargs) as sess: - client = self._make_openai_client(sess) - - resp = await client.audio.speech.create( - model="tts-1", - input=text, - voice=self.config.tts_voice or self.default_voice, - ) - resp.stream_to_file(output_file.name) - return Path(output_file.name), calculate_tts_elapse(text) - - def _make_openai_client(self, sess: httpx.AsyncClient) -> openai.AsyncOpenAI: - import openai - - api_base = self.config.api_base - if api_base and api_base.rstrip("/").endswith("openai.azure.com"): - raise NotImplementedError("TTS is not supported for Azure OpenAI") - else: - return openai.AsyncOpenAI( - api_key=self.config.openai_key, http_client=sess, base_url=api_base - ) diff --git a/xiaogpt/tts/tetos.py b/xiaogpt/tts/tetos.py new file mode 100644 index 00000000..1178e050 --- /dev/null +++ b/xiaogpt/tts/tetos.py @@ -0,0 +1,56 @@ +from __future__ import annotations + +import tempfile +from pathlib import Path + +from miservice import 
MiNAService
+from tetos.base import Speaker
+
+from xiaogpt.config import Config
+from xiaogpt.tts.base import AudioFileTTS
+
+
+class TetosTTS(AudioFileTTS):
+    def __init__(
+        self, mina_service: MiNAService, device_id: str, config: Config
+    ) -> None:
+        super().__init__(mina_service, device_id, config)
+        self.speaker = self._get_speaker()
+
+    def _get_speaker(self) -> Speaker:
+        from tetos.azure import AzureSpeaker
+        from tetos.baidu import BaiduSpeaker
+        from tetos.edge import EdgeSpeaker
+        from tetos.google import GoogleSpeaker
+        from tetos.openai import OpenAISpeaker
+        from tetos.volc import VolcSpeaker
+
+        options = self.config.tts_options
+        allowed_speakers: list[str] = []
+        for speaker in (
+            OpenAISpeaker,
+            EdgeSpeaker,
+            AzureSpeaker,
+            VolcSpeaker,
+            GoogleSpeaker,
+            BaiduSpeaker,
+        ):
+            if (name := speaker.__name__[:-7].lower()) == self.config.tts:
+                try:
+                    return speaker(**options)
+                except TypeError as e:
+                    raise ValueError(
+                        f"{e}. Please add them via `tts_options` config"
+                    ) from e
+            else:
+                allowed_speakers.append(name)
+        raise ValueError(
+            f"Unsupported TTS: {self.config.tts}, allowed: {','.join(allowed_speakers)}"
+        )
+
+    async def make_audio_file(self, lang: str, text: str) -> tuple[Path, float]:
+        output_file = tempfile.NamedTemporaryFile(
+            suffix=".mp3", mode="wb", delete=False, dir=self.dirname.name
+        )
+        duration = await self.speaker.synthesize(text, output_file.name, lang=lang)
+        return Path(output_file.name), duration
diff --git a/xiaogpt/tts/volc.py b/xiaogpt/tts/volc.py
deleted file mode 100644
index 78634ead..00000000
--- a/xiaogpt/tts/volc.py
+++ /dev/null
@@ -1,130 +0,0 @@
-from __future__ import annotations
-
-import logging
-import tempfile
-from pathlib import Path
-from typing import Optional
-import json
-import os
-import time
-import base64
-import threading
-import httpx
-
-from volcengine.ApiInfo import ApiInfo
-from volcengine.Credentials import Credentials
-from volcengine.ServiceInfo import ServiceInfo
-from
volcengine.base.Service import Service
-
-
-from xiaogpt.tts.base import AudioFileTTS
-from xiaogpt.utils import calculate_tts_elapse
-
-logger = logging.getLogger(__name__)
-
-
-class VolcTTS(AudioFileTTS):
-    def __init__(self, mina_service, device_id, config):
-        super().__init__(mina_service, device_id, config)
-        self.token = get_token(config)
-        self.client = httpx.Client()
-        logger.info("Initializing VolcTTS {self.token}")
-
-    async def make_audio_file(self, query: str, text: str) -> tuple[Path, float]:
-        tts_payload = json.dumps(
-            {
-                "text": text,
-                "speaker": self.config.volc_tts_speaker,
-                "audio_config": {
-                    "format": "mp3",
-                    "sample_rate": 24000,
-                    "speech_rate": 0,
-                },
-            }
-        )
-
-        req = {
-            "appkey": self.config.volc_tts_app,
-            "token": self.token,
-            "namespace": "TTS",
-            "payload": tts_payload,
-        }
-
-        resp = self.client.post("https://sami.bytedance.com/api/v1/invoke", json=req)
-        try:
-            sami_resp = resp.json()
-            logger.info(f"volc sami_resp {resp.status_code}")
-            if resp.status_code != 200:
-                print(sami_resp)
-        except:
-            logger.error(f"Failed to get tts from volcengine with voice=zh {text}")
-
-        if sami_resp["status_code"] == 20000000 and len(sami_resp["data"]) > 0:
-            audio_data = base64.b64decode(sami_resp["data"])
-            with tempfile.NamedTemporaryFile(
-                suffix=".mp3", mode="wb", delete=False, dir=self.dirname.name
-            ) as f:
-                f.write(audio_data)
-
-        return Path(f.name), calculate_tts_elapse(text)
-
-
-## fetch token and save it to file
-## it's aimed to reduce the request to volcengine
-## it'll throw error if token is requested too frequently (more than 1 times per minute)
-def get_token(config):
-    token_file = Path.home() / ".volc.token"
-    if not Path.exists(token_file):
-        token = request_token_data(config)
-    else:
-        with open(token_file, "r") as f:
-            token = json.load(f)
-        if token["expires_at"] < time.time():
-            token = request_token_data(config)
-
-    if not Path.exists(token_file):
-        with open(token_file, "w") as f:
-            json.dump(token, f)
-    return token["token"]
-
-
-def request_token_data(config):
-    sami_service = SAMIService()
-    sami_service.set_ak(config.volc_accesskey)
-    sami_service.set_sk(config.volc_secretkey)
-
-    req = {
-        "appkey": config.volc_tts_app,
-        "token_version": "volc-auth-v1",
-        "expiration": 24 * 3600,
-    }
-    token = sami_service.common_json_handler("GetToken", req)
-    logger.info(f"Got token from volcengine {token}")
-    return token
-
-
-class SAMIService(Service):
-    def __init__(self):
-        self.service_info = ServiceInfo(
-            "open.volcengineapi.com",
-            {},
-            Credentials("", "", "sami", "cn-north-1"),
-            10,
-            10,
-        )
-        self.api_info = {
-            "GetToken": ApiInfo(
-                "POST", "/", {"Action": "GetToken", "Version": "2021-07-27"}, {}, {}
-            ),
-        }
-        super(SAMIService, self).__init__(self.service_info, self.api_info)
-
-    def common_json_handler(self, api, body):
-        params = dict()
-        try:
-            body = json.dumps(body)
-            res = self.json(api, params, body)
-            res_json = json.loads(res)
-            return res_json
-        except Exception as e:
-            raise Exception(str(e))
diff --git a/xiaogpt/xiaogpt.py b/xiaogpt/xiaogpt.py
index b0e1e619..1b8b99c0 100644
--- a/xiaogpt/xiaogpt.py
+++ b/xiaogpt/xiaogpt.py
@@ -23,8 +23,7 @@
     WAKEUP_KEYWORD,
     Config,
 )
-from xiaogpt.tts import TTS, EdgeTTS, MiTTS, AzureTTS, VolcTTS
-from xiaogpt.tts.openai import OpenAITTS
+from xiaogpt.tts import TTS, MiTTS, TetosTTS
 from xiaogpt.utils import (
     parse_cookie_string,
 )
@@ -53,6 +52,9 @@ def __init__(self, config: Config):
         self.log.debug(config)
         self.mi_session = ClientSession()
 
+    async def close(self):
+        await self.mi_session.close()
+
     async def poll_latest_ask(self):
         async with ClientSession() as session:
             session._cookie_jar = self.cookie_jar
@@ -78,16 +80,16 @@ async def poll_latest_ask(self):
                     # if you want force mute xiaoai, comment this line below.
                     await asyncio.sleep(1 - d)
 
-    async def init_all_data(self, session):
-        await self.login_miboy(session)
+    async def init_all_data(self):
+        await self.login_miboy()
         await self._init_data_hardware()
         self.mi_session.cookie_jar.update_cookies(self.get_cookie())
         self.cookie_jar = self.mi_session.cookie_jar
         self.tts  # init tts
 
-    async def login_miboy(self, session):
+    async def login_miboy(self):
         account = MiAccount(
-            session,
+            self.mi_session,
             self.config.account,
             self.config.password,
             str(self.mi_token_home),
@@ -179,7 +181,7 @@ def need_ask_gpt(self, record):
         return (
             self.in_conversation
             and not query.startswith(WAKEUP_KEYWORD)
-            or query.startswith(tuple(self.config.keyword))
+            or query.lower().startswith(tuple(w.lower() for w in self.config.keyword))
         )
 
     def need_change_prompt(self, record):
@@ -225,7 +227,7 @@ async def get_latest_ask_from_xiaoai(self, session: ClientSession) -> dict | Non
         return None
 
     async def _retry(self):
-        await self.init_all_data(self.mi_session)
+        await self.init_all_data()
 
     def _get_last_query(self, data: dict) -> dict | None:
         if d := data.get("data"):
@@ -258,16 +260,10 @@ async def do_tts(self, value):
 
     @functools.cached_property
     def tts(self) -> TTS:
-        if self.config.tts == "edge":
-            return EdgeTTS(self.mina_service, self.device_id, self.config)
-        elif self.config.tts == "azure":
-            return AzureTTS(self.mina_service, self.device_id, self.config)
-        elif self.config.tts == "openai":
-            return OpenAITTS(self.mina_service, self.device_id, self.config)
-        elif self.config.tts == "volc":
-            return VolcTTS(self.mina_service, self.device_id, self.config)
-        else:
+        if self.config.tts == "mi":
             return MiTTS(self.mina_service, self.device_id, self.config)
+        else:
+            return TetosTTS(self.mina_service, self.device_id, self.config)
 
     async def wait_for_tts_finish(self):
         while True:
@@ -347,7 +343,7 @@ async def wakeup_xiaoai(self):
         )
 
     async def run_forever(self):
-        await self.init_all_data(self.mi_session)
+        await self.init_all_data()
         task = asyncio.create_task(self.poll_latest_ask())
         assert task is not None  # to keep the reference to task, do not remove this
         print(
@@ -389,6 +385,7 @@ async def run_forever(self):
             print("问题:" + query + "?")
             if not self.chatbot.has_history():
                 query = f"{query},{self.config.prompt}"
+            query += ",并用本段话的language code作为开头,用|分隔,如:en-US|你好……"
             if self.config.mute_xiaoai:
                 await self.stop_if_xiaoai_is_playing()
             else:
@@ -404,7 +401,7 @@ async def run_forever(self):
                 print("小爱没回")
             print(f"以下是 {self.chatbot.name} 的回答: ", end="")
             try:
-                await self.tts.synthesize(query, self.ask_gpt(query))
+                await self.speak(self.ask_gpt(query))
             except Exception as e:
                 print(f"{self.chatbot.name} 回答出错 {str(e)}")
             else:
@@ -412,3 +409,18 @@ async def run_forever(self):
             if self.in_conversation:
                 print(f"继续对话, 或用`{self.config.end_conversation}`结束对话")
                 await self.wakeup_xiaoai()
+
+    async def speak(self, text_stream: AsyncIterator[str]) -> None:
+        text = await anext(text_stream)
+        # See if the first part contains language code(e.g. en-US|Hello world)
+        lang, _, first_chunk = text.rpartition("|")
+        if len(lang) > 7:
+            # It is not a legal language code, discard it
+            lang, first_chunk = "", text
+
+        async def gen():  # reconstruct the generator
+            yield first_chunk
+            async for text in text_stream:
+                yield text
+
+        await self.tts.synthesize(lang or "zh-CN", gen())
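
Review note: the new `speak` method relies on the model prefixing its answer with a language code separated by `|`. The standalone sketch below isolates that parsing step so its edge cases are easier to check. It is only an illustration under assumptions: `split_lang_prefix`, `fake_stream`, and `collect` are hypothetical names that do not exist in the repository, and the streamed chunks are made up.

```python
import asyncio
from typing import AsyncIterator


def split_lang_prefix(first_chunk: str) -> tuple[str, str]:
    """Split 'en-US|Hello' into ('en-US', 'Hello'), defaulting to zh-CN."""
    # rpartition returns ('', '', text) when there is no '|' in the chunk.
    lang, _, rest = first_chunk.rpartition("|")
    if len(lang) > 7:
        # Too long to be a language code: keep the whole chunk as text.
        return "zh-CN", first_chunk
    return lang or "zh-CN", rest


async def fake_stream() -> AsyncIterator[str]:
    # Hypothetical stand-in for the chatbot's streamed answer.
    yield "en-US|Hello"
    yield " world"


async def collect() -> tuple[str, str]:
    stream = fake_stream()
    # Only the first chunk is inspected for the language prefix.
    first = await stream.__anext__()
    lang, head = split_lang_prefix(first)
    tail = "".join([chunk async for chunk in stream])
    return lang, head + tail


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Because the split uses `rpartition`, a first chunk with more than one `|` keeps everything before the last `|` as the language code whenever that part is 7 characters or fewer, which may be worth a follow-up check.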