🎙️ TTS Speech Synthesis Engines: In-Depth Comparison
🎯 Goal
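We want to produce narration audio for e-learning videos. The aim is to find a TTS engine that reads Japanese cleanly, is cost-effective, and is easy to operate.
📋 Evaluation criteria: naturalness of Japanese / cost / variety of voices / commercial use allowed? / ease of operation
💰 Cost of generating the samples for this comparison:
Azure TTS, 7 voices: $0 (1,225 characters, comfortably within the free tier of 500,000 characters/month)
ElevenLabs, 21 voices: $0 (about 2 minutes, within the free tier)
OpenAI TTS, 6 voices: ≈ $0.03 (1,050 characters × $30 per 1M characters, practically zero)
Google Cloud TTS, 7 voices: $0 (1,225 characters, comfortably within the free tier of 1,000,000 characters/month)
Amazon Polly, 4 voices: $0 (700 characters, comfortably within the free tier of 1,000,000 characters/month)
VOICEVOX Nemo, 9 voices: $0 (runs locally)
Qwen3-TTS, 1 voice: $0 (runs locally)
Total: almost $0 🎉 (175 characters × 55 voices = 9,625 characters; essentially everything was covered by free tiers)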
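The totals above are plain arithmetic. Here is a minimal sketch that reproduces them, assuming every voice reads the same 175-character test text exactly once. The voice counts and the $30 per 1M characters rate are taken from the list above; the function and variable names are illustrative only.

```python
# Characters in the shared test text (from the list above).
CHARS_PER_SAMPLE = 175

# Number of voices sampled per engine (from the list above).
VOICES = {
    "Azure TTS": 7,
    "ElevenLabs": 21,
    "OpenAI TTS": 6,
    "Google Cloud TTS": 7,
    "Amazon Polly": 4,
    "VOICEVOX Nemo": 9,
    "Qwen3-TTS": 1,
}


def chars_used(voices: int, chars_per_sample: int = CHARS_PER_SAMPLE) -> int:
    """Characters sent to an engine when each voice reads the text once."""
    return voices * chars_per_sample


def cost_usd(chars: int, usd_per_million_chars: float) -> float:
    """Pay-as-you-go cost for a given character count."""
    return chars / 1_000_000 * usd_per_million_chars


total_voices = sum(VOICES.values())
total_chars = chars_used(total_voices)
openai_chars = chars_used(VOICES["OpenAI TTS"])
openai_cost = cost_usd(openai_chars, 30.0)  # $30 per 1M characters

print(total_voices, total_chars)             # voices and characters overall
print(openai_chars, round(openai_cost, 4))   # OpenAI: characters and dollars
```

The only paid line item is OpenAI TTS, and it comes to roughly three cents.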
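🗣️ Test text: 「みなさん、こんにちは。今回は、クラウドサービスを活用した業務改善について解説します。最近では、AIやデータ分析の技術がとても身近になりました。たとえば、テキストを入力するだけで、自然な音声を自動で生成できるようになっています。こうした技術をうまく取り入れることで、作業の効率を大幅に高めることができます。それでは、具体的な活用方法を見ていきましょう。」 (English: "Hello, everyone. In this session, we explain how to improve business operations using cloud services. Recently, AI and data analysis technologies have become very familiar. For example, you can now generate natural-sounding speech automatically just by entering text. By adopting these technologies well, you can greatly improve work efficiency. Now, let's look at specific ways to use them.")
📝 175 characters / about 30 seconds | follows a real script structure: greeting → introduction → concrete example → summary → lead-in | tests pronunciation of "AI" and katakana loanwords | conforms to the 13-rule script quality standard
📊 Comparison table at a glance
Let's start with a quick overview 👇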
| Engine | Type | Japanese quality | 💰 Price range | License | 🗣️ Japanese voices | ✨ Highlights |
|---|---|---|---|---|---|---|
| 🏆 Azure TTS | ☁️ Cloud | ★★★★★ | $16 per 1M characters | Pay-as-you-go | 7+ | SSML support; 500,000 characters/month free! |
| 🥈 ElevenLabs | ☁️ Cloud | ★★★★★ | $5 to $330/month | Subscription | Many | No. 1 for naturalness, but pricey 💸 |
| Google Cloud TTS | ☁️ Cloud | ★★★★ | $4 to $16 per 1M characters | Pay-as-you-go | 7+ | Huge free tier (2M characters/month) |
| OpenAI TTS | ☁️ Cloud | ★★★★ | $15 to $30 per 1M characters | Pay-as-you-go | 6 | Emotion can be directed with GPT-4o |
| Amazon Polly | ☁️ Cloud | ★★★★ | $4 to $16 per 1M characters | Pay-as-you-go | 4 | Great fit if you already use AWS |
| Fish Audio | ☁️ Cloud | ★★★★★ | Pay-as-you-go | Pay-as-you-go / self-hosted | Many | TTS-Arena No. 2 🥈; also runs as OSS |
| CoeFont | ☁️ Cloud | ★★★★ | From $20/month | Subscription | 30+ | Developed natively for Japanese 🇯🇵 |
| Cartesia.ai | ☁️ Cloud | ★★★★ | $0.03/minute | Pay-as-you-go | - | Ultra-low latency ⚡ for real-time use |
| Deepgram Aura | ☁️ Cloud | ★★★★ | $0.03 per 1K characters | Pay-as-you-go | - | Comes with $200 in credits |
| 🥇 Qwen3-TTS | 🖥️ OSS | ★★★★★ | 🆓 Free | Apache 2.0 | 1 + VoiceDesign | Design a voice from text! The newest 🆕 |
| 🏠 VOICEVOX Nemo | 🖥️ OSS | ★★★★ | 🆓 Free | Proprietary (free) | 9 | The easiest setup! |
| VOICEVOX | 🖥️ OSS | ★★★★ | 🆓 Free | MIT (engine) | Many | Lots of character voices 🎭 |
| Kokoro TTS | 🖥️ OSS | ★★★★ | 🆓 Free | Apache 2.0 | 54 | Super light at 82M parameters 🪶 |
| Style-Bert-VITS2 | 🖥️ OSS | ★★★★ | 🆓 Free | AGPL v3 | Model-dependent | Emotion control 🎭, specialized for Japanese |
| Chatterbox | 🖥️ OSS | ★★★★ | 🆓 Free | MIT | 23 languages | Impressive voice cloning 🎤 |
| COEIROINK | 🖥️ OSS | ★★★★ | 🆓 Free | Proprietary (free) | 30+ | VOICEVOX-compatible, so switching is easy |
| Open JTalk | 🖥️ OSS | ★★☆☆ | 🆓 Free | Modified BSD | Few | Ultra-lightweight but a bit mechanical 🤖 |
| Niji Voice (にじボイス) | 📦 Other | ★★★★ | Premium | Closed | - | Moved to a premium membership model |
| A.I.VOICE | 📦 Other | ★★★★★ | One-time purchase / subscription | Separate commercial contract | Many | Good quality ✨; commercial use requires a contract |
| AquesTalk | 📦 Other | ★★★★ | ¥6,380/year | Commercial license | Few | The official version of the familiar "Yukkuri" voice |
🎧 Listen: OSS engines
Compare audio generated from the same text! Just tap the play button 👇
🖥️ VOICEVOX Nemo
🥇 Qwen3-TTS
🎧 Listen: Cloud APIs
🏆 Azure TTS (all 7 voices)
No. 1 in the Japanese benchmark! You can choose from 7 voices.
🥈 ElevenLabs
Industry-leading naturalness! Have a listen to the multilingual model's Japanese.
🤖 OpenAI TTS
Voices from OpenAI, familiar from ChatGPT. There are 6 voices.
🔍 Google Cloud TTS
Google's dependable stability. Compare WaveNet and Neural2.
📦 Amazon Polly
AWS's TTS. There are 4 Japanese Neural voices.
📖 Full-length comparison
Short samples alone don't tell the whole story! Using audio that reads an entire real script, check the impression you get when listening for a longer time 🎧
📚 LLM Knowledge Limits (37 segments, about 5 minutes)
🔒 Security Basics
🎨 Azure all-voice comparison (same text, 7 voices side by side)
💰 How much does it cost? Cloud APIs
One script ≈ 3,000 to 5,000 characters. All 83 scripts together ≈ 350,000 characters.
→ The Azure free tier (500,000 characters/month) covers this with room to spare! 🎉
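The coverage claim can be checked with a short calculation. This is a rough sketch: it assumes all 83 scripts are synthesized once within a single month with no retakes. The script count, length range, free-tier size, and the $16 per 1M characters rate come from this page; the helper names are made up for illustration.

```python
SCRIPTS = 83
CHARS_MIN, CHARS_MAX = 3_000, 5_000        # characters per script
AZURE_FREE_CHARS_PER_MONTH = 500_000       # free tier quoted on this page
AZURE_USD_PER_MILLION_CHARS = 16.0         # pay-as-you-go rate quoted on this page


def fits_free_tier(chars: int) -> bool:
    """True if the monthly character count stays within the free tier."""
    return chars <= AZURE_FREE_CHARS_PER_MONTH


def paid_cost_usd(chars: int) -> float:
    """Cost of the same volume once the free tier no longer applies."""
    return chars / 1_000_000 * AZURE_USD_PER_MILLION_CHARS


low = SCRIPTS * CHARS_MIN    # every script at the short end
high = SCRIPTS * CHARS_MAX   # every script at the long end

print(low, high, fits_free_tier(high))
print(round(paid_cost_usd(350_000), 2))  # the ~350,000-character estimate, in dollars
```

Even if every script sat at the long end of the range, the total would still be below 500,000 characters. After the free period, the same volume would cost only a few dollars per run.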
| Service | 🆓 Free tier | 💵 Standard | 💎 Premium | 📅 Monthly plan | 📝 Comment |
|---|---|---|---|---|---|
| 🏆 Azure TTS | 500,000 characters/month (12 months) | $16 per 1M characters | - | - | The entire daihon (script) workload for free! Best value 🏆 |
| 🥈 ElevenLabs | 10 minutes/month | - | - | $5 to $330/month | Top quality, but hard on the wallet 💸 |
| Google Cloud | 2M characters/month (!!) | $4 per 1M | $16 per 1M (WaveNet) | - | The biggest free tier 🎉 |
| OpenAI TTS | None 😢 | $15 per 1M | $30 per 1M (HD) | - | GPT-4o-Mini-TTS is estimated at $30 to $60 per 1M |
| Amazon Polly | 5M characters + 1M (12 months) | $4 per 1M | $16 per 1M (Neural) | - | An extremely generous free tier! |
| Fish Audio | - | Pay-as-you-go | - | - | 🆓 if self-hosted |
| CoeFont | 5,000 characters/month | - | - | From $20/month | The free tier is a bit small |
| Cartesia.ai | - | $0.03/minute | - | - | For real-time use |
| Deepgram | $200 in credits | $0.03 per 1K characters | - | - | The $200 credit is a nice bonus |
🆓 Cost of the OSS engines
Basically everything is free! The only costs are electricity and the GPU ⚡
| Engine | 📜 License | 🎮 GPU required? | 🏢 Commercial use | 💰 Effective cost | 📝 Comment |
|---|---|---|---|---|---|
| 🏠 VOICEVOX Nemo | Proprietary (free) | Not required | Credit notice only | $0 | The most convenient! |
| 🥇 Qwen3-TTS | Apache 2.0 | ⚠️ Required (4 GB+) | 🆓 Completely free | $0 | Also runs on Mac MPS |
| Kokoro TTS | Apache 2.0 | Not required | 🆓 Completely free | < $1 per 1M characters | Super light at 82M |
| Style-Bert-VITS2 | AGPL v3 | ⚠️ Recommended | Model-dependent | $0 | Watch out for the AGPL ⚖️ |
| Chatterbox | MIT | ⚠️ Recommended | 🆓 Completely free | $0 | Specialized in voice cloning |
| COEIROINK | Proprietary (free) | Not required | Credit notice required | $0 | VOICEVOX-compatible |
| Open JTalk | Modified BSD | Not required | 🆓 Completely free | $0 | For embedded use |
🖥️ A closer look at the OSS engines
🏠 VOICEVOX / VOICEVOX Nemo
🖥️ OSS
- 📜 License: MIT (engine) / Nemo: proprietary (free)
- ⭐ Quality: ★★★★
- 🗣️ Voices: VOICEVOX: many / Nemo: 9 voices (6 female + 3 male)
- 🔧 How to run: launch the app → HTTP API (port 50021/50121)
- 🏢 Commercial use: Nemo only needs a credit notice!
Pros:
- Setup is super easy (just launch the app)
- No GPU needed, and the API is stable
- With Nemo, there is no hassle over per-character terms of use
Cons:
- Nemo has only 9 voices
- The app has to be kept running
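The "launch the app → HTTP API" flow can be sketched as below. Treat this as a hedged illustration rather than the project's actual script: it assumes the engine's usual two-step flow (`POST /audio_query`, then `POST /synthesis`), assumes port 50021 belongs to VOICEVOX and 50121 to VOICEVOX Nemo (the page only lists the two numbers), and uses a placeholder speaker ID. The helper names `endpoint` and `synthesize` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

VOICEVOX_URL = "http://127.0.0.1:50021"       # assumed: VOICEVOX
VOICEVOX_NEMO_URL = "http://127.0.0.1:50121"  # assumed: VOICEVOX Nemo


def endpoint(base_url: str, path: str, **params) -> str:
    """Build a URL such as http://127.0.0.1:50021/synthesis?speaker=1."""
    return f"{base_url}/{path}?{urllib.parse.urlencode(params)}"


def synthesize(text: str, speaker: int, base_url: str = VOICEVOX_URL,
               timeout: float = 30.0) -> bytes:
    """Return WAV bytes for `text`. The app must already be running."""
    # Step 1: ask the engine for a synthesis query (accent, speed, pitch, ...).
    request = urllib.request.Request(
        endpoint(base_url, "audio_query", text=text, speaker=speaker),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        query = json.load(response)

    # Step 2: send that query back to receive the audio.
    request = urllib.request.Request(
        endpoint(base_url, "synthesis", speaker=speaker),
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example (needs the app running; speaker=1 is a placeholder ID):
# wav = synthesize("こんにちは", speaker=1)
# with open("sample.wav", "wb") as f:
#     f.write(wav)
```

Because the query returned in step 1 is plain JSON, values such as speaking speed can be adjusted before step 2, which is handy for tuning narration pace.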
🥇 Qwen3-TTS
🖥️ OSS
- 📜 License: Apache 2.0 (anything goes!)
- ⭐ Quality: ★★★★★
- 🏗️ Developer: Alibaba Cloud (Qwen team)
- 📅 Release: January 2026 🆕
- 🧠 Parameters: 1.7B / 0.6B
Pros:
- VoiceDesign: create a voice just by writing something like "woman in her 30s, calm voice"!
- Voice cloning: copy a voice from a few seconds of audio
- Apache 2.0, so you are free to do anything with it
- Runs on Mac Apple Silicon (MPS)
Cons:
- Requires a GPU (4 GB or more of VRAM)
- The only Japanese preset is Ono_Anna
- Setup is a bit of a hassle
🪶 Kokoro TTS
🖥️ OSS
- 📜 License: Apache 2.0
- ⭐ Quality: ★★★★
- 🧠 Parameters: 82M (super light!!)
- 🗣️ Voices: 54 voices
Pros:
- Ultra-lightweight and blazing fast; runs smoothly on a CPU
- 54 voices to choose from
- Very popular on HuggingFace
Cons:
- Japanese quality falls short of Qwen3 and VOICEVOX
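🎭 Style-Bert-VITS2
🖥️ OSS
- 📜 License: AGPL v3 / models are licensed separately
- ⭐ Quality: ★★★★
- ✨ Features: you can direct the performance with emotion parameters (joy, anger, sorrow, pleasure)
Pros:
- Sounds natural thanks to its Japanese-specific design
- Fine-grained emotion control
- Can also be trained on your own voice
Cons:
- AGPL v3 comes with some restrictions
- The license differs from model to model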
🎤 Chatterbox (Resemble AI)
🖥️ OSS
- 📜 License: MIT
- ⭐ Quality: ★★★★
- 🧠 Parameters: 350M
- 🌍 Supported languages: 23 languages
Pros:
- Zero-shot voice cloning (a few seconds of audio is enough)
- Emotion intensity can be adjusted with a parameter
- MIT license for maximum freedom
Cons:
- Japanese support is still maturing
🎨 COEIROINK
🖥️ OSS
- 📜 License: proprietary (free)
- ⭐ Quality: ★★★★
- 🗣️ Voices: 30+ characters
Pros:
- VOICEVOX-compatible API makes switching easy
- More than 30 character voices
Cons:
- Terms of use differ per character, which is a hassle
🤖 Open JTalk
🖥️ OSS
- 📜 License: Modified BSD
- ⭐ Quality: ★★☆☆
- 🏗️ Developer: Nagoya Institute of Technology
Pros:
- The lightest option and extremely stable
- Ideal for embedded use
Cons:
- Sounds a bit robotic
- Probably not suited to narration...
☁️ A closer look at the cloud APIs
🏆 Azure TTS (Microsoft)
☁️ Cloud
- ⭐ Quality: ★★★★★
- 💰 Price: $16 per 1M characters (but the free tier is enough!)
- 🆓 Free tier: 500,000 characters/month (12 months)
- 🗣️ Japanese voices: 7 voices (4 female + 3 male)
- 🔧 Features: SSML, emotion styles, Custom Neural Voice
Pros:
- Proven strength: No. 1 in the Japanese benchmark 💪
- Rich variety, with 7 voices to choose from
- 500,000 characters/month free → covers the entire daihon workload!
- SSML allows fine-grained tuning of how text is read
Cons:
- After 12 months it costs $16 per 1M characters
- Managing an Azure account is a bit of a hassle
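To make "fine-grained tuning with SSML" concrete, here is a small sketch that builds an SSML document slowing the speaking rate slightly and adding a pause. It only constructs the markup and does not call Azure. The voice name `ja-JP-NanamiNeural` is used as an example of a Japanese neural voice and should be checked against the current voice list; the function `build_ssml` and its parameters are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "ja-JP-NanamiNeural",
               rate: str = "0%", pause_ms: int = 0) -> str:
    """Wrap `text` in <speak>/<voice>/<prosody>, optionally ending with a pause."""
    body = escape(text)  # escape &, <, > so the text cannot break the markup
    if pause_ms > 0:
        body += f'<break time="{int(pause_ms)}ms"/>'
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="ja-JP">'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
        "</voice></speak>"
    )


ssml = build_ssml("みなさん、こんにちは。", rate="-5%", pause_ms=300)
print(ssml)
```

Generating the markup in one helper keeps per-script adjustments (rate, pauses) in data rather than hand-edited XML, which matters when narrating dozens of scripts.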
🥈 ElevenLabs
☁️ Cloud
- ⭐ Quality: ★★★★★
- 💰 Price: $5 to $330/month
- 🆓 Free tier: 10,000 credits (a bit small...)
- 🧠 Models: Eleven v3 🆕 (released February 2026) / multilingual_v2
- 🌍 Supported languages: 70+ languages
Pros:
- Simply natural! No. 2 on TTS Arena (ELO 1177)
- Eleven v3 greatly improves emotional expression 🆕
- Audio Tags support (sighs, whispers, laughter, etc.) 🆕
- Voice cloning plus the Text to Dialogue API
Cons:
- The most expensive option 💸💸💸
- The free tier is small
🔍 Google Cloud TTS
☁️ Cloud
- ⭐ Quality: ★★★★
- 💰 Price: $4 to $16 per 1M characters
- 🆓 Free tier: 2M characters/month (Standard + WaveNet combined) 🤯
- 🗣️ Japanese voices: 7+ (WaveNet/Neural2)
Pros:
- By far the biggest free tier!
- Stable quality from WaveNet/Neural2
Cons:
- Requires a Google Cloud contract
- Quality does not reach Azure or ElevenLabs
🤖 OpenAI TTS
☁️ Cloud
- ⭐ Quality: ★★★★ (improved with gpt-4o-mini-tts 📈)
- 💰 Price: tts-1: $15 per 1M characters / tts-1-hd: $30 per 1M characters / gpt-4o-mini-tts: about $0.015/minute 🆕
- 🆓 Free tier: $5 credit for new accounts
- 🗣️ Voices: 13 voices 🆕 (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse, marin, cedar)
Pros:
- gpt-4o-mini-tts accepts tone instructions in natural language 🆕
- Expanded to 13 voices; easy to integrate through the OpenAI API
- A new pricing model that is much cheaper than before
Cons:
- Japanese quality is lower than English
- The voices remain optimized for English
📦 Amazon Polly
☁️ Cloud
- ⭐ Quality: ★★★★
- 💰 Price: $4 to $16 per 1M characters
- 🆓 Free tier: 5M + 1M characters/month (12 months), very generous!
- 🗣️ Japanese Neural voices: 4 voices (Kazuha, Tomoko, Takumi, Hidetoshi)
Pros:
- The free tier is enormous
- Easy to integrate if you already use AWS
- Outstanding stability
Cons:
- Only 4 Japanese Neural voices
- Requires managing an AWS account
🐟 Fish Audio
☁️ Cloud
- ⭐ Quality: ★★★★★ (greatly strengthened in V1.5 🆕)
- 💰 Price: 200 free minutes/month 🆕 / paid plans from $5.50/month / API $15 per 1M UTF-8 bytes
- 📜 License: Apache 2.0 (Fish Speech)
- 🏆 Track record: TTS Arena ELO score of 1339 (top class) 🆕
Pros:
- Fish Speech V1.5: the DualAR architecture greatly improves quality 🆕
- Especially strong in CJK (Chinese, Japanese, Korean) languages! 🆕
- 200 free minutes/month, and self-hosting is possible
- Story Studio (a workstation for long-form audio production) 🆕
Cons:
- A new service whose track record is still being built
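One detail worth spelling out: the API price above is per UTF-8 byte, not per character. Kana and common kanji take 3 bytes each in UTF-8, so Japanese text is billed at roughly three times its character count. A minimal sketch, using the $15 per 1M bytes figure from the list above; the sample sentence and function names are made up for illustration.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def api_cost_usd(text: str, usd_per_million_bytes: float = 15.0) -> float:
    """Cost under per-byte billing."""
    return utf8_bytes(text) / 1_000_000 * usd_per_million_bytes


sample = "こんにちは、AIです。"  # 9 Japanese characters + 2 ASCII letters
print(len(sample), utf8_bytes(sample))

# Effective rate for text made only of 3-byte characters:
usd_per_million_japanese_chars = 3 * 15.0
print(usd_per_million_japanese_chars)
```

Under these assumptions, $15 per 1M bytes works out to about $45 per 1M characters of pure Japanese text. That is the figure to compare against the per-character prices of the other services.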
🇯🇵 CoeFont
☁️ Cloud
- ⭐ Quality: ★★★★
- 💰 Price: Free: 5,000 characters / Business: from $20/month
- 🗣️ Voices: 30+ characters
- 🏗️ Developer: a Japanese company (specialized in native Japanese)
Pros:
- Made in Japan, so the Japanese sounds natural
- You can create your own custom voice
Cons:
- The free tier is a bit small at 5,000 characters
⚡ Cartesia.ai
☁️ Cloud
- ⭐ Quality: ★★★★
- 💰 Price: $0.03/minute
- 🧠 Model: Sonic 2
- ✨ Features: emotion control & real-time streaming
Pros:
- Extremely low latency (suited to real-time dialogue)
- Emotion parameters let you change the tone of voice
Cons:
- Little data is available on Japanese quality
- Narration use still needs to be verified
🌊 Deepgram Aura
☁️ Cloud
- ⭐ Quality: ★★★★ (Japanese supported in Aura-2 🆕)
- 💰 Price: $0.03 per 1K characters, i.e. $30 per 1M characters (Growth: $27 per 1M)
- 🆓 Free tier: $200 in credits (goes a long way!)
Pros:
- Aura-2 officially supports Japanese (including honorifics and pitch accent) 🆕
- The $200 free credit is welcome
- Low latency and fast responses
Cons:
- Japanese support is new and still maturing
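📦 Others (limited availability / one-time purchase)
| Service | Offering | Japanese quality | 📝 Comment |
|---|---|---|---|
| Niji Voice (にじボイス) | ❌ Service ended | ★★★★ | Service ended on February 4, 2026 🪦 |
| A.I.VOICE | One-time purchase / subscription | ★★★★★ | Good quality ✨; commercial use requires a separate contract |
| AquesTalk | ¥6,380/year | ★★★★ | The official engine behind the familiar "Yukkuri" voice 🐢 |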
🏆 Quality ranking (Japanese)
☁️ Cloud APIs: top 5
- 🏆 Azure TTS: the most Japanese voices, stable quality
- 🥈 ElevenLabs: No. 1 for naturalness, but pricey
- 🥉 Amazon Polly: Neural voices are consistently good
- 4️⃣ Google Cloud TTS: a solid WaveNet/Neural2 lineup
- 5️⃣ Fish Audio: highly rated on TTS-Arena
🖥️ OSS: top 5
- 🏆 Qwen3-TTS: VoiceDesign is revolutionary, and it is the newest
- 🥈 Kokoro TTS: an excellent balance of light weight and quality
- 🥉 Chatterbox: high-quality voice cloning
- 4️⃣ Fish Speech: high quality when self-hosted
- 5️⃣ Style-Bert-VITS2: specialized for Japanese, excellent emotion control
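💡 Recommended setups
For anyone thinking "So which one should I actually use?" 👇
🥇 Qwen3-TTS
Apache 2.0, so anything goes. Designing a voice from text feels like the future.
- 🔧 Setup: Python + a GPU machine
- 🎯 Best for: people who want to mass-produce e-learning content with a unique voice
- ✅ Good points: completely free, with the most permissive license
- ⚠️ Caution: needs a GPU environment, and setup takes time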
🏠 VOICEVOX Nemo
Just launch the app! The easiest way to get started with TTS.
- 🔧 Setup: launch the app only (1 minute)
- 🎯 Best for: people who want to create narration quickly and easily
- ✅ Good points: the fastest setup, no GPU needed
- ⚠️ Caution: only 9 voices, and the app must stay running
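🏆 Azure TTS
No. 1 in quality, yet the free tier covers the entire daihon workload. This is the current main engine.
- 🔧 Setup: Azure account + API key
- 🎯 Best for: people who want to produce good-quality e-learning content
- ✅ Good points: top quality, 7 voices to choose from, can be operated for free
- ⚠️ Caution: managing an Azure account is a little tedious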
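🥈 ElevenLabs
The most natural-sounding option in the industry. Voice cloning and emotion control are both handled with ease.
- 🔧 Setup: API key only (super easy)
- 🎯 Best for: people who won't be satisfied with anything less than the highest quality
- ✅ Good points: overwhelming naturalness, feature-rich
- ⚠️ Caution: costs $22 to $330/month 💸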
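🆕 2026: notable new engines
Noteworthy engines with Japanese support that turned up during research! These are picks that are not yet in the comparison table 👇
🌍 Inworld TTS
☁️ Cloud
- 🏆 Track record: No. 1 on TTS Arena (ELO 1217) 🥇
- 🌏 Japanese: supported (experimental)
- ✨ Features: 30% more expressive than existing engines
A TTS built by an AI company focused on games. It overtook ElevenLabs to reach No. 1 in the benchmark! Japanese is supported experimentally with 6 voices.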
🔗 Inworld TTS
🎯 F5-TTS v1
🖥️ OSS
- 📜 License: CC-BY-NC-4.0
- 🌏 Japanese: officially supported in v1 🆕
- ✨ Features: zero-shot voice cloning, trained on 100,000 hours of audio
An ultra-high-quality model specialized in voice cloning. Japanese became officially supported in v1. Note, however, that the license is non-commercial.
🔗 GitHub
🗣️ CosyVoice 2
🖥️ OSS
- 📜 License: Apache 2.0
- 🧠 Parameters: 0.5B
- 🌏 Japanese: supported (Japanese, Chinese, English, Korean + Cantonese)
- ⚡ Latency: 150 ms streaming
Another TTS from Alibaba. It reduces pronunciation errors by 30-50%. Its low latency also makes it usable for real-time dialogue.
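🔗 GitHub
🪶 MeloTTS
🖥️ OSS
- 📜 License: MIT
- 🌏 Japanese: supported (Japanese, Chinese, English, Korean, French, Spanish)
- 🎮 GPU: not required! Runs on a CPU
- 🏗️ Developer: MyShell
A lightweight TTS that works with CPU inference. The MIT license allows free commercial use. With Japanese support, it is quick to try out.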
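🔗 GitHub
🎵 Higgs Audio V2
🖥️ OSS
- 📜 License: Apache 2.0
- 🧠 Parameters: 3B (based on Llama 3.2)
- 🌏 Japanese: multilingual support
- ✨ Features: generates humming and background music alongside speech, trained on 10 million hours of audio
A large-scale audio model based on Llama 3.2. What makes it unique is that it can generate not only speech but also humming and background music.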
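🔗 Official blog
💬 Dia (Nari Labs)
🖥️ OSS
- 📜 License: Apache 2.0
- 🧠 Parameters: 1.6B
- 🌏 Japanese: community fine-tunes available
- ✨ Features: specialized in dialogue; also generates non-verbal sounds such as laughter and coughs
A TTS specialized in dialogue scenes. It naturally generates non-verbal sounds such as laughter and sighs. Japanese is not officially supported, but the community is working hard on it.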
🔗 GitHub
🪦 Niji Voice (にじボイス): service ended on February 4, 2026
🪦 PlayHT: service ended on December 31, 2025
🪦 Coqui AI / XTTS: company closed in December 2025 (OSS forks live on)
🔗 Links
🖥️ OSS / local TTS
☁️ Cloud APIs
🆕 New engines
🛠️ How to generate the sample audio
python scripts/generate_tts_comparison_samples.py
To generate samples for specific engines only:
python scripts/generate_tts_comparison_samples.py --engines azure elevenlabs openai