🎵

music Category

AI text to speech

Found 7 AI tools

tools

Primary Category: music

Subcategory: AI text to speech

Found 7 matching tools

Related AI Tools

Click any tool to view details

EzAudio

EzAudio is an advanced text-to-audio (T2A) generation model capable of creating high-quality audio from text prompts. It sets a new standard for open source T2A models, providing fast, efficient and realistic sound effect generation.

AI模型音频处理文本到音频 +1

音乐 Visit

Bark

Bark is a Transformer-based text-to-audio model developed by Suno that is capable of generating realistic multilingual speech as well as other types of audio such as music, background noise, and simple sound effects. It also supports the generation of non-verbal communication such as laughter, sighs and cries. Bark supports the research community, providing pre-trained model checkpoints suitable for inference and available for commercial use.

多语言研究音频生成 +1

音乐 Visit

AudioLCM

AudioLCM is a text-to-audio generation model based on PyTorch, which uses a latent consistency model to generate high-quality and efficient audio. This model was developed by Huadai Liu and others, providing an open source implementation and pre-trained model. It can convert text descriptions into near-real audio and has important application value, especially in fields such as speech synthesis and audio production.

语音合成音频生成 PyTorch +1

音乐 Visit

Whisper Speech

Whisper Speech is a fully open source text-to-speech model trained by Collabora and Lion on Juwels supercomputers. It supports multiple languages and multiple forms of input, including Node.js, Python, Elixir, HTTP, Cog, and Docker. The advantages of this model are efficient speech synthesis and flexible deployment. In terms of pricing, Whisper Speech is completely free. It is positioned to provide developers and researchers with a powerful, customizable text-to-speech solution.

开源语音合成文本转语音

音乐 Visit

GPT-SoVITS

GPT-SoVITS-WebUI is a powerful zero-sample speech conversion and text-to-speech WebUI. It has features such as zero-sample TTS, few-sample TTS, cross-language support, and WebUI tools. The product supports English, Japanese and Chinese, and provides integrated tools, including voice accompaniment separation, automatic training set segmentation, Chinese ASR and text annotation, to help beginners create training data sets and GPT/SoVITS models. Users can experience instant text-to-speech conversion by inputting a 5-second sound sample, and can fine-tune the model to improve speech similarity and fidelity by using only 1 minute of training data. Product support environment preparation, Python and PyTorch versions, quick installation, manual installation, pre-trained models, dataset format, to-do items and acknowledgments.

文本到语音语音转换 WebUI

音乐 Visit

RealtimeTTS

RealtimeTTS is an easy-to-use, low-latency text-to-speech library for real-time applications. It can convert text streams into immediate audio output. Key features include real-time streaming synthesis and playback, advanced sentence boundary detection, modular engine design, and more. The library supports multiple text-to-speech engines and is suitable for voice assistants and applications requiring instant audio feedback. Please refer to the official website for detailed pricing and positioning information.

文本转语音语音助手实时音频

音乐 Visit

StyleTTS 2

StyleTTS 2 is a text-to-speech (TTS) model that uses large-scale speech language models (SLMs) for style diffusion and adversarial training to achieve human-level TTS synthesis. It models style as a latent random variable through a diffusion model to generate a style that best fits the text without reference to speech. Furthermore, we use large pre-trained SLMs (such as WavLM) as the discriminator and combine them with our innovative differentiable duration modeling for end-to-end training, thereby improving the naturalness of speech. StyleTTS 2 outperformed human recordings on the single-speaker LJSpeech dataset and matched them on the multi-speaker VCTK dataset, gaining approval from native English-speaking reviewers. Furthermore, our model outperforms previous publicly available zero-shot extension models when trained on the LibriTTS dataset. By demonstrating the potential of style diffusion and adversarial training with large SLMs, this work enables a human-level TTS synthesis on single and multi-speaker datasets.

大型语言模型语音合成文本转语音 +2

音乐 Visit

Related Subcategories

Explore other subcategories under music Other Categories

music generation

260 tools

audio generation

85 tools

AI design tools

80 tools

AI model

44 tools

AI music generation

44 tools

Development and Tools

32 tools

Text to sound

28 tools

Voice cloning

27 tools

🎵

Explore More music Tools

AI text to speech Hot music is a popular subcategory under 7 quality AI tools

Browse music Category Categories