SafeEar is an innovative audio deepfake detection framework capable of detecting deepfake audio without relying on speech content. The framework protects the privacy of speech content by designing a neural audio codec that decouples semantic and acoustic information in audio samples and uses only acoustic information (such as prosody and timbre) for deepfake detection. SafeEar further strengthens the detector through real-world codec augmentation, enabling it to recognize a wide range of deepfake audio. Extensive experiments on four benchmark datasets show that SafeEar is highly effective in detecting various deepfake techniques, with an equal error rate (EER) as low as 2.02%. At the same time, it protects speech content in five languages from being deciphered by both machine analysis and human listeners, as demonstrated by user studies and a word error rate (WER) above 93.93%. In addition, SafeEar establishes a benchmark for anti-deepfake and anti-content-recovery evaluation, providing a foundation for future research in audio privacy protection and deepfake detection.
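The two metrics cited above work in opposite directions: a low EER means the detector separates real from fake audio well, while a high WER means a recognizer trying to recover speech content mostly fails. A minimal sketch of how each is computed (illustrative code, not from SafeEar's codebase):

```python
import numpy as np

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.
    A WER above ~93% means the recovered transcript is essentially unusable."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,       # deletion
                          d[i, j - 1] + 1,       # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return d[len(ref), len(hyp)] / len(ref)

def eer(genuine_scores, fake_scores) -> float:
    """Equal error rate: operating point where the false-accept rate on
    fakes equals the false-reject rate on genuine audio."""
    pos = np.asarray(genuine_scores, dtype=float)
    neg = np.asarray(fake_scores, dtype=float)
    best = None
    for t in np.sort(np.concatenate([pos, neg])):
        far = float(np.mean(neg >= t))  # fakes accepted as genuine
        frr = float(np.mean(pos < t))   # genuine rejected as fake
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]
```

For a perfectly separating detector (all genuine scores above all fake scores) the EER is 0; SafeEar's reported 2.02% is close to that ideal, while the 93.93% WER indicates near-total content protection.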
Seed-ASR is a speech recognition model based on large language models (LLMs), developed by ByteDance. It leverages the power of LLMs by feeding continuous speech representations together with contextual information into the LLM; guided by large-scale training and context-aware capabilities, it significantly improves performance on a comprehensive evaluation set covering multiple domains, accents/dialects, and languages. Compared with recently released large-scale ASR models, Seed-ASR achieves a 10%-40% word error rate reduction on Chinese and English public test sets, further demonstrating its strong performance.
Health Acoustic Representations (HeAR) is a bioacoustic foundation model developed by Google's research team that aims to identify early signs of disease by analyzing sounds made by the human body, such as coughs. The model was trained on 300 million audio clips, of which about 100 million were cough sounds. HeAR is able to identify health-related sound patterns, providing a strong foundation for medical audio analysis. The model outperforms other models across a variety of tasks and generalizes better across different microphones. In addition, models built on HeAR can achieve high performance with less training data, which is crucial in the data-scarce field of medical research. HeAR is now available to researchers to accelerate the development of custom bioacoustic models, reducing the data, setup, and computation required.
Emilia is an open-source multilingual in-the-wild speech dataset designed for large-scale speech generation research. It contains more than 101,000 hours of high-quality speech data with corresponding text transcriptions in six languages, covering a variety of speaking styles and content types such as talk shows, interviews, debates, sports commentary, and audiobooks.
FunAudioLLM is a framework designed to enhance natural speech interaction between humans and large language models (LLMs). It contains two innovative models: SenseVoice handles high-precision multilingual speech recognition, emotion recognition, and audio event detection, while CosyVoice handles natural speech generation with multilingual, timbre, and emotion control. SenseVoice supports more than 50 languages with extremely low latency; CosyVoice excels at multilingual voice generation, zero-shot in-context generation, cross-lingual voice cloning, and instruction following. The models have been open-sourced on ModelScope and Hugging Face, and the corresponding training, inference, and fine-tuning code has been released on GitHub.
SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). It focuses on high-precision multilingual speech recognition, speech emotion recognition, and audio event detection, supports more than 50 languages, and its recognition performance exceeds that of the Whisper model. The model uses a non-autoregressive end-to-end framework with extremely low inference latency, making it well suited for real-time speech processing.
Azure Cognitive Services Speech is a speech recognition and synthesis service from Microsoft that supports speech-to-text and text-to-speech in more than 100 languages and dialects. Transcription accuracy can be improved by creating custom speech models that handle domain-specific terminology, background noise, and accents. The service also supports real-time speech-to-text, speech translation, and text-to-speech, and is suitable for a variety of business scenarios, such as subtitle generation, post-call transcription analysis, and video translation.