Found 8 AI tools
Hibiki is an advanced model focused on streaming speech translation. It generates its translation chunk by chunk, accumulating enough context in real time to translate correctly, supports both speech and text output, and can perform voice conversion. The model is built on a multi-stream architecture that processes source and target speech simultaneously, producing a continuous audio stream together with a timestamped text translation. Its key strengths are high-fidelity voice conversion, low-latency real-time translation, and compatibility with complex inference strategies. Hibiki currently supports French-to-English translation, making it suitable for scenarios that demand efficient real-time translation, such as international conferences and multilingual live streams. The model is open source and free, aimed at developers and researchers.
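A toy sketch of the chunk-by-chunk pattern described above: output is only emitted once enough source context has accumulated, so the translation lags the input by a small, bounded delay. Everything here (the class, the chunk size, the placeholder output strings) is illustrative and is not Hibiki's actual API.

```python
# Toy illustration (not Hibiki's real API) of block-by-block streaming translation:
# the translator buffers source chunks and only emits output once it has seen
# enough context, so the target stream trails the source by a short delay.
from dataclasses import dataclass, field

@dataclass
class ToyStreamingTranslator:
    context_chunks: int = 3                    # how many source chunks to see before emitting
    buffer: list = field(default_factory=list)

    def step(self, source_chunk: str) -> str | None:
        self.buffer.append(source_chunk)
        if len(self.buffer) < self.context_chunks:
            return None                        # not enough context accumulated yet
        return f"[output emitted after {len(self.buffer)} source chunks]"

translator = ToyStreamingTranslator()
for i, chunk in enumerate(["bonjour", "à", "tous", "les", "amis"]):
    print(f"chunk {i}: {translator.step(chunk)}")
```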
PengChengStarling is an open-source toolkit for multilingual automatic speech recognition (ASR), developed on top of the icefall project. It covers the complete ASR workflow, including data processing, model training, inference, fine-tuning, and deployment. The toolkit significantly improves multilingual ASR performance by optimizing parameter configurations and integrating language IDs into the RNN-Transducer architecture. Its main advantages are efficient multi-language support, a flexible configuration design, and strong inference performance. PengChengStarling models perform well across multiple languages, are small, and run extremely fast at inference time, making them suitable for scenarios that require efficient speech recognition.
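As an illustration of the language-ID idea only (this is not code from PengChengStarling itself), a minimal PyTorch sketch of conditioning an ASR encoder on a language ID by embedding the ID and fusing it with the acoustic features:

```python
# Conceptual sketch of language-ID conditioning for a multilingual encoder:
# each language gets a learned embedding that is added to the acoustic features
# before they enter the encoder, so one model can serve several languages.
import torch
import torch.nn as nn

class LangConditionedEncoder(nn.Module):
    def __init__(self, feat_dim: int = 80, hidden: int = 256, num_langs: int = 8):
        super().__init__()
        self.lang_emb = nn.Embedding(num_langs, feat_dim)   # one vector per language
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, feats: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); lang_id: (batch,)
        lang = self.lang_emb(lang_id).unsqueeze(1)           # (batch, 1, feat_dim)
        return self.rnn(feats + lang)[0]                     # broadcast over time

enc = LangConditionedEncoder()
x = torch.randn(2, 100, 80)                                  # two short utterances
y = enc(x, torch.tensor([0, 3]))                             # two different language IDs
print(y.shape)  # torch.Size([2, 100, 256])
```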
BetterWhisperX is an improved automatic speech recognition model based on WhisperX. It provides fast speech-to-text with word-level timestamps and speaker identification. The tool matters for researchers and developers who need to process large amounts of audio, because it substantially improves both the efficiency and the accuracy of speech data processing. It builds on OpenAI's Whisper model with further optimizations and improvements. The project is free and open source, positioned to give the developer community a more efficient and accurate speech recognition tool.
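A sketch of the typical workflow, assuming BetterWhisperX keeps the upstream WhisperX API (load_model, align, diarization); the exact function names and arguments may differ in the fork:

```python
# Assumed WhisperX-style pipeline: fast transcription, then forced alignment
# for word-level timestamps, then diarization for speaker labels.
import whisperx

device = "cuda"
audio = whisperx.load_audio("meeting.wav")

# 1. Fast batched transcription
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# 2. Word-level timestamps via forced alignment
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Speaker labels via diarization (needs a Hugging Face token)
diarize = whisperx.DiarizationPipeline(use_auth_token="hf_...", device=device)
result = whisperx.assign_word_speakers(diarize(audio), result)

for seg in result["segments"]:
    print(seg.get("speaker"), seg["start"], seg["end"], seg["text"])
```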
LiveKit Plugins Turn Detector is a plugin for LiveKit Agents that introduces end-to-end end-of-speech detection, using a custom open-weight model to determine when a user has finished speaking. Compared with traditional voice activity detection (VAD) models, the plugin provides a more accurate and robust way to detect the end of a turn, because it uses a language model trained specifically for this task. The current version only supports English and is not recommended for other languages.
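A hedged sketch of how the plugin is typically wired into a LiveKit voice pipeline; the import paths, the EOUModel class name, and the choice of STT/LLM/TTS plugins below are assumptions based on LiveKit's documented examples and may differ between plugin versions:

```python
# Assumed integration pattern for livekit-agents; names may vary by version.
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero, turn_detector

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),                    # coarse acoustic VAD is still used alongside it
    stt=deepgram.STT(),                       # any supported STT plugin would do here
    llm=openai.LLM(),
    tts=openai.TTS(),
    turn_detector=turn_detector.EOUModel(),   # language-model-based end-of-speech detection
)
# agent.start(room, participant) would then attach the pipeline to a LiveKit room.
```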
Moonshine Web is a simple application built with React and Vite that runs Moonshine Base, a powerful speech recognition model optimized for fast, accurate automatic speech recognition (ASR) on resource-constrained devices. The app runs entirely in the browser, using Transformers.js with WebGPU acceleration (falling back to WASM). Its significance is that it gives users local, serverless speech recognition, which matters for scenarios that need speech data processed quickly.
hertz-dev is Standard Intelligence's open-source, full-duplex, audio-only transformer base model with 8.5 billion parameters. The model demonstrates scalable cross-modal learning: it converts mono 16 kHz speech into an 8 Hz latent representation at a bitrate of 1 kbps, outperforming other audio encoders. Its main advantages are low latency, high efficiency, and ease of fine-tuning and building on for researchers. Standard Intelligence states that it is committed to building general intelligence that benefits all of humanity, and hertz-dev is the first step in that journey.
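A quick back-of-the-envelope check makes the quoted compression concrete; the 16-bit PCM baseline is added here only for comparison:

```python
# Sanity-check the quoted figures: 16 kHz mono speech -> 8 Hz latents at ~1 kbps.
sample_rate_hz = 16_000          # input samples per second
latent_rate_hz = 8               # latent frames per second
bitrate_bps = 1_000              # quoted codec bitrate

samples_per_latent = sample_rate_hz // latent_rate_hz   # 2000 samples per latent frame
bits_per_latent = bitrate_bps / latent_rate_hz           # 125 bits per latent frame
pcm_bps = sample_rate_hz * 16                             # 16-bit PCM baseline (assumption)

print(samples_per_latent, bits_per_latent, pcm_bps / bitrate_bps)
# -> 2000 samples/frame, 125.0 bits/frame, 256x smaller than 16-bit PCM
```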
Llama3-s v0.2 is a multimodal checkpoint developed by Homebrew Computer Company, focused on improving speech understanding. The model improves on earlier versions through early integration of semantic tokens and community feedback, which simplifies the model structure, improves compression efficiency, and makes speech feature extraction more consistent. Llama3-s v0.2 performs stably on multiple speech-understanding benchmarks, and a live demo lets users try its capabilities for themselves. The model is still at an early stage of development and has limitations, such as sensitivity to audio compression and an inability to handle audio longer than 10 seconds, which the team plans to address in future updates.
FreeSubtitles.Ai is a free online speech recognition and machine translation tool. Users upload audio or video files, and the tool automatically transcribes them and provides translations into multiple languages. It is offered in a free tier and a paid tier: the free tier has usage limits, while the paid tier allows larger files, longer durations, and higher-accuracy transcription. Its main functions include speech-to-text, video subtitle extraction, and multi-language translation, making it suitable for scenarios such as learning foreign languages, processing meeting recordings, and generating subtitles. Its advantages are that it is free, convenient, and accurate.
Speech recognition is a popular subcategory under programming, featuring 8 quality AI tools.