Found 4 AI tools
WhisperKit is a tool for compressing and optimizing automatic speech recognition (ASR) models, and it provides detailed performance evaluation data for the optimized models. It also offers quality assurance certification across different datasets and model formats, and its test results can be reproduced locally.
Whisper is a general-purpose speech recognition model. Trained on a large and diverse audio dataset, it is a multitask model that can perform multilingual speech recognition, speech translation, and language identification.
NVAS3d is a project for estimating the sound at any location in a scene that contains multiple unknown sound sources. By combining audio recordings from multiple microphones with the 3D geometry and materials of the scene, it offers a new approach to acoustic synthesis.
SALMONN is a large language model (LLM) developed by the Department of Electronic Engineering at Tsinghua University and ByteDance that accepts speech, audio event, and music input. Unlike models that support only speech or only audio event input, SALMONN can perceive and understand many kinds of audio, enabling emergent capabilities such as multilingual speech recognition and translation and audio-speech co-reasoning. This can be seen as giving an LLM the ability to hear and to understand what it hears, making SALMONN a step toward artificial general intelligence with hearing capabilities.
AI speech recognition is a popular subcategory under music, with 4 quality AI tools.