11.ai is a personal AI voice assistant built with ElevenLabs Conversational AI. It can plan your schedule, research customer information, manage tickets and communicate with your Slack team, all through voice.
Speechly is a tool that turns your speech into structured emails, producing clear, readable messages without manual typing. It supports up to 100 languages.
Pinch is a real-time AI voice translation tool designed to eliminate language barriers in video calls. It provides instant voice translation in more than 30 languages. The product is suitable for multinational enterprises, educational institutions, families and individuals. Its key benefits include high translation accuracy, support for multiple languages, and no need for additional equipment. By reducing language barriers, it supports business cooperation, educational exchanges and family connections on a global scale.
DuRT is a speech recognition and translation tool for macOS. It recognizes and translates speech in real time using local AI models and system services, and supports multiple speech recognition methods to improve accuracy and language coverage. Results appear in a floating box, so users can read them quickly while working. Its main advantages are high accuracy, privacy protection (no user information is collected), and convenient operation. DuRT is positioned as a productivity tool that helps users communicate and work more efficiently in multi-language environments. It is available for download on the Mac App Store; its price is not clearly stated on the product page.
Sesame is an interdisciplinary product and research team focused on voice technology, aiming to make user interaction with computers more natural and efficient through natural voice interaction. Its main products include personal voice companions and lightweight wearable eyewear devices, which are designed to personify computers and help users better organize information and improve efficiency. The main advantages of the product are the naturalness of voice interaction and the portability of the device, making it suitable for daily use. Currently, Sesame is actively recruiting and is committed to driving innovation in voice technology.
Scribe is a high-precision speech-to-text model developed by ElevenLabs, designed to handle the unpredictability of real-world audio. It supports 99 languages and provides word-level timestamps, speaker separation and audio event tagging. On the FLEURS and Common Voice benchmarks, Scribe outperforms leading models such as Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3. It significantly reduces error rates for traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models typically have error rates above 40%. Scribe offers an API for developers, and a low-latency version for real-time applications is planned.
Chirp AI is a smart voice assistant app designed specifically for Apple Watch. It uses voice recognition and artificial intelligence to let users complete tasks through voice commands alone, such as sending messages, looking up information, and searching the web, which improves efficiency in mobile scenarios. Its main advantage is that users can get information and complete tasks without frequently reaching for their phones, so it suits users who want to reduce their reliance on mobile phones in daily life. The application is currently available as a free download and is positioned as a tool for improving productivity and convenience.
FireRedASR-AED-L is an open source industrial-grade automatic speech recognition model designed for efficient, high-performance speech recognition. The model uses an attention-based encoder-decoder architecture and supports Mandarin, Chinese dialects and English. It achieved new state-of-the-art results on public Mandarin speech recognition benchmarks and performed well in singing lyrics recognition. Its main advantages include high performance, low latency, and broad applicability to a variety of voice interaction scenarios. Because it is open source, developers can freely use and modify the code.
FireRedASR is an open source industrial-grade Mandarin automatic speech recognition model that uses Encoder-Decoder and LLM integrated architecture. It comes in two variants: FireRedASR-LLM and FireRedASR-AED, designed for high performance and energy efficiency requirements respectively. The model performed well on the Mandarin benchmark, as well as on dialect and English speech recognition. It is suitable for industrial-level applications that require efficient speech-to-text conversion, such as smart assistants, video subtitle generation, etc. The model is open source, making it easy for developers to integrate and optimize.
Whisper Turbo is a speech recognition tool optimized based on the Whisper Large-v3 model and designed for fast speech transcription. It leverages advanced AI technology to efficiently convert speech to text from different audio sources, supporting multiple languages and accents. This tool is provided to users for free and is designed to help people save time and energy and improve work efficiency. It is mainly aimed at users who need to quickly and accurately transcribe voice content, such as bloggers, content creators, enterprises, etc., providing them with convenient speech-to-text solutions.
RealtimeSTT is an open source speech recognition model that converts speech to text in real time. It uses voice activity detection to automatically detect the start and end of speech without manual operation. It also supports wake word activation, so users can start voice recognition by speaking a specific wake word. With low latency and high efficiency, it is suitable for applications that require real-time voice transcription, such as voice assistants and meeting records. It is written in Python and is easy to integrate and use. The project is open source on GitHub and has an active community that ships regular updates and improvements.
Home Assistant Voice Preview Edition is an open source, privacy-focused voice assistant hardware product designed to provide an open, local, and private voice control solution. It lets users control smart devices at home by voice while ensuring that their voice data does not leave the local network. The product responds to the growing demand for privacy protection, especially in the smart home field. The recommended retail price is $59, and the actual price may vary by retailer.
OmniAudio-2.6B is a 2.6B parameter multi-modal model capable of seamlessly processing text and audio input. The model combines Gemma-2B, Whisper turbo and a custom projection module. Unlike the traditional method of chaining separate ASR and LLM models, it unifies these two capabilities in a single efficient architecture with minimal latency and resource overhead. This enables secure and fast processing of audio and text directly on edge devices such as smartphones, laptops and robots.
Shortcut by Poised is a voice-based AI assistant designed to improve users' work efficiency through natural conversations. It lets users quickly get answers, organize thoughts, and draft messages, emails, and documents by voice without interrupting their workflow. The product uses AI to convert natural language into refined text, and provides a variety of language style options for different occasions. Shortcut by Poised was published on Product Hunt. The Mac version is currently available for download, and Windows and mobile app versions are planned.
ClearerVoice-Studio is an open source AI-driven speech processing toolkit designed for researchers, developers and end users. It provides speech enhancement, speech separation, target speaker extraction and more, and provides the latest pre-trained models and training and inference scripts, all accessible through this repository. The toolkit is favored for its pre-trained models, ease of use, comprehensive functionality, and community-driven features.
Najva is an AI-powered voice assistant designed specifically for Mac that combines local speech recognition with AI models to convert your speech into smart text. The app is especially suitable for users who think faster than they can type, such as writers, developers, and medical professionals. Najva is a lightweight, native Swift application with zero tracking, and it is completely free, providing a workflow focused on privacy and efficiency.
Transcribro is a private, on-device speech recognition keyboard and text service application for Android. It uses whisper.cpp to run the OpenAI Whisper family of models and combines it with Silero VAD for voice activity detection. The application provides a voice input keyboard that lets users enter text by voice. Other applications can also invoke it explicitly, or it can be set as the user-selected speech-to-text application, which some applications then use for speech-to-text. Transcribro aims to give users a more secure and private voice-to-text solution that avoids the privacy risks of cloud processing. The application is open source, and users are free to view, modify and distribute the code.
Universal-2 is the latest speech recognition model launched by AssemblyAI. It surpasses the previous generation, Universal-1, in accuracy and precision. It better captures the complexity of human language and gives users audio data that does not need to be double-checked. According to AssemblyAI, this translates into sharper insights, faster workflows, and a best-in-class product experience. Universal-2 significantly improves proper noun recognition, text formatting and alphanumeric recognition, reducing word error rates in practical applications.
Moonshine is a family of speech-to-text models optimized for resource-constrained devices, ideal for real-time, on-device applications such as live transcription and voice command recognition. On the test dataset used in the OpenASR leaderboard maintained by HuggingFace, Moonshine outperforms the OpenAI Whisper model of the same size in word error rate (WER). Additionally, Moonshine's computational requirements scale with the length of the input audio, meaning shorter input audio is processed faster, unlike the Whisper model, which processes everything as 30-second chunks. Moonshine processes 10-second audio clips 5 times faster than Whisper while maintaining the same or better WER.
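Word error rate (WER), the metric cited in the Moonshine entry, is conventionally the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The following is a minimal Python sketch of that definition. The function name `word_error_rate` is hypothetical, and the sketch skips the text normalization (casing, punctuation) that leaderboards typically apply, so it illustrates the metric and is not any leaderboard's scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; whitespace tokenization only,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` gives 1/6, because one of the six reference words was dropped. WER can exceed 1.0 when the hypothesis contains many extra words.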
GLM-4-Voice is an end-to-end speech model developed by the Tsinghua University team. It can directly understand and generate Chinese and English speech for real-time voice dialogue. It uses advanced speech recognition and synthesis technology to convert speech to text and back to speech, with low latency and strong conversational intelligence. The model is optimized for intelligence and expressive synthesis in voice mode, and is suitable for scenarios requiring real-time speech interaction.
Whispo is a voice dictation tool that uses artificial intelligence technology to convert the user's voice into text in real time. This tool uses OpenAI Whisper technology for speech recognition, supports speech transcription using a custom API, and allows for post-transcription processing through large language models. Whispo supports multiple operating systems, including macOS (Apple Silicon) and Windows x64, and all data is stored locally, ensuring user privacy. Its design background is to improve the work efficiency of users who require a lot of text input, whether it is programming, writing or daily record keeping. Whispo is currently free to trial, but the specific pricing strategy is not yet clear on the page.
Flow by Wispr is an application dedicated to improving the efficiency of voice input. It uses advanced speech recognition technology to enable users to enter text three times faster than typing on a traditional keyboard. Flow by Wispr is especially suitable for users who need to quickly record and edit text, such as writers, journalists, students and professionals. The product currently only supports Mac computers with Apple silicon chips, and will be expanded to more platforms in the future.
Silvia is a voice input system that adapts to the way users speak, allowing users to switch freely between different languages, even in the middle of a sentence. It supports English and Spanish, and will soon support French, Romanian, German, and Dutch. As an extension in the Apple App Store, Silvia can be used on all chat platforms, such as iMessage, WhatsApp, Signal, Telegram, Messenger, etc., allowing users to use voice input wherever they need to type.
Text-to-speech technology is a technology that converts text information into speech. It is widely used in assisted reading, voice assistants, audiobook production and other fields. It improves the convenience of information acquisition by simulating human speech, which is especially helpful for visually impaired people or those who cannot use their eyes to read.
TTSMaker is an online text-to-speech platform that converts text into audio using AI algorithms. It supports more than 50 languages and more than 300 voice styles, and is suitable for scenarios such as video dubbing, audiobooks, education and training, and product marketing. Users can synthesize speech with TTSMaker for free and own 100% of the copyright of the synthesized audio files, which can be used for any legal commercial purpose.
This product is an online text-to-speech tool that uses artificial intelligence to convert text into natural, realistic speech. It supports multiple languages and voice styles and is suitable for advertising, video narration, audiobook production and other scenarios, enhancing the accessibility and appeal of content. It is aimed at digital marketers, content creators, audiobook authors, and educators.
BeMyEars is a real-time subtitle generation tool that performs speech recognition on the local device, serving hearing-impaired users and anyone who needs subtitles. Its main advantages include multi-language support, multi-source input, and privacy protection.
boff.ai is a website based on artificial intelligence speech recognition and natural language processing technology. Its main advantage is that it can quickly and accurately recognize the user's voice input and understand the user's intent in order to provide relevant answers and suggestions. boff.ai is positioned as an intelligent voice assistant service that helps users process information and complete tasks more efficiently.
Talkatoo is dictation software that can transcribe content 5 times faster than the average typing speed, helping users save time. It offers three levels of control, allowing users to choose how automated the process should be. Talkatoo can verify records, automatically format records, and take desktop dictation, and is suitable for professionals in industries such as veterinary medicine. It can also automatically convert dictation into SOAP (Subjective, Objective, Assessment, Plan) templates to make medical record keeping more efficient. Pricing is based on specific needs.
01 Light is a voice control interface that allows you to use your voice to control your home computer to perform various operations. Its advantages are easy operation and accurate voice recognition. Pricing has not been announced yet, and it is positioned as a voice control auxiliary tool for home computers.
WhisperKit, launched by Argmax, is an inference toolkit based on the Whisper project that allows speech recognition and transcription in iOS and macOS applications. The goal of the project is to gather developer feedback and release a stable release candidate within a few weeks to accelerate the production of on-device inference.
OutSkill is an AI desktop voice assistant designed for everyday PC users. It can perform multiple tasks, integrate with various applications and games in a personalized manner, and identify user needs and act on them. It aims to change the way users interact with computers by removing the need to switch frequently between applications and tasks: with voice commands alone, the AI completes the work, improving productivity and reducing workload. A waiting list is currently open.
Speechforms is an application for filling out forms through voice input. It lets users set aside the keyboard and complete forms in a more intuitive way. Speechforms provides a free trial; refer to the official website for specific pricing.
Free Text to Speech Online Converter is a multi-language text-to-speech online platform. It supports more than 20 languages, has natural pronunciation, is free to use without registration, and has fast conversion speed.
Koe is an AI speech transcription tool that supports multiple audio and video file formats. It uses the OpenAI Whisper model for local transcription, provides API services, and supports subtitle generation during video playback, AI translation, voice dictation and other functions. The early bird price is $12, with a permanent license for two devices.
Audioread is a tool that uses artificial intelligence to convert text into speech. It features an ultra-realistic text-to-speech engine that reads any text aloud in a natural and professional narration style designed for long listening sessions, so well-trained that it is virtually indistinguishable from a real audiobook narrator. Users can use web apps, browser plug-ins, iOS shortcuts, or Android apps to convert text to audio. They can also forward emails, drag and drop PDFs, copy/paste text, or highlight text. Audioread also supports the creation and subscription of private podcasts, and users can subscribe to private podcasts in any podcast application, such as Apple Podcasts, Google Podcasts, Spotify, etc. Additionally, users can listen in their browser without installing any apps. Audioread also offers paid services, including a monthly subscription for $9.99 per month, with up to 100,000 words per conversion, up to 500,000 words per day, and support for 77 languages.
Azure AI Speech Studio is a speech service platform that provides speech-to-text, text-to-speech and other functions. It helps applications achieve voice listening, understanding and communication capabilities. Speech Studio provides a variety of speech functions, including speech to text, real-time speech to text, batch speech to text, custom speech recognition, speech translation, text to speech, etc. Users can choose the appropriate functions according to their needs and get started quickly through sample codes. Speech Studio also provides learning resources, including documentation, quick start guides, Microsoft Q&A, and Microsoft Learn.
SpeechPulse is a speech recognition and translation software. It uses OpenAI's Whisper speech-to-text model to achieve real-time speech recognition and supports multiple languages. Users can use the microphone to input text, or transcribe recorded video files for speech recognition and translation. SpeechPulse can be used in various scenarios, such as office document editing, web browsing, file transcription, video subtitle generation, etc. It has extremely high accuracy, low latency, and works completely offline. SpeechPulse offers free and paid versions, with the paid version supporting more features and better accuracy.
Pronounce is a free English speech checker that helps you improve your pronunciation. Improve your English pronunciation accuracy and fluency by recording your voice.
Language Assistant is an intelligent language processing application that provides translation between multiple languages, speech recognition, speech synthesis and other functions. Its advantages include high accuracy, fast response, and support for multiple languages. The product is available in free and paid versions, with the paid version offering more advanced features and an ad-free experience. It is positioned as a convenient and efficient language processing service.
Lugs.ai is a plug-in that generates accurate subtitles on your computer in real time. No Internet connection is required, and it supports all audio on the computer, including microphone input and system sound. It uses AI to understand dialogue content and transcribe it into subtitles based on context. Lugs.ai was developed by people with hearing loss and is continuously updated and optimized based on actual user experience.
Voice Dictation is a free online speech recognition software that helps you write emails, documents, and articles through voice input without typing.
Speech Intellect describes itself as the first speech-to-text/text-to-speech solution that works in real time and is built entirely on a new AI-focused mathematical theory, Sense Theory. It takes into account the meaning of each word the customer speaks. The solution is based on a self-developed Sense-to-Sense algorithm, which allows text to be regenerated into sound with intonation and specific tonality. It can be integrated into various business scenarios, such as voicing scripted text with a human-like voice in video games, communication with customers in call centers, virtual conversations on websites, and conversations in smart homes. According to the vendor, its use of Sense sets the algorithm apart from other solutions on the market.
Intelligent voice assistant is an intelligent assistant application based on artificial intelligence technology. It uses speech recognition and natural language processing technologies to realize functions such as voice interaction, information query, and task reminders. It can help users manage schedules efficiently, provide real-time weather information, play music, etc. The product is reasonably priced and positioned as an intelligent assistant that improves users’ work and life efficiency.
AI Live Subtitle Service is an artificial intelligence-based online subtitle service that provides real-time subtitles and interactive transcriptions for meetings and conference services. It can be integrated into your service without programming. It supports multiple languages and dialects and provides real-time subtitle data to help improve meeting accessibility and user experience.
Smart voice assistant is a plug-in that turns the user's voice into a practical tool by providing speech synthesis, speech recognition and other functions. It is highly customizable, supporting multiple languages and voice styles; simple to use, with configuration completed in a few steps; and applicable to many scenarios, such as personal assistants and voice broadcasts. A free trial is available, and the paid version offers more features and support. It is positioned as a fast, convenient and efficient voice assistant tool.
Online Text to Speech is a free tool that converts text into real speech. It features high-quality, natural voice effects, and supports multiple languages and voice options. Users only need to enter text, select language and voice, and then generate customized voice content. This tool is suitable for a variety of scenarios, such as video dubbing, educational assistance, voice navigation, etc. Whether you are a Mac or Windows user, you can use this tool easily.
Voiser is a text-to-speech tool with over 550 different voice options. It converts text into realistic synthetic speech that aims to sound as close to a human voice as possible. Voiser can also convert voice files into text, providing fast and accurate speech-to-text services. It positions itself as a leading text-to-speech and speech-to-text solution.
Hour AI is an intelligent voice assistant that helps users improve productivity through voice commands. It offers speech recognition, speech synthesis, intelligent dialogue and other functions that help users complete daily tasks, such as setting schedule reminders, checking the weather, and sending text messages. Hour AI has flexible pricing, with free and paid versions available for individual and enterprise users. It is positioned as a personal assistant that provides an efficient and convenient voice interaction experience.
TTSMaker is a free online text-to-speech tool that supports multiple languages and voice styles. It can convert text into natural and smooth speech, and provides downloading of audio files in MP3 and WAV formats. TTSMaker can be widely used in scenarios such as reading text and reading e-books aloud, and is suitable for personal and commercial use.