Productivity Category

AI speech to text

Found 49 AI tools

Related AI Tools

FunASR

FunASR is an offline speech file transcription service package that integrates voice endpoint detection, speech recognition, punctuation restoration, and other models. It converts long audio and video into punctuated text and can transcribe multiple requests concurrently. It supports inverse text normalization (ITN) and user-defined hotwords. The server bundles ffmpeg, accepts a wide range of audio and video formats, and comes with clients in several programming languages. It suits enterprises and developers who need efficient, accurate speech transcription.

Multilingual support, speech recognition, speech transcription, +2 more

AsrTools

AsrTools is an AI-based speech-to-text tool. It calls the ASR service interfaces of major vendors, so it delivers efficient speech recognition without a GPU or complex configuration. The tool supports batch processing and multi-threaded concurrency, and can quickly convert audio files into SRT or TXT subtitle files. Its user interface is built on PyQt5 and qfluentwidgets, giving it an attractive, easy-to-use look. Its main advantages are the stability of the major vendors' interfaces, the convenience of needing no complex configuration, and the flexibility of multiple output formats. AsrTools suits users who need to turn speech into text quickly, especially in video production, audio editing, and subtitle generation. It currently provides free access to the major vendors' ASR services, which can significantly reduce costs and improve efficiency for individuals and small teams.

Speech recognition, batch processing, audio to text, +4 more

NotesGPT

NotesGPT is an online service that uses AI to convert voice notes into organized summaries and clear action items. It combines speech recognition and natural language processing to help users record and manage notes more efficiently, and it is especially suited to users who need to capture information quickly and organize it into structured content. According to the product page, NotesGPT is powered by Together.ai and Convex. The product appears to be in a promotional stage, and the page does not clearly state pricing or positioning.

AI, natural language processing, productivity tools, +2 more

Echo

Echo is a voice and text note-taking application that combines artificial intelligence technology. It uses AI technology to help users organize and refine their thinking. Utilizing the GPT-4o large-scale language model for transcription, recall, and insight generation, Echo is able to accurately transcribe the user's voice input and provide meaningful answers based on the user's past thoughts, making the diary experience more interactive and engaging. This product focuses on privacy and security, encrypts notes, does not view user data, does not use data to train AI, and follows industry best practices for data protection. Echo is currently in a free testing phase, with plans to introduce advanced features in the future.

Privacy protection, speech transcription, AI notes, +2 more

Gardener Teleprompter

Gardener Teleprompter is a desktop teleprompter application designed for live streaming, speeches, teaching, and similar scenarios. It uses speech recognition to track the user's speaking speed in real time and adjusts the text scrolling speed so that the prompt stays in sync with the speaker. The product also provides AI features such as script optimization, script extraction across channels, watermark-free video downloading, banned-word detection, and script voice-over, which speed up copywriting. Gardener Teleprompter can show multiple windows at once to meet different display needs, and every window can be pinned on top to avoid being covered, giving a truly invisible teleprompter. According to the product background information, Gardener Teleprompter has been tested in thousands of live broadcasts and runs stably, and the team continues to iterate on it.

AI technology, live streaming, public speaking, +2 more

FineVoice

FineVoice is a multifunctional AI dubbing platform that uses advanced artificial intelligence technology to provide users with realistic and personalized voice services. This platform can not only convert text into natural and lifelike sounds, but also perform speech-to-text, voice-change and other operations, greatly enriching the possibilities of content creation. The main advantages of FineVoice include high efficiency, low cost, multi-language support and ease of use. It is especially suitable for individual and enterprise users who need to quickly generate large amounts of dubbing content.

Multilingual support, text to speech, AI dubbing, +2 more

Rev AI

Rev AI provides high-accuracy speech transcription, supports more than 58 languages, and converts speech to text for video and voice applications. It sets the accuracy standard for these applications by training on the world's most diverse collection of voices. Rev AI also offers live-stream transcription, human transcription, language identification, sentiment analysis, topic extraction, summarization, and translation. Its technical strengths include low word error rates, minimal bias across gender and racial accents, support for more languages, and highly readable transcripts. It also complies with leading security standards, including SOC 2, HIPAA, GDPR, and PCI.

Multilingual support, speech recognition, real-time transcription, +2 more

Youtube-Whisper

Youtube-Whisper is a Gradio-based application that extracts the audio from YouTube videos and transcribes it into text using OpenAI's Whisper model. It is useful for users who need to convert video content into text for analysis, archiving, or translation, and it makes video content more accessible and usable.

Artificial intelligence, data extraction, video analysis, +1 more

Whisper large-v3-turbo

Whisper large-v3-turbo is an advanced automatic speech recognition (ASR) and speech translation model from OpenAI. It is trained on over 5 million hours of labeled data and generalizes to many datasets and domains in a zero-shot setting. This model is a fine-tuned version of Whisper large-v3 with the number of decoder layers reduced from 32 to 4, which makes it much faster at the cost of a slight drop in quality.

Multilingual support, speech translation, zero-shot learning, +1 more

OmniSenseVoice

OmniSenseVoice is a speech recognition model optimized from SenseVoice, designed for fast inference and precise timestamps, offering a smarter and faster way to transcribe audio.

Open source, multilingual support, speech recognition, +2 more

CrisperWhisper

CrisperWhisper is an advanced variant of OpenAI's Whisper model designed for fast, accurate, verbatim speech recognition with accurate word-level timestamps. Unlike the original Whisper model, CrisperWhisper aims to transcribe every spoken word exactly as said, including fillers, pauses, stutters, and false starts. The model ranked first on verbatim datasets (e.g. TED, AMI) and was accepted at INTERSPEECH 2024.

Timestamps, automatic speech recognition, verbatim transcription, +1 more

babelfish.ai

babelfish.ai is a browser-based real-time speech-to-text and translation application. It uses Hugging Face Transformers.js and Supabase Realtime to provide local real-time speech recognition and multilingual translation. The application converts speech to text in real time and can translate the text into 200 languages, making cross-language communication more efficient and convenient.

Multilingual translation, real-time speech transcription, local application, +1 more

Hanwang Voice King

Hanwang Voice King App is an intelligent voice flagship application independently developed by Hanwang Technology based on its self-developed multi-modal world model. It integrates AI voice recording, intelligent translation and simultaneous interpretation, and supports functions such as AI accurate transcription, recording synchronization, script organization, intelligent summary and uninterrupted real-time translation. Relying on full-stack AI technology, Hanwang Voice King is committed to helping users overcome language barriers and improve efficiency and convenience in office, study, conference, travel and other scenarios.

Speech recognition, intelligent translation, simultaneous interpretation, +2 more

Real-time-translation-typing

Real-time-translation-typing is software that combines real-time typing translation, real-time voice typing and translation, and LOL voice typing. It is implemented with AutoHotkey and supports multiple translation APIs, such as Sogou, Baidu, and Youdao, giving users an efficient and convenient translation experience. The software suits business people, students, and gamers who need to translate text and speech quickly.

Real-time translation, multi-platform support, voice input

CLASI

CLASI is a high-quality, human-like simultaneous interpretation system developed by ByteDance's research team. It balances translation quality and latency with a novel data-driven read-write strategy, uses a multi-modal retrieval module to improve the translation of domain-specific terms, and leverages large language models (LLMs) to generate error-tolerant translations that take into account the input audio, historical context, and retrieved information. In real-world scenarios, CLASI achieved a valid information proportion (VIP) of 81.3% and 78.0% in the Chinese-to-English and English-to-Chinese directions respectively, far exceeding other systems.

Artificial intelligence, multilingual, large language models, +1 more

aTrain

aTrain is an offline speech transcription tool developed by researchers at the Business Analytics and Data Science Center (BANDAS Center) at the University of Graz and tested by researchers at the Know-Center Graz. It uses recent machine learning models to transcribe voice recordings automatically without uploading any data. aTrain was introduced in a paper published in the Journal of Behavioral and Experimental Finance; please cite that paper if you use aTrain for research. It supports Windows 10 and 11, and users can install it from the Microsoft Store or download it from the BANDAS Center website. For Linux, an installation guide is provided on the Wiki. Its main advantages are privacy (no data upload is needed), high-quality transcription, and fast processing on the local computer.

Machine learning, privacy protection, multi-platform support, +2 more

Video text extraction tool

AIbase video text extraction tool is a tool that uses artificial intelligence and machine learning technology to provide users with fast and accurate video text transcription services. It optimizes text layout, making the transcript easy to understand and faithful to the original video. As a basic service, this tool is completely free and requires no installation, download or paid subscription, greatly simplifying the video content processing work of creatives.

Free tools, video transcription, video to text, +1 more

Audio text extraction tool

AIbase Audio Text Extraction Tool uses artificial intelligence technology to quickly generate high-quality audio text descriptions through machine learning models, optimizes text layout, and improves readability. It is completely free to use and requires no installation, downloading, or payment, providing convenient basic services for creatives.

Artificial intelligence, machine learning, free tools, +1 more

Voice Pen

Voice Pen is an app that uses artificial intelligence technology to convert speech to text. It supports more than 50 languages and uses OpenAI’s Whisper technology to provide flawless transcription and punctuation. Users can use Voice Pen to record speech and generate notes, summaries, emails, messages, blog posts, and more. In addition, it also has AI rewriting function to help users clearly organize text, summarize, make lists, create blogs/posts/tweets, Instagram captions and emails. Voice Pen focuses on user privacy and does not collect any recording or text data.

Multilingual support, privacy protection, speech to text, +1 more

RTranslator

RTranslator is the world's first open source real-time translation application, designed specifically for Android and supporting real-time conversation translation in multiple languages. It uses Meta's NLLB and OpenAI's Whisper model to achieve high-quality translation and speech recognition, protect user privacy, and support offline use.

AI, privacy protection, translation, +2 more

StreamSpeech

StreamSpeech is a real-time speech-to-speech translation model based on multi-task learning. It learns translation and the simultaneous policy jointly in a unified framework, so it can decide when to start translating within streaming speech input and deliver a high-quality real-time communication experience. The model achieves leading performance on the CVSS benchmark and can also output low-latency intermediate results such as ASR or translation text.

Speech recognition, speech synthesis, real-time translation, +1 more

Seed-TTS

Seed-TTS is a family of large-scale autoregressive text-to-speech (TTS) models from ByteDance that can generate speech virtually indistinguishable from human speech. It excels in speech in-context learning, speaker similarity, and naturalness, and can be fine-tuned to further improve subjective ratings. Seed-TTS also offers superior control over speech attributes such as emotion, and can generate highly expressive and diverse speech. The authors further propose a self-distillation method for speech factorization and a reinforcement learning method to improve model robustness, speaker similarity, and controllability. They also present Seed-TTSDiT, a non-autoregressive (NAR) variant of Seed-TTS that uses a fully diffusion-based architecture, does not rely on pre-estimated phoneme durations, and generates speech end to end.

AI, natural language processing, speech synthesis, +1 more

Subtitle

Subtitle is an open source subtitle generation tool that uses advanced machine learning technology to provide users with accurate and natural-sounding subtitles. It supports multiple languages, is easy to integrate into existing workflows, and allows users to self-host on their own servers for increased control and privacy.

Open source, machine learning, multilingual support, +2 more

Transkriptor Transcribe Audio to Text

Transkriptor is a browser plug-in that converts audio to text. It uses advanced artificial intelligence technology to automatically record and transcribe different types of voice content such as meetings, interviews, and lectures. Transkriptor has a simple and intuitive interface, supports multiple file formats, provides secure transcription services, and has functions such as generating subtitles, supporting multi-language transcription, and remote collaborative editing.

Artificial intelligence, speech recognition, meeting notes, +2 more

Universal-1

Universal-1 is AssemblyAI's speech recognition model. It delivers industry-leading performance across multiple languages and is accurate, powerful, and robust, helping customers and developers around the world build speech AI applications. Universal-1 delivers improvements of 10% or more in speech-to-text accuracy in English, Spanish, and German, lower hallucination rates on speech data and ambient noise, output that customers prefer, transcoding capabilities, and more.

Multilingual, research, efficient inference, +2 more

Fathom AI Meeting Assistant for Google Meet

Fathom is an AI assistant that can record, transcribe and summarize Zoom, Google Meet or Microsoft Teams meetings. It automatically transcribes meetings and generates summaries, providing instant access and searchable full records. At the same time, Fathom can also integrate with CRM systems such as Salesforce and Hubspot to automatically update meeting information. Fathom is completely free to use and can help users save time and energy.

Summaries, CRM, transcription, +2 more

Tencent Cloud Speech Recognition ASR

Tencent Cloud Speech Recognition (ASR) provides developers with speech-to-text services that offer high recognition accuracy, easy integration, and stable performance. The service is available in three forms to meet the needs of different developers: real-time speech recognition, single-sentence recognition, and recording file recognition. It offers advanced technology, good value for money, and multi-language support, and suits customer service, conferences, courts, and other scenarios.

Speech recognition, speech to text, ASR

Summify - Summarize speech

Summify - Summarize speech is a mobile app that lets you easily record and summarize any speech, from a university lecture or school class to a business meeting. It uses OpenAI's Whisper model and ChatGPT to transcribe and summarize speech as accurately as possible, capturing every important detail. Summify helps you stay productive and focused, review lectures at home, and protect your privacy.

Artificial intelligence, privacy protection, learning, +2 more

33 subtitles

33 Subtitles is desktop software that accurately transcribes audio and video into text or SRT subtitles. It supports recognition and translation in more than 50 languages, with translation provided by DeepL and ChatGPT. It can search and edit subtitles, supports batch processing, and can cut spoken-word videos and podcasts with one click.

Subtitle translation, subtitle recognition, audio/video to subtitles, +1 more

Whisper Memo Dictation

Whisper Memo Dictation uses AI to transcribe voice memos into text. The app handles large audio recordings and generates accurate transcriptions. It supports offline transcription, with all data processed on the device. Free features include recording and transcribing audio files, transcription without an Internet connection, on-device data processing, instant access to transcription results, automatic language detection, support for 5 transcription results, a simple interface, background recording, and sharing transcription results to email and other applications. The Pro version offers unlimited transcription results.

Speech to text, voice memos, recording transcription

VoiceRec

VoiceRec is an AI voice application that combines voice recording, transcription, and sharing. It offers accurate speech-to-text in multiple languages and can export to multiple formats.

Meeting notes, speech to text, audio editing

Recty AI

Smart Translation is a powerful translation tool that can quickly and accurately translate text and speech. It has real-time translation, offline translation, speech-to-text and other functions. At the same time, it supports translation between multiple languages and provides users with a convenient international communication tool. Pricing is flexible, with free and paid plans available. Targeted at individual users, students, business people, etc.

Multilingual, translation tools, speech translation

Transcribe

Transcribe ~ Speech to Text is a speech-to-text iOS app. It uses OpenAI's Whisper technology and Apple's Neural Engine to recognize voice files with high accuracy, and can transcribe audio and video files directly into readable text. It supports both offline and cloud recognition modes. It covers all kinds of speech-to-text needs and is simple to use.

Speech recognition, speech to text, recording to text

Whisper Notes

Whisper Notes is an accurate speech-to-text tool that uses OpenAI’s Whisper model. No internet connection is required, user data is not uploaded, and over 80 languages are supported. Can be used for taking notes, sending messages quickly, etc.

Speech to text, note-taking, messaging

Fathom AI Notetaker for Google Meet

Fathom can record, transcribe, and highlight key moments in Google Meet, letting you focus on the conversation instead of taking notes. Free to use. Supports full-text transcription, automatically generates meeting summaries, integrates with Salesforce and Hubspot, easily shares key excerpts, searches across meetings and transcriptions, and more.

AI assistant, productivity tools, meeting notes, +2 more

TextScan AI

TextScan AI is a free mobile app that extracts text from images and lets you chat with AI, so you can skip manual typing and enjoy a faster, more accurate chat experience. It also provides smart messaging features that make chatting with AI more convenient.

AI chat, smart messaging, text scanning

Hanami Live Translator

Hanami Live Translator is a real-time translator that can capture any audio from Windows speakers and microphones. It processes audio with lightweight multiprocessing and chunking, and each chunk takes about 3-5 seconds to process. The application uses low-level access to create a hardware loopback, so content can be captured even when the speaker is muted. It uses the soundcard library to capture audio signals, the SpeechRecognition library to convert binary audio to text, and the Selenium library to simulate network calls to the DeepL server for free translation. The application requires an Internet connection and logs all operations to the Traces.log file.

Speech recognition, audio processing, real-time translation

Freed AI Medical Scribe

Freed's AI medical scribe helps doctors spend less time on documentation and work more efficiently. It uses AI to recognize the doctor's dictation automatically and convert it into a written record, greatly reducing the doctor's workload. Freed also offers high recognition accuracy and can understand and record the doctor's voice input precisely. Pricing is flexible and can be customized to the needs of healthcare organizations. Freed is positioned as a professional tool for improving doctors' efficiency.

AI, tools, documentation, +2 more

Caiyun Xiaoyi

"Caiyun Xiaoyi" is an online translation tool that provides simultaneous interpretation, bilingual comparison, document translation and other functions. It can realize mutual translation between four languages: Chinese, Japanese, Korean and English, and supports functions such as document translation and video subtitle translation. Caiyun Xiaoyi uses artificial intelligence and deep learning technology to provide users with high-quality translation services. Users can directly enter the text that needs to be translated on the web page, or upload documents, videos and other files for translation.

Document translation, bilingual comparison, simultaneous interpretation, +2 more

VNSplit

VNSplit is an AI voice note summary tool that gives you detailed voice note summaries in seconds. Its AI summaries spare you the hassle of listening to voice notes on iMessage and WhatsApp. Subscribe to any plan and provide your iMessage or WhatsApp number through Stripe, and you will receive a message from the AI bot. From then on, just forward messages to that number.

AI, summaries, voice notes

Speechless

Speechless is the ultimate app powered by OpenAI’s Whisper API, providing seamless audio transcription and translation capabilities. With Speechless, you can easily import audio and get accurate transcriptions instantly. Break language barriers with real-time translation and easily share your transcriptions for unparalleled connection and communication. Speechless supports apps such as WhatsApp and Voice Memos, allowing you to easily transcribe or translate audio.

Translation, audio transcription, language communication

WisprNote

WisprNote is an intelligent speech-to-text tool that supports transcribing voice memos, audio and video files into plain text. It offers extremely high accuracy and transcription speed while ensuring privacy and security. Suitable for meeting minutes, interview transcriptions, study notes, etc.

Speech recognition, speech to text, text transcription

Live Transcribe: Voice to text

Live Transcribe is an app that converts speech to text in real time, making voice recording easy with your iPhone.

Productivity assistant, AI office assistant, speech to text, +1 more

Call Recorder & Transcriber

This app records phone calls on Apple and Android phones. It uses IVR technology to record calls at the best quality, and can also use machine learning and AI to transcribe the recording into a readable text document with speaker separation, timestamps, and more. Main features: high-quality call recording; call transcription to text files; sharing recordings and text files via email; purchasing additional time; no ads and no subscription required.

Productivity assistant, call logs, call recording, +1 more

Free AI Voice: Best Text to Speech Tool

Free AI Voice is a Chrome browser extension that uses text-to-speech (TTS) technology to read web articles aloud and supports more than 40 languages. It works well with a variety of websites, including news sites, blogs, fan fiction, publications, educational materials, school and classroom sites, and online university course materials. Free AI Voice lets you choose from a variety of TTS voices, including those provided by the browser. Some cloud voices may require additional in-app purchases to activate. Free AI Voice suits people who prefer listening to content rather than reading, people with dyslexia or other learning disabilities, and children who are learning to read.

Productivity assistant, learning, text to speech, +2 more

NaturalReader - AI Text to Speech

NaturalReader - AI Text to Speech is a Chrome plug-in that converts online text into natural and smooth audio. Just click play and have your emails, web pages, PDF files, Google Docs, and Kindle books read to you! By using our voice reader, users can save time, listen to text faster than they can read, and be more productive during times when they can't read, such as commuting, walking the dog, or cooking! The free version is feature-rich, and there are two paid premium plans available to suit every budget.

AI, online reading, read aloud

Speech to Text

Speech to Text is a Chrome extension that generates notes by speaking or copying and pasting. You can choose a background image, choose a font, and print. This plugin can be used in a variety of scenarios such as Thanksgiving, holidays, other occasions or just for the fun of speaking or writing.

Plugin, notes, speech to text

SpeechFlow - Advanced Speech-to-Text API

SpeechFlow is a speech-to-text API that can transcribe 13 languages with very high accuracy. It converts sound, speech, and audio to text. SpeechFlow supports cloud and on-premises deployment, providing a solution that is reliable and easy to deploy and scale. It is also fast and can process audio files of up to 1 hour in just a few minutes.

Speech to text, automatic speech recognition, sound to text, +1 more

Free Subtitles AI

FreeSubtitles.AI is a free online tool that automatically transcribes audio and video to text. It can help users quickly convert various types of audio and video files such as conference recordings, interviews, speeches, etc. into editable and searchable text. The tool offers free auto-translation functionality that can automatically translate transcribed text into multiple languages. Users can upload audio or video files directly on the web page, or drag and drop files onto the page for transcription. FreeSubtitles.AI also offers a paid version, which saves the user's transcription history and offers more advanced features.

Speech recognition, audio to text, automatic translation, +2 more

Related Subcategories

Explore other subcategories under Productivity

Development and Tools: 1361 tools

Productivity tools: 904 tools

Personal assistant: 767 tools

AI model: 619 tools

Writing assistant: 607 tools

Knowledge management: 431 tools

Chatbot: 406 tools

AI design tools: 398 tools
