GPT OSS is an open source language model released by OpenAI under the Apache 2.0 license, with strong reasoning capabilities. It emphasizes efficiency, security, and API compatibility, and is presented as a forerunner of future open source language models.
Dyad is a powerful application building tool that uses open source technology so that users can freely customize and build AI applications. Its main advantages include high flexibility, powerful functions, and support for local development and customization.
SandboxAQ applies technologies such as AI simulation, encryption management, and AI sensing to help organizations worldwide solve major challenges affecting society. It positions itself as an advanced computing company.
Dia is a text-to-speech (TTS) model developed by Nari Labs with 1.6 billion parameters, capable of generating highly realistic dialogue directly from text. The model supports emotion and intonation control and can generate non-verbal sounds such as laughter and coughs. Its pre-trained weights are hosted on Hugging Face and currently support English generation only. The model is intended for research and educational use and aims to advance dialogue generation technology.
GenPRM is a generative process reward model (PRM) that makes better use of test-time compute by generating explicit reasoning before it scores a step. This approach can provide more accurate reward evaluation on complex tasks and applies to a range of machine learning and artificial intelligence settings. Its main advantage is the ability to improve model performance under limited resources and reduce computational cost in practical applications.
EasyControl Ghibli is a newly released model based on the Hugging Face platform designed to simplify controlling and managing various artificial intelligence tasks. The model combines advanced technology with a user-friendly interface, allowing users to interact with the AI in a more intuitive way. Its main advantages are its ease of use and powerful functions, making it suitable for users from different backgrounds, whether beginners or professionals.
Hunyuan T1 is a very large-scale reasoning model launched by Tencent. It is built on reinforcement learning and significantly improves its reasoning capabilities through extensive post-training. It performs particularly well in long-text processing and context capture while keeping compute consumption low, which makes its reasoning efficient. It suits a wide range of reasoning tasks, especially mathematics and logical reasoning, is continuously optimized based on real-world feedback, and is applicable to scientific research, education and other fields.
MC-Bench is an online platform that evaluates and compares AI models by the builds they generate in the Minecraft game environment. Users vote on the builds and so take part in the evaluation, which promotes the development of AI technology. The platform's main advantage is that it is fun and interactive, giving users an easy way to learn about the capabilities of AI.
SpatialLM is a large-scale language model designed for processing 3D point cloud data, capable of producing structured 3D scene understanding output, including semantic categories of architectural elements and objects. It is capable of processing point cloud data from a variety of sources including monocular video sequences, RGBD images, and LiDAR sensors without the need for specialized equipment. SpatialLM has important application value in autonomous navigation and complex 3D scene analysis tasks, significantly improving spatial reasoning capabilities.
Mistral-Small-3.1-24B-Base-2503 is an advanced open source model with 24 billion parameters. It supports multiple languages and long-context processing and is suitable for text and vision tasks. It is the base model of Mistral Small 3.1, has strong multimodal capabilities, and is suited to enterprise needs.
Agent Network Protocol (ANP) aims to define how intelligent agents connect and communicate with each other. It ensures data security and privacy protection through decentralized identity authentication and end-to-end encrypted communication. Its dynamic protocol negotiation function can automatically organize agent networks to achieve efficient collaboration. The goal of ANP is to break down data silos and enable AI to access complete contextual information, thus promoting the era of intelligent agents. This technology has the advantages of openness, security and efficiency, and is suitable for a variety of scenarios that require intelligent agent collaboration.
This product showcases Meta's latest AI research results, covering many fields such as vision and language. The advantage is that it explores the future possibilities of AI, is free for users to experience, and is positioned to showcase cutting-edge AI technology.
Project Aria is a project launched by Meta that focuses on first-person perspective research and aims to promote the development of augmented reality (AR) and artificial intelligence (AI) through innovative technologies. This project collects information from the user's perspective through devices such as Aria Gen 2 glasses to support machine perception and AR research. Its key strengths include innovative hardware design, rich open source datasets and challenges, and close collaboration with global research partners. The project comes amid Meta’s long-term investment in future AR technology and aims to drive industry progress through open research.
Scira AI is a powerful AI platform that provides users with a wide range of application support by integrating multiple API interfaces. It supports a variety of data processing and analysis functions and can meet the needs of different users in different scenarios. The main advantages of this platform are its high flexibility, rich functionality, and ability to be quickly deployed and used. It is suitable for users and businesses that require support for multiple AI capabilities, and pricing and specific positioning may vary based on user needs.
Elimination Game is an innovative benchmarking framework for evaluating the performance of large language models (LLMs) in complex social environments. It simulates a multi-player competition scenario similar to 'Werewolf' and tests the model's social reasoning, strategy selection and deception capabilities through public discussions, private communication and voting elimination mechanisms. This framework not only provides an important tool for studying the intelligence of AI in social games, but also provides developers with the opportunity to gain insights into the potential of models in real-life social scenarios. Its main advantages include multi-round interaction design, dynamic alliance and defection mechanisms, and detailed evaluation indicators that can comprehensively measure the social ability of AI.
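The voting elimination step described above can be illustrated with a small sketch. This is not the benchmark's own code: the function name `eliminate`, the ballot format, and the alphabetical tie-break are all assumptions made purely for illustration.

```python
from collections import Counter


def eliminate(ballots):
    """Return the player who receives the most votes.

    `ballots` maps each voter's name to the name of the player they want out.
    Ties are broken alphabetically, an assumption made only for this sketch;
    the real framework may resolve ties differently.
    """
    tally = Counter(ballots.values())
    top = max(tally.values())
    return min(name for name, votes in tally.items() if votes == top)


# Hypothetical round with five players.
ballots = {"A": "C", "B": "C", "C": "A", "D": "A", "E": "C"}
print(eliminate(ballots))  # prints C
```

In a full game this function would run once per round, after the public discussion and private messaging phases, with the eliminated player removed from later ballots.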
Evo 2 is an AI foundation model launched by NVIDIA, designed to analyze the genetic code of biomolecules through deep learning. Developed on the NVIDIA DGX Cloud platform, the model can process large-scale genomic data and provides a powerful tool for biomedical research. The main advantage of Evo 2 is its ability to process gene sequences of up to 1 million tokens, allowing for a more complete understanding of the complexity of the genome. The model has broad application prospects in the biomedical field, including disease diagnosis, drug development and gene editing. Evo 2 was developed with support from the Arc Institute and Stanford University with the goal of driving innovation and breakthroughs in biomedical research.
WebGames is a platform built by convergence.ai designed to test the abilities of general web browsing AI agents through a series of challenges. These challenges are simple for humans but difficult for AI agents to complete. Successful completion of each mission provides a unique password. The platform not only provides AI developers with the opportunity to test and optimize AI agents, but also provides researchers with scenarios where AI interacts with humans. WebGames is designed to advance AI technology, particularly in natural language processing and visual recognition. Currently, the platform is free and primarily targeted at AI researchers and developers.
GeForce RTX 5070 Ti is a high-performance graphics card launched by NVIDIA, built on the latest Blackwell architecture with support for DLSS 4 Multi Frame Generation. The card delivers top-tier graphics performance for gamers, supports full ray tracing, and can significantly speed up AI generation and video export in content creation. Its performance makes it a strong choice for users seeking high frame rates and high-quality graphics.
AlphaMaze is a project focused on improving the visual reasoning capabilities of large language models (LLM). It trains the model through maze tasks described in text form to enable it to understand and plan spatial structures. This method not only avoids complex image processing, but also directly evaluates the model's spatial understanding ability through text descriptions. Its main advantage is that it reveals how the model thinks about spatial problems, not just whether it can solve them. This model is based on an open source framework and aims to promote the research and development of language models in the field of visual reasoning.
Muse is a generative AI model developed by Microsoft Research in partnership with Xbox Game Studios to support creative ideation for games. It is trained on large-scale human gameplay data and can generate coherent game visuals and action sequences. The technology demonstrates the potential of AI in game design and offers new creative methods and experiences for future game development.
Memobase is a user profile-based memory system designed for generative AI applications. It avoids data bloat by extracting and storing only meaningful user insights, while maintaining structured user profiles to deliver highly relevant responses. Key benefits of Memobase include simpler memory management, a personalized user experience, support for massive scale, and flexible deployment in the cloud or on-premises. It suits AI applications that require personalized interaction, such as AI companionship, education, and games.
Majorana 1 is a revolutionary quantum chip launched by Microsoft. It adopts a topological core architecture and uses topological superconductor materials to achieve more stable and scalable qubits. This technology aims to promote quantum computing from laboratories to commercial applications and solve complex industrial-level problems. Its main advantages include high stability, low error rate and scalability, laying the foundation for future million-qubit quantum computers.
OpenAI Model Spec is an AI model behavior specification released by OpenAI, which aims to guide AI models on how to interact safely and beneficially with users. The specification details the model's code of conduct in different scenarios, including how to handle sensitive content, how to avoid generating harmful information, how to provide assistance within legal and ethical frameworks, etc. It emphasizes the transparency, controllability and security of AI models, ensuring that the models can provide users with reliable and beneficial tools while avoiding potential risks. OpenAI demonstrates its responsible attitude towards AI technology through this specification, provides developers and users with clear guidance, and promotes the healthy development of AI technology.
MedRAX is an innovative AI framework designed for intelligent analysis of chest X-rays (CXR). It is capable of dynamically processing complex medical queries by integrating state-of-the-art CXR analysis tools and multi-modal large-scale language models. MedRAX can run without additional training, supports real-time CXR interpretation, and is suitable for a variety of clinical scenarios. Its main advantages include high flexibility, powerful reasoning capabilities, and transparent workflows. This product is aimed at medical professionals and aims to improve diagnostic efficiency and accuracy and promote the practical use of medical AI.
WeatherNext is the latest AI weather forecast technology developed by Google DeepMind and Google Research. It provides fast and accurate weather predictions through advanced AI models to help combat extreme weather events, improve the reliability of renewable energy, and enhance global food security. The technology is provided free of charge to scientists and forecasters to accelerate the research and application of global weather forecasting.
Open Thoughts is a project led by Bespoke Labs and the DataComp community to curate high-quality open source reasoning datasets for training advanced small models. The project brings together researchers and engineers from Stanford University, the University of California, Berkeley, the University of Washington and other universities and research institutions, and is committed to advancing reasoning models through high-quality datasets. It responds to growing demand for reasoning models in fields such as mathematics and code, where high-quality datasets are the key to improving model performance. The project is currently free and open to researchers, developers, and professionals interested in reasoning models. The open source nature of its datasets and tools makes it an important resource for artificial intelligence education and research.
Humanity's Last Exam is a multi-modal benchmark developed by a global collaboration of experts to measure the performance of large language models on academic questions. It contains 3,000 questions contributed by nearly 1,000 experts from more than 500 institutions in 50 countries, covering more than 100 disciplines. The test is intended to be the final closed-ended academic benchmark, probing the limits of current models. Its main advantage is its high difficulty, which lets it effectively evaluate model performance on complex academic problems.
Llasa-1B is a text-to-speech model developed by the Hong Kong University of Science and Technology Audio Laboratory. It is based on the LLaMA architecture and converts text into natural, fluent speech by incorporating speech tokens from the XCodec2 codebook. The model was trained on 250,000 hours of Chinese and English speech data and supports speech generation from plain text alone or synthesis conditioned on a given speech prompt. Its main advantage is that it can generate high-quality multilingual speech and suits a variety of speech synthesis scenarios, such as audiobooks and voice assistants. The model is licensed under CC BY-NC-ND 4.0, so commercial use is prohibited.
Llasa-3B is a powerful text-to-speech (TTS) model developed based on the LLaMA architecture and focuses on Chinese and English speech synthesis. By combining the speech coding technology of XCodec2, this model can efficiently convert text into natural and smooth speech. Its main advantages include high-quality speech output, support for multi-language synthesis, and flexible voice prompt functions. This model is suitable for a variety of scenarios that require speech synthesis, such as audiobook production, voice assistant development, etc. Its open source nature also allows developers to freely explore and extend its functionality.
NEAR AI is committed to building a future where users own data and AI. Through open standards and protocols, it allows users to control their own data rather than being controlled by a few companies. The vision of NEAR AI is to promote the democratization of AI technology by allowing users to truly own and control their own AI through open models and protocols. It is currently in its early stages but already shows great potential and possibilities for future development.
Procyon AI Image Generation Benchmark is a benchmark tool developed by UL Solutions to provide professional users with a consistent, accurate, and easy-to-understand workload for measuring the inference performance of on-device AI accelerators. The benchmark was developed in collaboration with multiple key industry members to ensure fair and comparable results across all supported hardware. It includes three tests that measure performance from low-power NPUs to high-end discrete graphics cards. Users can configure and run it through the Procyon application or the command line, and it supports multiple inference engines such as NVIDIA® TensorRT™, Intel® OpenVINO™ and ONNX with DirectML. The product is intended primarily for engineering teams and is suitable for evaluating general-purpose AI performance on inference engine implementations and specialized hardware. A free trial is available; the full version is sold as an annual site license, with pricing available by quote.
MiniCPM-o 2.6 is the latest and most capable model in the MiniCPM-o series. It is built on SigLip-400M, Whisper-medium-300M, ChatTTS-200M and Qwen2.5-7B, for a total of 8B parameters. It performs well in visual understanding, voice interaction and multimodal live streaming, and supports real-time voice dialogue. The model has been well received in the open source community, surpassing several well-known models. Its advantages are fast inference, low latency, and low memory and power consumption, which let it support multimodal live streaming on end devices such as the iPad. MiniCPM-o 2.6 is also easy to use and supports several usage methods, including CPU inference with llama.cpp, quantized models in int4 and GGUF formats, and high-throughput inference with vLLM.
The Chinese Internet Corpus Resource Platform is a professional website hosted by the China Cyberspace Security Association. It aims to provide high-quality, safe and compliant Chinese corpus resources for the pre-training of large artificial intelligence models. The platform brings together the synergistic advantages from enterprises, universities and scientific research units, and relies on the "co-construction and sharing" mechanism to form multiple high-quality corpora including the Chinese Internet Basic Corpus 2.0, the People's Daily Online Mainstream Value Dataset, and the National Version Library's Ming and Qing literature corpora. These corpora have gone through strict source screening, format cleaning, language filtering, data deduplication, content filtering, privacy filtering and other processing steps to ensure the legality, authenticity, accuracy and objectivity of the data. The resources of the platform are of great significance in promoting national artificial intelligence technology innovation and industrial development. They can help large models better understand and generate Chinese content and improve their knowledge capabilities and value alignment.
NVIDIA® GeForce RTX™ 5090 is powered by the NVIDIA Blackwell architecture and features 32 GB of ultra-fast GDDR7 memory, delivering unprecedented AI performance to gamers and creators. It supports full ray tracing and the lowest latency gaming experience, capable of handling the most advanced models and the most challenging creative workloads.
Moondream AI is an open source visual language model with powerful multi-modal processing capabilities. It supports multiple quantization formats, such as fp16, int8, and int4, and can perform GPU and CPU optimized inference on a variety of target devices such as servers, PCs, and mobile devices. Its main advantages include being fast, efficient, easy to deploy, and using the Apache 2.0 license, allowing users to use and modify it freely. Moondream AI is positioned to provide developers with a flexible and efficient artificial intelligence solution that is suitable for various application scenarios that require visual and language processing capabilities.
METAGENE-1 is a metagenomic foundation model developed by researchers at the University of Southern California, Prime Intellect, and the Nucleic Acid Observatory. The model has 7 billion parameters and was trained on 1.5 trillion base pairs of DNA and RNA sequences derived from human wastewater samples. The primary function of METAGENE-1 is to aid public health applications such as epidemic surveillance, pathogen detection and early detection of emerging health threats. Its advantage is that it can capture the complete distribution of genomic information in the human microbiome and has strong generalization capabilities.
HuatuoGPT-o1-70B is a medical large language model (LLM) developed by Freedom Intelligence, designed specifically for complex medical reasoning. The model generates a complex thought process in which it reflects on and refines its reasoning before providing a final response. HuatuoGPT-o1-70B can handle complex medical problems and provide thoughtful answers, which is crucial to improving the quality and efficiency of medical decision-making. The model is based on the LLaMA-3.1-70B architecture, supports English, and can be deployed with tools such as vLLM or SGLang, or used directly for inference.
HuatuoGPT-o1-8B is a large language model (LLM) in the medical field designed for advanced medical reasoning. It generates a complex thought process that reflects and refines its reasoning before providing a final response. The model is built based on LLaMA-3.1-8B, supports English, and adopts the 'thinks-before-it-answers' method. The output format includes the reasoning process and final response. This model is of great significance in the medical field because of its ability to handle complex medical problems and provide thoughtful answers, which is crucial to improving the quality and efficiency of medical decision-making.
The Little Fox AI digital human avatar system is an AI-based lip-sync product for digital avatars. It supports unlimited account openings and OEM deployment and suits scenarios where avatars need lip-synced interaction. The product responds to growing demand for AI applications, especially in virtual anchors and online education. It is priced at 3,580 yuan and positioned in the mid-to-high-end market. Its main advantages include being completely open source, supporting both independent and customized secondary development, and offering free setup services.
Valley is a multimodal large language model (MLLM) developed by ByteDance, designed to handle a variety of tasks involving text, image and video data. The model achieved the best results on internal e-commerce and short-video benchmarks, far outperforming other open source models, and performed strongly on the OpenCompass multimodal leaderboard with an average score of 67.40, ranking in the top two among known open source MLLMs under 10B parameters.
2AGI-AI product tool is a platform that integrates a variety of AI technologies and tools, aiming to provide users with a comprehensive AI product navigation. The platform covers tools in multiple fields from AI programming, AI art generation to AI chatbots, helping users discover and utilize the latest AI technology. The background information of the platform shows that it not only provides rankings and classifications of AI tools, but also provides sections such as AI hotspot information and hall of fame, allowing users to keep up to date with the latest developments and pioneers in the AI field.
FlagEval is a model evaluation platform that focuses on the evaluation of large language models and multi-modal models. It provides a fair and transparent environment that allows different models to be compared under the same standards, helps researchers and developers understand model performance, and promotes the development of artificial intelligence technology. The platform covers a variety of model types such as dialogue models and visual language models, supports the evaluation of open source and closed source models, and provides special evaluations such as K12 subject tests and financial quantitative trading evaluations.
OCTAVE (Omni-Capable Text and Voice Engine) is a next-generation speech language model that combines cutting-edge language models and speech system capabilities. It is able to generate not just a voice, but a personality (language, accent, expression, underlying personality, etc.) from a short descriptive prompt or recording, and can generate multiple interactive AI personalities and voices in real-time response. OCTAVE maintains the capabilities of cutting-edge large language models (LLMs) of similar size, making it ideal for driving AI systems that communicate richly with humans while following detailed instructions, using tools or control interfaces.
ExploreToM is a framework developed by Facebook Research that aims to generate diverse and challenging theory-of-mind data at scale for enhanced training and evaluation of large language models (LLMs). The framework utilizes the A* search algorithm to generate complex story structures and novel, diverse, and plausible scenarios on a custom domain-specific language to test the limits of LLMs.
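For readers unfamiliar with A* search, the following is a minimal, generic sketch of the algorithm on a toy integer graph. It does not reproduce ExploreToM's story-generation code or its domain-specific language; the names `a_star`, `neighbors`, and `heuristic`, and the toy graph itself, are hypothetical.

```python
import heapq


def a_star(start, goal, neighbors, heuristic):
    """Generic A* search; returns the lowest-cost path as a list, or None.

    `neighbors(node)` yields (next_node, step_cost) pairs and `heuristic(node)`
    must never overestimate the remaining cost to `goal`.
    """
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None


# Toy graph (an assumption for illustration): from n, move +1 at cost 1
# or +2 at cost 3.
def neighbors(n):
    return [(n + 1, 1), (n + 2, 3)]


path = a_star(0, 4, neighbors, lambda n: abs(4 - n))
print(path)  # prints [0, 1, 2, 3, 4]
```

In a story-generation setting, nodes would presumably be partial stories and the heuristic an estimate of how far a story is from satisfying the desired properties, but that mapping is an assumption rather than something the description above specifies.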
Astris AI is a subsidiary of Lockheed Martin established to drive the adoption of high-assurance artificial intelligence solutions across the U.S. defense industrial base and commercial industry sectors. Astris AI helps customers develop and deploy secure, resilient and scalable AI solutions by providing Lockheed Martin's leading technology and professional teams in artificial intelligence and machine learning. The establishment of Astris AI demonstrates Lockheed Martin's commitment to advancing 21st century security, strengthening the defense industrial base and national security, while also demonstrating its leadership in integrating commercial technologies to help customers address the growing threat environment.
FACTS Grounding is a comprehensive benchmark launched by Google DeepMind that aims to evaluate whether the responses generated by large language models (LLMs) are not only factually accurate with respect to the given input, but also detailed enough to provide users with satisfactory answers. This benchmark is critical to increasing the trust and accuracy of LLMs in real-world applications, helping to drive factual and fundamental progress across the industry.
Android XR is an extended reality (XR) platform jointly launched by Google, Samsung and Qualcomm that lets users explore, connect and create in new ways. It combines artificial intelligence, augmented reality (AR) and virtual reality (VR) technologies to bring helpful experiences to headsets and glasses. The launch of Android XR marks the expansion of Android into the next generation of computing platforms and lets developers use familiar Android tools and frameworks to build experiences for a wide range of devices.
Boltz-1 is a fully open-source biomolecular structure prediction model developed by researchers at the Abdul Latif Jameel Clinic for Machine Learning in Health at the Massachusetts Institute of Technology (MIT), and is described as the first such model to achieve AlphaFold3-level accuracy. The model is named after the Boltzmann distribution, a probability measure that describes the distribution of molecular structures. Boltz-1 was released openly, including for commercial use, to encourage innovation beyond academia. It was developed by doctoral students Jeremy Wohlwend and Gabriele Corso and MIT Jameel Clinic researcher Saro Passaro, with guidance from MIT Electrical Engineering and Computer Science (EECS) professors Regina Barzilay and Tommi Jaakkola. The team faced challenges of scale and data processing but ultimately secured the necessary computing power, providing a basis for standardizing structural biology research practices and, it is hoped, accelerating the creation of life-changing drugs.
allenai/tulu-3-sft-olmo-2-mixture is a large-scale multilingual dataset containing diverse text samples for training and fine-tuning language models. It gives researchers and developers rich language resources for improving the performance of multilingual AI models. The dataset is a blend of data from multiple sources, is suitable for education and research, and is subject to a specific license agreement.
P-MMEval is a multilingual benchmark covering both basic and ability-specialized datasets. It extends existing benchmarks to ensure consistent language coverage across all datasets and provides parallel samples across multiple languages, supporting up to 10 languages and covering 8 language families. P-MMEval facilitates comprehensive assessment of multilingual proficiency and conducts comparative analysis of cross-language transferability.
MAmmoTH-VL is a large-scale multimodal reasoning project that significantly improves the performance of multimodal large language models (MLLMs) on multimodal tasks through instruction tuning. It uses open models to create a dataset of 12 million instruction-response pairs, covering diverse, reasoning-intensive tasks with detailed and faithful rationales. MAmmoTH-VL achieved state-of-the-art performance on benchmarks such as MathVerse, MMMU-Pro and MuirBench, demonstrating its value for education and research.
AI Tools Dir is a directory website that brings together a variety of valuable and interesting AI applications. We are committed to providing users with the latest and most comprehensive AI tool information to help users discover and utilize the powerful capabilities of AI technology. The website includes, but is not limited to, AI writing assistants, AI code generators, AI data analysis tools, AI image generators, AI music creation tools, AI video editing tools, etc.
AIBest.Tools is a platform that collects various AI tools, aiming to help users discover the latest and best AI tools and stay ahead of the industry. The platform covers AI tools in education, images, applications and other fields, providing users with a convenient channel to discover and explore AI tools.
Willow is the latest generation of quantum chip developed by Google's Quantum AI team, with major advances in quantum error correction and performance. The chip can significantly reduce errors as the number of qubits increases, meeting a key challenge that the field of quantum computing has pursued for nearly 30 years. In addition, Willow completed a standard benchmark computation in less than five minutes that would take today's fastest supercomputers 10^25 years, far longer than the age of the universe. This achievement marks an important step towards building commercially significant large-scale quantum computers, which have the potential to revolutionize fields such as medicine, energy and artificial intelligence.
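To put the quoted 10^25-year figure in perspective, the short calculation below compares it with the age of the universe. The age used here, roughly 13.8 billion years, is a commonly cited estimate supplied for this sketch rather than a figure from the listing.

```python
# Commonly cited estimate, supplied for this sketch (not from the listing).
AGE_OF_UNIVERSE_YEARS = 1.38e10
# Classical runtime figure quoted in the entry above.
CLAIMED_CLASSICAL_YEARS = 1e25

ratio = CLAIMED_CLASSICAL_YEARS / AGE_OF_UNIVERSE_YEARS
print(f"{ratio:.1e}")  # prints 7.2e+14
```

In other words, the quoted runtime is on the order of hundreds of trillions of times the age of the universe.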
GraphCast is a deep learning model developed by Google DeepMind for global medium-range weather forecasting. It uses machine learning to predict weather changes, improving both the accuracy and the speed of forecasts. GraphCast plays an important role in scientific research, helping to better understand and predict weather patterns, and is valuable to fields such as meteorology, agriculture, and aviation.
Anduril Industries is a defense technology company partnering with OpenAI to develop and responsibly deploy advanced artificial intelligence solutions for national security missions. By combining OpenAI's advanced models with Anduril's high-performance defense systems and Lattice software platform, the collaboration aims to improve defense systems that protect U.S. and allied military personnel from attacks by drones and other aerial devices. The collaboration underscores U.S. leadership in artificial intelligence.
GenCast is a new high-resolution (0.25°) AI ensemble model developed by Google DeepMind. It is more accurate than the ENS system of the European Centre for Medium-Range Weather Forecasts (ECMWF) at predicting daily weather and extreme weather events, and provides faster forecasts up to 15 days in advance. GenCast is a diffusion model, a type of generative AI model that has recently made rapid progress in image, video and music generation. It learns global weather patterns from historical weather data and generates complex probability distributions over future weather scenarios. The model's code, weights, and predictions will be publicly released to support the broader weather forecasting community.
OLMo 2 1124 7B Preference Mixture is a large-scale text dataset hosted on Hugging Face, containing 366.7k generated pairs. This dataset is used to train and fine-tune natural language processing models, especially for preference learning and user intent understanding. It combines data from multiple sources, including SFT mixture data, WildChat data, and DaringAnteater data, covering a wide range of language usage scenarios and user interaction patterns.
OLMo 2 1124 13B Preference Mixture is a large multilingual dataset hosted on Hugging Face, containing 377.7k generated pairs, used for training and optimizing language models, especially for preference learning and instruction following. The dataset is important because it provides diverse, large-scale data that helps develop more precise and personalized language processing technologies.
Genie 2 is a large-scale foundation world model developed by Google DeepMind that can generate endless, action-controllable, playable 3D environments from a single prompt image for training and evaluating embodied agents. Genie 2 represents a major advance in deep learning and artificial intelligence. By simulating virtual worlds and the consequences of actions taken in them, it demonstrates a variety of emergent capabilities of large-scale generative models, such as object interaction, complex character animation, and physics simulation. Genie 2 research drives new creative workflows for prototyping interactive experiences and opens up new possibilities for future research on more general AI systems and agents.
The allenai/olmo-mix-1124 dataset is a large-scale pre-training dataset hosted on Hugging Face, mainly used to train and optimize natural language processing models. It contains a large amount of text, covers multiple languages, and can be used for various text generation tasks. Its importance lies in providing a rich resource that enables researchers and developers to train more accurate and efficient language models, thereby promoting the development of natural language processing technology.
The CHIEF (Clinical Histopathology Imaging Evaluation Foundation) model is a pathology foundation model used for cancer diagnosis and prognosis prediction. It extracts pathology imaging features through two complementary pre-training methods: unsupervised pre-training to identify tile-level features, and weakly supervised pre-training to identify patterns across whole slides. The CHIEF model was developed using 60,530 whole-slide images (WSIs) covering 19 different anatomical sites and pre-trained on a 44TB high-resolution pathology imaging dataset to extract microscopic representations useful for cancer cell detection, tumor origin identification, molecular profile characterization, and prognosis prediction. The CHIEF model was validated on 19,491 whole-slide images from 32 independent slide sets collected at 24 international hospitals and cohorts, with overall performance exceeding state-of-the-art deep learning methods by up to 36.1%, demonstrating its ability to address domain shifts observed in diverse population samples and different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology assessment of cancer patients.
Nous Research focuses on developing human-centered language models and simulators, working to align AI systems with real-world user experiences. Our main research areas include model architecture, data synthesis, fine-tuning, and inference. We prioritize the development of open source, human-compatible models that challenge traditional closed model approaches.
Fish Speech is a speech synthesis product. It uses deep learning to convert text into natural, fluent speech. It supports multiple languages, including Chinese and English, and suits scenarios that require text-to-speech conversion, such as voice assistants and audiobook production. Its main advantages are high-quality speech output, ease of use, and flexibility. The product is continuously updated, with a growing training dataset and improved quantizer parameters.
Lingyi Zhihui is an AI medical brand powered by Baidu Brain technology. Its vision is "evidence-based AI, empowering the health industry". Building on the middle-platform capabilities of Lingyi Zhihui technology, it offers product lines such as clinical decision support systems, fundus image analysis systems, end-to-end medical big data solutions, intelligent pre-diagnosis assistants, and chronic disease management platforms, serving scenarios both inside and outside the hospital. It cooperates extensively with hospitals, doctors, HIS vendors, electronic medical record vendors, governments, regulators and other partners to promote the standardization of primary care processes, improve primary care capabilities, reduce medical risks, control medical expenses, and serve the national Healthy China 2030 strategy.
MaskGCT TTS Demo is a text-to-speech (TTS) demonstration based on the MaskGCT model, provided by amphion on the Hugging Face platform. The model uses deep learning to convert text into natural, fluent speech and is suitable for multiple languages and scenarios. MaskGCT has attracted attention for its efficient speech synthesis and multilingual support, and it can provide personalized speech services in different application scenarios. The demo is currently available for free trial on the Hugging Face platform. Pricing and positioning details are not yet available.
MaskGCT is a zero-shot text-to-speech (TTS) model that addresses problems in existing autoregressive and non-autoregressive systems by eliminating the need for explicit text-speech alignment information and phoneme-level duration prediction. MaskGCT is a two-stage model: in the first stage, it uses text to predict semantic tokens extracted from a speech self-supervised learning (SSL) model; in the second stage, it predicts acoustic tokens conditioned on these semantic tokens. MaskGCT follows a mask-and-predict learning paradigm: during training it learns to predict masked semantic or acoustic tokens based on given conditions and prompts. During inference, the model generates tokens of a specified length in parallel. Experiments show that MaskGCT surpasses current state-of-the-art zero-shot TTS systems in quality, similarity, and intelligibility.
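The mask-and-predict idea with parallel generation of a fixed-length sequence can be illustrated with a toy decoding loop. This is a generic sketch of iterative mask-predict decoding, not MaskGCT's actual code: `toy_predictor`, the `MASK` id, the cosine unmasking schedule, and the 100-token vocabulary are all assumptions made for illustration.

```python
import math
import random

MASK = -1  # hypothetical id marking a still-masked position


def toy_predictor(tokens, rng):
    """Stand-in for a trained model (assumption): ignores its input and
    returns a random (token, confidence) pair for every position."""
    return [(rng.randrange(100), rng.random()) for _ in tokens]


def mask_predict_decode(length, steps, predictor, seed=0):
    """Fill a fully masked sequence of fixed length over a few parallel steps."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = predictor(tokens, rng)  # predictions for all positions at once
        # Cosine schedule (assumption): how many positions stay masked after this step.
        keep_masked = int(length * math.cos(math.pi / 2 * step / steps))
        n_fill = max(1, len(masked) - keep_masked)
        # Commit the most confident predictions among the masked positions.
        by_confidence = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in by_confidence[:n_fill]:
            tokens[i] = preds[i][0]
    return tokens


sequence = mask_predict_decode(length=12, steps=4, predictor=toy_predictor)
```

With a real model, the predictor would condition on the text (first stage) or on the semantic tokens (second stage); here it only demonstrates the control flow of committing confident positions and re-predicting the rest.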
The AMD Instinct MI325X accelerator is based on the AMD CDNA 3 architecture and is designed for AI tasks, including foundation model training, fine-tuning and inference, providing excellent performance and efficiency. These products enable AMD customers and partners to create high-performance and optimized AI solutions at the system, rack and data center levels. The AMD Instinct MI325X accelerator provides industry-leading memory capacity and bandwidth, supporting 256GB of HBM3E at 6.0TB/s, which is 1.8 times the capacity and 1.3 times the bandwidth of the H200, and delivers higher FP16 and FP8 compute performance.
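The capacity and bandwidth ratios quoted above can be checked with a few lines of arithmetic. The MI325X figures come from the listing; the H200 figures (141 GB, 4.8 TB/s) are an assumption taken from NVIDIA's published specifications and are not stated in this listing.

```python
# MI325X figures from the listing above.
mi325x_capacity_gb = 256
mi325x_bandwidth_tbs = 6.0

# H200 figures: assumption, from NVIDIA's published specifications.
h200_capacity_gb = 141
h200_bandwidth_tbs = 4.8

capacity_ratio = mi325x_capacity_gb / h200_capacity_gb
bandwidth_ratio = mi325x_bandwidth_tbs / h200_bandwidth_tbs

print(f"capacity:  {capacity_ratio:.2f}x")
print(f"bandwidth: {bandwidth_ratio:.2f}x")
```

Under these assumptions the capacity ratio is about 1.82 and the bandwidth ratio about 1.25, which is consistent with the rounded 1.8x and 1.3x figures.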
AMD Ryzen™ AI PRO 300 series processors are third-generation commercial AI mobile processors designed for enterprise users. They provide up to 50+ TOPS of AI processing power through the integrated NPU, making them the most powerful among similar products on the market. These processors are not only capable of handling daily work tasks, but are also specifically designed to meet the needs for AI computing power in business environments, such as real-time subtitles, language translation, and advanced AI image generation. They are manufactured on the 4nm process and use innovative power management technology to provide ideal battery life, making them ideal for business people who need to maintain high performance and productivity on the move.
GR-2 is an advanced general-purpose robotic agent designed for diverse and generalizable robot manipulation. It is first pre-trained on a large number of Internet videos to capture the dynamics of the world. This large-scale pre-training, involving 38 million video clips and over 50 billion tokens, enables GR-2 to generalize across a wide range of robotic tasks and environments in subsequent policy learning. GR-2 is then fine-tuned for video generation and action prediction using robot trajectories. It demonstrates impressive multi-task learning capabilities, achieving an average success rate of 97.7% on more than 100 tasks. Additionally, GR-2 performs well in previously unseen scenarios, including new backgrounds, environments, objects and tasks. Notably, GR-2 scales efficiently as model size increases, highlighting its potential for continued growth and application.
The Tianmu intelligent recognition system is a product developed by the National Key Laboratory of Communication Content Cognition of the People's Daily Online and focuses on detecting text content generated by AI. It uses advanced AI technology to identify and govern AI-generated content to ensure the authenticity and reliability of the information. The main advantages of the product include high accuracy, large text volume detection, one-click PDF report generation, data privacy protection, etc. It is suitable for news communication, academic research and other fields, aiming to improve content quality and maintain academic integrity.
EMOVA (EMotionally Omni-present Voice Assistant) is a multi-modal language model that enables end-to-end speech processing while maintaining leading visual-linguistic performance. The model enables emotionally rich multimodal dialogue through a semantic-acoustic decoupled speech tokenizer and achieves state-of-the-art performance on visual-linguistic and speech benchmarks.
Outspeed is a platform that provides networking and inference infrastructure for building fast, real-time voice and video AI applications. Developed by engineers at Google and MIT to provide intuitive and powerful tools for real-time AI applications, Outspeed helps users innovate faster and with more confidence, whether building the next big application or extending an existing solution.
multispecies-whale-detection is an open source project developed by Google that aims to detect and classify whale sounds of different species and geographical regions through neural networks. This tool can help researchers and environmental groups better understand and protect marine biodiversity.
omni-moderation-latest is a new-generation multi-modal content moderation model built on GPT-4o. It is more accurate at detecting harmful text and image content, helping developers build more robust moderation systems. The model accepts text and image input and is more accurate in non-English languages. It evaluates content against categories such as hate, violence, and self-harm, and provides more granular control over moderation decisions. Additionally, it returns a probability score that reflects the likelihood that the content matches each detection category. The model is free for all developers and is designed to help them benefit from the latest research and investments in safety systems.
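Per-category probability scores allow a developer to apply their own thresholds instead of relying on a single yes/no flag. The sketch below assumes the scores have already been fetched into a plain dictionary; the helper name `flag_categories`, the score values, and the thresholds are hypothetical and do not represent the service's client API.

```python
def flag_categories(category_scores, thresholds, default_threshold=0.5):
    """Return the categories whose score meets or exceeds its threshold.

    category_scores: dict mapping category name -> probability in [0, 1]
    thresholds: dict of per-category overrides (hypothetical policy values)
    """
    flagged = {}
    for category, score in category_scores.items():
        limit = thresholds.get(category, default_threshold)
        if score >= limit:
            flagged[category] = score
    return flagged


# Illustrative scores only; not real model output.
scores = {"hate": 0.02, "violence": 0.81, "self-harm": 0.40}
# A stricter (lower) threshold for self-harm than the default.
result = flag_categories(scores, thresholds={"self-harm": 0.30})
```

Keeping thresholds in a separate mapping makes it easy to tune sensitivity per category without touching the decision logic.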
ai-by-hand-excel is a resource library for AI technology practice through Excel. It provides a series of Excel files to allow users to manually perform and understand key operations of AI models, such as Softmax, LeakyReLU, Backpropagation and Transformer, etc. This resource library is suitable for beginners and educators who want to gain a deeper understanding of the inner workings of AI models, and can help them deepen their understanding of AI technology through practical operations.
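Two of the operations named above, Softmax and LeakyReLU, are small enough to compute by hand, which is what makes them good spreadsheet exercises. The following is a plain-Python sketch of the standard definitions, not code from the repository; the `slope` default of 0.01 is a common choice and an assumption here.

```python
import math


def softmax(xs):
    """Exponentiate each value and normalize so the results sum to 1.

    Subtracting the maximum first avoids overflow and does not change the result.
    """
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.01):
    """Pass positive inputs through; scale non-positive inputs by a small slope."""
    return x if x > 0 else slope * x


probs = softmax([1.0, 2.0, 3.0])
activations = [leaky_relu(v) for v in [-2.0, 0.0, 3.0]]
```

Each step here (subtract, exponentiate, sum, divide) corresponds to one column of cells when done in a spreadsheet.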
Diabetica-7B is a large language model optimized for the diabetes care domain. It excels at a variety of diabetes-related tasks, including diagnosis, treatment recommendations, medication management, lifestyle recommendations, patient education, and more. The model is fine-tuned based on open source models, using disease-specific data sets and fine-tuning techniques, providing a reproducible framework that can accelerate the development of AI-assisted medicine. In addition, it has undergone comprehensive evaluation and clinical trials to verify its effectiveness in clinical applications.
Diabetica-1.5B is a large-scale language model specially customized for the field of diabetes care. It performs well in multiple diabetes-related tasks such as diagnosis, treatment recommendations, medication management, lifestyle recommendations, and patient education. The model is developed based on open source models and fine-tuned using disease-specific data sets, providing a reproducible framework that can accelerate the development of AI-assisted medicine.
Diabetica is a large language model developed specifically for diabetes treatment and care. Through deep learning and big data analysis, it provides a variety of services including diagnosis, treatment recommendations, medication management, lifestyle advice and patient education. Diabetica's models, Diabetica-7B and Diabetica-1.5B, demonstrate excellent performance on multiple diabetes-related tasks and provide a reproducible framework that allows other medical fields to benefit from such AI technology.
Brightband is a company dedicated to making weather and climate predictable through advanced earth system AI technology to help humans adapt to increasingly extreme weather changes. The platform encourages the global community to work together to improve the technical level of weather prediction through open source benchmark data sets, models and indicators. Brightband provides tools used by academia, government and companies to improve weather and climate-related decision-making to benefit people and the planet in the long term.
SiFive is a leader in RISC-V architecture, providing high-performance and efficient computing solutions for automotive, AI, data center and other applications. Its products promote the development and application of RISC-V technology with superior performance and efficiency, as well as support from the global community.
Google DeepMind is a leading artificial intelligence company owned by Google, focused on developing advanced machine learning algorithms and systems. DeepMind is known for its pioneering work in deep learning and reinforcement learning, with research spanning fields from gaming to healthcare. DeepMind's goal is to advance science and medicine by building intelligent systems to solve complex problems.
DataGemma is the world's first set of open models designed to address AI hallucinations by grounding responses in the large body of real-world statistics in Google's Data Commons. The models improve the factuality and reasoning of language models through two different methods, thereby reducing hallucinations and improving the accuracy and reliability of AI. The launch of DataGemma is an important step towards improving data accuracy and reducing the spread of misinformation, and it matters to researchers, policymakers, and ordinary users.
Azure Quantum is a quantum computing platform launched by Microsoft that aims to accelerate discoveries in scientific research and materials science through advanced quantum computing technology. By combining artificial intelligence, high-performance computing, and quantum computing, it provides a complete set of tools and resources to help researchers and developers achieve breakthroughs in the quantum field. Azure Quantum's vision is to compress 250 years of scientific progress into the next 25 years, using quantum supercomputers to solve the most difficult problems facing humanity.
Intel® Core™ Ultra 200V series processors are Intel's most efficient x86 processor family to date, designed specifically for the AI PC era, delivering superior performance, breakthrough x86 energy efficiency, a huge leap in graphics performance, uncompromising application compatibility, enhanced security and unparalleled AI computing power. These processors will provide the industry's most complete and powerful AI PCs, combined with more than 80 consumer designs from more than 20 top manufacturer partners, including Acer, ASUS, Dell Technologies, HP, Lenovo, LG, MSI and Samsung.
Anuttacon is committed to creating new, innovative, intelligent and deeply engaging virtual world experiences and artificial general intelligence (AGI) products. By fully leveraging the potential of AI technology, Anuttacon aims to bring unprecedented interactive experiences to users.
Maia 100 is Microsoft's first custom AI accelerator designed for Azure. It is purpose-built for large-scale AI workloads and maximizes performance, scalability and flexibility through the co-optimization of software and hardware. It uses the TSMC N5 process and CoWoS-S packaging technology, and provides memory bandwidth of up to 1.8TB/s and a capacity of 64GB. The chip is designed for a thermal design power (TDP) of up to 700W but operates at 500W, ensuring high energy efficiency. Maia 100 integrates high-speed tensor units, vector processors, DMA engines and hardware semaphores, supports multiple data types and tensor slicing schemes, and scales to large AI models through Ethernet interconnection. In addition, the Maia SDK provides a rich set of components to support rapid deployment of PyTorch and Triton models, and ensures efficient data processing and synchronization through two programming models.
Geekbench AI is a cross-platform AI benchmarking tool that uses real-world machine learning tasks to evaluate the performance of AI workloads. It helps users determine whether their devices are ready for today's and tomorrow's cutting-edge machine learning applications by measuring the performance of CPU, GPU and NPU.
ControlMM is a full-body motion generation framework with plug-and-play multi-modal control capabilities, capable of generating robust motion in multiple domains including Text-to-Motion, Speech-to-Gesture, and Music-to-Dance. This model has obvious advantages in controllability, sequence and motion rationality, providing a new motion generation solution for the field of artificial intelligence.
Black Forest Labs is a team of professionals focused on the manufacture of models and the development of innovative technologies. Team members have diverse backgrounds and professional skills and are committed to promoting technological advancement and providing solutions for different fields.
NeuralGCM is a climate model developed by the Google Research team. Unlike traditional physics-based climate models, it incorporates machine learning to improve the accuracy and efficiency of simulations. NeuralGCM can generate 2-to-15-day weather forecasts that are more accurate than the current gold-standard physics-based model, and it reproduces temperature data from the past 40 years more accurately than traditional atmospheric models. Although NeuralGCM is not yet a complete climate model, it marks an important step in the development of more powerful and easier-to-use climate models.
Graphcore is a company specializing in artificial intelligence hardware accelerators. Its products are mainly targeted at artificial intelligence fields that require high-performance computing. Graphcore's IPU (Intelligent Processing Unit) technology provides powerful computing support for AI applications such as machine learning and deep learning. The company's products include cloud IPUs, data center IPUs, and Bow IPU processors. These products are optimized through Poplar® Software and can significantly improve the training and inference speed of AI models. Graphcore's products and technologies are used in many industries such as finance, biotechnology, and scientific research, helping enterprises and research institutions accelerate the experimental process of AI projects and improve efficiency.
The website provides performance metrics for the API services of common domestic model providers, including detailed data such as TTFT (time to first token), TPS (output tokens per second), total time spent, context length, and input and output prices. It gives developers and enterprises a basis for comparing the performance of different large models and helps them choose the model service that best suits their needs.
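TTFT and TPS can both be derived from the arrival times of streamed tokens. The sketch below is one reasonable way to compute them, not the website's documented methodology: the function name `stream_metrics` is hypothetical, and TPS is defined here as tokens after the first divided by the time between the first and last token (other definitions divide all tokens by total time).

```python
def stream_metrics(request_time, token_times):
    """Compute latency metrics from token arrival timestamps (in seconds).

    request_time: when the request was sent
    token_times: arrival time of each output token, in order
    """
    if not token_times:
        raise ValueError("at least one token is required")
    ttft = token_times[0] - request_time          # time to first token
    total = token_times[-1] - request_time        # total time spent
    decode_time = token_times[-1] - token_times[0]
    # Tokens generated after the first one, per second of decode time.
    tps = (len(token_times) - 1) / decode_time if decode_time > 0 else float("nan")
    return {"ttft": ttft, "tps": tps, "total": total}


# Five tokens arriving every 0.25 s after a 0.5 s wait.
metrics = stream_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
```

Separating TTFT from TPS matters because the first reflects queueing and prompt processing while the second reflects steady-state generation speed.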
CosyVoice is a large-scale multi-lingual speech generation model that not only supports speech generation in multiple languages, but also provides full-stack capabilities from inference to training to deployment. This model is important in the field of speech synthesis because it can generate natural and smooth speech that is close to real people and is suitable for multiple language environments. Background information on CosyVoice shows that it was developed by the FunAudioLLM team under the Apache-2.0 license.
Wenxin agent platform AgentBuilder is an agent platform based on the Wenxin large model that lets developers choose among different development methods to build agents for their industry and application scenario. Its main advantages include low-cost development, support for traffic distribution channels, and a complete closed-loop product development cycle.
Mooncake is the serving platform for Kimi, a leading large language model (LLM) service provided by Moonshot AI. It adopts a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters and uses underutilized CPU, DRAM and SSD resources in the GPU cluster to implement a disaggregated KVCache. At the heart of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput against meeting latency-related service level objectives (SLOs). Unlike traditional studies, Mooncake targets highly overloaded scenarios and develops a prediction-based early rejection policy. Experiments show that Mooncake performs well in long-context scenarios, achieving up to a 525% increase in throughput in some simulated scenarios compared to baseline methods while adhering to SLOs. Under real workloads, Mooncake's architecture enables Kimi to handle 75% more requests.
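The idea of rejecting a request early, based on a prediction that it would miss its latency objective, can be sketched as a simple admission check. This is an illustration under stated assumptions, not Mooncake's scheduler: the `ClusterState` fields, the linear time-to-first-token estimate, and the `should_admit` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of load on the separated prefill and decode clusters."""
    prefill_queue_tokens: int          # prompt tokens already waiting for prefill
    prefill_tokens_per_s: float        # aggregate prefill throughput
    predicted_free_decode_slots: int   # decode slots expected when prefill finishes


def should_admit(prompt_tokens, state, ttft_slo_s):
    """Admit only if the request is predicted to meet its SLO in both stages."""
    # Naive estimate (assumption): queued work plus this prompt, at constant throughput.
    predicted_ttft = (state.prefill_queue_tokens + prompt_tokens) / state.prefill_tokens_per_s
    if predicted_ttft > ttft_slo_s:
        return False  # would miss the time-to-first-token objective
    # Reject before spending prefill compute if no decode capacity is expected.
    return state.predicted_free_decode_slots > 0


state = ClusterState(prefill_queue_tokens=8000, prefill_tokens_per_s=10000.0,
                     predicted_free_decode_slots=2)
short_ok = should_admit(1000, state, ttft_slo_s=1.0)
long_ok = should_admit(4000, state, ttft_slo_s=1.0)
```

Checking decode capacity before prefill starts is the point of rejecting early: a request that cannot be decoded in time would otherwise waste prefill work.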
Rakis is a decentralized inference network that runs entirely in the browser. It uses blockchain technology to let nodes exchange inference requests and results, enabling distributed execution of AI models without a server. By using browsers as nodes, Rakis runs on WebGPU-compatible platforms and allows ordinary users to participate in AI model inference. The project is open source, emphasizes transparency and verifiability, and aims to solve the problems of determinism, scalability and security in decentralized AI inference.
prism-alignment is a dataset created by HannahRoseKirk that focuses on studying the preference and value alignment problem of large language models (LLMs). The dataset collects ratings and feedback on model responses from participants from different countries and cultural backgrounds through questionnaires and multiple rounds of dialogue with the language model. This data is critical to understanding and improving AI’s value alignment.
HelpSteer2 is an open source dataset released by NVIDIA to support training models to be more helpful, factually correct, and coherent, while allowing the complexity and verbosity of responses to be adjusted. Created in partnership with Scale AI, the dataset was used to train a reward model on the Llama 3 70B base model that scored 88.8% on RewardBench, making it one of the best reward models as of June 12, 2024.
Sonic is a low-latency speech model developed by the Cartesia team to provide realistic speech generation for a variety of devices. The model leverages an innovative state-space model architecture to enable efficient, low-latency generation of high-resolution audio and video. Sonic has a latency of just 135 milliseconds, making it the fastest model in its class. The Cartesia team is focused on optimizing the efficiency of intelligence, making it faster, cheaper and more accessible. The release of Sonic marks early progress towards real-time conversational AI and long-term memory computing platforms, and points to new AI experiences in real-time gaming, customer support and other fields.