DeerFlow is a deep-research framework that combines language models with specialized tools such as web search, crawlers, and Python execution. The project originated in the open source community, emphasizes contribution and feedback, and offers flexible capabilities suited to a wide range of research needs.
Search-R1 is a reinforcement learning framework for training language models (LLMs) that can reason and invoke search engines. Built on veRL, it supports multiple reinforcement learning methods and different LLM architectures, making it efficient and scalable for research and development on tool-augmented reasoning.
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model derived from Llama-3.1-405B-Instruct through multi-stage post-training that improves its reasoning and chat capabilities. The model supports context lengths of up to 128K, strikes a good balance between accuracy and efficiency, is suitable for commercial use, and aims to give developers a powerful foundation for AI assistants.
Fin-R1 is a large language model designed specifically for the financial domain to improve financial reasoning. Jointly developed by Shanghai University of Finance and Economics and Caiyue Xingchen, it is fine-tuned from Qwen2.5-7B-Instruct with supervised fine-tuning and reinforcement learning. It offers efficient financial reasoning and suits core financial settings such as banking and securities. The model is free and open source, making it easy to use and improve.
Jamba 1.6 is the latest language model from AI21, designed for private enterprise deployment. It excels at long-text processing, handles context windows of up to 256K tokens, and uses a hybrid SSM-Transformer architecture to answer long-document questions efficiently and accurately. The model surpasses comparable models from Mistral, Meta, and Cohere in quality while supporting flexible deployment, including private deployment on-premises or in a VPC to keep data secure. It gives enterprises a solution that requires no compromise between data security and model quality, and suits scenarios involving large volumes of data and long texts, such as R&D, legal, and financial analysis. Jamba 1.6 is already in use at several companies: Fnac uses it for data classification, and Educa Edtech uses it to build personalized chatbots.
Inception Labs is a company focused on developing diffusion large language models (dLLMs), drawing inspiration from advanced image and video generation systems such as Midjourney and Sora. With diffusion models, Inception Labs claims 5-10x faster generation, greater efficiency, and finer control than traditional autoregressive models. Its models support parallel text generation, can correct errors and hallucinations, suit multimodal tasks, and perform well on reasoning and structured data generation. The company, made up of researchers and engineers from Stanford, UCLA, and Cornell, is a pioneer in diffusion modeling.
OpenManus is an open source intelligent-agent project that aims to reproduce Manus-like functionality in the open, without requiring an invitation code. Developed jointly by multiple contributors, it builds on a capable language model and a flexible plug-in system to carry out a variety of complex tasks quickly. Its main advantages are that it is open source, free, and easy to extend, making it well suited to secondary development and research. The project grew out of the need to improve existing intelligent-agent tools, with the goal of creating a fully open, easy-to-use agent platform.
Instella is a series of high-performance open source language models developed by the AMD GenAI team and trained on AMD Instinct™ MI300X GPUs. The models significantly outperform other open source language models of the same size and are competitive with models such as Llama-3.2-3B and Qwen2.5-3B. Instella releases its model weights, training code, and training data to advance open source language model development. Its key benefits are high performance, openness, and optimized support for AMD hardware.
GPT-4.5 is the latest language model released by OpenAI and represents the current frontier of unsupervised learning. Trained with large-scale compute and data, it improves world knowledge and pattern recognition, reduces hallucinations, and interacts with people more naturally. It excels at writing, programming, and problem solving, and is especially suited to scenarios that demand creativity and emotional understanding. GPT-4.5 is currently a research preview, open to Pro users and developers to explore its capabilities.
Gemini 2.0 Flash-Lite is an efficient language model from Google, optimized for long-text processing and complex tasks. It performs well on reasoning, multimodal, mathematical, and factuality benchmarks, and its simplified pricing makes million-token context windows more affordable. Gemini 2.0 Flash-Lite is generally available in Google AI Studio and Vertex AI and is suited to enterprise production use.
Phi-4-mini-instruct is a lightweight open source language model from Microsoft in the Phi-4 model family. It is trained on synthetic data and filtered public web data, with a focus on high-quality, reasoning-dense data. The model supports 128K-token context lengths, and supervised fine-tuning plus direct preference optimization strengthen its instruction following and safety. Phi-4-mini-instruct performs well on multilingual support, reasoning (especially mathematical and logical reasoning), and low-latency scenarios, making it suitable for resource-constrained environments. The model was released in February 2025 and supports many languages, including English, Chinese, and Japanese.
DeepSeek is an advanced language model developed by a Chinese AI lab backed by the High-Flyer fund, focused on open source models and innovative training methods. Its R1 series excels at logical reasoning and problem solving, using reinforcement learning and a mixture-of-experts architecture to optimize performance and train efficiently at low cost. DeepSeek's open source strategy drives community innovation while sparking industry debate about AI competition and the impact of open models. Free, registration-free access further lowers the barrier to entry, making it suitable for a wide range of applications.
AlphaMaze is a project focused on improving the visual reasoning of large language models (LLMs). It trains models on maze tasks described in text so that they learn to understand and plan spatial structures. The approach sidesteps complex image processing and evaluates a model's spatial understanding directly through text descriptions. Its main advantage is that it reveals how a model thinks about spatial problems, not just whether it can solve them. The project is built on an open source framework and aims to advance research on language models for visual reasoning.
AlphaMaze is a decoder-only language model designed to solve visual reasoning tasks. It demonstrates the potential of language models for visual reasoning by training them on maze solving. The model builds on the 1.5-billion-parameter Qwen model and is trained with supervised fine-tuning (SFT) and reinforcement learning (RL). Its main advantage is converting visual tasks into text for reasoning, compensating for traditional language models' weakness in spatial understanding. The model was developed to improve AI performance on vision tasks, especially in scenarios that require step-by-step reasoning. AlphaMaze is currently a research project, and its commercial pricing and market positioning have not yet been announced.
Smithery is a platform built on the Model Context Protocol that lets users extend the functionality of language models by connecting them to various servers. It offers a flexible toolset that dynamically augments a language model's capabilities as needed, so it can complete a wider range of tasks. The platform's core advantages are modularity and scalability: users choose the servers that fit their needs and integrate them.
Moonlight-16B-A3B is a large language model developed by Moonshot AI and trained with the advanced Muon optimizer. By improving training efficiency, it significantly boosts language generation while using fewer training FLOPs. The model suits scenarios that call for efficient language generation, such as natural language processing, code generation, and multilingual dialogue, and its open source implementation and pre-trained checkpoints give researchers and developers powerful tools.
DeepHermes 3 is an advanced language model developed by NousResearch that improves answer accuracy through systematic reasoning. It offers a reasoning mode and a regular response mode, switchable via the system prompt. The model performs well in multi-turn dialogue, role playing, and reasoning, and aims to give users more powerful and flexible language generation. It is fine-tuned from Llama-3.1-8B, has 8.03 billion parameters, and supports a variety of uses such as reasoning, dialogue, and function calling.
Lora is a local language model optimized for mobile devices that can be integrated into mobile applications quickly through its SDK. It supports iOS and Android, matches GPT-4o-mini in performance, weighs 1.5 GB with 2.4 billion parameters, and is optimized for real-time on-device inference. Lora's main advantages are low energy use, a small footprint, and fast responses, giving it a clear edge over other models in power, size, and speed. Lora is offered by PeekabooLabs, primarily to developers and enterprise customers, to help them bring advanced language model capabilities into mobile apps and improve user experience and competitiveness.
PaliGemma 2 mix is an upgraded vision-language model from Google in the Gemma family. It handles a variety of vision and language tasks, such as image segmentation, video captioning, and scientific question answering. The model ships pre-trained checkpoints in several sizes (3B, 10B, and 28B parameters) that can easily be fine-tuned for a range of vision-language tasks. Its main advantages are versatility, strong performance, and developer friendliness, with support for multiple frameworks (such as Hugging Face Transformers, Keras, and PyTorch). It suits developers and researchers who need to handle vision and language tasks efficiently and can significantly improve development productivity.
Mistral Saba is Mistral AI's first language model customized for the Middle East and South Asia. With 24 billion parameters and training on carefully curated datasets, it delivers more accurate, more relevant, and lower-cost responses than comparably capable larger models. It supports Arabic and several languages of Indian origin, and is particularly strong in South Indian languages such as Tamil, making it suitable for scenarios that demand precise language understanding and cultural context. Mistral Saba can be used via API or deployed locally; it is light enough to run on single-GPU systems with fast responses, suiting enterprise applications.
OLMoE is an open source language model application developed by Ai2 to give researchers and developers a fully open toolkit for running AI experiments on device. The app runs offline on iPhone and iPad, keeping user data completely private. It is built on the efficient OLMoE model, optimized and quantized to maintain strong performance on mobile hardware. Its openness makes it a solid foundation for researching and building the next generation of on-device AI applications.
Podscript is a powerful audio transcription tool that uses language models and speech-to-text (STT) APIs to generate high-quality transcripts for podcasts and other audio content. It supports several popular STT services, such as Deepgram, AssemblyAI, and Groq, and can also work with automatically generated captions from YouTube videos. Podscript's main advantages are flexibility and ease of use: it can be driven from a simple command-line interface or a convenient web interface. It suits podcast creators, content producers, and anyone who needs to transcribe audio quickly. Podscript is open source, so users can customize and extend it to their needs.
Xwen-Chat was developed by xwen-team to meet the demand for high-quality Chinese conversation models and fill a gap in the field. Available in multiple versions, it offers strong language understanding and generation, handles complex language tasks, and produces natural dialogue, making it suitable for scenarios such as intelligent customer service. It is available free of charge on the Hugging Face platform.
LLM Codenames is a creative naming tool built on language models. Using natural language processing, it quickly generates a series of unique, creative names from keywords or topics the user enters. The tool is useful for brand naming, product naming, and creative writing, saving users time and sparing them repetitive work. Its main advantages are efficiency and creativity, offering a variety of naming options for different needs. The tool is currently offered as a website that users can access directly in a browser without installing any software.
Deeptrain is a platform focused on video processing, designed to integrate video content seamlessly into language models and AI agents. Its video processing technology lets users work with video as easily as text and images. The product supports more than 200 language models, including GPT-4o and Gemini, and handles multilingual video. Deeptrain offers free development support and charges only for production use, making it well suited to building AI applications. Its main advantages are powerful video processing, multilingual support, and seamless integration with mainstream language models.
Exa & Deepseek Chat App is an open source chat application that performs real-time web search through Exa's API and pairs it with the DeepSeek R1 language model for reasoning, delivering a more accurate chat experience. The app is built with Next.js, TailwindCSS, and TypeScript and is hosted on Vercel. It lets users pull the latest web information into chat and hold intelligent conversations powered by a capable language model. The application is free and open source, suits developers and enterprise users, and can serve as the basis for building chat tools.
DeepSeek-R1-Distill-Llama-8B is a high-performance language model developed by the DeepSeek team, based on the Llama architecture and improved through reinforcement learning and distillation. The model performs well on reasoning, code generation, and multilingual tasks, and its parent line is the first in the open source community to improve reasoning capabilities through pure reinforcement learning. It permits commercial use, allows modifications and derivative works, and suits academic research and corporate applications.
This product is a 4-bit quantized language model based on Qwen2.5-32B that achieves efficient inference and low resource consumption through GPTQ quantization. It significantly reduces the model's storage and compute requirements while maintaining high performance, making it usable in resource-constrained environments. The model targets applications that demand high-quality language generation, such as intelligent customer service, programming assistance, and content creation. Its open source license and flexible deployment options make it suitable for broad commercial and research use.
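As a rough illustration of how such a GPTQ checkpoint is typically consumed, the sketch below loads a 4-bit export with Hugging Face transformers. The Hub id is an assumption (Qwen publishes GPTQ-Int4 repositories in this naming style); substitute the repository of the actual quantized model described above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id of a 4-bit GPTQ export of Qwen2.5-32B; verify against the Hub.
model_id = "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4"

tok = AutoTokenizer.from_pretrained(model_id)
# transformers reads the GPTQ quantization config stored in the checkpoint
# and loads the 4-bit weights directly (requires the optimum/GPTQ extras).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

ids = tok.apply_chat_template(
    [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(ids, max_new_tokens=64)
print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```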
ReaderLM v2 is a 1.5B-parameter small language model from Jina AI, purpose-built for HTML-to-Markdown conversion and HTML-to-JSON extraction with excellent accuracy. The model supports 29 languages and handles combined input and output lengths of up to 512K tokens. It adopts a new training paradigm and higher-quality training data; compared with its predecessor it makes significant progress on long content and on generating Markdown, wielding Markdown syntax skillfully and handling complex elements well. In addition, ReaderLM v2 introduces direct HTML-to-JSON generation, letting users extract specific information from raw HTML according to a given JSON schema without an intermediate Markdown conversion.
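A minimal sketch of HTML-to-Markdown conversion with the published checkpoint, assuming the standard transformers chat-template flow; the instruction phrasing is modeled on the model card's examples, so consult the card for the exact prompt and the JSON-schema variant.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/ReaderLM-v2"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

html = "<html><body><h1>Release notes</h1><p>Now 2x faster.</p></body></html>"
# Prompt wording is an assumption; the model card documents the exact format.
msgs = [{"role": "user",
         "content": "Extract the main content from the given HTML and "
                    f"convert it to Markdown format.\n{html}"}]
ids = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(ids, max_new_tokens=512)
print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```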
MiniMax-Text-01 is a large language model developed by MiniMaxAI with 456 billion total parameters, of which 45.9 billion are activated per token. It adopts a hybrid architecture combining Lightning Attention, Softmax Attention, and Mixture of Experts (MoE). Advanced parallelism and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), Variable-Length Ring Attention, and Expert Tensor Parallelism (ETP), extend its training context length to 1 million tokens, and it can handle contexts of up to 4 million tokens at inference. Across multiple academic benchmarks, MiniMax-Text-01 performs at the level of top models.
MiniMax-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. It adopts a hybrid architecture combining Lightning Attention, Softmax Attention, and Mixture of Experts (MoE). Advanced parallelism and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), Varlen Ring Attention, and Expert Tensor Parallelism (ETP), extend its training context length to 1 million tokens, and it can handle contexts of up to 4 million tokens at inference. Across multiple academic benchmarks, MiniMax-01 performs at the level of top models.
fullmoon is a local intelligence app from Mainframe that lets users chat with large language models entirely on their own device. It runs fully offline, is optimized for models on Apple silicon, and offers personalized themes, fonts, and system prompt adjustments. As a free, open source, privacy-focused app, it gives users a simple, secure way to converse and create with powerful language models.
MiniCPM-o 2.6 is the latest and most capable model in the MiniCPM-o series. Built from SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B, it has 8B parameters. It performs well in visual understanding, voice interaction, and multimodal live streaming, supporting real-time voice dialogue and multimodal streaming features. The model has fared well in the open source community, surpassing several well-known models. Its strengths are fast inference, low latency, and low memory and power consumption, allowing it to support multimodal live streaming efficiently on end devices such as an iPad. MiniCPM-o 2.6 is also easy to use, with multiple deployment paths including CPU inference via llama.cpp, quantized models in int4 and GGUF formats, and high-throughput inference with vLLM.
rStar-Math is a study that aims to show that small language models (SLMs) can match or exceed the mathematical reasoning of OpenAI's o1 model without relying on distillation from stronger models. The study implements "deep thinking" via Monte Carlo Tree Search (MCTS): a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to address the challenge of training the two SLMs, advancing SLM mathematical reasoning to the state of the art through four rounds of self-evolution and millions of synthesized solutions. The model significantly improves performance on the MATH benchmark and performs strongly on AIME competition problems.
PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct is a large language model based on the Llama-3 architecture, designed to detect hallucinations in RAG settings. The model analyzes a given document, question, and answer, and evaluates whether the answer is faithful to the document. Its main strengths are high-precision hallucination detection and strong language understanding. Developed by Patronus AI, it suits scenarios that demand high-precision information verification, such as financial analysis and medical research. The model is currently free to use, but specific commercial applications may require contacting the developer.
CAG (Cache-Augmented Generation) is an innovative language model enhancement technique designed to eliminate the retrieval latency, retrieval errors, and system complexity of traditional RAG (Retrieval-Augmented Generation). By preloading all relevant resources into the model's context and precomputing their key-value (KV) cache, CAG generates responses directly at inference time with no real-time retrieval step. This approach significantly reduces latency and improves reliability while simplifying system design, making it a practical, scalable alternative. As the context windows of large language models (LLMs) keep growing, CAG is expected to serve increasingly complex applications.
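The core mechanic is easy to sketch with Hugging Face transformers: run the knowledge through the model once to precompute its KV cache, then answer queries by extending that cache. A minimal sketch, with the model id and file name as placeholders; note that generation mutates the cache, so serving several questions from one preload would require copying it first.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # placeholder: any causal chat LLM
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# 1) Preload all reference material once and precompute its KV cache.
knowledge = open("docs.txt").read()  # placeholder knowledge file
prefix = tok("Reference:\n" + knowledge + "\n\n",
             return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    kv_cache = model(prefix, use_cache=True).past_key_values

# 2) At query time, extend the cached prefix directly -- no retrieval step.
question = tok("Q: What does the document say about latency?\nA:",
               return_tensors="pt").input_ids.to(model.device)
out = model.generate(
    torch.cat([prefix, question], dim=-1),  # full ids; only new tokens are recomputed
    past_key_values=kv_cache,
    max_new_tokens=64,
)
print(tok.decode(out[0, prefix.shape[1] + question.shape[1]:],
                 skip_special_tokens=True))
```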
PRIME-RL/Eurus-2-7B-PRIME is a 7B-parameter language model trained with the PRIME method, which aims to improve reasoning through online reinforcement learning. The model starts from Eurus-2-7B-SFT and is trained with reinforcement learning on the Eurus-2-RL-Data dataset. PRIME uses an implicit reward mechanism that makes the model attend to the reasoning process during generation rather than only the final result. The model performs well across multiple reasoning benchmarks, averaging a 16.7% improvement over its SFT version. Its main advantages are efficient reasoning gains, lower data and model resource requirements, and strong performance on mathematical and programming tasks. It suits scenarios that demand complex reasoning, such as solving programming and math problems.
Eurus-2-7B-SFT is a large language model fine-tuned from Qwen2.5-Math-7B, focused on improving mathematical reasoning and problem solving. It learns reasoning patterns through imitation learning (supervised fine-tuning) and can effectively tackle complex math problems and programming tasks. Its main strength is solid reasoning and accurate handling of mathematical problems, suiting scenarios that require complex logical reasoning. The model was developed by the PRIME-RL team as the starting point for improving reasoning through implicit rewards.
Memory Layers at Scale is an innovative memory-layer implementation that uses a trainable key-value lookup mechanism to add parameters to a model without increasing its floating-point operations. The approach matters for large language models because it can significantly expand a model's capacity to store and retrieve information while staying computationally efficient. Its main advantages are efficiently scaling model capacity, reducing compute consumption, and improving flexibility and scalability. The project was developed by the Meta Lingua team and suits scenarios involving large-scale data and complex models.
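To make the mechanism concrete, here is a deliberately simplified PyTorch sketch of a trainable key-value memory: parameters grow with the number of keys, but per-token compute only gathers the top-k retrieved values. The paper additionally uses product-key decomposition to avoid scoring every key; this flat version omits that for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    """Trainable key-value lookup: extra parameters, small FLOPs per token."""
    def __init__(self, dim: int, n_keys: int = 4096, topk: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_keys, dim) / dim ** 0.5)
        self.values = nn.Embedding(n_keys, dim)  # the "memory" payload
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        scores = x @ self.keys.T                  # similarity to every key
        w, idx = scores.topk(self.topk, dim=-1)   # keep only the top-k matches
        w = F.softmax(w, dim=-1)
        v = self.values(idx)                      # gather just k value vectors
        return (w.unsqueeze(-1) * v).sum(dim=-2)  # weighted sum -> (batch, seq, dim)

x = torch.randn(2, 16, 256)
print(MemoryLayer(256)(x).shape)  # torch.Size([2, 16, 256])
```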
Sonus AI is a large language model offering built around the Sonus-1 model, which aims to redefine the boundaries of language understanding and computation. Sonus-1 is noted for its ability to solve complex problems well beyond typical language models. Sonus AI provides enhanced search and real-time information retrieval, helping users reach the latest and most accurate information. The company also plans a developer-friendly API to bring Sonus-1's capabilities into other applications. Sonus AI positions itself as a future-oriented technology that improves users' productivity and the accuracy of the information they obtain.
HuatuoGPT-o1-70B is a medical-domain large language model (LLM) developed by Freedom Intelligence, designed for complex medical reasoning. Before giving a final response, the model generates an elaborate thought process in which it reflects on and refines its reasoning. HuatuoGPT-o1-70B can work through complex medical problems and give considered answers, which is crucial to improving the quality and efficiency of medical decision making. The model is based on the LLaMA-3.1-70B architecture, supports English, and can be deployed with tools such as vLLM or SGLang, or used for inference directly.
HuatuoGPT-o1-7B is a medical-domain large language model (LLM) developed by Freedom Intelligence, designed for advanced medical reasoning. Before giving a final answer, it generates a detailed thought process in which it reflects on and refines its reasoning. HuatuoGPT-o1-7B supports Chinese and English, handles complex medical problems, and outputs results in a 'think-then-answer' format, which is crucial to the transparency and reliability of medical decision making. The model is based on Qwen2.5-7B and has been specially trained for the needs of the medical field.
YuLan-Mini is a lightweight language model developed by the AI Box team at Renmin University of China with 2.4 billion parameters. Although it was pre-trained on only 1.08T tokens, its performance is comparable to industry-leading models trained on far more data. The model is particularly strong in mathematics and coding, and to promote reproducibility the team plans to open source the relevant pre-training resources.
This is a multimodal language model framework developed by a Stanford University research team that aims to unify verbal and non-verbal language in 3D human motion. The model can understand and generate multimodal data spanning text, speech, and motion, which is critical for creating virtual characters that communicate naturally, with wide application in games, film, and virtual reality. Key advantages include high flexibility, low training data requirements, and the ability to unlock new tasks such as editable gesture generation and predicting emotion from motion.
LiveKit Plugins Turn Detector is a plug-in for LiveKit Agents that adds end-to-end end-of-turn detection, using a custom open-weights model to determine when a user has finished speaking. Compared with traditional voice activity detection (VAD) models, the plug-in offers a more accurate and robust end-of-speech signal by using a language model trained specifically for this task. The current version supports English only and is not recommended for other languages.
FACTS Grounding is a comprehensive benchmark from Google DeepMind that evaluates whether responses generated by large language models (LLMs) are not only factually accurate with respect to the given input but also detailed enough to give users satisfactory answers. The benchmark matters for increasing trust in LLMs in real-world applications and helps drive industry-wide progress on factuality and grounding.
Clio is an automated analysis tool from Anthropic for studying how language models are used in the real world while protecting privacy. By abstracting conversations into topic clusters, much like Google Trends, it reveals how people use Claude models in daily life. Clio's main advantage is providing insight into AI usage without violating user privacy, which is crucial for improving model safety. Anthropic takes user data protection seriously, and Clio's design reflects this with multiple layers of privacy safeguards.
Phi-4 is the latest member of Microsoft's Phi family of small language models. With 14B parameters, it excels in complex reasoning domains such as mathematics. Phi-4 balances size against quality through high-quality synthetic datasets, curated organic data, and post-training innovations, embodying Microsoft's progress in small language models (SLMs) and pushing the boundaries of AI. Phi-4 is available now on Azure AI Foundry and will arrive on the Hugging Face platform in the coming weeks.
P-MMEval is a multilingual benchmark covering both fundamental and capability-specialized datasets. It extends existing benchmarks to ensure consistent language coverage across all datasets and provides parallel samples across languages, supporting up to 10 languages from 8 language families. P-MMEval enables comprehensive assessment of multilingual ability and comparative analysis of cross-lingual transferability.
DeepSeek-V2.5-1210 is an upgraded version of DeepSeek-V2.5 with improvements across multiple capabilities, including mathematics, coding, writing, and reasoning. Its score on the MATH-500 benchmark rose from 74.8% to 82.8%, and its accuracy on LiveCodeBench (08.01-12.01) rose from 29.2% to 34.38%. The new version also improves the user experience of the file upload and web page summary features. The DeepSeek-V2 series (base and chat) supports commercial use.
Proofreading AI is an online AI proofreading tool that uses the advanced GPT-4/4o language models to proofread documents and return accurate results. Beyond correcting grammar and spelling, it can detect and remove plagiarized content, detect AI-generated text, humanize AI text, generate citations, and rewrite passages. Its main advantages are seamless document upload, instant download of corrected documents, and a range of writing assistance tools. Proofreading AI offers more features than traditional proofreading tools while remaining relatively affordable.
INTELLECT-1 Chat is a chat tool powered by a 10B-parameter language model trained through global collaboration. It represents recent progress in large language models, improving model diversity and adaptability through decentralized training. Key strengths include natural language understanding and generation, a smooth conversational experience, and the capacity to process large amounts of language data. The product is the first demonstration that decentralized training at this scale is feasible, and it is easy and fun to use. On pricing, the page offers login to save and revisit chats, hinting at a possible paid or membership model.
OLMo-2-1124-13B-DPO is a 13B-parameter large language model that has undergone supervised fine-tuning and DPO training. Targeting primarily English, it aims to deliver excellent performance on a variety of tasks, including chat, mathematics, GSM8K, and IFEval. The model is part of the OLMo series, which exists to advance scientific research on language models. Training is based on the Dolma dataset, and the code, checkpoints, logs, and training details are all public.
OpenScholar is a retrieval-augmented language model (LM) designed to help scientists navigate and synthesize the literature efficiently: it first searches for relevant papers, then generates answers grounded in those sources. This matters for coping with the millions of scientific papers published every year, and it helps scientists find the information they need and keep up with the latest findings in a subfield.
OLMo 2 13B is a Transformer-based autoregressive language model developed by the Allen Institute for AI (Ai2), focused on English academic benchmarks. Trained on up to 5 trillion tokens, it performs comparably to or better than fully open models of the same size and is competitive with open-weights models from Meta and Mistral on English academic benchmarks. The OLMo 2 13B release includes all code, checkpoints, logs, and training details, and is designed to advance scientific research on language models.
OLMo 2 is the latest fully open language model family from Ai2, with 7B and 13B sizes trained on up to 5T tokens. These models perform on par with or better than fully open models of the same size and compete with open-weights models such as Llama 3.1 on English academic benchmarks. OLMo 2's development focused on training stability, staged training interventions, state-of-the-art post-training methods, and actionable evaluation frameworks. These techniques let OLMo 2 perform well across many tasks, especially knowledge recall, general knowledge, and general and mathematical reasoning.
Tülu 3 is an open source family of advanced language models, post-trained to adapt to a wider range of tasks and users. The models achieve sophisticated post-training by combining disclosed details of proprietary recipes, novel techniques, and established academic research. Tülu 3's success rests on careful data curation, rigorous experimentation, innovative methodology, and improved training infrastructure. By openly sharing its data, recipes, and findings, Tülu 3 aims to equip the community to explore new post-training approaches.
Lingma SWE-GPT is an open source large language model focused on software engineering tasks, aiming to provide intelligent development support. Based on the Qwen family of base models and further trained for complex software engineering work, it performs well on authoritative software engineering agent leaderboards and suits development teams and researchers who need automated software improvement.
Nous Research focuses on developing human-centered language models and simulators, working to align AI systems with real-world user experience. Its main research areas include model architecture, data synthesis, fine-tuning, and inference, and it prioritizes open source, human-compatible models that challenge traditional closed approaches.
browser-use is an open source web automation library that lets large language models (LLMs) interact with websites, enabling complex web operations through a simple interface. Its main advantages include broad support for many language models, automatic detection of interactive elements, multi-tab management, XPath extraction, and vision model support. It addresses long-standing pain points of web automation, such as handling dynamic content and long-horizon tasks. Flexible and easy to use, browser-use gives developers a powerful tool for building smarter, more automated web interactions.
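A minimal usage sketch following the project's documented quickstart pattern; the task string is illustrative, and it assumes an OpenAI API key and the langchain-openai package are available.

```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        task="Open example.com and report the page title",  # illustrative task
        llm=ChatOpenAI(model="gpt-4o"),  # any supported chat model works
    )
    result = await agent.run()  # drives a real browser session step by step
    print(result)

asyncio.run(main())
```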
OuteTTS-0.1-350M is a text-to-speech synthesis approach built on a pure language model: it needs no external adapters or complex architectures, achieving high-quality synthesis through carefully designed prompts and audio tokens. The model is based on the LLaMa architecture with 350M parameters, demonstrating that a language model can be used directly for speech synthesis. It processes audio in three steps: tokenizing audio with WavTokenizer, using CTC forced alignment to build a precise word-to-audio-token mapping, and constructing structured prompts that follow a fixed format. Key advantages of OuteTTS include the pure language modeling approach, voice cloning, and compatibility with llama.cpp and the GGUF format.
This autoregressive language model from Meta adopts an optimized architecture suited to resource-constrained devices. Its advantages include integrating several key techniques and supporting zero-shot reasoning. It is free and aimed at natural language processing researchers and developers.
MobileLLM-600M is an autoregressive language model from Meta with an optimized Transformer architecture, designed for resource-limited on-device applications. It integrates key techniques such as the SwiGLU activation function, a deep-and-thin architecture, embedding sharing, and grouped-query attention. The MobileLLM family achieves significant gains on zero-shot common-sense reasoning tasks; its 125M/350M variants improved accuracy by 2.7% and 4.3% respectively over the previous SoTA models of the same sizes. The design scales to larger models, with MobileLLM-1B/1.5B also reaching SoTA results.
MobileLLM-350M is an autoregressive language model from Meta with an optimized Transformer architecture, designed for on-device applications in resource-constrained environments. Integrating key techniques such as the SwiGLU activation function, a deep-and-thin architecture, embedding sharing, and grouped-query attention, it achieves significant accuracy gains on zero-shot common-sense reasoning tasks. MobileLLM-350M delivers performance comparable to larger models while keeping a small footprint, making it ideal for on-device natural language processing.
MobileLLM-125M is an autoregressive language model from Meta that uses an optimized Transformer architecture designed for resource-constrained on-device applications. It integrates key techniques including the SwiGLU activation function, a deep-and-thin architecture, embedding sharing, and grouped-query attention. MobileLLM-125M/350M achieve 2.7% and 4.3% accuracy improvements respectively on zero-shot common-sense reasoning tasks over the previous 125M/350M SoTA models. The design extends effectively to larger models, and MobileLLM-600M/1B/1.5B have reached SoTA results.
MobileLLM is a small language model family optimized for mobile devices, focused on designing high-quality LLMs under a billion parameters for practical mobile deployment. Contrary to conventional wisdom, the work emphasizes the importance of model architecture in small LLMs. Through a deep-and-thin architecture combined with embedding sharing and grouped-query attention, MobileLLM achieves notable accuracy improvements and proposes a block-wise weight-sharing method that adds no model size and only small latency overhead. The MobileLLM family also shows significant gains over previous small models on chat benchmarks and approaches LLaMA-v2 7B accuracy on API-calling tasks, highlighting what small models can do in common on-device use cases.
SimpleQA is a factuality benchmark released by OpenAI that measures a language model's ability to answer short, fact-seeking questions. By providing a dataset that is accurate, diverse, challenging, and pleasant for researchers to work with, it helps evaluate and improve the accuracy and reliability of language models. The benchmark is an important step toward training models that produce factually correct responses, increasing their trustworthiness and broadening their range of applications.
AudioLM is a framework developed by Google Research for high-quality audio generation with long-term consistency. It maps input audio to discrete token sequences and treats audio generation as a language modeling task in this representation space. By training on large amounts of raw audio waveforms, AudioLM learns to generate natural and coherent audio continuations, even in the absence of text or annotations, and is able to generate grammatically and semantically sound speech continuations while maintaining speaker identity and prosody. In addition, AudioLM can generate coherent piano music continuations, although it does not use any symbolic representation of music during training.
CoI-Agent is an intelligent agent based on large language models (LLMs) that aims to revolutionize idea development in research through a Chain of Ideas (CoI). By integrating and analyzing large amounts of data, it offers researchers innovative ideas and research directions. Its significance lies in accelerating the research process, improving efficiency, and helping researchers uncover new patterns and connections in complex data. CoI-Agent is developed by the DAMO-NLP-SG team as a free, open source project.
Spirit LM is a foundational multimodal language model that freely mixes text and speech. It starts from a 7B pre-trained text language model and is extended to the speech modality by continued training on text and speech units. Speech and text sequences are concatenated into a single token stream and trained with a word-level interleaving approach on a small, automatically curated speech-text parallel corpus. Spirit LM comes in two versions: the base version uses speech phoneme units (HuBERT), while the expressive version adds pitch and style units to model expressivity. In both versions, text is encoded with subword BPE tokens. The model exhibits the semantic ability of text models alongside the expressive ability of speech models, and the authors show that Spirit LM can learn new tasks across modalities (e.g., ASR, TTS, speech classification) from a small number of examples.
Zamba2-7B is a small language model developed by the Zyphra team. At the 7B scale it surpasses current leading models such as Mistral, Google's Gemma, and Meta's Llama 3 series in both quality and performance. The model is designed to run on device and on consumer GPUs, as well as in the many enterprise applications that need a capable yet compact and efficient model. The release of Zamba2-7B shows that, even at 7B scale, the frontier can still be reached and pushed by small teams on modest budgets.
LLMWare.ai is an AI tool for finance, legal, compliance, and other regulation-intensive industries, focused on small specialized language models (SLMs) in private clouds and an AI framework designed specifically for them. It offers an integrated, high-quality, well-organized framework for building LLM applications for AI agent workflows, Retrieval-Augmented Generation (RAG), and other use cases, with many ready-made core objects so developers can get started immediately.
o1 in Medicine is an artificial intelligence study focused on the medical domain, aiming to improve medical data processing and diagnostic accuracy with advanced language model technology. Conducted by researchers at UC Santa Cruz, the University of Edinburgh, and the National Institutes of Health, it demonstrates the o1 model's potential in medicine through tests on multiple medical datasets. The model's main advantages are high accuracy, multilingual support, and deep understanding of complex medical problems. The work responds to medicine's need for efficient, accurate data processing and analysis, especially for diagnosis and treatment recommendations. Research and application are still at an early stage, but the prospects in medical education and clinical practice are broad.
Platea AI is a platform for working with high-quality prompts that lets users quickly obtain and compare results across different language model providers and models. It supports running prompts in parallel and comparing outputs side by side, helping users choose the most suitable model.
Entropy-based sampling is a sampling technique grounded in entropy theory, used to improve the diversity and accuracy of language model text generation. The technique gauges the model's uncertainty by computing the entropy and varentropy of the next-token probability distribution, then adjusts the sampling strategy when the model risks falling into a local optimum or becoming overconfident. This helps avoid monotonic, repetitive output while increasing diversity when the model is uncertain.
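A small self-contained PyTorch sketch of the idea: compute the entropy and varentropy of the next-token distribution, then branch the decoding strategy on them. The thresholds and temperatures are illustrative choices, not values from the original work.

```python
import torch

def entropy_stats(logits: torch.Tensor):
    """Entropy and varentropy of the next-token distribution (in nats)."""
    logp = torch.log_softmax(logits, dim=-1)
    p = logp.exp()
    H = -(p * logp).sum(-1)                          # entropy: mean surprise
    V = (p * (logp + H.unsqueeze(-1)) ** 2).sum(-1)  # varentropy: variance of surprise
    return H, V

def sample_next(logits: torch.Tensor, h_low: float = 0.5, h_high: float = 3.0):
    """Illustrative policy: greedy when confident, hotter sampling when not."""
    H, _ = entropy_stats(logits)
    if H.item() < h_low:                  # low entropy -> model is confident
        return logits.argmax(-1)
    temp = 1.0 if H.item() < h_high else 1.3  # very uncertain -> raise temperature
    probs = torch.softmax(logits / temp, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

logits = torch.randn(50_000)  # stand-in for one decoding step of model output
print(sample_next(logits))
```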
WebLLM is a high-performance in-browser language model inference engine that uses WebGPU for hardware acceleration, allowing powerful language model operations to run directly in the web browser with no server-side processing. The project aims to bring large language models (LLMs) straight to the client, cutting costs while enhancing personalization and privacy. It supports many models, is compatible with the OpenAI API, integrates easily into projects, supports real-time interaction and streaming, and is well suited to building personalized AI assistants.
AMD-Llama-135m is a language model trained on the LLaMA2 architecture that loads and runs smoothly on AMD MI250 GPUs. The model generates text and code and is suitable for a variety of natural language processing tasks.
SFR-Judge is a family of judge models from Salesforce AI Research, designed to accelerate evaluating and fine-tuning large language models (LLMs). The models can perform a variety of evaluation tasks, including pairwise comparison, single-response scoring, and binary classification, while providing explanations that avoid black-box judgments. SFR-Judge performs well across multiple benchmarks, demonstrating its effectiveness at evaluating model outputs and guiding fine-tuning.
Show-Me is an open source application that offers a visual, transparent alternative to conventional large language model interactions such as ChatGPT. By decomposing complex problems into a series of reasoning subtasks, it lets users follow the model's step-by-step thought process. The application uses LangChain to talk to language models and visualizes the reasoning process in a dynamic graphical interface.
Llama-3.1-Nemotron-51B is a language model that NVIDIA derived from Meta's Llama-3.1-70B, optimized with neural architecture search (NAS) to reach high accuracy and efficiency. The model runs on a single NVIDIA H100 GPU, significantly shrinking its memory footprint and reducing memory bandwidth and compute while preserving excellent accuracy. It represents a new accuracy-efficiency trade-off for AI language models, giving developers and enterprises a cost-controllable, high-performance option.
Stability AI is a company focused on generative AI, offering a range of models spanning text-to-image, video, audio, 3D, and language. These models can process complex prompts and produce realistic images and video as well as high-quality music and sound effects. The company offers flexible licensing, including self-hosted licenses and a platform API, to meet different users' needs. Stability AI is committed to bringing high-quality AI to everyone through open models.
DataGemma is the first family of open models designed to curb AI hallucinations by grounding responses in the extensive real-world statistics of Google's Data Commons platform. The models strengthen a language model's factuality and reasoning through two different methods, reducing hallucinations and improving accuracy and reliability. DataGemma's launch is a notable advance in improving data accuracy and reducing misinformation, with real significance for researchers, policymakers, and ordinary users.
Chat With Your Docs is a Python application that lets users chat with multiple document formats, such as PDFs, web pages, and YouTube videos. Users ask questions in natural language, and the application answers based on the document content, using language models to generate accurate responses. Note that the app only answers questions related to the loaded documents.
ell is a lightweight language model programming library that treats prompts as functions rather than plain strings. Its design draws on years of experience building and using language models at OpenAI and across the startup ecosystem. It offers a new way of programming in which developers define functions that produce the string prompt or message list sent to the language model. This encapsulation gives users a clean interface and lets them focus only on the data the language model program (LMP) needs. ell also ships rich tooling for monitoring, versioning, and visualization, turning prompt engineering from a dark art into a science.
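The prompts-as-functions idea looks like this in practice, following ell's documented usage (the model choice and prompt text are illustrative): the decorated function's docstring becomes the system prompt and its return value becomes the user message.

```python
import ell

@ell.simple(model="gpt-4o-mini")  # illustrative model choice
def name_product(keywords: str):
    """You are a creative branding assistant."""          # system prompt
    return f"Suggest five product names for: {keywords}"  # user prompt

# Calling the function runs the LM and returns its completion as a plain string.
print(name_product("a local-first note-taking app"))
```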
rStar is a self-play mutual reasoning method that significantly improves the reasoning of small language models (SLMs) without fine-tuning or help from stronger models, by splitting reasoning into solution generation and mutual verification. rStar builds higher-quality reasoning trajectories with a combination of Monte Carlo Tree Search (MCTS) and human-like reasoning actions, then verifies those trajectories with a second SLM of similar capability acting as a discriminator. The approach has been evaluated extensively across multiple SLMs, demonstrating its effectiveness on diverse reasoning problems.
MiniCPM3-4B is the third generation of the MiniCPM series. It surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 in overall performance and is on par with many recent 7B-9B models. Compared with its two predecessors, MiniCPM3-4B is more versatile, supporting function calling and a code interpreter, which broadens its applicability. It also has a 32k context window and, with LLMxMapReduce, can in theory handle unbounded context without large memory requirements.
Zamba2-mini is a small language model released by Zyphra Technologies Inc., designed for on-device applications. It achieves evaluation scores and performance comparable to much larger models while keeping a very small memory footprint (<700MB). The model uses 4-bit quantization, cutting its size roughly sevenfold while maintaining the same performance. Zamba2-mini excels in inference efficiency, with faster time to first token, lower memory overhead, and lower generation latency than larger models such as Phi3-3.8B. The model's weights are released open source (Apache 2.0), letting researchers, developers, and companies build on its capabilities and push the frontier of efficient foundation models.
Phi-3 is a family of small language models (SLMs) from Microsoft Azure that delivers breakthrough performance while keeping cost and latency low. Designed for generative AI solutions, the models are smaller and less computationally demanding. Phi-3 is developed in line with Microsoft's AI principles, including accountability, transparency, fairness, reliability and safety, privacy and security, and inclusiveness. Phi-3 also supports local deployment, accurate and relevant answers, low-latency deployments, cost-constrained tasks, and accuracy customization.
Grok-2 is xAI's frontier language model with state-of-the-art reasoning capabilities. This release includes two members of the Grok family, Grok-2 and Grok-2 mini, both now available to Grok users on the 𝕏 platform. Grok-2 is a significant advance over Grok-1.5, with frontier capabilities in chat, programming, and reasoning; Grok-2 mini is its small but capable sibling. An early version of Grok-2 was tested on the LMSYS leaderboard under the name "sus-column-r", where it surpassed Claude 3.5 Sonnet and GPT-4-Turbo in overall Elo score.
Turtle Benchmark is a new, cheat-proof benchmark based on the 'Turtle Soup' lateral-thinking game, focused on evaluating the logical reasoning and context understanding of large language models (LLMs). It yields objective, unbiased results by removing the need for background knowledge, producing quantifiable scores, and using real user-generated questions so that models cannot be gamed.
Qwen2-Math is a series of language models built on Qwen2 LLMs specifically for solving mathematical problems. Its performance on math-related tasks surpasses existing open source and closed source models, giving the scientific community substantial help with advanced math problems that demand complex, multi-step logical reasoning.
Peach-9B-8k-Roleplay is a large language model fine-tuned specifically for role-playing dialogue. Based on the 01-ai/Yi-1.5-9B model, it was trained on more than 100K conversations produced through data synthesis. Despite its modest parameter count, it may be the best performer among language models under 34B parameters.
RedCache-AI is a dynamic memory framework for large language models and agents that lets developers build a wide range of applications, from AI-driven dating apps to medical diagnosis platforms. It addresses the fact that existing solutions are expensive, closed source, or lack broad support for external dependencies.
llm-colosseum is an innovative benchmarking tool that uses the game Street Fighter III to evaluate the real-time decision making of large language models (LLMs). Unlike traditional benchmarks, it tests a model's quick reactions, smart strategy, creative thinking, adaptability, and resilience by simulating live game scenarios.
Llama3.1-8B-Chinese-Chat is an instruction-tuned language model based on Meta-Llama-3.1-8B-Instruct, built for Chinese and English users, with capabilities such as role playing and tool use. Fine-tuned with the ORPO algorithm, it markedly reduces cases where Chinese questions are answered in English or answers mix Chinese and English, and it improves especially in role playing, function calling, and mathematics.
Meta Llama 3.1 is a series of pre-trained and instruction-tuned multilingual large language models (LLMs) supporting 8 languages, optimized for conversational use cases, with safety and helpfulness improved through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
Meta Llama 3.1-405B belongs to a series of large multilingual pre-trained language models developed by Meta, available in three sizes: 8B, 70B, and 405B. The models use an optimized Transformer architecture, tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to match human preferences for helpfulness and safety. Llama 3.1 supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, performs well on a range of natural language generation tasks, and outperforms many existing open source and closed chat models on industry benchmarks.
Aphrodite is the official backend engine of PygmalionAI, built to serve the inference endpoint for the PygmalionAI website and deliver Pygmalion models to large numbers of users at very high speed. Aphrodite leverages vLLM's PagedAttention to provide continuous batching, efficient key-value cache management, and optimized CUDA kernels, and supports multiple quantization schemes to boost inference performance.
DCLM-baseline is a pre-training dataset for language model benchmarking, containing 4T tokens across 3B documents. It is extracted from the Common Crawl corpus through carefully designed data cleaning, filtering, and deduplication steps, and is meant to demonstrate how much data curation matters for training efficient language models. The dataset is for research use only; it is not intended for production or for domain-specific model training such as coding or mathematics.
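At 4T tokens the corpus is far too large to download outright, so streaming is the natural access pattern. A sketch with the Hugging Face datasets library, where the Hub id and the text field name are assumptions to verify against the dataset card:

```python
from datasets import load_dataset

# Assumed Hub id for the DCLM baseline corpus; check the dataset card.
ds = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)

for doc in ds.take(3):        # stream a few documents instead of downloading 4T tokens
    print(doc["text"][:200])  # "text" field name assumed from common convention
```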
DCLM-Baseline-7B is a 7-billion-parameter language model developed by the DataComp for Language Models (DCLM) team, primarily in English. It aims to show how systematic data curation improves language model performance. Training used PyTorch with the OpenLM framework, the AdamW optimizer, a learning rate of 2e-3, weight decay of 0.05, a batch size of 2048 sequences, a sequence length of 2048 tokens, and a total of 2.5T training tokens, on H100 GPU hardware.