GPT-5 is positioned as the next milestone in AI development, with major claimed gains in reasoning, problem-solving, and language understanding. Refer to the official website for pricing.
Grok 4 is the latest version of xAI's large language model, officially released in July 2025. It offers leading natural-language, mathematics, and reasoning capabilities and ranks among the top AI models. Grok 4 represents a major step forward: xAI skipped the expected Grok 3.5 release to accelerate progress in an intensely competitive AI market.
This platform is a resource hub focused on AI pre-trained models, aggregating a large number of models across different types, scales, and application scenarios. Its value lies in giving AI developers and researchers convenient access to models and lowering the barrier to model development. Key advantages include detailed model classification, powerful multi-dimensional filtering, rich model information, and intelligent recommendations. The platform emerged as demand for pre-trained models grew alongside AI technology, and it positions itself as an AI model resource platform. Some models are free for commercial use while others require payment; pricing varies by model.
Pythagora is an all-in-one AI development platform that provides real debugging tools and production capabilities to help you ship working applications. Its main advantage is powerful AI-assisted development that makes applications more intelligent.
DeepSeek R1-0528 is the latest release from DeepSeek, the well-known open-source large-model platform, offering high-performance natural language processing and programming capabilities. The release attracted wide attention for its excellent performance on programming tasks and its ability to answer complex questions accurately. The model supports a variety of application scenarios and is an important tool for developers and AI researchers. More detailed model information and usage guides are expected to follow.
DMind-1 and DMind-1-mini are domain-specific large language models for Web3 tasks, offering higher domain accuracy, instruction-following ability, and professional understanding than general-purpose models. Fine-tuned on expert-curated Web3 data and aligned with human feedback through reinforcement learning, DMind-1 handles complex instructions and multi-turn conversations, making it a fit for blockchain, DeFi, and smart contracts. DMind-1-mini is a lighter version designed for real-time, resource-efficient applications, especially agent deployment and on-chain tooling. Pricing and further details require confirmation.
ZeroSearch is a novel reinforcement learning framework designed to incentivize the search capabilities of large language models (LLMs) without interacting with real search engines. Through supervised fine-tuning, ZeroSearch turns an LLM into a retrieval module capable of generating both relevant and irrelevant documents, and introduces a curriculum rollout mechanism to progressively elicit the model's reasoning capabilities. Its main advantage is that it outperforms models trained against real search engines while incurring zero API cost. It works with LLMs of all sizes and supports different reinforcement learning algorithms, making it suitable for research and development teams that need efficient retrieval capabilities.
DeepSeek-Prover-V2-671B is an advanced AI model designed to provide powerful reasoning capabilities, particularly for formal theorem proving. Built on the latest technology, it suits a variety of application scenarios. The model is open source, aiming to democratize AI technology, lower technical barriers, and let more developers and researchers innovate with AI. Users can improve their work efficiency and advance a wide range of projects with it.
Xiaomi MiMo is the first large reasoning model open-sourced by Xiaomi. Purpose-built for reasoning tasks, it has excellent mathematical reasoning and code generation capabilities. The model performed well on public evaluation sets for mathematical reasoning (AIME 24-25) and code competition (LiveCodeBench v5), surpassing larger models such as OpenAI's o1-mini and Alibaba Qwen's QwQ-32B-Preview with only 7B parameters. MiMo improves reasoning through multi-level innovations across the pre-training and post-training stages, including data mining, training strategies, and reinforcement learning algorithms. Open-sourcing the model gives researchers and developers a powerful tool and promotes further progress in AI reasoning.
Arkain is a cloud development environment (CDE) service designed to maximize developer and team productivity. It provides powerful collaboration capabilities for developing and deploying services anytime, anywhere.
Qwen3 is the latest large language model from the Tongyi Qianwen team, designed to give users efficient and flexible solutions through strong reasoning and fast response. The model supports multiple thinking modes, can adjust its reasoning depth to the demands of the task, and covers 119 languages and dialects, making it suitable for international applications. The release and open-sourcing of Qwen3 will significantly advance research on large foundation models and help researchers, developers, and organizations worldwide build innovative solutions with a cutting-edge model.
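A minimal sketch of toggling Qwen3's thinking mode through the Hugging Face transformers chat template; the model ID and the enable_thinking flag follow Qwen's published usage, but treat the details as assumptions and check the model card:

```python
# Sketch: switching Qwen3 between thinking and non-thinking modes.
# Assumes the Hugging Face "Qwen/Qwen3-8B" checkpoint and its chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain why the sky is blue."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False to skip the deliberate reasoning phase
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```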
XcodeBuildMCP is a server that implements the Model Context Protocol (MCP), designed for programmatic interaction with Xcode projects through a standardized interface. The tool eliminates reliance on manual operations and potentially erroneous command line calls, providing developers and AI assistants with an efficient and reliable workflow. It streamlines the development process by allowing AI agents to automatically verify code changes, build projects, and check for errors.
GPT-4.1 is a family of new models delivering significant performance improvements, particularly in coding, instruction following, and long-context processing. Its context window expands to 1 million tokens, and it performs well in real-world applications, making it suitable for developers building more efficient applications. The models are relatively low-priced and respond quickly, making complex tasks cheaper and faster to develop and execute.
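A hedged sketch of calling GPT-4.1 with the official openai Python client; the model identifier follows OpenAI's announcement, but confirm the exact name in the API docs:

```python
# Sketch: querying GPT-4.1 via the official openai client.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4.1",  # model name as announced; confirm in the API docs
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python one-liner that reverses a string."},
    ],
)
print(response.choices[0].message.content)
```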
GLM-4-32B is a high-performance generative language model designed to handle a variety of natural language tasks. Trained with deep learning techniques, it can generate coherent text and answer complex questions. The model targets academic research, commercial applications, and developers, is reasonably priced, and is a leading product in the field of natural language processing.
Skywork-OR1 is a high-performance mathematical and code reasoning model series developed by the Kunlun Wanwei Tiangong team. It achieves industry-leading reasoning performance at its parameter scale, pushing past the bottlenecks large models face in logical understanding and complex task solving. The series includes three models: Skywork-OR1-Math-7B, Skywork-OR1-7B-Preview, and Skywork-OR1-32B-Preview, focused respectively on mathematical reasoning, general reasoning, and high-performance reasoning tasks. The open-source release covers not only the model weights but also the full training dataset and training code, all uploaded to GitHub and Hugging Face, giving the AI community a fully reproducible practical reference. This comprehensive open-source strategy helps advance the community's collective progress in reasoning research.
Dream 7B is the latest diffusion large language model, jointly launched by the NLP Group of the University of Hong Kong and Huawei Noah's Ark Lab. It demonstrates excellent text generation performance, especially in complex reasoning, long-horizon planning, and contextual coherence. Trained with advanced methods, it offers strong planning ability and flexible reasoning, providing more powerful support for various AI applications.
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model based on Llama-3.1-405B-Instruct that undergoes multi-stage post-training to improve its reasoning and chat capabilities. The model supports context lengths up to 128K, strikes a good balance between accuracy and efficiency, is suitable for commercial use, and aims to give developers powerful AI assistant functionality.
ComfyUI-Copilot is an intelligent assistant based on the Comfy-UI framework, designed to simplify and enhance the debugging and deployment process of AI algorithms through natural language interaction. The product is designed to lower the development barrier and make it easy for even beginners to use. Its intelligent recommendation function and real-time support can significantly improve development efficiency and solve problems encountered during the development process. At the same time, ComfyUI-Copilot supports multiple models and provides detailed node queries and workflow suggestions to provide users with comprehensive development assistance. The project is still in its early stages, and users can get the latest code and feature updates via GitHub.
Qwen2.5-Omni is a new generation of end-to-end multi-modal flagship model launched by Alibaba Cloud Tongyi Qianwen team. Designed for all-round multi-modal perception, the model can seamlessly process multiple input forms such as text, images, audio and video, and simultaneously generate text and natural speech synthesis output through real-time streaming responses. Its innovative Thinker-Talker architecture and TMRoPE positional encoding technology enable it to perform well in multi-modal tasks, especially in audio, video and image understanding. The model outperforms similar-sized single-modal models on multiple benchmarks, demonstrating strong performance and broad application potential. Currently, Qwen2.5-Omni is open source on Hugging Face, ModelScope, DashScope and GitHub, providing developers with rich usage scenarios and development support.
Gemini 2.5 is the most advanced AI model launched by Google. It has efficient inference and coding performance, can handle complex problems, and performs well in multiple benchmark tests. The model introduces new thinking capabilities, combines enhanced basic models and post-training to support more complex tasks, aiming to provide powerful support for developers and enterprises. Gemini 2.5 Pro is available in Google AI Studio and the Gemini app for users who require advanced inference and coding capabilities.
Miaida is the first no-code tool from Baidu. It aims to let everyone turn any idea into working software through natural language, building applications without writing code. The platform greatly lowers the barrier to application development and improves efficiency through conversational development, multi-agent collaboration, and multi-tool invocation. Miaida's launch marks a new era in which realizing ideas becomes simpler, faster, and more efficient. It is currently in a free trial phase, so users can experience its capabilities at no cost, giving individuals and enterprises an efficient, low-cost application development option.
Gemini Embedding is an experimental text embedding model launched by Google and served through the Gemini API. The model outperforms previous top models on the Multilingual Text Embedding Benchmark (MTEB). It converts text into high-dimensional numerical vectors that capture semantic and contextual information, and is widely used in retrieval, classification, similarity detection, and other scenarios. Gemini Embedding supports more than 100 languages, accepts inputs up to 8K tokens, produces 3K-dimensional outputs, and introduces Matryoshka Representation Learning (MRL) to flexibly truncate dimensions to meet storage needs. The model is currently experimental, with a stable version to follow.
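A hedged sketch of requesting an embedding through the google-generativeai client; the experimental model ID is an assumption, so check the Gemini API docs for the current name:

```python
# Sketch: requesting a Gemini embedding via the google-generativeai client.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",  # assumed experimental model ID
    content="Retrieval-augmented generation combines search with LLMs.",
    task_type="retrieval_document",
)
print(len(result["embedding"]))  # high-dimensional vector (up to 3K dims)
```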
Instella is a series of high-performance open source language models developed by the AMD GenAI team and trained on the AMD Instinct™ MI300X GPU. The model significantly outperforms other open source language models of the same size and is functionally comparable to models such as Llama-3.2-3B and Qwen2.5-3B. Instella provides model weights, training code, and training data to advance the development of open source language models. Its key benefits include high performance, open source and optimized support for AMD hardware.
Mahilo is a powerful AI agent integration platform designed to connect AI agents from different frameworks for real-time communication under human supervision. It supports popular agent frameworks such as LangGraph and Pydantic AI through a framework-agnostic communication protocol, and proprietary agents can connect via APIs. The platform emphasizes intelligent collaboration, organization-level policy management, and human-centered design, preserving human control alongside automation. Mahilo offers a flexible way to build complex multi-agent systems for scenarios ranging from content creation to emergency response. It currently has 251 stars on GitHub and over 500 PyPI downloads per month, reflecting its popularity among developers. Mahilo mainly targets developers and enterprise users, helping them quickly build and deploy multi-agent systems to improve efficiency and innovation.
GibberLink is an AI communication demo based on the ggwave data-over-sound transmission protocol. It allows two independent AI agents that recognize each other as AI during a conversation to switch from spoken English to a sound-based data protocol. The project demonstrates AI's flexibility in detecting and switching communication channels and has research and application value. It builds on an open-source protocol, making it suitable for secondary development and integration. No price is mentioned explicitly, but its open-source nature means developers can use and extend it for free.
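A small loopback sketch of the underlying transport using the ggwave Python bindings; the encode/decode calls follow the ggwave README, but treat the chunked loopback details as assumptions:

```python
# Sketch: encode a payload to sound and decode it back with ggwave,
# the data-over-sound transport GibberLink builds on.
import ggwave

waveform = ggwave.encode("hello from agent A")  # float32 PCM as bytes

instance = ggwave.init()
payload = None
step = 4096  # feed the waveform back in chunks, as a loopback test
for i in range(0, len(waveform), step):
    res = ggwave.decode(instance, waveform[i:i + step])
    if res is not None:
        payload = res
print(payload)  # expected: b'hello from agent A'
ggwave.free(instance)
```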
OOMOL Studio is an AI workflow IDE for developers and data scientists. It helps users easily connect code snippets and API services through intuitive visual interactions, thereby shortening the distance from ideas to products. This product supports programming languages such as Python and Node.js, and has built-in rich AI function nodes and large model APIs, which can meet the needs of users in multiple scenarios such as data processing and multimedia processing. Its main advantages include intuitive interaction, pre-installed environment, programming friendliness and community sharing. The product is positioned as an efficient and convenient AI development tool suitable for users with different technical levels.
ViDoRAG is a new multi-modal retrieval-enhanced generation framework developed by Alibaba's natural language processing team, specifically designed for complex reasoning tasks in processing visually rich documents. This framework significantly improves the robustness and accuracy of the generative model through dynamic iterative inference agents and a Gaussian Mixture Model (GMM)-driven multi-modal retrieval strategy. The main advantages of ViDoRAG include efficient processing of visual and textual information, support for multi-hop reasoning, and high scalability. The framework is suitable for scenarios where information needs to be retrieved and generated from large-scale documents, such as intelligent question answering, document analysis and content creation. Its open source nature and flexible modular design make it an important tool for researchers and developers in the field of multimodal generation.
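To make the GMM-driven retrieval idea concrete, here is a minimal illustration (not ViDoRAG's actual code) using scikit-learn: fit a two-component Gaussian mixture over query-document similarity scores and keep only documents assigned to the higher-mean component, so the cutoff adapts per query:

```python
# Illustration of a GMM-driven retrieval cutoff (not the authors' code).
import numpy as np
from sklearn.mixture import GaussianMixture

scores = np.array([0.91, 0.88, 0.52, 0.49, 0.47, 0.12, 0.09]).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
relevant_component = int(np.argmax(gmm.means_))  # higher-mean = "relevant"
labels = gmm.predict(scores)
keep = np.where(labels == relevant_component)[0]
print("kept document indices:", keep)
```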
M2RAG is a benchmark code library for retrieval augmentation generation in multimodal contexts. It answers questions by retrieving documents across multiple modalities and evaluates the ability of multimodal large language models (MLLMs) in leveraging multimodal contextual knowledge. The model was evaluated on tasks such as image description, multimodal question answering, fact verification, and image rearrangement, aiming to improve the effectiveness of the model in multimodal context learning. M2RAG provides researchers with a standardized testing platform that helps advance the development of multimodal language models.
Phi-4-mini-instruct is a lightweight open-source language model from Microsoft, part of the Phi-4 model family. It is trained on synthetic data and filtered public web data, focusing on high-quality, reasoning-dense data. The model supports a 128K-token context length and strengthens instruction following and safety through supervised fine-tuning and direct preference optimization. Phi-4-mini-instruct performs well in multilingual support, reasoning (especially mathematical and logical reasoning), and low-latency scenarios, making it suitable for resource-constrained environments. It was released in February 2025 and supports multiple languages, including English, Chinese, and Japanese.
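A short sketch of running the model with the transformers pipeline; the model ID matches Microsoft's Hugging Face release, though generation settings are illustrative:

```python
# Sketch: running Phi-4-mini-instruct with the transformers pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",
    device_map="auto",
)
messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```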
Cloudflare AI Agents is a platform based on Cloudflare Workers and Workers AI, designed to help developers build AI agents that can perform tasks autonomously. The platform enables developers to quickly create, deploy and manage AI agents by providing agents-sdk and other tools. Its main advantages are low latency, high scalability and cost-effectiveness, while supporting automation and dynamic decision-making for complex tasks. Cloudflare's globally distributed network and Durable Objects technology provide a powerful foundation for AI agents.
bRAG-langchain is an open source project focusing on the research and application of Retrieval-Augmented Generation (RAG) technology. RAG is an AI technology that combines retrieval and generation, providing users with more accurate and richer information by retrieving relevant documents and generating answers. This project provides a basic to advanced RAG implementation guide to help developers quickly get started and build their own RAG applications. Its main advantages are that it is open source, flexible and easy to expand, and is suitable for various application scenarios requiring natural language processing and information retrieval.
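The basic pattern the guide teaches can be sketched in a few lines of LangChain; this is a minimal assumption-laden example (requires langchain-openai, faiss-cpu, and an OPENAI_API_KEY), not the project's exact code:

```python
# Minimal RAG loop: split, embed, retrieve, then answer with the context.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = ["RAG retrieves documents and feeds them to an LLM as context.",
        "FAISS performs fast similarity search over dense vectors."]
chunks = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20).create_documents(docs)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

question = "How does RAG use retrieved documents?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=2))
answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```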
Langflow is a low-code tool for developers focused on simplifying the process of building AI agents and workflows. It allows developers to quickly build complex AI applications through a visual interface and supports the integration of multiple APIs, models and databases. This tool helps developers focus on creativity rather than complex code implementation by providing a rich set of pre-built components and customizable options. Langflow provides a free trial and supports deployment on cloud platforms, making it suitable for a wide range of users from individual developers to enterprise teams.
NovaSky is an AI technology platform focused on improving the performance of code generation and reasoning models. Through innovative test-time scaling techniques (such as S*) and reinforcement-learning-based reasoning distillation, it significantly improves the performance of non-reasoning models, making them stand out in code generation. The platform provides developers with efficient, low-cost model training and optimization solutions for higher efficiency and accuracy in programming tasks. NovaSky originates from the Sky Computing Lab @ Berkeley, with strong academic backing and a cutting-edge research foundation. It currently offers a variety of model optimization methods, including inference cost optimization and model distillation, to meet different developers' needs.
Lora is a local language model optimized for mobile devices that can be integrated into mobile applications quickly through its SDK. It supports iOS and Android, delivers performance comparable to GPT-4o-mini, weighs in at 1.5GB with 2.4 billion parameters, and is optimized for real-time on-device inference. Lora's main advantages are low energy consumption, light weight, and fast response, giving it a clear edge over other models in power draw, size, and speed. Lora is provided by PeekabooLabs, mainly for developers and enterprise customers, helping them bring advanced language model capabilities to mobile apps to improve user experience and competitiveness.
PIKE-RAG is a domain knowledge and reasoning enhanced generative model developed by Microsoft, designed to enhance the capabilities of large language models (LLM) through knowledge extraction, storage and reasoning logic. Through multi-module design, this model can handle complex multi-hop question and answer tasks, and significantly improves the accuracy of question and answer in fields such as industrial manufacturing, mining, and pharmaceuticals. The main advantages of PIKE-RAG include efficient knowledge extraction capabilities, powerful multi-source information integration capabilities, and multi-step reasoning capabilities, making it perform well in scenarios that require deep domain knowledge and complex logical reasoning.
SWE-Lancer is a benchmark launched by OpenAI to evaluate how cutting-edge language models perform on real-world freelance software engineering tasks. The benchmark covers independent engineering tasks ranging from a $50 bug fix to a $32,000 feature implementation, as well as managerial tasks such as choosing between competing technical implementation proposals. By mapping model performance to monetary value, SWE-Lancer offers a new perspective on the economic impact of AI model development and advances related research.
Prototype is a template for quickly bootstrapping Django projects. It integrates OpenAI functionality and deploys conveniently via Docker containerization, giving developers an efficient starting point for getting an AI-capable web application up and running. The template simplifies environment configuration and project scaffolding so developers can focus on core functionality while using OpenAI's capabilities to add intelligent features. The project is open source under the MIT license, suited to developers who want to build intelligent web applications quickly.
OmniParser V2 is an advanced artificial intelligence model developed by Microsoft's research team, designed to transform large language models (LLM) into intelligent agents capable of understanding and operating graphical user interfaces (GUIs). This technology enables LLM to more accurately identify interactable icons and perform predetermined actions on the screen by converting interface screenshots from pixel space into interpretable structural elements. OmniParser V2 has made significant improvements in detecting small icons and fast inference, achieving an average accuracy of 39.6% on the ScreenSpot Pro benchmark when combined with GPT-4o, far exceeding the original model's 0.8%. In addition, OmniParser V2 also provides the OmniTool tool, which supports use with a variety of LLMs, further promoting the development of GUI automation.
OpenThinker-32B is an open source reasoning model developed by the Open Thoughts team. It achieves powerful inference capabilities by scaling data scale, validating inference paths, and scaling model sizes. The model outperforms existing open data inference models on inference benchmarks in mathematics, code, and science. Its main advantages include open source data, high performance and scalability. The model is fine-tuned based on Qwen2.5-32B-Instruct and trained on large-scale data sets, aiming to provide researchers and developers with powerful inference tools.
R1-V is a project focused on enhancing the generalization capabilities of vision-language models (VLMs). Using reinforcement learning with verifiable rewards (RLVR), it significantly improves VLM generalization on visual counting tasks, especially on out-of-distribution (OOD) tests. The technique matters because it achieves effective optimization of large models at extremely low cost (only $2.62 in training compute), offering new ideas for putting vision-language models to practical use. The project builds on improvements to existing VLM training methods, with the goal of improving performance on complex visual tasks through innovative training strategies. R1-V's open-source nature also makes it an important resource for researchers and developers exploring and applying advanced VLM techniques.
Dolphin R1 is a dataset created by the Cognitive Computations team to train reasoning models similar to the DeepSeek-R1 Distill models. The dataset contains 300,000 reasoning samples from DeepSeek-R1, 300,000 reasoning samples from Gemini 2.0 Flash Thinking, and 200,000 Dolphin chat samples. Together these provide researchers and developers with rich training resources for improving models' reasoning and conversational capabilities. The dataset's creation was sponsored by Dria, Chutes, Crusoe Cloud, and other companies, which provided computing resources and financial support. The release of Dolphin R1 provides an important foundation for natural language processing research and promotes the development of related technologies.
This product is a React component designed for RAG (Retrieval Augmented Generation) AI Assistant. It combines Upstash Vector for similarity search, Together AI for LLM (Large Language Model), and Vercel AI SDK for streaming responses. This component-based design allows developers to quickly integrate RAG capabilities into Next.js applications, greatly simplifying the development process and providing a high degree of customizability. Its main advantages include responsive design, support for streaming responses, persistence of chat history, and support for dark/light mode. This component is mainly aimed at developers who need to integrate intelligent chat functions in web applications, especially those teams using the Next.js framework. It reduces development costs by simplifying the integration process while providing powerful functionality.
Tülu 3 405B is an open-source language model developed by the Allen Institute for AI with 405 billion parameters. The model improves performance through reinforcement learning with verifiable rewards (RLVR), especially on mathematics and instruction-following tasks. It builds on the Llama 3.1 405B model and applies techniques such as supervised fine-tuning and preference optimization. Its open-source nature makes it a powerful research and development tool for applications that need a high-performance language model.
DeepSeek-R1-Distill-Qwen-1.5B is an open-source language model developed by the DeepSeek team, a distilled model built on the Qwen2.5 series. It uses large-scale reinforcement learning and data distillation techniques to significantly improve reasoning capability and performance while keeping the model small. It performs well on multiple benchmarks, with clear advantages in math, code generation, and reasoning tasks. The model permits commercial use and allows users to modify it and build derivative works, making it suitable for research institutions and enterprises developing high-performance natural language processing applications.
DeepSeek-R1-Distill-Qwen-14B is a distillation model based on Qwen-14B developed by the DeepSeek team, focusing on reasoning and text generation tasks. This model uses large-scale reinforcement learning and data distillation technology to significantly improve reasoning capabilities and generation quality, while reducing computing resource requirements. Its main advantages include high performance, low resource consumption, and broad applicability to scenarios that require efficient reasoning and text generation.
DeepSeek-R1-Distill-Llama-70B is a large language model developed by the DeepSeek team, built on the Llama 70B architecture and optimized through reinforcement learning and distillation. The model performs well on reasoning, conversational, and multilingual tasks and supports a variety of application scenarios, including code generation, mathematical reasoning, and natural language processing. Its main advantages are efficient reasoning and the ability to solve complex problems, while supporting open-source and commercial use. It suits enterprises and research institutions that need high-performance language generation and reasoning.
ShipAny is a NextJS template designed specifically for building AI SaaS startup projects. It helps developers quickly launch projects within hours through rich templates, components, and preconfigured infrastructure. Its main advantages include efficient time saving, strong technical support and flexible customization capabilities. ShipAny aims to lower the technical threshold for AI entrepreneurship, allowing developers and entrepreneurs to focus on core business logic and quickly transform ideas into actual products. Its pricing strategy is clear and suitable for entrepreneurs at different stages.
DeepSeek-R1-Zero is a reasoning model developed by the DeepSeek team that improves reasoning ability purely through reinforcement learning. The model exhibits powerful reasoning behaviors such as self-verification, reflection, and long-chain reasoning without any supervised fine-tuning. Its key benefits include efficient reasoning, usability without supervised fine-tuning, and superior performance on math, coding, and reasoning tasks. Built on the DeepSeek-V3 architecture, it supports large-scale inference tasks and suits research and commercial applications.
DeepSeek-R1 is the first-generation inference model launched by the DeepSeek team. It is trained through large-scale reinforcement learning and can demonstrate excellent inference capabilities without supervised fine-tuning. The model performs well on math, coding, and inference tasks and is comparable to the OpenAI-o1 model. DeepSeek-R1 also provides a variety of distillation models suitable for scenarios with different scales and performance requirements. Its open source nature provides powerful tools for the research community and supports commercial use and secondary development.
kokoro-onnx is a text-to-speech (TTS) project based on the Kokoro model and the ONNX runtime. It currently supports English, with French, Japanese, Korean, and Chinese planned. The model is fast, achieving near-real-time performance on an M1 Mac, and offers multiple voice options, including a whisper voice. It is lightweight at about 300MB (about 80MB after quantization). The project is open source on GitHub under the MIT license, making it easy for developers to integrate and use.
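A sketch following the kokoro-onnx README pattern; the model/voice file names and the voice ID are assumptions, so check the repository for the current assets:

```python
# Sketch: synthesize speech with kokoro-onnx and save it as a WAV file.
import soundfile as sf
from kokoro_onnx import Kokoro

kokoro = Kokoro("kokoro-v0_19.onnx", "voices.json")  # assumed asset names
samples, sample_rate = kokoro.create(
    "Hello from kokoro-onnx.", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("hello.wav", samples, sample_rate)
```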
SakanaAI/self-adaptive-llms is an adaptive framework called Transformer² that aims to solve the challenges of traditional fine-tuning methods being computationally intensive and static in their ability to handle diverse tasks. The framework is able to adapt large language models (LLMs) to unseen tasks in real time during inference via a two-step mechanism: first, the scheduling system identifies task attributes; then, task-specific 'expert' vectors trained using reinforcement learning are dynamically blended to obtain target behaviors for the input prompts. Key advantages include real-time task adaptability, computational efficiency, and flexibility. The project was developed by the SakanaAI team and is currently open source on GitHub, with 195 stars and 12 forks.
InternLM3-8B-Instruct is a large language model developed by the InternLM team with excellent reasoning and knowledge-intensive task-processing capabilities. Trained on only 4 trillion high-quality tokens, it cuts training cost by more than 75% compared with models of the same class, while surpassing models such as Llama3.1-8B and Qwen2.5-7B on multiple benchmarks. It supports a deep-thinking mode for solving complex reasoning tasks through long chains of thought, while retaining smooth conversational ability. The model is open-sourced under the Apache-2.0 license and suits application scenarios that need efficient reasoning and knowledge processing.
MiniMax-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. It adopts a hybrid architecture combining Lightning Attention, Softmax Attention, and Mixture of Experts (MoE). Advanced parallelism strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), Varlen Ring Attention, and Expert Tensor Parallelism (ETP), extend the training context length to 1 million tokens, and the model can handle contexts of up to 4 million tokens at inference time. Across multiple academic benchmarks, MiniMax-01 demonstrates top-tier model performance.
Dria-Agent-a-3B is a large language model based on the Qwen2.5-Coder series, focused on agentic applications. It uses Pythonic function calling, with the advantages of parallel multi-function calls in a single turn, free-form reasoning and action, and on-the-fly generation of complex solutions. The model performs well on multiple benchmarks, including the Berkeley Function Calling Leaderboard (BFCL), MMLU-Pro, and the Dria-Pythonic-Agent-Benchmark (DPAB). It has 3.09B parameters and supports the BF16 tensor type.
Dria-Agent-α is a large language model (LLM) tool-interaction framework released on Hugging Face. It calls tools through Python code; compared with traditional JSON-based tool calling, this makes fuller use of the LLM's reasoning capabilities, letting the model solve complex problems in a way closer to human natural language. The framework leverages Python's popularity and near-pseudocode syntax to make LLMs perform better in agentic scenarios. Dria-Agent-α was developed with the synthetic data generation tool Dria, which uses a multi-stage pipeline to generate realistic scenarios and train models for complex problem solving. Two models, Dria-Agent-α-3B and Dria-Agent-α-7B, have been released on Hugging Face.
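To illustrate why Pythonic calling is more expressive than one JSON call per step, here is a constructed example (not Dria's actual prompt format) with hypothetical stub tools: the model can compose several calls with control flow in a single generated block:

```python
# Constructed illustration of Pythonic function calling.
# These stub tools stand in for whatever the host application would expose.
def get_weather(city):
    return {"rain_probability": 0.7}

def find_indoor_venue(date):
    return {"venue": "Museum Cafe", "date": date}

def send_confirmation(email, booking):
    print(f"Confirmed {booking['venue']} on {booking['date']} for {email}")

# JSON-style tool calling would emit one rigid call at a time, e.g.
#   {"name": "get_weather", "arguments": {"city": "Paris"}}
# A Pythonic agent instead generates a block like this:
forecast = get_weather(city="Paris")
if forecast["rain_probability"] > 0.5:
    booking = find_indoor_venue(date="2025-01-20")
    send_confirmation(email="user@example.com", booking=booking)
```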
Nemotron-CC is a 6.3 trillion token data set based on Common Crawl. It transforms the English Common Crawl into a 6.3 trillion token long-term pre-training data set through classifier integration, synthetic data rewriting, and reduced dependence on heuristic filters, containing 4.4 trillion global deduplicated original tokens and 1.9 trillion synthetically generated tokens. This dataset achieves a better balance between accuracy and data volume and is of great significance for training large language models.
Smolagents is a minimalist AI agent framework developed by the Hugging Face team, aiming to allow developers to deploy powerful agents with only a small amount of code. It focuses on code agents, where agents perform tasks by writing and executing Python code snippets, rather than generating JSON or text blocks. This model takes advantage of the ability of large language models (LLMs) to generate and understand code, provides better composition, flexibility and rich training data utilization, and can efficiently handle complex logic and object management. Smolagents is deeply integrated with Hugging Face Hub to facilitate the sharing and loading of tools and promote community collaboration. In addition, it also supports traditional tool calling agents and is compatible with a variety of LLMs, including models on Hugging Face Hub and models integrated through LiteLLM such as OpenAI and Anthropic. The emergence of Smolagents lowers the threshold for AI agent development and enables developers to build and deploy AI-driven applications more conveniently.
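A short sketch following the smolagents README pattern: a CodeAgent that writes and executes Python to answer a query (requires a Hugging Face API token for the hosted model):

```python
# Sketch: a minimal smolagents CodeAgent with a web search tool.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("How many seconds does light take to travel from the Sun to Earth?")
```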
Sky-T1-32B-Preview is an inference model developed by the NovaSky team at the University of California, Berkeley. The model performed well on popular inference and programming benchmarks, on par with o1-preview, and cost less than $450 to train, demonstrating the possibility of cost-effectively replicating advanced inference capabilities. The model is fully open source, including data, code, and model weights, and is designed to advance the academic and open source communities. Its main advantages are low cost, high performance and open source, providing a valuable resource for researchers and developers.
This model is a quantized version of a large language model. It uses 4-bit quantization to reduce storage and compute requirements and is suited to natural language processing. With 8.03B parameters, it is free for non-commercial use and fits users who need high-performance language applications in resource-constrained environments.
Voyage-3-large is the latest multilingual general-purpose embedding model from Voyage AI. The model ranks first across 100 datasets in eight domains, including law, finance, and code, surpassing OpenAI-v3-large and Cohere-v3-English. Using Matryoshka learning and quantization-aware training, it supports smaller output dimensions and int8 and binary quantization, significantly reducing vector database costs with minimal impact on retrieval quality. The model also supports a 32K-token context length, far exceeding OpenAI (8K) and Cohere (512).
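A hedged sketch of requesting a reduced-dimension, int8 embedding via the voyageai client; the output_dimension and output_dtype parameter names are assumptions based on Voyage AI's docs, so verify before use:

```python
# Sketch: embedding with voyage-3-large, requesting Matryoshka-style
# dimension reduction and quantized int8 outputs.
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
result = vo.embed(
    ["Contract clause about indemnification."],
    model="voyage-3-large",
    input_type="document",
    output_dimension=512,  # assumed parameter name
    output_dtype="int8",   # assumed parameter name
)
print(len(result.embeddings[0]))
```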
PaliGemma 2 is a visual-language model developed by Google. It combines the capabilities of the SigLIP visual model and the Gemma 2 language model. It can process image and text input and generate corresponding text output. This model performs well on a variety of visual-language tasks, such as image description, visual question answering, etc. Its main advantages include powerful multi-language support, efficient training architecture, and excellent performance on a variety of tasks. The development background of PaliGemma 2 is to solve the complex interaction problem between vision and language and help researchers and developers make breakthroughs in related fields.
PaliGemma 2 is a visual-language model developed by Google. It inherits the capabilities of the Gemma 2 model and is able to process image and text input and generate text output. The model performs well on a variety of visual language tasks, such as image description, visual question answering, etc. Its main advantages include strong multi-language support, efficient training architecture and wide applicability. This model is suitable for various application scenarios that require processing of visual and textual data, such as social media content generation, intelligent customer service, etc.
InternVL2_5-26B-MPO-AWQ is a multi-modal large-scale language model developed by OpenGVLab, aiming to improve the model's reasoning capabilities through mixed preference optimization. The model performs well in multi-modal tasks and is able to handle complex relationships between images and text. It adopts advanced model architecture and optimization technology, giving it significant advantages in multi-modal data processing. This model is suitable for scenarios that require efficient processing and understanding of multi-modal data, such as image description generation, multi-modal question answering, etc. Its main advantages include powerful inference capabilities and efficient model architecture.
WebUI is a user interface built on Gradio, designed to provide a convenient browser interaction experience for AI agents. This product supports a variety of large language models (LLM), such as Gemini, OpenAI, etc., allowing users to choose the appropriate model for interaction according to their own needs. The main advantage of WebUI is its user-friendly interface design and powerful customization functions. Users can use their own browsers to operate, avoiding the problem of repeated login and authentication. In addition, WebUI also supports high-definition screen recording function, providing users with more usage scenarios. This product is positioned to provide developers and researchers with a simple and easy-to-use AI interaction platform to help them better develop and research AI applications.
mlabonne/llm-datasets is a collection of high-quality datasets and tools focused on fine-tuning large language models (LLMs). The product provides researchers and developers with a range of carefully selected and optimized datasets to help them better train and optimize their language models. Its main advantage lies in the diversity and high quality of the data set, which can cover a variety of usage scenarios, thus improving the generalization ability and accuracy of the model. In addition, the product provides tools and concepts to help users better understand and use these data sets. Background information includes being created and maintained by mlabonne to advance the field of LLM.
CAG (Cache-Augmented Generation) is an innovative language model enhancement technology designed to solve the problems of retrieval delay, retrieval errors and system complexity existing in the traditional RAG (Retrieval-Augmented Generation) method. By preloading all relevant resources in the model context and caching their runtime parameters, CAG is able to generate responses directly during inference without the need for real-time retrieval. Not only does this approach significantly reduce latency and improve reliability, it also simplifies system design, making it a practical and scalable alternative. As the context window of large language models (LLMs) continues to expand, CAG is expected to play a role in more complex application scenarios.
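A minimal illustration of the idea with transformers (not the CAG repo's code): encode the knowledge once, keep its KV cache, then answer a question without re-encoding or retrieving; the small model choice is arbitrary:

```python
# Sketch: cache-augmented generation by preloading a KV cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # small model chosen for the demo
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

knowledge = "Fact: the Eiffel Tower is 330 meters tall.\n"
ctx = tok(knowledge, return_tensors="pt")
with torch.no_grad():
    cache = model(**ctx, use_cache=True).past_key_values  # preload once

question = tok("Q: How tall is the Eiffel Tower? A:", return_tensors="pt")
ids = torch.cat([ctx["input_ids"], question["input_ids"]], dim=-1)
# generate() skips re-encoding the cached prefix and extends from the cache.
out = model.generate(ids, past_key_values=cache, max_new_tokens=20)
print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```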
PRIME-RL/Eurus-2-7B-PRIME is a 7B parameter language model trained based on the PRIME method, aiming to improve the reasoning capabilities of the language model through online reinforcement learning. The model is trained from Eurus-2-7B-SFT, using the Eurus-2-RL-Data dataset for reinforcement learning. The PRIME method uses an implicit reward mechanism to make the model pay more attention to the reasoning process during the generation process, rather than just the results. The model performed well in multiple inference benchmarks, with an average improvement of 16.7% compared to its SFT version. Its main advantages include efficient inference improvements, lower data and model resource requirements, and excellent performance in mathematical and programming tasks. This model is suitable for scenarios that require complex reasoning capabilities, such as programming problem solving and mathematical problem solving.
VITA-1.5 is an open source multi-modal large language model designed to achieve near real-time visual and voice interaction. It provides users with a smoother interactive experience by significantly reducing interaction latency and improving multi-modal performance. The model supports English and Chinese and is suitable for a variety of application scenarios, such as image recognition, speech recognition, and natural language processing. Its main advantages include efficient speech processing capabilities and powerful multi-modal understanding capabilities.
EurusPRM-Stage2 is an advanced reinforcement learning model that optimizes the inference process of the generative model through implicit process rewards. This model uses the log-likelihood ratio of a causal language model to calculate process rewards, thereby improving the model's reasoning capabilities without increasing additional annotation costs. Its main advantage is the ability to learn process rewards implicitly using only response-level labels, thereby improving the accuracy and reliability of generative models. The model performs well in tasks such as mathematical problem solving and is suitable for scenarios requiring complex reasoning and decision-making.
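For readers who want the formula: in the PRIME/implicit-PRM line of work, this process reward is a scaled log-likelihood ratio between the trained causal LM and a frozen reference model (a sketch of the formulation; treat the notation as an approximation of the paper's):

```latex
% Implicit process reward at step t: beta scales the log-likelihood ratio
% between the trained policy \pi_\phi and the reference model \pi_{\text{ref}}.
r_\phi(y_t) = \beta \log \frac{\pi_\phi(y_t \mid \mathbf{x},\, y_{<t})}
                             {\pi_{\text{ref}}(y_t \mid \mathbf{x},\, y_{<t})}
```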
EurusPRM-Stage1 is part of the PRIME-RL project, which aims to enhance the inference capabilities of generative models through implicit process rewards. This model utilizes an implicit process reward mechanism to obtain process rewards during the inference process without the need for additional process labels. Its main advantage is that it can effectively improve the performance of generative models in complex tasks while reducing labeling costs. This model is suitable for scenarios that require complex reasoning and generation capabilities, such as mathematical problem solving, natural language generation, etc.
Memory Layers at Scale is an innovative memory layer implementation that uses a trainable key-value lookup mechanism to add additional parameters to the model without increasing the number of floating point operations. This approach is particularly important in large-scale language models because it can significantly improve the storage and retrieval capabilities of the model while maintaining computational efficiency. The main advantages of this technology include efficiently expanding model capacity, reducing computing resource consumption, and improving model flexibility and scalability. This project was developed by the Meta Lingua team and is suitable for scenarios that require processing large-scale data and complex models.
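A toy PyTorch sketch of the general mechanism (not Meta's optimized implementation): score trainable keys against a query, take the top-k, and return a weighted sum of the corresponding values, so capacity grows with the number of slots while compute stays roughly constant:

```python
# Toy sketch of a trainable key-value memory lookup.
import torch
import torch.nn as nn

class KeyValueMemory(nn.Module):
    def __init__(self, num_slots=4096, dim=256, topk=8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.topk = topk

    def forward(self, query):                      # query: (batch, dim)
        scores = query @ self.keys.T               # (batch, num_slots)
        w, idx = scores.topk(self.topk, dim=-1)    # sparse lookup: only top-k
        w = torch.softmax(w, dim=-1)               # normalize selected scores
        picked = self.values[idx]                  # (batch, topk, dim)
        return (w.unsqueeze(-1) * picked).sum(1)   # weighted sum of values

mem = KeyValueMemory()
print(mem(torch.randn(2, 256)).shape)  # torch.Size([2, 256])
```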
PRIME is an open source online reinforcement learning solution that enhances the reasoning capabilities of language models through implicit process rewards. The main advantage of this technology is its ability to effectively provide dense reward signals without relying on explicit process labels, thereby accelerating model training and improving inference capabilities. PRIME performs well on mathematics competition benchmarks, outperforming existing large-scale language models. Its background information includes that it was jointly developed by multiple researchers and related code and data sets were released on GitHub. PRIME is positioned to provide powerful model support for users who require complex reasoning tasks.
EXAONE 3.5 is a series of instruction-tuned bilingual (English and Korean) generative models developed by LG AI Research, with parameters ranging from 2.4B to 32B. These models support long context processing up to 32K tokens and demonstrate state-of-the-art performance on real-world use cases and long context understanding, while remaining competitive in the general domain compared to recently released models of similar size. EXAONE 3.5 models include: 1) 2.4B model, optimized for deployment on small or resource-constrained devices; 2) 7.8B model, matching the size of the previous generation model but offering improved performance; 3) 32B model, delivering powerful performance.
Bespoke Curator is an open source project that provides a rich Python-based library for generating and curating synthetic data. It features high-performance optimization, intelligent caching and failure recovery, and can work directly with the HuggingFace Dataset object. Key benefits of Bespoke Curator include its programmatic and structured output capabilities, the ability to design complex data generation pipelines, and the ability to inspect and optimize data generation strategies in real time via the built-in Curator Viewer.
StoryWeaver is a unified world model for knowledge-enhanced story character customization, enabling both single- and multi-character story visualization. Based on an AAAI 2025 paper, it handles character customization and visualization within one unified framework, which is significant for natural language processing and AI. StoryWeaver's main advantages include its ability to handle complex story scenarios and to be continuously updated and extended. The team plans to keep updating the arXiv paper and adding experimental results.
ModernBERT is a new generation encoder model jointly released by Answer.AI and LightOn. It is a comprehensive upgrade of the BERT model, providing longer sequence length, better downstream performance and faster processing speed. ModernBERT adopts the latest Transformer architecture improvements, pays special attention to efficiency, and uses modern data scales and sources for training. As an encoder model, ModernBERT performs well in various natural language processing tasks, especially in code search and understanding. It provides two model sizes: basic version (139M parameters) and large version (395M parameters), suitable for application needs of various sizes.
PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct-Q4_K_M-GGUF is a quantized large language model with 70B parameters, using 4-bit quantization to reduce model size and improve inference efficiency. The model belongs to the PatronusAI series and is built with the Transformers library, suiting application scenarios that demand high-performance natural language processing. It is released under the cc-by-nc-4.0 license, allowing non-commercial use and sharing.
ReactAI Components is a platform that uses artificial intelligence technology to help developers quickly build React components. It provides users with a solution to generate React components without writing code by integrating advanced AI models such as Claude/Anthropic. The main advantage of this product is that it can greatly improve development efficiency, reduce duplication of work, and enable non-professional developers to easily create high-quality React components. The product is currently in the Beta stage and is free to use without credit card information. It is suitable for developers and teams who want to quickly develop React applications.
FlagCX is a scalable and adaptive cross-chip communication library developed with support from the Beijing Academy of Artificial Intelligence (BAAI). It is part of the FlagAI-Open open source initiative, which aims to promote the open source ecosystem of AI technology. FlagCX utilizes the native collective communication library to fully support single-chip communication on different platforms. Supported communication backends include NCCL, IXCCL and CNCL.
SakanaAI/asal is a scientific research project that uses Foundation Models (FMs) to automatically search for Artificial Life (ALife). The project combines the latest artificial intelligence techniques, especially visual language-based models, to discover artificial life simulations capable of producing target phenomena, generating temporally open novelties, and illuminating an entire interesting and diverse simulation space. It can span a variety of ALife substrates, including Boids, Particle Life, Game of Life, Lenia, and Neural Cell Automata, demonstrating the potential of accelerating artificial life research through technical means.
YuLan-Mini is a lightweight language model developed by the AI Box team at Renmin University of China with 2.4 billion parameters. Although it uses only 1.08T tokens of pre-training data, its performance is comparable to industry-leading models trained on more data. The model is particularly strong at mathematics and coding. To promote reproducibility, the team will open-source the relevant pre-training resources.
The CogAgent-9B-20241220 model is based on the GLM-4V-9B bilingual open source VLM basic model. Through data collection and optimization, multi-stage training and strategy improvement, it has made significant progress in GUI perception, reasoning prediction accuracy, action space integrity and task generalization. The model supports bilingual (Chinese and English) interaction and can handle screenshots and language input. This version has been applied to ZhipuAI’s GLM-PC product and is designed to help researchers and developers advance the research and application of GUI agents based on visual language models.
PatronusAI/Llama-3-Patronus-Lynx-8B-v1.1-Instruct-Q8-GGUF is a quantized version of a Llama-based model designed for dialogue and hallucination detection. The model uses the GGUF format and has 8.03 billion parameters. Its significance lies in providing high-quality dialogue generation and hallucination detection while keeping the model efficient to run. Built on the Transformers library and GGUF technology, it suits application scenarios that require high-performance dialogue systems and content generation.
EXAONE-3.5-2.4B-Instruct-AWQ is a series of bilingual (English and Korean) instruction tuning generation models developed by LG AI Research, with parameters ranging from 2.4B to 32B. These models support long context processing up to 32K tokens and demonstrate state-of-the-art performance in real-world use cases and long context understanding, while remaining competitive in the general domain compared to recently released models of similar size. The model is optimized for deployment on small or resource-constrained devices and uses AWQ quantization technology to achieve 4-bit group weight quantization (W4A16g128).
Smolagents is a lightweight library that lets users run powerful agents with just a few lines of code. It is characterized by simplicity and supports any language model (LLM), including models on the Hugging Face Hub and models from OpenAI, Anthropic, and others integrated through LiteLLM. It provides first-class support for code agents, which perform actions by writing and executing code rather than emitting tool calls as JSON or text. Smolagents also offers security options for code execution, including a secure Python interpreter and a sandboxed environment using E2B.
CogAgent is a GUI agent based on the visual language model (VLM), which implements bilingual (Chinese and English) communication through screenshots and natural language. CogAgent has achieved significant progress in GUI awareness, inference prediction accuracy, operation space completeness, and task generalization. This model has been applied in ZhipuAI's GLM-PC product, aiming to help researchers and developers advance the research and application of GUI agents based on visual language models.
EXAONE-3.5-32B-Instruct is a series of instruction-tuned bilingual (English and Korean) generative models developed by LG AI Research, containing different models from 2.4B to 32B parameters. These models support long context processing up to 32K tokens and demonstrate state-of-the-art performance in real-world use cases and long context understanding, while remaining competitive in the general domain when compared to recently released models of similar size.
Llama-Lynx-70b-4bit-Quantized is a large text generation model developed by PatronusAI with 70 billion parameters, quantized to 4 bits to optimize model size and inference speed. The model is built on Hugging Face's Transformers library, supports multiple languages, and performs particularly well in dialogue and text generation. Its significance lies in reducing storage and compute requirements while maintaining high performance, allowing powerful AI models to be deployed in resource-constrained environments.
Llama-lynx-70b-4bitAWQ is a 70-billion-parameter text generation model hosted on Hugging Face, using 4-bit precision and AWQ quantization. The model matters for natural language processing, especially when large amounts of data and complex tasks must be handled. Its advantage is generating high-quality text while keeping computational costs low. It is compatible with the 'transformers' and 'safetensors' libraries and suits text generation tasks.
OpenAI o1 is a high-performance AI model designed to handle complex multi-step tasks with high accuracy. The successor to o1-preview, it has been used to build agent applications that streamline customer support, optimize supply-chain decisions and forecast complex financial trends. The o1 model ships with key production-ready features, including function calling, structured outputs, developer messages, vision capabilities, and more. The o1-2024-12-17 release set new top scores on multiple benchmarks, improving both cost efficiency and performance.
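A minimal sketch of calling o1 through the OpenAI Python SDK, exercising the developer-message role and structured outputs mentioned above; the JSON schema is illustrative, not part of the release.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="o1-2024-12-17",
    messages=[
        {"role": "developer", "content": "You are a supply-chain analyst. Be concise."},
        {"role": "user", "content": "Which supplier is riskiest: A (30% late), B (5% late), C (12% late)?"},
    ],
    response_format={  # structured output: constrain the reply to a JSON schema
        "type": "json_schema",
        "json_schema": {
            "name": "risk_flag",
            "schema": {
                "type": "object",
                "properties": {"supplier": {"type": "string"}, "reason": {"type": "string"}},
                "required": ["supplier", "reason"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # JSON string matching the schema
```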
PatronusAI/glider-gguf is a high-performance quantized language model published on the Hugging Face platform. It uses the GGUF format and ships in multiple quantized variants, such as BF16, Q8_0, Q5_K_M and Q4_K_M. Based on the phi3 architecture with 3.82B parameters, its main advantages are efficient compute performance and a small footprint, making it suitable for scenarios that demand fast inference and low resource consumption. The model is provided by PatronusAI for developers and enterprises that need natural language processing and text generation.
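A minimal sketch of pulling one quantized variant from the Hub with huggingface_hub; the file name is an assumption, so list the repo files to find the exact names of the BF16/Q8_0/Q5_K_M/Q4_K_M variants.

```python
from huggingface_hub import hf_hub_download

# Download the Q4_K_M variant (file name assumed) into the local HF cache.
path = hf_hub_download(
    repo_id="PatronusAI/glider-gguf",
    filename="glider_Q4_K_M.gguf",
)
print("GGUF file at:", path)  # pass this path to llama.cpp or llama-cpp-python
```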
ModernBERT-base is a modernized bidirectional encoder Transformer, pre-trained on 2 trillion tokens of English and code data, with native support for contexts up to 8192 tokens. The model adopts recent architectural improvements such as Rotary Positional Embeddings (RoPE), local-global alternating attention and unpadding, which make it perform well on long-text tasks. ModernBERT-base is suitable for tasks that require processing long documents, such as retrieval, classification and semantic search over large corpora. Because the training data is mainly English and code, performance in other languages may be reduced.
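A minimal masked-language-modeling sketch via the transformers pipeline, assuming the repo id answerdotai/ModernBERT-base.

```python
from transformers import pipeline

# ModernBERT is an encoder, so fill-mask is its natural raw task;
# for retrieval or classification it would be fine-tuned first.
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```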
Large Concept Models (LCM) is a large language model developed by Facebook Research that operates in a sentence-level representation space, using the SONAR embedding space to support text in up to 200 languages and speech in 57 languages. LCM is a sequence-to-sequence model for autoregressive sentence prediction; several training approaches were explored, including mean-squared-error regression and diffusion-based generative variants, using a 1.6B-parameter model and roughly 1.3T tokens of training data. LCM's main advantages are that it operates on high-level semantic representations and handles multilingual data, and its open-source release lets researchers and developers access and build on these models, advancing natural language processing.
Flock of Finches 37B-A11B v0.1 is the newest member of the RWKV family: an experimental model with 11 billion active parameters that, despite being trained on only 109 billion tokens, scores roughly on par with the recently released Finch 14B model on common benchmarks. The model uses an efficient sparse Mixture of Experts (MoE) approach, activating only a subset of its parameters for any given token, which saves time and compute during training and inference. Although this architectural choice costs more VRAM, the developers consider the ability to train and run a more capable model at low cost well worth it.
RWKV-6 Finch 7B World 3 is an open-source AI model with 7B parameters, trained on 3.1 trillion multilingual tokens. Notable for its environmentally friendly design and strong performance, the model aims to bring high-quality open-source AI to users worldwide, regardless of nationality, language or economic status. The RWKV architecture is designed to reduce environmental impact, consuming a fixed amount of computation per token regardless of context length.
Q-RWKV-6 32B Instruct Preview is the latest RWKV model variant, developed by Recursal AI, and it surpasses all previous RWKV, state-space and Liquid AI models on multiple English benchmarks. The model was produced by converting the weights of Qwen 32B Instruct into a custom QRWKV6 architecture, replacing the existing Transformer attention heads with RWKV-V6 attention heads, in a process developed jointly by the Recursal AI team with the RWKV and EleutherAI open-source communities. Its main advantages are a significant reduction in large-scale compute costs and environmentally friendly open-source AI technology.
WePOINTS is a series of multimodal models developed by the WeChat AI team, aiming at a unified framework that accommodates various modalities. The models leverage the latest advances in multimodal modeling to drive the seamless unification of content understanding and generation. The WePOINTS project provides not only the models but also pre-training datasets, evaluation tools and usage tutorials, making it an important contribution to the field of multimodal artificial intelligence.
Meta Motivo is the first behavioral foundation model released by Meta FAIR. Pre-trained with a novel unsupervised reinforcement learning algorithm, it controls a complex virtual humanoid agent to perform whole-body tasks. At test time the model can solve unseen tasks such as motion tracking, goal-pose reaching and reward optimization purely from prompts, without additional learning or fine-tuning. The significance of the technique lies in this zero-shot capability, which lets it handle a variety of complex tasks while maintaining robust behavior. Meta Motivo grew out of the pursuit of generalization to more complex tasks and different kinds of agents, and its open-source pre-trained model and training code encourage the community to push research on behavioral foundation models further.
Android XR is a platform from Google that helps developers create and optimize extended reality (XR) applications. It includes a set of tools, APIs and frameworks for building immersive XR experiences on Android devices. Android XR matters because it can turn traditional Android applications into XR experiences while letting developers use familiar tools and languages, lowering the barrier to XR development. Android XR will include Gemini, Google's AI assistant, providing device control and insight into what the wearer is seeing.
Phi-4 is the latest member of Microsoft's Phi family of small language models. With 14B parameters, it excels at complex reasoning, particularly mathematics. Phi-4 balances size and quality through high-quality synthetic datasets, curated organic data, and post-training innovations, embodying Microsoft's progress in small language models (SLMs) and pushing the boundaries of AI technology. Phi-4 is currently available on Azure AI Foundry and will arrive on the Hugging Face platform in the coming weeks.
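Once the checkpoint lands on the Hub, a minimal sketch could look like the following, assuming the repo id microsoft/phi-4 (not yet confirmed at the time of writing).

```python
from transformers import pipeline

# Assumed repo id; until the Hub release, Phi-4 is reachable via Azure AI Foundry.
pipe = pipeline("text-generation", model="microsoft/phi-4", device_map="auto")
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
print(pipe(messages, max_new_tokens=128)[0]["generated_text"])
```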