AgentSphere is cloud infrastructure designed specifically for AI agents, providing secure code execution and file processing to support a range of AI workflows. Built-in capabilities include AI data analysis, generated data visualization, and secure virtual desktops for agents, and it is designed to support complex workflows, DevOps integration, and LLM evaluation and fine-tuning.
Seed-Coder is a series of open source code large language models from the ByteDance Seed team, comprising base, instruct, and reasoning variants. It aims to curate its own code training data autonomously with minimal human effort, significantly improving programming capability. The models perform strongly among open source models of comparable size, suit a wide range of coding tasks, and are positioned to advance the open source LLM ecosystem for both research and industry.
Agent-as-a-Judge is an automated evaluation system in which agent systems evaluate other agent systems, improving both the efficiency and the quality of assessment. It significantly reduces evaluation time and cost while providing a continuous feedback signal that drives self-improvement of the agent system under evaluation. It is widely applicable to AI development tasks, especially code generation. The system is open source, making secondary development and customization easy.
Search-R1 is a reinforcement learning framework for training large language models (LLMs) that can reason and invoke search engines. It is built on veRL, supports multiple reinforcement learning methods and different LLM architectures, and is efficient and scalable for research and development on tool-augmented reasoning.
automcp is an open source tool that simplifies converting existing agent frameworks (such as CrewAI and LangGraph) into MCP servers, making it easier for developers to access them through standardized interfaces. The tool supports deploying multiple agent frameworks and is operated through an easy-to-use CLI. It suits developers who need to quickly integrate and deploy AI agents, is free, and works for individuals and teams alike.
PokemonGym is a server-client platform for evaluating and training AI agents in the game Pokemon Red. It exposes game state through FastAPI, supports both human play and AI agents, and helps researchers and developers test and improve AI solutions.
Pruna is a model optimization framework designed for developers. Through a series of compression techniques such as quantization, pruning, and compilation, it makes machine learning models faster, smaller, and cheaper to run at inference time. It works with a variety of model types, including LLMs and vision transformers, and supports Linux, MacOS, and Windows. Pruna also offers an enterprise version, Pruna Pro, which unlocks more advanced optimization features and priority support to help users improve efficiency in production.
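As a sketch of how a model is typically run through Pruna (the `smash` entry point and `SmashConfig` follow the project's documented pattern; the specific configuration keys and values here are assumptions to verify against the docs):

```python
# Hedged sketch: compressing a Hugging Face model with Pruna.
# The "quantizer"/"compiler" keys and their values are assumptions;
# check Pruna's documentation for the exact options.
from pruna import SmashConfig, smash
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

smash_config = SmashConfig()
smash_config["quantizer"] = "half"            # assumed: cast weights to fp16
smash_config["compiler"] = "torch_compile"    # assumed: compile for faster inference

# Returns an optimized model exposing the same interface as the original.
smashed_model = smash(model=model, smash_config=smash_config)
```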
Flux is a high-performance communication-overlapping library developed by ByteDance for tensor and expert parallelism on GPUs. It supports multiple parallelization strategies through efficient kernels and PyTorch compatibility, making it suitable for large-scale model training and inference. Key benefits of Flux include high performance, ease of integration, and support for multiple NVIDIA GPU architectures. It performs well in large-scale distributed training, especially for Mixture-of-Experts (MoE) models, significantly improving computational efficiency.
Atom of Thoughts (AoT) is a reasoning framework that turns the reasoning process into a Markov process by representing solutions as compositions of atomic subquestions. Through its decomposition and contraction mechanism, the framework significantly improves the performance of large language models on reasoning tasks while reducing wasted computation. AoT can be used as a standalone reasoning method or as a plug-in for existing test-time scaling methods, flexibly combining the advantages of different approaches. The framework is open source and implemented in Python, making it suitable for researchers and developers experimenting in natural language processing and large language models.
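A minimal conceptual sketch of the decompose-and-contract loop described above (the `llm.*` helpers are hypothetical placeholders for prompted LLM calls, not the AoT codebase):

```python
# Conceptual sketch of Atom of Thoughts' Markov-style reasoning loop.
# decompose/contract/solve_directly/is_atomic are hypothetical stand-ins
# for LLM calls; this is not the official implementation.
def atom_of_thoughts(question: str, llm, max_rounds: int = 3) -> str:
    state = question  # Markov property: each state is a self-contained question
    for _ in range(max_rounds):
        if llm.is_atomic(state):              # simple enough to answer directly
            break
        subqs = llm.decompose(state)          # split into atomic subquestions
        answers = [llm.solve_directly(q) for q in subqs]
        # Contract solved subquestions into a new, simpler question that
        # carries no history beyond what it states itself.
        state = llm.contract(state, subqs, answers)
    return llm.solve_directly(state)
```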
3FS is a high-performance distributed file system designed for AI training and inference workloads. It leverages modern SSD and RDMA networks to provide a shared storage layer to simplify distributed application development. Its core advantages lie in high performance, strong consistency and support for multiple workloads, which can significantly improve the efficiency of AI development and deployment. The system is suitable for large-scale AI projects, especially in the data preparation, training and inference phases.
The DeepSeek-V3/R1 inference system is a high-performance inference architecture developed by the DeepSeek team to optimize the inference efficiency of large-scale sparse models. It uses cross-node expert parallelism (EP) technology to significantly improve GPU matrix computing efficiency and reduce latency. The system adopts a double-batch overlapping strategy and a multi-level load balancing mechanism to ensure efficient operation in a large-scale distributed environment. Its key benefits include high throughput, low latency, and optimized resource utilization for high-performance computing and AI inference scenarios.
Thunder Compute is a GPU cloud service platform focused on AI/ML development. Through virtualization technology, it helps users use high-performance GPU resources at very low cost. Its main advantage is its low price, which can save up to 80% of costs compared with traditional cloud service providers. The platform supports a variety of mainstream GPU models, such as NVIDIA Tesla T4, A100, etc., and provides 7+ Gbps network connection to ensure efficient data transmission. The goal of Thunder Compute is to reduce hardware costs for AI developers and enterprises, accelerate model training and deployment, and promote the popularization and application of AI technology.
TensorPool is a cloud GPU platform focused on simplifying machine learning model training. It helps users easily describe tasks and automate GPU orchestration and execution by providing an intuitive command line interface (CLI). TensorPool's core technology includes intelligent Spot node recovery technology that can immediately resume jobs when a preemptible instance is interrupted, thus combining the cost advantages of preemptible instances with the reliability of on-demand instances. In addition, TensorPool selects the cheapest GPU options with real-time multi-cloud analysis, so users only pay for actual execution time without worrying about the additional cost of idle machines. The goal of TensorPool is to make machine learning projects faster and more efficient by eliminating the need for developers to spend a lot of time configuring cloud providers. It offers Personal and Enterprise plans, with the Personal plan offering $5 in free credits per week, while the Enterprise plan offers more advanced support and features.
MLGym is an open source framework and benchmark developed by Meta's GenAI team and UCSB NLP team for training and evaluating AI research agents. It promotes the development of reinforcement learning algorithms by providing diverse AI research tasks and helping researchers train and evaluate models in real-world research scenarios. The framework supports a variety of tasks, including computer vision, natural language processing and reinforcement learning, and aims to provide a standardized testing platform for AI research.
DeepEP is a communication library designed for Mixture-of-Experts (MoE) and expert parallelism (EP). It provides high-throughput, low-latency all-to-all GPU kernels with support for low-precision operations such as FP8. The library is optimized for asymmetric-domain bandwidth forwarding and suits both training and inference prefilling workloads. In addition, it supports controlling the number of streaming multiprocessors (SMs) used and introduces a hook-based communication-computation overlap method that occupies no SM resources. Although DeepEP's implementation differs slightly from the DeepSeek-V3 paper, its optimized kernels and low-latency design make it perform well in large-scale distributed training and inference.
FlexHeadFA is an improved model based on FlashAttention that focuses on fast, memory-efficient exact attention. It supports flexible head-dimension configurations and can significantly improve the performance and efficiency of large language models. Key advantages include efficient use of GPU resources, support for multiple head-dimension configurations, and compatibility with FlashAttention-2 and FlashAttention-3. It suits deep learning scenarios that demand efficient computation and memory optimization, especially when processing long sequences.
FlashMLA is an efficient MLA decoding kernel optimized for Hopper GPUs, designed for serving variable-length sequences. It is developed based on CUDA 12.3 and above and supports PyTorch 2.0 and above. The main advantage of FlashMLA is its efficient memory access and computing performance, capable of achieving up to 3000 GB/s memory bandwidth and 580 TFLOPS of computing performance on the H800 SXM5. This technology is of great significance for deep learning tasks that require massively parallel computing and efficient memory management, especially in the fields of natural language processing and computer vision. The development of FlashMLA was inspired by the FlashAttention 2&3 and cutlass projects to provide researchers and developers with an efficient computing tool.
The Ultra-Scale Playbook is a guide published on Hugging Face Spaces focused on the design and optimization of ultra-large-scale training systems. It draws on advanced technology frameworks to help developers and enterprises efficiently build and manage large-scale systems. Its main strengths are high scalability, optimized performance, and easy integration. It suits scenarios involving complex data and large-scale computing tasks, such as artificial intelligence, machine learning, and big data processing. The resource is available in open source form and suits businesses and developers of all sizes.
Crawl4LLM is an open source web crawler project that aims to provide efficient data crawling solutions for the pre-training of large language models (LLM). It helps researchers and developers obtain high-quality training corpus by intelligently selecting and crawling web page data. The tool supports multiple document scoring methods and can flexibly adjust crawling strategies according to configuration to meet different pre-training needs. The project is developed based on Python, has good scalability and ease of use, and is suitable for use in academic research and industrial applications.
KET-RAG (Knowledge-Enhanced Text Retrieval Augmented Generation) is a powerful retrieval-augmented generation framework that incorporates knowledge graph technology. It achieves efficient knowledge retrieval and generation through a multi-granularity indexing framework that combines a knowledge-graph skeleton with a text-keyword bipartite graph. The framework significantly improves retrieval and generation quality while reducing indexing cost, making it suitable for large-scale RAG applications. KET-RAG is developed in Python, supports flexible configuration and extension, and suits developers and researchers who need efficient knowledge retrieval and generation.
Goedel-Prover is an open source large-scale language model focused on automated theorem proving. It significantly improves the efficiency of automated proof of mathematical problems by translating natural language mathematical problems into formal languages (such as Lean 4) and generating formal proofs. The model achieved a success rate of 57.6% on the miniF2F benchmark, surpassing other open source models. Its main advantages include high performance, open source scalability, and deep understanding of mathematical problems. Goedel-Prover aims to promote the development of automated theorem proving technology and provide powerful tool support for mathematical research and education.
LangGraph Multi-Agent Supervisor is a Python library built on the LangGraph framework for creating hierarchical multi-agent systems. It lets developers coordinate multiple specialized agents through a central supervisor agent, enabling dynamic task assignment and communication management. The approach matters because it can efficiently organize complex multi-agent workloads and improve system flexibility and scalability. It suits scenarios requiring multi-agent collaboration, such as automated task processing and complex problem solving. The product targets advanced developers and enterprise applications. Pricing has not been disclosed, but being open source it can be customized and extended freely.
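A short sketch following the library's README pattern (model choice and tools are illustrative; names may differ slightly between versions):

```python
# Sketch of a supervisor coordinating two specialized agents with
# langgraph-supervisor; follows the README pattern as best recalled.
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

model = ChatOpenAI(model="gpt-4o")

def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

math_agent = create_react_agent(
    model, tools=[add], name="math_expert", prompt="You are a math expert."
)
research_agent = create_react_agent(
    model, tools=[], name="researcher", prompt="You are a careful researcher."
)

# The supervisor agent routes each request to the right specialist.
workflow = create_supervisor(
    [research_agent, math_agent],
    model=model,
    prompt="Route math questions to math_expert; everything else to researcher.",
)
app = workflow.compile()
result = app.invoke({"messages": [{"role": "user", "content": "What is 21 + 21?"}]})
```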
Huginn-0125 is a latent recurrent-depth model developed in Tom Goldstein's lab at the University of Maryland, College Park. With 3.5 billion parameters and trained on 800 billion tokens, it performs well in reasoning and code generation. Its core feature is dynamically adjusting compute at test time through its recurrent-depth structure, flexibly increasing or decreasing calculation steps according to task demands, optimizing resource use while maintaining performance. The model is released on the open Hugging Face platform, supporting community sharing and collaboration; users can freely download, use, and build on it. Its open source nature and flexible architecture make it a valuable tool for research and development, especially under resource constraints or when high-performance inference is required.
This product is a pre-training codebase for large-scale depth-recurrent language models, developed in Python. It is optimized for the AMD GPU architecture and can run efficiently on 4096 AMD GPUs. The core advantage of the technology is its recurrent-depth architecture, which effectively improves a model's reasoning capability and efficiency. It is mainly used for researching and developing high-performance natural language processing models, especially where large-scale computing resources are required. The codebase is open source under the Apache-2.0 license and suits academic research and industrial applications.
Steev is a tool designed specifically for AI model training, aiming to simplify the training process and improve model performance. It helps users complete model training more efficiently by automatically optimizing training parameters, monitoring the training process in real time, and providing code reviews and suggestions. The main advantage of Steev is that it can be used without configuration, making it suitable for engineers and researchers who want to improve the efficiency and quality of model training. Currently in the free trial phase, users can experience all its features for free.
Kolosal AI is a tool for training and running large language models (LLMs) on-device. It enables users to efficiently use AI technology on their local devices by simplifying the model training, optimization, and deployment process. The tool supports a variety of hardware platforms, provides fast inference speed and flexible customization capabilities, and is suitable for a wide range of application scenarios from individual developers to large enterprises. Its open source feature also allows users to conduct secondary development according to their own needs.
RAG-FiT is a powerful tool designed to improve the capabilities of large language models (LLMs) through retrieval-augmented generation (RAG) technology. It helps models better utilize external information by creating specialized RAG augmented datasets. The library supports the entire process from data preparation to model training, inference, and evaluation. Its main advantages include modular design, customizable workflows and support for multiple RAG configurations. RAG-FiT is based on an open source license and is suitable for researchers and developers for rapid prototyping and experimentation.
MNN is an open source deep learning inference engine developed by Alibaba's Taoxi Technology. It supports mainstream model formats such as TensorFlow, Caffe, and ONNX and is compatible with common networks such as CNNs, RNNs, and GANs. It pushes operator performance to the limit, fully supports CPU, GPU, and NPU to exploit all the compute a device offers, and is used across 70+ scenarios in Alibaba's AI applications. Known for high performance, ease of use, and versatility, MNN aims to lower the threshold for AI deployment and advance on-device intelligence.
LLaSA_training is a speech synthesis training project based on LLaMA, which aims to improve the efficiency and performance of speech synthesis models by optimizing computing resources for training time and inference time. The project uses open source data sets and internal data sets for training, supports multiple configurations and training methods, and has high flexibility and scalability. Its main advantages include efficient data processing capabilities, powerful speech synthesis effects, and support for multiple languages. This project is suitable for researchers and developers who need high-performance speech synthesis solutions, and can be used to develop application scenarios such as intelligent voice assistants and voice broadcast systems.
Dolphin R1 is a dataset created by the Cognitive Computations team to train reasoning models similar to the DeepSeek-R1 Distill models. It contains 300,000 reasoning samples from DeepSeek-R1, 300,000 reasoning samples from Gemini 2.0 Flash Thinking, and 200,000 Dolphin chat samples. Together these provide researchers and developers with rich training resources for improving models' reasoning and conversational capabilities. Creation of the dataset was sponsored by Dria, Chutes, Crusoe Cloud, and other companies, which provided computing resources and financial support. The release of Dolphin R1 provides an important foundation for research and development in natural language processing and promotes progress in related technologies.
DeepSeek-R1-Distill-Qwen-7B is a reasoning model distilled from the reinforcement-learning-optimized DeepSeek-R1 onto a Qwen 7B base. It performs well on math, coding, and reasoning tasks, producing high-quality reasoning chains and solutions. Through large-scale reinforcement learning and data distillation, the model significantly improves reasoning capability and efficiency, and suits scenarios requiring complex reasoning and logical analysis.
Kimi k1.5 is a multi-modal language model developed by MoonshotAI. Through reinforcement learning and long context expansion technology, it significantly improves the model's performance in complex reasoning tasks. The model has reached industry-leading levels on multiple benchmarks, surpassing GPT-4o and Claude Sonnet 3.5 in mathematical reasoning tasks such as AIME and MATH-500. Its main advantages include an efficient training framework, powerful multi-modal reasoning capabilities, and support for long contexts. Kimi k1.5 is mainly targeted at application scenarios that require complex reasoning and logical analysis, such as programming assistance, mathematical problem solving, and code generation.
RLLoggingBoard is a tool focused on visualizing the training process of Reinforcement Learning with Human Feedback (RLHF). It helps researchers and developers intuitively understand the training process, quickly locate problems, and optimize training effects through fine-grained indicator monitoring. This tool supports a variety of visualization modules, including reward curves, response sorting, and token-level indicators, etc., and is designed to assist existing training frameworks and improve training efficiency and effectiveness. It works with any training framework that supports saving required metrics and is highly flexible and scalable.
OpenLIT is an open source AI engineering platform focused on observability for generative AI and large language model (LLM) applications. It helps developers simplify the AI development process and improve development efficiency and application performance by providing code transparency, privacy protection, performance visualization and other functions. As an open source project, users are free to view the code or host it themselves, ensuring data security and privacy. Its main advantages include easy integration, support for OpenTelemetry native integration, and provision of fine-grained usage insights. OpenLIT is aimed at AI developers, data scientists and enterprises, aiming to help them better build, optimize and manage AI applications. The specific price is not yet clear, but judging from the open source features, it may provide free use of basic functions.
MiniRAG is a retrieval-augmented generation (RAG) system designed for small language models, aiming to simplify the RAG pipeline and improve efficiency. It addresses the limited performance of small models in traditional RAG frameworks through a semantic-aware heterogeneous graph indexing mechanism and a lightweight topology-enhanced retrieval method. It offers clear advantages in resource-constrained settings such as mobile devices and edge computing environments. MiniRAG's open source nature also makes it easy for the developer community to adopt and improve.
AutoGen v0.4 is an agent framework from Microsoft Research, designed to improve code quality, robustness, versatility, and scalability through an asynchronous, event-driven architecture. The framework has been completely refactored in response to community feedback to support a wider range of agent scenarios, including multi-agent collaboration, distributed computing, and cross-language support. The release of AutoGen v0.4 lays a solid foundation for agent-based AI applications and research, advancing the application and development of AI across many fields.
PocketFlow is a minimalist LLM framework implemented in only 100 lines of code, designed to let LLMs program autonomously. It emphasizes high-level programming paradigms and strips away low-level implementation details so the LLM can focus on what matters. Its simplicity makes it easy to understand and get started with, so it also works well as a learning resource for LLM frameworks. Its core abstraction is a nested directed graph that decomposes tasks into multiple LLM steps, with support for branching and recursive decision-making. PocketFlow is an open source project under the MIT license and is highly flexible and extensible.
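A sketch of the nested directed-graph abstraction, following the project's README as best recalled (node names and the stubbed LLM call are illustrative; verify class and method names against the repository):

```python
# Hedged sketch of PocketFlow's node/flow abstraction.
from pocketflow import Node, Flow

class Summarize(Node):
    def prep(self, shared):                    # gather what this step needs
        return shared["text"]
    def exec(self, text):                      # one LLM step (stubbed here)
        return text[:100]                      # stand-in for call_llm(...)
    def post(self, shared, prep_res, exec_res):
        shared["summary"] = exec_res
        return "default"                       # action label picks the next edge

class Review(Node):
    def exec(self, _):
        return "looks good"

summarize, review = Summarize(), Review()
summarize >> review                            # edges form the directed graph
flow = Flow(start=summarize)

shared = {"text": "PocketFlow decomposes a task into multiple LLM steps..."}
flow.run(shared)
```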
Bakery is an online platform focused on fine-tuning and monetizing open source AI models. It provides AI start-ups, machine learning engineers and researchers with a convenient tool that allows them to easily fine-tune AI models and monetize them in the market. The platform’s main advantages are its easy-to-use interface and powerful functionality, which allows users to quickly create or upload datasets, fine-tune model settings, and monetize in the market. Bakery’s background information indicates that it aims to promote the development of open source AI technology and provide developers with more business opportunities. Although specific pricing information is not clearly displayed on the page, it is positioned to provide an efficient tool for professionals in the AI field.
NVIDIA Project DIGITS is a desktop supercomputer powered by the NVIDIA GB10 Grace Blackwell superchip, designed to deliver powerful AI performance to AI developers. It delivers one petaflop of AI performance in a power-efficient, compact form factor. The product comes pre-installed with the NVIDIA AI software stack and comes with 128GB of memory, enabling developers to prototype, fine-tune and infer large AI models of up to 200 billion parameters locally and seamlessly deploy to the data center or cloud. The launch of Project DIGITS marks another important milestone in NVIDIA’s drive to advance AI development and innovation, providing developers with a powerful tool to accelerate the development and deployment of AI models.
NVIDIA Cosmos is a world foundation model platform designed to accelerate the development of physical AI systems such as autonomous vehicles and robots. It provides a series of pre-trained generative models, advanced tokenizers, and accelerated data processing pipelines, making it easier for developers to build and optimize physical AI applications. Cosmos reduces development cost and improves efficiency through its open model license, and suits enterprises and research institutions of all sizes.
FlashInfer is a high-performance GPU kernel library designed for serving large language models (LLM). It significantly improves the performance of LLM in inference and deployment by providing efficient sparse/dense attention mechanism, load balancing scheduling, memory efficiency optimization and other functions. FlashInfer supports PyTorch, TVM and C++ API, making it easy to integrate into existing projects. Its main advantages include efficient kernel implementation, flexible customization capabilities and broad compatibility. The development background of FlashInfer is to meet the growing needs of LLM applications and provide more efficient and reliable inference support.
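A minimal decode-attention sketch (the call below exists in FlashInfer's Python API as best recalled; tensor layouts follow its defaults and should be verified against the docs):

```python
# Hedged sketch: single-request decode attention with FlashInfer.
# Assumes the default "NHD" layout; grouped-query attention (more query
# heads than KV heads) is handled inside the kernel.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 4096
q = torch.randn(num_qo_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn(kv_len, num_kv_heads, head_dim, device="cuda", dtype=torch.float16)
v = torch.randn(kv_len, num_kv_heads, head_dim, device="cuda", dtype=torch.float16)

o = flashinfer.single_decode_with_kv_cache(q, k, v)  # [num_qo_heads, head_dim]
```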
PRIME-RL/Eurus-2-7B-PRIME is a 7B parameter language model trained based on the PRIME method, aiming to improve the reasoning capabilities of the language model through online reinforcement learning. The model is trained from Eurus-2-7B-SFT, using the Eurus-2-RL-Data dataset for reinforcement learning. The PRIME method uses an implicit reward mechanism to make the model pay more attention to the reasoning process during the generation process, rather than just the results. The model performed well in multiple inference benchmarks, with an average improvement of 16.7% compared to its SFT version. Its main advantages include efficient inference improvements, lower data and model resource requirements, and excellent performance in mathematical and programming tasks. This model is suitable for scenarios that require complex reasoning capabilities, such as programming problem solving and mathematical problem solving.
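Since the model is distributed through Hugging Face, the standard Transformers loading pattern applies (the prompt and generation settings below are illustrative, not tuned recommendations):

```python
# Standard Hugging Face usage for PRIME-RL/Eurus-2-7B-PRIME.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PRIME-RL/Eurus-2-7B-PRIME"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```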
Eurus-2-7B-SFT is a large language model fine-tuned based on the Qwen2.5-Math-7B model, focusing on improving mathematical reasoning and problem-solving capabilities. This model learns reasoning patterns through imitation learning (supervised fine-tuning), and can effectively solve complex mathematical problems and programming tasks. Its main advantage lies in its strong reasoning ability and accurate processing of mathematical problems, and is suitable for scenarios that require complex logical reasoning. This model was developed by the PRIME-RL team and aims to improve the model's reasoning capabilities through implicit rewards.
llmstxt-generator is a tool for generating the consolidated website-content text files used in LLM (Large Language Model) training and inference. It crawls website content and merges it into a text file, supporting generation of both the standard llms.txt and the complete llms-full.txt versions. The tool is powered by Firecrawl (firecrawl_dev) for web crawling and uses GPT-4o-mini for text processing. Its main advantages include basic functionality without an API key, plus a web interface and API access so users can quickly generate the files they need.
EurusPRM-Stage2 is an advanced reinforcement learning model that optimizes the inference process of the generative model through implicit process rewards. This model uses the log-likelihood ratio of a causal language model to calculate process rewards, thereby improving the model's reasoning capabilities without increasing additional annotation costs. Its main advantage is the ability to learn process rewards implicitly using only response-level labels, thereby improving the accuracy and reliability of generative models. The model performs well in tasks such as mathematical problem solving and is suitable for scenarios requiring complex reasoning and decision-making.
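The implicit process reward at the heart of this approach can be written as a per-token log-likelihood ratio between the trained model and a reference model; a conceptual sketch of the formulation follows (illustrative only, not the project's training code):

```python
# Conceptual sketch of an implicit process reward: each response token gets
# beta * (log pi(y_t | context) - log pi_ref(y_t | context)), which can be
# summed per reasoning step to score the process without step labels.
import torch.nn.functional as F

def implicit_process_rewards(logits, ref_logits, response_ids, beta=0.001):
    # logits, ref_logits: [seq_len, vocab_size]; response_ids: [seq_len]
    logp = F.log_softmax(logits, dim=-1).gather(-1, response_ids[:, None]).squeeze(-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1).gather(-1, response_ids[:, None]).squeeze(-1)
    return beta * (logp - ref_logp)  # dense per-token reward signal
```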
EurusPRM-Stage1 is part of the PRIME-RL project, which aims to enhance the inference capabilities of generative models through implicit process rewards. This model utilizes an implicit process reward mechanism to obtain process rewards during the inference process without the need for additional process labels. Its main advantage is that it can effectively improve the performance of generative models in complex tasks while reducing labeling costs. This model is suitable for scenarios that require complex reasoning and generation capabilities, such as mathematical problem solving, natural language generation, etc.
PRIME is an open source online reinforcement learning solution that enhances the reasoning capabilities of language models through implicit process rewards. The main advantage of this technology is its ability to effectively provide dense reward signals without relying on explicit process labels, thereby accelerating model training and improving inference capabilities. PRIME performs well on mathematics competition benchmarks, outperforming existing large-scale language models. Its background information includes that it was jointly developed by multiple researchers and related code and data sets were released on GitHub. PRIME is positioned to provide powerful model support for users who require complex reasoning tasks.
Llama-3-Patronus-Lynx-8B-Instruct is a fine-tuned version of the meta-llama/Meta-Llama-3-8B-Instruct model developed by Patronus AI, used mainly to detect hallucinations in RAG settings. The model is trained on multiple datasets, including CovidQA, PubmedQA, DROP, and RAGTruth, spanning human-annotated and synthetic data. Given a document, question, and answer, it evaluates whether the answer is faithful to the document's content, introduces no new information beyond the document, and does not contradict it.
Patronus-Lynx-8B-Instruct-v1.1 is a fine-tuned version of the meta-llama/Meta-Llama-3.1-8B-Instruct model, likewise used to detect hallucinations in RAG settings. It is trained on multiple datasets, including CovidQA, PubmedQA, DROP, and RAGTruth, covering human-annotated and synthetic data, and evaluates whether a given answer is faithful to the document content, adds no information beyond the document's scope, and does not contradict the document.
Orchestra is a framework for creating AI-driven task pipelines and multi-agent teams. It allows developers and enterprises to build complex workflows and automate task processing by integrating different AI models and tools. Orchestra’s background information shows that it was developed by Mainframe and aims to provide a powerful platform to support the integration and application of AI technology. The main advantages of the product include its flexibility and scalability to adapt to different business needs and scenarios. Currently, Orchestra provides a free trial, and further inquiries are required for specific pricing and positioning information.
DRT-o1 is a neural machine translation model that optimizes the translation process through long chains of thought. The project mines English sentences containing similes or metaphors and uses a multi-agent framework (translator, advisor, and evaluator) to synthesize long-thought machine translation samples. DRT-o1-7B and DRT-o1-14B are large language models trained from Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct. DRT-o1's main advantage is its handling of complex language structures and deep semantic understanding, both crucial to the accuracy and naturalness of machine translation.
PromptWizard is a task-aware prompt optimization framework developed by Microsoft. It uses a self-evolution mechanism to enable large language models (LLM) to generate, criticize and improve their own prompts and examples, and continuously improve through iterative feedback and synthesis. This adaptive approach is fully optimized by evolving instructions and contextually learning examples to improve task performance. The three key components of the framework include: feedback-driven optimization, critique and synthesis of diverse examples, and self-generated Chain of Thought (CoT) steps. The importance of PromptWizard is that it can significantly improve the performance of LLM on specific tasks, enhancing the performance and interpretability of the model by optimizing prompts and examples.
LiteMCP is a TypeScript framework for elegantly building MCP (Model Context Protocol) servers. It supports simple tool, resource, and prompt definitions, provides complete TypeScript support, and has built-in error handling and CLI tools to facilitate testing and debugging. The emergence of LiteMCP provides developers with an efficient and easy-to-use platform for developing and deploying MCP servers, thereby promoting the interaction and collaboration of artificial intelligence and machine learning models. LiteMCP is open source and follows the MIT license. It is suitable for developers and enterprises who want to quickly build and deploy MCP servers.
Unitree RL GYM is a reinforcement learning platform based on Unitree robots, supporting Unitree Go2, H1, H1_2, G1 and other models. The platform provides an integrated environment that allows researchers and developers to train and test reinforcement learning algorithms on real or simulated robots. Its importance lies in promoting the development of robot autonomy and intelligence technology, especially in applications requiring complex decision-making and motion control. Unitree RL GYM is open source and free to use, mainly for scientific researchers and robotics enthusiasts.
CohereForAI/c4ai-command-r7b-12-2024 is a 7B parameter multi-language model focused on high-level tasks such as reasoning, summarization, question answering and code generation. The model supports Retrieval Augmented Generation (RAG) and tool usage, enabling the use and combination of multiple tools to complete more complex tasks. It excels in enterprise-related code use cases and supports 23 languages.
ReFT is an open source research project that fine-tunes large language models with reinforcement learning to improve their performance on specific tasks. The project provides detailed code and data so researchers and developers can reproduce the paper's results. ReFT's main advantages include automatically adjusting model parameters via reinforcement learning and improving task performance through fine-tuning. Background information shows that ReFT is built on the Codellama and Galactica models and is released under the Apache 2.0 license.
O1-CODER is a project that aims to replicate OpenAI's O1 model with a focus on programming tasks. It combines reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to strengthen the model's System-2 thinking, with the goal of generating more efficient and logical code. The project matters for improving programming efficiency and code quality, especially in scenarios requiring extensive automated testing and code optimization.
PrimeIntellect-ai/prime is a framework for efficient, globally distributed training of AI models on the Internet. Through technological innovation, it realizes cross-regional AI model training, improves the utilization of computing resources, and reduces training costs. It is of great significance to AI research and application development that require large-scale computing resources.
OLMo-2-1124-13B-DPO is a 13B parameter large-scale language model that has undergone supervised fine-tuning and DPO training. It is mainly targeted at English and aims to provide excellent performance on a variety of tasks such as chat, mathematics, GSM8K and IFEval. This model is part of the OLMo series, which is designed to advance scientific research on language models. Model training is based on the Dolma dataset, and the code, checkpoints, logs and training details are disclosed.
DOLMino dataset mix for OLMo2 stage 2 annealing training is a blend of multiple high-quality data sources used in the second stage of OLMo2 training. The dataset contains data such as web pages, STEM papers, and encyclopedias, and is designed to improve model performance on text generation tasks. Its importance lies in providing rich training resources for developing smarter, more accurate natural language processing models.
Learning to Fly (L2F) is an open source project that aims to train end-to-end control policies through deep reinforcement learning and quickly complete the training on consumer laptops. The main advantages of this project are that the training speed is fast and can be completed in a few seconds, and the trained strategy has good generalization ability and can be directly deployed to a real quadcopter. The L2F project relies on the RLtools deep reinforcement learning library and provides detailed installation and deployment guides, allowing researchers and developers to quickly get started and conduct experiments.
Star-Attention is a new block sparse attention mechanism proposed by NVIDIA, aiming to improve the reasoning efficiency of Transformer-based large language models (LLM) on long sequences. This technology significantly improves inference speed through two stages of operation while maintaining 95-100% accuracy. It is compatible with most Transformer-based LLMs, can be used directly without additional training or fine-tuning, and can be combined with other optimization methods such as Flash Attention and KV cache compression technology to further improve performance.
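Conceptually, the first phase restricts each context block to attend only to itself and a shared anchor block; the toy mask construction below illustrates that sparsity pattern (an illustration of the idea, not NVIDIA's implementation):

```python
# Toy illustration of Star-Attention's phase-1 sparsity: every context
# block attends to the anchor (first) block plus its own local block.
import numpy as np

def star_attention_mask(seq_len: int, block_size: int) -> np.ndarray:
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for start in range(0, seq_len, block_size):
        rows = slice(start, start + block_size)
        mask[rows, 0:block_size] = True   # shared anchor block
        mask[rows, rows] = True           # local (diagonal) block
    return np.tril(mask)                  # keep causal ordering

print(star_attention_mask(8, 2).astype(int))
```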
Qwen2.5-Coder is the latest series of code-focused Qwen large language models, targeting code generation, code reasoning, and code repair. Built on the strong Qwen2.5 base, the series scales training tokens to 5.5 trillion, including source code, text-code grounding, and synthetic data, significantly improving coding capability. Qwen2.5-Coder-32B has become the most advanced open source code LLM, with coding ability on par with GPT-4o. The series also provides a more comprehensive foundation for practical applications such as code agents, enhancing coding capability while retaining strengths in mathematics and general tasks.
Qwen2.5-Coder-3B is a large language model in the Qwen2.5-Coder series, focused on code generation, reasoning, and repair. Built on the strong Qwen2.5 base, the series scales training tokens to 5.5 trillion, including source code, text-code grounding, and synthetic data; its flagship, Qwen2.5-Coder-32B, is currently the most advanced open source code LLM, matching GPT-4o in coding ability. Qwen2.5-Coder-3B also provides a solid foundation for real-world applications such as code agents, enhancing coding capability while maintaining strengths in mathematics and general tasks.
Qwen2.5-Coder-7B-Instruct is a code-specific large language model in the Qwen2.5-Coder series, which spans six mainstream model sizes: 0.5, 1.5, 3, 7, 14, and 32 billion parameters, meeting the needs of different developers. The model shows significant improvements in code generation, code reasoning, and code repair. Built on the strong Qwen2.5 base, training tokens are scaled to 5.5 trillion, including source code, text-code grounding, and synthetic data. Qwen2.5-Coder-32B is currently the most advanced open source code LLM, matching GPT-4o in coding ability. The model also supports long contexts of up to 128K tokens and provides a more comprehensive foundation for practical applications such as code agents.
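Typical usage through Transformers looks like the following (prompt and decoding settings are illustrative):

```python
# Standard Transformers usage for Qwen/Qwen2.5-Coder-7B-Instruct.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python quicksort function."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```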
The Qwen2.5-Coder series comprises code-specific models based on the Qwen2.5 architecture, including Qwen2.5-Coder-1.5B and Qwen2.5-Coder-7B. Continually pre-trained on a corpus of more than 5.5 trillion tokens, and using granular data cleaning, scalable synthetic data generation, and balanced data blending, these models demonstrate impressive code generation capability while retaining generality. Qwen2.5-Coder achieves state-of-the-art performance across more than 10 benchmarks covering code generation, completion, reasoning, and repair, even matching or outperforming larger models. The series not only pushes the boundaries of code intelligence research, but through its permissive licensing encourages wider developer adoption in real-world applications.
Qwen2.5-Coder-14B is a code-focused large language model in the Qwen series, which covers model sizes from 0.5 to 32 billion parameters to meet the needs of different developers. The model shows significant improvements in code generation, code reasoning, and code repair. Built on the strong Qwen2.5 base, training tokens are scaled to 5.5 trillion, including source code, text-code grounding, and synthetic data. Qwen2.5-Coder-32B is currently the most advanced open source code LLM, matching GPT-4o in coding ability. The model also provides a more comprehensive foundation for real-world applications such as code agents, enhancing coding capability while maintaining strengths in mathematics and general tasks, and supports long contexts of up to 128K tokens.
Qwen2.5-Coder-14B-Instruct is a large language model in the Qwen2.5-Coder series, focused on code generation, code reasoning, and code repair. Built on the strong Qwen2.5 base and with training tokens extended to 5.5 trillion, including source code, text-code grounding, and synthetic data, the model represents the state of the art among open source code LLMs. It not only enhances coding capability but also maintains strengths in mathematics and general tasks, and supports long contexts of up to 128K tokens.
Qwen2.5-Coder-32B is a code generation model based on Qwen2.5. With 32 billion parameters, it is one of the largest open source code language models available. It shows significant improvements in code generation, code reasoning, and code repair, can handle long texts of up to 128K tokens, and suits practical scenarios such as code agents. The model also maintains its strengths in mathematics and general capabilities, making it a powerful assistant for developers writing code.
Qwen2.5-Coder is a series of Qwen large language models designed specifically for code generation, spanning six mainstream model sizes: 0.5, 1.5, 3, 7, 14, and 32 billion parameters, to meet the needs of different developers. The models show significant improvements in code generation, code reasoning, and code repair. Built on the strong Qwen2.5 base, training tokens are scaled to 5.5 trillion, including source code, text-code grounding, and synthetic data. Qwen2.5-Coder-32B is currently the most advanced open source code generation large language model, matching GPT-4o in coding ability. The series enhances coding capability while maintaining strengths in mathematics and general tasks, and supports long contexts of up to 128K tokens.
Lamatic.ai is a managed PaaS platform designed for building, testing and deploying high-performance GenAI applications at the edge, providing a low-code visual builder, VectorDB and integrated applications and models. It helps AI founders and builders quickly implement complex AI workflows by integrating multiple tools and technologies. Key benefits of the platform include reducing back-and-forth communication between teams, automating workflows, increasing deployment speed, and reducing latency. Lamatic.ai’s background information shows that it was built by a group of engineers and community members who have a deep understanding and rich experience in GenAI application development. The platform is priced as a monthly subscription that includes all available management integrations, vector databases, hosting, edge deployments and SDKs, with hourly professional services available.
O1-Journey is a project initiated by the GAIR research group at Shanghai Jiao Tong University to replicate and reimagine the capabilities of OpenAI’s O1 model. This project proposes a new training paradigm of "journey learning" and builds the first model to successfully integrate search and learning in mathematical reasoning. This model becomes an effective way to handle complex reasoning tasks through processes of trial and error, correction, backtracking, and reflection.
hertz-dev is Standard Intelligence's open source, full-duplex, audio-only transformer base model with 8.5 billion parameters. The model demonstrates scalable cross-modal learning, converting mono 16kHz speech into an 8Hz latent representation at a bitrate of 1kbps, outperforming other audio encoders. Its main advantages include low latency, high efficiency, and ease of fine-tuning and building upon for researchers. Standard Intelligence states that it is committed to building general intelligence beneficial to all humanity, and hertz-dev is the first step in that journey.
ManiSkill is a leading open source platform for robot simulation, unlimited robot data generation, and generalized robot AI. Led by HillBot.ai, the platform supports rapid robot training via state and/or visual input, with ManiSkill/SAPIEN achieving 10-100x faster visual data collection than other platforms and supporting parallel simulation and RGB-D rendering on the GPU at speeds of 30,000+ FPS. ManiSkill provides more than 40 skills/tasks and more than 2,000 pre-built objects, with millions of frames of demonstrations and dense reward functions, so users need not collect assets or design tasks themselves and can focus on algorithm development. It also supports simulating different objects and articulations simultaneously in each parallel environment, cutting the time to train generalizable robot policies/AI from days to minutes. ManiSkill is easy to use, installable via pip, and offers a simple, flexible GUI along with extensive documentation for all features.
The xAI API provides programmatic access to the Grok family of base models, supports text and image input, has a context length of 128,000 tokens, and supports function calls and system prompts. The API is fully compatible with OpenAI and Anthropic’s APIs, simplifying the migration process. Product background information shows that xAI is undergoing public beta testing until the end of 2024, during which each user can receive $25 in free API points per month.
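Because the API follows the OpenAI wire format, the standard OpenAI Python client can simply be pointed at xAI's endpoint (the model name "grok-beta" matches the public-beta period described above and may since have changed):

```python
# Calling the xAI API through the OpenAI-compatible client.
from openai import OpenAI

client = OpenAI(api_key="YOUR_XAI_API_KEY", base_url="https://api.x.ai/v1")

response = client.chat.completions.create(
    model="grok-beta",  # public-beta model name; check current docs
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give one sentence on the Grok model family."},
    ],
)
print(response.choices[0].message.content)
```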
SELA is an innovative system that enhances automated machine learning (AutoML) by combining Monte Carlo Tree Search (MCTS) with large language model (LLM) based agents. Traditional AutoML methods often produce low diversity and suboptimal code, limiting their effectiveness in model selection and integration. By representing pipeline configurations as trees, SELA enables agents to intelligently explore the solution space and iteratively improve their strategies based on experimental feedback.
Laminar is an open source, full-stack platform focused on AI engineering from first principles. It helps users collect, understand and use data to improve the quality of large language model (LLM) applications. Laminar supports tracking of text and image models, and will soon support audio models. Key benefits of the product include zero-overhead observability, online evaluation, dataset construction and LLM chain management. Laminar is completely open source and easy to self-host, suitable for developers and teams who need to build and manage LLM products.
HOVER is a multifunctional neural whole-body controller for humanoid robots that provides universal motor skills by simulating whole-body movements and learning multiple whole-body control modes. HOVER integrates different control modes into a unified strategy through a multi-mode strategy distillation framework, achieving seamless switching between different control modes while retaining the unique advantages of each mode. This controller improves the control efficiency and flexibility of humanoid robots in multiple modes, providing a robust and scalable solution for future robotic applications.
Dabarqus is a Retrieval Augmented Generation (RAG) framework that allows users to feed private data to large language models (LLMs) in real time. This tool enables users to easily store various data sources (such as PDFs, emails, and raw data) into semantic indexes, called "memories", by providing REST APIs, SDKs, and CLI tools. Dabarqus supports LLM-style prompts, enabling users to interact with memories in a simple way without having to build special queries or learn a new query language. In addition, Dabarqus also supports the creation and use of multi-semantic indexes (memories) so that data can be organized according to topics, categories, or other groupings. Dabarqus’ product background information shows that it aims to simplify the integration process of private data and AI language models and improve the efficiency and accuracy of data retrieval.
ROCKET-1 is a visual-language model (VLMs) designed specifically for embodied decision-making in open-world environments. This model connects communication between VLMs and policy models through a visual-temporal context cueing protocol, leveraging object segmentation from past and current observations to guide policy-environment interactions. In this way, ROCKET-1 is able to unlock the visual-verbal reasoning capabilities of VLMs, enabling them to solve complex creative tasks, especially in spatial understanding. Experiments with ROCKET-1 in Minecraft show that this approach enables agents to accomplish previously unachievable tasks, highlighting the effectiveness of visual-temporal contextual cues in embodied decision-making.
Aya Expanse is a Hugging Face Space developed by CohereForAI around the Aya Expanse family of open multilingual models. Hugging Face is an artificial intelligence platform focused on natural language processing, providing models and tools that help developers build, train, and deploy NLP applications; as a Space on the platform, Aya Expanse gives developers a way to explore these models for work in the NLP field.
Agibot X1 is a modular humanoid robot developed by Agibot with a high degree of freedom. It is based on the Agibot open source framework AimRT as middleware and uses reinforcement learning for motion control. The project includes multiple functional modules such as model reasoning, platform driver and software simulation. The AimRT framework is an open source framework for robot application development that provides a complete set of tools and libraries to support robot perception, decision-making, and action. The importance of the Agibot X1 project is that it provides a highly customizable and scalable platform for robotics research and education.
GitHub to LLM Converter is an online tool designed to help users convert project, file or folder links on GitHub into a format suitable for Large Language Model (LLM) processing. This tool is critical for developers and researchers who need to work with large amounts of code or document data, because it simplifies the data preparation process so that the data can be used more efficiently for machine learning or natural language processing tasks. This tool was developed by Skirano and provides a simple user interface. Users only need to enter the GitHub link to convert with one click, greatly improving work efficiency.
AgileCoder is an innovative multi-agent software development framework inspired by agile methodologies widely used in professional software engineering. The key to the framework is its task-oriented approach. Instead of assigning fixed roles to agents, AgileCoder mimics real-world software development by creating a backlog of tasks and dividing the development process into sprints, with each sprint dynamically updating the backlog. AgileCoder supports multiple models, including OpenAI, Azure OpenAI, Anthropic, and self-hosted Ollama models.
Playnode is a web-based AI workflow construction platform that allows users to create and deploy AI models through drag-and-drop, and supports the combination of multiple AI models and data streams to achieve complex data processing and analysis tasks. The main advantage of this platform is its visual operation interface, which makes it easy for even non-technical users to get started and quickly build and deploy AI workflows. Playnode’s background information shows that it aims to lower the threshold of AI technology and enable more people to use AI technology to solve practical problems. Currently, Playnode offers a free trial, where users can start using it for free and earn 20 points per week, no credit card information required.
Janus is an innovative autoregressive framework that addresses the limitations of previous approaches by separating visual encoding into distinct paths while utilizing a single, unified transformer architecture for processing. This decoupling not only alleviates the conflicting roles of the visual encoder in understanding and generation, but also enhances the flexibility of the framework. Janus' performance surpasses previous unified models and meets or exceeds the performance of task-specific models. Janus' simplicity, high flexibility, and effectiveness make it a strong candidate for the next generation of unified multimodal models.
BitNet is the official inference framework developed by Microsoft for 1-bit large language models (LLMs). It provides a set of optimized kernels that support fast, lossless inference of 1.58-bit models on CPUs (NPU and GPU support is planned). BitNet achieves speedups of 1.37x to 5.07x on ARM CPUs, with energy consumption reduced by 55.4% to 70.0%; on x86 CPUs, speedups range from 2.37x to 6.17x with energy reductions of 71.9% to 82.2%. BitNet can also run a 100B-parameter BitNet b1.58 model on a single CPU at inference speeds close to human reading speed, broadening the possibilities for running large language models on local devices.
MetaGPT is a multi-agent framework that uses natural language programming technology to simulate a complete software company team to achieve rapid development and automated workflows. It represents the latest progress of artificial intelligence in the field of software development, which can significantly improve development efficiency and reduce costs. The main advantages of MetaGPT include a high degree of automation, multi-agent collaboration, and the ability to handle complex software development tasks. Product background information shows that MetaGPT aims to provide users with a platform that can quickly respond to development needs through AI technology. Currently, the product appears to be in beta, and users can try it out by joining a waiting list.
Meta Lingua is a lightweight, efficient large language model (LLM) training and inference library designed for research. It uses easily modifiable PyTorch components, allowing researchers to experiment with new architectures, loss functions, and data sets. The library is designed to enable end-to-end training, inference, and evaluation, and provide tools to better understand model speed and stability. Although Meta Lingua is still under development, several sample applications are provided to demonstrate how to use this code base.
TEN-framework is an innovative AI agent framework designed to provide high-performance support for real-time multi-modal interaction. It supports multiple languages and platforms, enables edge-cloud integration, and has the flexibility to transcend the limitations of a single model. TEN-framework enables AI agents to dynamically respond and adjust behavior in real time by managing agent status in real time. The background of this framework is to meet the increasing needs of complex AI applications, especially in audio-visual scenarios. It not only provides efficient development support, but also promotes the innovation and application of AI technology through modularization and reusable expansion.
FastAgency is an AI model construction and deployment platform for developers and enterprise users. It provides an easy-to-use interface and powerful back-end support, allowing users to quickly develop and deploy AI models, thereby accelerating the transformation process of products from concept to market. The main advantages of this platform include rapid iteration, high efficiency and easy integration, making it suitable for enterprises and developers who need to respond quickly to market changes.
Velvet AI gateway is an AI request warehouse solution designed for engineers. It allows users to store OpenAI and Anthropic requests into a PostgreSQL database and optimize AI functions through log analysis, evaluation and generation of data sets. Key product benefits include ease of use, cost optimization, data transparency and support for custom queries. The background of Velvet AI gateway is to help innovation teams manage and utilize AI technology more effectively, enhancing the competitiveness of products by reducing costs and improving efficiency.
Llama3-s v0.2 is a multi-modal checkpoint developed by Homebrew Computer Company focused on improving speech understanding. The model is improved through early integration of semantic tags and community feedback to simplify the model structure, improve compression efficiency, and achieve consistent speech feature extraction. Llama3-s v0.2 performs stably on multiple speech understanding benchmarks and provides live demos, allowing users to experience its capabilities for themselves. Although the model is still in the early stages of development and has some limitations, such as being sensitive to audio compression and unable to handle audio longer than 10 seconds, the team plans to address these issues in future updates.
Helicone AI is an open source platform designed for developers to log, monitor, and debug LLM applications. It features millisecond-level latency impact, 100% log coverage, and industry-leading query times, and is built for production-grade workloads. The platform achieves low latency and high reliability via Cloudflare Workers and supports risk-free experimentation. No SDK installation is required; adding header information is enough to access all features.
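The header-based integration mentioned above typically looks like this (following Helicone's documented proxy pattern as best recalled; the endpoint and header names should be verified against current docs):

```python
# Routing OpenAI traffic through Helicone's proxy; requests are then
# logged and observable in the Helicone dashboard.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENAI_API_KEY",
    base_url="https://oai.helicone.ai/v1",  # Helicone proxy endpoint
    default_headers={"Helicone-Auth": "Bearer YOUR_HELICONE_API_KEY"},
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```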
Evidently AI is an open source Python library for monitoring machine learning models, supporting the evaluation of LLM-driven products from RAGs to AI assistants. It provides monitoring of data drift, data quality and production ML model performance. With more than 20 million downloads and 5000+ GitHub stars, it is a trustworthy monitoring tool in the field of machine learning.
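A typical drift check with the library (v0.4-style API; preset and method names as best recalled):

```python
# Compare a current data window against a reference window and render
# an interactive drift report with Evidently.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0, 5.0]})
current = pd.DataFrame({"feature": [5.0, 6.0, 7.0, 8.0, 9.0]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")  # interactive HTML dashboard
```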
Moonglow is a service that allows users to run local Jupyter notebooks on remote GPUs without having to manage SSH keys, package installations and other DevOps issues. The service was founded by Leila, who worked building high-performance infrastructure at Jane Street, and Trevor, who conducted machine learning research at Stanford's Hazy Research Lab.
Not Diamond is a powerful AI model router designed for developers that intelligently selects the most appropriate AI model for each task, delivering significant reductions in cost and latency. It works out of the box or can train custom routers optimized for specific use cases. The product selects models quickly, supports joint prompt optimization, and can derive the best prompt for each large language model (LM) without manual tuning and trial-and-error.
MInference 1.0 is a sparse computation method designed to accelerate the prefilling stage of long-sequence processing. By identifying three unique patterns in long-context attention matrices, it implements dynamic sparse attention for long-context large language models (LLMs), accelerating the prefilling stage for 1M-token prompts while maintaining the capabilities of LLMs, especially retrieval.
prompteasy.ai is an online platform that allows users to fine-tune GPT models through a simple chat, without any technical skills. The goal of the platform is to make AI smarter and easier for anyone to access and use. Currently, the service is free for all users during the v1 release.
vLLM is a fast, easy-to-use, and efficient library for large language model (LLM) inference and serving. It delivers high-performance inference through state-of-the-art serving throughput techniques, efficient memory management, continuous batching of requests, fast model execution via CUDA/HIP graphs, quantization, and optimized CUDA kernels. vLLM integrates seamlessly with popular HuggingFace models, supports multiple decoding algorithms including parallel sampling and beam search, supports tensor parallelism for distributed inference, supports streaming output, and is compatible with the OpenAI API server. Additionally, vLLM supports NVIDIA and AMD GPUs, as well as experimental prefix caching and multi-LoRA support.
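A minimal offline-inference example (the model ID is illustrative; any supported HuggingFace causal LM works the same way):

```python
# Offline batch inference with vLLM's LLM/SamplingParams API.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain continuous batching in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```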