Found 53 AI tools
GPTACG relay API provides a forwarding service for the official OpenAI API, focusing on stability and suited to applications with strict reliability requirements. It offers enterprise-grade stability, removes regional restrictions, supports very high concurrency at a competitive price, and promises not to collect users' request or response data. Pricing is tiered by purchase amount, with different rates for single purchases below $500 and of $500 or more.
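As a rough illustration of how such a forwarding service is typically consumed, here is a minimal sketch using the official openai Python client pointed at an OpenAI-compatible relay endpoint; the base URL, API key, and model name below are placeholders, not GPTACG's actual values.

```python
# Minimal sketch: calling an OpenAI-compatible forwarding service with the
# official openai client. The base_url and API key are placeholders;
# substitute the values issued by the forwarding provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-relay.com/v1",  # hypothetical relay endpoint
    api_key="sk-your-relay-key",                  # key issued by the relay, not by OpenAI
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```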
Ministral-8B-Instruct-2410 is a large language model developed by the Mistral AI team for local intelligence, on-device computing, and edge use cases. It performs strongly among models of similar size, supports a 128k context window with interleaved sliding-window attention, is trained on multilingual and code data, supports function calling, and has a vocabulary of 131k tokens. Ministral-8B-Instruct-2410 does well across benchmarks covering knowledge and common sense, code and mathematics, and multilingual ability. It performs particularly well in chat/arena evaluations (judged by gpt-4o) and can handle complex conversations and tasks.
Aria is a natively multimodal mixture-of-experts model with strong performance on multimodal, language, and coding tasks. It excels at video and document understanding, supports multimodal input of up to 64K tokens, and can caption a 256-frame video within 10 seconds. Aria has 25.3B parameters and can be loaded in bfloat16 precision on a single A100 (80GB) GPU. It was developed to meet the need for multimodal data understanding, particularly in video and document processing, and is released as an open-source model to advance multimodal artificial intelligence.
Open O1 is an open-source project that aims to match the capabilities of the proprietary o1 model through open-source innovation. By curating a set of o1-style thinking data and training LLaMA and Qwen models on it, the project gives these smaller models stronger long-horizon reasoning and problem-solving abilities. As Open O1 progresses, the team intends to keep pushing what is possible with large language models; their stated vision is a model that not only achieves o1-like performance but also leads in test-time scalability, making advanced AI capabilities available to everyone. Through community-driven development and a commitment to ethical practices, Open O1 aims to become a cornerstone of AI progress, ensuring that the technology's future development is open and beneficial to all.
GRIN-MoE is a Mixture-of-Experts (MoE) model developed by Microsoft that focuses on improving performance in resource-constrained environments. It estimates the gradient of expert routing with SparseMixer-v2; compared with conventional MoE training, GRIN-MoE scales model training without relying on expert parallelism or token dropping. It performs especially well on coding and math tasks and suits scenarios that demand strong reasoning capabilities.
OneGen is an efficient single-pass generation and retrieval framework that lets large language models (LLMs) be fine-tuned for generation, retrieval, or hybrid tasks. Its core idea is to integrate generation and retrieval in the same context: retrieval is handled by retrieval tokens generated autoregressively, so the LLM performs both tasks in a single forward pass. This not only lowers deployment costs but also significantly reduces inference cost, because it avoids running two forward passes over the query.
Mistral-Small-Instruct-2409 is an instruction fine-tuned AI model with 22B parameters developed by the Mistral AI team. It supports multiple languages and sequence lengths of up to 128k tokens. The model is particularly suited to scenarios requiring long-text processing and complex instruction understanding, such as natural language processing and machine learning.
g1 is an experimental project that uses the Llama-3.1 70b model on Groq hardware to create reasoning chains similar to OpenAI's o1. The project demonstrates that prompting techniques alone, without any additional training, can significantly improve existing open-source models on logical problem solving. By making reasoning steps visible, g1 helps the model reach more accurate conclusions on logic problems, which is valuable for improving the logical reasoning ability of AI systems.
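A simplified sketch of the prompting-only approach follows, using the openai client against Groq's OpenAI-compatible endpoint; the system prompt is illustrative rather than g1's actual prompt, and the model identifier is an assumption.

```python
# Simplified sketch of prompting-only step-by-step reasoning, in the spirit of g1.
# The system prompt here is illustrative, not g1's actual prompt; the Groq endpoint
# is OpenAI-compatible, and the model name is assumed.
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_KEY")

SYSTEM = (
    "You are a careful reasoner. Solve the problem in explicit numbered steps. "
    "After each step, check it for errors before moving on. "
    "Only after the final step, write 'ANSWER:' followed by the answer."
)

resp = client.chat.completions.create(
    model="llama-3.1-70b-versatile",  # assumed model identifier
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "How many times do the hands of a clock overlap in 24 hours?"},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```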
Skywork-Reward-Llama-3.1-8B is an advanced reward model built on the Meta-Llama-3.1-8B-Instruct architecture and trained on the Skywork Reward Data Collection, which contains 80K high-quality preference pairs. The model excels at handling preferences in complex scenarios, including challenging preference pairs spanning mathematics, programming, and safety. As of September 2024, it ranks third on the RewardBench leaderboard.
Flux Gym is a simple Web UI designed for FLUX LoRA model training, especially suitable for devices with only 12GB, 16GB or 20GB VRAM. It combines the ease of use of the AI-Toolkit project and the flexibility of Kohya Scripts, allowing users to train models without complex terminal operations. Flux Gym allows users to upload images and add descriptions through a simple interface, and then start the training process.
How Much VRAM is an open source project designed to help users estimate the amount of video memory their models require during training or inference. This project enables users to decide on the desired hardware configuration without having to try multiple configurations. This project is very important for developers and researchers who need to train deep learning models because it can reduce the trial and error cost of hardware selection and improve efficiency. The project is licensed under the MPL-2.0 license and is provided free of charge.
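To make the kind of estimate concrete, here is a back-of-the-envelope sketch (not How Much VRAM's own code) that counts bytes under common assumptions: weights in fp16, Adam keeping fp32 master weights and two fp32 moment tensors for training, and activation memory ignored.

```python
# Back-of-the-envelope VRAM estimate (not the project's own code).
# Assumptions: weights stored in the given dtype; for training, Adam keeps an
# fp32 master copy plus two fp32 moment tensors; activation memory is ignored.
def estimate_vram_gb(num_params: float, bytes_per_param: int = 2, training: bool = False) -> float:
    weights = num_params * bytes_per_param
    if training:
        optimizer_state = num_params * 4 * 3   # fp32 master + Adam first/second moments
        gradients = num_params * bytes_per_param
        total = weights + gradients + optimizer_state
    else:
        total = weights
    return total / 1024**3

print(f"7B model, fp16 inference : {estimate_vram_gb(7e9, 2):.1f} GiB")
print(f"7B model, fp16 training  : {estimate_vram_gb(7e9, 2, training=True):.1f} GiB")
```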
Phi-3.5-vision is a lightweight, latest-generation multimodal model developed by Microsoft, built on datasets that include synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data for text and vision. It belongs to the Phi-3 model family and has undergone a rigorous enhancement process combining supervised fine-tuning and direct preference optimization to ensure precise instruction following and strong safety measures.
Phi-3.5-MoE-instruct is a lightweight, multilingual AI model developed by Microsoft. It is built on high-quality, reasoning-dense data and supports a context length of 128K. The model undergoes a rigorous enhancement process, including supervised fine-tuning, proximal policy optimization, and direct preference optimization, to ensure precise instruction following and strong safety measures. It is designed to accelerate research on language and multimodal models as a building block for generative AI capabilities.
T-MAC is a kernel library designed to accelerate low-bit large language model inference on CPUs. It supports mixed-precision matrix multiplication directly via lookup tables, with no dequantization step, and covers multiple low-bit formats, including W4A16 for GPTQ/gguf, W2A16 for BitDistiller/EfficientQAT, and W1(.58)A8 for BitNet, on ARM and Intel CPUs under macOS, Linux, and Windows. On a Surface Laptop 7, T-MAC reaches a token-generation throughput of 20 tokens/s on a single core and 48 tokens/s on four cores for the 3B BitNet model, 4 to 5 times faster than the most advanced existing CPU low-bit framework (llama.cpp).
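A toy NumPy illustration of the table-lookup idea for ternary (1.58-bit) weights follows; the real T-MAC kernels operate on bit-packed weight groups with SIMD table lookups, which this sketch does not attempt. It only shows why no dequantized weight matrix ever needs to be materialized.

```python
# Toy illustration of a lookup-table (LUT) based mixed-precision dot product
# for ternary weights in {-1, 0, +1}. Real kernels bit-pack weight groups and
# use SIMD lookups; this is a per-element sketch of the same idea.
import numpy as np

rng = np.random.default_rng(0)
activations = rng.standard_normal(256)            # full-precision activations
weights = rng.integers(-1, 2, size=256)           # ternary weight codes

# Precompute, per activation element, its product with every possible weight value.
# Table shape: (num_elements, num_weight_levels); column 0 -> w=-1, 1 -> w=0, 2 -> w=+1.
lut = np.stack([-activations, np.zeros_like(activations), activations], axis=1)

# The dot product becomes a sum of table lookups indexed by the weight codes.
lut_result = lut[np.arange(len(weights)), weights + 1].sum()
reference = float(activations @ weights)          # dequantize-then-multiply baseline

assert np.isclose(lut_result, reference)
print(lut_result, reference)
```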
Falcon Mamba is the first attention-free 7B model released by the Technology Innovation Institute (TII) in Abu Dhabi. When processing long sequences it maintains performance comparable to existing state-of-the-art models, while avoiding the growth in compute and memory cost that attention-based models incur as sequence length increases.
Gemma Scope is a set of sparse autoencoders designed for the 9B and 2B models of Gemma 2. Like a microscope, it lets us inspect the activations inside a model and understand the concepts they represent. These autoencoders can be used to study the internal activations of models, much as biologists use microscopes to study the cells of plants and animals.
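For intuition, here is a minimal L1-penalized sparse autoencoder over model activations in PyTorch; this is only a conceptual sketch, not Gemma Scope's released architecture (which uses a JumpReLU variant), and the dimensions are illustrative.

```python
# Minimal sparse autoencoder over model activations (conceptual sketch only;
# Gemma Scope's released SAEs use a JumpReLU variant). A wide dictionary is
# trained to reconstruct activations while keeping features sparse.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))   # sparse feature activations
        recon = self.decoder(features)               # reconstructed model activations
        return recon, features

sae = SparseAutoencoder(d_model=2304, d_dict=16384)  # dims are illustrative
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 2304)                         # stand-in for captured activations

recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
loss.backward()
opt.step()
```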
The Meta Llama 3.1 series of models is a set of pre-trained and instruction-tuned multilingual large language models (LLMs), including models in three sizes: 8B, 70B, and 405B. It is optimized for multilingual conversation use cases and outperforms many open source and closed source chat models.
Mistral-Large-Instruct-2407 is an advanced large language model (LLM) with 123B parameters, equipped with state-of-the-art reasoning, knowledge, and coding capabilities. It supports ten languages, including Chinese, English, and French, and is trained on more than 80 programming languages such as Python and Java. It also offers agent-centric capabilities and advanced mathematical and reasoning abilities.
Aphrodite is the official backend engine of PygmalionAI, designed to power the inference endpoint of the PygmalionAI website and serve Pygmalion models to large numbers of users at very high speed. Aphrodite builds on vLLM's PagedAttention to provide continuous batching, efficient key-value cache management, and optimized CUDA kernels, and supports multiple quantization schemes to improve inference performance.
DCLM-baseline is a pre-training data set for language model benchmark testing, containing 4T tokens and 3B documents. It is extracted from the Common Crawl dataset through carefully planned data cleaning, filtering and deduplication steps, aiming to demonstrate the importance of data curation in training efficient language models. This dataset is for research use only and is not suitable for production environments or domain-specific model training, such as coding and mathematics.
DataComp-LM (DCLM) is a comprehensive framework designed for building and training large language models (LLMs), providing a standardized corpus, efficient pre-training recipes based on the open_lm framework, and more than 50 evaluation methods. DCLM enables researchers to experiment with different dataset construction strategies at different computational scales, from 411M to 7B parameter models. DCLM significantly improves model performance through optimized dataset design and has led to the creation of multiple high-quality datasets that outperform all open datasets at different scales.
Mistral-Nemo-Instruct-2407 is a large language model (LLM) jointly trained by Mistral AI and NVIDIA; it is the instruction fine-tuned version of Mistral-Nemo-Base-2407. The model was trained on multilingual and code data and significantly outperforms existing models of similar or smaller size. Its main features include multilingual and code training data, a 128k context window, and serving as a drop-in replacement for Mistral 7B. The architecture has 40 layers, a model dimension of 5120, a head dimension of 128, a hidden dimension of 14,336, 32 attention heads, 8 KV heads (GQA), a vocabulary of 2^17 (about 128k) tokens, and rotary embeddings (theta = 1M). The model performs well on a range of benchmarks, such as HellaSwag (0-shot), Winogrande (0-shot), and OpenBookQA (0-shot).
Mistral-Nemo-Base-2407 is a 12B-parameter pre-trained generative text model jointly trained by Mistral AI and NVIDIA. The model is trained on multilingual and code data and significantly outperforms existing models of the same or smaller size. Its main features include release under the Apache 2.0 license, availability in both pre-trained and instruction-tuned versions, training with a 128k context window, support for multilingual and code data, and serving as a drop-in replacement for Mistral 7B. The architecture has 40 layers, a model dimension of 5120, a head dimension of 128, a hidden dimension of 14,336, 32 attention heads, 8 KV heads (GQA), a vocabulary of about 128k tokens, and rotary embeddings (theta = 1M). The model performs well on multiple benchmarks such as HellaSwag, Winogrande, and OpenBookQA.
Llama-3-70B-Tool-Use is a large language model with 70B parameters designed for advanced tool use and function calling. It reaches 90.76% overall accuracy on the Berkeley Function Calling Leaderboard (BFCL), outperforming all open-source 70B language models. The model uses an optimized transformer architecture and is trained from the Llama 3 70B base model with full fine-tuning and direct preference optimization (DPO). Input and output are both text, with enhanced tool-use and function-calling ability. Although its primary use is tool use and function calling, a general-purpose language model may be more suitable for general knowledge or open-ended tasks. The model may produce inaccurate or biased content in some cases, so users should implement appropriate safety measures for their specific use cases. It is also very sensitive to temperature and top_p sampling settings.
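As an illustration of what a tool-calling request to such a model looks like, here is a hedged sketch against a hypothetical OpenAI-compatible server hosting the model locally; the endpoint, model identifier, and sampling values are assumptions, not the model card's official settings.

```python
# Generic tool-calling request against a hypothetical OpenAI-compatible server
# hosting the model locally; endpoint, model id, and sampling values are
# assumptions, not official recommendations from the model card.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-3-70b-tool-use",             # assumed local model identifier
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    temperature=0.3,                           # the model is sensitive to sampling settings
    top_p=0.9,
)
print(resp.choices[0].message.tool_calls)
```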
Gemma 2 is the next-generation open model family from Google DeepMind, available in 9-billion and 27-billion parameter versions. It offers excellent performance and inference efficiency, runs efficiently at full precision on a range of hardware, and greatly reduces deployment costs. The 27-billion-parameter version is competitive with models more than twice its size and can run on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly lowering deployment cost.
Tele-FLM (also known as FLM-2) is a 52-billion-parameter open-source multilingual large language model with a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Based on a decoder-only transformer architecture, it was trained on approximately 2T tokens. Tele-FLM demonstrates superior performance at its scale, sometimes even surpassing larger models. In addition to the model weights, the authors share the core design, engineering practices, and training details, which they expect to benefit both academia and industry.
Llama3-70B-SteerLM-RM is a 70-billion-parameter language model used as an attribute prediction model: a multi-aspect reward model that scores responses along multiple dimensions rather than producing the single score of a traditional reward model. It was trained on the HelpSteer2 dataset using NVIDIA NeMo-Aligner, a scalable toolkit for efficient and effective model alignment.
MathBlackBox is a deep learning project that explores black-box methods for mathematical problem solving. It uses vLLM or other OpenAI-compatible backends for inference via the Hugging Face toolkit and the OpenAI API, supports running under Slurm, and can handle a variety of datasets. The project is still in its early stages and needs thorough testing before deployment in real products.
ARC-AGI is a dataset designed to test whether artificial intelligence systems have abstraction and reasoning capabilities similar to human general fluid intelligence. It consists of 400 training tasks and 400 evaluation tasks, each stored in JSON format as a set of input-output pairs. The dataset can serve as an artificial intelligence benchmark, a program synthesis benchmark, or a psychometric intelligence test.
HippoRAG is a novel retrieval-augmented generation (RAG) framework inspired by human long-term memory that enables large language models (LLMs) to continuously integrate knowledge across external documents. Experiments show that HippoRAG delivers capabilities that would otherwise require expensive, high-latency iterative LLM pipelines, at a much lower computational cost.
Aya-23-8B is an instruction fine-tuned model developed by Cohere For AI with strong multilingual capabilities across 23 languages. It pairs a high-performance pre-trained model with the Aya Collection to give researchers a high-performing multilingual model.
mistral-finetune is a lightweight codebase built on the LoRA training paradigm: most weights are frozen and only 1-2% of additional weights are trained in the form of low-rank matrix perturbations. It is optimized for multi-GPU single-node setups, while for smaller models such as the 7B model a single GPU is sufficient. The codebase is intended as a simple, instructive entry point for fine-tuning, particularly around data formatting, and does not aim to cover many model architectures or hardware types.
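To show what the low-rank perturbation amounts to, here is a minimal LoRA-style linear layer in PyTorch; this is a conceptual sketch, not mistral-finetune's code, and the dimensions and rank are illustrative.

```python
# Minimal LoRA-style linear layer (conceptual sketch, not mistral-finetune's code):
# the frozen base weight W is perturbed by a trainable low-rank product B @ A,
# so only a small fraction of additional parameters is updated.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)              # freeze the pretrained weight
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")  # ~0.4% for r=8 at this size
```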
Dolphin 2.9.1 Mixtral 1x22b is an AI model carefully trained and curated by the Cognitive Computations team. It is based on Dolphin-2.9-Mixtral-8x22b and is released under the Apache-2.0 license. The model has a 64k context capacity, was full-weight fine-tuned at a 16k sequence length, and was trained on 8 H100 GPUs in 27 hours. Dolphin 2.9.1 offers broad instruction-following, conversational, and coding skills, along with preliminary agent capabilities and support for function calling. The model is uncensored: the dataset was filtered to remove alignment and bias, which makes it highly compliant, so it is recommended to implement your own alignment layer before exposing it as a service.
This is an open-source project in which the author naklecha implements the Llama3 large language model from scratch. The project provides a detailed code walkthrough of the model's components, such as the attention mechanism and the feed-forward network. Through it, developers can gain a deep understanding of how large language models work and can run their own experiments and improvements on top of it.
Gemma 2B - 10M Context is a large language model optimized with an innovative attention mechanism that can handle sequences of up to 10M tokens while using less than 32GB of memory. The model uses recurrent local attention, inspired by the Transformer-XL paper, and is a powerful tool for large-scale language tasks.
phi3-Chinese is a public GitHub repository that focuses on collecting and organizing various training variants of the phi3 model in the open source community. It not only provides download links for different versions of phi3 models, but also includes related tutorials on training, inference, and deployment, aiming to help developers better understand and use phi3 models.
LLaVA++ is an open-source project that extends the visual capabilities of the LLaVA model by integrating the Phi-3 and LLaMA-3 models. Developed by researchers at Mohamed bin Zayed University of AI (MBZUAI), it combines the latest large language models to improve performance on instruction following and academic task-oriented datasets.
Bunny is a family of lightweight yet powerful multimodal models offering a variety of plug-and-play vision encoders and language backbones. It compensates for the reduced model size by building richer training data, curated from a wider range of sources. The Bunny-v1.0-3B model outperforms similarly sized and even larger MLLMs (7B) and is on par with 13B models.
Phi-3 Mini is a lightweight, state-of-the-art open model built on the synthetic data and filtered website data used for Phi-2, with a focus on high-quality, reasoning-dense data. It belongs to the Phi-3 series, and the mini version comes in two variants supporting 4K and 128K context lengths. The model has undergone a rigorous enhancement process, including supervised fine-tuning and direct preference optimization, to ensure precise instruction following and strong safety measures. These ONNX-optimized Phi-3 Mini models run efficiently on CPUs, GPUs, and mobile devices. Microsoft has also launched the ONNX Runtime Generate() API, which simplifies working with Phi-3.
Phi-3 Mini is a lightweight, state-of-the-art open large model built on the synthetic data and filtered website data used for Phi-2, dedicated to extremely high-quality, reasoning-dense data. The model has undergone a rigorous enhancement process combining supervised fine-tuning and direct preference optimization to ensure precise instruction following and strong safety measures. This repository provides an optimized ONNX version of Phi-3 Mini that accelerates inference on CPU and GPU through ONNX Runtime. It supports server, Windows, Linux, Mac, and other platforms, with the best accuracy configuration provided for each. ONNX Runtime's DirectML support also lets developers achieve hardware acceleration at scale on Windows devices powered by AMD, Intel, and NVIDIA GPUs.
The Intel NPU Acceleration Library is an acceleration library developed by Intel for the Neural Processing Unit (NPU), designed to improve the performance of deep learning and machine learning applications. This library provides algorithms and tools optimized for Intel hardware, supports a variety of deep learning frameworks, and can significantly improve the inference speed and efficiency of the model.
C3PO is an LLM alignment technique based on user feedback: it adapts an LLM from a single sentence of feedback while avoiding over-generalization. The release provides a reference implementation, the associated baselines, and the components needed to facilitate research on the technique proposed in the paper.
OpenDiT is an open-source project that provides a high-performance implementation of the Diffusion Transformer (DiT) based on Colossal-AI, designed to improve training and inference efficiency for DiT applications such as text-to-video and text-to-image generation. It delivers up to 80% speedup and a 50% memory reduction on GPU through kernel optimizations including FlashAttention, fused AdaLN, and fused LayerNorm; hybrid parallelism combining ZeRO, Gemini, and DDP, with sharding of the EMA model to further cut memory costs; and FastSeq, a novel sequence-parallel method well suited to workloads like DiT where activations are large but parameters are small, saving up to 48% of communication cost within a single node and breaking through single-GPU memory limits to reduce overall training and inference time. These gains require only small code changes, and users do not need to know the implementation details of distributed training. OpenDiT also provides complete text-to-image and text-to-video generation pipelines that researchers and engineers can adapt to practical applications without modifying the parallelism code, and the authors have performed text-to-image training on ImageNet and released checkpoints.
MobiLlama is a small language model (SLM) designed for resource-constrained devices. It aims to provide accurate and lightweight solutions that meet on-device processing needs, energy efficiency, low memory footprint, and responsiveness. MobiLlama starts from a larger model and reduces pre-training and deployment costs through a carefully designed parameter sharing scheme.
Universal prediction learners use meta-learning to quickly learn new tasks from limited data; broad exposure to diverse tasks yields general representations that enable general problem solving. This work explores amortizing the most powerful general predictor, Solomonoff induction (SI), through meta-learning. The authors use Universal Turing Machines (UTMs) to generate training data that exposes networks to a wide range of patterns, provide a theoretical analysis of the UTM data-generation process and the meta-training protocol, and run comprehensive experiments on neural architectures (e.g., LSTMs, Transformers) with algorithmic data generators of varying complexity and generality. The results show that UTM data is a valuable resource for meta-learning and can be used to train neural networks capable of learning general prediction strategies.
FP6-LLM is a new scheme for serving large language models that effectively reduces model size through six-bit floating-point quantization (FP6) while consistently preserving model quality across a variety of applications. The authors propose TC-FPx, the first complete GPU kernel design that uniformly supports floating-point weights of various quantization bit-widths. Integrating the TC-FPx kernel into existing inference systems provides new end-to-end support for quantized LLM inference (called FP6-LLM), achieving a better trade-off between inference cost and model quality. Experiments show that FP6-LLM makes it possible to run LLaMA-70b inference on a single GPU, with normalized inference throughput 1.69x to 2.65x higher than the FP16 baseline.
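The memory arithmetic behind the single-GPU claim can be illustrated with a plain k-bit integer quantizer; note this is neither FP6 nor the TC-FPx kernel, just a stand-in for the effect of reduced weight bit-width.

```python
# Not FP6 or the TC-FPx kernel: a plain symmetric k-bit integer quantizer used
# only to illustrate the memory arithmetic of reduced bit-width weights.
# Real kernels bit-pack six-bit values; here they sit in int8 containers.
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4096).astype(np.float16)
q, scale = quantize_symmetric(w, bits=6)
dequant = q.astype(np.float16) * scale
print("max abs error:", np.abs(dequant - w).max())

# 70B parameters: 6-bit weights vs fp16 weights (bytes -> GB)
print("6-bit weights:", 70e9 * 6 / 8 / 1e9, "GB;  fp16 weights:", 70e9 * 2 / 1e9, "GB")
```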
SpacTor is a new training procedure that consists of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective for an initial tau iterations and then transitions to the standard SC loss. Experiments across a variety of NLP tasks with an encoder-decoder architecture (T5) show that SpacTor-T5 matches standard SC pre-training in downstream performance while cutting pre-training iterations by 50% and total FLOPs by 40%. Moreover, under the same compute budget, SpacTor significantly improves downstream benchmark performance.
Zero Bubble Pipeline Parallelism addresses one of the key components of large-scale distributed training, pipeline parallelism, whose efficiency suffers from pipeline bubbles. The authors introduce a scheduling strategy that achieves zero pipeline bubbles under synchronous training semantics. The key idea is to split the backward computation into two parts: one computes gradients with respect to the inputs, the other gradients with respect to the parameters. Based on this idea, they hand-design a novel pipeline schedule that significantly outperforms baseline methods, and further develop an algorithm that automatically finds an optimal schedule for a given model configuration and memory constraint. To truly reach zero bubbles, they also introduce a technique that bypasses synchronization during the optimizer step. Experiments show up to 23% higher throughput than 1F1B scheduling under similar memory constraints, rising to 31% when the memory constraint is relaxed. The authors believe these results mark an important step toward realizing the potential of pipeline parallelism.
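The input-gradient/weight-gradient split can be seen in miniature with plain PyTorch autograd; this toy sketch is not the paper's scheduler, it only shows that the gradient a previous stage is waiting for can be computed before (and independently of) the weight gradients.

```python
# Toy illustration of splitting the backward pass into two parts, as in
# zero-bubble pipeline scheduling: the input gradient (needed upstream) is
# computed first, and the weight gradients can be deferred to fill bubbles.
import torch

layer = torch.nn.Linear(128, 128)
x = torch.randn(32, 128, requires_grad=True)
y = layer(x)
loss = y.square().mean()

# Part B: gradient w.r.t. the input -- what the previous pipeline stage is
# waiting for, so it is computed (and could be sent) immediately.
(grad_input,) = torch.autograd.grad(loss, x, retain_graph=True)

# Part W: gradients w.r.t. the parameters -- these only feed the optimizer
# step, so the scheduler is free to run them later.
grad_w, grad_b = torch.autograd.grad(loss, [layer.weight, layer.bias])

print(grad_input.shape, grad_w.shape, grad_b.shape)
```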
SwiftInfer is a large language model (LLM) inference acceleration library based on the Nvidia TensorRT framework; through GPU acceleration it greatly improves LLM inference performance in production environments. The project implements the attention-sink mechanism proposed by StreamingLLM and supports text generation of unlimited length. The code is concise, easy to run, and supports mainstream large language models.
PromptBench is a PyTorch-based Python package for evaluating large language models (LLMs). It provides researchers with a user-friendly API whose main features include quick model performance evaluation, prompt engineering, adversarial prompt evaluation, and dynamic evaluation. It is simple to use: you can quickly start evaluating existing datasets and models, and just as easily plug in your own. It is positioned as a unified open-source library for LLM evaluation.
Eureka is a human-level reward design algorithm powered by coding large language models. It leverages the zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art language models such as GPT-4 to evolve reward code, and the generated rewards can then be used to acquire complex skills through reinforcement learning. Across 29 open-source reinforcement learning environments spanning 10 distinct robot morphologies, Eureka's reward functions outperform those designed by human experts. Eureka can also flexibly refine reward functions to improve the quality and safety of the generated rewards. Combining Eureka rewards with curriculum learning, the authors demonstrate for the first time a simulated Shadow Hand performing pen-spinning tricks, deftly maneuvering a pen in rapid circles.
Flash-Decoding is a technique for long-context inference that significantly accelerates attention during decoding, yielding up to 8x faster generation. It works by splitting the keys and values into smaller chunks, loading and attending to them in parallel, and then rescaling and combining the partial results to recover the exact attention output. Flash-Decoding suits large language models handling long contexts such as long documents, long conversations, or entire codebases. It is available in the FlashAttention package and in xFormers, which can automatically choose between Flash-Decoding, FlashAttention, and an efficient Triton kernel.
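The split-and-rescale step can be checked with a small NumPy sketch; the real Flash-Decoding kernels do this in parallel on the GPU with fused softmax, while this toy version only verifies that combining per-chunk results with their running maxima and softmax denominators reproduces the exact attention output.

```python
# NumPy sketch of the split-KV idea behind Flash-Decoding: single-query
# attention over a long key/value cache is computed per chunk, then the
# partial results are combined exactly using each chunk's max and denominator.
import numpy as np

def attend(q, k, v):
    """Reference single-query attention."""
    scores = k @ q
    weights = np.exp(scores - scores.max())
    return (weights[:, None] * v).sum(0) / weights.sum()

def split_kv_attend(q, k, v, num_chunks=4):
    outs, maxes, denoms = [], [], []
    for k_c, v_c in zip(np.array_split(k, num_chunks), np.array_split(v, num_chunks)):
        scores = k_c @ q
        m = scores.max()
        w = np.exp(scores - m)
        outs.append((w[:, None] * v_c).sum(0))   # unnormalized chunk output
        maxes.append(m)
        denoms.append(w.sum())
    m_global = max(maxes)
    scale = [np.exp(m - m_global) for m in maxes]            # rescale each chunk
    num = sum(s * o for s, o in zip(scale, outs))
    den = sum(s * d for s, d in zip(scale, denoms))
    return num / den

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal((4096, 64))
v = rng.standard_normal((4096, 64))
assert np.allclose(split_kv_attend(q, k, v), attend(q, k, v))
```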
Teachable Machine is a web-based tool that lets users create machine learning models quickly and easily, without specialized knowledge or coding skills. Users only need to collect and organize sample data; Teachable Machine automatically trains the model, after which users can test its accuracy and export the model for use.
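For the export step, here is a hedged sketch of using an image model exported in Keras format; the filenames, 224x224 input size, and pixel scaling follow the standard export as assumptions and may differ for other export options.

```python
# Sketch of using a Teachable Machine image model exported in Keras format.
# The export typically contains "keras_model.h5" and "labels.txt"; filenames,
# the 224x224 input size, and the [-1, 1] pixel scaling are assumptions about
# the standard export and may differ for other export options.
import numpy as np
import tensorflow as tf
from PIL import Image

model = tf.keras.models.load_model("keras_model.h5", compile=False)
labels = [line.strip() for line in open("labels.txt", encoding="utf-8")]

img = Image.open("example.jpg").resize((224, 224))
x = (np.asarray(img, dtype=np.float32) / 127.5) - 1.0   # scale pixels to [-1, 1]
pred = model.predict(x[np.newaxis, ...])

print(labels[int(pred.argmax())], float(pred.max()))
```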
AI model inference & training is a popular subcategory under Programming, featuring 53 quality AI tools.