NVLM 1.0 is a family of frontier multimodal large language models from NVIDIA ADLR. It achieves industry-leading results on vision-language tasks, rivaling top proprietary and open-access models, and its accuracy on text-only tasks actually improves after multimodal training. NVLM 1.0's open-sourced model weights and Megatron-Core training code are a valuable resource for the community.
Llama 3.2 is a collection of multilingual large language models (LLMs) released by Meta, including pretrained and instruction-tuned generative models at the 1B and 3B scales. These models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many existing open-source and closed chat models on common industry benchmarks.
The Llama 3.2 series spans modalities as well: pretrained and fine-tuned multilingual text-only models at the 1B and 3B sizes, and vision models at the 11B and 90B sizes that take text and images as input and produce text as output. The models are designed for building performant, efficient applications; they can run on mobile and edge devices, support multiple programming languages, and can power agent applications via the Llama Stack, as sketched below.
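As an illustration, here is a minimal text-generation sketch using Hugging Face transformers. The model ID `meta-llama/Llama-3.2-3B-Instruct` and the chat-style pipeline call follow standard transformers conventions rather than anything stated above, and gated-access approval on Hugging Face is assumed.

```python
# Minimal sketch: chat with a Llama 3.2 instruct model via transformers.
# Assumes access to the gated checkpoint has been granted to your HF account.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",  # assumed model ID
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize why edge-sized LLMs matter."}]
output = generator(messages, max_new_tokens=128)

# The pipeline returns the conversation with the assistant turn appended.
print(output[0]["generated_text"][-1]["content"])
```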
Intel® Gaudi® 3 AI Accelerator is a high-performance AI accelerator from Intel. Built on the efficient Intel® Gaudi® platform, it posts excellent MLPerf benchmark results and is designed to handle demanding training and inference workloads. The accelerator supports AI applications such as large language models, multimodal models, and enterprise RAG in the data center or cloud, running over the Ethernet infrastructure you may already have. Whether you need a single accelerator or thousands, Intel Gaudi 3 can play a critical role in your AI success.
Nemotron-Mini-4B-Instruct is a small language model from NVIDIA, optimized through distillation, pruning, and quantization for speed and on-device deployment. It is a fine-tuned version of nvidia/Minitron-4B-Base, which was itself pruned and distilled from Nemotron-4 15B using NVIDIA's LLM compression techniques. The instruct model is optimized for roleplay, retrieval-augmented question answering (RAG QA), and function calling, supports a context length of 4,096 tokens, and is ready for commercial use.
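A minimal generation sketch with Hugging Face transformers follows. The model ID matches the Hugging Face card named above, but the prompt content and generation settings are illustrative assumptions, not NVIDIA's documented recipe.

```python
# Minimal sketch: load Nemotron-Mini-4B-Instruct and generate a reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Mini-4B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "List three uses of a 4B on-device LLM."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)  # stays within the 4096-token context
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```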
DataGemma RIG is a family of fine-tuned Gemma 2 models designed to help large language models (LLMs) access and integrate reliable public statistics from Data Commons. RIG stands for retrieval-interleaved generation: the model annotates the statistics in its responses with generated natural-language queries, which are answered through Data Commons' existing natural-language query interface. DataGemma RIG was trained with JAX on TPUv5e and is currently an early release intended primarily for academic and research use; it is not yet ready for commercial or public-facing deployment.
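To make the mechanism concrete, here is a hypothetical post-processing sketch. It assumes the model emits inline annotations of the form `[DC: <query> | <model value>]` (an illustrative format, not DataGemma's actual one) and replaces each with a value fetched from a Data Commons lookup stub.

```python
# Hypothetical sketch of retrieval-interleaved generation post-processing.
# The [DC: query | value] annotation format and the lookup stub are
# assumptions for illustration; DataGemma's real formats and APIs differ.
import re

def query_data_commons(nl_query: str) -> str:
    """Stub: a real pipeline would call Data Commons' natural-language
    query interface and return a grounded statistic."""
    return "68.7%"  # placeholder value

def ground_response(text: str) -> str:
    pattern = re.compile(r"\[DC:\s*(?P<query>[^|\]]+)\|(?P<value>[^\]]+)\]")

    def replace(match: re.Match) -> str:
        # Prefer the retrieved statistic over the model's drafted value.
        return query_data_commons(match["query"].strip())

    return pattern.sub(replace, text)

draft = "Renewables supplied [DC: share of renewable power in Brazil | 65%] of electricity."
print(ground_response(draft))
```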
OLMoE-1B-7B is a mixture-of-experts large language model (LLM) with 1 billion active parameters and 7 billion total parameters, released in September 2024. It performs strongly among models of similar cost, competing with larger models such as Llama2-13B. OLMoE is fully open source and supports a range of uses, including text generation, model training, and deployment.
The AI21 Jamba 1.5 models are among the most powerful long-context models on the market, delivering inference up to 2.5x faster than leading models of their class. They demonstrate superior long-context handling, speed, and quality, and are the first to successfully scale a non-Transformer model to the quality and strength of market-leading models.
AI21-Jamba-1.5-Mini is the latest generation of hybrid SSM-Transformer instruction-following foundation models from AI21 Labs. It stands out for its long-text handling, speed, and quality, with inference up to 2.5x faster than leading models of similar size. Jamba 1.5 Mini and Jamba 1.5 Large are optimized for business use cases and capabilities such as function calling, structured (JSON) output, and grounded generation.
The Jamba 1.5 open model family is the latest AI model series from AI21. Built on an SSM-Transformer architecture, it combines ultra-long-context handling with high speed and quality, making it the best-performing offering of its kind on the market. The models are designed for enterprise applications, balancing resource efficiency, quality, speed, and the ability to solve critical tasks.
Phi-3 is a family of small language models (SLMs) from Microsoft Azure that delivers breakthrough performance while keeping cost and latency low. Designed for generative AI solutions, these models are compact and have modest compute requirements. Phi-3 models are developed in accordance with Microsoft's Responsible AI principles: accountability, transparency, fairness, reliability and safety, privacy and security, and inclusiveness. Phi-3 also supports local deployment, accurate and relevant answers, deployment in low-latency scenarios, cost-constrained tasks, and customization for accuracy.
Gemini Pro is a high-performance multimodal AI model from Google DeepMind designed for a wide range of tasks. With a context window of up to two million tokens, it can process large documents, codebases, audio, and video, and it performs well on benchmarks spanning code generation, mathematical problem solving, and multilingual translation.
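A minimal sketch of calling a long-context Gemini model through the google-generativeai Python SDK is below; the model name `gemini-1.5-pro`, the environment-variable key handling, and the input file are assumptions for illustration.

```python
# Minimal sketch: send a long document to a Gemini Pro model.
# Assumes `pip install google-generativeai` and a GOOGLE_API_KEY env var.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed long-context model name

with open("contract.txt") as f:  # any large text; fits the long context window
    document = f.read()

response = model.generate_content(
    ["Summarize the key obligations in this document:", document]
)
print(response.text)
```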
H2O-Danube2-1.8B is the latest open-source small language model from H2O.ai, designed for offline and enterprise applications. Its low inference and training costs make it easy to embed in edge devices such as phones and drones. The model ranks first among sub-2B models on the Hugging Face Open LLM Leaderboard, and H2O.ai reports query-cost savings of up to 200x alongside better accuracy on document processing. The H2O.ai platform also offers cost control and flexibility, supporting mixed use of more than 30 large language models (LLMs), both proprietary and open source.
Nemotron-4 340B is a family of open models from NVIDIA designed for generating synthetic data to train large language models (LLMs). The models are optimized for NVIDIA NeMo and NVIDIA TensorRT-LLM to improve training and inference efficiency. Nemotron-4 340B comprises base, instruct, and reward models, which together form a pipeline for generating synthetic data to train and refine LLMs. The models are available for download on Hugging Face and will soon be available on ai.nvidia.com as part of the NVIDIA NIM microservices.
Buffer of Thoughts (BoT) is a novel thought-augmented reasoning method designed to improve the accuracy, efficiency, and robustness of large language models (LLMs). It introduces a meta-buffer that stores high-level thought templates distilled from the problem-solving processes of diverse tasks. For each new problem, a relevant thought template is retrieved and adaptively instantiated into a specific reasoning structure for efficient reasoning. A buffer manager dynamically updates the meta-buffer, increasing its capacity as more tasks are solved.
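A toy sketch of the retrieve-then-instantiate loop is below; the template store, the similarity scoring, and the `llm` callable are illustrative stand-ins for the paper's components, not its actual implementation.

```python
# Toy sketch of a Buffer-of-Thoughts-style loop: retrieve a high-level
# thought template from a meta-buffer, instantiate it for the problem,
# then let an LLM reason with it. All components are illustrative.
from difflib import SequenceMatcher

META_BUFFER = {
    "algebra word problem": "Define variables, translate the story into "
                            "equations, solve, then sanity-check units.",
    "code debugging": "Reproduce the failure, localize the faulty line, "
                      "state the broken invariant, patch, re-run tests.",
}

def retrieve_template(problem: str) -> str:
    # Crude string similarity standing in for embedding-based retrieval.
    score = lambda key: SequenceMatcher(None, problem.lower(), key).ratio()
    return META_BUFFER[max(META_BUFFER, key=score)]

def solve(problem: str, llm) -> str:
    template = retrieve_template(problem)
    prompt = (f"Thought template: {template}\n"
              f"Instantiate this template for the problem and solve it.\n"
              f"Problem: {problem}")
    # A buffer manager would distill new templates from solved tasks here.
    return llm(prompt)

print(solve("A train leaves at 3pm at 60 km/h ...", llm=lambda p: "(LLM answer)"))
```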
Mistral-7B-v0.3 is a large language model (LLM) from the Mistral AI team. It is an upgrade of Mistral-7B-v0.2 with the vocabulary extended to 32,768 tokens. The model supports text generation and suits applications that need that capability. It currently ships without a content moderation mechanism; the team is seeking community collaboration on finer-grained moderation so the model can serve deployment environments that require it.
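A minimal sketch of plain text generation with this model via transformers follows; since this is a base (non-instruct) checkpoint, it is prompted with raw text rather than a chat template, and the sampling settings are arbitrary.

```python
# Minimal sketch: raw text completion with the Mistral-7B-v0.3 base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The three laws of thermodynamics state that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```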
MiniCPM-Llama3-V 2.5 is the latest edge-side multimodal large model from the OpenBMB project. With 8B parameters, it supports multimodal interaction in more than 30 languages and surpasses several commercial closed-source models in overall multimodal performance. Through model quantization, CPU and NPU support, compilation optimization, and other techniques it deploys efficiently on end devices, and it offers strong OCR capabilities, trustworthy behavior, and multilingual support.
Llama-3-Giraffe-70B-Instruct is a large language model from Abacus.AI. Trained with PoSE and dynamic NTK interpolation, it has a longer effective context length and can handle large volumes of text. Training used roughly 1.5B tokens, and through adapter transfer, an adapter derived against the Llama-3-70B-Base model is applied to Llama-3-Giraffe-70B-Instruct to improve its performance.
Amazon Titan Text Premier is a new member of the Amazon Titan model family designed for text-based enterprise applications; it can be customized and fine-tuned for specific domains, organizations, brand voices, and use cases. The model is available in Amazon Bedrock, has a maximum context length of 32K tokens, is particularly suited to English-language tasks, and integrates responsible-AI practices.
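A minimal invocation sketch through the Bedrock runtime with boto3 is shown below; the model ID `amazon.titan-text-premier-v1:0` and the request-body fields follow the common Titan text format but should be treated as assumptions to verify against the Bedrock documentation.

```python
# Minimal sketch: invoke Titan Text Premier through Amazon Bedrock.
# Assumes AWS credentials with Bedrock access are configured; the model ID
# and body schema are assumptions to check against the Bedrock docs.
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "inputText": "Draft a two-sentence product description for a smart thermostat.",
    "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.5},
}

response = client.invoke_model(
    modelId="amazon.titan-text-premier-v1:0",  # assumed Bedrock model ID
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```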
Mistral-22b-v.02 is a powerful model with strong mathematical and programming abilities. Compared with V1, the V2 model shows significant improvements in coherence and multi-turn dialogue. The model has been uncensored and will answer any question. The training data consists mainly of multi-turn dialogue, with special emphasis on programming content. The model also has agent capabilities and can perform real-world tasks. It was trained with a 32k context length, and the GUANACO prompt format must be used.
Stable LM 2 12B is a 12-billion-parameter decoder-only language model pre-trained on 2 trillion tokens of multilingual and code data. It can serve as a base model for fine-tuning on downstream tasks, but should be evaluated and fine-tuned before use to ensure safe and reliable behavior. The model may produce inappropriate content; use it with caution and do not deploy it in applications that could harm others.
The Mengzi3-13B large model, developed by Langboat Technology, is based on the Llama architecture. Trained on a 3T-token dataset, it has strong multilingual processing and interactive reasoning capabilities. It is free for commercial use and targets high-quality large models for ToB scenarios.
Cappy is a novel approach for improving the performance and efficiency of large multi-task language models. It is a lightweight pre-trained scorer based on RoBERTa with only 360 million parameters. Cappy can solve classification tasks on its own or serve as an auxiliary component that boosts a language model's performance. Fine-tuning Cappy on downstream tasks effectively incorporates supervision and improves performance without backpropagating through the language model's parameters, which reduces memory requirements. Cappy works with both open-source and closed-source language models, making it an efficient approach to model adaptation.
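A sketch of Cappy-style candidate scoring is below: a small sequence-regression scorer rates (instruction, candidate) pairs produced by a frozen LLM, and the best-scored candidate is returned. The checkpoint path is a placeholder, and the single-logit regression head is an assumption about the scorer.

```python
# Sketch: use a Cappy-style RoBERTa scorer to rank candidate responses
# from a frozen LLM. The checkpoint path is a placeholder and the
# single-logit regression head is an assumption about the scorer.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

scorer_path = "<cappy-checkpoint>"  # placeholder: substitute the real scorer
tokenizer = AutoTokenizer.from_pretrained(scorer_path)
scorer = AutoModelForSequenceClassification.from_pretrained(scorer_path)

def best_candidate(instruction: str, candidates: list[str]) -> str:
    scores = []
    for cand in candidates:
        enc = tokenizer(instruction, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            # Assumed: the scorer emits one regression logit = quality score.
            scores.append(scorer(**enc).logits.squeeze().item())
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

candidates = ["Paris is the capital of France.", "France's capital is Berlin."]
print(best_candidate("What is the capital of France?", candidates))
```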
Grok-1 is a 314-billion-parameter Mixture-of-Experts model trained from scratch by xAI. It is not fine-tuned for any specific application (such as dialogue); it is the raw base-model checkpoint from the Grok-1 pre-training phase.
nasa-smd-ibm-v0.1 is a RoBERTa-based encoder Transformer model, domain-adapted and optimized for NASA science missions. Fine-tuned on scientific journals and articles relevant to those missions, it is designed to enhance natural-language technologies such as information retrieval and intelligent search. The model has 125 million parameters and is pre-trained with a masked-language-modeling objective. It can be used for tasks such as named entity recognition, information retrieval, sentence embedding, and scalable question answering, specifically for scientific use cases related to NASA science missions.
Claude 3 Haiku is the latest enterprise-grade AI model from Anthropic. With industry-leading vision capabilities and strong benchmark performance, it is a flexible solution for a wide range of enterprise applications. The model is available through the Claude API and the Claude Pro subscription on claude.ai.

Speed is a pressing pain point for enterprise users, who need to quickly analyze large amounts of data and generate timely output, for example in customer support. Claude 3 Haiku is three times faster than models in its tier: for prompts under 32K tokens, it processes 21K tokens (about 30 pages) per second. It also generates output quickly, enabling responsive, smooth chat interactions and many small tasks run in parallel. Haiku's pricing model (a 1:5 input-to-output token price ratio) is designed for enterprise workloads that typically involve long prompts. Businesses can rely on Haiku to quickly analyze large volumes of documents, such as quarterly reports, contracts, or legal cases, at half the cost of other models in its tier; for example, Claude 3 Haiku can process and analyze 400 Supreme Court cases or 2,500 images for one dollar.

Beyond speed and affordability, Claude 3 Haiku emphasizes enterprise-grade security and robustness. Anthropic conducts rigorous testing to reduce the likelihood of harmful outputs and jailbreaks. Additional layers of protection include continuous system monitoring, endpoint hardening, secure coding practices, strong data-encryption protocols, and strict access controls, along with regular security audits and engagements with experienced penetration testers to proactively identify and fix vulnerabilities. More detail on these measures is available in the Claude 3 model card.
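A minimal call through the Anthropic Python SDK might look like the following; the dated model ID `claude-3-haiku-20240307` and the environment-variable key handling follow the SDK's standard conventions, and the prompt is illustrative.

```python
# Minimal sketch: ask Claude 3 Haiku to summarize a document chunk.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY env var.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Summarize the holding of this case in two sentences: ...",
        }
    ],
)
print(message.content[0].text)
```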
Gemma-2B-IT is a 2B-parameter instruction-tuned model from Google. Built on technology from Gemini, it is designed to improve math, reasoning, and code handling. The model can run on an ordinary laptop without massive AI compute, making it suitable for a wide range of applications.
Large World Models is a family of neural networks trained with RingAttention to process long video and language sequences, aiming to understand both human knowledge and the multimodal world. Trained on large-scale datasets, the models achieve unprecedented context sizes, and the project open-sources a series of 7-billion-parameter models capable of processing text and video sequences of more than 1 million tokens.
Large language models increasingly rely on distributed techniques for training and inference. These techniques require communication between devices, which can reduce scaling efficiency as the number of devices increases. While some distributed techniques can overlap, and thereby hide, communication behind independent computation, techniques such as tensor parallelism (TP) inherently serialize communication with model execution. One way to hide this serialized communication is to interleave it in a fine-grained way with the producer operations that generate the communicated data. However, implementing such fine-grained interleaving of communication and computation in software can be difficult. Moreover, like any concurrent execution, it requires sharing compute and memory resources between computation and communication, causing resource contention that reduces overlap efficiency. To overcome these challenges, we propose T3, a hardware-software co-design that transparently overlaps serialized communication while minimizing resource contention with compute. T3 transparently fuses producer operations with the subsequent communication through simple configuration of the producer's output address space, requiring only minor software changes. At the hardware level, T3 adds lightweight tracking and triggering mechanisms to orchestrate the producer's compute and the communication, and it further uses compute-enhanced memory for the communication's attendant computation. As a result, T3 reduces resource contention and effectively overlaps serialized communication with computation. For important Transformer models such as T-NLG, T3 speeds up communication-heavy sublayers by 30% geomean (max 47%) and reduces data movement by 22% geomean (max 36%). Furthermore, T3's benefits persist as models scale: a 29% geomean speedup for sublayers of roughly 500-billion-parameter models, PaLM and MT-NLG.
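To illustrate the software baseline the abstract describes (fine-grained interleaving of a producer GEMM with its all-reduce), here is a minimal sketch using torch.distributed with asynchronous collectives. T3 itself is a hardware mechanism and is not reproduced here; the chunk count and shapes are arbitrary.

```python
# Sketch of the software baseline T3 improves on: split a producer matmul
# into chunks and overlap each chunk's all-reduce with the next chunk's
# compute, using async collectives. (T3 does this transparently in hardware.)
import torch
import torch.distributed as dist

def chunked_matmul_allreduce(x: torch.Tensor, w: torch.Tensor, n_chunks: int = 4):
    outputs, handles = [], []
    for x_chunk in x.chunk(n_chunks, dim=0):
        y = x_chunk @ w                        # producer compute for this chunk
        h = dist.all_reduce(y, async_op=True)  # start communication immediately
        outputs.append(y)                      # reduced in place once complete
        handles.append(h)                      # next chunk's matmul overlaps it
    for h in handles:
        h.wait()                               # drain outstanding communication
    return torch.cat(outputs, dim=0)

# Usage (inside an initialized process group, e.g. torchrun + NCCL):
# dist.init_process_group("nccl")
# out = chunked_matmul_allreduce(x, w)
```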
Honeybee is a locality-enhanced projector for multimodal language models. It improves the performance of multimodal language models on downstream tasks such as natural-language reasoning and visual question answering. Honeybee's advantage is a locality-aware mechanism that better preserves the local context of the visual features passed to the language model, strengthening the multimodal model's reasoning and question-answering abilities.
ASPIRE is a carefully designed framework for enhancing the selective prediction capabilities of large language models. Through parameter-efficient fine-tuning, it trains an LLM to evaluate itself and output a confidence score for each generated answer. Experimental results show that ASPIRE significantly outperforms existing selective prediction methods on a variety of question-answering datasets.
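A toy sketch of selective prediction with a learned confidence score follows; the `answer_with_confidence` model interface and the abstention threshold are illustrative assumptions, not ASPIRE's actual training recipe.

```python
# Toy sketch of selective prediction: answer only when the model's
# self-evaluated confidence clears a threshold, otherwise abstain.
# The model interface and threshold are illustrative assumptions.
from typing import Callable, Optional, Tuple

def selective_predict(
    question: str,
    answer_with_confidence: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,
) -> Optional[str]:
    answer, confidence = answer_with_confidence(question)
    if confidence >= threshold:
        return answer
    return None  # abstain and defer to a human or a stronger model

# Stand-in model: a fine-tuned LLM would produce the (answer, score) pair.
fake_model = lambda q: ("42", 0.91) if "answer" in q else ("unsure", 0.35)
print(selective_predict("What is the answer to everything?", fake_model))
print(selective_predict("Who wins the 2050 World Cup?", fake_model))
```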
MiniMax has fully released abab6, the first MoE large language model in China. Under the MoE structure, abab6 can handle the complex tasks that large parameter counts enable while training on more data per unit of time, greatly improving compute efficiency. It improves on abab5.5 in handling more complex scenarios with more exacting requirements on model output.
RoboGen is an automated robot-learning system based on generative simulation. It enables large-scale robot skill learning by automatically generating diverse tasks, scenes, and training supervision. RoboGen can independently propose, generate, and learn, continuously producing skill demonstrations across a wide variety of tasks and environments.
Sagify is a command-line tool for training and deploying machine learning/deep learning models on AWS SageMaker in a few simple steps. It removes the pain of provisioning cloud instances for model training, simplifies running hyperparameter-tuning jobs in the cloud, and eliminates the need to hand models off to software engineers for deployment. Sagify covers AWS account configuration, Docker image building, data upload, model training, model deployment, and more, helping users quickly build and ship machine learning models across a range of scenarios.