Found 27 AI tools
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of answers generated by large language models (LLMs). The model performs well on multiple automatic alignment benchmarks, such as Arena Hard, AlpacaEval 2 LC, and GPT-4-Turbo-judged MT-Bench. It was trained from Llama-3.1-70B-Instruct using RLHF (specifically the REINFORCE algorithm), the Llama-3.1-Nemotron-70B-Reward reward model, and HelpSteer2-Preference prompts. The model not only demonstrates NVIDIA's techniques for improving the helpfulness of general-domain instruction following, but is also provided in a format compatible with the HuggingFace Transformers library and can be used for free hosted inference through NVIDIA's build platform.
Zamba2-7B is a small language model developed by the Zyphra team. It surpasses current leading models such as Mistral, Google's Gemma, and Meta's Llama 3 series at the 7B scale, in both quality and performance. The model is designed to run on-device and on consumer-grade GPUs, as well as for numerous enterprise applications that require a powerful yet compact and efficient model. The release of Zamba2-7B demonstrates that even at the 7B scale, the cutting edge can still be reached and surpassed by small teams on modest budgets.
tiiuae/falcon-mamba-7b is a high-performance causal language model developed by TII (Technology Innovation Institute, UAE), based on the Mamba architecture and designed specifically for generation tasks. The model demonstrates excellent performance on multiple benchmarks and is able to run on different hardware configurations, supporting multiple precision settings to accommodate different performance and resource needs. Training used advanced 3D parallelism strategies and ZeRO optimization, making efficient training on large-scale GPU clusters possible.
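The precision settings mentioned above trade memory for numeric range. As a rough illustration (using NumPy stand-in tensors, not the model itself), halving precision halves the memory a weight tensor occupies:

```python
import numpy as np

# A mock "weight matrix" standing in for model parameters.
weights_fp32 = np.random.rand(1024, 1024).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)  # half precision

print(weights_fp32.nbytes)  # 1024 * 1024 * 4 bytes = 4 MiB
print(weights_fp16.nbytes)  # 1024 * 1024 * 2 bytes = 2 MiB
```

The same halving applies at bfloat16, and 8-bit or 4-bit quantization shrinks the footprint further, which is why a 7B model can fit hardware with very different memory budgets.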
Llama-3.1-Nemotron-51B is a new language model developed by NVIDIA based on Meta's Llama-3.1-70B. It is optimized through neural architecture search (NAS) to achieve high accuracy and efficiency. The model is able to run on a single NVIDIA H100 GPU, significantly reducing memory footprint, memory bandwidth, and computational requirements while maintaining excellent accuracy. It represents a new balance between accuracy and efficiency for AI language models, providing developers and enterprises with cost-controllable, high-performance AI solutions.
OLMoE is a fully open, state-of-the-art mixture-of-experts model with 1.3 billion active parameters and 6.9 billion total parameters. All data, code, and logs for this model have been published. It provides an overview of all resources for the paper 'OLMoE: Open Mixture-of-Experts Language Models'. The model has important applications in pre-training, fine-tuning, adaptation, and evaluation, and is a milestone in the field of natural language processing.
C4AI Command R 08-2024 is a 35-billion-parameter large language model developed by Cohere and Cohere For AI, optimized for a variety of use cases such as reasoning, summarization, and question answering. The model supports training in 23 languages and is evaluated in 10 languages, with high-performance RAG (Retrieval-Augmented Generation) capabilities. It is trained through supervised fine-tuning and preference training to align with human preferences for helpfulness and safety. Additionally, the model has conversational tool-use capabilities, enabling tool-based responses to be generated through specific prompt templates.
Mistral-NeMo-Minitron 8B is a small language model released by NVIDIA. It is a streamlined version of the Mistral NeMo 12B model that provides computational efficiency while maintaining high accuracy, allowing it to run on GPU-accelerated data centers, clouds, and workstations. The model was custom developed through the NVIDIA NeMo platform and combines two AI optimization methods, pruning and distillation, to reduce computational costs while providing accuracy comparable to the original model.
Grok-2 is xAI's cutting-edge language model with state-of-the-art reasoning capabilities. This release includes two members of the Grok family: Grok-2 and Grok-2 mini, both now available to Grok users on the 𝕏 platform. Grok-2 is a significant advancement over Grok-1.5, with cutting-edge capabilities in chat, coding, and reasoning. At the same time, xAI introduces Grok-2 mini, a small but capable sibling of Grok-2. An early version of Grok-2 was tested on the LMSYS leaderboard under the name "sus-column-r", where it surpassed Claude 3.5 Sonnet and GPT-4-Turbo in overall Elo score.
Meta Llama 3.1 is a series of pre-trained and instruction-tuned multilingual large language models (LLMs) in 8B, 70B, and 405B sizes, supporting 8 languages, optimized for multilingual conversation use cases, and performing well on industry benchmarks. Llama 3.1 is an autoregressive language model that uses an optimized Transformer architecture, with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) improving its helpfulness and safety.
Meta Llama 3.1-405B is part of a series of multilingual pre-trained language models developed by Meta, available in three sizes: 8B, 70B, and 405B. These models feature an optimized Transformer architecture, tuned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. Llama 3.1 models support multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The model performs well on a variety of natural language generation tasks and outperforms many existing open-source and closed chat models on industry benchmarks.
GPT-4o mini is an extremely cost-effective small model launched by OpenAI. It surpasses other small models in multimodal reasoning and text intelligence, and supports the same range of languages as GPT-4o. The model performs well on mathematical reasoning and coding tasks, handles large amounts of contextual information, and supports fast, real-time text responses. GPT-4o mini aims to make intelligent technology more widely usable across application scenarios, reducing costs and improving accessibility.
Gemma-2-9b-it is an instruction-tuned model from Gemma 2, Google's family of lightweight, state-of-the-art open models built on the same research and technology as the Gemini models. These are text-to-text, decoder-only large language models, available in English, suitable for diverse text generation tasks such as question answering, summarization, and reasoning. Its relatively small size allows deployment in resource-limited environments such as laptops, desktops, or personal cloud infrastructure, making advanced AI models more accessible and promoting innovation.
Gemma 2 is a series of lightweight, advanced open models developed by Google, built on the same research and technology as the Gemini models. They are text-to-text, decoder-only large language models, available in English only, with open weights for both pre-trained and instruction-tuned variants. Gemma models are well suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size enables deployment in resource-constrained environments such as laptops, desktops, or your own cloud infrastructure, democratizing access to advanced AI models and helping to promote innovation for everyone.
Fugaku-LLM is an artificial intelligence language model developed by the Fugaku-LLM team, focusing on text generation. Using advanced machine learning techniques, it generates smooth, coherent text suitable for multiple languages and scenarios. Its main advantages include efficient text generation, multi-language support, and continuous model updates to stay at the forefront of the technology. The model has a wide range of applications, including but not limited to writing assistance, chatbot development, and educational tools.
Qwen1.5-110B is the largest model in the Qwen1.5 series, with 110 billion parameters. It supports multiple languages and uses an efficient Transformer decoder architecture with grouped-query attention (GQA), making inference more efficient. It is comparable to Meta-Llama3-70B in basic capability evaluations and performs well in chat evaluations, including MT-Bench and AlpacaEval 2.0. The release of this model demonstrates the huge potential of scaling, indicating that greater performance gains can be achieved in the future by expanding data and model scale.
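Grouped-query attention, mentioned above, lets several query heads share one key/value head, shrinking the KV cache during inference. A minimal NumPy sketch of the idea (illustrative shapes only, not Qwen's actual implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_q_heads // n_kv_heads query heads shares one KV head."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    # Repeat each KV head so it lines up with its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)     # (n_q_heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # softmax over keys
    return weights @ v                                 # (n_q_heads, seq, d)

out = grouped_query_attention(
    np.random.rand(8, 4, 16),   # 8 query heads
    np.random.rand(2, 4, 16),   # only 2 KV heads: 4 query heads per group
    np.random.rand(2, 4, 16),
)
print(out.shape)  # (8, 4, 16)
```

With 8 query heads but only 2 KV heads, the KV cache is a quarter the size of full multi-head attention, which is the efficiency gain GQA trades for.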
The abab 6.5 series contains two models, abab 6.5 and abab 6.5s, both supporting a context length of 200k tokens. abab 6.5 has a trillion parameters, while abab 6.5s is more efficient and can process nearly 30,000 words of text per second. They perform well in core competency tests such as knowledge, reasoning, mathematics, programming, and instruction following, approaching the industry-leading level.
JetMoE-8B is an open source large-scale language model that achieves performance beyond Meta AI LLaMA2-7B at a cost of less than $100,000 by using public datasets and optimized training methods. The model activates only 2.2 billion parameters during inference, significantly reducing computational costs while maintaining excellent performance.
360Zhinao is a series of 7B-scale language models open-sourced by Qihoo 360, including a base model and three dialogue models with different context lengths. These models have been pre-trained on large-scale Chinese and English corpora, perform well on a variety of tasks such as natural language understanding, knowledge, mathematics, and code generation, and have powerful long-text dialogue capabilities. The models can be used for the development and deployment of various conversational applications.
Grok-1.5 is an advanced large language model with excellent long-text understanding and reasoning capabilities. It can handle contexts of up to 128,000 tokens, far exceeding previous models. In tasks such as math and coding, Grok-1.5 performs extremely well, achieving high scores on multiple recognized benchmarks. The model is built on a powerful distributed training framework to ensure an efficient and reliable training process. Grok-1.5 aims to provide users with powerful language understanding and generation capabilities to assist with complex language tasks.
Jamba is an open language model based on a hybrid SSM-Transformer architecture, providing top quality and performance. It combines the advantages of the Transformer and SSM architectures, performs well on reasoning benchmarks, and provides a 3x throughput improvement in long-context scenarios. Jamba is currently the only model at this scale that can fit a 140K-token context on a single GPU, making it extremely cost-effective. As a base model, Jamba is designed for developers to fine-tune, train, and build customized solutions on.
DBRX is a general-purpose large language model (LLM) built by Databricks' Mosaic research team that outperforms all existing open-source models on standard benchmarks. It adopts a Mixture-of-Experts (MoE) architecture with 132 billion total parameters, of which 36 billion are active on any given input, and has excellent language understanding, programming, mathematics, and logical reasoning capabilities. DBRX aims to promote the development of high-quality open-source LLMs and make it easier for enterprises to customize models on their own data. Databricks provides enterprise users with the ability to use DBRX interactively, leverage its long-context capabilities to build retrieval-augmented systems, and build customized DBRX models on their own data.
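The "active parameters" figure above comes from MoE routing: a gating network picks a few experts per token, so only those experts' weights are used in that forward pass. A toy NumPy sketch of top-k routing (hypothetical sizes, not DBRX's actual router):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 8, 2, 16

# Each "expert" is a small feed-forward weight matrix.
experts = rng.standard_normal((n_experts, d, d))
router = rng.standard_normal((d, n_experts))    # gating network

def moe_forward(x):
    logits = x @ router                          # score every expert
    chosen = np.argsort(logits)[-top_k:]         # keep only the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                         # normalized gate weights
    # Only the chosen experts' parameters are touched for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.standard_normal(d)
out = moe_forward(token)
print(out.shape)  # (16,)
```

Here 2 of 8 experts fire per token, so roughly a quarter of the expert parameters are active, which is the same mechanism that lets DBRX keep inference cost closer to a 36B dense model than a 132B one.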
Apple released its own large language model, MM1, a multimodal LLM with up to 30B parameters. Through pre-training and SFT, MM1 achieves SOTA performance on multiple benchmarks, demonstrating attractive capabilities such as in-context prediction, multi-image reasoning, and few-shot learning.
Sailor is a set of open language models customized for Southeast Asia, supporting Indonesian, Thai, Vietnamese, Malay, Lao, and more. These models are designed, through careful data curation, to understand and generate text in Southeast Asia's diverse languages. Sailor is built on Qwen 1.5 and comes in sizes from 0.5B to 7B to meet different needs. On Southeast Asian language tasks such as question answering, commonsense reasoning, and reading comprehension, Sailor demonstrates strong performance.
Meta Llama 3 is a new generation of open-source large language model launched by Meta. It delivers excellent performance across multiple industry benchmarks and supports a wide range of usage scenarios, including new features such as improved reasoning capabilities. The model will support multiple languages and modalities in the future, providing a longer context window and overall performance improvements. Llama 3 adheres to the open philosophy and will be deployed on major cloud service, hosting, and hardware platforms for developers and communities to use.
Gemma-7B is a large-scale pre-trained language model with 7 billion parameters developed by Google and is designed to provide powerful natural language processing capabilities. It can understand and generate text, supports multiple languages, and is suitable for multiple application scenarios.
Gemma-2b is part of a series of open pre-trained language models launched by Google, which offers variants of different sizes. It generates high-quality text and is widely used in question answering, summarization, and reasoning. Compared with similar models, it is smaller and can be deployed in a variety of hardware environments. The Gemma series pursues safe and efficient AI, giving more researchers and developers access to cutting-edge language model technology.
Baichuan2-192K is the model with the world's longest context window at release, able to take about 350,000 Chinese characters of input at a time, surpassing Claude2. It leads Claude2 not only in context window length but also in long-window text generation quality, long-context understanding, long-text Q&A, and summarization. Through extreme optimization of algorithms and engineering, Baichuan2-192K improves window length and model performance simultaneously. Its API is open to enterprise users and has been applied in the legal, media, financial, and other industries.