Found 28 AI tools
Entropy-based sampling is a sampling technique grounded in information theory, used to improve the diversity and accuracy of text generated by language models. It gauges the model's uncertainty by computing the entropy and varentropy (the variance of the surprisal) of the output probability distribution, and adjusts the sampling strategy when the model appears stuck in a local optimum or is overconfident. This helps avoid monotonous, repetitive outputs while increasing diversity when model uncertainty is high.
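The description above is prose only; the sketch below shows one plausible way to turn it into code: compute the entropy and varentropy of a logit vector and switch decoding strategy on them. The thresholds and fallback rules (`low`, `high`, the raised temperature) are illustrative assumptions, not the behavior of any particular implementation.

```python
# Minimal sketch of entropy-based sampling. Thresholds and strategies are illustrative.
import torch
import torch.nn.functional as F

def entropy_varentropy(logits: torch.Tensor):
    """Entropy and varentropy of the next-token distribution (logits: 1-D vocab vector)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    surprisal = -log_probs                                       # -log p(x)
    entropy = (probs * surprisal).sum(-1)                        # E[-log p]
    varentropy = (probs * (surprisal - entropy.unsqueeze(-1)) ** 2).sum(-1)  # Var[-log p]
    return entropy, varentropy

def sample_next_token(logits: torch.Tensor, low: float = 0.5, high: float = 3.0):
    """Pick a decoding move based on how uncertain the model looks."""
    entropy, varentropy = entropy_varentropy(logits)
    if entropy < low and varentropy < low:
        # Confident and stable: take the argmax (greedy) to avoid needless noise.
        return logits.argmax(dim=-1)
    if entropy > high:
        # Very uncertain: raise the temperature to explore more of the distribution.
        return torch.multinomial(F.softmax(logits / 1.5, dim=-1), 1).squeeze(-1)
    # In between: plain temperature-1 sampling.
    return torch.multinomial(F.softmax(logits, dim=-1), 1).squeeze(-1)

# Example with random logits over a 1,000-token vocabulary.
print(sample_next_token(torch.randn(1000)))
```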
Llama-3.2-1B is a multilingual large language model released by Meta, focused on text generation tasks. It uses an optimized Transformer architecture and is aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model supports eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and performs well across a variety of conversational use cases.
The Qwen2.5 series comprises open-source, decoder-only dense language models with parameter counts ranging from 0.5B to 72B, designed to meet different products' requirements for model size. These models perform well in natural language understanding, code generation, mathematical reasoning, and other areas, and are particularly suited to applications that demand high-performance language processing. The Qwen2.5 release marks notable progress in large language models, giving developers and researchers a powerful set of tools.
XVERSE-MoE-A36B is a multilingual large language model developed in-house by Shenzhen Yuanxiang Technology (XVERSE). It adopts a Mixture-of-Experts (MoE) architecture with 255.4 billion total parameters and 36 billion activated parameters. The model supports more than 40 languages, including Chinese, English, Russian, and Spanish, and is particularly strong bilingually in Chinese and English. It is trained on 8K-length samples, with refined data-sampling ratios and dynamic data-switching strategies ensuring data quality and diversity. The MoE architecture has also been custom-optimized to improve computing efficiency and overall throughput.
Yuan2.0-M32-hf-int8 is an int8 release of Yuan2.0-M32, a Mixture-of-Experts (MoE) language model with 32 experts, 2 of which are active per token. It improves expert-selection efficiency with a new routing network, the attention router, yielding a 3.8% accuracy gain over models that use conventional routing networks. Yuan2.0-M32 is trained from scratch on 2,000 billion tokens, and its training compute is only 9.25% of that required by a dense model of the same parameter count. The model is competitive in programming, mathematics, and various professional domains while using only 3.7 billion active parameters, a small fraction of the 40 billion total. The forward computation per token is only 7.4 GFLOPs, roughly 1/19 of what Llama3-70B requires. Yuan2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching 55.9% and 95.8% accuracy respectively.
Yuan2.0-M32 is a Mixture-of-Experts (MoE) language model with 32 experts, 2 of which are active per token. A new routing network, the attention router, improves the efficiency of expert selection and yields a 3.8% accuracy improvement over models with conventional routers. The model is trained from scratch on 2,000 billion tokens, with training compute equal to only 9.25% of that required by a dense model at the same parameter scale. Competitive in coding, mathematics, and various professional domains, Yuan2.0-M32 uses only 3.7 billion active parameters out of 40 billion total, and its forward computation per token is 7.4 GFLOPs, only 1/19 of what Llama3-70B requires. It surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching 55.9% and 95.8% accuracy respectively.
Yuan2.0-M32 is a Mixture-of-Experts (MoE) language model with 32 experts, 2 of which are active per token. A new routing network, the attention router, is proposed for more efficient expert selection, improving accuracy by 3.8%. The model is trained from scratch on 2,000B tokens, with training compute only 9.25% of that required by a dense model of the same parameter count. It is competitive in coding, mathematics, and various professional domains while using only 3.7B active parameters; the forward computation per token is just 7.4 GFLOPs, 1/19 of the Llama3-70B requirement. It surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching 55.9% and 95.8% accuracy respectively.
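The Yuan2.0-M32 entries above all hinge on top-2 expert routing, where only a couple of experts run per token. Below is a minimal sketch of that mechanism under stated assumptions: the router here is a plain linear gate, whereas Yuan's attention router additionally models correlations between experts and is not reproduced; expert count and widths are illustrative.

```python
# Minimal sketch of top-2 Mixture-of-Experts routing (plain linear gate, not Yuan's
# attention router). Only k experts run per token, which is why active parameters
# are a small fraction of total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.gate(x)                                    # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)               # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)                     # renormalize over the k chosen
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

# Example: route 4 tokens of width 64 through 8 experts, 2 active each.
moe = Top2MoE(d_model=64, d_ff=256)
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```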
Tele-FLM-1T is an open-source multilingual large language model at the 1T-parameter scale, based on a decoder-only Transformer architecture and trained on approximately 2T tokens. The model shows strong performance at scale, sometimes surpassing larger models. In addition to the model weights, the core design, engineering practice, and training details are shared, which should benefit both the academic and industrial communities.
Meta Llama 3.1 is a series of pre-trained and instruction-tuned multilingual large language models (LLMs), available in 8B, 70B, and 405B sizes, optimized specifically for multilingual dialogue use cases and performing well on industry benchmarks. The models use an optimized Transformer architecture and are further aligned with human preferences through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to ensure helpfulness and safety.
Meta Llama 3.1 is a large language model released by Meta; this 8-billion-parameter version supports text generation and dialogue in 8 languages. The model uses an optimized Transformer architecture and is aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). It is intended for commercial and research use and excels particularly in multilingual conversation scenarios.
Meta Llama 3 is Meta's latest large language model, aiming to put the power of large language models in the hands of individuals, creators, researchers, and enterprises of all kinds. The series includes versions from 8B to 70B parameters, in both pre-trained and instruction-tuned variants. The models are distributed through a GitHub repository, and users can run inference locally after downloading the model weights and tokenizer. The release of Meta Llama 3 marks a further step in the popularization of large language model technology, with broad research and commercial potential.
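For the local-inference claim above, one common route is Hugging Face transformers rather than the GitHub repository's own scripts. A minimal sketch, assuming the gated `meta-llama/Meta-Llama-3-8B-Instruct` checkpoint ID and that license access has already been granted:

```python
# Illustrative local inference with Hugging Face transformers (not the official
# llama3 GitHub scripts). Requires accepting the model license on the Hub first.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated repository
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Briefly explain what a Mixture-of-Experts language model is."
output = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```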
DCLM-Baseline-7B is a 7-billion-parameter, primarily English language model developed by the DataComp for Language Models (DCLM) team. It aims to improve language model performance through systematic data curation techniques. Training uses PyTorch and the OpenLM framework with the AdamW optimizer, a learning rate of 2e-3, weight decay of 0.05, a batch size of 2048 sequences, a sequence length of 2048 tokens, and a total of 2.5T training tokens, run on H100 GPUs.
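The hyperparameters listed above can be gathered into a small illustrative PyTorch setup. This is not the DCLM team's actual OpenLM configuration, only the same numbers expressed as code; the model object is a placeholder.

```python
# Illustrative training setup using the hyperparameters quoted above.
import torch
import torch.nn as nn

config = {
    "learning_rate": 2e-3,
    "weight_decay": 0.05,
    "batch_size_sequences": 2048,
    "sequence_length": 2048,
    "total_training_tokens": 2.5e12,   # 2.5T tokens
}

model = nn.Linear(8, 8)  # placeholder; the real model is a 7B-parameter transformer
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=config["learning_rate"],
    weight_decay=config["weight_decay"],
)

tokens_per_step = config["batch_size_sequences"] * config["sequence_length"]
total_steps = int(config["total_training_tokens"] // tokens_per_step)
print(f"{tokens_per_step=:,} -> about {total_steps:,} optimizer steps")
```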
Gemma is a series of lightweight, advanced open models developed by Google, built on the same research and technology as the Gemini models. They are text-to-text, decoder-only large language models suited to a variety of text generation tasks such as question answering, summarization, and reasoning. The relatively small size of the Gemma models enables deployment in resource-constrained environments such as laptops, desktops, or your own cloud infrastructure, making state-of-the-art AI models accessible to everyone and fostering innovation.
The multi-token prediction model is a technique from Meta's (Facebook's) large language model research that aims to improve efficiency and performance by predicting multiple future tokens at once. It allows the model to generate several tokens in a single forward pass, resulting in faster generation and potentially improved accuracy. The model is provided free of charge for non-commercial research purposes, and its use must comply with Meta's privacy policy and relevant laws and regulations.
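As a rough illustration of the idea, the sketch below attaches several output heads to one shared trunk so that each forward pass predicts several future offsets at once. The head count, head type (plain linear layers), and sizes are assumptions for illustration, not Meta's released architecture.

```python
# Illustrative multi-token prediction: several heads over a shared hidden state,
# each predicting a different future offset in the same forward pass.
import torch
import torch.nn as nn

class MultiTokenPredictionHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        # One head per future position t+1, t+2, ..., t+n_future.
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) from a shared transformer trunk.
        # Returns (n_future, batch, seq, vocab): one logit tensor per future offset.
        return torch.stack([head(hidden) for head in self.heads], dim=0)

# Example: 2 sequences of 16 positions, 512-dim hidden states, 32k vocabulary.
heads = MultiTokenPredictionHead(d_model=512, vocab_size=32000, n_future=4)
logits = heads(torch.randn(2, 16, 512))
print(logits.shape)  # torch.Size([4, 2, 16, 32000])
```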
Samba is a simple yet powerful hybrid model with unlimited context length. Its architecture is very simple: Samba = Mamba + MLP + sliding window attention + layer-level MLP stacking. The Samba-3.8B model was trained on 3.2 trillion tokens from the Phi-3 dataset, and its performance on major benchmarks (such as MMLU, GSM8K, and HumanEval) greatly exceeds Phi3-mini. Samba also achieves perfect long-context retrieval with minimal instruction tuning while keeping complexity linear in sequence length. This makes Samba-3.8B-instruct perform well on downstream tasks such as long-context summarization.
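The layer formula above can be read as a repeating stack. The schematic below only shows that ordering: the Mamba, MLP, and sliding-window-attention blocks are shape-preserving stubs, not real implementations, and the depth and width are arbitrary.

```python
# Schematic of the hybrid layer pattern (stubs only; no real Mamba or attention here).
import torch
import torch.nn as nn

class Stub(nn.Module):
    """Placeholder that keeps shapes intact; a real block would transform the sequence."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
    def forward(self, x):
        return x + self.proj(x)   # residual connection

def samba_like_stack(d_model: int = 512, n_groups: int = 4) -> nn.Sequential:
    layers = []
    for _ in range(n_groups):
        layers += [
            Stub(d_model),  # Mamba (selective state-space) block -- stub
            Stub(d_model),  # MLP block                           -- stub
            Stub(d_model),  # sliding-window attention block      -- stub
            Stub(d_model),  # MLP block                           -- stub
        ]
    return nn.Sequential(*layers)

x = torch.randn(1, 128, 512)          # (batch, seq, d_model)
print(samba_like_stack()(x).shape)    # torch.Size([1, 128, 512])
```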
GLM-4-9B-Chat-1M is a new-generation pre-trained model from Zhipu AI, released as part of the open-source GLM-4 series. It performs strongly across evaluations of semantics, mathematics, reasoning, code, and knowledge. Beyond multi-turn dialogue, it offers advanced capabilities such as web browsing, code execution, custom tool invocation, and long-text reasoning. It supports 26 languages, including Japanese, Korean, and German, and this version supports a 1M context length, making it suitable for developers and researchers who need to process large volumes of data in multilingual environments.
GLM-4-9B is a new-generation pre-trained model from Zhipu AI, the open-source version of the GLM-4 series. It performs well across evaluations of semantics, mathematics, reasoning, code, and knowledge, and offers advanced capabilities such as multi-turn dialogue, web browsing, code execution, custom tool invocation, and long-text reasoning. It also supports 26 languages, including Japanese, Korean, and German, and has a variant that supports a 1M context length.
MiLM-6B is a large-scale pre-trained language model developed by Xiaomi with 6.4 billion parameters. It achieves the best results among models of its size on the Chinese foundation-model benchmarks C-Eval and CMMLU. The model reflects recent progress in natural language processing, offers strong language understanding and generation capabilities, and can be applied widely to text generation, machine translation, question answering systems, and other scenarios.
MAP-NEO is a fully open-source large language model that ships with its pre-training data, data processing pipeline (Matrix), pre-training scripts, and alignment code. The model was trained from scratch on 4.5T English and Chinese tokens and performs comparably to LLaMA2 7B. MAP-NEO outperforms similarly sized models on challenging tasks such as reasoning, mathematics, and coding. To support research and full transparency of the LLM training process, the release includes final and intermediate checkpoints, a self-trained tokenizer, the pre-training corpus, and an efficient, stable, optimized pre-training codebase.
kan-gpt is a PyTorch implementation of Generative Pre-trained Transformers (GPTs) that uses Kolmogorov-Arnold Networks (KANs) for language modeling. The model shows promise on text generation tasks, especially those involving long-range dependencies, and offers the natural language processing field an alternative architecture that may help improve language model performance.
Llama-3 70B Instruct Gradient 1048k is an advanced language model developed by the Gradient AI team. By extending the context length beyond 1048K tokens, it demonstrates that a state-of-the-art (SOTA) language model can learn to handle long texts after appropriate adaptation. The model uses NTK-aware interpolation and RingAttention, together with the EasyContext Blockwise RingAttention library, to train efficiently on high-performance computing clusters. It has broad potential in commercial and research use, particularly in scenarios that require long-text processing and generation.
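NTK-aware interpolation, mentioned above, rescales the rotary position embedding base so the model can cover a longer context. The sketch below shows the commonly used base-scaling formula; the head dimension, base, and scale factor are illustrative, and this is not Gradient AI's training code.

```python
# Illustrative NTK-aware RoPE scaling: enlarge the rotary base instead of shrinking
# position indices, so low frequencies stretch over the longer context while high
# frequencies stay nearly unchanged.
import torch

def ntk_scaled_rope_freqs(head_dim: int = 128, base: float = 10000.0,
                          scale: float = 16.0) -> torch.Tensor:
    """Return rotary inverse frequencies with an NTK-style enlarged base."""
    ntk_base = base * scale ** (head_dim / (head_dim - 2))
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (ntk_base ** exponents)

inv_freq = ntk_scaled_rope_freqs()
positions = torch.arange(4096 * 16, dtype=torch.float32)   # extended position range
angles = torch.outer(positions, inv_freq)                   # (seq, head_dim/2)
cos, sin = angles.cos(), angles.sin()                       # used to rotate queries/keys
print(inv_freq.shape, cos.shape)
```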
Mixtral-8x22B is a pre-trained generative sparse Mixture-of-Experts language model developed by the Mistral AI team to promote the open development of artificial intelligence. The model has 141B parameters and supports multiple optimized deployment options, such as half precision and quantization, to suit different hardware and application scenarios. Mixtral-8x22B can be used for natural language processing tasks such as text generation, question answering, and translation.
RecurrentGemma is a family of open language models developed by Google. It adopts an innovative recurrent architecture and performs well on text generation tasks including question answering, summarization, and reasoning. Compared with the Gemma models, RecurrentGemma requires less memory and delivers faster inference on long sequences. Both pre-trained and instruction-tuned versions are available, and the models can be widely applied to content creation, conversational AI, and other scenarios.
Qwen1.5-MoE-A2.7B is a large MoE (Mixture of Experts) language model with only 2.7 billion activated parameters, yet its performance is comparable to that of 7-billion-parameter models. Compared with conventional large models, its training cost is reduced by 75% and its inference speed is 1.74 times faster. It adopts a specialized MoE design, including fine-grained experts, a new initialization method, and an improved routing mechanism, which greatly improves model efficiency. The model can be used for a variety of tasks such as natural language processing and code generation.
WhiteRabbitNeo-7B-v1.5a is a release in the WhiteRabbitNeo series of large pre-trained language models for natural language processing tasks. It supports tasks such as text generation, summarization, and translation.
Yi-9B is part of the next generation of open-source bilingual large language models developed by 01.AI. Trained on 3T tokens, it shows strong language understanding, common-sense reasoning, and reading comprehension. It performs excellently in coding, mathematics, common-sense reasoning, and reading comprehension, leading open-source models of the same size, and is suitable for personal, academic, and commercial use.
LLaMA Pro is a model for large-scale natural language processing. By expanding the Transformer blocks, it can efficiently and effectively absorb new corpora to extend its knowledge without forgetting old knowledge. LLaMA Pro delivers outstanding performance, excelling in general tasks, programming, and mathematics. It is a general-purpose model initialized from LLaMA2-7B. LLaMA Pro and its instruction-tuned counterpart, LLaMA Pro-Instruct, achieve advanced results on various benchmarks, demonstrating strong potential for reasoning and handling diverse tasks as intelligent agents. The model offers valuable insight into integrating natural and programming languages, laying a solid foundation for advanced language agents that operate effectively in a variety of environments.
Mistral is a small but powerful open-source natural language processing model suited to a variety of use cases. The Mistral 7B model outperforms Llama 2 13B, has natural coding ability, and supports an 8K sequence length. Mistral is released under the Apache 2.0 license and is easy to deploy on any cloud or on consumer GPUs.