Found 64 related AI tools
GPT OSS is an open source language model released by OpenAI under the Apache 2.0 license, with strong reasoning capabilities. It emphasizes efficiency, safety, and API compatibility, positioning it as a forerunner for future open source language models.
CameraBench is a model for analyzing camera motion in video, aiming to understand camera movement patterns from footage. Its main advantage lies in using generative vision-language models for principled classification of camera motion and for video-text retrieval. Compared with traditional structure-from-motion (SfM) and simultaneous localization and mapping (SLAM) methods, the model shows significant advantages in capturing scene semantics. The model is open source and suitable for researchers and developers, and improved versions will be released in the future.
HiDream-I1 is a new open source image generation base model with 17 billion parameters that can generate high-quality images in seconds. The model is suitable for research and development, has performed well in multiple evaluations, and is efficient and flexible, making it a good fit for a variety of creative design and generation tasks.
Together Chat is a secure AI chat platform that offers 100 free messages per day for users who want private conversations and high-quality interactions. Its servers are located in North America to help keep user information secure.
Wan 2.1 AI is an open source large-scale video generation AI model developed by Alibaba. It supports text-to-video (T2V) and image-to-video (I2V) generation and can turn simple inputs into high-quality video content. The model is significant for the video generation field: it greatly simplifies the video creation process, lowers the barrier to entry, improves efficiency, and gives users rich and diverse creative possibilities. Its main strengths include high-quality video output, smooth rendering of complex motion, realistic physical simulation, and a wide range of artistic styles. The product is fully open source and its basic functions are free to use, which makes it highly practical for individuals and businesses that need video content but lack professional skills or equipment.
CSM 1B is a speech generation model based on the Llama architecture that generates RVQ audio codes from text and audio input. It targets speech synthesis and offers high-quality speech generation. Its strength lies in handling multi-speaker dialogue scenarios and producing natural, fluent speech from contextual information. The model is open source and intended for research and educational purposes; use for impersonation, fraud, or illegal activities is expressly prohibited.
Gemma 3 is the latest open source model from Google, built on the same research and technology as Gemini 2.0. It is a lightweight, high-performance model that can run on a single GPU or TPU, giving developers powerful AI capabilities. Gemma 3 is available in multiple sizes (1B, 4B, 12B and 27B), supports over 140 languages, and features advanced text and visual reasoning. Its key benefits are high performance, low compute requirements, and broad multi-language support, enabling rapid deployment of AI applications on a variety of devices. Gemma 3 aims to make AI technology more accessible and help developers build efficiently on different hardware platforms.
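Since Gemma 3 is designed to run on a single GPU, a minimal local-inference sketch with the Hugging Face transformers library may help illustrate the workflow. The checkpoint name `google/gemma-3-1b-it`, the need for a recent transformers release, and the gated-access terms are assumptions to verify against the official model card.

```python
# Hedged sketch: local inference with a small Gemma 3 checkpoint via transformers.
# The repo ID below is assumed; accepting the license on Hugging Face may be required.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",  # assumed instruction-tuned 1B text checkpoint
    torch_dtype="auto",
    device_map="auto",             # places the model on the available GPU
)

messages = [{"role": "user", "content": "Summarize what Gemma 3 is in one sentence."}]
result = generator(messages, max_new_tokens=64)
# With chat-format input, generated_text holds the conversation; the last turn is the reply.
print(result[0]["generated_text"][-1]["content"])
```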
HunyuanVideo-I2V is Tencent's open source image-to-video generation model, developed based on the HunyuanVideo architecture. This model effectively integrates reference image information into the video generation process through image latent stitching technology, supports high-resolution video generation, and provides customizable LoRA effect training functions. This technology is of great significance in the field of video creation, as it can help creators quickly generate high-quality video content and improve creation efficiency.
Wan2.1-T2V-14B is an advanced text-to-video generation model based on a diffusion transformer architecture that combines an innovative spatiotemporal variational autoencoder (VAE) with large-scale data training. It is capable of generating high-quality video content at multiple resolutions, supports Chinese and English text input, and surpasses existing open source and commercial models in performance and efficiency. This model is suitable for scenarios that require efficient video generation, such as content creation, advertising production, and video editing. The model is currently available for free on the Hugging Face platform and is designed to promote the development and application of video generation technology.
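For readers who want to try the model, the following is a hedged sketch of text-to-video generation with the diffusers library. The repo ID `Wan-AI/Wan2.1-T2V-14B-Diffusers`, the availability of diffusers-format weights, the default frame count, and the `.frames` output attribute are assumptions based on other diffusers video pipelines; consult the official model card for the supported workflow.

```python
# Hedged sketch: text-to-video with diffusers; repo ID and output format are assumptions.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers",  # assumed diffusers-format repository
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

result = pipe(
    prompt="A red fox running through fresh snow, cinematic lighting",
    num_frames=81,  # assumed default, roughly 5 seconds at 16 fps
)
export_to_video(result.frames[0], "fox.mp4", fps=16)
```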
PIKE-RAG is a domain knowledge and reasoning enhanced generation framework developed by Microsoft, designed to extend the capabilities of large language models (LLMs) through knowledge extraction, storage, and reasoning logic. Thanks to its multi-module design, it can handle complex multi-hop question-answering tasks and significantly improves answer accuracy in fields such as industrial manufacturing, mining, and pharmaceuticals. Its main advantages include efficient knowledge extraction, strong multi-source information integration, and multi-step reasoning, making it well suited to scenarios that require deep domain knowledge and complex logical reasoning.
SkyReels V1 is a human-centered video generation model fine-tuned based on HunyuanVideo. It is trained through high-quality film and television clips to generate video content with movie-like quality. This model has reached the industry-leading level in the open source field, especially in facial expression capture and scene understanding. Its key benefits include open source leadership, advanced facial animation technology and cinematic light and shadow aesthetics. This model is suitable for scenarios that require high-quality video generation, such as film and television production, advertising creation, etc., and has broad application prospects.
SkyReels-V1 is an open source human-centered video basic model, fine-tuned based on high-quality film and television clips, focusing on generating high-quality video content. This model has reached the top level in the open source field and is comparable to commercial models. Its main advantages include: high-quality facial expression capture, cinematic light and shadow effects, and the efficient inference framework SkyReelsInfer, which supports multi-GPU parallel processing. This model is suitable for scenarios that require high-quality video generation, such as film and television production, advertising creation, etc.
DeepScaleR-1.5B-Preview is a large language model optimized by reinforcement learning, focusing on improving mathematical problem solving capabilities. This model significantly improves the accuracy in long text reasoning scenarios through distributed reinforcement learning algorithms. Its main advantages include efficient training strategies, significant performance improvements, and the flexibility of open source. The model was developed by UC Berkeley’s Sky Computing Lab and Berkeley AI Research teams to advance the use of artificial intelligence in education, particularly in mathematics education and competitive mathematics. The model is licensed under the MIT open source license and is completely free for researchers and developers to use.
Lumina-Video is a video generation model developed by the Alpha-VLLM team, mainly used to generate high-quality video content from text. This model is based on deep learning technology and can generate corresponding videos based on text prompts input by users, which is efficient and flexible. It is of great significance in the field of video generation, providing content creators with powerful tools to quickly generate video materials. The project is currently open source, supports video generation at multiple resolutions and frame rates, and provides detailed installation and usage guides.
Zonos-v0.1 is a real-time text-to-speech (TTS) model developed by the Zyphra team with high-fidelity voice cloning capabilities. The release comprises a 1.6B-parameter pure transformer model and a 1.6B-parameter hybrid model, both under the Apache 2.0 open source license. It generates natural, expressive speech from text prompts and supports multiple languages. In addition, Zonos-v0.1 enables high-quality voice cloning from speech clips of 5 to 30 seconds and can be conditioned on speaking speed, pitch, voice quality, and emotion. Its main advantages are high generation quality, support for real-time interaction, and flexible voice control. The model is released to promote research and development of TTS technology.
Hibiki is an advanced model focused on streaming speech translation. It produces correct translations chunk by chunk as it accumulates enough context in real time, supports both speech and text translation, and can transfer the speaker's voice. Built on a multi-stream architecture, it processes source and target speech simultaneously, generating a continuous audio stream together with timestamped text translation. Its key benefits include high-fidelity voice transfer, low-latency real-time translation, and compatibility with complex inference strategies. Hibiki currently supports French-to-English translation, which suits scenarios that require efficient real-time translation, such as international conferences and multi-language live broadcasts. The model is open source and free, suitable for developers and researchers.
Qwen2.5-1M is an open source language model designed for long-sequence tasks, supporting a context length of up to 1 million tokens. Through innovative training methods and technical optimizations, it significantly improves the performance and efficiency of long-sequence processing. It performs well on long-context tasks while maintaining performance on short-text tasks, making it an excellent open source alternative to existing long-context models. The model suits scenarios that require processing large amounts of text, such as document analysis and information retrieval, and gives developers powerful language processing capabilities.
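A minimal sketch of standard transformers inference with a Qwen2.5-1M checkpoint follows. The repo ID `Qwen/Qwen2.5-7B-Instruct-1M` and the placeholder input file are assumptions; reaching the full 1M-token context in practice typically requires the optimized serving stack described in the model card rather than plain `generate`.

```python
# Hedged sketch: feeding a long document to an assumed Qwen2.5-1M checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-1M"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

long_document = open("report.txt").read()  # placeholder long input
messages = [{"role": "user", "content": f"Summarize the key findings:\n\n{long_document}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```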
BEN2 (Background Erase Network) is an innovative image segmentation model that uses a Confidence Guided Matting (CGM) process: a refinement network specifically handles pixels where the model's confidence is lower, producing more accurate matting results. BEN2 performs well in hair matting, 4K image processing, object segmentation, and edge refinement. Its base model is open source, and users can try the full model for free via API or web demo. The training data includes the DIS5k dataset and a 22K proprietary segmentation dataset, covering a wide range of image processing needs.
YuE is an open source music generation model developed by the Hong Kong University of Science and Technology and the Multimodal Art Projection team. It can generate a complete song of up to 5 minutes, including vocals and backing parts, from given lyrics. The model tackles the difficult lyrics-to-song problem through several technical innovations, such as a semantically enhanced audio tokenizer, a dual-token technique, and lyrics chain-of-thought. Its main advantages are that it generates high-quality music, supports multiple languages and musical styles, and remains highly scalable and controllable. The model is free and open source and aims to advance music generation technology.
Llasa-1B is a text-to-speech model developed by the Hong Kong University of Science and Technology Audio Lab. It is based on the LLaMA architecture and converts text into natural, fluent speech using speech tokens from the XCodec2 codebook. The model was trained on 250,000 hours of Chinese and English speech data and supports generation from plain text or synthesis conditioned on a given speech prompt. Its main advantage is high-quality multi-language speech, suitable for scenarios such as audiobooks and voice assistants. The model is licensed under CC BY-NC-ND 4.0, and commercial use is prohibited.
Llasa-3B is a powerful text-to-speech (TTS) model developed based on the LLaMA architecture and focuses on Chinese and English speech synthesis. By combining the speech coding technology of XCodec2, this model can efficiently convert text into natural and smooth speech. Its main advantages include high-quality speech output, support for multi-language synthesis, and flexible voice prompt functions. This model is suitable for a variety of scenarios that require speech synthesis, such as audiobook production, voice assistant development, etc. Its open source nature also allows developers to freely explore and extend its functionality.
MiniRAG is a retrieval-augmented generation (RAG) system designed for small language models, aiming to simplify the RAG pipeline and improve efficiency. It addresses the limited performance of small models in traditional RAG frameworks through a semantic-aware heterogeneous graph indexing mechanism and a lightweight topology-enhanced retrieval method. The system has clear advantages in resource-constrained scenarios such as mobile devices or edge computing environments, and its open source nature makes it easy for the developer community to adopt and improve.
MatterGen is a generative AI tool launched by Microsoft Research for material design. It can directly generate new materials with specific chemical, mechanical, electronic or magnetic properties according to the design requirements of the application, providing a new paradigm for materials exploration. The emergence of this tool is expected to accelerate the research and development process of new materials, reduce research and development costs, and play an important role in batteries, solar cells, CO2 adsorbents and other fields. Currently, MatterGen’s source code is open source on GitHub for public use and further development.
Kokoro-82M is a text-to-speech (TTS) model created by hexgrad and hosted on Hugging Face. It has 82 million parameters and is open source under the Apache 2.0 license. Version v0.19 was released on December 25, 2024 and provides 10 unique voice packs. Kokoro-82M ranked first in the TTS Spaces Arena, demonstrating its efficiency relative to its parameter count and training data. It supports American and British English and can generate high-quality speech output.
Llama-3-Patronus-Lynx-8B-Instruct is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct developed by Patronus AI, mainly used to detect hallucinations in RAG settings. The model is trained on multiple datasets, including CovidQA, PubmedQA, DROP, and RAGTruth, combining human annotations and synthetic data. Given a document, question, and answer, it evaluates whether the answer is faithful to the document content, introduces no information beyond the document, and does not contradict it.
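The sketch below shows one hedged way to use the model as a faithfulness judge with transformers. The repo ID is assumed from the entry name, and the prompt is a simplified placeholder; the model card defines the exact prompt template and expected output format (for example a PASS/FAIL score), which should be used instead in practice.

```python
# Hedged sketch: prompting the Lynx judge; repo ID and prompt format are assumptions.
from transformers import pipeline

judge = pipeline(
    "text-generation",
    model="PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct",  # assumed repo ID
    torch_dtype="auto",
    device_map="auto",
)

document = "The Eiffel Tower is 330 metres tall and located in Paris."
question = "How tall is the Eiffel Tower?"
answer = "It is 500 metres tall."  # deliberately contradicts the document

prompt = (
    "Given the DOCUMENT, QUESTION and ANSWER, decide whether the ANSWER is "
    "faithful to the DOCUMENT and explain why.\n"
    f"DOCUMENT: {document}\nQUESTION: {question}\nANSWER: {answer}"
)
result = judge([{"role": "user", "content": prompt}], max_new_tokens=128)
# With chat-format input, the last message in generated_text is the model's verdict.
print(result[0]["generated_text"][-1]["content"])
```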
Meta Video Seal is an advanced open source video watermarking model that embeds invisible watermarks which persist through video editing. As AI-generated content grows, verifying the origin of a video becomes critical. Because the embedded watermark survives editing, Video Seal is valuable for copyright protection and content verification.
OLMo-2-1124-13B-Instruct is a large-scale language model developed by the Allen Institute for AI, focusing on text generation and dialogue tasks. The model performs well across many tasks, including mathematical and scientific problem solving. It is a 13B-parameter version trained with supervised fine-tuning and reinforcement learning on specific datasets to improve performance and safety. As an open source model, it allows researchers and developers to explore and advance the science of language models.
OLMo-2-1124-7B-Instruct is a large-scale language model developed by the Allen Institute for Artificial Intelligence, focusing on dialogue generation tasks. The model is optimized on a variety of tasks, including mathematical problem solving, GSM8K, IFEval, etc., and is supervised fine-tuned on the Tülu 3 dataset. It is built on top of the Transformers library and can be used for research and educational purposes. The main advantages of this model include high performance, multi-task adaptability and open source, making it an important tool in the field of natural language processing.
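Because the model is distributed in Transformers format, a short hedged sketch of chat-style inference may be useful. The repo ID `allenai/OLMo-2-1124-7B-Instruct` matches the entry name but should be verified against the Allen AI model card, and a recent transformers release with OLMo 2 support is assumed.

```python
# Hedged sketch: chat inference with the OLMo 2 7B Instruct model via transformers.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="allenai/OLMo-2-1124-7B-Instruct",  # assumed repo ID, verify on the model card
    torch_dtype="auto",
    device_map="auto",
)

out = chat(
    [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}],
    max_new_tokens=128,
)
# With chat-format input, the final message in generated_text is the assistant's answer.
print(out[0]["generated_text"][-1]["content"])
```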
Allegro-TI2V is a text-and-image-to-video generation model capable of generating video content from user-provided prompts and images. The model has attracted attention for its open source nature, diverse content creation capabilities, high-quality output, small and efficient parameter count, and support for multiple precisions and GPU memory optimizations. It represents cutting-edge progress in AI video generation and has significant technical value and commercial potential. Allegro-TI2V is provided on the Hugging Face platform under the Apache 2.0 open source license, and users can download and use it for free.
Llama-3.1-Tulu-3-70B-DPO is part of the Tülu3 family of models designed to provide a comprehensive guide to modern post-training techniques. This family of models aims to achieve state-of-the-art performance on a variety of tasks beyond chatting, such as MATH, GSM8K and IFEval. It is based on models trained on publicly available, synthetic and human-created datasets, is primarily in English, and is licensed under the Llama 3.1 Community License.
Llama-3.1-Tulu-3-70B is a member of the Tülu3 family of models designed to provide a comprehensive guide to modern post-training techniques. The model not only performs well on chat tasks, but also shows excellent performance on multiple tasks such as MATH, GSM8K and IFEval. As an open source model, it allows researchers and developers to access and use its data and code to advance natural language processing technology.
Qwen2.5-Coder is the latest series of Qwen large-scale language models, focusing on code generation, code reasoning, and code repair. Built on the powerful Qwen2.5, the series was trained on 5.5 trillion tokens including source code, text-code grounding data, and synthetic data. It is currently a leader among open source code language models, with coding capabilities comparable to GPT-4o. In addition, Qwen2.5-Coder provides a more comprehensive foundation for real-world applications such as code agents, enhancing coding capabilities while maintaining its advantages in mathematics and general abilities.
Qwen2.5-Coder is the latest series of Qwen large-scale language models, designed for code generation, reasoning, and repair. Built on the powerful Qwen2.5, the series was trained on 5.5 trillion tokens including source code, text-code grounding data, and synthetic data, bringing its coding capabilities to the state of the art among open source code LLMs. It not only strengthens coding skills but also maintains advantages in math and general abilities.
Qwen2.5-Coder-3B-Instruct-GPTQ-Int8 is a large language model in the Qwen2.5-Coder series, specially optimized for code generation, code reasoning, and code repair. The model is based on Qwen2.5 and was trained on 5.5 trillion tokens of source code, text-code grounding data, synthetic data, and more. The flagship Qwen2.5-Coder-32B is currently the most advanced open source code language model, with coding capabilities matching GPT-4o. The series also provides a more comprehensive foundation for real-world applications such as code agents, enhancing coding capabilities while maintaining advantages in mathematical and general abilities.
Qwen2.5-Coder is the latest series of Qwen large-scale language models, focusing on code generation, code reasoning, and code repair. Built on the powerful Qwen2.5, training was scaled to 5.5 trillion tokens, including source code, text-code grounding data, synthetic data, and more. Qwen2.5-Coder-32B is currently the most advanced open source code language model, with coding capabilities matching GPT-4o. The series provides a more comprehensive foundation for practical applications such as code agents, enhancing coding capabilities while maintaining advantages in mathematics and general abilities.
Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 is a large language model in the Qwen series optimized for code generation. It has 32 billion parameters, supports long text processing, and is one of the most advanced models in open source code generation. The model has been further trained and optimized on top of Qwen2.5, with significant improvements in code generation, reasoning, and repair, while maintaining advantages in mathematics and general capabilities. It uses GPTQ 8-bit quantization to reduce model size and improve inference efficiency.
Qwen2.5-Coder-1.5B is a large language model in the Qwen2.5-Coder series, focusing on code generation, code reasoning, and code repair. Built on the powerful Qwen2.5 and trained on 5.5 trillion tokens including source code, text-code grounding data, and synthetic data, the series is a leader among current open source code LLMs, with coding capabilities comparable to GPT-4o. In addition, Qwen2.5-Coder-1.5B strengthens mathematical and general capabilities, providing a more comprehensive foundation for practical applications such as code agents.
Qwen2.5-Coder is the latest series of Qwen large-scale language models, focusing on code generation, code reasoning, and code repair. Built on the strengths of Qwen2.5, the series was trained on 5.5 trillion tokens including source code, text-code grounding data, and synthetic data. It is currently a leader among open source code generation language models, with coding capabilities comparable to GPT-4o. It not only enhances coding capabilities but also maintains its advantages in mathematics and general abilities, providing a more comprehensive foundation for practical applications such as code agents.
Qwen2.5-Coder is the latest series of Qwen large-scale language models, focusing on code generation, code reasoning, and code repair. Built on the powerful Qwen2.5, the series significantly improves code generation, reasoning, and repair by scaling training to 5.5 trillion tokens, including source code, text-code grounding data, and synthetic data. Qwen2.5-Coder-3B is a model in the series with 3.09B parameters, 36 layers, 16 query attention heads and 2 key-value attention heads (grouped-query attention), and a full 32,768-token context length. The series leads current open source code LLMs, with coding capabilities matching GPT-4o, giving developers a powerful code assistance tool.
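As a concrete illustration, here is a hedged sketch of code completion with the instruct variant of the 3B model via transformers and its chat template. The repo ID `Qwen/Qwen2.5-Coder-3B-Instruct` is assumed from the entry name and should be checked against the official collection.

```python
# Hedged sketch: asking the Qwen2.5-Coder 3B instruct model to write a function.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-3B-Instruct"  # assumed repo ID for the 3B chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```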
CogVideoX1.5-5B-SAT is an open source video generation model developed by the Knowledge Engineering and Data Mining team at Tsinghua University, and an upgraded version of the CogVideoX model. It supports generating 10-second videos as well as higher-resolution output. The model includes Transformer, VAE, and text encoder modules and can generate video content from text descriptions. With its strong generation capabilities and high-resolution support, CogVideoX1.5-5B-SAT offers a powerful tool for video content creators, particularly in education, entertainment, and business.
Mochi is Genmo's latest open source video generation model, optimized to run in ComfyUI even on consumer-grade GPUs. Known for its high-fidelity motion and excellent prompt following, Mochi brings state-of-the-art video generation to the ComfyUI community. The Mochi weights are released under the Apache 2.0 license, so developers and creators are free to use, modify, and integrate Mochi without a restrictive license. Mochi runs on consumer-grade GPUs such as the RTX 4090 and supports multiple attention backends in ComfyUI, allowing it to fit in less than 24GB of VRAM.
Tencent Hunyuan 3D is an open source 3D generative model that aims to address the shortcomings of existing 3D generative models in generation speed and generalization. The model uses a two-stage approach: the first stage quickly generates multi-view images with a multi-view diffusion model, and the second stage quickly reconstructs the 3D asset with a feed-forward reconstruction model. Hunyuan 3D-1.0 helps 3D creators and artists produce 3D assets automatically, supports rapid single-image 3D generation, and completes end-to-end generation, including mesh and texture extraction, within about 10 seconds.
hertz-dev is Standard Intelligence's open source full-duplex, audio-only transformer base model with 8.5 billion parameters. The model demonstrates scalable cross-modal learning, converting mono 16kHz speech into an 8Hz latent representation at a bitrate of 1kbps and outperforming other audio encoders. Its main advantages are low latency, high efficiency, and ease of fine-tuning and building upon for researchers. Standard Intelligence is committed to building general intelligence that benefits all of humanity, and hertz-dev is the first step on that journey.
Mochi 1 is a research preview of an open source video generation model from Genmo, aimed at solving fundamental problems in today's AI video field. The model is known for its unmatched motion quality, superior prompt-following, and ability to cross the uncanny valley, generating coherent, fluid human movements and expressions. Mochi 1 was developed in response to the need for high-quality video content generation, particularly in the gaming, film, and entertainment industries. The product currently offers a free trial; specific pricing is not provided on the page.
Allegro is an advanced text-to-video model developed by Rhymes AI that converts simple text prompts into high-quality short video clips. Allegro's open source nature makes it a powerful tool for creators, developers, and researchers in the field of AI video generation. The main advantages of Allegro include open source, diverse content creation, high-quality output, and small and efficient model size. It supports multiple precisions (FP32, BF16, FP16), and in BF16 mode, the GPU memory usage is 9.3 GB and the context length is 79.2k, which is equivalent to 88 frames. Allegro's technology core includes large-scale video data processing, video compression into visual tokens, and extended video diffusion transformers.
Janus is an innovative autoregressive framework that addresses the limitations of previous approaches by separating visual encoding into distinct paths while utilizing a single, unified transformer architecture for processing. This decoupling not only alleviates the conflicting roles of the visual encoder in understanding and generation, but also enhances the flexibility of the framework. Janus' performance surpasses previous unified models and meets or exceeds the performance of task-specific models. Janus' simplicity, high flexibility, and effectiveness make it a strong candidate for the next generation of unified multimodal models.
LightRAG is a retrieval-enhanced generation model that aims to improve the performance of text generation tasks by combining the advantages of retrieval and generation. This model can provide more accurate and relevant information while maintaining generation speed, which is particularly important for application scenarios that require fast and accurate information retrieval. The development background of LightRAG is based on the need to improve existing text generation models, especially when large amounts of data and complex queries need to be processed. The model is currently open source and freely available, providing researchers and developers with a powerful tool to explore and implement retrieval-based text generation tasks.
This open source text-to-image generation model, developed by a Tsinghua University team, offers high-resolution output and has broad application prospects in the field of image generation.
Aria is a multimodal-native mixture-of-experts model with strong performance on multimodal, language, and coding tasks. It excels at video and document understanding, supports multimodal input of up to 64K tokens, and can describe a 256-frame video in 10 seconds. Aria has 25.3B parameters and can be loaded in bfloat16 precision on a single A100 (80GB) GPU. It was developed to meet the need for multimodal data understanding, particularly in video and document processing, and is an open source model intended to advance multimodal artificial intelligence.
CursorCore is a series of open source models designed to assist programming through programming instruction alignment, supporting features such as automated editing and inline chat. These features mimic the core capabilities of closed-source AI-assisted programming tools like Cursor. This project promotes the application of AI in the field of programming through the power of the open source community, allowing developers to write and edit code more efficiently. The project is currently in its early stages, but has already demonstrated its potential to improve programming efficiency and assist with code generation.
The Qwen2.5 series language models are a series of open source decoder-only dense models, with parameter sizes ranging from 0.5B to 72B, designed to meet the needs of different products for model size. These models perform well in many fields such as natural language understanding, code generation, and mathematical reasoning, and are particularly suitable for application scenarios that require high-performance language processing capabilities. The release of Qwen2.5 series models marks an important progress in the field of large-scale language models, providing developers and researchers with powerful tools.
Qwen2.5-Coder is a member of the Qwen2.5 open source family, focusing on code generation, reasoning, repair and other tasks. It improves coding capabilities by amplifying large-scale coding training data while maintaining mathematical and general capabilities. The model supports 92 programming languages and achieves significant improvements in code-related tasks. Qwen2.5-Coder adopts the Apache 2.0 license and is designed to accelerate the application of code intelligence.
Qwen2.5 is a series of new language models built on the Qwen2 language model, including the general language model Qwen2.5, as well as Qwen2.5-Coder specifically for programming and Qwen2.5-Math for mathematics. These models are pre-trained on large-scale data sets, have strong knowledge understanding capabilities and multi-language support, and are suitable for various complex natural language processing tasks. Their main advantages include higher knowledge density, enhanced programming and mathematical capabilities, and better understanding of long text and structured data. The release of Qwen 2.5 is a major step forward for the open source community, providing developers and researchers with powerful tools to promote research and development in the field of artificial intelligence.
g1 is an experimental project that aims to create o1-like reasoning chains on Groq hardware using the Llama-3.1 70b model. The project demonstrates that prompting techniques alone, without any additional training, can significantly improve the performance of existing open source models on logic problems. By making the reasoning steps visible, g1 helps the model reason more accurately about logical problems, which matters for improving the logical reasoning ability of AI systems.
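To make the idea concrete, here is a hedged sketch of the underlying technique: a system prompt that forces explicit step-by-step reasoning, sent to a Llama 3.1 70B model served through the Groq API. The model identifier and the prompt wording are assumptions and differ from g1's actual multi-turn JSON prompt; see the project repository for the real implementation.

```python
# Hedged sketch of the g1 idea: step-by-step prompting against a Groq-hosted Llama model.
from groq import Groq  # pip install groq; expects GROQ_API_KEY in the environment

client = Groq()
system_prompt = (
    "You are a careful reasoner. Solve the problem in explicit numbered steps, "
    "checking each step before moving on, then state the final answer."
)
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",  # assumed Groq model identifier
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How many letters 'r' are in the word strawberry?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```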
AuraFlow v0.3 is a completely open source flow-based text-to-image generation model. Compared with the previous AuraFlow-v0.2, the model was trained with more compute and fine-tuned on aesthetic datasets, and it supports various aspect ratios with width and height up to 1536 pixels. The model achieves state-of-the-art results on GenEval and is currently in beta; it is being improved continuously and community feedback is very important.
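A hedged usage sketch with the diffusers `AuraFlowPipeline` follows; the repo ID `fal/AuraFlow-v0.3` and the chosen resolution are assumptions to check against the model card.

```python
# Hedged sketch: text-to-image with AuraFlow v0.3 via diffusers; repo ID is assumed.
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    width=1536, height=1024,      # wide aspect ratio, within the stated 1536px limit
    num_inference_steps=50,
).images[0]
image.save("lighthouse.png")
```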
LongWriter is a long text generation model developed by a team at Tsinghua University. It is based on large-scale language models (LLMs) and is capable of generating text content of more than 10,000 words. This model is particularly suitable for scenarios where long coherent texts need to be generated, such as writing assistance, content creation, etc. Through fine tuning and optimization, LongWriter improves the quality and consistency of generated text while maintaining the efficiency and scalability of the model.
CogVideoX-2B is an open source video generation model developed by the Tsinghua University team. It supports English prompts, requires about 36GB of GPU memory for inference, and can generate videos 6 seconds long at 8 frames per second and 720*480 resolution. The model uses sinusoidal position embeddings and currently does not support quantized inference or multi-GPU inference. It is deployed on top of Hugging Face's diffusers library and generates videos from text prompts, offering a high degree of creativity and application potential.
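Since the model ships as a diffusers pipeline, the following hedged sketch shows one way to generate a clip; the repo ID `THUDM/CogVideoX-2b` is assumed from the model card, and CPU offloading is used here to reduce the GPU memory footprint.

```python
# Hedged sketch: text-to-video with CogVideoX-2B via the diffusers CogVideoXPipeline.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16  # assumed repo ID
)
pipe.enable_model_cpu_offload()  # trades speed for a much smaller GPU memory footprint

video = pipe(
    prompt="A panda playing a tiny guitar in a bamboo forest",
    num_frames=49,           # roughly 6 seconds at 8 fps
    num_inference_steps=50,
).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```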
CogVideoX is an open source video generation model that has the same origin as the commercial model and supports the generation of video content through text descriptions. It represents the latest progress in text-to-video generation technology, has the ability to generate high-quality videos, and can be widely used in entertainment, education, business promotion and other fields.
DeepSeek-Coder-V2 is an open source Mixture-of-Experts code language model with performance comparable to GPT4-Turbo and outstanding performance on code-specific tasks. It is further pre-trained with an additional 6 trillion tokens, enhancing coding and mathematical reasoning capabilities while maintaining similar performance on general language tasks. Compared with DeepSeek-Coder-33B, there are significant improvements in code-related tasks, reasoning and general capabilities. In addition, the programming languages it supports are expanded from 86 to 338, and the context length is expanded from 16K to 128K.
DeepSeek-Coder-V2 is an open source Mixture-of-Experts (MoE) code language model with performance comparable to GPT4-Turbo and excellent performance on code-specific tasks. Based on DeepSeek-Coder-V2-Base, it is further pre-trained through a high-quality multi-source corpus of 6 trillion tokens, significantly enhancing coding and mathematical reasoning capabilities while maintaining performance on general language tasks. The supported programming languages have been expanded from 86 to 338, and the context length has been expanded from 16K to 128K.
Stable Audio Open is an open source text-to-audio model optimized for generating short audio samples, sound effects, and production elements. It allows users to generate up to 47 seconds of high-quality audio data through simple text prompts, and is particularly suitable for music production and sound design such as creating drum beats, instrumental riffs, ambient sounds, foley recordings, etc. A key benefit of the open source release is that users can fine-tune the model based on their own custom audio data.
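For orientation, here is a hedged sketch of prompt-to-audio generation with the diffusers `StableAudioPipeline`. The repo ID `stabilityai/stable-audio-open-1.0`, the `audio_end_in_s` clip-length parameter, and the `.audios` / `pipe.vae.sampling_rate` attributes are assumptions modeled on the diffusers audio pipeline documentation.

```python
# Hedged sketch: generating a short clip with Stable Audio Open via diffusers.
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16  # assumed repo ID
).to("cuda")

audio = pipe(
    prompt="punchy lo-fi drum loop, 90 BPM",
    audio_end_in_s=10.0,       # assumed parameter controlling clip length in seconds
    num_inference_steps=100,
).audios[0]

# audios[0] is assumed to be a (channels, samples) tensor; soundfile expects (samples, channels).
sf.write("drums.wav", audio.T.float().cpu().numpy(), pipe.vae.sampling_rate)
```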
360Zhinao is a series of 7B-scale language models open sourced by Qihoo 360, including a base model and three chat models with different context lengths. The models are pre-trained on large-scale Chinese and English corpora, perform well on tasks such as natural language understanding, knowledge, mathematics, and code generation, and have strong long-text dialogue capabilities. They can be used to develop and deploy a wide range of conversational applications.
HuggingChat Assistants is a chatbot customization platform released by HuggingFace. Users can choose from multiple open source models hosted by HuggingFace to create customized chatbots suitable for multiple fields.
PIXART LCM (PIXART-δ) is a text-to-image synthesis framework that integrates the Latent Consistency Model (LCM) and ControlNet into the advanced PIXART-α model. It is known for generating high-quality images at 1024px resolution through an efficient training process. Integrating LCM into PIXART-δ significantly speeds up inference, allowing high-quality images to be generated in only 2-4 steps. Notably, PIXART-δ can generate a 1024x1024 pixel image in 0.5 seconds, a 7-fold improvement over PIXART-α. In addition, PIXART-δ is designed to train efficiently on a 32GB V100 GPU in a single day, and with 8-bit inference it can synthesize 1024px images within an 8GB GPU memory budget, greatly improving usability and accessibility. The introduction of ControlNet-like modules enables fine-grained control over the text-to-image diffusion model: a novel ControlNet-Transformer architecture, tailored specifically for Transformers, provides explicit controllability alongside high-quality image generation. As a state-of-the-art open source image generation model, PIXART-δ offers a promising alternative to the Stable Diffusion family and makes a significant contribution to text-to-image synthesis.
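The few-step behavior described above can be sketched with the diffusers `PixArtAlphaPipeline`; the LCM checkpoint name `PixArt-alpha/PixArt-LCM-XL-2-1024-MS` is an assumption to verify, and classifier-free guidance is disabled as is typical for LCM-distilled models.

```python
# Hedged sketch: 4-step 1024px generation with an assumed PixArt LCM checkpoint.
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-LCM-XL-2-1024-MS",  # assumed LCM-distilled checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="an isometric illustration of a tiny greenhouse on a cliff",
    num_inference_steps=4,   # LCM needs only a few denoising steps
    guidance_scale=0.0,      # classifier-free guidance is typically off for LCM
).images[0]
image.save("greenhouse.png")
```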