Found 52 related AI tools
Fogsight is an innovative animation engine that leverages large language models to generate vivid animations. Not only does it support multiple languages, it can also generate high-level narrative animations based on user input, and is suitable for education, entertainment and creative fields. Fogsight focuses on user experience, allowing interaction with AI through a simple interface to quickly generate the required animated content.
WeClone is a project that fine-tunes a large language model on a user's WeChat chat history to build high-quality digital avatars with voice cloning. It pairs WeChat voice messages with a 0.5B-parameter model, letting users talk to their digital doppelganger through a chatbot. The technology has notable applications in digital immortality and voice cloning, allowing a user's avatar to keep conversing with others in their absence. The project is iterating rapidly, is aimed at users interested in AI and language models, and is currently free and under active development.
Dream 7B is the latest diffusion large language model jointly launched by the NLP Group of the University of Hong Kong and Huawei's Noah's Ark Laboratory. It has demonstrated excellent performance in the field of text generation, especially in areas such as complex reasoning, long-term planning, and contextual coherence. This model adopts advanced training methods, has strong planning capabilities and flexible reasoning capabilities, and provides more powerful support for various AI applications.
NotaGen is an innovative symbolic music generation model that improves the quality of music generation through three stages of pre-training, fine-tuning and reinforcement learning. It uses large language model technology to generate high-quality classical scores, bringing new possibilities to music creation. The main advantages of this model include efficient generation, diverse styles, and high-quality output. It is suitable for fields such as music creation, education and research, and has broad application prospects.
Atom of Thoughts (AoT) is a new reasoning framework that turns the reasoning process into a Markov process by representing solutions as combinations of atomic subproblems. Through its decomposition-and-contraction mechanism, the framework significantly improves the performance of large language models on reasoning tasks while reducing wasted computation. AoT can be used as a standalone reasoning method or as a plug-in for existing test-time scaling methods, flexibly combining the strengths of different approaches. The framework is open source and implemented in Python, making it suitable for researchers and developers experimenting in natural language processing and large language models.
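To make the decompose-and-contract idea concrete, here is a toy sketch (an illustration only, not the official AoT implementation): a composite question is split into independent atomic subproblems, each solved atom is contracted back into a smaller problem, and every step depends only on the current state rather than the full reasoning history, which is what makes the process Markov-like. The `solve_atom` stub stands in for an LLM call.

```python
# Illustrative sketch of AoT-style decompose-and-contract (not the official code).

def solve_atom(atom):
    # Stand-in for an LLM call that answers one atomic subproblem.
    a, op, b = atom
    return a + b if op == "+" else a * b

def contract(state, atom, answer):
    # Replace a solved atom with its answer, yielding a smaller problem.
    return [answer if x is atom else x for x in state]

def aot_solve(atoms):
    # state holds unsolved atoms (tuples) and already-computed numbers
    state = list(atoms)
    while len(state) > 1 or isinstance(state[0], tuple):
        atom = next(x for x in state if isinstance(x, tuple))
        state = contract(state, atom, solve_atom(atom))
        # Fold two adjacent solved numbers into a new atom (toy combination rule).
        if len(state) > 1 and all(isinstance(x, int) for x in state[:2]):
            state = [(state[0], "+", state[1])] + state[2:]
    return state[0]

# (2 + 3) + (4 * 5) = 25
print(aot_solve([(2, "+", 3), (4, "*", 5)]))
```

Because each contraction discards the history of how an atom was solved, the chain of states, not the chain of thoughts, carries the computation forward.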
Spark-TTS is an efficient text-to-speech model built on a large language model, characterized by a single-stream, decoupled speech-token representation. It leverages the LLM to reconstruct audio directly from predicted codes, omitting a separate acoustic feature generation model and thereby increasing efficiency and reducing complexity. The model supports zero-shot text-to-speech and handles cross-lingual and code-switching scenarios, making it well suited to speech synthesis applications that demand high naturalness and accuracy. It also supports virtual voice creation: users can generate different voices by adjusting parameters such as gender, pitch, and speaking rate. The model was built to address the low efficiency and high complexity of traditional speech synthesis systems, aiming to provide an efficient, flexible, and powerful solution for research and production. At present it is geared mainly toward academic research and legitimate applications such as personalized speech synthesis, assistive technology, and language research.
Level-Navi Agent is an open-source general web-search agent framework that can decompose complex problems and search the Internet step by step until it answers the user's question. By providing the Web24 dataset, covering five major domains (finance, games, sports, movies, and events), it offers a benchmark for evaluating model performance on search tasks. The framework supports zero-shot and few-shot learning, providing an important reference for applying large language models to Chinese web-search agents.
M2RAG is a benchmark code library for retrieval augmentation generation in multimodal contexts. It answers questions by retrieving documents across multiple modalities and evaluates the ability of multimodal large language models (MLLMs) in leveraging multimodal contextual knowledge. The model was evaluated on tasks such as image description, multimodal question answering, fact verification, and image rearrangement, aiming to improve the effectiveness of the model in multimodal context learning. M2RAG provides researchers with a standardized testing platform that helps advance the development of multimodal language models.
TableGPT2-7B is a large-scale decoder model developed by Zhejiang University, specifically designed to handle data-intensive tasks, especially the interpretation and analysis of tabular data. The model is based on the Qwen2.5 architecture and is optimized through continuous pre-training (CPT) and supervised fine-tuning (SFT) to handle complex table queries and business intelligence (BI) applications. It supports Chinese queries and is suitable for enterprises and research institutions that need to process structured data efficiently. The model is currently free and open source, and a more professional version may be launched in the future.
MoBA (Mixture of Block Attention) is an innovative attention mechanism designed for large language models in long text contexts. It enables efficient long sequence processing by dividing context into chunks and letting each query token learn to focus on the most relevant chunks. The main advantage of MoBA is its ability to seamlessly switch between full attention and sparse attention, which not only ensures performance but also improves computational efficiency. This technology is suitable for tasks that require processing long texts, such as document analysis, code generation, etc., and can significantly reduce computing costs while maintaining high performance of the model. The open source implementation of MoBA provides researchers and developers with powerful tools to advance the application of large language models in the field of long text processing.
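The block-attention mechanism described above can be sketched in a few lines of numpy (an illustration of the idea, not the official MoBA implementation): keys and values are split into fixed-size blocks, each query scores the mean key of every block, keeps only the top-k blocks, and runs softmax attention over just the tokens in those blocks.

```python
# Toy numpy sketch of MoBA-style block sparse attention (illustrative only).
import numpy as np

def moba_attention(q, k, v, block_size=4, top_k=2):
    n, d = k.shape
    n_blocks = n // block_size
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)
    block_means = k_blocks.mean(axis=1)              # (n_blocks, d) gating keys

    out = np.zeros((q.shape[0], d))
    for i, qi in enumerate(q):
        gate = block_means @ qi                      # relevance score per block
        chosen = np.argsort(gate)[-top_k:]           # keep top-k relevant blocks
        ks = k_blocks[chosen].reshape(-1, d)         # gather only selected tokens
        vs = v_blocks[chosen].reshape(-1, d)
        scores = ks @ qi / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ vs                              # sparse attention output
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((3, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
print(moba_attention(q, k, v).shape)  # (3, 8)
```

With `top_k` equal to the number of blocks this reduces to full attention, which is the seamless full/sparse switching the entry refers to; with small `top_k`, each query touches only a fraction of the sequence.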
MNN Large Model Android App is an Android application developed by Alibaba based on large language model (LLM). It supports multiple modal inputs and outputs, including text generation, image recognition, audio transcription, and more. The application optimizes inference performance to ensure efficient operation on mobile devices while protecting user data privacy, with all processing done locally. It supports a variety of leading model providers, such as Qwen, Gemma, Llama, etc., and is suitable for a variety of scenarios.
Baichuan-M1-14B is an open-source large language model developed by Baichuan Intelligence and optimized specifically for medical scenarios. Trained on 20 trillion tokens of high-quality medical and general data covering more than 20 medical departments, it shows strong context understanding and long-sequence performance. The model excels in the medical domain while matching similarly sized general-purpose models on general tasks. Its innovative model structure and training methods let it perform well on complex tasks such as medical reasoning and disease diagnosis, providing strong support for AI applications in medicine.
Doubao-1.5-pro is a high-performance sparse MoE (Mixture of Experts) large language model developed by the Doubao team. This model achieves the ultimate balance between model performance and inference performance through integrated training-inference design. It performs well on multiple public evaluation benchmarks, especially in reasoning efficiency and multi-modal capabilities. This model is suitable for scenarios that require efficient reasoning and multi-modal interaction, such as natural language processing, image recognition, and voice interaction. Its technical background is based on the sparse activation MoE architecture, which achieves higher performance leverage than traditional dense models by optimizing the activation parameter ratio and training algorithm. In addition, the model also supports dynamic adjustment of parameters to adapt to different application scenarios and cost requirements.
PaSa is an advanced academic paper search agent developed by ByteDance. Based on large language model (LLM) technology, it can autonomously call search tools, read papers and filter relevant references to obtain comprehensive and accurate results for complex academic queries. The technique is optimized through reinforcement learning, trained using the synthetic dataset AutoScholarQuery, and performs well on the real-world query dataset RealScholarQuery, significantly outperforming traditional search engines and GPT-based methods. The main advantage of PaSa is its high recall and precision rates, which provide researchers with a more efficient academic search experience.
VITA-1.5 is an open source multi-modal large language model designed to achieve near real-time visual and voice interaction. It provides users with a smoother interactive experience by significantly reducing interaction latency and improving multi-modal performance. The model supports English and Chinese and is suitable for a variety of application scenarios, such as image recognition, speech recognition, and natural language processing. Its main advantages include efficient speech processing capabilities and powerful multi-modal understanding capabilities.
InternVL2-8B-MPO is a multimodal large language model (MLLM) whose multimodal reasoning is strengthened by a mixed preference optimization (MPO) process. On the data side, the authors designed an automated preference-data construction pipeline and built MMPR, a large-scale multimodal reasoning preference dataset. On the model side, InternVL2-8B-MPO is initialized from InternVL2-8B and fine-tuned on MMPR, showing stronger multimodal reasoning and fewer hallucinations. It reaches 67.0% accuracy on MathVista, surpassing InternVL2-8B by 8.7 points and approaching the performance of InternVL2-76B, a model ten times its size.
FlagEval is a model evaluation platform that focuses on the evaluation of large language models and multi-modal models. It provides a fair and transparent environment that allows different models to be compared under the same standards, helps researchers and developers understand model performance, and promotes the development of artificial intelligence technology. The platform covers a variety of model types such as dialogue models and visual language models, supports the evaluation of open source and closed source models, and provides special evaluations such as K12 subject tests and financial quantitative trading evaluations.
Kaka Subtitle Assistant (VideoCaptioner) is a powerful video subtitling tool that uses a large language model to intelligently segment, correct, optimize, and translate subtitles, handling the entire subtitling workflow in one click. The product has modest hardware requirements, is simple to operate, ships with a built-in basic LLM, works out of the box, and consumes few model tokens, making it well suited to video producers and content creators.
FakeShield is a multimodal framework that addresses two major challenges in image forgery detection and localization (IFDL): the black-box nature of detection and limited generalization across tampering methods. FakeShield built the Multimodal Tamper Description dataset (MMTD-Set) by enhancing existing IFDL datasets with GPT-4o, and uses it to train its tamper-analysis capabilities. The framework comprises a domain tag-guided explainable forgery detection module (DTE-FDM) and a multimodal forgery localization module (MFLM), which together handle various types of tamper-detection interpretation and perform localization guided by detailed text descriptions. FakeShield outperforms other methods in detection accuracy and F1 score, offering an interpretable and superior solution.
awesome-LLM-resourses is a repository that aggregates global large language model (LLM) resources, offering tools and materials spanning data acquisition, fine-tuning, inference, evaluation, and practical applications. Its value lies in giving researchers and developers a comprehensive resource library so they can develop and optimize language models more efficiently. The repository is maintained by Wang Rongsheng and continuously updated, lending strong support to the development of the LLM field.
VirtualWife is a virtual digital human project that aims to create a virtual companion with a "soul" of its own. The project supports Bilibili live streaming and is compatible with large language model backends such as OpenAI and Ollama. VirtualWife can provide emotional companionship and can also act as a relationship mentor or counselor to meet users' emotional needs. The project is in an incubation stage; the author has invested a great deal of spare time in development and hopes users will support it by starring the repository.
Open O1 is an open-source project that aims to match the capabilities of the proprietary O1 model through open-source innovation. By curating a set of O1-style thinking data and using it to train LLaMA and Qwen models, the project gives these smaller models stronger long-horizon reasoning and problem-solving capabilities. As it progresses, Open O1 aims to keep pushing the limits of what large language models can do; its vision is a model that not only achieves O1-like performance but also leads in test-time scalability, making advanced AI capabilities available to everyone. Through community-driven development and a commitment to ethical practices, Open O1 seeks to become a cornerstone of AI progress, ensuring the technology's future development is open and benefits all.
Diabetica is a high-level language model developed specifically for diabetes treatment and care. Through deep learning and big data analysis, it is able to provide a variety of services including diagnosis, treatment recommendations, medication management, lifestyle advice and patient education. Diabetica’s models Diabetica-7B and Diabetica-1.5B demonstrate excellent performance on multiple diabetes-related tasks and provide a reproducible framework that allows other medical fields to benefit from such AI technology.
WaveCoder is a large code language model developed by Microsoft Research Asia. It enhances the breadth and versatility of the large code language model through instruction fine-tuning. It demonstrates excellent performance in multiple programming tasks such as code summarization, generation, translation, and repair. The innovation of WaveCoder lies in the data synthesis framework and two-stage instruction data generation strategy it uses to ensure the high quality and diversity of data. The open source of this model provides developers with a powerful programming aid that helps improve development efficiency and code quality.
RD-Agent is an automated research and development tool launched by Microsoft Research Asia. Relying on the powerful capabilities of large language models, it creates a new model of artificial intelligence-driven R&D process automation. By integrating data-driven R&D systems, it can use artificial intelligence capabilities to drive the automation of innovation and development. It not only improves R&D efficiency, but also uses intelligent decision-making and feedback mechanisms to provide unlimited possibilities for future cross-field innovation and knowledge transfer.
PresentationGen is a web application developed based on the SpringBoot framework. It automatically generates PPT files by integrating a large language model (LLM). This technology achieves rapid generation of PPTX files by preprocessing a large number of single-page templates and combining them in real time according to user needs when using them. It supports text replacement, making the generated presentations more personal and professional. This product is mainly aimed at users who need to quickly create presentations, such as business people, educators, and designers, helping them save time and improve work efficiency.
Hanwang Tiandi Large Model is a large language model launched by Hanwang Technology that focuses on the field of artificial intelligence and has 30 years of industry accumulation. It can realize multiple rounds of dialogue, handle tasks efficiently, and delve into multiple vertical subdivisions such as office, education, and humanities. This model continuously optimizes its own intelligence through intensive learning from human feedback, and provides diversified services including intelligent proofreading, automatic translation, legal consultation, drawing generation, copywriting generation, etc. to empower industries such as law, humanities, office, education, and medical care to improve efficiency and creativity.
AMchat is a large language model that integrates mathematical knowledge, advanced mathematics exercises and their solutions. It is based on the InternLM2-Math-7B model, fine-tuned through xtuner, and is specifically designed to answer advanced mathematics problems. This project won the Top 12 and Innovation and Creativity Awards in the 2024 Puyuan Large Model Series Challenge (Spring Competition), reflecting its professional capabilities and innovation in the field of advanced mathematics.
The Index-1.9B series is a lightweight large language model developed in-house by Bilibili. It comes in several versions, including base, pure, chat, and character, is pre-trained mainly on Chinese and English corpora, and performs well on multiple evaluation benchmarks. The model supports SFT and DPO alignment as well as RAG-based role-play customization, making it suitable for scenarios such as dialogue generation and role-playing.
AIGCRank Large Language Model API Price Comparison is a tool that aggregates and compares pricing from major AI model providers worldwide. It gives users the latest large language model (LLM) price data, including some free AI model APIs. Through the platform, users can easily find and compare the latest prices from major domestic and international providers such as OpenAI, Claude, Mixtral, Kimi, Spark, Tongyi Qianwen, Wenxin Yiyan, Llama 3, GPT-4, AWS, and Google, ensuring they find the model pricing best suited to their projects.
HuggingChat is an iOS app designed to facilitate seamless communication between users and multiple state-of-the-art large-scale language models from multiple providers such as Mistral AI, Meta, and Google. It can meet the needs of various scenarios: stimulate creativity, provide expert guidance, promote education and self-improvement, improve work efficiency, quickly respond to daily problems, etc. As a pioneer adopter of transformative AI technology, HuggingChat will let you experience the endless possibilities of talking with advanced large language models.
MA-LMM is a large-scale multimodal model built on a large language model and designed mainly for long-term video understanding. It processes video online and uses a memory bank to store past video information, so it can draw on historical video content for long-term analysis without exceeding the language model's context length or GPU memory limits. MA-LMM can be integrated seamlessly into current multimodal language models and achieves leading performance on tasks such as long video understanding, video question answering, and video captioning.
The "Infini-attention" technique developed by Google extends Transformer-based large language models to handle unboundedly long inputs through a compressive memory mechanism, and achieves excellent performance on multiple long-sequence tasks. The method combines a compressive memory, local attention paired with long-term attention, and streaming processing. Experimental results show performance advantages on long-context language modeling, key context-block retrieval, and book summarization tasks.
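The compressive memory at the heart of this approach can be sketched with a linear-attention-style update (a toy illustration in the spirit of Infini-attention, not Google's implementation): past segments are folded into a fixed-size matrix memory, so read cost stays constant no matter how much input has streamed through.

```python
# Toy sketch of a fixed-size compressive memory (illustrative only).
import numpy as np

def elu1(x):
    # ELU + 1 keeps feature maps positive, so the normalizer stays positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

d = 8
memory = np.zeros((d, d))   # fixed-size associative memory
z = np.zeros(d)             # running normalization term

rng = np.random.default_rng(0)
for _ in range(5):          # stream 5 segments; memory stays d x d throughout
    k = rng.standard_normal((16, d))
    v = rng.standard_normal((16, d))
    sk = elu1(k)
    memory += sk.T @ v      # write: accumulate key-value associations
    z += sk.sum(axis=0)

q = rng.standard_normal((3, d))
sq = elu1(q)
retrieved = (sq @ memory) / (sq @ z)[:, None]  # read: normalized linear attention
print(retrieved.shape)  # (3, 8)
```

The "combination of local and long-term attention" mentioned above corresponds to mixing such memory reads with ordinary softmax attention over the current segment.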
MoMA Personalization is a personalized image generation tool based on an open-source multimodal large language model (MLLM). It focuses on subject-driven personalized image generation, producing high-quality images that preserve a target object's characteristics from a reference image and a text prompt. MoMA requires no fine-tuning: it is a plug-in model that can be applied directly to existing diffusion models, improving the detail and prompt fidelity of generated images while retaining the original model's performance.
mPLUG-DocOwl is a modular multimodal large language model for document understanding, capable of handling OCR-free document understanding tasks. This model has excellent performance and supports a variety of tasks such as document visual question answering, information question answering, chart question answering, etc. Users can experience its powerful features through the online demo provided by the model.
Polaris is a large language model (LLM) system developed by Hippocratic AI that is highly focused on security and used in healthcare. Through a combination of constellation architecture and professional support agents, Polaris is able to perform multiple medical-related complex tasks. The product is positioned to provide long-term, multi-round voice conversations with patients and provide professional and accurate medical advice. In terms of price, it is billed by the hour, $9 per hour. The main functions include real-time multiple rounds of voice dialogue, medical information provision and interpretation, privacy and compliance checks, medication management and consultation, laboratory and vital sign analysis, nutritional advice, medical record and policy inquiry, patient relationship building, etc.
Search4All is a question and answer system based on large language models. It can answer various questions, including factual questions, interpretive questions, analytical questions, etc. The system uses advanced natural language processing technology to deeply understand the meaning of the question and give accurate answers. It has a wide range of knowledge reserves, covering history, geography, science, art, sports and other fields. At the same time, it also has certain reasoning and analytical abilities and can conduct logical analysis and suggestive answers to complex questions. Using Search4All can help users quickly obtain the information they need and improve work efficiency.
This GitHub repository is a centralized storage center for resources related to generative artificial intelligence, including the latest monthly research papers, interview question banks, course materials, code notebooks, etc. The content is regularly updated to allow developers and practitioners to keep up with the latest developments and improve productivity. The main resources include paper abstracts, interview question classifications, free course lists, open source notebooks, etc., as well as some usage scenarios and examples.
The Daguan "Cao Zhi" large model is a domestic large language model focusing on long text, multi-language, and vertical development. It has automated writing, translation, and professional report writing capabilities, and supports multi-language applications and vertical industry customization. It can provide high-quality copywriting services, is widely applicable to various industries, and is an intelligent tool to solve practical problems of enterprises.
ZeroTrusted.ai is a pioneering company specializing in generative AI security. Their LLM firewall product is designed to protect you from the risk of data exposure and exploitation by unscrupulous language model providers or malicious actors due to language model training data sets that may contain your sensitive information. The product provides an anonymity function to protect prompt privacy, ensures data security and privacy through ztPolicyServer and ztDataPrivacy, optimizes prompts and verification results to improve accuracy and prevent model fabrication, and supports integration with multiple tools such as LangChain and Zapier. The product is divided into multiple pricing plans such as free version, standard version, commercial version and enterprise version, with different functions and service levels. ZeroTrusted.ai is committed to simplifying security compliance and maximizing the protection of applications and data through cloud-agnostic zero trust solutions, dynamic adaptive encryption and other technologies.
Mobile-Agent is an autonomous multimodal mobile-device agent built on multimodal large language model (MLLM) technology. It first uses visual perception tools to accurately identify and locate the visual and textual elements in an application's front-end interface; based on the perceived visual environment, it then autonomously plans and decomposes complex operational tasks and navigates mobile applications step by step. Unlike previous solutions that relied on an application's XML files or mobile system metadata, Mobile-Agent's vision-centric approach adapts across a variety of mobile operating environments without system-specific customization. To evaluate Mobile-Agent, the authors introduce Mobile-Eval, a benchmark for mobile-device operation, and conduct a comprehensive evaluation on it. Experimental results show that Mobile-Agent achieves notable accuracy and completion rates, and that it can fulfill requirements even under challenging instructions such as multi-application operations.
E^2-LLM (Efficient and Extreme length extension for Large Language Models) is a method that supports long-context tasks with only a single training run while greatly reducing computational cost. It adopts RoPE position embeddings and introduces two different augmentation methods designed to make the model more robust at inference time. Comprehensive experiments on multiple benchmark datasets demonstrate the effectiveness of E^2-LLM on challenging long-context tasks.
M2UGen is a multi-modal music understanding and generation framework that combines large language models and is designed to help users create music. It can simultaneously complete music understanding and multi-modal music generation tasks.
LLM Maybe LongLM is a research result on long-context processing for large language models. It achieves long-context ability through self-extension: the method requires no training and only a few code modifications to the original model to expand the context window, providing an effective solution for processing long texts.
Ollama is a tool for running large language models locally, letting users quickly run Llama 2, Code Llama, and other models, as well as customize and create their own. Ollama currently supports macOS and Linux, with a Windows version coming soon. The product is positioned to give users a local large language model runtime that meets their personalized needs.
TrackGPTs is a GPT tracking and analysis platform that continuously discovers new GPTs by tracking social media and other channels, and provides rich indicators to analyze the performance of each GPT. It can also analyze the historical data of GPTs, track its growth, and help users fully understand the latest developments in the GPT market.
Denser Chatbots can create chatbots using your personal website or uploaded files. Denser uses advanced technology to process your data and use large language models to extract insights from your specific data to answer your queries. Using the Retrieval Augmented Generation (RAG) approach, Denser Chatbots are able to generate answers based on your unique knowledge base, providing responses that are more personalized and relevant than standard large language models. Building and deploying Denser Chatbots is easy, just provide your website URL to start building and deploying, no programming skills are required.
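The retrieval-augmented generation flow described above can be sketched as a minimal pipeline (illustrative only, not Denser's actual code; the bag-of-words embedding and example documents are stand-ins for a real embedding model and knowledge base): documents are embedded, the query retrieves the most similar chunks, and the chunks are packed into a grounded prompt for a language model.

```python
# Minimal RAG sketch: embed, retrieve top-k by cosine similarity, build prompt.
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: bag-of-words counts (a real system uses dense vectors).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    # The retrieved chunks ground the model's answer in the user's own data.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Denser chatbots are built from a website URL or uploaded files.",
    "The pricing page lists a free tier.",
    "RAG grounds answers in a private knowledge base.",
]
print(build_prompt("How are chatbots built?", docs))
```

This grounding step is what makes the responses more personalized and relevant than a standalone language model: the model answers from the retrieved context rather than from its general training data.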
GitLab Duo Chat is GitLab's AI conversation assistant, which can help users ask questions and obtain GitLab-related information. It uses a large language model that can handle natural language questions and provide answers.
AutoGen is a next-generation large language model application based on a multi-agent dialogue framework. It simplifies the orchestration, automation, and optimization of complex LLM workflows, maximizing the performance of LLM models and overcoming their weaknesses. AutoGen supports a variety of complex conversation patterns, with customizable and conversational agents, and developers can use AutoGen to build various conversation patterns.
The Prompt Engineering Guide is a guide that comprehensively introduces prompt engineering, including basic concepts, general tips for designing prompts, prompt technology, prompt applications, etc. It helps users better understand the capabilities and limitations of large language models, and master various skills and techniques for interacting with and developing large language models.
Brainglue is an experimentation platform for large language models that lets anyone build powerful prompt chains to solve complex generative AI problems. It provides an intuitive AI playground that makes building and running prompt chains a breeze. Users can experiment with different AI configurations by adjusting context windows and temperature settings. Brainglue supports cutting-edge models such as GPT-3.5 and GPT-4, with support for more models planned. Users can define global variables and assign values dynamically for use within a prompt chain, and a chain can feed one step's AI response into the next prompt, achieving more complex and coherent output. Brainglue also provides a template library of prompt chains that showcases these enhanced reasoning capabilities.
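The chaining pattern described above, global variables filling template slots and each step's response feeding the next prompt, can be sketched in a few lines (illustrative only, not Brainglue's API; `fake_llm` is a stand-in for a real model call):

```python
# Toy prompt-chain runner: outputs of earlier steps become inputs to later ones.

def fake_llm(prompt):
    # Stand-in for a real model call; returns a deterministic "response".
    return f"[answer to: {prompt}]"

def run_chain(steps, variables, llm=fake_llm):
    history = {}
    for name, template in steps:
        # Both global variables and all earlier step outputs are available.
        prompt = template.format(**variables, **history)
        history[name] = llm(prompt)
    return history

steps = [
    ("outline", "Outline a post about {topic}."),
    ("draft", "Expand this outline into a draft: {outline}"),
]
result = run_chain(steps, {"topic": "prompt chaining"})
print(result["draft"])
```

The second step's template references `{outline}`, so the first step's response is substituted in automatically, which is exactly the response-feeds-next-prompt behavior the entry describes.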
The Mencius Generative Large Model (Mencius GPT) is a controllable large language model for generative scenarios that helps users complete a variety of scenario-specific tasks through multi-turn dialogue. It supports knowledge Q&A, multilingual translation, general writing, and financial-scenario tasks, and offers greater controllability, flexibility, personalization, and professionalism. Consult the official website for specific pricing and usage details.