Found 36 AI tools
Gitee AI brings together the latest and most popular AI models, provides one-stop services for model experience, inference, training, deployment, and applications, and is backed by abundant computing power. It aims to be the leading AI community in China.
MouSi is a multimodal vision-language model designed to address current challenges faced by large vision-language models (VLMs). It uses an ensemble-of-experts approach to combine the capabilities of individual visual encoders, each skilled at a task such as image-text matching, OCR, or image segmentation. The model introduces a fusion network that processes the outputs of the different vision experts uniformly and bridges the gap between the image encoders and the pre-trained LLM. MouSi also explores different positional encoding schemes to mitigate wasted position encodings and context-length limits. Experimental results show that VLMs with multiple experts consistently outperform those with a single visual encoder, and that performance improves significantly as more experts are integrated.
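The simplest form of such a fusion network can be pictured as a per-expert linear projection into the LLM's embedding width, followed by concatenation along the token axis. This is a minimal sketch under assumptions: it is not MouSi's actual fusion module, and the expert names, token widths, and projection weights below are made up for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # shape: (llm_dim, expert_dim)


def project(tokens: Sequence[Vector], weight: Matrix) -> List[Vector]:
    """Linearly map each expert token into the LLM embedding space."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]


def fuse_experts(expert_outputs: Sequence[Sequence[Vector]],
                 projections: Sequence[Matrix]) -> List[Vector]:
    """Project every expert's tokens to a shared width, then concatenate them
    along the sequence axis so the LLM sees one visual token stream."""
    if len(expert_outputs) != len(projections):
        raise ValueError("need exactly one projection per expert")
    fused: List[Vector] = []
    for tokens, weight in zip(expert_outputs, projections):
        fused.extend(project(tokens, weight))
    return fused


# Hypothetical experts: 2 tokens of width 2, and 1 token of width 3; LLM width is 2.
clip_like = [[1.0, 2.0], [3.0, 4.0]]
ocr_like = [[1.0, 2.0, 3.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
ocr_proj = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(fuse_experts([clip_like, ocr_like], [identity, ocr_proj]))
# [[1.0, 2.0], [3.0, 4.0], [6.0, 0.0]]
```

Concatenating along the token axis keeps every expert's tokens visible to the LLM, at the cost of a longer visual sequence, which is the context-length pressure the positional-encoding work above tries to relieve.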
OpenAI Embedding Models is a series of new models: two new embedding models, released together with updated GPT-4 Turbo preview, GPT-3.5 Turbo, and text moderation models. By default, data sent to the OpenAI API is not used to train or improve OpenAI models. The new, lower-priced embedding models are the smaller, more efficient text-embedding-3-small and the larger, more powerful text-embedding-3-large. An embedding is a sequence of numbers that represents the concepts within content such as natural language or code. Embeddings make it easier for machine learning models and other algorithms to understand the relationships between pieces of content and to perform tasks such as clustering or retrieval. They power knowledge retrieval in ChatGPT and the Assistants API, as well as many retrieval-augmented generation (RAG) developer tools. text-embedding-3-small is a highly efficient new embedding model that outperforms its predecessor, text-embedding-ada-002: its average score on the multilingual MIRACL benchmark rises from 31.4% to 44.0%, and its average score on the English-language MTEB benchmark rises from 61.0% to 62.3%. It is also priced 5x lower than text-embedding-ada-002, at $0.00002 per 1,000 tokens instead of $0.0001. text-embedding-3-large is the new, larger embedding model and produces embeddings of up to 3,072 dimensions. It performs better still: its average MIRACL score rises from 31.4% to 54.9%, and its average MTEB score from 61.0% to 64.6%. text-embedding-3-large is priced at $0.00013 per 1,000 tokens. Both models also natively support shortening embeddings (through the API's `dimensions` parameter), letting developers trade off performance against cost.
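Shortening an embedding by hand can be approximated by truncating the vector and re-normalizing it to unit length; as we understand OpenAI's guidance, the `dimensions` parameter performs the equivalent step server-side. This stdlib sketch assumes unit-length embeddings compared by cosine similarity, uses a made-up toy vector rather than real model output, and plugs the per-1,000-token prices quoted above into a hypothetical `cost_usd` helper.

```python
import math
from typing import List, Sequence


def shorten(embedding: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` values and re-normalize to unit L2 length."""
    head = list(embedding[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the plain dot product for unit-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cost_usd(tokens: int, price_per_1k: float) -> float:
    """Embedding cost for `tokens` input tokens at a per-1,000-token price."""
    return tokens / 1000 * price_per_1k


print(shorten([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
# One million tokens: text-embedding-ada-002 vs text-embedding-3-small.
print(round(cost_usd(1_000_000, 0.0001), 4), round(cost_usd(1_000_000, 0.00002), 4))  # 0.1 0.02
```

The cost ratio works out to the 5x reduction stated above; shorter vectors additionally cut storage and similarity-search cost roughly in proportion to the number of dimensions kept.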
Adept Fuyu-Heavy is a new multimodal model designed specifically for digital agents. It performs well at multimodal reasoning, particularly UI understanding, while also doing well on traditional multimodal benchmarks. It also demonstrates that the Fuyu architecture can be scaled up while keeping all its associated benefits, including handling images of arbitrary sizes and shapes and efficiently reusing existing transformer optimizations. It matches or exceeds the performance of models in the same compute class, even though some of its capacity must be devoted to image modeling.
Meta-Prompting is an effective scaffolding technique designed to enhance the functionality of language models (LMs). It turns a single LM into a multi-faceted conductor that manages and integrates multiple independent LM queries. Using high-level instructions, meta-prompting guides the LM to decompose complex tasks into smaller, more manageable subtasks. These subtasks are then handled by distinct "expert" instances of the same LM, each operating under specific, tailored instructions. At the center of this process is the LM itself, acting as conductor: it ensures seamless communication between the experts and effectively integrates their outputs, and it applies its own critical thinking and robust verification to refine and validate the final result. This collaborative prompting approach lets a single LM act simultaneously as a comprehensive orchestrator and as a panel of diverse experts, significantly improving its performance across a wide range of tasks. Because meta-prompting is zero-shot and task-agnostic, it greatly simplifies user interaction: no detailed, task-specific instructions are needed. Our research further shows that external tools, such as a Python interpreter, integrate seamlessly into the meta-prompting framework, broadening its applicability. In rigorous experiments with GPT-4, we demonstrate that meta-prompting outperforms conventional scaffolding methods: averaged across all tasks, including the Game of 24, Checkmate-in-One, and Python Programming Puzzles, meta-prompting with a Python interpreter surpasses standard prompting by 17.1%, expert (dynamic) prompting by 17.3%, and multipersona prompting by 15.2%.
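The conductor/expert loop described above can be sketched in a few lines. This is a minimal sketch under assumptions: `LM` is a hypothetical prompt-to-text callable standing in for a real model API, the prompt wording is invented, and experts are assigned round-robin, whereas in meta-prompting the conductor itself decides which expert to consult. The Python-interpreter tool is not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one call to a language model: prompt in, text out.
LM = Callable[[str], str]


def meta_prompt(task: str, lm: LM, experts: Dict[str, str]) -> str:
    """Conductor loop: decompose, delegate to fresh expert calls, integrate."""
    if not experts:
        raise ValueError("need at least one expert persona")
    # 1. The conductor (the LM itself) breaks the task into subtasks, one per line.
    plan = lm(f"Decompose into subtasks, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Each subtask goes to an "expert": the same LM, called with its own
    #    instructions and none of the conductor's history. Round-robin
    #    assignment is a simplification of the conductor choosing experts.
    names = list(experts)
    findings: List[str] = []
    for i, sub in enumerate(subtasks):
        name = names[i % len(names)]
        answer = lm(f"{experts[name]}\nSubtask: {sub}")
        findings.append(f"{name}: {answer}")

    # 3. The conductor integrates and verifies the experts' outputs.
    return lm("Integrate and verify these results:\n" + "\n".join(findings)
              + f"\nOriginal task: {task}")


def echo_lm(prompt: str) -> str:
    """Toy LM so the sketch runs offline."""
    if prompt.startswith("Decompose"):
        return "add 2 and 3\nmultiply the sum by 4"
    if prompt.startswith("Integrate"):
        return "20"
    return "done"


print(meta_prompt("compute (2 + 3) * 4", echo_lm,
                  {"Mathematician": "You are a careful mathematician."}))  # 20
```

Each expert call starts from a clean prompt, which is what lets one model play several independent roles without their reasoning bleeding into each other.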
WARM aligns large language models (LLMs) with human preferences using Weight Averaged Reward Models (WARM). WARM first fine-tunes multiple reward models and then averages them in weight space. Compared with traditional ensembling of predictions, weight averaging is more efficient, and it improves reliability under distribution shifts and inconsistent preferences. Our experiments on summarization tasks show that WARM outperforms traditional methods, and that with best-of-N and RL methods, WARM improves the overall quality and alignment of LLM predictions.
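Weight averaging itself is simple: parameters with the same name are combined element-wise across the fine-tuned reward models. The sketch below represents each model as a hypothetical dict of flat parameter lists and uses a toy linear reward model; WARM relies on the reward models sharing a pre-trained initialization, which this toy example does not model, and `weight_average` / `linear_reward` are illustrative names, not a library API.

```python
from typing import Dict, List, Optional, Sequence

Params = Dict[str, List[float]]


def weight_average(models: Sequence[Params],
                   coeffs: Optional[Sequence[float]] = None) -> Params:
    """Element-wise weighted average of same-shaped parameter dicts."""
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)  # uniform averaging
    if len(coeffs) != len(models) or abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("need one coefficient per model, summing to 1")
    averaged: Params = {}
    for key, first in models[0].items():
        averaged[key] = [sum(c * m[key][i] for c, m in zip(coeffs, models))
                         for i in range(len(first))]
    return averaged


def linear_reward(params: Params, features: Sequence[float]) -> float:
    """Toy reward model: r(x) = w . x + b."""
    return sum(w * x for w, x in zip(params["w"], features)) + params["b"][0]


rm_a = {"w": [1.0, 0.0], "b": [0.0]}
rm_b = {"w": [0.0, 1.0], "b": [2.0]}
warm = weight_average([rm_a, rm_b])
x = [2.0, 4.0]
print(linear_reward(warm, x))                                 # 4.0
print((linear_reward(rm_a, x) + linear_reward(rm_b, x)) / 2)  # 4.0
```

For this linear toy, averaging weights and averaging predictions give the same reward; for fine-tuned neural networks they differ, and the efficiency gain comes from running a single averaged model at inference time instead of one forward pass per ensemble member.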
ReFT (Reinforced Fine-Tuning) is a simple and effective way to enhance the reasoning capabilities of large language models (LLMs). It first warms up the model with supervised fine-tuning (SFT), then further fine-tunes it with online reinforcement learning, specifically the PPO algorithm. By automatically sampling many reasoning paths for each question and deriving rewards naturally from the ground-truth answers, ReFT significantly outperforms SFT. Its performance may be improved further by combining it with inference-time strategies such as majority voting and re-ranking. Notably, ReFT achieves these gains by training on the same questions as SFT, without relying on extra or augmented training questions, which indicates stronger generalization.
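Two pieces of this pipeline are easy to make concrete: a reward read off the ground-truth answer for each sampled reasoning path, and majority voting over sampled paths at inference time. This sketch assumes the last number in a path is its final answer and uses a binary 1/0 reward; both the extraction rule and the reward values are assumptions, and the PPO update that consumes the rewards is not shown.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(reasoning: str) -> Optional[str]:
    """Assumed format: the last number in a sampled reasoning path is its answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reasoning)
    return numbers[-1] if numbers else None


def terminal_reward(reasoning: str, gold_answer: str) -> float:
    """Reward derived from the ground-truth answer: 1 if correct, else 0."""
    return 1.0 if extract_answer(reasoning) == gold_answer else 0.0


def majority_vote(paths: List[str]) -> Optional[str]:
    """Inference-time aggregation: the most common extracted answer wins."""
    answers = [a for a in (extract_answer(p) for p in paths) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


sampled = [
    "3 apples plus 4 apples gives 7. Answer: 7",
    "3 * 4 = 12. Answer: 12",
    "Start with 3, add 4, so 7.",
]
print([terminal_reward(p, "7") for p in sampled])  # [1.0, 0.0, 1.0]
print(majority_vote(sampled))                      # 7
```

Because the reward needs only the final answer, no step-by-step annotations are required, which is why ReFT can reuse exactly the SFT training questions.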
Contrastive Preference Optimization (CPO) is an innovative approach to machine translation that significantly improves ALMA models by training them to avoid generating translations that are adequate but not perfect. With it, models match or exceed the performance of WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test sets.
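As a sketch of the kind of objective involved: assuming CPO pairs a DPO-style contrastive term over a preferred and a dis-preferred translation (with no reference model) with a negative log-likelihood term on the preferred one, the per-example loss can be written over sequence log-probabilities. Treat the exact form and the β = 0.1 default as assumptions; `cpo_style_loss` is a hypothetical name, not a library function.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_style_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Contrastive preference term plus NLL on the preferred translation.

    logp_chosen / logp_rejected: the policy's summed log-probabilities of the
    preferred and dis-preferred translations of the same source sentence.
    """
    prefer = -log_sigmoid(beta * (logp_chosen - logp_rejected))  # push the pair apart
    nll = -logp_chosen                                           # keep the good one likely
    return prefer + nll


print(cpo_style_loss(-2.0, -2.0))  # log(2) + 2, about 2.693
print(cpo_style_loss(-2.0, -5.0))  # smaller: the preferred translation wins by a margin
```

The contrastive term alone only cares about the gap between the two translations; the NLL term anchors the model so that widening the gap does not come from making the preferred translation itself less likely.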
Zhipu AI released GLM-4 and CogView3 at its first Technology Open Day. GLM-4's overall performance has improved by nearly 60%, with longer context, stronger multimodal capabilities, and faster inference. CogView3's multimodal generation capabilities approach those of DALL·E 3. Zhipu positions the two as its next-generation foundation model and image-generation model.
Chain-of-Table is a reasoning-chain framework for table understanding, designed for tasks such as table-based question answering and fact verification. It treats tabular data as part of the reasoning chain: through in-context learning, it guides a large language model to iteratively generate operations and update the table, forming a continuous chain that shows the reasoning process for a given table question. Because this chain carries structured information about intermediate results, predictions become more accurate and reliable. Chain-of-Table achieves new state-of-the-art performance on multiple benchmarks, including WikiTQ, FeTaQA, and TabFact.
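The idea can be pictured as a sequence of table operations whose intermediate tables form the reasoning chain. In this sketch the operation names (`select_rows`, `select_columns`, `sort_by`) and the toy medals table are hypothetical, and a fixed plan replaces the LLM that would choose each next operation in context.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Table = List[Row]
Step = Tuple[str, Callable[[Table], Table]]


def select_rows(table: Table, keep: Callable[[Row], bool]) -> Table:
    return [row for row in table if keep(row)]


def select_columns(table: Table, cols: Sequence[str]) -> Table:
    return [{c: row[c] for c in cols} for row in table]


def sort_by(table: Table, col: str, descending: bool = False) -> Table:
    return sorted(table, key=lambda row: row[col], reverse=descending)


def run_chain(table: Table, steps: Sequence[Step]) -> Tuple[Table, List[Tuple[str, Table]]]:
    """Apply each operation in turn, recording every intermediate table."""
    trace: List[Tuple[str, Table]] = [("start", table)]
    for name, op in steps:
        table = op(table)
        trace.append((name, table))
    return table, trace


medals = [
    {"country": "A", "gold": 3},
    {"country": "B", "gold": 9},
    {"country": "C", "gold": 6},
]
# Question: "Which countries won more than 5 golds, best first?"
# A fixed plan stands in for the operations an LLM would pick step by step.
plan: List[Step] = [
    ("select_rows(gold > 5)", lambda t: select_rows(t, lambda r: r["gold"] > 5)),
    ("sort_by(gold, desc)", lambda t: sort_by(t, "gold", descending=True)),
    ("select_columns(country)", lambda t: select_columns(t, ["country"])),
]
final, trace = run_chain(medals, plan)
for name, table in trace:
    print(name, table)
```

Each entry in `trace` is the kind of structured intermediate result the paragraph above refers to: the model, or a human, can inspect exactly which rows and columns survived each step.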
DocGraphLM is a document graph language model for information extraction and question answering on visually rich documents. It combines pre-trained language models with graph semantics. Its distinctive features are a joint encoder architecture for representing documents and a novel link prediction approach for reconstructing document graphs. DocGraphLM predicts both the direction and the distance between nodes using a joint loss function that prioritizes restoring each node's neighborhood and down-weights the detection of distant nodes. Experiments on three state-of-the-art datasets show consistent improvements in information extraction and question answering when graph features are used. Furthermore, we report that graph features accelerate convergence during training, even though they are constructed solely through link prediction.
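To make "direction and distance between nodes" concrete, here is a sketch that derives both link-prediction targets from two nodes' center coordinates and down-weights far-apart pairs in a combined loss. The 8-sector direction quantization, the 1 / (1 + distance) weighting, and all function names are hypothetical choices for illustration, not DocGraphLM's published loss.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) center of a text block on the page


def direction_bucket(src: Point, dst: Point, n_buckets: int = 8) -> int:
    """Quantize the angle from src to dst into n_buckets equal sectors."""
    angle = math.atan2(dst[1] - src[1], dst[0] - src[0]) % (2 * math.pi)
    return int(angle // (2 * math.pi / n_buckets)) % n_buckets


def distance(src: Point, dst: Point) -> float:
    """Euclidean distance between node centers."""
    return math.hypot(dst[0] - src[0], dst[1] - src[1])


def pair_loss(direction_ce: float, distance_err: float, dist: float,
              lam: float = 1.0) -> float:
    """Hypothetical joint loss: both error terms shrink for far-apart pairs,
    so the model focuses on recovering each node's neighborhood."""
    return (direction_ce + lam * distance_err) / (1.0 + dist)


src, dst = (0.0, 0.0), (3.0, 4.0)
print(direction_bucket(src, dst), distance(src, dst))  # 1 5.0
```

Scaling the loss by distance is one simple way to express "prioritize neighbors"; any monotonically decreasing weight would serve the same purpose in this sketch.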
TinyGPT-V is an efficient multimodal large language model built on a small backbone network. It has strong language understanding and generation capabilities and is suited to a variety of natural language processing tasks. TinyGPT-V uses Phi-2 as its pre-trained language model, which combines strong performance with efficiency.
PowerInfer is an engine for fast large language model inference on PCs with consumer-grade GPUs. It exploits the high locality of LLM inference: frequently activated ("hot") neurons are preloaded onto the GPU, while rarely activated ("cold") neurons are computed on the CPU, which significantly reduces GPU memory requirements and CPU-GPU data transfer. PowerInfer also integrates adaptive predictors and neuron-aware sparse operators to make neuron activation and computational sparsity more efficient. On a single NVIDIA RTX 4090 GPU it generates an average of 13.20 tokens per second, only 18% lower than a top-tier server-grade A100 GPU, while maintaining model accuracy.
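The hot/cold split can be illustrated with a greedy placement: rank neurons by how often they fired in an offline profile and fill the GPU memory budget with the most active ones. This is a sketch under assumptions: the uniform per-neuron size, the toy activation profile, and `place_neurons` are hypothetical, and the greedy ranking stands in for PowerInfer's actual placement policy, which we do not reproduce.

```python
from typing import Dict, List, Tuple


def place_neurons(activation_freq: Dict[int, float], bytes_per_neuron: int,
                  gpu_budget_bytes: int) -> Tuple[List[int], List[int]]:
    """Greedy hot/cold split: most frequently activated neurons go to the GPU
    until its memory budget is used up; the rest stay on the CPU."""
    ranked = sorted(activation_freq, key=lambda n: activation_freq[n], reverse=True)
    gpu: List[int] = []
    cpu: List[int] = []
    used = 0
    for neuron in ranked:
        if used + bytes_per_neuron <= gpu_budget_bytes:
            gpu.append(neuron)
            used += bytes_per_neuron
        else:
            cpu.append(neuron)
    return gpu, cpu


# Toy profile: neuron id -> fraction of tokens on which it fired.
freq = {0: 0.90, 1: 0.05, 2: 0.60, 3: 0.01}
hot, cold = place_neurons(freq, bytes_per_neuron=100, gpu_budget_bytes=250)
print(hot, cold)  # [0, 2] [1, 3]
```

Because activations follow a power-law-like distribution, a small GPU-resident hot set covers most of the work per token, leaving the CPU a small, sparse share of the computation.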
Gemini is Google's most capable and versatile AI model. It is designed to be multimodal and is optimized for three sizes: Ultra, Pro, and Nano. Gemini models offer strong performance and next-generation capabilities for a wide variety of applications, providing scalable, efficient solutions with a focus on responsibility and safety. Gemini models are now available.
LEO is a multimodal, multi-task generalist agent based on a large language model, capable of perceiving, grounding, reasoning, planning, and acting in the 3D world. LEO is trained in two stages: (i) 3D vision-language alignment and (ii) 3D vision-language-action instruction tuning. We carefully curate and generate a large-scale dataset of object-level and scene-level multimodal tasks that require deep understanding of and interaction with the 3D world. Through rigorous experiments, we demonstrate LEO's strong performance on a wide range of tasks, including 3D captioning, question answering, reasoning, navigation, and robotic manipulation.
AndesGPT (the Andes large model) is a personalized, exclusive large model and agent released by OPPO. Built on a device-cloud collaborative architecture, it comes in several model specifications with different parameter sizes and supports features such as enhanced dialogue, personalization, and device-cloud collaboration. OPPO plans to invest in forward-looking large-model technologies, cooperate through the Intelligent Computing Joint Laboratory with the University of Science and Technology of China, and open-source its agent framework to support efficient incubation, hosting, and application of agents.
Zidong Taichu is a new-generation large model launched by the Institute of Automation of the Chinese Academy of Sciences and the Wuhan Institute of Artificial Intelligence. It handles comprehensive tasks such as multi-turn question answering, text creation, image generation, 3D understanding, and signal analysis, with stronger cognition, understanding, and creativity. Its application scenarios span text creation, knowledge Q&A, image, text, and audio understanding, music generation, 3D understanding, and signal analysis. It aims to provide a high-quality AI interaction experience.
Video GPT is an AI-based video generation model that generates various types of videos from user input. It can flexibly and creatively produce authentic, realistic video content. Its strengths are powerful language understanding and video generation: users can quickly generate videos that meet their needs through simple text input. Video GPT is priced by usage, with flexible payment plans available.
PremAI is an autonomous, controllable AI infrastructure that provides complete AI solutions. It is highly flexible and scalable to meet a variety of AI needs. Its main functions include model training and deployment, data management and processing, and model evaluation and optimization. Its key advantage is an autonomous, controllable AI environment in which users retain full control of their own data and models. PremAI's pricing depends on user needs and usage; see the official website for details.
Pali3 is a vision-language model that encodes an image and passes it, together with the query, to an encoder-decoder Transformer that generates the answer. The model is trained in multiple stages: unimodal pre-training, multimodal training, resolution increase, and task specialization. Its main functions include image encoding, text encoding, and text generation, and it suits tasks such as image classification, image captioning, and visual question answering. Its advantages include a simple architecture, effective training, and fast speed. The product is free and open source.
OVO A.I. aims to humanize technology with useful solutions built on cutting-edge results in artificial intelligence research. We have adapted to technology for too long; it is time for technology to adapt to us humans. Make miracles possible!
Google Cloud AutoML automatically builds and deploys advanced custom machine learning models from structured data. Through a simple graphical interface, developers can train high-quality models without in-depth machine learning knowledge, then easily deploy and tune them. It covers many domains, including image classification, object detection, and text classification.
Amazon SageMaker is a fully managed machine learning service that helps developers and data scientists build, train, and deploy high-quality machine learning models quickly and cost-effectively. It provides a complete development environment, including a visual interface, Jupyter notebooks, automated machine learning, and model training and deployment. With SageMaker, users can build end-to-end machine learning solutions without managing any infrastructure.
Moda Community (ModelScope) is a developer community for artificial intelligence models. It brings together state-of-the-art machine learning models from many fields and provides one-stop services for model exploration, customization, training, deployment, and application. Users can easily search for models of interest and get started quickly. The community has also open-sourced many pre-trained models that developers can build on. The Moda community is committed to lowering the barrier to AI development and helping developers access and use AI capabilities more conveniently.
The Baichuan model is a Chinese-English bilingual model that integrates intent understanding, information retrieval, and reinforcement learning. Combining supervised fine-tuning with alignment to human intent, it performs strongly in knowledge question answering and text creation. Baichuan-7B and Baichuan-13B are two open-source Chinese large models that are free for commercial use. They rank among the best on many authoritative evaluation leaderboards and have been downloaded more than one million times. The product aims to provide high-quality language AI services that give users easy, universal access to world knowledge and professional services.
BOMML is an intelligent AI hosting platform that provides one-stop AI solutions for your business. We provide you with comprehensive assistance from data collection to model deployment. Our AI models run on a secure data center cloud, protecting your privacy and data security. BOMML supports a variety of tasks, including text generation, conversational chat, embedded control, analysis, optical character recognition, and more. Regardless of your technology stack, it’s easy to integrate AI into your applications via APIs. We offer the most competitive pricing on the market, so you only pay for what you use. If you have a specific task or need AI based on your data, we can tune and train it for you. You can add documents, files, and other metadata as a knowledge base to generate more relevant responses. If you need to run a proprietary AI model on your hardware, we can help with that too. Whatever your needs, our experts will find a solution for you.
Anthropic is an artificial intelligence platform that provides advanced AI solutions built on technologies such as deep learning and natural language processing. Its products can be applied in fields such as image recognition, natural language processing, and machine learning. Pricing is flexible and reasonable, and the platform aims to help users achieve their AI application goals. Whether you are a developer, researcher, or enterprise, Anthropic has what you need.
Baichuan-13B is an open-source, commercially usable large language model developed by Baichuan Intelligence, with 13 billion parameters trained on 1.4 trillion tokens. It is bilingual in Chinese and English and offers high-quality prediction and conversational capabilities. It supports quantized deployment and CPU inference, achieves excellent results on multiple benchmarks, and can be widely used for natural language processing tasks such as question-answering systems, dialogue systems, and text generation.
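A back-of-the-envelope calculation shows why quantized deployment matters for a 13-billion-parameter model. This sketch counts weight storage only (no activations, KV cache, or runtime overhead) and assumes every parameter is stored at the same bit width; the bit widths are illustrative, not a statement of which formats Baichuan-13B ships, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Bytes needed to hold the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9     # 13 billion parameters
N_TOKENS = 1.4e12   # 1.4 trillion training tokens

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
# 16-bit weights: 26.0 GB
# 8-bit weights: 13.0 GB
# 4-bit weights: 6.5 GB
print(f"training tokens per parameter: {N_TOKENS / N_PARAMS:.1f}")  # 107.7
```

Halving the bit width halves the weight footprint, which is what moves a model of this size from data-center GPUs toward single consumer cards or CPU memory.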
Pangu Big Model is an artificial intelligence solution launched by Huawei Cloud. It comprises several large models, including NLP, CV, multimodal, prediction, and scientific computing models, which provide dialogue and question answering, image recognition, multimodal processing, predictive analysis, and scientific computing. Pangu models are characterized by efficient adaptation, efficient annotation, and precise controllability, and can be applied across many industries. See the official website for details.
Openfabric AI is a distributed artificial intelligence platform that creates a new foundation for the construction and use of artificial intelligence applications through blockchain, advanced encryption and new infrastructure. It reduces the infrastructure requirements and technical knowledge required to leverage AI applications, promoting new market opportunities.
The iFlytek Spark cognitive large model is a new generation of large cognitive-intelligence model launched by iFlytek. It has cross-domain knowledge and language understanding capabilities and can understand and carry out tasks through natural dialogue. Its abilities include language understanding, knowledge question answering, logical reasoning, mathematical problem solving, and code understanding and writing. The product aims to provide users with comprehensive language understanding and task execution.
FABRIC is a tool for personalizing diffusion models through iterative feedback. It provides an easy way to improve model performance based on user feedback. Users can interact with the model in an iterative manner and adjust the model's predictions through feedback. FABRIC also provides rich functionality, including model training, parameter tuning, and performance evaluation. Its pricing is based on user usage and can meet the needs of different users.
Llama 2 is Meta's next generation of open-source large language models, free for research and commercial use. It is robust in functionality and performance, and its safety and performance are continuously improved through testing with external partners and internal teams. Llama 2 supports a wide range of use cases, making it well suited to solving hard problems and driving innovation.
The pseudo-flexible base model (ptx0/pseudo-flex-base) is a diffusion-based text-to-image generation model. It converts text descriptions into realistic images, generating images consistent with the given prompt with a high degree of flexibility and good output quality. The model performs stably on a reliable training foundation and can be widely used for image generation tasks.
The Stability AI generative models library is an open-source library that provides training, inference, and application functions for a variety of generative models. It supports training based on PyTorch Lightning and offers rich configuration options and a modular design. Users can train their own generative models or run inference with the provided models. The library also includes sample training configurations and data processing utilities to help users get started and customize quickly.
Intel AI and Deep Learning Solutions are a series of downloadable AI reference kits launched by Intel in partnership with Accenture to help enterprises accelerate their digital transformation journey. These kits are built on the AI application tools Intel provides to data scientists and developers, and each kit includes model code, training data, instructions for machine learning pipelines, libraries, and Intel oneAPI components.
AI Model is a popular subcategory under AI Other, with 36 quality AI tools.