AI Model Inference and Training

OpenAI Embedding Models

OpenAI Embedding Models covers two new embedding models that OpenAI released alongside updated GPT-4 Turbo preview models, an updated GPT-3.5 Turbo model, and an updated text moderation model. By default, data sent to the OpenAI API is not used to train or improve OpenAI models. The new, lower-priced embedding models are the smaller, more efficient text-embedding-3-small and the larger, more powerful text-embedding-3-large. An embedding is a sequence of numbers that represents a concept within content such as natural language or code. Embeddings make it easier for machine learning models and other algorithms to understand the relationships between pieces of content and to perform tasks such as clustering or retrieval. They power knowledge retrieval in ChatGPT and the Assistants API, as well as many retrieval-augmented generation (RAG) developer tools. text-embedding-3-small is a new, highly efficient embedding model that outperforms its predecessor, text-embedding-ada-002: the average score on the multilingual retrieval benchmark MIRACL rose from 31.4% to 44.0%, and the average score on the English-task benchmark MTEB rose from 61.0% to 62.3%. text-embedding-3-small is also 5x cheaper than text-embedding-ada-002, with the price dropping from $0.0001 to $0.00002 per 1,000 tokens. text-embedding-3-large is a next-generation, larger embedding model that creates embeddings with up to 3072 dimensions. Compared with text-embedding-ada-002, its average MIRACL score rose from 31.4% to 54.9% and its average MTEB score rose from 61.0% to 64.6%. text-embedding-3-large is priced at $0.00013 per 1,000 tokens. Both models also natively support shortening embeddings, which lets developers trade off performance against cost.
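As a rough illustration of the shortening and pricing points above, the sketch below truncates a vector to fewer dimensions, re-normalizes it, and compares vectors by cosine similarity. It does not call the OpenAI API. The helper names (`shorten_embedding`, `cosine_similarity`, `embedding_cost_usd`) are hypothetical and any input vectors are toy data, not real model output. In the API itself, shortening is requested through the `dimensions` parameter of the embeddings endpoint; manual truncation followed by L2 normalization is shown here only as an assumed equivalent for embeddings you already hold.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def shorten_embedding(vec, dims):
    """Keep the first `dims` components, then re-normalize (hypothetical helper)."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    return l2_normalize(vec[:dims])


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_cost_usd(num_tokens, price_per_1k_tokens):
    """Cost of embedding `num_tokens` tokens at a given price per 1,000 tokens."""
    return num_tokens / 1000 * price_per_1k_tokens
```

With the prices quoted above, `embedding_cost_usd(1_000_000, 0.00002)` estimates the cost of embedding one million tokens with text-embedding-3-small, and passing `0.0001` instead gives the text-embedding-ada-002 figure for comparison. Shorter vectors cost less to store and search, at some loss of retrieval quality.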

Artificial intelligence, natural language processing, embedding models

Artificial intelligence

#2

ReFT

ReFT (Reinforced Fine-Tuning) is a simple and effective way to enhance the reasoning capabilities of large language models (LLMs). It first warms up the model with supervised fine-tuning (SFT), then fine-tunes it further with online reinforcement learning, specifically the PPO algorithm in the paper. ReFT significantly outperforms SFT by automatically sampling a large number of reasoning paths for each question and deriving rewards naturally from the ground-truth answers. Its performance may improve further when combined with inference-time strategies such as majority voting and re-ranking. Notably, ReFT achieves its gains by learning from the same training questions as SFT, without relying on extra or augmented training questions, which indicates stronger generalization ability.
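Two of the ideas above, rewards derived from ground-truth answers and majority voting at inference time, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the binary exact-match reward and the function names `answer_reward` and `majority_vote` are hypothetical, and the sampled answers would come from the model's reasoning paths, which are not modeled here.

```python
from collections import Counter


def answer_reward(predicted, ground_truth):
    """Assumed binary reward: 1.0 if the final answer matches the ground truth."""
    return 1.0 if predicted.strip() == ground_truth.strip() else 0.0


def majority_vote(answers):
    """Return the most common final answer among several sampled reasoning paths."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answer.strip() for answer in answers)
    return counts.most_common(1)[0][0]
```

During training, each sampled reasoning path would be scored with something like `answer_reward` and the scores passed to the reinforcement learning step. At inference time, `majority_vote` picks the answer that most sampled paths agree on.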

Artificial intelligence, reasoning, reinforcement learning +1

Artificial intelligence

#3

PowerInfer

PowerInfer is an engine for high-speed large language model inference on PCs with consumer-grade GPUs. It exploits the high locality inherent in LLM inference by preloading hot-activated neurons onto the GPU, which significantly reduces GPU memory requirements and CPU-GPU data transfers. PowerInfer also integrates adaptive predictors and neuron-aware sparse operators to optimize the efficiency of neuron activation and computational sparsity. On a single NVIDIA RTX 4090 GPU it achieves an average generation rate of 13.20 tokens per second, only 18% lower than a top-tier server-grade A100 GPU, while maintaining model accuracy.
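The hot-neuron idea can be illustrated with a simple greedy split. The sketch below is a simplification under stated assumptions, not PowerInfer's actual placement policy: it assumes per-neuron activation counts from an offline profiling run, a GPU budget expressed as a number of neurons, and that the remaining neurons are left to the CPU. The names `partition_neurons` and `gpu_hit_rate` are hypothetical.

```python
def partition_neurons(activation_counts, gpu_budget):
    """Split neuron indices into (hot, cold) by activation frequency.

    The `gpu_budget` most frequently activated neurons are treated as hot and
    assigned to the GPU; the rest are treated as cold (assumed to stay on CPU).
    """
    if gpu_budget < 0:
        raise ValueError("gpu_budget must be non-negative")
    ranked = sorted(
        range(len(activation_counts)),
        key=lambda i: activation_counts[i],
        reverse=True,
    )
    hot = sorted(ranked[:gpu_budget])
    cold = sorted(ranked[gpu_budget:])
    return hot, cold


def gpu_hit_rate(activation_counts, hot):
    """Fraction of all profiled activations that fall on GPU-resident neurons."""
    total = sum(activation_counts)
    if total == 0:
        return 0.0
    return sum(activation_counts[i] for i in hot) / total
```

When activations are highly skewed, a small hot set covers most of them, which is why preloading only those neurons can cut GPU memory use without sending most of the work to the CPU.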

Language model, inference engine, consumer-grade GPU

Artificial intelligence

Related Subcategories

AI model

chatbot

Development and Tools

writing assistant

customer service

Development platform

Model training and deployment

AI search
