MLE-bench is a benchmark launched by OpenAI to measure how well AI agents perform at machine learning engineering. It brings together 75 ML engineering competitions from Kaggle, forming a diverse set of challenging tasks that test real-world skills such as training models, preparing datasets, and running experiments. Human baselines for each competition are established from Kaggle's public leaderboards. The authors evaluated several frontier language models on the benchmark using open-source agent scaffolds. The best-performing setup, OpenAI's o1-preview paired with the AIDE scaffold, reached at least the level of a Kaggle bronze medal in 16.9% of competitions. The work also studies various forms of resource scaling for AI agents and the impact of contamination from pre-training. The MLE-bench code has been open-sourced to support future research on the machine learning engineering capabilities of AI agents.
The Prompt Report is a systematic survey of prompting techniques for generative artificial intelligence (GenAI). Using a combination of human and machine review, it processed 4,797 records from multiple databases and extracted 1,565 relevant papers. The report presents 58 text-based prompting techniques, complemented by an extensive collection of multimodal and multilingual techniques. Its goal is to provide a catalog of prompting techniques that is easy to understand and implement. It also reviews agents as extensions of prompting, including methods for evaluating outputs and for designing prompts that support safety and security. Finally, the report applies prompting techniques in practice through two case studies.