programming Category

AI data mining

Found 23 AI tools

Primary Category: programming

Subcategory: AI data mining

Related AI Tools

Prisma Optimize

Prisma Optimize is a tool that uses artificial intelligence technology to analyze and optimize database queries. It accelerates applications by providing in-depth insights and actionable recommendations to make database queries more efficient. Prisma Optimize supports a variety of databases, including PostgreSQL, MySQL, SQLite, SQL Server, CockroachDB, PlanetScale, and Supabase, and can be seamlessly integrated into existing technology stacks without the need for large-scale modifications or migrations. The main advantages of the product include improving database performance, reducing query latency, optimizing query patterns, etc. This is a powerful tool for developers and database administrators to help them manage and optimize databases more effectively.

Team collaboration, AI analysis, Performance improvement +2

Tabled

Tabled is a Python library for detecting and extracting tables. It uses surya to locate tables in PDFs, identifies their rows and columns, and formats the cells as Markdown, CSV, or HTML. The tool is useful for data scientists and researchers who often need to extract tabular data from PDF documents for further analysis. Key advantages of Tabled include accurate table detection and extraction, support for multiple output formats, and an easy-to-use command-line interface. It also provides an interactive app that lets users try Tabled on images or PDF files.

Automation, Machine learning, Data extraction +2
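
As a rough illustration of the final formatting step described for Tabled, the sketch below renders an already-extracted grid of cell strings as Markdown or CSV. The `to_markdown` and `to_csv` helpers are hypothetical names chosen for this example and are not Tabled's API; table detection itself is out of scope here.

```python
import csv
import io


def to_markdown(rows):
    """Render a list of rows (first row is the header) as a Markdown table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def to_csv(rows):
    """Render the same rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical cell grid, as a table extractor might return it.
cells = [["Tool", "Format"], ["Tabled", "Markdown"], ["Tabled", "CSV"]]
print(to_markdown(cells))
print(to_csv(cells))
```

Keeping the cell grid separate from the renderers means one extraction pass can feed several output formats.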

Knowledge Table

Knowledge Table is an open source toolkit designed to simplify the process of extracting and exploring structured data from unstructured documents. It enables users to create structured knowledge representations such as tables and charts through a natural language query interface. The toolkit features customizable extraction rules, fine-tuned formatting options, and data provenance displayed through the UI to accommodate a variety of use cases. Its goal is to provide business users with a familiar spreadsheet interface, while providing developers with a flexible and highly configurable backend, ensuring seamless integration with existing RAG workflows.

Natural language processing, Open source, Data extraction +2

VARAG

VARAG is a system that supports multiple retrieval technologies, optimized for different use cases of text, image and multi-modal document retrieval. It simplifies the traditional retrieval process by embedding document pages as images and uses advanced visual language models for encoding, improving retrieval accuracy and efficiency. The main advantage of VARAG is its ability to handle complex visual and textual content, providing powerful support for document retrieval.

Multimodal, Document processing, OCR +2

GraphReasoning

GraphReasoning is a project that uses generative artificial intelligence to transform 1,000 scientific papers into a knowledge graph. Structural analysis of the graph (calculating node degrees, identifying communities and connectivity, and evaluating the clustering coefficients and betweenness centrality of key nodes) reveals its knowledge architecture. The graph is scale-free and highly interconnected, and it can be used for graph reasoning: transitive and isomorphic properties help reveal interdisciplinary relationships, which can be used to answer questions, identify knowledge gaps, propose new materials designs, and predict material behavior.

Artificial intelligence, Knowledge graph, Data mining +1
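
To make the graph metrics named in the GraphReasoning description more concrete, here is a minimal pure-Python sketch that computes node degree, the local clustering coefficient, and connected components on a toy undirected graph. The graph contents and function names are assumptions made for illustration; this is not the project's code, and betweenness centrality is left out for brevity.

```python
from itertools import combinations

# Hypothetical toy graph: concept -> set of neighbouring concepts (undirected).
graph = {
    "silk": {"protein", "fiber"},
    "protein": {"silk", "fiber", "collagen"},
    "fiber": {"silk", "protein"},
    "collagen": {"protein"},
    "graphene": set(),
}


def degree(g, node):
    """Number of edges attached to a node."""
    return len(g[node])


def clustering(g, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = g[node]
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for a, b in pairs if b in g[a])
    return linked / len(pairs)


def components(g):
    """Connected components found with an iterative depth-first search."""
    seen, result = set(), []
    for start in g:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(g[node] - group)
        seen |= group
        result.append(group)
    return result


for name in graph:
    print(name, degree(graph, name), round(clustering(graph, name), 2))
print(len(components(graph)), "connected components")
```

On a graph built from real papers, the same measures would be computed over thousands of nodes, where a dedicated graph library is the more practical choice.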

AgentRE

AgentRE is an agent-based framework designed for relation extraction in complex information environments. It processes and analyzes large-scale datasets by simulating the behavior of intelligent agents to identify and extract relationships between entities. The approach is relevant to natural language processing and information retrieval, especially in scenarios where large amounts of unstructured data need to be processed. The main advantages of AgentRE include scalability, flexibility, and the ability to handle complex data structures. The framework is open source, so researchers and developers can freely use and modify it to suit different application needs.

Natural language processing, Information retrieval, Intelligent agents +2

magic-html

magic-html is a Python library designed to simplify extracting the main body content from HTML. Whether dealing with complex HTML structures or simple web pages, it aims to provide a convenient and efficient interface. It supports multi-modal extraction and multiple layout extractors, including articles, forums, and WeChat articles, and it also supports LaTeX formula extraction and conversion.

Python library, Data extraction, HTML extraction

TAG-Bench

TAG-Bench is a benchmark used to evaluate and study the performance of natural language processing models in answering database queries. It builds on the BIRD Text2SQL benchmark and increases query complexity by adding requirements for world knowledge or semantic reasoning beyond explicit information in the database. TAG-Bench aims to promote the integration of AI and database technology and provide researchers with a platform to challenge existing models by simulating real database query scenarios.

Natural language processing, Benchmarking, Database queries +2

CyberScraper 2077

CyberScraper 2077 is an AI-based web crawler tool that uses large language models (LLMs), accessed through providers such as OpenAI and Ollama, to intelligently parse web content and extract data. The tool has a user-friendly graphical interface and supports multiple data export formats, including JSON, CSV, HTML, SQL, and Excel. It also features a stealth mode to reduce the risk of being detected as a bot, as well as ethical crawling features that respect robots.txt and website policies.

llm, openai, Web crawler +1

Triplex

Triplex is an open source model that converts large amounts of unstructured data into structured data. Its performance in building knowledge graphs is reported to exceed that of GPT-4o at one-tenth of the cost. It reduces the cost of generating knowledge graphs by efficiently converting unstructured text into semantic triples, the basis for knowledge graph construction.

Open source, Knowledge graph, Data conversion +2
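
For readers unfamiliar with semantic triples, the sketch below shows how (subject, predicate, object) triples can be indexed into a simple knowledge graph. The triples are written by hand for illustration and `build_graph` is a hypothetical helper; this does not call or reproduce the Triplex model.

```python
from collections import defaultdict

# Hand-written example triples: (subject, predicate, object).
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Warsaw", "capital_of", "Poland"),
]


def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
print(graph["Marie Curie"])
```

Each triple becomes one labeled edge, so the cost of building the graph is dominated by producing the triples, which is the step a model such as Triplex targets.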

Datalore

Datalore is an AI-driven data analysis tool that integrates Anthropic's Claude API and multiple data analysis libraries. It provides an interactive interface that enables users to perform data analysis tasks using natural language commands.

AI, Natural language processing, Data analysis +1

Korvus

Korvus is a search SDK built on Postgres that unifies the entire RAG (Retrieval-Augmented Generation) pipeline into a single database query. It provides high-performance, customizable search capabilities while minimizing infrastructure requirements. Korvus uses PostgresML's pgml extension and the pgvector extension to run the RAG pipeline inside Postgres. It offers SDKs in multiple languages, including Python, JavaScript, Rust, and C, so developers can integrate it into existing technology stacks.

AI, Natural language processing, Machine learning +4

Crawlee

Crawlee is a Python web crawling and browser automation library for building reliable crawlers and extracting data for use in AI, LLMs, RAG, or GPTs. It provides a unified interface for HTTP and headless browser crawling, and supports automatic parallel crawling that scales with available system resources. The library includes type hints to improve the development experience and reduce errors. It features automatic retries, integrated proxy rotation and session management, configurable request routing, persistent URL queues, pluggable storage options, and more. Compared with Scrapy, Crawlee provides native support for headless browser crawling, has a simple interface, and is built entirely on standard asynchronous IO.

python automation web-crawler +4

LAMDA-TALENT

LAMDA-TALENT is a comprehensive tabular data analysis toolbox and benchmarking platform that integrates more than 20 deep learning methods, more than 10 traditional methods, and more than 300 diverse tabular data sets. Designed to improve model performance on tabular data, the toolbox provides powerful preprocessing capabilities, optimizes data learning, and supports user-friendly and adaptable operations for both novice and expert data scientists.

Machine learning, Deep learning, Data science +2

APIGen

APIGen is an automated data generation pipeline designed to generate verifiable, high-quality datasets for function-calling applications. The pipeline ensures data reliability and correctness through a three-stage verification process: format checking, actual function execution, and semantic verification. APIGen can generate diverse datasets at scale in a structured manner, and it verifies the correctness of the generated function calls by actually executing the APIs, which is important for improving the performance of function-calling agent models.

Automation, Natural language processing, Machine learning +2
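
The three verification stages described for APIGen (format, execution, semantics) can be sketched as a small filter over candidate function calls. Everything below is an assumption made for illustration: the `REGISTRY`, the JSON call shape with `name` and `arguments`, and especially the semantic stage, which here is a plain equality check standing in for whatever checker the real pipeline uses.

```python
import json


def add(a, b):
    """Hypothetical API used only for this example."""
    return a + b


REGISTRY = {"add": add}


def check_format(raw):
    """Stage 1: valid JSON object with a known name and dict arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, arguments = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in REGISTRY:
        return None
    if not isinstance(arguments, dict):
        return None
    return call


def check_execution(call):
    """Stage 2: actually run the function; any exception rejects the sample."""
    try:
        return True, REGISTRY[call["name"]](**call["arguments"])
    except Exception:
        return False, None


def check_semantics(result, expected):
    """Stage 3 stand-in: compare with an expected value supplied with the sample."""
    return result == expected


def verify(raw, expected):
    """Run the three stages in order and report where a sample fails."""
    call = check_format(raw)
    if call is None:
        return "rejected: format"
    ok, result = check_execution(call)
    if not ok:
        return "rejected: execution"
    if not check_semantics(result, expected):
        return "rejected: semantics"
    return "accepted"


print(verify('{"name": "add", "arguments": {"a": 1, "b": 2}}', 3))
```

Ordering the stages from cheapest to most expensive means malformed samples are discarded before any function is executed.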

DB-GPT

DB-GPT is an open source, AI-native data application development framework that uses AWEL (Agentic Workflow Expression Language) and agent technology to simplify the integration of large model applications and data. It enables enterprises and developers to build customized applications with less code through capabilities such as multi-model management, Text2SQL performance optimization, RAG framework optimization, and multi-agent collaboration. In the Data 3.0 era, DB-GPT provides basic data intelligence technology for building enterprise-level report analysis and business insights based on models and databases.

Security, Database, llm +8

Yayi Information Extraction Large Model

The Yayi Information Extraction Large Model (YAYI-UIE) was developed by the Zhongke Wenge algorithm team. It is instruction-tuned on millions of manually constructed, high-quality information extraction samples. It is trained jointly on information extraction tasks, including Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE), covering structured extraction in multiple scenarios such as general, security, finance, biology, medical, and business. Open-sourcing the model aims to promote the Chinese pre-trained large model open source community and to build the Yayi large model ecosystem together with that community.

Natural language processing, Open source, Pre-trained model +1

Crawl4AI

Crawl4AI is a free, open source web crawling service designed to extract useful information from web pages and make it available to large language models (LLMs) and AI applications. It supports efficient web crawling, provides LLM-friendly output formats such as JSON, cleaned HTML, and Markdown, and can crawl multiple URLs at the same time.

Data extraction, AI integration, Crawler +1

DataBonsai

databonsai is a Python library that utilizes large language models (LLMs) to perform data cleaning tasks. It provides a range of tools including data classification, transformation and extraction, as well as validation of LLM output, supports batch processing to save tokens, and has retry logic to handle rate limiting and transient errors.

Automation, Machine learning, Large language models +1
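
The retry behaviour mentioned for databonsai (handling rate limiting and transient errors) is commonly implemented as exponential backoff. The sketch below shows that general pattern under stated assumptions: `TransientError` and `call_with_retry` are hypothetical names, and this is not databonsai's API or necessarily how the library implements retries.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a rate limit or timeout."""


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    The sleep function is injectable so the waits can be skipped in tests.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller would wrap each LLM request, for example `call_with_retry(lambda: client_call(batch))`, where `client_call` is whatever function issues the request.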

FineWeb

The FineWeb dataset contains more than 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. The dataset is designed for large language model pre-training and aims to promote the development of open source models. It is carefully processed and filtered for quality and is suitable for a variety of natural language processing tasks.

Natural language processing, Open source, Dataset +1

MyGO

MyGO is a tool for multi-modal knowledge graph completion that improves completion accuracy by processing discrete modality information as fine-grained tokens. MyGO uses the transformers library to embed text tokens and is then trained and evaluated on multi-modal datasets. It supports custom datasets and provides training scripts to reproduce experimental results.

Machine learning, Data science, Knowledge graph +1

Twitter-Insight-LLM

Twitter-Insight-LLM is a large language model based on Twitter data, designed to provide in-depth analysis of text data on the Twitter platform. It may be used for sentiment analysis, trend prediction, user behavior research, and similar tasks.

Natural language processing, Machine learning, Development and programming +2

MATLAB

MATLAB is a software environment for engineering and scientific computing. It provides a rich set of tools and features that make data analysis, visualization, and model development easier for engineers and scientists. MATLAB has a simple and easy-to-learn syntax that can quickly implement complex computing tasks. It also provides a large number of toolboxes covering applications in various fields, including signal processing, image processing, control system design, machine learning, etc. MATLAB is a powerful tool suitable for applications in a variety of engineering and scientific fields.

Machine learning, Image processing, Data analysis +6

Related Subcategories

Explore other subcategories under programming

Development and Tools: 768 tools

AI model: 465 tools

code assistant: 368 tools

AI development assistant: 294 tools

Model training and deployment: 140 tools

AI code assistant: 85 tools

Development platform: 66 tools

research tools: 61 tools
