💻 programming

gmft

Lightweight, high-performance deep-learning tool for extracting tables from PDFs

#machine learning
#data conversion
#PDF processing
#Table extraction

Product Details

gmft is a toolkit for converting tables in PDFs to a variety of formats. It is lightweight, modular, and performant. gmft relies on Microsoft's Table Transformer (TATR), which performs best and most reliably among the many alternatives. gmft runs without a GPU, has high throughput, and installs with a single command. It uses PyPDFium2, chosen for its high throughput and permissive license. The TATR model that gmft uses was trained on PubTables-1M, a large and diverse dataset, which contributes to its reliability.

Main Features

1. Converts PDF tables to Pandas DataFrames and other formats
2. Outputs the text of each table together with its position list
3. Outputs cropped images of the detected tables
4. Supports table title extraction
5. Extracts tables quickly without running OCR; also works with images and scanned PDFs
6. High-throughput PDF processing with PyPDFium2
7. Highly configurable; supports custom models and extraction methods

How to Use

1. Install gmft: run `pip install gmft` on the command line.
2. Import the necessary modules: import `CroppedTable`, `TableDetector`, `AutoTableFormatter`, etc. in your Python script.
3. Create a PyPDFium2Document object: pass it the path of the PDF file that contains the tables to extract.
4. Detect tables with TableDetector: iterate over each page of the document and use the detector to extract the tables on it.
5. Format tables with AutoTableFormatter: format each detected table.
6. Convert the extracted tabular data to the required format, e.g. a Pandas DataFrame or another supported format.
7. Close the document object: once extraction is complete, call the document's close method to release its resources.
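
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's reference code. The helper names `extract_dataframes` and `extract_from_pdf` are hypothetical. The import paths (`gmft.auto`, `gmft.pdf_bindings`) and the `extract(...)` / `df()` calls are assumptions based on gmft's published quickstart and have changed between releases, so check them against the version you install.

```python
from typing import Any, Iterable, List


def extract_dataframes(pages: Iterable[Any], detector: Any, formatter: Any) -> List[Any]:
    """Detect, format, and convert every table found on the given pages.

    Hypothetical helper: it only assumes that `detector.extract(page)` yields
    table objects, that `formatter.extract(table)` returns a formatted table,
    and that the formatted table exposes `df()`.
    """
    results = []
    for page in pages:
        # Step 4: detect the tables on this page.
        for table in detector.extract(page):
            # Step 5: format the detected table.
            formatted = formatter.extract(table)
            # Step 6: convert to a DataFrame while the document is still open.
            results.append(formatted.df())
    return results


def extract_from_pdf(pdf_path: str) -> List[Any]:
    """Run the whole pipeline on one PDF file (requires `pip install gmft`)."""
    # Assumed import paths; older gmft releases exported these names elsewhere.
    from gmft.auto import AutoTableFormatter, TableDetector
    from gmft.pdf_bindings import PyPDFium2Document

    doc = PyPDFium2Document(pdf_path)  # Step 3: open the document.
    try:
        return extract_dataframes(doc, TableDetector(), AutoTableFormatter())
    finally:
        doc.close()  # Step 7: always release the document's resources.
```

Calling `extract_from_pdf("report.pdf")` (the file name is a placeholder) would return one DataFrame per detected table. Keeping the loop separate from object construction means the detector and formatter can be created once and reused across many PDFs, which matters when processing files in bulk.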

Target Users

gmft is aimed at data analysts, researchers, and anyone else who needs to extract tabular data from PDF documents. Because it is lightweight and fast, gmft is particularly well suited to processing large numbers of PDF files and converting their data quickly.

Examples

Data analysts use gmft to extract data from research reports for further analysis

Researchers use gmft to extract experimental data from academic papers

Business users use gmft to automate the extraction of tabular data from contract documents

