Surya is a project for accurate line-by-line text detection and recognition (OCR) in any language.
Surya is a multilingual document OCR toolkit with accurate line-by-line text detection. It works on a wide range of documents and languages (see the project's Usage and Benchmarking sections for details). It is named after Surya, the Hindu sun god, who is said to have universal vision. Surya is implemented in Python 3.9+ and PyTorch and is built for efficient OCR across multiple languages.
Surya is ideal for developers and researchers who need document OCR and multilingual text processing.
Developers use Surya to OCR multilingual documents.
Researchers use Surya to run text detection and recognition experiments.
Language technology companies adopt Surya to improve the efficiency and accuracy of their document processing.
Discover more similar high-quality AI tools
EAGLE is a family of vision-centric, high-resolution multimodal large language models (LLMs) that improves the perception of multimodal LLMs by mixing vision encoders and input resolutions. The models use a channel-concatenation-based 'CLIP+X' fusion that accommodates vision experts with different architectures (ViT/ConvNets) and knowledge (detection/segmentation/OCR/SSL). The EAGLE family supports input resolutions above 1K and achieves strong results on multimodal LLM benchmarks, especially on resolution-sensitive tasks such as optical character recognition and document understanding.
labelU-Kit is an open-source front-end annotation component library that provides labeling for images, videos, and audio, and supports multiple annotation types, including 2D boxes, points, lines, polygons, and 3D boxes. It is distributed as an NPM package, so developers can easily integrate it into their own annotation platforms to make data annotation more efficient and flexible.
OnnxOCR is a lightweight OCR model rebuilt from PaddleOCR. It removes the dependency on the PaddlePaddle deep learning framework and delivers fast inference. It supports inference in over 80 languages, and after conversion to ONNX, inference is 5 times faster than with the PaddlePaddle framework. Because it does not depend on a training framework, OnnxOCR can be deployed directly, including on ARM and x86 machines. It suits scenarios where computing power is limited but accuracy must be maintained.
JavaVision is an all-in-one visual recognition project built in Java. It implements core features such as PaddleOCR-V4, YOLOv8 object detection, face recognition, and image search, and can be extended to other areas such as speech recognition, animal recognition, and security inspection. It is built on the Spring Boot framework and emphasizes versatility, high performance, reliability and stability, easy integration, and flexible scalability. JavaVision aims to give Java developers a comprehensive visual recognition solution so they can build advanced, reliable, easy-to-integrate AI applications in a language they already know.
PetThoughts is an image recognition app built on the Gemini API. Users upload photos of their pets, and the app analyzes the pet's facial expression and surroundings to guess what it might be thinking. It identifies the pet's facial expressions, estimates its likely emotional state, and infers its activity from the environment, then uses natural language processing to turn the results into a readable text description. The app has a simple, intuitive interface for uploading photos and viewing results, and helps users better understand their pets' emotions and preferences.
MakeML is a development tool for building object detection neural networks without writing any code. Through a simple graphical interface, users upload training images, draw bounding boxes, and set parameters to train an efficient object detection model, which can then be exported to Core ML format for use in iOS apps. MakeML lowers the barrier to neural network development: no machine learning or programming knowledge is required.
Cognitora is a next-generation cloud platform designed for AI agents. Unlike traditional container platforms, it uses high-performance microVM technology such as Cloud Hypervisor and Firecracker to provide a secure, lightweight, and fast AI-native computing environment. It can execute AI-generated code, automate intelligent workloads at scale, and bridge the gap between AI inference and real-world execution, giving AI agents the compute and runtime support to run more efficiently and safely. Key benefits include high performance, secure isolation, very fast boot times, multi-language support, and advanced SDKs and tools. The platform targets AI developers and enterprises. New users receive 5,000 free points for testing upon registration.
Macroscope is a productivity tool for R&D teams. It has raised US$30 million in Series A funding and is now publicly available. Its core functions focus on code management and R&D process optimization: it analyzes the codebase to build a knowledge graph and integrates with a multi-tool ecosystem, relieving engineers of non-development work and helping managers keep track of R&D progress. To keep code review accurate, it combines multiple models (such as OpenAI o4-mini-high and Anthropic Claude Opus 4). Customer data is isolated and encrypted, the platform is SOC 2 Type II compliant, and Macroscope commits not to use customer code to train models. Pricing comes in two plans: Teams ($30 per developer per month, minimum 5 seats) and Enterprise (custom pricing), aimed respectively at small and medium-sized R&D teams and at large enterprises with customization needs.
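As a rough illustration of the Teams pricing, the sketch below computes a monthly bill. It assumes billing is a flat $30 per developer seat with a five-seat floor, so a team smaller than five still pays for five seats. The function name `teams_monthly_cost` is hypothetical, and actual invoicing terms may differ.

```python
# Hypothetical helper: assumes the Teams plan bills a flat $30 per developer
# seat per month, with at least 5 seats billed (per the listing above).
PRICE_PER_SEAT_USD = 30
MIN_SEATS = 5


def teams_monthly_cost(developers: int) -> int:
    """Return the assumed monthly Teams cost in US dollars."""
    if developers < 1:
        raise ValueError("developers must be at least 1")
    # Teams smaller than the minimum are still billed for MIN_SEATS seats.
    billed_seats = max(developers, MIN_SEATS)
    return billed_seats * PRICE_PER_SEAT_USD


print(teams_monthly_cost(3))   # 150 (billed as 5 seats)
print(teams_monthly_cost(12))  # 360
```

Under this assumption, the smallest possible Teams bill is 5 × $30 = $150 per month.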
100 Vibe Coding is an educational programming website focused on quickly building small web projects with AI. It skips complicated theory and focuses on practical results, making it suitable for beginners who want to create real projects quickly.
iFlow CLI is an interactive command-line tool designed to simplify how developers work in the terminal and to improve their efficiency. It supports a variety of commands and features for quickly running commands and management tasks. Its key benefits are ease of use, flexibility, and customizability, making it suitable for a wide range of development environments and project needs.
Claude Code Checkpoint is a companion app for developers who use Claude AI. It tracks all code changes so that your work is kept safe and never lost.
Streamdown is a drop-in replacement for React Markdown designed for AI-powered streaming. It addresses the problems that arise when Markdown is rendered while it is still being streamed, ensuring the output is safe and correctly formatted. Key advantages include streaming-oriented rendering, built-in security, and support for GitHub Flavored Markdown.
Qoder is an agentic coding platform that combines an enhanced context engine with intelligent agents to build a comprehensive understanding of your codebase and handle software development tasks systematically. It supports leading AI models, including Claude, GPT, and Gemini, and is available for Windows and macOS.
Compozy is an enterprise-grade platform that uses declarative YAML to provide scalable, reliable and cost-effective distributed workflows, simplifying complex fan-out, debugging and monitoring for production-ready automation.
This futuristic IDE integrates with CLI AI tools such as Claude Code and Gemini CLI. Its main advantages are multi-session orchestration and atomic branching, which aim to greatly improve developer productivity. It is designed for developers who want to ship quickly.