Found 34 AI tools
ComfyUI-PyramidFlowWrapper is a set of wrapper nodes for the Pyramid-Flow model, aiming to provide a more efficient interface and a more convenient workflow through ComfyUI. The model uses deep learning to generate and process visual content and can handle large amounts of data efficiently. It is an open source project initiated and maintained by the developer kijai. Its functionality is not yet complete, but it is already useful in practice. As an open source project it is free, and it is mainly targeted at developers and technology enthusiasts.
ComfyUI LLM Party aims to develop a complete set of LLM workflow nodes based on the ComfyUI front end, allowing users to quickly build their own LLM workflows and easily integrate them into existing image workflows.
x-flux-comfyui is an AI model tool integrated into ComfyUI. It provides a variety of functions, including model training, model loading, and image processing. The tool supports a low-memory mode that optimizes VRAM usage, making it suitable for users who need to run AI models in resource-constrained environments. In addition, it provides an IP-Adapter function that can be used with OpenAI's ViT CLIP model to enhance the diversity and quality of generated images.
ComfyUI-GGUF is a project that provides GGUF quantization support for ComfyUI native models. It allows model files to be stored in the GGUF format, a format popularized by llama.cpp. Although regular UNET models (conv2d) are not well suited to quantization, transformer/DiT models such as flux appear to be less affected by it, allowing them to run at lower bits per weight on low-end GPUs.
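The effect of low-bit weight storage can be sketched with a toy symmetric quantizer (illustrative only; real GGUF quantization types such as Q4_K use per-block scales and more elaborate packing):

```python
import numpy as np

def quantize_symmetric(weights, bits=4):
    """Quantize a float weight tensor to signed integers with one shared scale.

    Toy version: real GGUF formats quantize per block, not per tensor.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed values
    scale = np.abs(weights).max() / qmax  # map the largest weight to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.07], dtype=np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)
# Rounding bounds the reconstruction error by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

The integer tensor plus a scale is what gets stored, which is where the memory savings on low-end GPUs come from.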
x-flux is a set of deep learning model training scripts released by the XLabs AI team, including LoRA and ControlNet models. The models are trained using DeepSpeed, support 512x512 and 1024x1024 image sizes, and come with corresponding training configuration files and examples. The x-flux training scripts aim to improve the quality and efficiency of image generation, which matters for the field of AI image generation.
Alpha-VLLM provides a series of models that generate multi-modal content from text, including images and audio. These models are based on deep learning and can be widely used in content creation, data augmentation, automated design, and other fields.
ComfyUI-Sub-Nodes is an open source project on GitHub that provides subgraph node functionality for ComfyUI. It allows users to create and use subgraphs in ComfyUI to improve workflow organization and reusability. This plug-in is especially suitable for developers who need to manage complex workflows in the UI.
MG-LLaVA is a multimodal large language model (MLLM) that enhances visual processing by integrating multi-granularity visual inputs, including low-resolution, high-resolution, and object-centric features. It introduces an additional high-resolution visual encoder to capture fine details, which are fused with the base visual features through a Conv-Gate fusion network. In addition, object-level features obtained from bounding boxes identified by offline detectors are integrated to further refine the model's object recognition. MG-LLaVA is trained exclusively on publicly available multi-modal data with instruction tuning and demonstrates superior perception skills.
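The gating idea behind such a fusion network can be sketched as a learned convex blend of low- and high-resolution features (a simplification with a linear gate standing in for the 1x1 convolution; the actual Conv-Gate module is more elaborate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(low_feat, high_feat, w, b):
    """Fuse low- and high-resolution features with a learned gate in (0, 1).

    The gate is computed from the concatenated features, so the network
    decides per channel how much high-resolution detail to let through.
    """
    both = np.concatenate([low_feat, high_feat], axis=-1)  # (N, 2C)
    gate = sigmoid(both @ w + b)                           # (N, C)
    return gate * high_feat + (1.0 - gate) * low_feat

rng = np.random.default_rng(0)
C = 8
low = rng.normal(size=(4, C))
high = rng.normal(size=(4, C))
w = rng.normal(scale=0.1, size=(2 * C, C))   # stand-in gate parameters
b = np.zeros(C)
fused = gated_fusion(low, high, w, b)
# A convex blend always lies between the two inputs, element-wise.
assert np.all(fused >= np.minimum(low, high) - 1e-9)
assert np.all(fused <= np.maximum(low, high) + 1e-9)
```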
AsyncDiff is an asynchronous denoising acceleration scheme for parallelizing diffusion models. It splits the noise prediction model into multiple components and distributes them across different devices so they can run in parallel, significantly reducing inference latency with minimal impact on generation quality. AsyncDiff supports multiple diffusion models, including Stable Diffusion 2.1, Stable Diffusion 1.5, Stable Diffusion x4 Upscaler, Stable Diffusion XL 1.0, ControlNet, Stable Video Diffusion, and AnimateDiff.
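The asynchrony can be illustrated with a toy simulation (not the actual AsyncDiff code): each component consumes the output its predecessor produced in the previous denoising step, so within a step the components no longer have to wait for one another and could run on separate devices.

```python
import numpy as np

# Toy "denoiser" split into three sequential components.
def c1(x): return 0.9 * x
def c2(x): return x - 0.05
def c3(x): return 0.95 * x
components = [c1, c2, c3]

def sequential_step(x):
    # Baseline: components run one after another within a step.
    for c in components:
        x = c(x)
    return x

def async_step(x, cache):
    """Component k consumes what component k-1 produced in the *previous*
    step (taken from the cache), so all components can run concurrently."""
    inputs = [x] + cache[:-1]
    new_cache = [c(inputs[k]) for k, c in enumerate(components)]
    return new_cache[-1], new_cache

x = np.ones(4)
# Warm the cache with one ordinary sequential pass.
cache, h = [], x
for c in components:
    h = c(h)
    cache.append(h)
out, cache = async_step(x, cache)
# With an unchanged input the stale inputs are exact, so the async step
# reproduces the sequential result here.
assert np.allclose(out, sequential_step(x))
```

In real sampling the input changes a little each step, so the stale inputs introduce a small approximation error, which is why the method trades a minimal quality impact for latency.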
ComfyUI-Hallo is a ComfyUI plug-in customized for Hallo models. It requires ffmpeg on the command line and can download model weights from Hugging Face, or users can download them manually and place them in a specified directory. It gives developers an easy-to-use interface for integrating Hallo models, improving development efficiency and user experience.
ComfyUI-LuminaWrapper is an open source Python wrapper for simplifying the loading and use of Lumina models. It supports custom nodes and workflows, making it easier for developers to integrate Lumina models into their projects. This plug-in is mainly aimed at developers who want to use Lumina models for deep learning or machine learning in a Python environment.
EVE is an encoder-free vision-language model jointly developed by researchers from Dalian University of Technology, the Beijing Academy of Artificial Intelligence, and Peking University. It performs well across different image aspect ratios, surpassing Fuyu-8B and approaching modular encoder-based LVLMs. EVE stands out for its data and training efficiency: it uses 33M publicly available samples for pre-training, 665K LLaVA SFT samples to train the EVE-7B model, and an additional 1.2M SFT samples for the EVE-7B (HD) model. EVE's development follows an efficient, transparent, and practical strategy, opening a new path toward pure-decoder cross-modal architectures.
ComfyUI Ollama is a custom node for ComfyUI workflows that uses the ollama Python client, letting users easily integrate large language models (LLMs) into their workflows or simply run GPT-style experiments. Its main advantage is the ability to interact with an Ollama server: users can run image queries, query an LLM with a given prompt, and run LLM queries with finely tuned parameters while maintaining the context of a generation chain.
JavaVision is an all-round visual intelligent recognition project developed in Java. It implements core functions such as PaddleOCR-V4, YOLOv8 object recognition, face recognition, and image search, and can also be extended to other fields such as speech recognition, animal recognition, and security inspection. Project features include use of the SpringBoot framework, versatility, high performance, reliability and stability, easy integration, and flexible scalability. JavaVision aims to give Java developers a comprehensive visual recognition solution, letting them build advanced, reliable, and easy-to-integrate AI applications in a familiar programming language.
llava-llama-3-8b-v1_1 is an LLaVA model optimized by XTuner, based on meta-llama/Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336, and fine-tuned with ShareGPT4V-PT and InternVL-SFT. The model is designed for combined processing of images and text, has strong multi-modal learning capabilities, and is suitable for various downstream deployment and evaluation toolkits.
Mini-Gemini is a multi-modal visual language model supporting a series of dense and MoE large language models from 2B to 34B, with image understanding, reasoning, and generation capabilities. It is built on LLaVA and uses dual visual encoders to provide low-resolution visual embeddings and high-resolution candidate regions, applies patch information mining between high-resolution regions and low-resolution visual queries, and fuses text and images for understanding and generation tasks. It supports multiple visual understanding benchmarks, including COCO, GQA, OCR-VQA, and VisualGenome.
ComfyUI-Cloud is a custom node that allows users to take full control of ComfyUI locally while leveraging cloud GPU resources to run their workflows. It allows users to run workflows that require high VRAM without the need to import custom nodes/models to a cloud provider or spend money on new GPUs.
Champ is a generative model for generating 3D object shapes that combines implicit functions and convolutional neural networks to generate high-quality, diverse, and realistic 3D shapes. It can generate various categories of shapes, including animals, vehicles, and furniture.
ComfyUI-N-Sidebar is an open source project that combines the ComfyUI and N-Sidebar libraries to provide users with a comfortable and easy-to-use user interface and navigation bar. The project improves user experience by simplifying interface elements and optimizing interaction design.
ComfyUI-APISR is the API server part of the ComfyUI project, which provides necessary backend support for ComfyUI client applications. ComfyUI is a user interface framework designed to provide a comfortable user experience.
This is a custom sampler plug-in for ComfyUI that implements the sampling method based on trajectory consistency distillation (TCD) proposed by Zheng et al. The plug-in adds TCDScheduler and SamplerTCD nodes to ComfyUI's Custom Sampler category. Just clone it into the custom_nodes folder and restart ComfyUI to use it. TCDScheduler has a special parameter, eta, which controls the randomness of each step: eta=0 means deterministic sampling and eta=1 means fully random sampling. The default value is 0.3, but higher eta values are recommended when increasing the number of inference steps. Based on the trajectory consistency distillation sampling method, the plug-in can provide smoother and more consistent outputs for AI models.
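The role of eta can be illustrated with a toy sampler step (not the actual TCD update, which is more involved): eta scales the fresh noise re-injected at each step, interpolating between a deterministic update and a fully re-noised one.

```python
import numpy as np

def stochastic_step(x, eta, sigma, rng):
    """Sketch of eta-controlled stochasticity.

    eta=0 -> purely deterministic update; eta=1 -> noise with full
    standard deviation sigma is re-injected. The "denoise" here is a
    stand-in multiply, not a real diffusion model.
    """
    deterministic = 0.9 * x                      # stand-in denoising update
    noise = rng.standard_normal(x.shape)
    return deterministic + eta * sigma * noise   # eta scales the randomness

rng = np.random.default_rng(0)
x = np.ones(1000)
out0 = stochastic_step(x, eta=0.0, sigma=0.5, rng=rng)
out1 = stochastic_step(x, eta=1.0, sigma=0.5, rng=rng)
assert np.allclose(out0, 0.9)       # deterministic at eta = 0
assert np.std(out1 - 0.9) > 0.4     # noise with std ~0.5 at eta = 1
```

Intermediate values such as the default 0.3 inject a correspondingly smaller amount of noise per step.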
Tavus offers a range of AI models, particularly for generating highly realistic talking-head videos. Its Phoenix model uses Neural Radiance Fields (NeRFs) to produce natural facial movements and expressions synchronized with the input. Developers can access these highly realistic and customizable video generation services through Tavus' API.
ComfyUI-layerdiffusion is a GitHub project that provides a custom node implementation of the Layer Diffusion model. Users install it via its Python dependencies, and it currently only supports SDXL models. The goal of the project is to give ComfyUI users convenient integration of Layer Diffusion models.
OpenDiT is an open source project providing a high-performance implementation of the Diffusion Transformer (DiT) based on Colossal-AI, designed to improve training and inference efficiency for DiT applications, including text-to-video and text-to-image generation. It achieves up to 80% speedup and 50% memory reduction on GPU through kernel optimizations including FlashAttention, fused AdaLN, and fused LayerNorm; hybrid parallelism combining ZeRO, Gemini, and DDP, plus sharding of the EMA model to further cut memory costs; and FastSeq, a novel sequence-parallel method suited to workloads like DiT where activations are large but parameters are small, saving up to 48% of communication cost in single-node sequence parallelism and breaking the memory limit of a single GPU to reduce overall training and inference time. Large performance gains require only small code changes, and users do not need to know the implementation details of distributed training. The project offers a complete text-to-image and text-to-video generation pipeline that researchers and engineers can adapt to practical applications without modifying the parallel components, and it performs text-to-image training on ImageNet and publishes checkpoints.
gligen-gui is a plug-in that provides an intuitive graphical user interface for GLIGEN. It uses ComfyUI as the backend and aims to simplify the operation process of GLIGEN and improve the user experience.
The Imp project aims to provide a family of strong multimodal small language models (MSLMs). Its imp-v1-3b is a 3-billion-parameter MSLM built on the small but capable SLM Phi-2 (2.7B) and the strong visual encoder SigLIP (400M), trained on the LLaVA-v1.5 training set. Imp-v1-3b significantly outperforms counterparts of similar size on various multi-modal benchmarks, and even slightly outperforms the strong LLaVA-7B model.
PetThoughts is an image recognition application built on the Gemini API. Users can upload photos of their pets, and the app will intelligently analyze the pet's facial expressions and environment to guess what it may be thinking. The application has functions such as image recognition, facial analysis, and environmental analysis. It can accurately identify the pet's facial expressions, analyze its possible emotional state, and infer the pet's activities based on the environment. Finally, through natural language processing technology, the recognition results are converted into readable text descriptions. The app provides a simple and intuitive user interface, allowing users to easily upload photos and obtain pet analysis results. It helps users gain a deeper understanding of their pets' emotions and preferences.
SCEPTER is an open source code library dedicated to the training, tuning, and inference of generative models, covering downstream tasks such as image generation, transfer, and editing. It integrates mainstream community implementations with methods developed in-house at Alibaba Tongyi Lab, providing a comprehensive, general-purpose toolset for researchers and practitioners in generative modeling. This versatile library is designed to foster innovation and accelerate progress in this rapidly growing field.
Comfyspace is a ComfyUI Workspace Manager extension for organizing and managing all workflows. It allows users to seamlessly switch between different workflows within a single workspace, while supporting importing, exporting workflows and reusing sub-workflow modules. Features include version control, gallery and cover image settings, and easy workflow organization.
The code repository contains research on learning from synthetic image data, spanning three projects: StableRep, Scaling, and SynCLR. These projects study how to train visual representation models on synthetic images generated by text-to-image models, achieving strong results.
LangSplat constructs a 3D language field by mapping CLIP language embeddings to a set of 3D Gaussian distributions, enabling open vocabulary queries for 3D scenes. It avoids the expensive rendering process in NeRF and greatly improves efficiency. The learned language features accurately capture object boundaries and provide an accurate 3D language field without the need for post-processing. LangSplat is 199 times faster than LERF.
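An open-vocabulary query of this kind can be sketched as a cosine-similarity lookup over per-Gaussian language features (with random vectors standing in for the CLIP embeddings; the real system also renders and decodes the features):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def query_gaussians(gauss_feats, text_feat, threshold=0.5):
    """Return indices of 3D Gaussians whose language feature is similar
    (cosine similarity above threshold) to the text query embedding."""
    sims = normalize(gauss_feats) @ normalize(text_feat)
    return np.nonzero(sims > threshold)[0]

rng = np.random.default_rng(1)
D = 64
text = rng.normal(size=D)                      # stand-in query embedding
# Two Gaussians aligned with the query, three unrelated ones.
feats = np.stack([text + 0.1 * rng.normal(size=D),
                  text + 0.1 * rng.normal(size=D),
                  rng.normal(size=D),
                  rng.normal(size=D),
                  rng.normal(size=D)])
hits = query_gaussians(feats, text)
assert 0 in hits and 1 in hits   # the aligned Gaussians are retrieved
```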
LLaVA-3b is a model fine-tuned from Dolphin 2.6 Phi in the LLaVA fashion using the SigLIP 400M vision tower. The model features multiple image tokens and uses layer outputs from the visual encoder, among other features. It is based on Phi-2 and is subject to a Microsoft Research License that prohibits commercial use. Thanks to ML Collective for providing compute credits.
UniRef is a unified model for reference-based object segmentation in images and videos. It supports tasks such as referring image segmentation (RIS), few-shot segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS). Its core is the UniFusion module, which efficiently injects various kinds of reference information into the backbone network. UniRef can also serve as a plug-in component for foundation models such as SAM. The project provides models trained on multiple benchmark datasets and open-sources its code for research use.
ResFields is a class of networks specifically designed to represent complex spatiotemporal signals effectively. It introduces time-varying weights into multi-layer perceptrons, using trainable residual parameters to increase the model's expressive power. The method integrates seamlessly into existing techniques and significantly improves results on a variety of challenging tasks, such as 2D video approximation, dynamic shape modeling, and dynamic NeRF reconstruction.
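The time-varying weight idea can be sketched as W(t) = W + sum_i c_i(t) * V_i, where a small shared basis of residual matrices stands in for the paper's factorized residual parameters:

```python
import numpy as np

def resfield_weight(W, coeffs, bases, t):
    """Time-varying layer weight: base weight plus a time-indexed
    combination of a small set of residual basis matrices."""
    return W + np.tensordot(coeffs[t], bases, axes=1)

rng = np.random.default_rng(2)
din, dout, rank, T = 6, 4, 2, 10
W = rng.normal(size=(din, dout))             # shared base weight
bases = rng.normal(size=(rank, din, dout))   # trainable residual basis
coeffs = rng.normal(size=(T, rank))          # per-time-step coefficients

x = rng.normal(size=din)
y0 = x @ resfield_weight(W, coeffs, bases, t=0)
y1 = x @ resfield_weight(W, coeffs, bases, t=1)
# The same input produces different outputs at different time steps.
assert y0.shape == (dout,)
assert not np.allclose(y0, y1)
```

The residual adds only rank * (din * dout) + T * rank parameters per layer, which is how the expressive power grows without replacing the underlying MLP.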