Found 34 AI tools
ComfyUI-PyramidFlowWrapper is a set of wrapper nodes for the Pyramid-Flow model, aiming to provide a more efficient interface and a more convenient workflow through ComfyUI. The model uses deep learning to generate and process visual content and can handle large amounts of data efficiently. It is an open source project initiated and maintained by the developer kijai. Its functionality is not yet complete, but it is already useful in practice. As an open source project it is free, and it is mainly targeted at developers and technology enthusiasts.
ComfyUI LLM Party aims to develop a complete set of LLM workflow nodes based on the ComfyUI front end, allowing users to quickly build their own LLM workflows and easily integrate them into existing image workflows.
x-flux-comfyui is an AI model tool integrated into ComfyUI. It provides a variety of functions, including model training, model loading, and image processing. The tool supports a low-memory mode that optimizes VRAM usage, making it suitable for users who need to run AI models in resource-constrained environments. In addition, it provides an IP-Adapter function that can be used with OpenAI's ViT CLIP model to enhance the diversity and quality of generated images.
ComfyUI-GGUF is a project that provides GGUF quantization support for ComfyUI native models. It allows model files to be stored in the GGUF format, a format popularized by llama.cpp. Although regular UNET models (conv2d) are not well suited to quantization, transformer/DiT models such as flux appear to be less affected by it, allowing them to run at lower bits per weight on low-end GPUs.
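The effect of low-bit weight storage can be sketched with a toy symmetric quantizer (illustrative only; real GGUF quantization types such as Q4_K use per-block scales and more elaborate packing):

```python
import numpy as np

def quantize_symmetric(weights, bits=4):
    """Quantize a float weight tensor to signed integers with one shared scale.

    Toy version: real GGUF formats quantize per block, not per tensor.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed values
    scale = np.abs(weights).max() / qmax  # map the largest weight to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.07], dtype=np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)
# Rounding bounds the reconstruction error by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

The integer tensor plus a scale is what gets stored, which is where the memory savings on low-end GPUs come from.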
x-flux is a set of deep learning model training scripts released by the XLabs AI team, including LoRA and ControlNet models. The models are trained using DeepSpeed, support 512x512 and 1024x1024 image sizes, and come with corresponding training configuration files and examples. The x-flux training scripts aim to improve the quality and efficiency of image generation, which matters for the field of AI image generation.
Alpha-VLLM provides a series of models that generate multi-modal content from text, including images and audio. These models are based on deep learning and can be widely used in content creation, data augmentation, automated design, and other fields.
ComfyUI-Sub-Nodes is an open source project on GitHub that provides subgraph node functionality for ComfyUI. It allows users to create and use subgraphs in ComfyUI to improve workflow organization and reusability. This plug-in is especially suitable for developers who need to manage complex workflows in the UI.
MG-LLaVA is a multimodal large language model (MLLM) that enhances visual processing by integrating multi-granularity visual inputs, including low-resolution, high-resolution, and object-centric features. It introduces an additional high-resolution visual encoder to capture fine details, which are fused with the base visual features through a Conv-Gate fusion network. In addition, object-level features obtained from bounding boxes identified by offline detectors are integrated to further refine the model's object recognition. MG-LLaVA is trained exclusively on publicly available multi-modal data with instruction tuning and demonstrates superior perception skills.
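The gating idea behind such a fusion network can be sketched as a learned convex blend of low- and high-resolution features (a simplification with a linear gate standing in for the 1x1 convolution; the actual Conv-Gate module is more elaborate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(low_feat, high_feat, w, b):
    """Fuse low- and high-resolution features with a learned gate in (0, 1).

    The gate is computed from the concatenated features, so the network
    decides per channel how much high-resolution detail to let through.
    """
    both = np.concatenate([low_feat, high_feat], axis=-1)  # (N, 2C)
    gate = sigmoid(both @ w + b)                           # (N, C)
    return gate * high_feat + (1.0 - gate) * low_feat

rng = np.random.default_rng(0)
C = 8
low = rng.normal(size=(4, C))
high = rng.normal(size=(4, C))
w = rng.normal(scale=0.1, size=(2 * C, C))   # stand-in gate parameters
b = np.zeros(C)
fused = gated_fusion(low, high, w, b)
# A convex blend always lies between the two inputs, element-wise.
assert np.all(fused >= np.minimum(low, high) - 1e-9)
assert np.all(fused <= np.maximum(low, high) + 1e-9)
```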
AsyncDiff is an asynchronous denoising acceleration scheme for parallelizing diffusion models. It splits the noise prediction model into multiple components and distributes them across different devices so they can run in parallel, significantly reducing inference latency with minimal impact on generation quality. AsyncDiff supports multiple diffusion models, including Stable Diffusion 2.1, Stable Diffusion 1.5, Stable Diffusion x4 Upscaler, Stable Diffusion XL 1.0, ControlNet, Stable Video Diffusion, and AnimateDiff.
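The asynchrony can be illustrated with a toy simulation (not the actual AsyncDiff code): each component consumes the output its predecessor produced in the previous denoising step, so within a step the components no longer have to wait for one another and could run on separate devices.

```python
import numpy as np

# Toy "denoiser" split into three sequential components.
def c1(x): return 0.9 * x
def c2(x): return x - 0.05
def c3(x): return 0.95 * x
components = [c1, c2, c3]

def sequential_step(x):
    # Baseline: components run one after another within a step.
    for c in components:
        x = c(x)
    return x

def async_step(x, cache):
    """Component k consumes what component k-1 produced in the *previous*
    step (taken from the cache), so all components can run concurrently."""
    inputs = [x] + cache[:-1]
    new_cache = [c(inputs[k]) for k, c in enumerate(components)]
    return new_cache[-1], new_cache

x = np.ones(4)
# Warm the cache with one ordinary sequential pass.
cache, h = [], x
for c in components:
    h = c(h)
    cache.append(h)
out, cache = async_step(x, cache)
# With an unchanged input the stale inputs are exact, so the async step
# reproduces the sequential result here.
assert np.allclose(out, sequential_step(x))
```

In real sampling the input changes a little each step, so the stale inputs introduce a small approximation error, which is why the method trades a minimal quality impact for latency.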
ComfyUI-Hallo is a ComfyUI plug-in customized for Hallo models. It requires ffmpeg on the command line and can download model weights from Hugging Face, or users can download them manually and place them in a specified directory. It gives developers an easy-to-use interface for integrating Hallo models, improving development efficiency and user experience.
ComfyUI-LuminaWrapper is an open source Python wrapper for simplifying the loading and use of Lumina models. It supports custom nodes and workflows, making it easier for developers to integrate Lumina models into their projects. This plug-in is mainly aimed at developers who want to use Lumina models for deep learning or machine learning in a Python environment.
EVE is an encoder-free vision-language model jointly developed by researchers from Dalian University of Technology, the Beijing Academy of Artificial Intelligence, and Peking University. It performs well across different image aspect ratios, surpassing Fuyu-8B and approaching modular encoder-based LVLMs. EVE stands out for its data and training efficiency: it uses 33M publicly available samples for pre-training, 665K LLaVA SFT samples to train the EVE-7B model, and an additional 1.2M SFT samples for the EVE-7B (HD) model. EVE's development follows an efficient, transparent, and practical strategy, opening a new path toward pure-decoder cross-modal architectures.
ComfyUI Ollama is a custom node for ComfyUI workflows that uses the ollama Python client, letting users easily integrate large language models (LLMs) into their workflows or simply run GPT-style experiments. Its main advantage is the ability to interact with an Ollama server: users can run image queries, query an LLM with a given prompt, and run LLM queries with finely tuned parameters while maintaining the context of a generation chain.
JavaVision is an all-round visual intelligent recognition project developed in Java. It implements core functions such as PaddleOCR-V4, YOLOv8 object recognition, face recognition, and image search, and can also be extended to other fields such as speech recognition, animal recognition, and security inspection. Project features include use of the SpringBoot framework, versatility, high performance, reliability and stability, easy integration, and flexible scalability. JavaVision aims to give Java developers a comprehensive visual recognition solution, letting them build advanced, reliable, and easy-to-integrate AI applications in a familiar programming language.
llava-llama-3-8b-v1_1 is an LLaVA model optimized by XTuner, based on meta-llama/Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336, and fine-tuned with ShareGPT4V-PT and InternVL-SFT. The model is designed for combined processing of images and text, has strong multi-modal learning capabilities, and is suitable for various downstream deployment and evaluation toolkits.
Mini-Gemini is a multi-modal visual language model supporting a series of dense and MoE large language models from 2B to 34B, with image understanding, reasoning, and generation capabilities. It is built on LLaVA and uses dual visual encoders to provide low-resolution visual embeddings and high-resolution candidate regions, applies patch information mining between high-resolution regions and low-resolution visual queries, and fuses text and images for understanding and generation tasks. It supports multiple visual understanding benchmarks, including COCO, GQA, OCR-VQA, and VisualGenome.
ComfyUI-Cloud is a custom node that allows users to take full control of ComfyUI locally while leveraging cloud GPU resources to run their workflows. It allows users to run workflows that require high VRAM without the need to import custom nodes/models to a cloud provider or spend money on new GPUs.
Champ is a generative model for generating 3D object shapes that combines implicit functions and convolutional neural networks to generate high-quality, diverse, and realistic 3D shapes. It can generate various categories of shapes, including animals, vehicles, and furniture.
ComfyUI-N-Sidebar is an open source project that combines the ComfyUI and N-Sidebar libraries to provide users with a comfortable and easy-to-use user interface and navigation bar. The project improves user experience by simplifying interface elements and optimizing interaction design.
ComfyUI-APISR is the API server part of the ComfyUI project, which provides necessary backend support for ComfyUI client applications. ComfyUI is a user interface framework designed to provide a comfortable user experience.
This is a custom sampler plug-in for ComfyUI that implements the sampling method based on trajectory consistency distillation (TCD) proposed by Zheng et al. The plug-in adds TCDScheduler and SamplerTCD nodes to ComfyUI's Custom Sampler category. Just clone it into the custom_nodes folder and restart ComfyUI to use it. TCDScheduler has a special parameter, eta, which controls the randomness of each step: eta=0 means deterministic sampling and eta=1 means fully random sampling. The default value is 0.3, but higher eta values are recommended when increasing the number of inference steps. Based on the trajectory consistency distillation sampling method, the plug-in can provide smoother and more consistent outputs for AI models.
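The role of eta can be illustrated with a toy sampler step (not the actual TCD update, which is more involved): eta scales the fresh noise re-injected at each step, interpolating between a deterministic update and a fully re-noised one.

```python
import numpy as np

def stochastic_step(x, eta, sigma, rng):
    """Sketch of eta-controlled stochasticity.

    eta=0 -> purely deterministic update; eta=1 -> noise with full
    standard deviation sigma is re-injected. The "denoise" here is a
    stand-in multiply, not a real diffusion model.
    """
    deterministic = 0.9 * x                      # stand-in denoising update
    noise = rng.standard_normal(x.shape)
    return deterministic + eta * sigma * noise   # eta scales the randomness

rng = np.random.default_rng(0)
x = np.ones(1000)
out0 = stochastic_step(x, eta=0.0, sigma=0.5, rng=rng)
out1 = stochastic_step(x, eta=1.0, sigma=0.5, rng=rng)
assert np.allclose(out0, 0.9)       # deterministic at eta = 0
assert np.std(out1 - 0.9) > 0.4     # noise with std ~0.5 at eta = 1
```

Intermediate values such as the default 0.3 inject a correspondingly smaller amount of noise per step.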
Tavus offers a range of AI models, particularly for generating highly realistic talking-head videos. Its Phoenix model uses Neural Radiance Fields (NeRFs) to produce natural facial movements and expressions synchronized with the input. Developers can access these highly realistic and customizable video generation services through Tavus' API.
ComfyUI-layerdiffusion is a GitHub project that provides a custom node implementation of the Layer Diffusion model. Users install it via its Python dependencies, and it currently only supports SDXL models. The goal of the project is to give ComfyUI users convenient integration of Layer Diffusion models.
OpenDiT is an open source project providing a high-performance implementation of the Diffusion Transformer (DiT) based on Colossal-AI, designed to improve training and inference efficiency for DiT applications, including text-to-video and text-to-image generation. It achieves up to 80% speedup and 50% memory reduction on GPU through kernel optimizations including FlashAttention, fused AdaLN, and fused LayerNorm; hybrid parallelism combining ZeRO, Gemini, and DDP, plus sharding of the EMA model to further cut memory costs; and FastSeq, a novel sequence-parallel method suited to workloads like DiT where activations are large but parameters are small, saving up to 48% of communication cost in single-node sequence parallelism and breaking the memory limit of a single GPU to reduce overall training and inference time. Large performance gains require only small code changes, and users do not need to know the implementation details of distributed training. The project offers a complete text-to-image and text-to-video generation pipeline that researchers and engineers can adapt to practical applications without modifying the parallel components, and it performs text-to-image training on ImageNet and publishes checkpoints.
gligen-gui is a plug-in that provides an intuitive graphical user interface for GLIGEN. It uses ComfyUI as the backend and aims to simplify the operation process of GLIGEN and improve the user experience.
The Imp project aims to provide a family of strong multimodal small language models (MSLMs). Its imp-v1-3b is a 3-billion-parameter MSLM built on the small but capable SLM Phi-2 (2.7B) and the strong visual encoder SigLIP (400M), trained on the LLaVA-v1.5 training set. Imp-v1-3b significantly outperforms counterparts of similar size on various multi-modal benchmarks, and even slightly outperforms the strong LLaVA-7B model.
PetThoughts is an image recognition application built on the Gemini API. Users can upload photos of their pets, and the app will intelligently analyze the pet's facial expressions and environment to guess what it may be thinking. The application has functions such as image recognition, facial analysis, and environmental analysis. It can accurately identify the pet's facial expressions, analyze its possible emotional state, and infer the pet's activities based on the environment. Finally, through natural language processing technology, the recognition results are converted into readable text descriptions. The app provides a simple and intuitive user interface, allowing users to easily upload photos and obtain pet analysis results. It helps users gain a deeper understanding of their pets' emotions and preferences.
SCEPTER is an open source code library dedicated to the training, tuning, and inference of generative models, covering downstream tasks such as image generation, transfer, and editing. It integrates mainstream community implementations with methods developed in-house at Alibaba Tongyi Lab, providing a comprehensive, general-purpose toolset for researchers and practitioners in generative modeling. This versatile library is designed to foster innovation and accelerate progress in this rapidly growing field.
Comfyspace is a ComfyUI Workspace Manager extension for organizing and managing all workflows. It allows users to seamlessly switch between different workflows within a single workspace, while supporting importing, exporting workflows and reusing sub-workflow modules. Features include version control, gallery and cover image settings, and easy workflow organization.
The code repository contains research on learning from synthetic image data, spanning three projects: StableRep, Scaling, and SynCLR. These projects study how to train visual representation models on synthetic images generated by text-to-image models, achieving strong results.
LangSplat constructs a 3D language field by mapping CLIP language embeddings to a set of 3D Gaussian distributions, enabling open vocabulary queries for 3D scenes. It avoids the expensive rendering process in NeRF and greatly improves efficiency. The learned language features accurately capture object boundaries and provide an accurate 3D language field without the need for post-processing. LangSplat is 199 times faster than LERF.
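An open-vocabulary query of this kind can be sketched as a cosine-similarity lookup over per-Gaussian language features (with random vectors standing in for the CLIP embeddings; the real system also renders and decodes the features):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def query_gaussians(gauss_feats, text_feat, threshold=0.5):
    """Return indices of 3D Gaussians whose language feature is similar
    (cosine similarity above threshold) to the text query embedding."""
    sims = normalize(gauss_feats) @ normalize(text_feat)
    return np.nonzero(sims > threshold)[0]

rng = np.random.default_rng(1)
D = 64
text = rng.normal(size=D)                      # stand-in query embedding
# Two Gaussians aligned with the query, three unrelated ones.
feats = np.stack([text + 0.1 * rng.normal(size=D),
                  text + 0.1 * rng.normal(size=D),
                  rng.normal(size=D),
                  rng.normal(size=D),
                  rng.normal(size=D)])
hits = query_gaussians(feats, text)
assert 0 in hits and 1 in hits   # the aligned Gaussians are retrieved
```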
LLaVA-3b is a model fine-tuned from Dolphin 2.6 Phi in the LLaVA fashion using the SigLIP 400M vision tower. The model features multiple image tokens and uses layer outputs from the visual encoder, among other features. It is based on Phi-2 and is subject to a Microsoft Research License that prohibits commercial use. Thanks to ML Collective for providing compute credits.
UniRef is a unified model for reference-based object segmentation in images and videos. It supports tasks such as referring image segmentation (RIS), few-shot segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS). Its core is the UniFusion module, which efficiently injects various kinds of reference information into the backbone network. UniRef can also serve as a plug-in component for foundation models such as SAM. The project provides models trained on multiple benchmark datasets and open-sources its code for research use.
ResFields is a class of networks specifically designed to represent complex spatiotemporal signals effectively. It introduces time-varying weights into multi-layer perceptrons, using trainable residual parameters to increase the model's expressive power. The method integrates seamlessly into existing techniques and significantly improves results on a variety of challenging tasks, such as 2D video approximation, dynamic shape modeling, and dynamic NeRF reconstruction.
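The time-varying weight idea can be sketched as W(t) = W + sum_i c_i(t) * V_i, where a small shared basis of residual matrices stands in for the paper's factorized residual parameters:

```python
import numpy as np

def resfield_weight(W, coeffs, bases, t):
    """Time-varying layer weight: base weight plus a time-indexed
    combination of a small set of residual basis matrices."""
    return W + np.tensordot(coeffs[t], bases, axes=1)

rng = np.random.default_rng(2)
din, dout, rank, T = 6, 4, 2, 10
W = rng.normal(size=(din, dout))             # shared base weight
bases = rng.normal(size=(rank, din, dout))   # trainable residual basis
coeffs = rng.normal(size=(T, rank))          # per-time-step coefficients

x = rng.normal(size=din)
y0 = x @ resfield_weight(W, coeffs, bases, t=0)
y1 = x @ resfield_weight(W, coeffs, bases, t=1)
# The same input produces different outputs at different time steps.
assert y0.shape == (dout,)
assert not np.allclose(y0, y1)
```

The residual adds only rank * (din * dout) + T * rank parameters per layer, which is how the expressive power grows without replacing the underlying MLP.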