GaussianCity is a framework for efficiently generating unbounded 3D cities, built on 3D Gaussian splatting. Through a compact 3D scene representation and a spatially aware Gaussian attribute decoder, it addresses the memory and compute bottlenecks that traditional methods face when generating large-scale urban scenes. Its main advantage is the ability to generate large-scale 3D cities quickly in a single forward pass, significantly outperforming existing methods. The product was developed by the S-Lab team at Nanyang Technological University, and the accompanying paper was published at CVPR 2025. The code and models are open source, making it suitable for researchers and developers who need to generate 3D urban environments efficiently.
Funes is an innovative online museum project that uses crowdsourced photogrammetry to turn architecture from around the world into 3D models, aiming to build a free, accessible, and massive 3D database. The project is named after "Funes the Memorious", the short story by the Argentine writer Jorge Luis Borges, symbolizing the permanent preservation of humanity's material memory. Funes is not only a technology showcase but also a cultural preservation project that protects the architectural heritage of human civilization through digital means.
DiffSplat is an innovative 3D generation technique that rapidly produces 3D Gaussian splats from text prompts and single-view images. It achieves efficient 3D content generation by leveraging large-scale pre-trained text-to-image diffusion models, addressing both the limited 3D training data and the inability of traditional 3D generation methods to make effective use of 2D pre-trained models, while maintaining 3D consistency. The main advantages of DiffSplat include fast generation (completing in 1-2 seconds), high-quality 3D output, and support for multiple input conditions. The model has broad prospects in academic research and industrial applications, especially in scenarios that require rapid generation of high-quality 3D models.
ComfyUI-Hunyuan3DWrapper is a ComfyUI plug-in that wraps the Hunyuan3D-2 model for efficient 3D generation and texture processing. The tool simplifies working with Hunyuan3D-2, letting users quickly produce high-quality 3D models and textures inside the ComfyUI environment. It supports custom configurations and extensions and is suitable for users who need efficient 3D content creation.
StructLDM is a structured latent diffusion model that learns 3D human generation from 2D images. It can generate diverse, view-consistent human bodies and supports controllable generation and editing at different levels, such as compositional generation and local clothing editing. The model enables clothing-agnostic generation and editing without requiring clothing types or masks as conditions. The project was proposed by Tao Hu, Fangzhou Hong, and Ziwei Liu of S-Lab at Nanyang Technological University, and the accompanying paper was published at ECCV 2024.
Stable Point Aware 3D (SPAR3D) is an advanced 3D generative model from Stability AI. It can generate the complete structure of a 3D object from a single image in under a second and supports real-time editing. SPAR3D uses a unique architecture that combines precise point cloud sampling with advanced mesh generation, giving unprecedented control over 3D asset creation. The model is free for commercial and non-commercial use; the weights can be downloaded from Hugging Face, the code is available on GitHub, and it can also be accessed through the Stability AI Developer Platform API.
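Since the entry says the weights are openly available on Hugging Face, a minimal sketch of fetching them with the huggingface_hub library is shown below; the repo id used here is an assumption and should be checked against Stability AI's model card.

    # Minimal sketch: downloading the openly released SPAR3D weights from Hugging Face.
    # The repo id is an assumption; verify it on the Stability AI model card.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="stabilityai/stable-point-aware-3d",  # assumed repo id
        local_dir="./spar3d-weights",
    )
    print("Model files downloaded to:", local_dir)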
Instant 3D AI is an online platform that uses artificial intelligence technology to quickly convert 2D images into 3D models. The importance of this technology is that it greatly simplifies the 3D model creation process, allowing non-professionals to easily create high-quality 3D models. Product background information shows that Instant 3D AI has gained the trust of more than 1,400 creators and received an excellent rating of 4.8/5. The main advantages of the product include rapid 3D model generation, user-friendly interface and high user satisfaction. In terms of price, Instant 3D AI provides a free trial, allowing users to experience the product first before deciding whether to pay.
TRELLIS 3D AI is a professional tool that uses artificial intelligence to convert images into 3D assets. By combining advanced neural networks with a structured latent (Structured LATent, SLAT) representation, it preserves the structural integrity and visual details of input images while generating high-quality 3D assets. TRELLIS 3D AI is trusted by professionals worldwide for reliable image-to-3D asset conversion. Unlike traditional 3D modeling tools, it provides an image-to-3D conversion process that requires no complex operations. The product is free and suitable for users who need to generate 3D assets quickly and efficiently.
MegaSaM is a system for accurate, fast, and robust estimation of camera parameters and depth maps from monocular videos of dynamic scenes. It breaks through the limitations of traditional structure-from-motion and monocular SLAM techniques, which usually assume that the input video contains mostly static scenes with large parallax. By carefully modifying a deep visual SLAM framework, MegaSaM extends to real-world videos of complex dynamic scenes, including videos with unknown field of view and unconstrained camera paths. Extensive experiments on synthetic and real videos show that MegaSaM is more accurate and robust in camera pose and depth estimation than prior and concurrent work, with faster or comparable runtimes.
GenEx is an AI model capable of creating a fully explorable 360° 3D world from a single image. Users can interactively explore this generated world. GenEx advances embodied AI in imaginary spaces and has the potential to extend these capabilities to real-world exploration.
TRELLIS is a native 3D generative model built on a unified structured latent representation and rectified flow transformers, enabling diverse and high-quality 3D asset creation. By integrating a sparsely populated 3D grid with dense multi-view visual features extracted from a powerful vision foundation model, the representation comprehensively captures structural (geometry) and textural (appearance) information while remaining flexible during decoding. TRELLIS models scale up to 2 billion parameters and are trained on a large 3D asset dataset containing 500,000 diverse objects. The model produces high-quality results under text or image conditions, significantly outperforming existing methods, including recent methods of similar scale. TRELLIS also offers flexible output-format selection and local 3D editing capabilities not provided by previous models. Code, models, and data will be released.
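For readers who want a feel for how the released code is used, the following is a hedged sketch of image-to-3D generation with the open-source TRELLIS pipeline; the module path, class name, and checkpoint id are recalled from the public repository and may differ from the current release.

    # Sketch of image-to-3D generation with the open-source TRELLIS pipeline.
    # Class and checkpoint names follow the public repo as remembered; verify before use.
    from PIL import Image
    from trellis.pipelines import TrellisImageTo3DPipeline  # assumed module path

    pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
    pipeline.cuda()

    image = Image.open("chair.png")
    # The repo's pipeline returns several decodings of the same structured latent,
    # e.g. 3D Gaussians, a radiance field, and a mesh (per the flexible-output claim above).
    outputs = pipeline.run(image)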
PSHuman is an innovative framework that reconstructs realistic 3D human models from a single image using multi-view diffusion models and explicit reconstruction techniques. Its importance lies in its ability to handle severe self-occlusion and avoid geometric distortion in the generated facial details. PSHuman jointly models the global body shape and local facial features through a cross-scale diffusion model, achieving detail-rich, identity-preserving novel view generation. In addition, it enhances cross-view body shape consistency across different human poses using body priors from parametric models such as SMPL-X. The main advantages of PSHuman include rich geometric detail, high texture fidelity, and strong generalization.
This is an AI system capable of generating 3D worlds from a single image, allowing users to input any image and explore it in 3D. The technology improves control and consistency and will change how we create movies, games, simulators, and other digital expressions. It represents a first step toward spatial intelligence. By rendering the generated world in real time in the browser, users can experience different camera effects and 3D effects, and explore classic paintings in depth.
CAT4D is a technique that uses multi-view video diffusion models to generate 4D scenes from monocular video. It converts an input monocular video into multi-view videos and reconstructs a dynamic 3D scene. Its importance lies in the ability to recover complete spatial and temporal information from single-view video, providing powerful support for fields such as virtual reality, augmented reality, and 3D modeling. CAT4D was jointly developed by researchers from Google DeepMind, Columbia University, and UC San Diego, and is an example of cutting-edge research being turned into practical applications.
LucidFusion is a flexible end-to-end feed-forward framework for generating high-resolution 3D Gaussians from an arbitrary number of unposed, sparse multi-view images. It uses a relative coordinate map (RCM) to align geometric features across different views, making it highly adaptable for 3D generation. LucidFusion integrates seamlessly with existing single-image-to-3D pipelines to produce detailed 3D Gaussians at 512x512 resolution, and is suitable for a wide range of application scenarios.
DimensionX is a 3D and 4D scene generation technique based on video diffusion models. It can create 3D and 4D scenes with controllable viewpoints and dynamics from a single image. Its key advantages include a high degree of flexibility and realism, generating scenes in a variety of styles and themes from user-supplied prompts. DimensionX was developed by a group of researchers to advance image generation technology, and it is currently freely available to the research and development community.
GenXD is a framework focused on 3D and 4D scene generation that jointly studies general 3D and 4D generation using the camera and object motions common in daily life. Because the community lacks large-scale 4D data, GenXD first proposes a data curation pipeline that recovers camera poses and object motion intensity from videos, and uses it to introduce a large-scale real-world 4D scene dataset, CamVid-30K. By leveraging all 3D and 4D data, the GenXD framework can generate any 3D or 4D scene. It proposes multi-view-temporal modules that decouple camera and object motion, allowing it to learn seamlessly from both 3D and 4D data, and uses masked latent conditioning to support multiple conditioning views. GenXD can generate videos that follow a camera trajectory as well as consistent 3D views that can be lifted into a 3D representation. Extensive evaluation on a variety of real-world and synthetic datasets demonstrates GenXD's effectiveness and versatility compared with previous methods for 3D and 4D generation.
Hunyuan3D-1 is a unified framework launched by Tencent for text-to-3D and image-to-3D generation. The framework adopts a two-stage approach: the first stage uses a multi-view diffusion model to quickly generate multi-view RGB images, and the second stage quickly reconstructs the 3D asset with a feed-forward reconstruction model. Hunyuan3D-1.0 strikes an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of generated assets.
Tencent Hunyuan 3D is an open-source 3D generative model that aims to address the shortcomings of existing 3D generative models in generation speed and generalization. It adopts a two-stage generation method: the first stage uses a multi-view diffusion model to quickly generate multi-view images, and the second stage uses a feed-forward reconstruction model to quickly reconstruct the 3D asset. The Hunyuan3D-1.0 model can help 3D creators and artists automatically produce 3D assets, supports rapid single-image 3D generation, and completes end-to-end generation, including mesh and texture extraction, within 10 seconds.
GAGAvatar is a Gaussian-based 3D avatar reconstruction and animation technique. It can quickly generate a 3D avatar from a single image and drive facial expression animation in real time. Its key advantages include high-fidelity 3D model generation, fast rendering, and the ability to generalize to unseen identities. GAGAvatar captures identity and facial details through an innovative dual-lifting method and controls expressions using global image features and a 3D morphable model, setting a new benchmark for research and applications of digital avatars.
Long-LRM is a 3D Gaussian reconstruction model capable of reconstructing large scenes from a sequence of input images. It can process 32 source images at 960x540 resolution in 1.3 seconds on a single A100 80G GPU. The model combines recent Mamba2 blocks with traditional transformer blocks, and improves efficiency without sacrificing quality through efficient token merging and Gaussian pruning steps. Unlike traditional feed-forward models, Long-LRM reconstructs the entire scene at once rather than only a small portion of it. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, Long-LRM performs comparably to optimization-based methods while being two orders of magnitude more efficient.
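The hybrid backbone described above can be illustrated conceptually: alternating linear-time sequence-mixing blocks (the role Mamba2 plays) with standard attention blocks over image tokens. The sketch below is not Long-LRM's code; the SequenceMixer is a stand-in placeholder, and all module names and sizes are illustrative assumptions.

    # Conceptual illustration only (not Long-LRM's implementation): interleaving a
    # Mamba2-style sequence-mixing block with standard transformer blocks over tokens.
    import torch
    import torch.nn as nn

    class SequenceMixer(nn.Module):
        # Placeholder standing in for a Mamba2 block (linear-time sequence mixing).
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Sequential(
                nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
            )
        def forward(self, x):
            return x + self.proj(x)

    class HybridBackbone(nn.Module):
        def __init__(self, dim=256, depth=4, heads=8):
            super().__init__()
            layers = []
            for _ in range(depth):
                layers.append(SequenceMixer(dim))  # Mamba2-style block
                layers.append(nn.TransformerEncoderLayer(dim, heads, batch_first=True))  # attention block
            self.layers = nn.ModuleList(layers)
        def forward(self, tokens):  # tokens: (batch, num_tokens, dim)
            for layer in self.layers:
                tokens = layer(tokens)
            return tokens

    tokens = torch.randn(1, 1024, 256)  # e.g. flattened multi-view image patch tokens
    features = HybridBackbone()(tokens)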
Scholar Tianji LandMark is a large-scale real-world 3D model based on NeRF technology. It was trained over a 100-square-kilometer area at 4K resolution and supports real-time rendering and free editing. The technology represents a new level of city-scale 3D modeling and rendering, with extremely high training and rendering efficiency, providing a powerful tool for urban planning, architectural design, virtual reality, and other fields.
Mug Life creates stunning 3D characters by combining computer graphics expertise with the latest computer vision technology. Its pipeline has three stages: deconstruction, animation, and reconstruction, and it connects with social platforms so users can share their creations.
SV3D Online is an online 3D video synthesis tool based on Stable Video 3D (SV3D), capable of transforming a single image into engaging 3D views and meshes.
CRM is a high-fidelity single-image-to-3D textured mesh generative model. It generates six orthographic view images from a single input image by integrating geometric priors into the network design, then uses a convolutional U-Net to create high-resolution triplanes. CRM further uses Flexicubes as the geometric representation, enabling straightforward end-to-end optimization on textured meshes. The entire model can generate a high-fidelity textured mesh from an image in about 10 seconds, with no test-time optimization.
Depthify.ai is a tool that converts RGB images into various spatial formats compatible with Apple Vision Pro and Meta Quest. Converting RGB images into spatial photos supports a variety of computer vision and 3D modeling applications. It can generate depth maps, stereo images, and HEIC files that can be used on Apple Vision Pro.
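The core step behind such conversions is estimating a depth map from a single RGB photo. The sketch below shows that step with the Hugging Face transformers depth-estimation pipeline; it illustrates the general technique, not Depthify.ai's own API, and the model id is just one example.

    # Illustration of monocular depth estimation (the underlying step), not Depthify.ai's API.
    # Uses the transformers "depth-estimation" pipeline; the model choice is an example.
    from transformers import pipeline
    from PIL import Image

    depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
    result = depth_estimator(Image.open("photo.jpg"))
    # The grayscale depth map is one of the inputs spatial-photo formats are built from.
    result["depth"].save("photo_depth.png")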
DUSt3R is a novel method for dense and unconstrained stereo 3D reconstruction of arbitrary image collections. It requires no prior knowledge of camera calibration or viewpoint poses, and relaxes the strict constraints of traditional projective camera models by casting the pairwise reconstruction problem as a regression of point maps. DUSt3R unifies monocular and binocular reconstruction and proposes a simple and effective global alignment strategy for the multi-image case. Its network architecture is built on standard Transformer encoders and decoders, taking advantage of powerful pre-trained models. DUSt3R directly provides a 3D model and depth information of the scene, from which pixel matches and relative and absolute camera parameters can be recovered.
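The pairwise point-map regression plus global alignment workflow can be sketched with the open-source DUSt3R code; the function, class, and checkpoint names below are recalled from the public repository's README and may differ in the current release.

    # Hedged sketch following the public DUSt3R repository: regress point maps per image
    # pair, then run global alignment to get one consistent scene and camera poses.
    from dust3r.model import AsymmetricCroCo3DStereo
    from dust3r.utils.image import load_images
    from dust3r.image_pairs import make_pairs
    from dust3r.inference import inference
    from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

    model = AsymmetricCroCo3DStereo.from_pretrained(
        "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt"  # assumed checkpoint id
    ).cuda()
    images = load_images(["img1.jpg", "img2.jpg", "img3.jpg"], size=512)
    pairs = make_pairs(images, scene_graph="complete", symmetrize=True)
    output = inference(pairs, model, device="cuda", batch_size=1)

    # Global alignment merges per-pair point maps into a single scene with camera poses.
    scene = global_aligner(output, device="cuda", mode=GlobalAlignerMode.PointCloudOptimizer)
    scene.compute_global_alignment(init="mst", niter=300, schedule="cosine", lr=0.01)
    point_clouds, poses = scene.get_pts3d(), scene.get_im_poses()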
SIGNeRF is a new approach for fast and controllable NeRF scene editing and scene-integrated object generation. It introduces a new generative update strategy that ensures 3D consistency across edited images without iterative optimization. SIGNeRF leverages a depth-conditioned ControlNet image diffusion model to edit an existing NeRF scene in a single forward pass in a few simple steps. It can generate new objects into an existing NeRF scene or edit existing objects, giving precise control over the scene.
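The depth-conditioned diffusion building block named above can be reproduced with the diffusers library. The sketch below shows that component in isolation, not SIGNeRF's own pipeline; the input depth map path and prompt are illustrative.

    # Depth-conditioned ControlNet image diffusion via diffusers (the building block
    # the entry refers to), not SIGNeRF's full scene-editing pipeline.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    # Conditioning on depth rendered from the NeRF keeps the edited view geometrically consistent.
    depth_map = Image.open("scene_depth.png")
    edited = pipe("a stone statue on the table", image=depth_map, num_inference_steps=30).images[0]
    edited.save("edited_view.png")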
HAAR is a generative model that produces realistic 3D hairstyles from text input. It takes text prompts and generates 3D hairstyle assets ready for use in a variety of computer graphics and animation applications. Unlike current image-based generative models, HAAR uses 3D hair strands as its underlying representation; the synthetic hairstyle models it learns from are annotated automatically with a 2D visual question answering system. Its text-guided generation method uses a conditional diffusion model to generate guide hair strands in a latent hairstyle UV space, then applies a latent upsampling process to reconstruct dense hairstyles containing hundreds of thousands of strands from a text description. The resulting hairstyles can be rendered with off-the-shelf computer graphics techniques.
Gaussian SLAM can reconstruct renderable 3D scenes from RGBD data streams, and is the first neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes. By using 3D Gaussians as the primary unit of scene representation, it overcomes the limitations of previous methods. The authors observe that traditional 3D Gaussians are difficult to use in the monocular setting: they cannot encode accurate geometric information and are hard to optimize under sequential single-view supervision. By extending 3D Gaussians to encode geometry and devising a novel scene representation together with methods for growing and optimizing it, they propose a SLAM system that can reconstruct and render real-world datasets without sacrificing speed or efficiency. The method is evaluated on common synthetic and real-world datasets and compared with other state-of-the-art SLAM methods, and the resulting 3D scene representation can be rendered in real time via efficient Gaussian splatting.
Tafi Avatar is an AI text-to-3D character engine and the fastest way to create custom 3D characters. It offers millions of high-quality 3D assets and requires no prior 3D experience to get started: characters can be created from text prompts without having to design the 3D character yourself. Tafi Avatar is fast, high quality, and suitable for a variety of scenarios and uses.
Looking Glass Blocks is the first holographic sharing platform built for 3D creators. It provides a built-in artificial intelligence conversion tool that can convert any 2D image into a hologram. Users can share and embed holograms to any device on the internet and cast them directly onto the Looking Glass display. There is no need to adjust lighting or textures, and the 3D scene can be displayed the way it was designed. Looking Glass Blocks also provides a discovery platform that allows users to discover and share holograms created by other creators.
3DFY.ai uses artificial intelligence technology to generate high-quality 3D models from text or just an image. Now anyone can quickly create compelling 3D assets across industries. We provide services such as 3DFY Prompt, 3DFY Megapacks and 3DFY Image. Our technology is based on advanced AI infrastructure, ensuring model quality and uniqueness. For pricing please visit the official website for details.
Polycam is an app that uses LiDAR scanners and photogrammetry to capture reality. It can convert real-world objects into 3D models, and supports 3D scanning and downloading of 3D models on iPhone, iPad, Android and the Web. Polycam's main functions include high-precision scanning, rapid generation of 3D models, visual editing and measurement tools, etc. It is suitable for users who need to perform 3D scanning and model making, such as architects, designers, artists, etc. Polycam offers free and paid versions, with the paid version offering more advanced features and larger model export sizes.
CopernicAI is a next-generation generative AI environment that leverages the latest deep learning technology to generate high-quality 360-degree panoramic images. It uses a 2+1D method to generate the environment, combining 2D images and 1D depth information to provide users with an immersive visual experience. CopernicAI provides a variety of generation methods, including generating 360-degree panoramic images, asteroid maps, etc. Users can generate images of different styles and scenes by inputting text. CopernicAI is suitable for various application scenarios, including virtual tourism, game development, art creation, etc. Please visit the official website for product pricing details.
CSM AI is a multi-modal 3D generation platform that can generate high-resolution geometry, textures, and neural radiance fields from video, images, or text. It can create environments and games quickly and accurately, providing developers with a new experience. CSM AI also provides APIs so developers can integrate it into their own applications or platforms. It is suitable for creating immersive simulations and games.
ScanTo3D iOS App is an app for quickly scanning homes, buildings, and other large environments. It helps users create accurate 2D floor plans, BIM models and 3D visualizations. By scanning the target environment, the application can automatically generate accurate dimensions and details, providing users with an efficient and convenient modeling tool. In addition, ScanTo3D iOS App also provides rich editing and sharing functions, allowing users to easily manage and share scanned data. ScanTo3D iOS App is targeted at professionals and enthusiasts in the fields of architecture, real estate and interior design.
in3D can turn a person into a realistic full-body 3D avatar in one minute using just your phone's camera. It can be integrated into your product with the in3D Avatar SDK.
Luma AI is a text-to-3D conversion tool based on artificial intelligence technology. By using Luma AI, users can quickly convert text into 3D models, edit and render them, and achieve unique visual effects. Luma AI is efficient, easy to use and flexible, making it suitable for a variety of creative design, advertising production and digital media projects. Please refer to the official website for pricing details.
Luma AI is a technology company focused on AI whose innovative technology lets users quickly generate the 3D models they need with a mobile phone. The company was founded by a team with extensive experience in 3D computer vision. Its technology is based on Neural Radiance Fields, which can model 3D scenes from a small number of 2D images. Dream Machine is an AI model that quickly generates high-quality, photorealistic videos directly from text and images. It is a highly scalable and efficient transformer model trained directly on video, capable of generating physically accurate, consistent, and eventful footage. Dream Machine is the first step in building a universal imagination engine, and it is now available to everyone.
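Dream Machine is also exposed through Luma's developer API. The sketch below uses the lumaai Python SDK as remembered from its public documentation; client construction, method names, and the polling fields are assumptions to verify against the current docs, and the API key and prompt are placeholders.

    # Hedged sketch of calling Dream Machine via the lumaai Python SDK; method and field
    # names follow the public SDK as remembered and may have changed.
    import time
    from lumaai import LumaAI

    client = LumaAI(auth_token="YOUR_LUMA_API_KEY")  # placeholder key
    generation = client.generations.create(prompt="a red panda leaping between snowy rooftops")

    # Poll until the video is ready, then read its URL.
    while generation.state not in ("completed", "failed"):
        time.sleep(5)
        generation = client.generations.get(id=generation.id)
    print(generation.assets.video if generation.state == "completed" else "generation failed")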