Found 51 AI tools
DreamMesh4D is a framework that combines mesh representation with sparse-point-controlled deformation to generate high-quality 4D objects from monocular video. It addresses the spatial-temporal consistency and surface texture quality problems faced by earlier methods that rely on implicit Neural Radiance Fields (NeRF) or explicit Gaussian splatting as the underlying representation. Drawing inspiration from modern 3D animation pipelines, DreamMesh4D binds Gaussian splats to triangular mesh surfaces, enabling differentiable optimization of both textures and mesh vertices. The framework starts from a coarse mesh produced by a single-image 3D generation method and builds a deformation graph by uniformly sampling sparse control points, improving computational efficiency and providing additional constraints. Through two-stage learning that combines a reference-view photometric loss, a score distillation loss, and other regularization losses, it jointly learns the static surface Gaussians and mesh vertices as well as a dynamic deformation network. DreamMesh4D outperforms previous video-to-4D generation methods in rendering quality and spatial-temporal consistency, and its mesh-based representation is compatible with modern geometry pipelines, demonstrating its potential for the 3D gaming and film industries.
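To illustrate the binding step, here is a minimal sketch of attaching Gaussian centers to mesh triangles via barycentric interpolation (PyTorch; function and variable names are illustrative assumptions, not DreamMesh4D's actual code):

import torch

def bind_gaussians_to_mesh(vertices, faces, face_idx, bary):
    """Place each Gaussian center on its parent triangle.

    vertices: (V, 3) mesh vertex positions (static or deformed)
    faces:    (F, 3) vertex indices per triangle
    face_idx: (G,)   parent face index of each Gaussian
    bary:     (G, 3) barycentric weights, rows sum to 1
    """
    tri = vertices[faces[face_idx]]                   # (G, 3, 3) triangle corners
    centers = (bary.unsqueeze(-1) * tri).sum(dim=1)   # (G, 3) interpolated centers
    return centers

# When the mesh vertices deform over time, re-running this binding moves the
# attached Gaussians with the surface, keeping texture and geometry coupled.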
Flex3D is a two-stage process that generates high-quality 3D assets from a single image or text prompt. This technology represents the latest advancement in the field of 3D reconstruction and can significantly improve the efficiency and quality of 3D content generation. The development of Flex3D is supported by Meta and team members with deep backgrounds in 3D reconstruction and computer vision.
ViewCrafter is a novel approach that exploits the generative power of video diffusion models and the coarse 3D cues provided by point-based representations to synthesize high-fidelity novel views of general scenes from single or sparse images. Through an iterative view synthesis strategy and a camera trajectory planning algorithm, it progressively expands the area covered by the 3D cues and the generated views, extending the range of novel view generation. ViewCrafter can support applications such as immersive experiences and real-time rendering through optimized 3D-GS representations, as well as more imaginative content creation through scene-level text-to-3D generation.
OmniRe is a comprehensive approach for efficiently reconstructing high-fidelity dynamic urban scenes from device logs. It builds a dynamic neural scene graph based on Gaussian representations and constructs multiple local canonical spaces to model various dynamic actors, including vehicles, pedestrians, and cyclists, enabling holistic reconstruction of the different objects in a scene. OmniRe makes it possible to fully reconstruct the objects present in the scene and then simulate the reconstructed scene with all participants in real time. Extensive evaluation on the Waymo dataset shows that OmniRe significantly outperforms previous state-of-the-art methods both quantitatively and qualitatively.
Object Images is an innovative 3D model generation technique that simplifies the generation and processing of 3D shapes by encapsulating a complex 3D shape in a 64x64 pixel image, a so-called 'Object Image' or 'omage'. This approach sidesteps the geometric and semantic irregularities of traditional polygonal meshes by applying image generation models, such as Diffusion Transformers, directly to 3D shape generation.
VFusion3D is a scalable 3D generative model built on pre-trained video diffusion models. To address the difficulty and scarcity of 3D data acquisition, it fine-tunes a video diffusion model to generate a large-scale synthetic multi-view dataset and then trains a feed-forward 3D generation model that can quickly produce 3D assets from a single image. In user studies, participants preferred VFusion3D's results more than 90% of the time.
SAM-guided Graph Cut for 3D Instance Segmentation is a deep learning method that combines 3D geometry and multi-view image information for 3D instance segmentation. It leverages 2D segmentation models through a 3D-to-2D query framework, formulates segmentation as a graph cut problem over a superpoint graph, and trains a graph neural network to achieve robust segmentation across different types of scenes.
TexGen is an innovative multi-view sampling and resampling framework for synthesizing 3D textures from arbitrary textual descriptions. Built on pre-trained text-to-image diffusion models, it uses a consistent-view sampling strategy with attention guidance and a noise resampling technique to significantly improve the texture quality of 3D objects, yielding strong view consistency and rich appearance detail.
SF3D is a deep learning-based 3D asset generation model that can quickly generate textured 3D models with UV unwrapping and material parameters from a single image. Compared with traditional methods, SF3D is specially trained for mesh generation and integrates fast UV unwrapping technology to quickly generate textures instead of relying on vertex colors. In addition, the model learns material parameters and normal maps to improve the visual quality of the reconstructed model. SF3D also introduces a delighting step that effectively removes low-frequency lighting effects, ensuring that the reconstructed mesh is easy to use under new lighting conditions.
Stable Fast 3D (SF3D) is a large-scale reconstruction model based on TripoSR that generates textured, UV-unwrapped 3D mesh assets from a single object image. The model creates a 3D model in under a second with a low polygon count, UV unwrapping, and textures, making the output easy to use in downstream applications such as game engines or rendering pipelines. In addition, the model predicts each object's material parameters (roughness, metallic), improving reflective behavior during rendering. SF3D is suitable for fields that require rapid 3D modeling, such as game development and film visual effects.
HoloDreamer is a text-driven 3D scene generation framework that can generate immersive, view-consistent, fully enclosed 3D scenes. It consists of two basic modules: stylized equirectangular panorama generation and enhanced two-stage panorama reconstruction. The framework first generates a high-definition panorama as a holistic initialization of the complete 3D scene, and then uses 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, achieving view-consistent and fully enclosed 3D scene generation. Its main advantages are high visual consistency and harmony together with robust reconstruction quality and rendering.
VGGSfM is a deep learning-based 3D reconstruction technique designed to recover the camera poses and 3D structure of a scene from an unconstrained set of 2D images. It is end-to-end trainable through a fully differentiable deep learning framework: deep 2D point tracking extracts reliable pixel-level trajectories, all cameras are recovered from image and trajectory features, and the cameras and triangulated 3D points are refined through a differentiable bundle adjustment layer. VGGSfM achieves state-of-the-art performance on three popular datasets: CO3D, IMC Phototourism, and ETH3D.
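As an illustration of the objective such a bundle adjustment layer refines, the standard reprojection residual under a pinhole camera model can be sketched as follows (a generic PyTorch sketch, not VGGSfM's own differentiable layer):

import torch

def reprojection_residual(points3d, R, t, K, observed_2d):
    """Residual minimized by bundle adjustment for one camera.

    points3d:    (N, 3) triangulated world points
    R, t:        (3, 3) rotation and (3,) translation of the camera
    K:           (3, 3) pinhole intrinsics
    observed_2d: (N, 2) tracked pixel locations
    """
    cam = points3d @ R.T + t              # world -> camera coordinates
    proj = cam @ K.T                      # camera -> homogeneous pixel coordinates
    proj = proj[:, :2] / proj[:, 2:3]     # perspective divide
    return proj - observed_2d             # (N, 2) residuals to minimize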
Animate3D is an innovative framework for animating any static 3D model. Its core idea consists of two parts: 1) a new multi-view video diffusion model (MV-VDM), trained on multi-view renderings of static 3D objects using the authors' large-scale multi-view video dataset (MV-Video); and 2) a framework built on MV-VDM that combines reconstruction with 4D Score Distillation Sampling (4D-SDS) to animate 3D objects using multi-view video diffusion priors. Animate3D enhances spatial and temporal consistency through a new spatiotemporal attention module and preserves the identity of the static 3D model through multi-view rendering. It also proposes an efficient two-stage animation process: first reconstructing motion directly from the generated multi-view videos, and then refining appearance and motion through 4D-SDS.
CharacterGen is an efficient 3D character generation framework capable of generating 3D pose-unified character meshes with high quality and consistent appearance from a single input image. It solves the challenges posed by diverse poses through a streamlined generation pipeline and image-conditioned multi-view diffusion model to effectively calibrate the input pose to a canonical form while retaining the key attributes of the input image. It also adopts a general transformer-based sparse view reconstruction model and a texture back-projection strategy to generate high-quality texture maps.
EgoGaussian is an advanced 3D scene reconstruction and dynamic object tracking technique that can simultaneously reconstruct a 3D scene and track object motion from RGB egocentric (first-person) input alone. It leverages the uniquely discrete nature of Gaussian splatting to segment dynamic interactions from the background, and exploits the dynamic nature of human activity through a clip-level online learning process that temporally reconstructs the evolution of the scene and tracks the motion of rigid objects. EgoGaussian surpasses previous NeRF and dynamic Gaussian methods on challenging in-the-wild videos and also performs well in the quality of the reconstructed models.
GaussianCube is an innovative 3D radiance representation that advances 3D generative modeling through a structured and explicit representation. It achieves high-accuracy fitting by rearranging Gaussians into a predefined voxel grid using a novel density-constrained Gaussian fitting algorithm and an optimal transport method. GaussianCube uses fewer parameters and delivers higher quality than traditional implicit feature decoders or spatially unstructured radiance representations, making 3D generative modeling easier.
L4GM is a 4D large-scale reconstruction model capable of quickly generating animated objects from single-view video input. It is trained on a novel dataset of multi-view videos showing animated objects rendered from Objaverse; the dataset contains 44K distinct objects and 110K animations rendered from 48 viewpoints, yielding 12M videos with a total of 300M frames. L4GM is built on the pre-trained 3D large-scale reconstruction model LGM, which outputs 3D Gaussian ellipsoids from multi-view image inputs. L4GM outputs a 3D Gaussian splatting representation for each frame, which is then upsampled to a higher frame rate for temporal smoothness. In addition, L4GM adds a temporal self-attention layer to help learn temporal consistency and uses a multi-view rendering loss at each time step to train the model.
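A temporal self-attention layer of the kind described can be sketched roughly as follows (a minimal PyTorch illustration with assumed tensor shapes, not L4GM's actual module):

import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Attend across the time axis independently for each spatial token."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, time, tokens, dim) features for every frame
        b, t, n, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, t, d)   # fold tokens into batch
        h = self.norm(h)
        out, _ = self.attn(h, h, h)                      # attention over frames
        out = out.reshape(b, n, t, d).permute(0, 2, 1, 3)
        return x + out                                   # residual connection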
WonderWorld is an innovative 3D scene extension framework that allows users to explore and shape virtual environments based on a single input image and user-specified text. It significantly reduces computation time by using fast Gaussian surfels and a guided depth diffusion method to generate geometrically consistent extensions, enabling 3D scene generation in under 10 seconds and supporting real-time user interaction and exploration. This gives fields such as virtual reality, gaming, and creative design the ability to quickly generate and navigate immersive virtual worlds.
Bootstrap3D is a framework for improving 3D content creation that addresses the scarcity of high-quality 3D assets through synthetic data generation. It uses 2D and video diffusion models to generate multi-view images from text prompts, and employs the 3D-aware MV-LLaVA model to filter for high-quality data and rewrite inaccurate captions. The framework has produced 1 million high-quality synthetic multi-view images with dense descriptive captions to address the shortage of high-quality 3D data. It also proposes a training timestep rescheduling (TTR) strategy that learns multi-view consistency through the denoising process while preserving the original 2D diffusion prior.
Ouroboros3D is a unified 3D generation framework that integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. The two modules are jointly trained through a self-conditioning mechanism so that they adapt to each other for robust inference. During multi-view denoising, the multi-view diffusion model uses 3D-aware maps rendered by the reconstruction module at the previous time step as an additional condition. This recursive diffusion framework with 3D-aware feedback improves the geometric consistency of the whole process. Experiments show that Ouroboros3D outperforms methods that train these two stages separately, as well as existing methods that only combine them at inference time.
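The recursive conditioning loop can be summarized schematically as follows (a hedged sketch; denoise_step, reconstruct_3d, and render_maps are hypothetical placeholders, not the released API):

def ouroboros_sample(x_T, timesteps, denoise_step, reconstruct_3d, render_maps):
    """Alternate multi-view denoising with 3D reconstruction feedback."""
    x_t = x_T            # noisy multi-view images
    feedback = None      # 3D-aware maps rendered at the previous step
    scene = None
    for t in timesteps:  # from high noise level to low noise level
        # denoise the multi-view images, conditioned on the rendered 3D maps
        x_t = denoise_step(x_t, t, condition=feedback)
        # reconstruct an intermediate 3D representation from the current views
        scene = reconstruct_3d(x_t)
        # render 3D-aware maps (e.g. color / coordinate maps) as the next condition
        feedback = render_maps(scene)
    return scene, x_t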
Unique3D is a technology developed by a team at Tsinghua University that can generate high-fidelity textured 3D mesh models from a single image. This technology is of great significance in the fields of image processing and 3D modeling. It allows users to quickly convert 2D images into 3D models, providing powerful technical support for game development, animation production, virtual reality and other fields.
VastGaussian is an open-source project for 3D scene reconstruction that models the geometry and appearance of large scenes using 3D Gaussians. The project was implemented by the author from scratch and may contain some errors, but it offers a new attempt in the field of 3D scene reconstruction. Its main advantages include the ability to handle large datasets and improvements over the original 3DGS project that make it easier to understand and use.
CAT3D is a method that uses multi-view diffusion models to generate novel views of 3D scenes from any number of input images. The generated views are converted into an interactively renderable 3D representation through a robust 3D reconstruction pipeline. The entire process, including view generation and 3D reconstruction, takes as little as one minute.
Level of Gaussians (LoG) is a new technique for efficiently rendering three-dimensional scenes. It stores Gaussian primitives in a tree structure that is reconstructed end-to-end from images through a progressive training strategy, which effectively overcomes local minima and enables real-time rendering of areas spanning millions of square kilometers. It is an important advance in rendering large-scale scenes.
PhysDreamer is a physics-based method that imparts interactive dynamics to static 3D objects by leveraging object dynamics priors learned by video generation models. This makes it possible to simulate realistic responses to novel interactions, such as external forces or agent manipulations, even when no data about the real objects' physical properties is available. PhysDreamer evaluates the realism of the synthesized interactions through user studies, taking a step toward more engaging and realistic virtual experiences.
InstantMesh is a feed-forward framework based on the LRM architecture for efficient generation of 3D meshes from a single image. It supports low-memory GPU environments and can generate 3D mesh models with texture mapping.
The pipeline leverages the generative capabilities of 2D diffusion models and prompt self-refinement to create a panoramic image as an initial "flat" (2D) scene representation. This image is then lifted into 3D Gaussians to enable real-time exploration. To produce consistent 3D geometry, the pipeline builds spatially coherent structure by aligning monocular depth estimates into a globally optimized point cloud. This point cloud serves as the initial state of the 3D Gaussians and helps resolve the occlusion problems caused by single-view input. By imposing semantic and geometric constraints on both the synthesized and input camera views, the pipeline guides the optimization of the Gaussians to reconstruct unseen regions. Overall, this approach yields a globally consistent 3D scene with a 360-degree field of view, providing an enhanced immersive experience compared with existing techniques.
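The depth-to-point-cloud lifting that this pipeline builds on can be sketched generically as follows (NumPy, assuming a simple pinhole intrinsic matrix K; this is not the pipeline's own code):

import numpy as np

def depth_to_point_cloud(depth, K):
    """Unproject a depth map into camera-space 3D points.

    depth: (H, W) per-pixel depth
    K:     (3, 3) pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinates
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)   # (H*W, 3) points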
DiffHuman is a probabilistic photorealistic 3D human reconstruction method. It predicts a probability distribution over 3D human reconstructions from a single RGB image and samples multiple detailed, colored 3D human models through iterative denoising. Compared with existing deterministic methods, DiffHuman produces more detailed reconstructions in unseen or uncertain regions. It also introduces a generator network that accelerates rendering, greatly improving inference speed.
Move API can convert videos containing human body movements into 3D animation assets, supports converting video files to usdz, usdc and fbx file formats, and provides preview videos. Ideal for integrating into production workflow software, enhancing application motion capture capabilities, or creating new experiences.
VisFusion is a technology that uses video data for online 3D scene reconstruction. It can extract and reconstruct a 3D environment from videos in real time. This technology combines computer vision and deep learning to provide users with a powerful tool for creating accurate 3D models.
Sketch2NeRF is a multi-view sketch-guided text-to-3D generation framework. It optimizes a 3D scene represented by a Neural Radiance Field (NeRF) using pre-trained 2D diffusion models such as Stable Diffusion and ControlNet, and proposes a novel synchronized generation-and-reconstruction method to optimize the NeRF effectively. Experimental evaluation on two collected multi-view sketch datasets demonstrates that the approach synthesizes consistent 3D content with fine-grained sketch control and high fidelity to text prompts, achieving state-of-the-art performance in sketch similarity and text alignment.
Shap-E is an official code and model release library for generating conditional 3D implicit functions. It can generate 3D objects from text or images. This product uses the latest generative models and can generate relevant 3D models based on given prompts.
GPTEval3D is an open-source evaluation tool for 3D generative models that uses GPT-4V to automatically evaluate text-to-3D generation. It computes Elo scores for generated models and compares and ranks them against existing models. The tool is simple to use, supports user-defined evaluation datasets, and makes full use of GPT-4V's evaluation capabilities, making it a powerful tool for studying 3D generation tasks.
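The Elo-style ranking reported by GPTEval3D follows the standard pairwise rating update, roughly as sketched below (a generic Elo update, not GPTEval3D's exact implementation):

def elo_update(r_a, r_b, score_a, k=32):
    """Update two models' ratings after one pairwise comparison.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie;
    in this setting the judgment would come from GPT-4V comparing two generations.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new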
Repaint123 can generate high-quality, multi-view consistent 3D content from a single image in 2 minutes. It combines the powerful image generation capability of 2D diffusion models with the texture alignment capability of a progressive repainting strategy to produce high-quality, consistent multi-view images, and improves image quality during repainting through visibility-aware adaptive repainting strength. The resulting high-quality, multi-view consistent images enable fast 3D content generation with a simple mean squared error loss.
3D Fauna is a method for building three-dimensional animal models by learning from 2D internet images. It addresses the challenge of model generalization by introducing a collection of semantically related models, and it provides a new large-scale dataset. At inference time, given an image of any quadruped, the model reconstructs a corresponding 3D mesh in a feed-forward manner within seconds.
Text2Immersion is an elegant method for generating high-quality 3D immersive scenes from text prompts. The pipeline first builds up a Gaussian cloud step by step using pre-trained 2D diffusion and depth estimation models, then refines and interpolates the Gaussian cloud to enhance the detail of the generated scene. Unlike mainstream methods that focus only on a single object or an indoor scene, or that adopt restricted camera trajectories, this method can generate diverse scenes containing a variety of objects and even extends to creating imaginary scenes. Text2Immersion can therefore have broad impact on applications such as virtual reality, game development, and automated content creation. Extensive evaluations show that the system outperforms other methods in rendering quality and diversity, advancing text-driven 3D scene generation.
Paint3D can generate high-resolution, lighting-free, diverse 2K UV texture maps for untextured 3D meshes, conditioned on text or image input. It first generates view-conditioned images through a pre-trained depth-aware 2D diffusion model and performs multi-view texture fusion to obtain an initial coarse texture map. It then uses specialized UV completion and UVHD texturing models to remove lighting effects and fill in incomplete areas. Paint3D produces semantically consistent, high-quality, lighting-free 2K UV textures, significantly raising the level of texture generation for untextured 3D objects.
D3GA is a drivable 3D human body model based on Gaussian point clouds. It learns to generate realistic 3D human models from multi-view videos, renders in real time using 3D Gaussian splatting, and drives model deformation with joint angles and keypoints. Compared with other methods, D3GA produces higher-quality results under the same training and testing data, making it suitable for applications that require real-time rendering and control of 3D human bodies.
GET3D is a generative model that generates high-quality 3D textured shapes. It is capable of generating 3D meshes with complex topology, rich geometric details, and high-fidelity textures. GET3D is trained through differentiable surface modeling, differentiable rendering, and 2D generative adversarial network methods. It is capable of generating a variety of high-quality 3D textured shapes, including cars, chairs, animals, motorcycles, people and buildings, etc.
ReconFusion is a 3D reconstruction method that uses diffusion priors to reconstruct real-world scenes from only a few photos. It combines Neural Radiance Fields (NeRFs) with a diffusion prior trained on few-view and multi-view datasets, enabling it to synthesize realistic geometry and texture at novel camera poses beyond the input images and in under-constrained regions, while preserving the appearance of the observed regions. ReconFusion is extensively evaluated on a variety of real-world datasets, including forward-facing and 360-degree scenes, demonstrating significant performance improvements.
MoMask is a model for text-driven 3D human motion generation. It employs a hierarchical quantization scheme to represent human motion as multiple layers of discrete motion tokens with high-fidelity detail, and uses two distinct bidirectional Transformer networks to predict the motion tokens from text input. The model outperforms existing methods on text-to-motion generation and can be applied seamlessly to related tasks such as text-guided temporal inpainting.
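The hierarchical quantization idea can be sketched as residual vector quantization (a simplified PyTorch illustration; names and shapes are assumptions, not MoMask's actual code):

import torch

def residual_quantize(latent, codebooks):
    """Quantize a motion latent into layered discrete tokens.

    latent:    (T, D) per-frame motion features
    codebooks: list of (K, D) tensors, one per quantization layer
    Returns the token indices of each layer and the summed reconstruction.
    """
    residual = latent
    tokens, recon = [], torch.zeros_like(latent)
    for cb in codebooks:
        dist = torch.cdist(residual, cb)     # (T, K) distances to codebook entries
        idx = dist.argmin(dim=-1)            # (T,) token ids for this layer
        quantized = cb[idx]
        tokens.append(idx)
        recon = recon + quantized
        residual = residual - quantized      # pass the remainder to the next layer
    return tokens, recon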
LucidDreamer is a domain-free 3D scene generation technique that can generate navigable 3D scenes from a single text prompt or a single image by fully leveraging existing large-scale diffusion models. The method alternates between two steps, dreaming and alignment: it first generates multi-view consistent images from the input, and then harmoniously integrates the newly generated parts into the 3D scene. Unlike previous 3D scene generation methods, the highly detailed Gaussian splats produced by LucidDreamer place no restrictions on the target scene domain.
PhysGaussian is an innovative unified simulation rendering pipeline that simultaneously and seamlessly produces physically based dynamics and photorealistic renderings. The product utilizes a custom material point method (MPM) to combine a 3D Gaussian kernel with physically meaningful kinematic deformation and mechanical stress properties, evolving through the principles of continuum mechanics. The product features seamless integration of physical simulation and visual rendering, with both components using the same 3D Gaussian kernel as their discrete representation, without the need for triangle/tetrahedral meshes, Marching Cubes or any other geometric embeddings, highlighting the principle of “what you see is what you simulate”.
ZeroNVS is a tool for zero-shot 360-degree novel view synthesis from a single real image. It provides 3D SDS distillation code, evaluation code, and trained models. Users can run NeRF distillation and evaluation with their own models and experiment on a variety of datasets. ZeroNVS features high-quality synthesis results and supports custom image data, and is mainly used in fields such as virtual reality, augmented reality, and panoramic video production.
Imitator is a novel approach to personalized speech-driven 3D facial animation. Given an audio sequence and a personalized style embedding as input, it generates person-specific motion sequences with accurate lip closure for bilabial consonants ('m', 'b', 'p'). The subject's style embedding can be computed from a short reference video (e.g., 5 seconds).
Chupa is a 3D human body generation pipeline that combines the generation capabilities of diffusion models with neural rendering technology to create diverse, realistic 3D human bodies. The pipeline can easily generalize to unseen human poses and render realistic results. Chupa generates diverse high-quality human meshes in latent space from SMPL-X meshes.
Flythroughs is an application based on AI and 3D generation technology that helps users easily create professional 3D Flythroughs. It uses the world's most advanced 3D-generating NeRF technology to generate realistic 3D experiences from video without any training or special equipment. Flythroughs also integrates a new 3D camera path AI that can generate realistic 3D experiences with one click. Flythroughs is suitable for real estate, construction, tourism, entertainment and other fields, and can help users show the fluidity and uniqueness of space.
Dpt Depth is an image processing tool based on DPT depth estimation and 3D technology. It quickly estimates depth from an input image and generates a corresponding three-dimensional model from that depth information. Dpt Depth Estimation + 3D is powerful and easy to use and can be widely applied in computer vision, image processing, and related fields. The product is available as a free trial and a paid subscription.
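A typical way to run DPT-style depth estimation looks roughly like the sketch below (assuming the Hugging Face transformers DPT checkpoints such as Intel/dpt-large; a generic usage example, not this product's own API):

import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("input.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    depth = model(**inputs).predicted_depth          # (1, H', W') relative depth

# resize back to the input resolution before lifting the pixels into 3D
depth = torch.nn.functional.interpolate(
    depth.unsqueeze(1), size=image.size[::-1], mode="bicubic"
).squeeze()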
Any Image to 3D is an innovative AI system that can convert complex 2D pictures into 3D models. It removes the technical challenges of generating 3D content, making it easy for anyone to generate 3D models. It is suitable for areas such as gaming, robotics, mixed reality, visual effects, and e-commerce. Through simple visualization, users can transform ideas into detailed 3D models.
DreamFusion uses a pre-trained 2D text-to-image diffusion model to generate high-fidelity, relightable 3D objects. It optimizes a randomly initialized 3D model (a Neural Radiance Field) via gradient descent so that the resulting object can be viewed from any angle, relit under arbitrary illumination, or composited into any 3D environment. DreamFusion requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pre-trained image diffusion models as priors.
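The score distillation step at the heart of this approach can be sketched as follows (a simplified PyTorch illustration of the SDS gradient; unet, render, and alphas_cumprod are assumed placeholders for a pretrained noise predictor, a differentiably rendered NeRF image, and the diffusion noise schedule):

import torch

def sds_step(render, unet, text_emb, alphas_cumprod, num_steps=1000):
    """Push SDS gradients from a frozen diffusion model into the NeRF.

    render: (1, 3, H, W) image rendered differentiably from the NeRF parameters
    """
    t = torch.randint(20, num_steps - 20, (1,))            # random noise level
    alpha_bar = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(render)
    noisy = alpha_bar.sqrt() * render + (1 - alpha_bar).sqrt() * noise
    with torch.no_grad():
        pred = unet(noisy, t, text_emb)                    # predicted noise
    w = 1 - alpha_bar                                      # weighting term w(t)
    grad = w * (pred - noise)
    # backpropagate the SDS gradient through the rendered pixels into the NeRF
    render.backward(gradient=grad)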
Neuralangelo is an AI model from NVIDIA Research that uses neural networks for 3D reconstruction. It can turn 2D video clips into detailed 3D structures, generating realistic virtual buildings, sculptures, and other objects, and it accurately recovers textures from complex materials, including roof tiles, glass panes, and smooth marble. Creative professionals can import these 3D objects into design applications for further editing and use them in art, video game development, robotics, and industrial digital twins. Neuralangelo's 3D reconstruction capability will be a great help to creators, enabling them to recreate the real world in the digital world by importing detailed objects, from small sculptures to massive buildings, into virtual environments for applications such as video games or industrial digital twins.