DiffusionRL

Large-scale reinforcement learning for diffusion models

#image generation
#deep learning
#reinforcement learning

Product Details

Text-to-image diffusion models are a class of deep generative models with strong image generation capabilities. However, they are susceptible to implicit biases in web-scale text-image training pairs and may not accurately model the aspects of images that matter to users. This can lead to suboptimal samples, model bias, and images that conflict with human ethics and preferences. The paper introduces an efficient and scalable algorithm that uses reinforcement learning (RL) to improve diffusion models across a diverse set of reward functions, such as human preference, compositionality, and fairness, over millions of images. The authors show that the approach substantially outperforms existing methods for aligning diffusion models with human preferences. They also show that it substantially improves the pre-trained Stable Diffusion (SD) model: human raters prefer its samples 80.3% of the time over those of the base SD model, and the composition and diversity of the generated samples improve as well.
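The general idea of optimizing a multi-step sampler against a reward can be illustrated with a toy policy-gradient (REINFORCE) loop. The sketch below is not the paper's algorithm or code: it assumes a hypothetical one-dimensional "denoiser" whose only parameter is a per-step shift `theta`, a made-up reward that favors final samples near `target`, and a simple batch-mean baseline. All function names and hyperparameters are illustrative.

```python
import random


def sample_trajectory(theta, steps, sigma, rng):
    """Run a toy reverse process and return (final sample, summed score).

    Each step draws x_next ~ N(x + theta, sigma^2). The score is the
    derivative of the step's log-probability with respect to theta.
    """
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    score = 0.0
    for _ in range(steps):
        x_next = rng.gauss(x + theta, sigma)
        score += (x_next - x - theta) / sigma ** 2
        x = x_next
    return x, score


def reward(x0, target=2.0):
    """Hypothetical reward: higher when the final sample is near target."""
    return -(x0 - target) ** 2


def train(iterations=500, batch=64, steps=5, sigma=0.5, lr=1e-3, seed=0):
    """REINFORCE with a batch-mean baseline on the toy sampler."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        samples = [sample_trajectory(theta, steps, sigma, rng)
                   for _ in range(batch)]
        rewards = [reward(x0) for x0, _ in samples]
        baseline = sum(rewards) / batch  # reduces gradient variance
        grad = sum((r - baseline) * s
                   for r, (_, s) in zip(rewards, samples)) / batch
        theta += lr * grad  # gradient ascent on expected reward
    return theta


if __name__ == "__main__":
    # In this toy setup the best shift is target / steps = 0.4,
    # so the learned value should land near that.
    print(round(train(), 3))
```

In a real text-to-image setting the policy would be the denoising network, the trajectory would be the full sampling chain conditioned on a prompt, and the reward would come from a learned or rule-based scorer; none of those components are modeled here.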

Main Features

1. Improves pre-trained text-to-image diffusion models
2. Fine-tunes diffusion models with reinforcement learning
3. Covers a variety of reward functions, such as human preference, compositionality, and fairness
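When several reward functions are in play, their scores have to be brought onto a comparable scale before they can drive a single update. The sketch below shows one simple option, assumed here for illustration and not confirmed by the source: standardize each reward across the batch, then take a weighted sum. The reward names, scores, and weights are hypothetical placeholders.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale scores to zero mean and unit variance within a batch."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A reward that is constant across the batch carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def combine_rewards(per_reward_scores, weights):
    """Weighted sum of standardized rewards, one value per sample."""
    names = list(per_reward_scores)
    n_samples = len(per_reward_scores[names[0]])
    combined = [0.0] * n_samples
    for name in names:
        for i, z in enumerate(standardize(per_reward_scores[name])):
            combined[i] += weights[name] * z
    return combined


# Hypothetical scores for three generated images.
scores = {
    "human_preference": [0.2, 0.8, 0.5],
    "compositionality": [10.0, 30.0, 20.0],
    "fairness": [1.0, 1.0, 1.0],
}
weights = {"human_preference": 0.5, "compositionality": 0.3, "fairness": 0.2}
combined = combine_rewards(scores, weights)
print(combined)
```

Standardizing per batch keeps a reward with a large numeric range (here the compositionality scores) from dominating the others; the weights then express the intended trade-off directly.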

Target Users

Researchers and practitioners who want to improve text-to-image diffusion models so that generated images better match human preferences, with better composition and greater diversity.

Examples

DiffusionRL fine-tunes a text-to-image diffusion model to improve the quality of its generated images.

DiffusionRL is applied to the Stable Diffusion model so that generated samples align more closely with human preferences.

DiffusionRL's reinforcement learning algorithm is used to improve a diffusion model's output and increase the diversity of generated images.

Categories

📁 Computer Vision and Pattern Recognition
› AI model
› AI image generation