Large-scale reinforcement learning for diffusion models
Text-to-image diffusion models are deep generative models with excellent image generation capabilities. However, they are susceptible to implicit biases from web-scale text-image training pairs and may not accurately model the image aspects we care about. This can lead to suboptimal samples, model bias, and images that are inconsistent with human ethics and preferences. This paper introduces an efficient and scalable algorithm that leverages reinforcement learning (RL) to improve diffusion models across diverse reward functions, such as human preference, compositionality, and fairness, over millions of images. We show that our approach substantially outperforms existing methods at aligning diffusion models with human preferences. We further show that it significantly improves the pre-trained Stable Diffusion (SD) model, generating samples that are preferred by humans 80.3% of the time, while improving the composition and diversity of the generated samples.
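The core idea of optimizing a generative model against a scalar reward can be sketched with a reward-weighted policy-gradient (REINFORCE) update. The toy below is a minimal illustration, not the paper's actual algorithm: it treats a one-step "sampler" as a Gaussian policy with a learnable mean and nudges it toward samples that score higher under a stand-in reward. All names (`SIGMA`, `TARGET`, `reward`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: the "model" samples x ~ N(theta, SIGMA^2), and the reward
# prefers samples near TARGET (a proxy for a human-preference score).
SIGMA = 1.0
TARGET = 2.0

def reward(x):
    return -(x - TARGET) ** 2

def grad_log_prob(x, theta):
    # d/dtheta of log N(x; theta, SIGMA^2) = (x - theta) / SIGMA^2
    return (x - theta) / SIGMA ** 2

theta = 0.0  # learnable parameter of the toy sampler
lr = 0.05
for _ in range(500):
    xs = theta + SIGMA * rng.standard_normal(64)   # sample a batch
    rs = reward(xs)
    rs = rs - rs.mean()                            # baseline reduces variance
    grad = np.mean(rs * grad_log_prob(xs, theta))  # REINFORCE estimator
    theta += lr * grad
```

After training, `theta` drifts toward `TARGET`: the sampler has been fine-tuned to produce higher-reward outputs. In the diffusion setting, the same gradient is instead taken through the log-probabilities of each denoising step, with the reward computed on the final image.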
DiffusionRL improves text-to-image diffusion models, increasing the human preference, compositionality, and diversity of generated images. Applied to the pre-trained Stable Diffusion model, it produces samples that are more consistent with human preferences while broadening sample diversity.