
DeepFloyd-IF: A Pixel-Based Triple-Cascaded Diffusion Model for Photorealistic Text-to-Image Generation

IF-I-XL-v1.0

DeepFloyd-IF is a pixel-based, triple-cascaded text-to-image diffusion model that generates images with state-of-the-art photorealism and language understanding. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID-30K score of 6.66 on the COCO dataset.

Inspired by "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding".

Model Details

Developed by: DeepFloyd, StabilityAI
Model type: pixel-based text-to-image cascaded diffusion model
Cascade Stage: I
Num Parameters: 4.
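For readers unfamiliar with the FID figure quoted above, the following is a minimal sketch of how a Fréchet distance between two sets of image features can be computed. It assumes the usual definition of FID (the Fréchet distance between Gaussians fitted to the features of real and generated images). In practice those features come from an Inception-v3 network; here random vectors stand in for them. The function names `frechet_distance` and `fid_from_features` are illustrative only, and this is not the evaluation code behind the reported score.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 sigma2)^(1/2))
    """
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2. For covariance matrices these are real
    # and non-negative up to rounding, so drop tiny imaginary/negative parts.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fid_from_features(feats_a, feats_b):
    """Fit a Gaussian to each feature set (rows = samples) and compare them."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, sigma_a, mu_b, sigma_b)


# Stand-in features (hypothetical): 2000 samples of 8-dimensional vectors.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(2000, 8))
shifted_feats = real_feats + 0.5  # same spread, mean moved by 0.5 per dimension

same = fid_from_features(real_feats, real_feats)
moved = fid_from_features(real_feats, shifted_feats)
```

Identical feature sets give a distance of essentially zero, while shifting every feature by 0.5 leaves the covariance unchanged and adds only the squared mean offset (8 × 0.25 = 2.0). Lower values therefore indicate that generated images are statistically closer to the reference set.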