AI-generated image of a girl lying on the grass using Stable Diffusion 3.
On Wednesday, Stability AI released the weights for Stable Diffusion 3 Medium, an AI image synthesis model that turns text prompts into AI-generated images. But the model’s launch has been met with ridicule online because it generates images of humans in a way that seems like a step back from other state-of-the-art image synthesis models, such as Midjourney and DALL-E 3. As a result, it readily produces anatomically incorrect visual anomalies.
A Reddit thread titled “Is this release supposed to be a joke? [SD3-2B]” details SD3 Medium’s notable failures at rendering humans, particularly extremities such as hands and feet. Another thread, “Why is SD3 so bad at generating girls lying on grass?”, points out similar issues with the entire human body.
Hands have traditionally posed a challenge for AI image generators due to a lack of good examples in early training data sets, but several image synthesis models seem to have overcome this challenge recently. In that sense, SD3 strikes image synthesis enthusiasts on Reddit as a major step backward, especially when compared to recent Stability releases such as November’s SD XL Turbo.
“It wasn’t that long ago that StableDiffusion was competing with Midjourney, but now it seems like a joke in comparison. At least our dataset is safe and ethical!” wrote one Reddit user.
AI-generated image created using Stable Diffusion 3 Medium.
AI-generated image of a woman lying on grass, created using Stable Diffusion 3.
AI-generated image using Stable Diffusion 3 showing an injured hand.
An AI-generated SD3 Medium image created by a Reddit user with the prompt “Woman in a dress on the beach.”
An AI-generated SD3 Medium image created by a Reddit user with the prompt “Photo of someone taking a nap in their living room.”
Fans of AI imagery have so far blamed Stable Diffusion 3’s anatomical flaws on Stability’s insistence on filtering out adult content (often referred to as “NSFW” content) from the SD3 training data that teaches its models how to generate images. “Believe it or not, this happens because heavily censoring a model also removes human anatomy,” one Reddit user wrote in the thread.
Essentially, whenever a user prompt focuses on a concept that isn’t well-represented in the AI model’s training dataset, the image synthesis model crafts its best interpretation of what the user is asking for — and sometimes it’s downright terrifying.
Stable Diffusion 2.0, released in 2022, ran into similar issues with its depictions of humans, and AI researchers quickly discovered that censoring adult content, including nudity, can significantly reduce an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some of the ability lost to aggressive NSFW filtering.
Another issue can arise during pre-training when the NSFW filters researchers use to remove adult images from the dataset are too strict, inadvertently discarding images that are not offensive at all and removing depictions of humans in certain situations from the model’s training data. “It works fine as long as there are no humans in the image; I think the improved nsfw filter for filtering training data now decides that anything humanoid is nsfw,” one Reddit user wrote on the topic.
I ran some test prompts through Hugging Face’s free online demo of SD3 and saw results similar to what others have reported: for example, the prompt “man showing his hands” returned an image of a man holding up two giant, backwards hands, though each hand did at least have five fingers.
An example of an SD3 Medium image generated from the prompt “Woman lying on beach.”
An example of an SD3 Medium image generated from the prompt “Man showing his hands.”
An example of an SD3 Medium image generated from the prompt “Woman showing hands.”
An example of an SD3 Medium image generated from the prompt “Muscular barbarian with weapon next to CRT TV, cinematic 8K, studio lighting.”
An example of an SD3 Medium image generated from the prompt “Cat with a beer can in a car.”
Serious problems at Stability
Stability announced Stable Diffusion 3 in February, and the company plans to offer it in a variety of model sizes. Wednesday’s release is the “Medium” version, a 2-billion-parameter model. In addition to downloading the weights from Hugging Face, users can experiment with the model through the company’s Stability Platform. The weights are free to download and use only under a non-commercial license.
Soon after the February announcement, the release of the SD3 model weights kept slipping, and rumors spread that the delay was due to technical issues or mismanagement. Stability AI has recently been in financial trouble: founder and CEO Emad Mostaque resigned in March, followed by a series of staff cuts. Just before that, three key engineers, Robin Rombach, Andreas Blattmann, and Dominik Lorenz, left the company. The company’s problems go back even further, with reports of its precarious financial situation circulating since 2023.
Some Stable Diffusion fans see the failure of Stable Diffusion 3 Medium as a clear sign of mismanagement at the company, and that things are falling apart. The company has not filed for bankruptcy, but SD3 Medium has led some users to make dark jokes about the possibility.
“I think I can now safely and ethically file for bankruptcy [sic],” one Reddit user joked. As it turns out, that is not the case.