- cross-posted to:
- stablediffusion@lemmy.ml
This is the best summary I could come up with:
If you ask text-to-image generators like DALL-E to create a menu for a Mexican restaurant, you might spot some appetizing items like “taao,” “burto” and “enchida” amid a sea of other gibberish.
Meanwhile, when a friend tried to use Instagram’s AI to generate a sticker that said “new post,” it created a graphic that appeared to say something that we are not allowed to repeat on TechCrunch, a family website.
“Image generators tend to perform much better on artifacts like cars and people’s faces, and less so on smaller things like fingers and handwriting,” said Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute.
The algorithms are incentivized to recreate something that looks like what they’ve seen in their training data, but they don’t natively know the rules that we take for granted: that “hello” is not spelled “heeelllooo,” and that human hands usually have five fingers.
“Even just last year, all these models were really bad at fingers, and that’s exactly the same problem as text,” said Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta.
In one recent video, which was called a “prompt engineering hero’s journey,” someone painstakingly tries to guide ChatGPT through creating ASCII art that says “Honda.” They succeed in the end, but not without Odyssean trials and tribulations.
The original article contains 1,134 words; the summary contains 223 words. Saved 80%. I’m a bot and I’m open source!