July 14: Meta's entry into image generation and editing

cyu@sh.itjust.works · 1 year ago

July 14: Meta's entry into image generation and editing

fiat_lux@kbin.social · edit-2 1 year ago

it utilizes the power of attention mechanisms to weigh the relevance of input data

By applying a technique called supervised fine-tuning across modalities, Meta was able to significantly boost CM3leon’s performance at image captioning, visual QA, and text-based editing. Despite being trained on just 3 billion text tokens, CM3leon matches or exceeds the results of other models trained on up to 100 billion tokens.

That’s a very fancy way to say they deliberately focussed it on a small set of information they chose, and that they also heavily configure the implementation. Isn’t it?

This sounds like hard-wiring in bias by accident, and I look forward to seeing comparisons with other models on that…

From the Meta paper:

“The ethical implications of image data sourcing in the domain of text-to-image generation have been a topic of considerable debate. In this study, we use only licensed images from Shutterstock.
As a result, we can avoid concerns related to images ownership and attribution, without sacrificing performance.”

Oh no. That… that was the only ethical concern they considered? They didn’t even do a language accuracy comparison? Data ethics got a whole 3 sentences?

For all the self-praise in the paper about state of the art accuracy on low input, and insisting I pronounce “CM3Leon” as Chameleon (no), it would have been interesting to see how well it describes people, not streetlights and pretzels. And how it evaluates text/images generated from outside the cultural context of its pre-defined dataset.

Akasazh · 1 year ago

Wow, the face paint one looks like domestic violence…