As a visually impaired person on the internet: YES! Welcome to our world!
If you’re lucky, you get an image description that actually describes the image.
That description rarely tells you if it’s AI generated, and that’s if the description writer even knows themselves.
Everyone in the comments is saying “look at the hands, that’s AI generated”, and I’m sitting here thinking I just have to trust the discussion, because that image, just like every other image I’ve ever seen, is hard to fully decipher visually, let alone check for evidence of AI.
Honestly, auto generating text descriptions for visually impaired people is probably one of the few potential good uses for LLM + CLIP. Being able to have a brief but accurate description without relying on some jackass to have written it is a bonefied good thing. It isn’t even eliminating anyone’s job since the jackass doesn’t always do it in the first place.
The models that do that now are very capable but aren’t tuned properly IMO. They are overly flowery and sickly positive even when describing something plain. Prompting them to be more succinct only has them cut themselves off and leave out important things. But I can totally see that improving soon.
Unfortunately the models have been trained on biased data.
I’ve run some of my own photos through various “lens” style description generators as an experiment and knowing the full context of the image makes the generated description more hilarious.
Sometimes the model tries to extrapolate context. For example, it will randomly decide to describe an older woman as a “mother” if there is also a child in the photo, even if a human eye could tell you from context it’s more likely a teacher and a student. There’s a lot a human can do that a bot can’t, including having the common sense to use appropriate language when describing people.
Image descriptions will always be flawed because the focus of the image is always filtered through the description writer. It’s impossible to remove all bias. For example, because of who I am as a person, it would never occur to me to even look at someone’s eyes in a portrait, let alone write what colour they are in the image description. But for someone else, eyes may be super important to them, they always notice eyes, even subconsciously, so they make sure to note the eyes in their description.
I’ve never seen a good answer to this in accessibility guides. Would you mind making a recommendation? Is there any preferred alt text for something like:
“clarification image with an arrow pointing at object”
“Picture of a butt selfie, it’s completely black”
“Picture of a table with nothing on it”
“example of lens flare shown from camera”
“N/A” dangerous
Sometimes an image is clearly only useful as a visual aid. I feel like “” (excluding it) makes people feel like they are missing the joke. But given it’s an accessibility tool, unneeded details may waste your time.
I guess my question would be: why do you need the picture as a visual aid? Is the accompanying body text confusing without it? If so, by having no alt text you accept that you will leave VI people confused, and only sighted people will get the clarification they need.
If you’re including a picture of a table with nothing on it, there’s a reason, so yes, that alt text is perfectly reasonable.
Personally I wish there was a way to enable two types of alt text on images, for long and quick context.
Because I understand your concern about unnecessary detail. If I’m in a rush, “a table with nothing on it” will do for quicker context, but there are times when it’s appropriate to go much deeper: “a picture of a hardwood rustic coffee table, taken from a high angle, natural sunlight, there are no objects on the table.”
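For what it’s worth, you can get part of the way to the quick/long split with existing HTML: keep the short version in `alt` and point `aria-describedby` at a longer description elsewhere on the page. Below is a rough Python sketch of that idea, not a recommendation from any accessibility guide. The names `ImageDescription`, `render_img` and the default `img-desc` id are made up for illustration, and how a screen reader announces the long description varies, so treat it as an assumption-laden starting point.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ImageDescription:
    """Hypothetical container for two levels of description."""
    short: str      # quick context, goes in the alt attribute
    long: str = ""  # optional deeper description


def render_img(src: str, desc: ImageDescription, desc_id: str = "img-desc") -> str:
    """Build an <img> tag with short alt text and, if given, a linked long description."""
    safe_src = escape(src, quote=True)
    safe_alt = escape(desc.short, quote=True)
    if not desc.long:
        # Only the quick description is available.
        return f'<img src="{safe_src}" alt="{safe_alt}">'
    # The long description lives in its own element, referenced by id.
    return (
        f'<img src="{safe_src}" alt="{safe_alt}" aria-describedby="{desc_id}">\n'
        f'<p id="{desc_id}">{escape(desc.long)}</p>'
    )


table = ImageDescription(
    short="a table with nothing on it",
    long="a hardwood rustic coffee table, taken from a high angle, "
         "natural sunlight, there are no objects on the table",
)
print(render_img("table.jpg", table))
```

Someone in a rush gets the `alt` text alone, and the longer paragraph is there for when the extra detail matters.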
Alt text: a beautiful girl on a dock at sunset with some fugly hands and broken ass fingees
I am so sorry, and i agree with your point, but i really had a good laugh at my mental image of a bonefied good thing :-)
If you know already or it’s autocorrect, just ignore me, if not, it’s bona fide :-)
I’m sorry that you have to go through this stuff.
Is there no software that can just tell you if it’s AI generated or not?
They exist but none of them are perfect - they can’t possibly be perfect. It’s a bit of an arms race where AI images get more accurate and the detection software gets more particular to match. However, the economic incentives are on the side of the former.