Using AI for image transcripts, yay or nay?

Gonzako@lemmy.world · 7 days ago

Auster@thebrainbin.org · 6 days ago

Imo it’s a good use. But do make sure you read the outputs thoroughly. Even hand-made OCR tools can go crazy sometimes. Also, if the AI can be fully offline / self-hosted, that’s even better imo.

Kierunkowy74@piefed.zip · 7 days ago

Check the output, as it may be less accurate than what you would write yourself.

AI is able to describe a photo extensively, like those published on !pics@lemmy.world, but it fails to see what part of it is actually important, or to recognise the point of a meme. It will save you many keystrokes, but the result will probably still need to be corrected manually.

qaz@lemmy.world · 6 days ago

I’d say go ahead, but make sure it produces accurate enough results, and add something like [AI Transcribed] in front so people can take the potential for additional errors into consideration when reading it.

Also, if you’re using an online service, make sure it doesn’t use your uploaded images as training data. Many (probably almost all) artists / photographers won’t appreciate that.

placebo@lemmy.zip · 7 days ago

AI is great for this. We shouldn’t put people with disabilities at a disadvantage because of the anti-AI hysteria.

KatherinaReichelt@feddit.org · 5 days ago

I think that technology can really help us here. OCR on images is mostly solved. If you know what PaddleOCR can do, those people on Mastodon who are whining about others not including an image description for a screenshot seem really annoying. It is possible to do this directly on your computer, at no cost and without beefy hardware. So there is no need to try to force everyone else to include transcriptions for screenshots, and no need to attack other people: just do it yourself and enjoy the text on the screenshot.

This also kind of applies to AI image descriptions. Try it: put an image into Gemini and ask it to describe it. You will be surprised. AI can totally give you a workable description of an image. The problem here is that those AI tools can get quite expensive when you use them a lot, and many disabled people do not have much money. So in my opinion it is totally OK to include AI image descriptions.

I think that there are too many people in the fediverse who do not know the current state of the technology and hate AI, maybe for the right reasons, but who are missing out on how it could help them.

quediuspayu@lemmy.dbzer0.com · 7 days ago

If you can run it on your computer for a job that you would do anyway, I don’t see why not.

forestbeasts@pawb.social · 6 days ago

Do not.

Please just don’t.

People (hi I’m people) need what the image IS, what’s important about it, why you included it. Not just what some slop generator shat out about it.

Better to have nothing, which is at least honest, than to have something that PURPORTS to have meaning but then just, doesn’t.

– Frost

FaceDeer@fedia.io · 7 days ago

Give it a test and see how accurate it is, if it’s good enough then go ahead. People have been using AI-based OCR for literal decades already, nothing has fundamentally changed. There’s just a sudden moral panic about it lately.

vala@lemmy.dbzer0.com · 7 days ago

You have one unique advantage over a vision-impaired person when using AI for this: if the generated text is wrong, you know it and can correct it.

Rimu@piefed.social · 7 days ago

If I were blind I’d prefer it if the app just hid all image posts from me. The alt text, when it exists, is going to be trash most of the time anyway.

pruwyben@discuss.tchncs.de · 7 days ago

*yea or nay

technocrit@lemmy.dbzer0.com · 7 days ago

There’s no real problem here because “AI” doesn’t exist. A transcript program is certainly not “intelligent” or even “artificial” in any meaningful sense.

So, if you want to use an automated transcription program, I don’t see why not. Just check that it’s fairly accurate and not somehow nefarious.

Meldrik@lemmy.wtf · 7 days ago

Not sure you and the OP are on the same page? Or maybe I’m not.

OP is talking about alternative text for images, for people who can’t see. The alternative text is a description of the image. I’m not sure how you could generate alternative text automatically without AI.

If you are talking about OCR, even that is AI powered.

forestbeasts@pawb.social · 6 days ago

Eh? There’s plenty of non-“AI”-powered OCR, isn’t there? Like, that’s been a thing since long before “AI” slop generators.

(Like, mayyyybe there’s some kind of machine learning component, but even IF there is, surely you don’t have to run it through a slop generator to get a transcription?)

qaz@lemmy.world · 6 days ago

Almost all OCR tools use machine learning AFAIK; the commonly used Tesseract OCR software also uses a neural network.

It certainly isn’t AGI, but AI just means machine learning nowadays.

Lumidaub@feddit.org · 7 days ago

If you can get an AI to produce an actually useful description, that would be extremely interesting. However, AIs don’t know what’s important about an image and will fill up the description with useless information, effectively spam for the person that needs a description.

Write just a sentence, describe the thing that is important, while keeping in mind why you’re even posting the image, and it’s going to take less time than asking the AI.

Frank Heijkamp@mastodontech.de · 7 days ago

@Lumidaub
Writing a short description will be faster and more accurate.

It will take less time than checking and correcting the output of #ai.
@Gonzako

Gonzako@lemmy.world · 7 days ago

So you posted this from mastodon? Is @Lumidaub your tag there?

Lumidaub@feddit.org · 7 days ago

“@Lumidaub” is a reference to me. The system added that because they were, technically, replying to my comment here.

Gonzako@lemmy.world · 7 days ago

Gotcha, these look so full of links on my client.

Lumidaub@feddit.org · 7 days ago

Yep, same, it’s a bit of a weakness of the Fediverse imho.

HappyFrog@lemmy.blahaj.zone · 7 days ago

For those that need it, any description is better than none.

Lumidaub@feddit.org · 7 days ago

True, and one sentence written by a human who understands the image is better than twenty sentences by a word prediction machine.

HappyFrog@lemmy.blahaj.zone · 7 days ago

No matter how good human-written descriptions are, people just won’t write them. So having an automated system is far preferable.

Lumidaub@feddit.org · 7 days ago

I know what you’re saying, but I truly think most people are simply overthinking it. They think every single thing needs to be in the description, with references explained and sourced and whatnot. That does sound exhausting. And I have written a handful of descriptions like that for pictures where I thought the details were interesting enough to justify the effort. But really, a simple “The thirteenth Doctor and Rose Tyler embracing and deeply kissing” is already sufficient in most cases (add “standing on an asteroid in front of a field of glittering stars - digital colour painting” if you have the spoons). So imho it’s better to educate people and encourage short, concise descriptions than to give in to the slop.

x74sys@programming.dev · 7 days ago

Yeah, and apart from that, I imagine that people who need alt text don’t appreciate LLM output. It’s very boring. It’s either extremely technical and ice-cold or so cringe that you have to stop reading. Just what I think.

At least for me, if I realize that I’m reading an AI blog article or AI generated text in some other form, I don’t read it.

originalucifer@moist.catsweat.com · 7 days ago

personally, this is the kind of laser-focused tooling it's good for. LLMs are going to be critical to assisting the disabled in many contexts.