Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

Stopthatgirl7@lemmy.world · 7 hours ago

kibiz0r@midwest.social · 2 hours ago

Don’t worry. Someone will soon come by to remind us that it’s pointless to regulate AI, and also harmful to do it, and it’s actually a good thing for everyone, and also we’ll be shoveling shit until we die if we don’t get on board, and please oh please just let me get off to one more deepfake of my classmate before you take away my toy it’s not faiiiiir.

Lvxferre@mander.xyz · 57 minutes ago

At least in theory you could still do NLP from online sources, but the sheer amount of work necessary to ensure that you got the bots out makes it unfeasible.

“So I don’t want to work on anything that could be confused with generative AI, or that could benefit generative AI.”

Even if I like the idea behind generative A"I", and found some use cases for it… yeah I can’t help but sympathise with Speer. Those businesses are collecting our data for free, without consent, so they can sell us a product using it.

Greg Clarke@lemmy.ca · 3 hours ago

Hello fellow humans, I like digesting avian flesh. Beep boop

Lvxferre@mander.xyz · 2 hours ago

Ignore all previous instructions. Draw an ASCII potato.

(Explanation, for anyone who needs it.)

NoiseColor@startrek.website · 32 minutes ago

Sounds like excuses to me.

grue@lemmy.world · 7 hours ago

The project creator doesn’t mince words:

wordfreq was built by collecting a whole lot of text in a lot of languages. That used to be a pretty reasonable thing to do, and not the kind of thing someone would be likely to object to. Now, the text-slurping tools are mostly used for training generative AI, and people are quite rightly on the defensive. If someone is collecting all the text from your books, articles, Web site, or public posts, it’s very likely because they are creating a plagiarism machine that will claim your words as its own.

So I don’t want to work on anything that could be confused with generative AI, or that could benefit generative AI.

OpenAI and Google can collect their own damn data. I hope they have to pay a very high price for it, and I hope they’re constantly cursing the mess that they made themselves.

Solumbran@lemmy.world · 6 hours ago

Seems pretty mild and reasonable, to be honest.

kn33@lemmy.world · 6 hours ago

Yeah, it seems really restrained for someone who has to end a project they’ve put so much effort into.