lol
Joke’s on them, my face is dataset poisoning
I remember years ago someone in my class decided to make Russian-lookalike pictures of everyone in the class and post them on the doors as a gag. I forget what it was called, but several of my classmates were angry that the person had taken their pictures without consent and fed them to some weird Russian picture algorithm.
At this point in time, I have no doubt that all kinds of pictures and information about me are in the hands of people and companies I don’t care for. A lot of it is my own doing and some is out of my hands.
It is hard to avoid when you don’t have any control over your own information, because people share your pictures and your info without consulting you, all the time and without malice. It is what it is.
There’s stuff I could do, like remove tags from myself on fb (is that possible?) or delete my account, but it’s enough work and enough of a loss (what if I need to find an old contact) that I just ignore the problem.
It sure is possible, because I untagged myself from every picture people had tagged me in, deleted every comment I ever wrote and every picture I ever posted, and then deleted my Facebook account.
For years, the only thing that kept me on Facebook was that I had a few people I only had contact with through Messenger due to us being from different countries.
When I learned about Signal, I immediately got those people onto that app so we could stay in contact and then I went on a mass destruction rampage of my profile. Literally went from “but I have to keep it because of my connections” to “let me simulate digital dementia, bitch”.
I understand that most people can’t do what I did. For me it was several years of gradual detachment from the platform that made it super easy to pull the plug in the end. It’s a bit harder for those who actively use fb every day for social connections and jobs and so on. So I get it.
But yeah, you can’t really control whether or not people keep posting about you after you leave. I have already had that happen after visiting an old friend and honestly, I cannot bring myself to care about it.
Pff it’s easy
Cut contact with all friends and family, get plastic surgery, live as a hermit in the mountains
My dream. Get a death certificate and become invisible. Live in the mountains. Raise chickens. And live a peaceful life.
Aka living the dream
I’ve given up and assume that my friends and family have already handed over my contact info, pictures, messages, DNA, etc
Honestly giving up is reasonable. We need EVERYONE to respect privacy for this whole thing to work.
You could be the most privacy-focused individual and your mom’s Facebook page would still have your graduation picture, with the name of the high school you went to and your home address visible in the background somewhere.
That’s also ignoring how all of your actual personal information (full name, address, social security number, phone number, email, etc.) has already been leaked 16 times this year alone.
so what you are saying is that it’s already over and we lost.
Overwhelmingly so.
Sorry. :/
I didn’t know you minded.
Yes I have.
With a model I fine tuned myself and ran locally on my own hardware.
Suck it
yea this attitude right here is why ai bros are so beloved
Just curious, do you know, even as a rough estimate (maybe via the model card), how much energy was used to train the initial model, and if so, do you believe it was done in an ecologically justifiable way?
Just curious. Do you know how many children had a hand in making your electronics?
Just curious, do you know how many trees were MOLESTED to create that air you’re breathing?
I know at least seven were. It would’ve been more but I got a splinter and that really turned me off.
Yeah splinter is too vanilla, try rocksteady x shredder sub
- It’s obviously six children.
deleted by creator
Apologies for my sarcastic answer. I did actually look into that a while ago; I assumed most people already knew, but that’s not the case. The most useful resource I know of would probably be https://www.aspistrategist.org.au/uyghurs-for-sale-re-education-forced-labour-and-surveillance-beyond-xinjiang/ It isn’t specific to children, but it does show the process, and I’d argue it can be applied to whichever criteria one wants to focus on. It dates back a few years, to when I learned about the problem, so you might want to look for a more up-to-date source.
Let me know if you are looking for something more precise. I know of a few other tools that help you better understand who builds what and how, for electronics but for other products too.
Strawman much, or just learning about logistics and sourcing in our globalized supply chain?
Satirically pointing out that worrying about electricity usage for model creation is ridiculous.
It’s already spent. The model exists. It’s probably MORE moral to use it as much as possible to get some positive value out of it. Otherwise it was just wasted.
Yes indeed, yet my point is that we keep on training models TODAY, so if we keep on not caring, we just postpone the same problem, cf https://lemmy.world/post/30563785/17400518
Basically yes, use a trained model today if you want, but if we don’t set a trend, then despite the undeniable ecological impact there will be no corrective measure.
It’s not enough to just say “Oh well, it used a ton of energy. We MUST use it now.”
Anyway, my overall point was that training takes a ton of energy. I’m not asking you or OP or anyone else NOT to use such models. I’m solely pointing out that doing so without understanding the process that led to such models, including but not limited to the energy spent on training, is naive at best.
Edit: it’s also important to point out alternatives that are not models, namely that there are already plenty of specialized tools that are MORE efficient AND accurate today. So even if the model took a ton of energy to train, in such cases it’s still not rational to use it. It’s a sunk cost.
How much electricity was wasted for you to post, and us to receive, your human slop
FWIW the person I asked did reply, they don’t care: https://lemmy.world/post/30563785/17397024
Hope it helps.
Just curious, do you know how much energy went into powering every computer and office for the 3 years it took to make the latest video game/Hollywood movie/etc?
Should we ban every single non-essential thing in the world or only the ones you don’t enjoy?
And please hop off Lemmy, do you know how much power the devs used to program this site!
That’s been addressed a few times already, so I’ll let you check the history if you are actually curious.
Feel free to explain the down votes.
If it wasn’t clear, my point was that self-hosting mostly addresses privacy for the user, but that is only one dimension. It does not necessarily address the ecological impact. I was honestly hoping this community would care more.
What’s clear is that you don’t realize how much energy AI actually uses and you ate up the propaganda that you are spreading right now. A query that runs for 20s to generate an image on a card that draws at most 350 W during heavy gaming sessions isn’t magically going to doom the world. Chill out.
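To put a number on that, here’s my own back-of-envelope, assuming the 350 W and 20 s figures above are roughly right:

watts = 350                        # peak draw of the card, per the claim above
seconds = 20                       # duration of the image-generation query
kwh = watts * seconds / 3_600_000  # convert watt-seconds to kilowatt-hours
print(f"~{kwh:.4f} kWh per image") # comes out to roughly 0.002 kWh, i.e. about 2 Wh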
I’ll assume you didn’t misread my question on purpose, I didn’t ask about inference, I asked about training.
How much energy was used to bring the truckload of groceries into the shop that one time so hundreds of people can use it?
Great point, so are you saying there is a certain threshold above which training is energetically useful but below which it is not? E.g. if a large model is trained and then used by 1 person, it is not sustainable, but if 1 million people use it (assuming it’s used productively, not for spam or scams) then it is fine?
So you’re saying if 1 guy made 1 million results it would offset the training?
“I was honestly hoping this community to care more.”
So you want us all to go destroy paintings in an art gallery and superglue our hands to the road with you?
Please, do whatever you want to protect the environment you cherish. My point, though, was literally asking somebody who pointed to a better way of doing it whether they were aware of all the costs of their solution. If you missed it, their answer was clear: they do not know and they do not care. I was not suggesting activism, solely genuinely wondering if they actually understood the impact of the alternative they showcased. Honestly, just do whatever you can.
You looking for an excuse? No one else brought it up but if you need permission, go ahead. Thumbs up from me.
You know what, again maybe I’m misreading you.
If you do want to help, do try with me to answer the question. I did give a path to the person initially by mentioning the Model Card. Maybe you are aware of this, but just in case: a Model Card is basic metadata about a model, cf https://huggingface.co/docs/hub/model-cards
Some of them do mention a CO2 equivalent, see https://huggingface.co/docs/hub/model-cards-co2 I don’t know which model they used, but maybe finding the CO2 equivalent for the most popular models, e.g. DeepSeek, along with some point of comparison (they mentioned not driving a car) would help us all grasp at least some of the impact.
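Here is a rough sketch of how one could read that field programmatically, assuming the huggingface_hub Python package; the repo id below is just an example (not necessarily the model OP used), and many cards simply leave the field empty.

# Minimal sketch: read the optional co2_eq_emissions field from a model card.
from huggingface_hub import ModelCard

repo_id = "bigscience/bloom"  # example only; substitute the model you care about
card = ModelCard.load(repo_id)

# co2_eq_emissions is optional metadata, so it is often simply missing.
emissions = card.data.to_dict().get("co2_eq_emissions")
if emissions:
    print(f"{repo_id} reports: {emissions}")
else:
    print(f"{repo_id} does not declare CO2 metadata in its model card.")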
What do you think?
Don’t know. Don’t really care honestly. I don’t pay for hydro, and whatever energy expenditures were involved in training the model I fine-tuned are more than offset by the fact that I don’t and never will drive.
Don’t know. Don’t really care honestly […] offset by the fact that I don’t and never will drive.
That’s some strange logic. Either you do know and you can estimate that the offset will indeed “balance it out”, or you don’t, and then you can’t say one way or the other.
Running a 500W GPU 24/7 for a full year is less than a quarter of the energy consumed by the average automobile in the US (in 2000). I don’t know how many GPUs this person has or how long it took to fine-tune the model, but it’s clearly not creating an ecological disaster. Please understand there is a huge difference between the power consumed by companies training cutting-edge models at massive scale and speed, and a locally deployed model doing only fine-tuning and inference.
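For anyone who wants to sanity-check that comparison, here is a rough back-of-envelope; the mileage and fuel-economy figures are my own assumptions, not numbers from the comment above.

gpu_kwh = 0.5 * 24 * 365            # 500 W GPU running 24/7 for a year, about 4380 kWh

miles_per_year = 12_000             # assumed annual mileage
mpg = 22                            # assumed average fuel economy circa 2000
kwh_per_gallon = 33.7               # approximate energy content of a gallon of gasoline
car_kwh = miles_per_year / mpg * kwh_per_gallon  # about 18,400 kWh

print(f"GPU year: {gpu_kwh:.0f} kWh, car year: {car_kwh:.0f} kWh, "
      f"ratio: {gpu_kwh / car_kwh:.0%}")         # just under a quarter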
I specifically asked about the training part, not the fine-tuning, but thanks for clarifying.
Edit: you might be interested in helping with https://lemmy.world/post/30563785/17397757 please.
The point is that OP (most probably) didn’t train it — they downloaded a pre-trained model and only did fine-tuning and inference.
Right, that’s exactly my point though: OP, having just downloaded it, might not realize the training costs. They might be low, but on average they are quite high, at least relative to fine-tuning or inference. So my question was precisely to highlight that running a model locally while not knowing the training cost is naive, ecologically speaking. They did clarify that they do not care, so that’s coherent for them. I’m insisting on that point because others might think “Oh… I can run a model locally, then it’s not <<evil>>”, so I’m trying to clarify (and please let me know if I’m wrong) that it is good for privacy, but the upfront training costs are not insignificant and might lead some people to prefer NOT relying on models that are very costly to train, or even a totally different solution.
Herpa Derpa flurbidy
I see. Well, I checked your post history because I thought “Heck, they sound smart, maybe I’m the problem,” and my conclusion, based on the flowery language you often use with others, is that you are clearly provoking on purpose.
Unfortunately I don’t have the luxury of time to argue this way so I’ll just block you, this way we won’t have to interact in the future.
Take care and may we never speak again.
Erby glerby skeibledee thought terminating cliches groppily boop
Gosh so many real problems being solved with computers! I always knew they would be useful one day.
how naive of him to think companies didn’t already scrape his facial data from anywhere he might have had a picture 10 years ago
Yup. Last year some Harvard students put together a demo where they used Meta’s smart glasses and commercial apps to scan people’s faces, find their social media profiles, and summarize info about them, like where they live, work, their phone numbers, and names of their relatives in real time.
So basically Watch_Dogs profilers IRL
wow that’s evil
It’s cleverly addressing a valid point. If your face is visible on the internet it can be used in an ai database without your consent. That’s just where we’re at.
yeah fair enough but every use of the Studio Ghibli image generator is one too many
Your “facial data” isn’t private information. You give it away every time you go outside.
But your likeness does belong to you. Try making money off of an AI movie featuring Taylor Swift.
Don’t paparazzi make plenty of money off of selling unauthorized photos of celebrities? Celebrities can control some uses of their likeness, but not all of them.
True, though for now paparazzi photos generally are “here’s the celebrity in real life doing [x]” whereas AI is “celebrity never did this thing and we applied their image / voice to it like they did.” Really difficult for celebs to shut down tabloid or fan ai-generated garbage, but I think the bigger issue for them right now is film or music studios just using their likeness to keep the profits churning
every time you go outside.
You guys go outside? /j
You’re talking about the American concept of having no privacy in public. Not all countries are like that.
Your face being outside isn’t your “facial data”. For that you need at least the image itself, in good enough quality, linked to some piece of your identity, e.g. a name or social security number. If you just walk around and people take photos of your face, they don’t have your “facial data”. That’s the entire reason reverse image search and similar services exist: it is NOT an easy problem, technically speaking.
It’s already done, if you have any photographs of yourself on the internet. No need to fight that battle, accept and push forward.
And what if there’s no photograph of myself online?
Pics or it didn’t happen.
Be happy
I suppose I’ll accept it and just start pushing forward with setting fires. 🔥
I just saw an ad for a “training course” to “qualify” people to interact with AI as a profession.
Are you in Europe? The AI Act requires some unspecified “AI literacy” from staff working with AI. Some sort of grift, I guess.
Do you have a source for this? Not doubting you, I’m just not European so I must have missed this
I’m always glad when someone is interested and conscientious enough to ask for a source.
Article 4 in full:
Providers and deployers of AI systems shall take measures to ensure, to their best extent, a sufficient level of AI literacy of their staff and other persons dealing with the operation and use of AI systems on their behalf, taking into account their technical knowledge, experience, education and training and the context the AI systems are to be used in, and considering the persons or groups of persons on whom the AI systems are to be used.
AI Act -> https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
BTW. That site is the official repository for EU law. It’s also how EU law is promulgated. What you find there is, by definition, the correct version (unless stated otherwise).
Not at the time I saw it. It was just an internet ad.
Did they not give Twitter their facial data when they uploaded their avatar?
They do, but even if they didn’t, AI companies are going to take them anyway. Bots make up 50% of internet traffic. AI companies have ignored robots.txt entries. Anything publicly available, even if it’s behind a password, is accessible, since companies like Reddit sell that information.
Bots make up 50% of internet traffic.
I’ve read a study that claimed ads were 50% of traffic by data volume.
Is anyone actually still using the internet, or is it all ad networks sending crap to bots?
This is my source: Forbes.
The source of the article is the Imperva 2024 Bad Bot Report, but I cannot download the report. I do not know how they measured traffic; in this age of social media, I am going to guess it is by data volume and site visits.
Here’s the report:
https://files.catbox.moe/bm9n2c.pdf
Even if it’s hidden behind a password?
Like private subreddits or private messages.
If it’s on a billionaire’s computer, and they can read it, then yes. They’ll sell it, no questions asked.
E2E encrypted data is probably OK, as long as that person didn’t save it somewhere and upload it to a cloud backup.
Ah when stuff is behind a password but not encrypted and still on their servers. Yes.
Correct.
Reddit is about to make that somewhat more “public”; I heard they are changing PMs and DMs to a chat system.
ITT: People expecting the most basic of logic from a blue checkmark’s brain.
hmm. tools are useful for what they are designed for. maybe design a bot to design bots.
the cool thing about consent is that you’re allowed to attack everyone who pretends it isn’t real with any amount of force
i mean, I would allow you, but the law doesn’t unfortunately