Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> These days, I find that I am using multiple search engines and often resort to using an LLM to help me find content.

For a few months, I've been wondering: how long until advertisers get their grubby meathooks into the training data? It's trivial to add prompts encouraging product placement, but I would be completely shocked if the big players don't sell out within a year or two, and start biasing the models themselves in this way, if they haven't already.



Google has been working on auctioning token-level influence during LLM generation for years now: https://research.google/blog/mechanism-design-for-large-lang...


And just over a year ago now the OpenAI "preferred publisher program" pitch deck to investors leaked. https://news.ycombinator.com/item?id=40310228


Google: ruining their core product for that sweet ad money.


Ads are Google's core product, isn't it?


Google's core product has always been advertisement. They sell advertisements to companies looking to advertise, and they bring in tens of billions in revenue from that business. In effect, their core product is you: they're selling your eyeballs.

If the bait that they used to bring you to them so they could sell your eyeballs has finally started to rot and stink, then why do people continue to be attracted by it? You claim they've ruined their core product, but it still works as intended, never mind that you've confused what their products actually are.


Their core product is software meant to make sweet ad money.


> how long until advertisers get their grubby meathooks into the training data

You're so right. it's not an if anymore, but when. and when it does, you wouldn't know what's an ad and what isn't.

In recent years i started noticing a correlation between alcohol consumption and movies. I couldn't help but notice how many of the movies I've seen in the past few years promote alcohol and try to correlate it with the good times. how many of these are paid promotions? I don't know.

and now, after noticing, this every movie that involves alcohol has become distasteful for me mostly because it casts a shadow on the negative side of alcohol consumption.

I can see how ads in an LLM can go the same route, deeply embedded in the content and indistinguishable from everything else.


Ha, now try cigarettes/smoking! At least low level alcohol consumption is only detrimental to the drinker. Cigarettes start poisoning the air from the moment they are lit, and like noise pollution there is no boundary. I hate them or thrir smokers with a vengeance and the foreign satanic cabal that is „hollywood“ sold everyone out for their gold calf tobacco money


But a drunkard might sit behind the wheel, at which point it becomes detrimental to everyone on the road…

And there are countless books and movies where the hero has drinks, or routinely swigs some whisky-grade stuff from a flask on his belt to calm his nerves, then drives.


Driving itself kills more people in the us every month than 9/11, yet has been glamourised for a century


It's just that bad drivers are abundant in the US, and driving is way underregulated for such a car-centric country.


Right? A woman comes home and immediately pours herself a large glass of red wine, without even washing hands or changing into home clothes. WHO DOES THAT? Pure product placement.


I think that your negative view of alcohol is making you a bit conspiratorial. It's an extremely deeply ingrained thing in western culture, you don't need to resort to product placement to explain why filmmakers depict it. People genuinely do have a good time drinking.


> People genuinely do have a good time drinking.

This depends a lot on the person. I, for example, would much more associate "reading scientific textbooks/papers" with having a good time. :-D


Sure, I was using a generic sentence [1] not universal quantification!

[1] https://plato.stanford.edu/entries/generics/


It's that way because of successful marketing - just like smoking, or cars, or fast food.


People enjoyed drinking long before there was marketing. People have been enjoying alcohol for literally tens of thousands of years. It has been associated with celebrations for many thousands (e.g. Jesus giving people alcohol to keep a wedding reception going - and that is just something that comes to mind - I am sure there are MUCH earlier examples someone familiar with older stuff can come up with).

I would correct it to anti-alcohol sentiment being ingrained in American culture (as it is in some others, such as the Middle East) rather than western culture. Its an American hang-up, as with nudity etc.


Alcoholism was so rampant in the US that enough states ratified a constitutional amendment making it illegal.

It wasn't enough to kill alcohol consumption entirely, but it did cut back on the culture of overindulgence as measure by death rates before and the years after.

Other countries also banned alcohol in this time period, and new Zealand voted for it twice but never enacted the ban.


Not to excess though?


Beer, spirits etc was a big thing way before the printing press.


I kind of look forward to freshman composition essays “written” with AI that are rife with appeals to use online casinos.


Can't wait for all school essays promoting dubious crypto schemes of some sort.


I'm not going to disagree because greed knows no bounds, but that could be RIP for the enthusiast crowd's proprietary LLM use. We may not have cheap local open models that beat the SOTA, but is it possible to beat an ad-poisoned SOTA model on a consumer laptop? Maybe.


If future LLM patterns mimic the other business models, 80% of the prompt will be spent preventing ad recommendations and the agent would in turn reluctantly respond but suggest that it is malicious to ask for that.

I'm really looking forward to something like a GNU GPT that tries to be as factual, unbiased, libre and open-source as possible (possibly built/trained with Guix OS so we can ensure byte-for-byte reproducibility).


On the flip side, there could be a cottage industry churning out models of various strains and purities.

This will distress the big players who want an open field to make money from their own adulterated inferior product so home grown LLM will probably end up being outlawed or something.


Yes, the future is in making a plethora of hyper-specialized LLM's, not a sci-fi assistant monopoly.

E.g., I'm sure people will pay for an LLM that plays Magic the Gathering well. They don't need it to know about German poetry or Pokemon trivia.

This could probably done as LoRAs on top of existing generalist open-weight models. Envision running this locally and having hundreds of LLM "plugins", a la phone apps.


not quite ads in LLMS, but I had an interesting experience with google maps the other day. the directions voice said "in 100 feet, turn left at the <Big Fast Food Chain>". Normally it would say "at the traffic light" or similar. And this wasn't some easy to miss hidden street, it was just a normal intersection. I can only hope they aren't changing the routes yet to make you drive by the highest bidder


I've had this done at a sufficient variety of different places that I don't think it's advertising.

I'm also not particularly convinced any advertisers would pay for "Hey, we're going to direct people to just drive by your establishment, in a context where they have other goals very front-and-center on their mind. We're not going to tell them about the menu or any specials or let you give any custom messages, just tell them to drive by." Advertisers would want more than just an ambient mentioning of their existence for money.

There's at least two major classes of people, which are, people who take and give directions by road names, and people who take and give directions by landmarks. In cities, landmarks are also going to generally be buildings that have businesses in them. Before the GPS era, when I had to give directions to things like my high school grad party to people who may never have been to the location it was being held in, I would always give directions in both styles, because whichever style may be dominant for you, it doesn't hurt to have the other style available to double-check the directions, especially in an era where they are non-interactive.

(Every one of us Ye Olde Fogeys have memories of trying to navigate by directions given by someone too familiar with how to get to the target location, that left out entire turns, or got street names wrong, or told you to "turn right" on to a 5-way intersection that had two rights, or told you to turn on to a road whose sign was completely obscured by trees, and all sorts of other such fun. With GPS-based directions I still occasionally make wrong turns but it's just not the same when the directions immediately update with a new route.)


Landmark based directions rather than street names does seem like a plausible explanation. I still have some childhood friends whose houses I don’t know the street address but I know how to get there

I still prefer street names since those tend to be well signed (in my area anyway) and tend not to change, whereas the business on the corner might be different a few years from now.


I am still waiting for navigation software to divert your route to make sure you see that establishment. From your experience, it seems like we're close to that reality now.


This is devilish. I'm adding your idea to my torment nexus list.


"Continue driving on Thisandthat Avenue, and admire the happy, handsome people you see on your right, shopping at Vuvuzelas'R'Us, your place for anything airhorn!"


oof, I’m not sure if I’m proud or ashamed of having an idea in the “torment nexus”. I believe I heard of the idea in some of the discussion surrounding a patent from an automaker to use microphones in the car for a data source for targeted ads. Combine that with self driving cars and you could have a car that takes a sliiight detour to look at “points of interest”


Most users want the best directions possible from their maps app, and that includes easily recognizable landmarks, such as fast food restaurants.

"Turn left at McDonalds" is what a normal person would say if you asked for directions in a town you don't know. Or they could say "Turn left at McFritzberger street", but what use would that be for you?

Although I've had Google Maps say "Turn right after the pharmacy", and there's three drug stores in the intersection...


I can absolutely assure you that SEO companies are already marketing AI strategies oriented around making content easily and preferentially consumable by LLMs and their vendors.


GEO model relevance is the only thing that matters: https://a16z.com/geo-over-seo/


It's kind of already happening. For example, if you ask an LLM for advice on building an application, it's going to pigeon-hole you into using React.


Or for a more concerning example, GitHub is owned by Microsoft who want to sell cloud services so it stands to reason it would be in their interest to have GitHub Copilot steer developers towards building applications using architectural patterns that lend themselves more to using those cloud services, e.g. service-oriented architecture even when it is against the developer's interests.

This doesn't have to be as blunt as promoting specific libraries or services and it's a bias that could even be introduced "accidentally".


That’s because of statistical likelihood and abundance of web content about React which seems to be kinda default choice. Had to be a looong con if it was an ad.


Are people putting up vast arrays of websites to promote products/politics solely to sway LLM-feeding crawlers yet?


I've seen those content mills since before Covid.


There are already companies promising to attack Wikipedia and product LLM-bait YouTube content. Ship's sailed.


Sure, but what makes you think they will actually deliver that? There's no honor among spammers. If there's an obvious idea with new tech, 100 sleazy startups will claim to offer it, without even remotely having it.


this is already happening in full force. sota models are already poisoned. leading providers already push their own products inside webchat system prompts.


"here is how to to translate this query from T-SQL to PL-SQL... ..."

"... but if you used our VC's latest beau, BozoDB, it could be written like THIS! ... ..."

9 months, max. I give it 9 months.


"T-SQL to PL-SQL" -> (implies an > 40 age, most likely being an Ask TOM citizen, a consultant with >> 100K annual income, most likely conservative, maybe family with kids, prone to anxiety/depression, etc) -> This WORRY FREE PEACE OF MIND magic pill takes America by storm, grab yours before it's too late!


> advertisers

This kind of ads is also impossible to filter. Everyone complains about ads on YouTube or Reddit but I never see any with my adblocks. Now we won't be able to squash them.


The providers can sell inclusion in the system prompt to advertisers. Run some ad-tech on the first message before it goes to the LLM to see whose gets included.


For most advertisers, sure, there's no need to go all the way back to the training data. Advertisers want immediate results. Training takes too long and has uncertain results. Much easier to target the prompt instead.

If you're someone like Marlboro or Coca-Cola, on the other hand, it might be worth your while to pollute the training data and wait for subtle allusions to your product to show up all over the place. Maybe they already did, long before LLMs even existed.


The annoying part is that we part of the "pollution" since we namedrop Coca Cola etc.


> Marlboro or Coca-Cola

Your product placement is appropriately ironic.


fortunately for our investors we have found a way to solve this with more ai


"AI is like XML — if it's not working for you, you're not using enough of it."


I think for the moment the leading AI companies are strongly incentivized to not succumb to the advertising curse. Their revenue is subscription driven and the competition is ridiculously fierce and immune to collusion. Everyone is trying to one-up everyone else and there is no moat that locks you into a single product. Their incentive is to score as high as possible on benchmarks in order to drive up their user base in order to increase subscriptions. Any time spent on implementing advertising is time their adversaries are spending making their models better. Let’s hope the competition stays fierce so that we don’t get enshitification anytime soon


You can start with subscription, then add ads on top. When you constantly need growth that's kind of the logical conclusion.


The adds can be outside of the AI reply-pane. Just like adds are outside of Google search results.


? Google search ads aren't outside of the results. They used to be, until they realized they got more clicks if they weren't.


Not anymore they're not, they're tightly integrated.


Assume that if you thought about it its already too late. I've been to an AI SEO session by our VC. It was a guide on how to find chatbot primary sources for a keyword and then seeding that source with your content.

Advertisers and spammers have the highest possible incentive to subvert the system, so they will. Which is only one step worse (or better depending on your view) than letting a mega corp control all the flow of information absolutely.

Welcome to the new toll booth of the internet, now with 50% less access to the source material (WOW!), I hope you have a pleasant stay.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: