
Truly fascinating, thanks for this.

What I find a little perplexing is that AI companies are supposedly annoyed at customers typing "please" in their prompts because it costs a small fortune at scale, yet they ship system prompts that take ten minutes for a human to read through.



> AI companies are annoyed that customers are typing "please" in their prompts as it supposedly costs a small fortune

They aren’t annoyed. The only thing that happened was that somebody wondered how much it cost, and Sam Altman responded:

> tens of millions of dollars well spent--you never know

https://x.com/sama/status/1912646035979239430

It was a throwaway comment that journalists desperate to write about AI leapt upon. It has as much meaning as when you see “Actor says new film is great!” articles on entertainment sites. People writing meaningless blather because they’ve got clicks to farm.

> yet they have system prompts that take 10 minutes for a human to read through.

The system prompts are cached; the endless variations on how people choose to be polite aren't.
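
For the curious, the mechanics are simple: the provider only gives the discount when the prompt prefix is byte-identical across requests. A rough sketch with Anthropic's documented prompt-caching API (the model name and prompt text are placeholders):

    import anthropic

    client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

    LONG_SYSTEM_PROMPT = "..."  # thousands of tokens, identical on every request

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder
        max_tokens=256,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # cache breakpoint: requests sharing this exact prefix
                # reuse the cached computation at a discounted rate
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # everything after the breakpoint, "please" included, is uncached
        messages=[{"role": "user", "content": "Please summarize this article."}],
    )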


My primary takeaway from the previous comment was not the reference to corporate annoyance, but the question of how to assess overly verbose replies. I can see that, on a large scale, those outputs might end up consuming a lot of human time and attention, which could (maybe) be mitigated.


> The system prompts are cached

The second line of Claude's system prompt contains the date and time. I wonder if they update the cache every minute, then, and whether it wouldn't have made more sense to put it at the bottom and cache everything above it.


That’s a good point; however, what it actually says is:

> The current date is {{currentDateTime}}.

The prose refers to the date alone, but the variable name is ambiguous. It says currentDateTime, and in Python, even though there's a date class, it's pretty common to use datetime objects even when all you need is the date. So depending on how that variable is formatted, the line could include the time, or just the date.
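
Concretely, a hypothetical Python rendering of that template could go either way (nobody outside Anthropic knows which they use):

    from datetime import date, datetime

    # date object: the rendered prompt only changes once a day
    print(f"The current date is {date.today()}.")
    # -> The current date is 2025-05-26.

    # datetime object, default formatting: changes on every call, which
    # would bust a byte-identical prefix cache on every request
    print(f"The current date is {datetime.now()}.")
    # -> The current date is 2025-05-26 14:32:07.123456.

    # datetime object formatted as a date: caches as well as date.today()
    print(f"The current date is {datetime.now():%Y-%m-%d}.")
    # -> The current date is 2025-05-26.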


Hah, yeah I think that "please" thing was mainly Sam Altman flexing about how many users ChatGPT has.

Anthropic announced the other day that they increased their maximum prompt-caching TTL from 5 minutes to an hour. Not surprising that they're investing effort in caching when their own prompts are this long!


What I find fascinating is that people still take anything Scam Altman says seriously after his track record of non-stop lying, scamming and bullsh*tting right in people's faces for years.

I can't really think of anything interesting or novel he's said that wasn't a scam or a lie.

Let's start by observing the "non-profit's" name...


But... but... he's innocent... can't you tell from his Ghibli avatar?


*clap clap clap*

Though the whole "What I find fascinating is that people still take anything ${A PERSON} says seriously after his track record of non-stop lying, scamming and bullsh*tting right in people's faces for years" routine has been done to death over the past few years. It's boring AF now. The only fun aspect is that the millions of people who do this all seem to think they're original.

I kindly suggest finding some new material if you want to pursue Internet standup comedy as a career or even a hobby. Thanks!


I didn't read it as attempted comedy. I am genuinely dismayed by how easy it is for grifters to continue to find victims long after being exposed.


"Attempted comedy" is the most charitable take I could give it (and truthfully, I implied something else).

My point is, their statement is quite obviously wrong, but it sure sounds nice. If you don't agree, I challenge you to provide that track record "of non-stop lying, scamming and bullsh*tting right in people's faces for years". Like, for real.

I'm not defending 'sama here; I'm not a fan of his either (but nor do I know enough about him to make definite accusations). It's a general point: the line I quoted is a common template, it's always a ham-fisted way of using emotion in lieu of an argument, and it's almost always pure bullshit in the literal sense. The ironic exception is politicians, where it's almost always true (it comes with the job), but no one minds when it's their favorite side.

Bottom line: it's not an honest framing and it doesn't belong here.


Name one example of something impressive HE built, did, or said that was not a lie or a scam.

You claim I'm "obviously" wrong. So where are the arguments?


I assume that they run the system prompt once, snapshot the state, then use that as the starting state for all users. In that sense, system prompt size is free.

EDIT: Turns out my assumption is wrong.


Huh, I can't say I'm on the cutting edge, but that's not how I understand transformers to work.

By my understanding, attention is calculated for each token against every previous token. I.e. the 10th token in the sequence requires O(10) new calculations (on top of the O(9^2) earlier calculations that can be cached). While I'd assume they cache what they can, that still means that if the long prompt doubles the total length of the final context (input + output), the final cost should be about 4x as much...
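
Back-of-the-envelope version of that, counting causal-attention pairs and ignoring per-token constants:

    def attention_pairs(n: int) -> int:
        # causal attention: token i attends to tokens 1..i
        return n * (n + 1) // 2

    print(attention_pairs(2_000) / attention_pairs(1_000))
    # ~4.0: doubling the total context quadruples the attention work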


This is correct. Caching only saves you from having to recompute self-attention over the system prompt tokens, but not the attention from subsequent tokens, which are free to attend to the prompt.


My understanding is that even though it's quadratic, the cost at most context lengths is still relatively low. So for short inputs it's not bad, and for long inputs the system prompt is a small fraction of the total anyway.

And there's value in the extra tokens even when they carry little information, since the models are decent at using the extra computation.
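
Same back-of-the-envelope arithmetic, with made-up but plausible token counts, shows how the relative overhead shrinks:

    def pairs(n: int) -> int:
        return n * (n + 1) // 2

    SYS = 2_500  # hypothetical system prompt length in tokens
    for user in (200, 20_000, 200_000):
        print(user, pairs(SYS + user) / pairs(user))
    # 200      ~181x relative overhead, but tiny in absolute terms
    # 20000    ~1.27x
    # 200000   ~1.03x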


They can’t complain, because the chat interface is a skeuomorph of a conversation.


Why not just strip “please” from the user input?


You'd immediately run into the clbuttic Scunthorpe problem.
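
For anyone who missed that meme, "clbuttic" is itself the output of exactly this kind of filter:

    def naive_filter(text: str) -> str:
        # blunt substring replacement, the heart of the Scunthorpe problem
        return text.replace("ass", "butt")

    print(naive_filter("a classic assassination"))
    # -> "a clbuttic buttbuttination"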


You can’t strip arbitrary words from the input because you can’t assume their context. The word could be an explicit part of the question or a piece of data the user is asking about.


Each call goes through an LLM-lite categorizer (NNUE mixed with Deeplearning) and the resulting body has something along the lines of a "politenessNeededForSense: boolean". If it is false, you can trust we remove all politeness before engaging with Claude 4. Saved roughly $13,000,000 this FY


Seems like you could detect whether it matters. If "please" is the first or last word, the user is addressing the model and you can strip it; otherwise you can't.
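
Something like this, say (a purely hypothetical sketch):

    def strip_courtesy(prompt: str) -> str:
        # drop "please" only when it is the first or last word
        words = prompt.split()
        if words and words[0].strip(",.!?").lower() == "please":
            words = words[1:]
        if words and words[-1].strip(",.!?").lower() == "please":
            words = words[:-1]
        return " ".join(words)

    print(strip_courtesy("Please summarize this file."))
    # -> "summarize this file."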


That’s such a naive implementation. “Translate this to French: Yes, please”


It's very naive, but worth looking into. You could always test it, if one word is really costing that much money. Or build another, smaller model that detects whether it's part of the important content.


There are hundreds of other opportunities for cost savings and efficiency gains that don’t have a visible UX impact. The trade-off just isn’t worth it outside of some very specialized scenarios where the user is sophisticated enough to deliberately omit the word anyway.


They would write “How do you say ‘yes please’ in French”. Or “translate yes please in French”.

To think that a model that can code for us wouldn't be capable of knowing that this instance of please is important is crazy.


Or you could just not bother with dealing with this special case that isn't actually that expensive.


It'd run into all sorts of issues. Although AI companies losing money on user kindness is not our problem; it's theirs. The more they want to make these 'AIs' personable, the more of it they'll get.

I'm tired of the AIs saying 'SO sorry! I apologize, let me refactor that for you the proper way' -- no, you're not sorry. You aren't alive.


The obsequious default tone is annoying, but you can always prepend your requests with something like "You are a machine. You do not have emotions. You respond to exactly my questions, no fluff, just answers. Do not pretend to be a human."
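
If you're using the API rather than the chat UI, the same trick is just a system message. Rough sketch with the OpenAI Python client (model name is a placeholder):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            # the tone instruction rides along with every request
            {"role": "system", "content": "You are a machine. You do not have "
             "emotions. Answer exactly what is asked, no fluff, no apologies."},
            {"role": "user", "content": "Refactor this function to avoid the copy."},
        ],
    )
    print(response.choices[0].message.content)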


I would also add "Be critical."


Like what issues?


Prompts such as 'the importance of please and thank you' or 'How did this civilization please their populace with such and such'. I'm sure with enough engineering it can be fixed, but there are always use cases where something like that would be like 'Damn, now we have to add an exception for...', then another exception, then another.


Why not just strip "" from the user input?


To be fair, OpenAI had good guidelines on how best to use ChatGPT on their GitHub page very early on. Except GitHub is not really consumer-facing, so most of that info was lost in the sauce.


Link?



Thanks.


If a user says "thank you" as a separate message, that request still has to resend all the tokens from the system message plus the previous state of the chat. It's not about the single word "please".
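
Rough token accounting, with made-up but plausible numbers:

    # every follow-up message resends the whole conversation prefix
    system_prompt = 2_500  # hypothetical token counts
    chat_so_far   = 1_500
    thank_you     = 3

    print(system_prompt + chat_so_far + thank_you)
    # 4003 input tokens billed; the "thank you" itself is a rounding error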

That said, no one was "annoyed" at customers for saying please.



