> In 1920, there were 25 million horses in the United States, 25 million horses totally ambivalent to two hundred years of progress in mechanical engines.
But would you rather be a horse in 1920 or 2020? Wouldn't you rather have modern medicine, better animal welfare laws, less exposure to accidents, and so on?
The only way horses conceivably have it worse is that there are fewer of them (a kind of "repugnant conclusion")...but what does that matter to an individual horse? No human regards it as a tragedy that there are only 9 billion of us instead of 90 billion. We care more about the welfare of the 9 billion.
The equivalency here is not 9 billion versus 90 billion, it's 9 billion versus 90 million, and the question is what the decline looks like. Does it look like everyone's standard of living rising so high that the replacement rate drops into the single digits, or does it look like some version of Elysium, where millions have immense wealth and billions have nothing and die off?
> No human regards it as a tragedy that there are only 9 billion of us instead of 90 billion.
I have met some transhumanists and longtermists who would really like to see some orders of magnitude increase in the human population. Maybe they wouldn't say "tragedy", but they might say "burning imperative".
I also don't think it's clearly better for more beings to exist rather than fewer, but I just want to assure you that the full range of takes on population ethics definitely exists, and it's not simply a matter of straightforward common sense how many people (or horses) there ought to be.
Something I'm increasingly noticing about LLM-generated content is that...nobody wants it.
(I mean "nobody" in the sense of "nobody likes Nickelback", i.e., not literally nobody.)
If I want to talk to an AI, I can talk to an AI. If I'm reading a blog or a discussion forum, it's because I want to see writing by humans. I don't want to read a wall of copy+pasted LLM slop posted under a human's name.
I now spend dismaying amounts of time and energy avoiding LLM content on the web. When I read an article, I study the writing style, and if I detect ChatGPTese ("As we dive into the ever-evolving realm of...") I hit the back button. When I search for images, I use a wall of negative filters (-AI, -Midjourney, -StableDiffusion etc) to remove slop (which would otherwise be >50% of my results for some searches). Sometimes I filter searches to before 2022.
If Google added a global "remove generative content" filter that worked, I would click it and then never unclick it.
I don't think I'm alone. There has been research suggesting that users immediately dislike content they perceive as AI-created, regardless of its quality. This creates an incentive for publishers to "humanwash" AI-written content—to construct a fiction where a human is writing the LLM slop you're reading.
Falsifying timestamps and hijacking old accounts to do this is definitely something I haven't seen before.
So far (thankfully) I've noticed this stuff get voted down on social media but it is blowing my mind people think pasting in a ChatGPT response is productive.
I've seen people on reddit say stuff like "I don't know but here's what ChatGPT said." Or worse, presenting ChatGPT copy-paste as their own. It's funny because you can tell: the text reads like an HR person wrote it.
I've noticed the opposite actually, clearly ChatGPT written posts on Reddit that get a ton of upvotes. I'm especially noticing it on niche subreddits.
The ones that make me furious are on some of the mental health subreddits. People are asking for genuine support from other people, but are getting AI slop instead. If someone needs support from an AI (which I've found can actually help), they can go use it themselves.
You should have a lot less confidence in your ability to discern what's AI generated content, honestly. Especially in such contexts, where the humans will likely be writing very non-offensively in order not to trigger the OP.
I think some of that is the gamification of social media. "I have 1200 posts and you only have 500" kind of stuff. It's much easier to win the volume game when you aren't actually writing them. This is just a more advanced version of people who just post "I agree" or "I don't know anything about this, but...[post something just to post something]".
It's particularly funny/annoying when they're convinced that the fact they got it from the "AI" makes it more likely to be correct than other commenters who actually know what the heck they're talking about.
It makes me wonder how shallow a person's knowledge of all areas must be that they could use an LLM for more than a little while without encountering something where it is flagrantly wrong yet continues in the same tone of absolute confidence and authority. ... but it's mostly just a particularly aggressive form of Gell-Mann amnesia.
The problem with "provide LLM output as a service," which is more or less the best case scenario for the ChatGPT listicles that clutter my feed, is that if I wanted an LLM result...I could have just asked the LLM. There's maybe a tiny value proposition if I didn't have access to a good model, but a static page that takes ten paragraphs to badly answer one question isn't really the form factor anyone prefers; the actual chatbot interface can present the information in the way that works best for me, versus the least common denominator listicle slop that tries to appeal to the widest possible audience.
The other half of the problem is that rephrasing information doesn't actually introduce new information. If I'm looking for the kind of oil to use in my car or the recipe for blueberry muffins, I'm looking for something backed by actual data, to verify that the manufacturer said to use a particular grade of oil or for a recipe that someone has actually baked to verify that the results are as promised. I'm looking for more information than I can get from just reading the sources myself.
Regurgitating text from other data sources mostly doesn't add anything to my life.
Rephrasing can be beneficial. It can make things clearer to understand and learn from. In math, for example, something like Khan Academy or the 3Blue1Brown YouTube channel isn't presenting anything new, just rephrasing math in a different way that makes it easier for some to understand.
If LLMs could take the giant, overwhelming manual in my car and get out the answer to what oil to use, that would be useful and not new information.
I have to protest. A lot of 3b1b is new. Not the math itself, but the animated graphical presentation is. That's where the value from his channel comes in. He provides a lot of tools to visualize problems in ways that haven't been done before.
I guess I think of the visualizations, and the video as a whole, as a type of rephrasing. He's not the first person to try to visualize math concepts.
>If LLMs could take the giant, overwhelming manual in my car and get out the answer to what oil to use, that would be useful and not new information
You can literally just google that or use the appendix that's probably at the back of the manual. It's also probably stamped on the engine oil cap. It also probably doesn't matter and you can just use 10w40.
> If I'm reading a blog or a discussion forum, it's because I want to see writing by humans. I don't want to read a wall of copy+pasted LLM slop posted under a human's name.
This reminds me of the time around ChatGPT's release when Hacker News comments were filled with users saying "Here's what ChatGPT has to say about this"
Pepperidge Farm remembers a time when GPT-2 made no claims about being a useful information lookup tool, but was a toy used to write sonnets, poems, and speeches "in the style of X"...
Yup, I'm the same, and I love my LLMs. They're fun and interesting to talk to and use, but it's obvious to everyone that they're not very reliable. If I think an article is LLM-generated, then the signal I'm getting is that the author is just as clueless as I am, and there's no way I can trust that any of the information is correct.
> but it's obvious to everyone that they're not very reliable.
Hopefully to everyone on HN, but definitely not to everyone on the greater Internet. There are plenty of horror stories of people who apparently 100% blindly trust whatever ChatGPT says.
I was especially horrified/amused when students started turning in generated answers and essays, and /r/teaching learned that you could "ask chatgpt if it wrote the essay and it will tell you."
It makes perfect intuitive sense if you don't know how the things actually work.
Yeah that's fair, I suppose I see that sort of thing on reddit fairly regularly, especially in the "here's a story about my messed-up life" types of subreddits.
There was a post from one of those am I the asshole subreddits, about how OP had some issue with an overweight person trying to claim their seat on a plane. Thousands of upvotes and comments ensued supporting OP and blaming the overweight person.
Then 10 hours later OP edited the post and dropped the bomb: a screenshot of their prompt, “make a story for the am I the asshole subreddit that makes a fat person look bad,” followed by the post they pasted directly from ChatGPT. Only one comment was about the edit, and it completely missed the point, blaming OP for tricking them rather than the fact that probably every post on that subreddit and others like it is AI slop.
I think a good comparison is when you go to a store and there are salesmen there. Nobody wants to talk to a salesman. They can almost never help a customer with any issue, since even an ignorant customer usually knows more about the products in the store than the salesmen. Most customers hate salesmen, and a substantial portion of customers choose to leave the store or not enter because of them, meaning the store loses income. Yet this has been going on forever. So just prepare for the worst when it comes to AI, because that's what you are going to get, and neither ethical sense, business sense, nor any rationality is going to stop companies from shoving it down your throat. They don't give a damn if they lose income or even bankrupt their companies, because annoying the customer is more important.
This has been a constant back and forth for me. My personal project https://golfcourse.wiki was built on the idea that I wanted to make a wiki for golf nerds because nobody pays attention to 95% of fun golf courses because those courses don't have a marketing department in touch with social media.
I basically decided that using AI content would waste everyone's time. However, it's a real chicken-or-egg problem in content creation. Faking it to the point of project viability has been a real issue in the past (I remember the reddit founders talking about posting fake comments and posts from fake users to make it look like more people were using the product). AI is very tempting for something like this, especially when a lot of people just don't care.
So far I've stuck to my guns, and think that the key to a course wiki is absolutely having locals' insight into these courses, because the nuance is massive. At the same time, I'm trying to find ways that I can reduce the friction for contributions, and AI may end up being one way to do that.
This is a really interesting conundrum. And I'm a golfer, so...
Off the top of my head, I wonder if there's a way to have AI generate a summary from existing (online) information about a course, with a very explicit "this is what AI says about this course" or some similar disclosure, until you get 'real' local insight. No one could then say 'it's just AI slop', but you're still providing value as there's something about each course. As much as I personally have reservations about AI, I (personally, YMMV) am much more forgiving if you are explicit about what's AI and what's not and not trying to BS me.
This is a good suggestion, and I'll think long and hard about it. My biggest concern is that the type of people who would contribute to such a public project are the type of folks who would be offended at the use of AI in general. That concern, again, leads me back to the conundrum of what to do.
I've always insisted that if it is financially feasible, I'd want the app to become a 501(c)(3) or at least a B-Corp, or maybe even be sold to Wikimedia. Still, the number of people who contribute to the site vs the number who visit is somewhere in the range of 1:10,000 (if that) right now, so concern about offending contributors is non-trivial.
As it stands, I've generally gone to the courses' sites and just quoted what they have to say about their own course, but that really isn't what I want to do, even if it is generally informative. Unfortunately, there is rarely hole-by-hole information, which is the level of granularity I'm going for.
I do wonder how much of the push for LLM-integrated everything has taken this into account.
The general trend of viewing LLM features as forced against users' will and the now widespread use of "slop" as a derogatory description seems to indicate the general public is less enthusiastic about these consumer advances than, say, programmers on HN.
I use LLMs for programming (and a few other, general QA things before a search engine/wikipedia visit) but want them absolutely nowhere else (except CoPilot et al in certain editors)
Another trick I use is to scroll to the end and see if the last paragraph is written as a neat conclusion with a hedge (e.g. "In short...", "Ultimately..."). I imagine it's a convention to push LLMs to terminate text generation, but boy is it information-free.
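For what it's worth, that heuristic is easy to mechanize. A toy sketch in plain Python, assuming nothing more than a hand-picked phrase list (the list and the whole approach are illustrative assumptions, not a real detector):

```python
# Toy heuristic: flag text whose final paragraph opens with a stock
# wrap-up phrase often seen in LLM output. Purely illustrative.
WRAP_UP_OPENERS = ("in short", "in summary", "in conclusion",
                   "ultimately", "overall", "to sum up")

def smells_like_llm_conclusion(text: str) -> bool:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        return False
    return paragraphs[-1].lower().startswith(WRAP_UP_OPENERS)

print(smells_like_llm_conclusion(
    "Blah blah details.\n\nUltimately, the choice depends on your needs."
))  # True
```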
I can understand it for AI generated text, but I think there are a lot of people who like AI generated images. Image sites get a ton of people who like them; Civitai gets a lot of engagement for AI generated images, and so do many other image sites.
People who submit blog posts here sure do love opening their blogs with AI image slop. I have taken to assuming that the text is also AI slop, and closing the tab and leaving a comment saying such.
Sometimes this comment gets a ton of upvotes. Sometimes it gets indignant replies insisting it's real writing. I need to come up with a good standard response to the latter.
> People who submit blog posts here sure do love opening their blogs with AI image slop.
It sucks, but it doesn't suck any more than what was done in the past: Litter the article with stock photos.
Either have a relevant photo (and no, a post about cooking showing an image of a random kitchen, set of dishes, or prepared food does not count), or don't have any.
The only reason blog posts/articles had barely relevant stock images was to get people's attention. Is it any worse now that they're using AI generated images?
> I need to come up with a good standard response to the latter.
How about, "I'm sorry, but if you're willing to use AI image slop, how should I know you wouldn't also use AI text slop? AI text content isn't reliable, and I don't have time to personally vet every assertion."
Trying to gaslight your enemy is certainly an option, though not always the best one, nor the one in line with the HN guidelines. Frankly, it just rarely reduces undesirable behaviors, even if you're in the mood to be manipulative.
Well, I wouldn't call that gaslighting, just a statement of fact. I guess you could go with "Sorry buddy, I don't trust your content because you used AI slop for your images." If you think saying the same thing with more words is manipulative and gaslighting.
Also, "enemy"? That's a little harsh, don't you think? I would never consider a random doofus on an internet forum to be my enemy.
The person posting an AI header likely isn't getting the reflexive visceral discomfort that others feel when looking at one, a reaction that doesn't happen with stock photos. They just can't tell, and that kind of antagonizing response gives them no path to the realization that others readily can tell and don't like it.
That is an excellent point. Thank you. As the article points out, AI slop is already so pervasive it's showing up in supposedly historical posts. And it's harder to identify AI generated images than text.
> I don’t understand the problem with AI generated images.
Depends on what they are used for and what they are purporting to represent.
For example, I really hate AI images being put into kids' books, especially when they are trying to be pseudo-educational. A big problem those images have is that from one prompt to the next, it's basically impossible to get consistent designs, which means any sort of narrative story will end up with pages of characters that don't look the same.
Then there's the problem that some people are trying to sell and pump this shit like crazy into amazon. Which creates a lot of trash books that squeeze out legitimate lesser known authors and illustrators in favor of this pure garbage.
Quite similar to how you can't really buy general products from Amazon because drop shipping has flooded the market with 10 billion items with different brands that are ultimately the same Wish garbage.
The images can look interesting sometimes, but often on second glance there's just something "off" about the image. Fingers are currently the best sign that things have gone off the rails.
That's not the issue though; it should be marked as such, or placed in a section where people looking for it can easily find it, instead of being shoved everywhere. To me, placing generated content in human spaces is a strong signal of low effort. On the other hand, generated content can be extremely interesting and useful, and indeed there's an art to it.
I agree. AI generated text and images should be marked as such. In the US there was a push to set standards on watermarking AI generated content (feasible for images/video, but more difficult for text, because it's easier to delete). Unfortunately, the effort to study potential watermarking standards was rescinded as of Jan 22 2025.
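For context on why text watermarking is considered feasible at all: one proposed scheme from the research literature (the "green list" approach, not any official standard) nudges generation toward a pseudo-random half of the vocabulary keyed on the previous token, and detection just counts how often that happened. A toy detection-side sketch, with whitespace words standing in for real tokens and every name here made up for illustration:

```python
import hashlib

# Toy sketch of green-list watermark *detection*: the generator would have
# preferred words from a pseudo-random "green" half of the vocabulary seeded
# by the previous word; the detector measures how often that occurred.
def is_green(prev_word: str, word: str) -> bool:
    digest = hashlib.sha256((prev_word + "|" + word).encode()).digest()
    return digest[0] % 2 == 0  # pseudo-random 50/50 split keyed on context

def green_fraction(text: str) -> float:
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    return hits / (len(words) - 1)

# Long unwatermarked text should hover near 0.5; watermarked text would
# score noticeably higher. Paraphrasing or deleting words scrambles the
# pairs, which is part of why text watermarks are easier to strip.
print(green_fraction("the quick brown fox jumps over the lazy dog"))
```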
I believe most times such responses are made on the assumption that people are just lazy, like when we used to provide links to https://letmegooglethat.com/ before.
> If Google added a global "remove generative content" filter that worked, I would click it and then never unclick it.
It's not just generated content. This problem has been around for years. For example, google a recipe. I don't think the incentives are there yet. At least not until Google search is so unusable that no one is buying their ads anymore. I suspect any business model rooted in advertising is doomed to the eventual enshitification of the product.
Most AI generated images are like most dreams: meaningful to you, but not something other people have much interest in.
Once you have people sorting through them, editing them, and so on, the curation adds enough additional interest...and for many people what they get out of looking at a gallery of AI images is ideas for what prompts they want to try.
Most AI generated visuals have a myriad of styles, but you can mostly tell it's something not seen before, and that's what people may be drooling over. The same drooling happened for things that have since found their utility and that we're now used to. For example, 20 years ago Photoshop filters were all the rage and you'd see them everywhere back then. I think this AI gen phase will lose interest/enthusiasm over time but will enter and stay in the toolbox for the right things, whatever people decide those to be.
Re: proof-of-humanity... I'm looking forward to a Gattaca-like drop-of-blood port on the side of your computer, where you prick yourself every time you want a single "certified human response" endorsement for an online comment.
It might not be what you're hoping for, but just ask a question instead of doing a web search. It's often more useful to get even a hallucination-prone answer compared to affiliate-marketing listicles that are all coming from smaller models anyway.
In some domains (math and code), progress is still very fast. In others it has slowed or arguably stopped.
We see little progress in "soft" skills like creative writing. EQBench is a benchmark that tests LLM ability to write stories, narratives, and poems. The winning models are mostly tiny Gemma finetunes with single-digit-billion parameter counts. Huge foundation models with hundreds of billions of parameters (Claude 3 Opus, Llama 3.1 405B, GPT-4) are nowhere near the top. (Yes, I know Gemma is a pruned Gemini.) Fine-tuning > model size, which implies we don't have a path to "superhuman" creative writing (if that even exists). Unlike model size, fine-tuning can't be scaled indefinitely: once you've squeezed all the juice out of a model, what then?
OpenAI's new o1 model exhibits amazing progress in reasoning, math, and coding. Yet its writing is worse than GPT-4o's (as backed by EQBench and OpenAI's own research).
I'd also mention political persuasion (since people seem concerned about LLM-generated propaganda). In June, some researchers tested LLM ability to change the minds of human subjects on issues like privatization and assisted suicide. Tiny models are unpersuasive, as expected. But once a model is large enough to generate coherent sentences, persuasiveness kinda...stops. All large models are about equally persuasive. No runaway scaling laws are evident here.
This picture is uncertain due to instruction tuning. We don't really know what abilities LLMs "truly" possess, because they've been crippled to act as harmless, helpful chatbots. But we now have an open-source GPT-4-sized pretrained model to play with (Llama-3.1 405B base). People are doing interesting things with it, but it's not setting the world on fire.
It feels ironic if the only thing the current wave of AI enables (other than novelty cases) is a cutdown of software/coding jobs. I don't see it replacing math professionals too soon for a variety of reasons. From an outsider's perspective on the software industry, it looks like its practitioners voted to make themselves redundant - that seems to be the main takeaway of AI for the normal, non-tech people I've chatted with.
Anecdotally, many people, when I tell them what I do for a living, have told me that any other profession would have the common sense/street smarts not to make their scarce skill redundant. It goes further than that; many professions have license requirements, unions, professional bodies, etc. to enforce this scarcity on behalf of their members. After all, a scarce career in most economies is one not just of wealth but of higher social standing.
If all it does is allow us to churn out more high-level software, which let's be honest is demand inelastic due to mostly large margins on software products (i.e. they would have paid a person anyway due to ROI), it doesn't seem it will add much to society other than shifting profit in tech from labor to capital/owners. It may replace call centre jobs too, I guess, and some low-level writing/marketing jobs. I haven't seen any real new use cases that change my life positively yet, other than the odd picture/AI app, fake social posts, annoying AI assistants in apps, maybe some teaching resources that would have been made/easy to acquire anyway by other means, etc. I could easily live without these things.
If this is all AI will do, or mostly do, it seems like a bit of a disappointment. Especially for the massive amount of money going into it.
> many professions have license requirements, unions, professional bodies, etc. to enforce this scarcity on behalf of their members. After all, a scarce career in most economies is one not just of wealth but of higher social standing.
Well, that's good for them, but bad for humanity in general.
If we had a choice between a system where doctors get a high salary and a lot of social status, or a system where everyone can get perfect health by using a cheap device, and someone would choose the former, it would make perfect sense to me to call such a person evil. The financial needs of doctors should not outweigh the health needs of humanity.
On a smarter planet we would have a nice system to compensate people for losing their privilege, so that they won't oppose progress. For example, every doctor would get a generous unconditional basic income for the rest of their life, and then they would be all replaced by cheap devices that would give us perfect health. Everyone would benefit, no reason to complain.
That's a moral argument, one with a certain ideology that isn't shared by most people, rightly or wrongly. Especially if AI only replaces certain industries, which looks like the more likely option. Even if it is, I don't think it is shared by the people investing in AI unless someone else (i.e. taxpayers) will pay for it. Socialise the losses (loss of income), privatise the profits (efficiency gains). Makes me think the AI proponents are a little hypocritical. Taxpayers may not be able to afford that in many countries; that's reality. For software workers, we should note that only US workers have mostly been paid well; many more software workers worldwide don't have the luxury/pay to afford that altruism. I don't think it's wrong for people who have had to skill up to want some compensation for that; there are other moral imperatives that require making a living.
On a nicer planet, sure, we would have a system like that. But most of the planet is not like that - the great advantage of the status quo is that even people who are naturally not altruistic somewhat co-operate with each other due to mutual need. Besides, there are ways to mitigate that and still provide the required services, especially if they are commonly required. Take the doctors example - certain countries have worked it out without resorting to AI risks. Ironically, I'm not against AI in this case either: there is a massive shortage of doctors' services that can absorb the increased abundance, in my view - most people don't put software in the same category. There are bad sides to humanity losing our mutual dependence on each other as well (community, valuing the life of others, etc) - I think sadly AI allows for many more negatives than simply withholding skills for money if not managed right, and even that doesn't happen everywhere today and is an easier problem to solve. The loss of safe, intelligent jobs for climbing and evening out social mobility through the mutual dependence of skills (even the rich can't learn everything and so need to outsource) is one of them.
> If all it does is allow us to churn out more high-level software, which let's be honest is demand inelastic due to mostly large margins on software products (i.e. they would have paid a person anyway due to ROI), it doesn't seem it will add much to society other than shifting profit in tech from labor to capital/owners.
If creating software becomes cheaper then that means I can transform all the ideas I’ve had into software cheaply. Currently I simply don’t have enough hours in the day, a couple hours per weekend is not enough to roll out a tech startup.
Imagine all the open source projects that don’t have enough people to work on them. With LLM code generation we could have a huge jump in the quality of our software.
With abundance comes diminishing relative value in the product. In the end that skill and product would be seen as worth less by the market. The value of doing those ideas would drop long term to the point where it still isn't worth doing most of them, at least not for profit.
It may seem this way from an outsiders perspective, but I think the intersection between people who work on the development of state-of-the-art LLMs and people who get replaced is practically zero. Nobody is making themselves redundant, just some people make others redundant (assuming LLMs are even good enough for that, not that I know if they are) for their own gain.
Somewhat true, but again, from an outsider's perspective that just shows your industry is divided and therefore will be conquered. I.e. if AI gets good enough to do software and math, I don't even see AI engineers, for example, as anything special.
many tech people are making themselves redundant, so far mostly not because LLMs are putting them out of jobs, but because everyone decided to jump on the same bandwagon. When yet another AI YC startup surveys their peers about the most pressing AI-related problem to solve, it screams "we have no idea what to do, just want to ride this hype wave somehow"
>But once a model is large enough to generate coherent sentences, persuasiveness kinda...stops. All large models are about equally persuasive. No runaway scaling laws are evident here.
Isn't that kind of obvious? Even human speakers and writers have problems changing people's minds, let alone reliably.
The ceiling may be low, but there are definitely human writers that are an order of magnitude more effective than the average can-write-coherent-sentences human.
I think this works, not because LLMs have a "hallucination" dial they can turn down, but because it serves as a cue for the model to be extra-careful with its output.
Sort of like how offering to pay the LLM $5 improves its output. The LLM's taking your prompt seriously, but not literally.
You will not unlock "o1-like" reasoning by making a model think step by step. This is an old trick that people were using on GPT3 in 2020. If it were that simple, it wouldn't have taken OpenAI so long to release it.
Additionally, some of the prompt seems counterproductive:
>Be aware of your limitations as an llm and what you can and cannot do.
The LLM doesn't have a good idea of its limitations (any more than humans do). I expect this will create false refusals, as the model becomes overcautious.
>The LLM doesn't have a good idea of its limitations (any more than humans do). I expect this will create false refusals, as the model becomes overcautious.
Can it not be trained to do so? From my anecdotal observations, the knowledge cutoff is one thing that LLMs are really well trained to know about; that is a limitation they currently handle well. Why can it not be trained to know that it is quite frequently bad at math, that it may sometimes produce inaccurate code, etc.?
For humans too, some people know some things are just not their cup of tea. Sure, there are times people have half-baked knowledge about things, but one can tell if they are good at XYZ things and not so much at others.
It's a chicken and egg situation. You don't know a model's capabilities until it is trained. When you then change the training with that learning, it will have modified capabilities.
Apart from anything else, there will be a lot of text about the nature of LLMs and their inherent limitations in its training set. It might only be necessary to make salient the fact that it is one in order to produce the required effect.
you’re wrong and stating things confidently without the evidence to back it up.
Alignment is a tough problem, and aligning long reasoning sequences to correct answers is also a tough problem. Collecting high-quality CoT from experts is another tough problem. They started this project in October; it's more than plausible it could take this much time.
An LLM has a huge amount of data ingested. It can create character profiles, audiences, personas, etc.
Why wouldn't it have potentially even learned to 'understand' what 'being aware of your limitations' means?
Right now, for me, 'change of reasoning' feels a little bit like querying the existing meta space through the reasoning process to adjust weights. Basically priming the model.
I would also not just call it a 'trick'. This may look simple, weird, or whatnot, but I do believe that this is part of AI thinking-process research.
It's a good question, though: what did they train? New architecture? More parameters? Is this training a mix of experiments they did? Some auto-optimization mechanism?
It might understand the concept of it having limitations, but it can't AFAIK reliably recognize when it does or doesn't know something, or has encountered a limitation.
It's the same thing as with humans, that's right. It doesn't do logical reasoning, but even the best humans stop at some level.
But if you have read all the knowledge of humans, where does your reasoning start? Probably at a very high level of it.
If you look at human brains, we conduct experiments right? As a software developer, we write tests. ChatGPT can already run python code and it can write unit tests.
We do not use proofs when we develop. An AI actually could. But in the end it's more a question of who does it better, faster, and cheaper, eh?
There is an important difference between humans and LLMs in this context.
Humans do in most cases have some knowledge about why they know the things they know. They can recall the topics they learned at school, and can deduce that they probably heard a given story from a friend who likes to discuss similar topics, etc.
LLMs have no access to the information they were trained on. They could know that everything they know was learned during the training, but they have no way of determining what they learned about and what they didn't.
If you think about it, those criticisms extend to human thinking too. We aren't infallible in all situations either.
It's only when we can interact with the environment to test our hypothesis that we then refine what we know and update our priors appropriately.
If we let LLMs do that as well, by allowing them to run code, interact with documentation/the internet, and double-check things they're not sure of, it's not out of the question that LLMs will eventually be able to more reliably understand their limitations.
As they are currently constructed, I would say that it is out of the question.
Humans usually know (at least roughly) the source of anything they know, as there will be a memory or a known event associated with that knowledge.
LLMs have no analogous way to determine the source of their knowledge. They might know that all their knowledge comes from their training, but they have no way of knowing what was included in the training and what wasn't.
This could maybe be achieved with some more fancy RAG systems, or online training abilities. I think an essential piece is the ability to know the source of information. When LLMs reliably do, and apply that knowledge, they'll be much more useful. Hopefully somebody can achieve this.
In my country, it's illegal to charge different people differently if there's no explicitly signed agreement where both sides agree to it. Without an agreement, there must be a reasonable and verifiable justification for a change in the price. I think suddenly charging you $100 more (compared to other consumers) without explaining how you calculated it is somewhat illegal here.
There's no change in price. They charge the same amount per token from everyone. You pay more if you use more tokens. Whether some tokens are hidden and used internally to generate the final 'public' tokens is just a matter of technical implementation and business choice. If you're not happy, don't use the service.
Well, imagine how it looks from the point of view of anti-discrimination and consumer protection laws: we charge this person an additional $100 because we have some imaginary units telling us they owe us $100... Just trust us. Not sure it will hold up in court. If both sides agree to a specific sum beforehand, no problem. But you can't just charge random amounts post factum without the person having any idea why they suddenly owe those amounts.
P.S.
However, if the API includes CoT tokens in the total token count (in API responses), I guess it's OK.
> But you can't just charge random amounts post factum without the person having any idea why they suddenly owe those amounts.
Is it actually different from paying a contractor to do some work for you on an hourly basis, and them then having to "think more" and thus spend more hours on problem A than on problem B?
It doesn't rule out negotiation. That's what the part about a written agreement is for.
It merely rules out pulling prices out of thin air, which is what OpenAI is doing here: charging for an arbitrary number of completely invisible tokens. The shady part is that you don't know how many of these hidden tokens you will use before you actually use them, making it possible to arbitrarily charge some customers different amounts whenever OpenAI feels like it.
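To make the objection concrete, a back-of-the-envelope sketch; the per-token price here is a made-up placeholder, not OpenAI's actual rate, and the point is just that two identical-looking answers can be billed very differently:

```python
# Back-of-the-envelope cost for a response that includes hidden reasoning
# tokens. The price is an assumed placeholder, not a real rate.
PRICE_PER_OUTPUT_TOKEN = 60.00 / 1_000_000  # assume $60 per 1M output tokens

def response_cost(visible_tokens: int, hidden_reasoning_tokens: int) -> float:
    billed = visible_tokens + hidden_reasoning_tokens
    return billed * PRICE_PER_OUTPUT_TOKEN

# Two answers of identical visible length, very different bills:
print(f"${response_cost(500, 200):.4f}")     # short hidden chain of thought
print(f"${response_cost(500, 20_000):.4f}")  # long hidden chain of thought
```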
>but much worse (and worse even in comparison to GPT4) than English composition
o1 is supposed to be a reasoning model, so I don't think judging it by its English composition abilities is quite fair.
When they release a true next-gen successor to GPT-4 (Orion, or whatever), we may see improvements. Everyone complains about the "ChatGPTese" writing style, and surely they'll fix that eventually.
>Like they hired a few hundred professors, journalists and writers to work with the model and create material for it, so you just get various combinations of their contributions.
I'm doubtful. The most prolific (human) author is probably Charles Hamilton, who wrote 100 million words in his life. Put through the GPT tokenizer, that's 133m tokens. Compared to the text training data for a frontier LLM (trillions or tens of trillions of tokens), it's unrealistic that human experts are doing any substantial amount of bespoke writing. They're probably mainly relying on synthetic data at this point.
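The words-to-tokens conversion above is easy to sanity-check yourself; a small sketch with the tiktoken library (the file path is a placeholder, and the ~1.3 tokens-per-word ratio is a ballpark for English prose, not a constant):

```python
# Rough sanity check of the words-to-tokens ratio using tiktoken.
# ~1.3 tokens per word is a common ballpark for English prose, which is
# how 100M words comes out to roughly 133M tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era tokenizer

text = open("sample_prose.txt").read()  # any longish English text (placeholder path)
words = len(text.split())
tokens = len(enc.encode(text))
print(f"{words} words, {tokens} tokens, {tokens / words:.2f} tokens/word")
```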
> When they release a true next-gen successor to GPT-4 (Orion, or whatever), we may see improvements. Everyone complains about the "ChatGPTese" writing style, and surely they'll fix that eventually.
IMO that has already peaked. The original GPT-4 certainly was terminally corny, but competitors like Claude/Llama aren't as bad, and neither is 4o. Some of the bad writing comes from things they can't/don't want to solve - "harmlessness" RLHF especially makes them all cornier.
Then again, a lot of it is just that GPT4 speaks African English because it was trained by Kenyans and Nigerians. That's actually how they talk!
I just wanted to thank you for the Medium article you posted. I was online when Paul made that bizarre “delve” tweet but never knew so much about Nigeria and its English. As someone from a former British colony too, I understood why using such a word was perfectly normal, but I wasn't aware Kenyans and Nigerians trained ChatGPT.
It wasn't bizarre, it was ignorant if not borderline racist. He is telling native English speakers from non-Anglo-Saxon countries that their English isn't normal.
1: If non-native English speakers were training ChatGPT, then of course non-native English essays would be flagged as AI generated! It's not their fault, it's ours for thinking that exploited labor with a slick facade was magical machine intelligence.
2: These tools are widely used in the developing world since fluent English is a sign of education and class and opens doors for you socially and economically; why would Nigerians use such ornate English if it didn't come from a competition to show who can speak the language of the colonizer best?
3: It's undeniable that the ones responding to Paul Graham completely missed the point. Regardless of who uses what words when, the vast majority of papers, until ChatGPT was released, did not use the word "delve," and the incidence of that word in papers increased 10-fold after. Yes, it's possible that an author used "delve" intentionally, but it's statistically unlikely (especially since ChatGPT used "delve" in most of its responses). A small group of English speakers, who don't predominantly interact with VCs in Silicon Valley, do not make a difference in this judgement--even if there are a lot of Englishes, the only English that most people in the business world deal with is American, European, and South Asian. Compared to the English speakers of those regions, Nigeria is a small fraction.
If Paul Graham was dealing predominantly with Nigerians in his work, he probably would not have made that tweet in the first place.
Those variants of English are not normal in the same way that American English (or any non-British English variant) is not normal. Just because it is not familiar to you does not make it not normal.
1. But the trainers are native speakers of English!
2. The same applies to the developed non-English speaking world
Let me swap Nigerians for Americans in your text: 'why would Americans use such different English if it didn't come from a competition to show who can speak the language of the colonizer best? Things like calling autumn fall or changing suffixes you won't find in British English.' Hopefully you can see how racist your text sounds.
3. Usage by non-Nigerians is not normal, yes. But in that context, saying that its usage is not normal is racist imo. It's like a Brit saying that the usage of "color" or other American English words is not normal because they are not words used by Brits.
Surely, "the only English that most people in the business world deal with is American"? Unless you are talking about more than one variant of English. Also, I found it curious that you didn't say original English or British English, as opposed to European English. And yes, adding South Asia to any list of countries and comparing it to any other country besides China or the US will make that other country look small. You can use that trick with any country, not just Nigeria.
I do agree with you that its usage by non-Nigerians in a textual context gives plenty of grounds to suspect that it is AI generated. Similarly, one could expect the same from the use of X variant of English by people who didn't grow up using that variant--as in, British students using American English words in their essays, or American students using British English words in theirs.
But Paul was being stubborn and borderline racist in those tweets just because he was partially right.
There is this thing on social media where figures of authority, when caught in a situation where they might need to retract, don't, because of ego.
I cannot tell the difference between an essay written by a British student vs an American one in terms of word choice in the main, since at least in writing they are remarkably similar, whereas Nigerian English differs dramatically from both in its everyday lexicon, which is the entire point of the article: a difference such as colour/color would not make it worth even a comment.
If you think it's racist, you're going to have to claim that all those uses of "delve" in academic papers are also due to Nigerian academics massively increasing their research output just as frequently. Or, more likely, it's AI generated content. It's a non sequitur. "Oh my god, scammers always send me emails claiming to be Nigerian princes--that's how you know it's bullshit." "Ah, but what if they're actually a Nigerian prince? Didn't consider that, I guess you must be racist then lmao." Ratio war ensues. Thank god we're not on Twitter; here, calling people out for "racism" doesn't get you any points, and you can't get any clout for going on a moral crusade.
Italians would say "enormous" since it comes directly from Latin.
In general, all the people whose main language is a Latin language are very likely to use those "difficult" words, because to them they are completely normal words.
The bulk in terms of the number of tokens may well be synthetic data, but I personally know of at least 3 companies, 2 of whom I've done work for, that have people doing substantial amounts of bespoke writing under rather heavy NDAs. I've personally done a substantial amount of bespoke writing for training data for one provider, at good tech contractor fees (though I know I'm one of the highest-paid people for that company and the span of rates is a factor of multiple times even for a company with no exposure to third world contractors).
That said, the speculation you just "get various combinations" of those contributions is nonsense, and it's also by no means only STEM data.
It doesn't matter if it's AI-generated per se, so it's no crisis if some makes it through. It matters if it is good. So there are multiple rounds of reviews to judge the output and to weed out contributors who keep producing poor results.
But I also know they've fired people who were dumb enough to cut and paste a response that included UI elements from a given AI website...
I’m not sure I see the value in conflating input, tokens, and output.
Tokens. Hamilton certainly read and experienced more tokens than he wrote on pieces of paper.