More

rafiki6 · 2025-09-16T15:14:33 1758035673

Pretty fun! A few questions:

1 - are you planning to let people write their own prompts?

2 - when will you share the model names?

rvivek · 2025-09-16T15:16:52 1758035812

1 -- yes, soon 2 -- every week; check back this Sat.

rafiki6 · on April 8, 2023

Because OpenAI is a single team with a singular focus.

Google is a behemoth with multiple products and a lot of people with opinions who you have to get through to launch a product.

Also, OpenAI has not unseated Google's dominance in search nor do I see this happening.

rafiki6 · on April 8, 2023

Just because it's newly created doesn't mean that the structure of the language and the concepts it represents are actually new.

It's clear that whatever tests he writes cover well established and understood concepts.

This is where I believe people are missing the point. GPT4 is not a general intelligence. It is a highly overfit model, but it's overfit to literally every piece of human knowledge.

Language is humanities way of modelling real world concepts. So GPT is able to leverage the relationships we create through our language to real world concepts. It's just learned all language up until today.

It's an incredible knowledge retrieval machine. It can even mimick how our language is used to conduct reasoning very well.

It can't do this efficiently, nor can it actually stumble upon a new insight because it's not being exposed in real time to the real world.

So, this professors 'new' test is not really new. It's just a test that fundamentally has already been modelled.

famouswaffles · on April 8, 2023

Watching posts shift in real time is very entertaining. First it's not generally intelligent because it can't tackle new things then when it obviously does its not generally intelligent because it's overfit.

You've managed to essentially say nothing of substance. So it passes because structure and concepts are similar. okay. are students preparing for tests working with alien concepts and structures then because i'm failing to see the big difference here.

A model isn't overfit because you've declared it so. and unless GPT-4 is several trillion parameters, general overfitting is severely unlikely. But i doubt you care about any of that. Can you devise a test to properly asses what you're asserting ?

rafiki6 · on April 8, 2023

I have no idea what is shifting in real time. I formed this opinion of GPT4 by running it through several benchmarks and making adjustments to them, so my view is empirical and it was formed 1 week after it came out.

Your post says nothing of substance because it offers no substantial rebuttal and seems to just attack a position by creating a hand-waved argument without any clear understanding of how parameters in-fact impact a model's outputs.

You also completely missed my point.

famouswaffles · on April 8, 2023

Oh several benchmarks ? Wow. Please do tell what these benchmarks were and how you evaluated them. Should surely be easy enough to replicate.

rafiki6 · on April 8, 2023

You seem to have a serious attitude problem in your responses so this is my last one.

It's propietary company evaluation data, and it's for a specific domain related to software development, a domain that OpenAI is actively attempting to improve performance for.

Anyways enjoy your evening. If you want to actually have a reasonable discussion without being unpleasant I'd be happy to discuss further.

famouswaffles · on April 8, 2023

How does it empirically prove general overfitness ?

People study from books or from teachers or other sources of knowledge and internalize it and relate it to other concepts as well, and no one considers that to be a form of overfitting.

You basically said what amounts to "it overfits to concepts" which is honestly quite ridiculous. Not only is it a standard humans would fail, that's not what overfit is generally taken to mean.

greesil · on April 8, 2023

I agree with the parent post. I can get ChatGPT to solve a basic world problem but if I add a small wrinkle to it that a human would understand it fails hard. Overfitted seems apt.

Yeah it's amazing, but it's not AGI.

anonylizard · on April 8, 2023

Stop confusing ChatGPT with GPT-4. Most common rookie mistake. GPT-4 is way stronger at 'solving problems' than ChatGPT. I was baiting ChatGPT with basic logical or conversion problems, I stopped doing that with GPT-4, since it would take too much effort to beat it.

greesil · on April 8, 2023

Possibly rookie mistake?

https://chat.openai.com/chat

What is this? Is this ChatGPT, or GPT4? I'm talking about my experiences last week with this URL.

flangola7 · on April 8, 2023

Are you paying $20/month and selecting the GPT-4 from the drop-down menu?

kelseyfrog · on April 8, 2023

It's trivially easy even with gpt-4.

> Please respond with the number of e's in this sentence.

> There are 8 "e" characters in the sentence "Please respond with the number of e's in this sentence."

bumbledraven · on April 8, 2023

Dealing with words on the level of their constituent letters is a known weakness of OpenAI’s current GPT models, due to the kind of input and output encoding they use. The encoding also makes working with numbers represented as strings of digits less straightforward than it might otherwise be.

In the same way that GPT-4 is better at these things than GPT-3.5, future GPT models will likely be even better, even if only by the sheer brute force of their larger neural networks, more compute, and additional training data.

(To see an example of the encoding, you can enter some text at https://platform.openai.com/tokenizer. The input is presented to GPT as a series of integers, one for each colored block.)

xyzzy123 · on April 8, 2023

Almost like it has a kind of dyslexia when it comes to "looking inside" tokens.

If you instead ask it to write a Python program to do the same job, it will do it perfectly.

famouswaffles · on April 8, 2023

First GPT-4.

Second, You're going to have to give specific examples on what a small wrinkle is. I've seen "can't solve variation of common word problem" but that's a failure mode of people too. and if you reword the question so it doesn't bias common priors or even telling it it's making an assumption wrong, it often gets it right.

ftxbro · on April 8, 2023

OK write your basic word problem including its small wrinkle, so that the parent commenter can be entertained when GPT-5 solves it.

Eji1700 · on April 8, 2023

> Watching posts shift in real time is very entertaining. First it's not generally intelligent because it can't tackle new things then when it obviously does its not generally intelligent because it's overfit.

This wasn't new in the same way that making any test about Romeo and Juliet isn't new. You're still going to the same sources for the answer. It's the exact same goalpost.

tehf0x · on April 8, 2023

Ah the good old "it's not me it's the test" argument. These systems are not just next token predictors, they learn complex algorithms and can perform general computation, its just so happens that by asking them to next-token predict the internet they learn a bunch of smart ways to compress everything, potentially in a way similar to how we might use a general concept to avoid memorizing a lookup table. Please have a look at https://arxiv.org/pdf/2211.15661 and https://mobile.twitter.com/DimitrisPapail/status/16208344092.... We don't understand everything that's going on yet but it would be foolish to discount anything at this stage, or to state much of anything with any degree of confidence (and that stands for both sides of the opinion spectrum). Also these systems aren't exposed to the real world today, but this will be untrue very soon https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal...

rafiki6 · on April 8, 2023

I never said: - "it's not me it's the test" - "These systems are not just next token predictors"

None of the papers or blogs you've shared offer any points that actually rebutt what I'm saying.

And yes, we will eventually have them work in real time. Can't wait.

xyzzy123 · on April 8, 2023

Don't students prepare for tests by studying past instances of them?

"Teaching the test" (aka overfitting of human students at the expense of "real" learning) is a common complaint about our current education system.

Do you think it doesn't "deserve" an A here?

rafiki6 · on April 8, 2023

Did I say that?

The OP's post was saying it's somehow able to solve something new. It's showing a severe misunderstanding how how language modelling works.

qgin · on April 8, 2023

I think the hallucinations show that it's not simply overfit to all of human knowledge. To hallucinate, there is a certain amount of generalization and information overlap that is necessary.

pama · on April 8, 2023

I’m working in a related area and I’m rather curious about this point. In what way is GPT-4 overfit? Does overfit in this context mean the conventional: validation loss went up with additional training, or something special?

rafiki6 · on April 8, 2023

More specifically validation loss is irrelevant when you can't even sample out of distribution anymore.

Kranar · on April 8, 2023

This is an unusual comment to say the least. It suggests that unless GPT4 can somehow independently derive facts entirely on its own, then it's nothing more than an overfit model, almost as if to say that it's basically just a kind of sophisticated search engine on top of a glorified Wikipedia.

Of course that's not actually true, people don't independently invent knowledge either. People study from books or from teachers or other sources of knowledge and internalize it and relate it to other concepts as well, and no one considers that to be a form of overfitting.

flangola7 · on April 8, 2023

What would a "new" test look like then?

qgin · on April 8, 2023

I would certainly be peeved if I showed up to a midterm that asked questions outside of existing human knowledge.

flangola7 · on April 8, 2023

"I didn't make it into the university I wanted because I didn't invent enough new mathematics during the entry exams."

KyeRussell · on April 8, 2023

Given that OpenAI were THEMSELVES surprised by how even GPT-3 ended up, it’s always funny to see HN know-it-alls pipe up with all the answers.

These sorts of poorly formed faux-philosophical arguments against LLMs have become the new domain of people that confuse blindly acting skeptical with actual intelligence.

Ironic.

This latest generation of AI quite rightfully raises questions and challenges assumptions about what it means to be intelligent. It quite rightfully challenges our assumptions about what can be accomplished with language. And, thank God, it quite rightfully challenges assumptions many have made about what sets humanity apart from everything else.

PaulDavisThe1st · on April 8, 2023

> poorly formed faux-philosophical arguments against LLMs

There's a misunderstanding here. The post you're replying to is not an argument against LLMs. It's an argument about what LLMs can and cannot do, what their fundamental capabilities are, and so forth.

It's very clear that if you need a system to provide answers based on a substantial body of human writing, LLMs are totally awesome. But that doesn't mean, in and of itself, that they can X or that they can Y.

Eji1700 · on April 8, 2023

> Given that OpenAI were THEMSELVES surprised by how even GPT-3 ended up,

Yeah and they have 0 incentive to overhype their takes. OpenAI has already slanted already impressive data in the past to make it more "hype building" for the general public, when a more scientific study style reading is "this is really cool, here's where it still fails". I am very confident shit like that is the same.

rafiki6 · on April 1, 2023

The problem is everyone is doing this.

There's no moat anymore. Building some dumb webapp to sell to people to make their lives marginally more convenient is not sustainable model.

So what do you do now? Seems like the only option is to move towards a life and death industry.

But when everyone does this, it's game over.

rafiki6 · on April 1, 2023

It's easy to say embrace it, but we aren't that far off from a fully capable developer. People are already stitching task based flows to make it basically build a whole thing and a big thing.

The majority of the software industry is in a lot of trouble and honestly given that it was one of the highest paying industries that's not great for the economy.

What are people supposed to even do? What's next for all these displaced workers?

rafiki6 · on April 1, 2023

That's pretty much what GPT is.

rafiki6 · on Nov 20, 2022

That's an interesting take on what to do personally. I've been thinking the best think to do is just to work hard, keep head down, and if the axe falls, expect that it will likely take at least 1 year to land the next role and it won't be an ideal setup and to just ride it out.

rafiki6 · on Nov 20, 2022

Generally speaking no company will willingly canabalize it's current product line for an unproven and premature technology in a new product line.

That's also usually how companies get displaced. I'm not sure we'll be driving any cars from established manufacturers if EVs gain wide adoption and reach a price level that's affordable to the average person.

nicoburns · on Nov 20, 2022

> Generally speaking no company will willingly cannibalize it's current product line for an unproven and premature technology in a new product line.

I imagine the deadlines European countries have put on ICE sales might change that attitude in this case.

rafiki6 · on Nov 20, 2022

I don't quite understand the strategy of sitting and waiting on a technology competitive advantage. Surely, to get the technology ramped up and integrated into their manufacturing process isn't an instantaneous thing?

I do agree that Tesla's entire value is in their battery tech. Their 'premium' cars are actually about on par with Korean and Japanese mid-tier vehicles in terms of quality at best, by every objective measure. But their pricing is luxury level simply due to the cost of batteries.

csa · on Nov 20, 2022

> I don't quite understand the strategy of sitting and waiting on a technology competitive advantage.

1. Utilizing current facilities, thereby being able to spread out the fixed cost of building a manufacturing plant over more units.

2. Related to 1, waiting until building new facilities is necessary to be competitive in order to be able to use the most up-to-date battery design as well as most up-to-date facilities design.

3. Keep the design a secret as long as possible. I’m not sure if its possible to keep battery tech a secret, but if there is no functioning version, then it can’t be reverse engineered.

> Their 'premium' cars are actually about on par with Korean and Japanese mid-tier vehicles in terms of quality at best, by every objective measure. But their pricing is luxury level simply due to the cost of batteries.

Totally agree.

rafiki6 · on Nov 20, 2022

Voice assistants, chat bots etc. are all premature technologies that are dying slow deaths.

The primary reason is quality control. The way these devices are tested can never truly represent the massive variation which would impact their ability to process and parse sound. For example, the wide range of accents for a language like English. The variations in ambient noise in real world environments etc.

Beyond that, generative language models have only recently become powerful, but they need server side processing which is incredibly expensive for the majority of contexts where an AI is useful. Think of call centers. I HATE when companies try to use voice AI in call centers, thinking it's a good way to save money.

Bank Call Center Phone Cal example:

Voice AI: "tell me, how can I help?" Me: "I'd like to request my final statements for a recently closed account." Voice AI: "I'm not sure I heard that correctly" Me: "Statements for a closed account" Voice AI: "Do you want to close an account?" Me: "Statements" Voice AI: "I'm not sure I can help with that, let me get you to a customer care representative. Please enter or say your 16 digit account number"

What was the point of that? The vast majority of customers know how to use online banking to get information at this point. Why did you make me do this? And then, imagine I get disconnected and need to call back. Go through the same process again. The bank may have saved some money (questionable, as they have already outsourced the call center anyway to somewhere cheap), but they've irked me so much, I'm always ready to switch. To bad all banks are the same where I live.

Point being, the tech is too premature, unfinished and hard to build and it offers questionable value.

Voice AI is mostly useful in situations where I need to be handsfree. I think what SoundHound is doing makes the most sense. Sell your Voice AI as an API to manufacturers who build good quality speakers.

Everything else is pointless right now.