OpenAI releasing new open model in coming months, seeks community feedback (openai.com)
106 points by georgehill 38 days ago | 77 comments



For those wondering how to answer "what do you want to see from an open model", I put this in: an open-weights end-to-end multimodal model, a model large enough to act as a functional teacher along with a range of nicely distilled smaller sizes, and a code repo to make training/finetuning easy.

As I write this, I'd also like to request a set of tool-calling LLMs in various sizes. Feels to me like a small, fast, local tool-calling model with a large context window to support a lot of MCP functions would be very useful.
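
Something like this is what I have in mind (a sketch only; the endpoint, model name, and tool are all invented, assuming any OpenAI-compatible local server such as llama.cpp's or Ollama's):

    from openai import OpenAI

    # Local OpenAI-compatible endpoint (llama.cpp server, Ollama, etc.);
    # the URL, model name, and tool below are made up for illustration.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    tools = [{
        "type": "function",
        "function": {
            "name": "search_files",
            "description": "Search the local project for a string",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="small-tool-caller-3b",
        messages=[{"role": "user", "content": "Find where retries are configured"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)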


How about everything needed for full reproducibility?

This is what I would like to see


Seems unlikely. :) As is an end-to-end multimodal model, I'd guess. But we can ask!


I'd hope they'll also start to target machines with 128 GB of unified RAM, now that we seem to have at least three options on that front.


I'd rather see more openness and the ability to run on commodity hardware. There are hundreds of options on that front.


I think that when you're learning, privacy and being able to run things yourself are less of a problem.

Keeping one's work confidential becomes interesting once one starts doing new things, so while it would be fun, I don't think that's the most useful type of open model. I think code models are probably the most useful.


and guess what

they did announce they'll release open weights :)

https://x.com/sama/status/1906793591944646898?s=46&t=6NqVriD...

to be honest many of us never saw that coming... LOL


I don't think it will be an end-to-end multimodal model, unless they're holding that shocker for later; he says "language model" in the announcement. So basically o3-mini, not 4o multimodal.


Easy. A reasoning model with better performance than QwQ but at 21B (like Reka Flash 3) and good tool-calling support. A model as "intelligent" as Qwen2.5 but with the personality and creativity of Gemini (or Gemma at a minimum).


And also you should get prizes for using it


Something even cooler would be a model trained for 4-bit or lower (even 1.33-bit) weights, instead of quantized after pretraining.

Math units are completely underutilized when I'm running inference at batch size 1, and post-training quantization below 8 bits loses too much precision to make a real difference compared to smaller models at higher precision.
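
What I'm picturing is quantization-aware training with a straight-through estimator rather than post-hoc rounding. A rough, untested sketch (ternary scheme loosely in the BitNet spirit; all names here are made up):

    import torch

    class TernaryLinear(torch.nn.Linear):
        # Weights are quantized to {-1, 0, +1} * scale on every forward
        # pass, so the model learns to live with low-bit weights instead
        # of having precision ripped out after pretraining.
        def forward(self, x):
            scale = self.weight.abs().mean()
            w_q = torch.clamp(torch.round(self.weight / (scale + 1e-8)), -1, 1) * scale
            # Straight-through estimator: quantized forward pass,
            # full-precision gradients flowing back to self.weight.
            w = self.weight + (w_q - self.weight).detach()
            return torch.nn.functional.linear(x, w, self.bias)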


Open model is such a misnomer: it's like calling an ELF an "open executable".

Distributing things for free doesn't make them "open". The reality is that free (as in free beer) weights are closer to freeware than anything open source. In fact, since all these models are built using pirated media, a more appropriate term could be plain old warez.


Getting weights without the training set and training scripts still gives you a form that's modifiable by end users, a single person can fine-tune a model. Getting the training scripts and dataset gives you nothing useful unless you have millions to burn. "Open weights" are closer to the spirit of open source than training scripts and datasets.
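
To make that concrete: with nothing but downloaded weights, a LoRA fine-tune is a handful of lines (a sketch; the model id and hyperparameters are placeholders):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # "some-org/open-weights-7b" is a placeholder model id
    model = AutoModelForCausalLM.from_pretrained("some-org/open-weights-7b")
    config = LoraConfig(r=8, lora_alpha=16,
                        target_modules=["q_proj", "v_proj"],
                        task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% trainable
    # ...then train the small adapter with an ordinary training loop,
    # which fits on a single consumer GPU. No original dataset needed.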


That's true, although I think it's not quite the same as freeware, in the sense that we can realistically (say) fine-tune an open model in ways that don't quite apply to (say) someone sending you a binary.


You can do interesting things with executable binaries as well, see WINE :)


This definitely feels like a move made out of desperation, but I admit I'm curious to see what they release.

It's fascinating that some of the best-funded startups of the 2020s are all rushing to commoditize their core technology as quickly as possible. They all seem to have the same business model as the Change Bank from SNL.


Desperation? How so?

The cynical late-stage-capitalism argument for releasing an open-source model is PR/goodwill, plus the chance that the model becomes a foundation model in the OSS community. But OpenAI definitely isn't operating with good PR in mind, and the chance of its release becoming that foundation model is slim given the competition.


Orgs and individuals are chasing sentiment to cash out while valuations are inflated (there has been some expectation deflation; see Microsoft terminating datacenter leases and the CoreWeave IPO fizzle), but open models are still welcome (democratization).


Desperation because they face an existential risk from trillion dollar companies (and others) creating open models that are starting to catch up to their flagship.


Current open-weights models are not as multilingual as GPT-3 or GPT-4. I'd like to see support for more languages.


Gemma 3 is.


Q: How open?

Can it be used commercially? Is the training protocol going to be open? Or just the weights released like Llama models?


> Can it be used commercially?

OpenAI has been good about sticking to the commercially friendly MIT License for their OSS (e.g. Whisper, tiktoken).


It's kind of an open secret that there's no 'training' protocol for these state-of-the-art models.

Researchers behave like alchemists when training them, and the process is not really reproducible.


They could provide access to the training code. It's useful for training smaller models or distilling larger ones. They don't need to release every detail of how the optimization parameters were tuned during the pre-training stage.
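
Even just the standard distillation objective would go a long way; the heart of it is only a few lines (the textbook knowledge-distillation loss, a generic sketch, not their actual code):

    import torch
    import torch.nn.functional as F

    # Generic knowledge-distillation loss: the student is trained to
    # match the teacher's softened output distribution.
    def kd_loss(student_logits: torch.Tensor,
                teacher_logits: torch.Tensor,
                T: float = 2.0) -> torch.Tensor:
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)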


There is no training 'code' that will get you anything close to a usable result.


At this point I think it is safe to assume open means open weights only.


They specify "open-weights" further down the form.


That doesn't mean anything in particular other than that the weights can be downloaded and run. What's interesting is the terms under which they can be used - there are quite a few "open weights" models that can only be used for non-commercial purposes without a paid license, like Command R from Cohere (cc-by-nc-4.0).


> Let's start with your details


Well, obviously they will have thousands of applications from which they need to make a selection removing trolls, luddites, etc.

How would you do it without asking for some applicants' details?


Considering how much they trust their LLMs, why don't they just run o1-pro to make a summary of the responses given in the feedback?


> they need to make a selection removing trolls, luddites, etc

Do the luddites' opinions not count?


If you have a business selling hamburgers, does it make sense to ask vegetarians what should be on it?


I would think it makes some amount of sense if you think they're vegetarian for some moral reason and you think you could court them to become customers.


But why target a market that doesn't want you? Do you really think a burger place can somehow override a person's morals to not eat meat? I realize we're stretching the analogy here, but what is the point? Maybe if you've already saturated the existing market and are trying to grow. But then why saddle that market with the wants/needs of the polar opposite?


Given that they come in good faith (trolling is already excepted), I would ask. Burger toppings are predominantly plant-based, so perfectly in their wheelhouse. In fact, I'd expect better suggestions from them than from the average burger eater.


This seems like a stretch. Another example might be: "If I am throwing a party, should I ask people I'm not inviting what they want?" Sure, you could say they could approximate the wants of the people who _are_ going, but why not just ask the people who are going, directly? It's just noise, otherwise.


> why not just ask the people who are going, directly?

"If I had asked people what they wanted, they would have said faster horses"

You don't poll people to find out what they want. You poll people to gather their ideas. Ideas that you can then leverage to deliver what your intended audience wants, even when they didn't know that they wanted it!

If we assume this party you are throwing has 10 guests, you think you're going to get all the best ideas from those 10 specific people and nothing from the hundreds of people you could have asked? Maybe if you're throwing a party for professional party planners, but otherwise...


I think asking 10 people instead of 8 billion people is a more reasonable way to discover the preferences of those 10 people. That is not to say there is not interesting information at the margin, but it is the margin.


Why not ask the 10 people and 8 billion other people? The 10 people might have good ideas too, but no need to rely on them entirely. Most especially when you are OpenAI and can throw your language models at finding the useful information found in those 8 billion responses.

The artificial divide you are trying to create is unnecessary and not a reflection of the real world. Most people will gather input from as far and wide as they can. They might not be able to operate at anything close to OpenAI scale, but even Average Joe will turn to random strangers (e.g. on Facebook or Reddit) to get party ideas.


Because these populations are orders of magnitude apart in size, intent, and investment, and finding the diamond in the rough in a 10-person survey is a lot easier than in an 8-billion-person survey. What if you own a burger joint in Kansas and you are being told you need to send out a survey to people in Peru because there might be some good insight there? Sure, maybe there is, but this is not helpful advice.


Why wouldn't it be helpful? I am almost certainly going to get better ideas from people introducing me to Peruvian flavours than a bunch of "I like what is on the Big Mac" responses.

Sure, one of my customers in Kansas might have a brilliant idea that I've never considered, but much more likely I'll already be familiar with anything they can dream up.

1. They are going to come from much the same background as I.

2. They are apt to be home cooks at best, while a burger joint is expected to elevate.

Topping a burger is an implementation detail. Within reason, the customer doesn't really care about what is on the burger as long as it tastes good. In a similar vein, are you going to ask the expected users of your new cat meme app which programming language you should use?


I think we are going in circles. At the end of the day, I do not think it is reasonable to expect everyone to survey everyone, and take all of that information in and weight it equally; YMMV.


Where have you dreamt up this idea that they are going to get responses from everyone? I certainly won't be taking time to respond, and I have some interest in it. Most people I encounter on a daily basis don't even know what OpenAI is.


Well traditionally you're supposed to make them count if you're on the luddite side.


First step: don't talk to them (one of you might get convinced).


To the tech sector asking for input to help shape its vision? I would guess they don't.


I just don't think a word predictor is the future of AI.

It seems like a sloppy way to simulate intelligence.


If their plan is to sabotage things, then no their opinions should not count.


In this case I don't see a problem. If you want your opinion to hold weight then put your name/background/credentials/reputation behind it. Not every discussion needs to be fully anonymous and upvote-based.


OpenAI should release its frontier model as an open-weight model. There are already open-weight models that match OpenAI's best models (at the time of their release), so the idea that OpenAI would lose something by making its frontier models open-weight doesn't hold up. With an open-weight model, they would instantly kill any proprietary competitor, similar to what Google did to all its competitors with Android.

IMO, OpenAI should focus on tooling, infra, and setting standards for AI apps and profit from those. MCP is what "custom GPT" is supposed to be; OpenAI lost that battle, among many others.

Gopher started as a freely available protocol but later changed its licensing terms, requiring fees to be paid, similar to what OpenAI has done. We know how Gopher ended up: today, most people haven't even heard of it, despite its early adoption.


> There are already open-weight models that match OpenAI's best models (at the time of their release), so the idea that OpenAI would lose something by making its frontier models open-weight doesn't hold up

If true, that cuts both ways though, right?

It also means we lose nothing if OpenAI releases a closed weight model.


> There are already open-weight models that match OpenAI's best models (at the time of their release)

Did I miss something? Do we now have o1-pro level performance in an open source model?


Depends on your use case.

As an example, DeepSeek R1 is far better at producing maintainable, modularized code as a coding assistant.

The big deal is that the distilled versions like DeepSeek-R1-Distill-Qwen-32B are good enough that anyone with a few old 1080 Ti's sitting around can run them and get most of the performance.

When you can run gemma3/qwq/DeepSeek-R1-Distill-Qwen-.../etc., you can easily switch models when one fails, too.
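
The switching is trivial once everything speaks the same API, e.g. against Ollama's OpenAI-compatible endpoint (a sketch; the model tags are examples):

    from openai import OpenAI

    # Ollama exposes an OpenAI-compatible endpoint on /v1.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    prompt = "Refactor this function into smaller pieces: ..."

    for model in ["deepseek-r1:32b", "qwq:32b", "gemma3:27b"]:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            print(resp.choices[0].message.content)
            break  # first model that answers wins
        except Exception:
            continue  # fall back to the next local model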

And you have consistent performance that doesn't degrade over time, the ability to avoid leaking prompt data between clients, etc.

It is all horses for courses though. For me, o1-pro is roughly the same as o1 with just higher limits, etc., but it is still worse than o1-preview IMHO.

In my experience, the few percentage points on synthetic benchmarks that o1-pro was claimed to gain don't matter much on real-world problems.

R1 pretty much matched o1-1217 on every benchmark and the distilled models like DeepSeek-R1-Distill-Qwen-32B only lost a tiny fraction.

A few months of o1-pro costs will buy you enough GPUs for a usable local model, if you are fine with ~20 eval tokens/sec.

But if o1-preview wasn't better for your use case than o1-pro... the calculus can change.


Not yet! But we have some close ones. For example, OLMo 2[1], OlympicCoder[2], and OpenThinker[3].

[1]: https://allenai.org/blog/olmo2-32B

[2]: https://huggingface.co/open-r1/OlympicCoder-32B#evaluation

[3]: https://www.open-thoughts.ai/blog/aiw


What's the major use case of open-source models? The Stable Diffusion community seems pretty active, with a lot of fine-tuning to generate NSFW content. What about LLMs?


The biggest use case of open-source LLMs is that your use of them is private and input is not sent to third parties. From a personal perspective, it means the model is free and accessible forever (and yes, can be used for NSFW stuff). From a business perspective, it matters for legal reasons, such as fine-tuning on proprietary data, and it mitigates the liability of depending on a third-party LLM provider for issues like random API outages.


It's a huge compliance hurdle at my company to add any new vendor, so we almost exclusively use open models that we can run on our own hardware (or our rented cloud instances). Even just getting Bedrock enabled in one of our existing AWS accounts has been in the works for months.


When coupled with smaller parameter sizes, it enables BYOD, different cost scaling, and local inference.


Don't kid yourself about the intentions. It's only so enterprise customers can deploy a shitty chatbot on-premise for their "secret data". The models will be free for commercial use until you hit 1 mil turnover or something like that.


I mean, for large enough buyers that was already an option; they wouldn't need to release a new model just for that.


I don't believe OpenAI provides on-premise serving.


Citation needed.


> What would you like to see in an open-weight model from OpenAI? Explain what you would use it for

See... that's kinda the idea behind an open model. I don't have to explain what I would use it for.


Sounds like an open GPT-4o / o1 distill or something… And that they want to know what to go for.


The idea that they want to train a new custom model for open release instead of just... giving us GPT-3 already suggests a terrible start. I'm calling it now, this is a strategic counterplay against Google's Gemma models so @sama can sell Tim Cook a "frontier" local model that doesn't compete with anything coherent. A fig leaf for their "Open" identity and a paper tiger for the Apple Intelligence panoply.

OpenAI doesn't believe in Open Source; they merely want its prestige without committing to it on principle.


> instead of just... giving us GPT-3

If you're referring to the GPT-3 from 2020, modern open source models five years later are a) better at benchmarks b) much smaller yet still better at said benchmarks c) much, much cheaper/faster due to architectural improvements.

The real hard thing for OpenAI to do is to release an open-weights model that's better or more differentiated than Gemma 3 (at the small scale) or DeepSeek R1 (at the large scale).


But if OpenAI was Open, they'd open-source those old obsolete models. You're right that no one really wants it, and it has little to no commercial value at this point, but that's all the more reason they should just put it out there and actually live up to their name.


Is there anything about training a new model? I just assumed they were asking for all the little bits that companies forget to do at release with an open model.


[flagged]


The original comment says "their mentality about open-model is wrong, evident from their post". It does not say it's shady! Why be cynical?


I'm surprised they expect people to voluntarily undergo that interrogation, for their feedback to then be processed by some LLM.


If you're interested in the field, why wouldn't you answer the questionnaire? It costs you basically nothing, while the upside is getting something that is potentially at least a tiny bit more useful to you than if you hadn't said anything.


It takes time and will likely be ignored.


Why would it be ignored? They're the ones asking.


Would you write a response just to be read by an LLM?


Sure, I do it everyday when I chat with AIs. Humans not reading our exchanges doesn't bother me particularly.



