
Wait, but we're doing that already, and it works well (Qwen 2.5 VL)? If need be, you can always resort to structured generation to enforce schema conformity?
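
A minimal sketch of what I mean by structured generation, assuming the outlines library's 0.x API (the model name is just illustrative):

    import outlines

    # Illustrative model; any transformers-compatible model works in principle
    model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

    # JSON schema the output must conformm to
    schema = """{
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "total": {"type": "number"}
        },
        "required": ["title", "total"]
    }"""

    # Constrained decoding masks out tokens that would violate the schema,
    # so the output is guaranteed to parse
    generator = outlines.generate.json(model, schema)
    result = generator("Extract the title and total from this receipt: ...")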


Duplicate, posted on October 9: https://news.ycombinator.com/item?id=41784591


Where do you see the MMLU-Pro evaluation for Llama 3.2 90B? On the link I only see Llama 3.2 90B evaluated against multimodal benchmarks.


Ah you're right I totally misread that!


Is the "Ultra Deep" analysis worth it over the standard "Deep" analysis?


If you are interested in health issues, probably yes. For hobbyists, the standard coverage will be enough.


In the demo, o1 implements an incorrect version of the "squirrel finder" game?

The instructions state that the squirrel icon should spawn after three seconds, yet it spawns immediately in the first game (also noted by the guy doing the demo).

Edit: I'm referring to the demo video here: https://openai.com/index/introducing-openai-o1-preview/


Yeah, now that you mention it I also see that. It was clearly meant to spawn after 3 seconds. Seems on successive attempts it also doesn't quite wait 3 seconds.

I'm kind of curious if they did a little bit of editing on that one. Almost seems like the time it takes for the squirrel to spawn is random.


How are flexible working hours equivalent to more money?


If you don't have to pay for child care because you can just take time off to pick up your kids from school, you are saving money your job would otherwise force you to spend.

Being forced to commute at peak time costs more - on a retail salary the difference between a peak and off-peak ticket can mean that the first hour or two of working is essentially pointless, as you're just paying back the cost of getting there in the first place.

Having to pay an extra surcharge to visit the dentist, because you can only go on a Saturday since you need to be at work on other days. Flexible working would let you just take the Tuesday off, no problem, and go when it's cheaper.

I'm sure there are lots of other examples that apply to different lifestyles.


Hours with your child aren't fungible. You can't pay the babysitter to go see the dance recital for you if you want to be the parent instead of the babysitter being the parent. All the money in the world isn't going to make up for missing the soccer game where your kid scores the winning goal.


Well, to be pedantic, with all the money in the world you wouldn't be working for Ikea, and the problem wouldn't exist, so really that's a problem also solved by money...

Generally, though, higher-paid employees tend to have more sway within a company structure and likely don't need to miss these important events. The win here is that something that was generally true from mid-management up at most companies now extends down through all the ranks.


You can manage things in your life when they occur instead of spending money to displace them or risk losing your job because of them.

In another way: if you present a worker with the choice between two jobs at the same hourly rate, one with flexible working hours and the other without, which would you expect to be the more likely pick? You can then measure the value of this choice by changing the hourly rates between the two until you see the outcome change, and you would be able to estimate exactly how much "more money" flexibility appears to be "worth."
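
As a sketch of that measurement idea (numbers entirely made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend each worker privately values flexibility at some $/hour premium
    valuations = rng.lognormal(mean=0.5, sigma=0.6, size=10_000)

    def share_choosing_flexible(wage_gap):
        # wage_gap: how much more per hour the inflexible job pays
        return np.mean(valuations > wage_gap)

    # The wage gap at which half the workers switch is the median
    # implied dollar value of flexible hours
    for gap in np.arange(0.0, 5.0, 0.5):
        print(f"gap ${gap:.2f}/h -> {share_choosing_flexible(gap):.0%} pick flexible")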


You can rent them online for ~$4-5 per hour per GPU. Not cheap, but definitely feasible as a weekend project.


Where can I rent an H100 for 4-5 dollars an hour?

AWS doesn't let you use p5 instances (you won't get a quota as a private person), and Lambda Cloud is sold out.


It looks like Runpod currently (checked right now) has "Low" availability of 8x MI300 SXM (8x$4.89/h), H100 NVL (8x$4.39/h), and H100 (8x$4.69/h) nodes for anyone w/ some time to kill that wants to give the shootout a try.
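
Back-of-the-envelope for a weekend at the quoted per-GPU H100 rate (a sketch; your hours may vary):

    gpus, usd_per_gpu_hour, hours = 8, 4.69, 48        # Fri evening to Sun evening
    print(f"${gpus * usd_per_gpu_hour * hours:,.0f}")  # -> $1,801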


We'd be happy to provide access to MI300X at TensorWave so you can validate our results! Just shoot us an email or fill out the form on our website.


If you're able to advertise available GPU compute on public forums, that says enough about the demand for MI300X in the cloud ...


You're joking/trolling, right? There are literally tens of thousands of H100s available on gpulist right now; does that mean there's no cloud demand for Nvidia GPUs? (I notice from your comment history that you seem to be some sort of bizarre NVDA stan account, but come on, be serious.)


In Mixtral 8x7B, the 8 means that the model uses Mixture-of-Experts (MoE) layers with 8 experts. The 7B means that if you were to remove 7 of the 8 experts in each layer, you would end up with a 7B model (which would have exactly the same architecture as Mistral 7B). Therefore, a 1x7B model has 7B params. An 8x7B model has 1 * 7B + (8-1) * sz_expert params, where sz_expert is the constant number of parameters that adding one expert contributes to the MoE layers. In the case of Mixtral 8x7B the model has 46.3B parameters, so sz_expert ≈ 5.6B.

If these assumptions port over to 8x22B, then 8x22B, at 281GB on disk, has sz_expert ≈ 13.8B.


I tried to check this for myself.

I agree with the first one: (46.3 - 7) / 7 = 5.61B.

The second one doesn't match up: (281 - 22) / 7 = 37B, or (140.5 - 22) / 7 = 16.92B. Am I doing something wrong?


Just tried this again and I also arrive at 16.92B. Not sure what I did wrong the first time, thanks for double-checking this!


Oh, and to answer your actual question: assuming that the model is released with 16 bits (2 bytes) per parameter, it has 281GB / 2 bytes ≈ 140.5B parameters.
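
Putting the corrected arithmetic in one place (counts in billions; 16-bit weights assumed):

    checkpoint_gb = 281
    bytes_per_param = 2                       # 16 bits
    total = checkpoint_gb / bytes_per_param   # 140.5B parameters

    base, n_experts = 22, 8                   # the "22B" backbone, 8 experts
    sz_expert = (total - base) / (n_experts - 1)
    print(total, round(sz_expert, 2))         # 140.5, 16.93 (the 16.92B above)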


Hi Martin! It's Robert from Cambridge (you were my DOS :)). Glad to see your name pop up on HN!


Hi there, good to hear from you! :)


People here seem mostly impressed by the high resolution of these examples.

Based on my experience doing research on Stable Diffusion, scaling up the resolution is the conceptually easy part that only requires larger models and more high-resolution training data.

The hard part is semantic alignment with the prompt. Attempts to scale Stable Diffusion, like SDXL, have resulted only in marginally better prompt understanding (likely due to the continued reliance on CLIP prompt embeddings).

So, the key question here is how well Sora does prompt alignment.


The real advancement is the consistency of character, scene, and movement!


There needs to be an updated CLIP-like model in the open-source community. The model is almost three years old now and is still the backbone of a lot of multimodal models. It's not a sexy problem to take on since it isn't especially useful in and of itself, but so many downstream foundation models (LLaVA, etc.) would benefit immensely from it. Is there anything out there that I'm just not aware of, other than SigLIP?


I agree.

I think one part of the problem is using English (or whatever natural language) for the prompts/training. Too much inherent ambiguity. I’m interested to see what tools (like control nets with SD) are developed to overcome this.

