A bit off-topic, but that comparison graph is a great example of why you should buy your designer a cheap secondary screen. I was viewing it on my second monitor and had to lean in to make out the off-white bar for Model D on the light-grey background. Moved the window over to my main screen and it's clear as day: five nice shades of coffee on a light-grey background.
That's a pretty egregious mistake for a designer to make -- and that's before even getting to accessibility: WebAIM's contrast checker says it's a 1:1 contrast ratio!
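For anyone curious how the checker arrives at something like 1:1, this is the WCAG 2.x contrast math it implements. A minimal TypeScript sketch; the hex values at the bottom are hypothetical stand-ins for the chart colors, not sampled from the actual page:

    // Linearize one 8-bit sRGB channel per the WCAG 2.x definition.
    function linearize(channel: number): number {
      const c = channel / 255;
      return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    // Relative luminance of a "#rrggbb" color.
    function luminance(hex: string): number {
      const [r, g, b] = [1, 3, 5].map((i) =>
        linearize(parseInt(hex.slice(i, i + 2), 16))
      );
      return 0.2126 * r + 0.7152 * g + 0.0722 * b;
    }

    // WCAG contrast ratio between two colors; always >= 1.
    function contrastRatio(a: string, b: string): number {
      const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
      return (hi + 0.05) / (lo + 0.05);
    }

    // Off-white on light grey (made-up values in the spirit of the chart):
    console.log(contrastRatio("#f2f0ec", "#ececec").toFixed(2)); // "1.04"

Anything below 3:1 fails WCAG AA even for graphical objects like chart bars; body text needs 4.5:1.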
If someone is releasing a model that claims some level of reasoning ability, one would hope that their training dataset was scrutinized and monitored for unintended bias (something any statistical dataset is susceptible to; see overfitting). But if the graph on the announcement page is literally unreadable to seemingly anyone but its creator... that's damning proof that there is little empathy in the process, no?
I wouldn’t say it’s implied, but there’s a reason people put on nice clothes for an interview.
I’m looking at the graphs on my phone, and I’m pretty sure there are 5 graphs but only 3 labels. And their 8B model doesn’t seem to be very good; it looks like a 20B model beats it in every single benchmark.
The body text is also quite hard to read because the font has a tall x-height and the line spacing is very tight.
This makes paragraphs look very dense, almost as if they were set in all caps, because the lowercase letters don’t create the varying rhythm between lines that the eye needs to follow.
The model may be good, but the web design doesn’t win any prizes.
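FWIW, that’s usually cheap to test with a throwaway devtools snippet. A sketch in TypeScript; "article p" is a guess at the page’s markup, and the numbers are just typical values, not anything from the actual stylesheet:

    // Bump the leading and cap the measure to check whether the density
    // complaint really is about line spacing.
    document.querySelectorAll<HTMLElement>("article p").forEach((p) => {
      p.style.lineHeight = "1.6"; // tight sites often ship ~1.2-1.3
      p.style.maxWidth = "65ch";  // a shorter measure also helps the eye track lines
    });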
Also, is it standard practice to obfuscate which models you're benchmarking against? They're just labeled Model A-D, with sizes but no additional information.
Given the context, it appears they are not benchmarking against other models but comparing differently sized versions of the same model. The 8B one is just the one they decided to give a catchy name. The others are probably also just fine-tuned Llama models. But without information on the total compute budget (i.e., the number of tokens trained on), this kind of plot is pretty useless anyway.
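To put rough numbers on why that matters: training compute for dense transformers is commonly approximated as 6 x parameters x tokens, so a "smaller" model can easily be the more expensive one. A quick sketch; the token counts are made up, since the announcement doesn't state them:

    // 6ND approximation for dense-transformer training FLOPs.
    // Token counts below are hypothetical.
    const flops = (params: number, tokens: number) => 6 * params * tokens;

    const small = flops(8e9, 2e12);  // 8B params on 2T tokens    -> ~9.6e22 FLOPs
    const big   = flops(20e9, 5e11); // 20B params on 0.5T tokens -> ~6.0e22 FLOPs
    console.log(small > big);        // true: the 8B run used more compute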
Sadly, I don't feel this is a mistake: the transparent ones are the two that beat the model in one or more categories. It feels more like a scam than an error; if not, please fix it.