I ran a comparison of DINOv2 with and without registers on some image embedding tasks for work; DINOv2+registers gave a 2-3% bump on our metrics. Not nothing, not transformative, but worth using when the only difference at inference time is the model name string you load.
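For reference, a minimal sketch of what I mean; the torch.hub entry points differ only by the _reg suffix (model names as published in the facebookresearch/dinov2 repo):

    import torch

    # Same repo, same preprocessing; the registers variant just adds
    # learned register tokens that soak up global information.
    plain = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
    with_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14_reg')

    x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
    emb_plain = plain(x)      # (1, 768) CLS embedding
    emb_reg = with_reg(x)     # same shape; registers are dropped at output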
I can't speak for the whole industry, but we used it in the older UForm models <https://github.com/unum-cloud/uform> and saw good adoption, especially among people deploying on the edge, where every little trick counts. It's hard to pin down exact numbers since most deployments didn't go through Hugging Face, but at the time these models were likely among the more widely deployed by device count.
Amazing, I've been wishing for this! Do you have any estimates of how much accuracy is first lost and then recovered, compared to the original bf16 and the naively quantized models?
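Not the authors, but the naive side of that comparison is cheap to measure yourself. A rough sketch, where `model` and `batch` are stand-ins for whatever embedding model and eval data you care about (dynamic int8 over Linear layers as the "naive" baseline):

    import torch
    import torch.nn.functional as F

    def avg_cosine(model_a, model_b, batch):
        # Mean cosine similarity between the two models' embeddings;
        # 1.0 means no measurable drift on this batch.
        with torch.no_grad():
            return F.cosine_similarity(
                model_a(batch), model_b(batch), dim=-1).mean().item()

    naive_int8 = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8)
    print(avg_cosine(model, naive_int8, batch))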
There’s no one-size-fits-all answer here, but in my experience, at long context lengths conv-based methods outperform strictly attention-based methods. See Evo2:
“With the current implementation of Evo2, we do not have the heavily optimized kernels in place for convolution operators like we do for attention layers in a model like llama2. Even with this shortcoming, we see that the benefit from including more convolutional layers makes up for the earlier stage of optimization at around the 64k context length. Beyond that point we see an improvement in performance even compared to a highly optimized transformer model.”
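You can see the shape of that crossover with a toy benchmark: attention cost grows roughly quadratically in sequence length, while a conv layer's grows roughly linearly. Sizes here are arbitrary and this is untuned CPU timing; the point is the scaling, not the absolute numbers:

    import time
    import torch
    import torch.nn.functional as F

    conv = torch.nn.Conv1d(512, 512, kernel_size=7, padding=3)

    for L in (1024, 4096, 8192):
        q = k = v = torch.randn(1, 1, L, 64)   # single head, head dim 64
        x = torch.randn(1, 512, L)
        t0 = time.perf_counter()
        F.scaled_dot_product_attention(q, k, v)  # ~O(L^2) work
        t1 = time.perf_counter()
        conv(x)                                  # ~O(L * kernel) work
        t2 = time.perf_counter()
        print(f"L={L}: attn {t1 - t0:.3f}s  conv {t2 - t1:.3f}s")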
It's a valid criticism that this method would increase compute requirements, but sometimes an improvement in the end result justifies the compute. For things like code generation over large codebases, many people would be willing to "pay" with more compute if the results were better. And this doesn't seem to require more memory bandwidth, so it could be particularly good for local models.
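Napkin math on the local-model point: single-stream decoding is usually memory-bandwidth-bound, so extra compute that reuses already-loaded weights is close to free. All numbers below are made up for illustration:

    # Hypothetical local box: 8B-param model at int8, consumer-ish specs.
    weights_bytes = 8e9      # bytes streamed from RAM per decoded token
    bandwidth = 100e9        # bytes/s of memory bandwidth
    compute = 20e12          # FLOP/s the chip can sustain

    toks_bw = bandwidth / weights_bytes      # ~12.5 tok/s, bandwidth cap
    flops_per_tok = 2 * 8e9                  # ~2 FLOPs per weight per token
    toks_compute = compute / flops_per_tok   # ~1250 tok/s, compute cap

    # Bandwidth binds by ~100x, so spending more FLOPs per token
    # barely moves tokens/sec until you close that gap.
    print(min(toks_bw, toks_compute))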
That's... not always a given for SOTA-sized models. When the ROI on more training flattens out, it's nice to have alternatives, whether that's RL-tuned reasoning models or alternative architectures that shore up specific weaknesses.
I read the paper and the results don't really convince me that that's the case. But the problem still remains: being able to use information from different parts of the model without squishing it down to a single value with the softmax.
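To make the "squishing" concrete: in standard softmax attention, each query's output is a single convex combination of the values, so everything the head saw collapses into one blended vector. A minimal sketch:

    import torch

    def softmax_head(q, k, v):
        # Each softmax row is a probability distribution, so the output
        # is one weighted average of v per query; the per-token detail
        # is collapsed into a single vector.
        scores = q @ k.T / k.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v

    q, k, v = (torch.randn(4, 64) for _ in range(3))
    out = softmax_head(q, k, v)   # (4, 64): one blended vector per query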
Google and Apple just updated the name of an international body of water on the demand of a single person throwing his weight around. No one asked for this change. It has no backing from locals. It wasn’t even a thing. This isn’t a culture-war issue; it’s a flexing of power by an old, delusional man. The AP is the only corporate organization I’ve seen stand up to this nonsense.
Not to mention the CDC scrubbing and such, which judges are now overturning, demanding the information be restored to the public.
A great example of the many system abuses over the past several weeks.
People did not vote for the Dark Enlightenment nor the Butterfly Revolution. Want to gut USAID? Work with Congress and pass a law. Want to strip Social Security? Samesies.
That is the rule of law. If you are a representative of the people, you follow the law.
Of all the wrong things Trump/Musk are doing, renaming the Gulf was arguably the most benign. But it does send a powerful signal of “I can do whatever the f I want.”
Are you sure about that? It makes it so that he can approve drilling in the Gulf because it's not named the same, or at least that's his idea for subverting Biden's drilling ban... we'll see if that's the case.
Interesting; I hadn't thought of that angle. But he could just issue an EO to reverse the drilling ban, as he did with Alaska. I don't think he needs to rename the Gulf of Mexico to accomplish that.
I mean, I guess it's possible they've been really busy with other stuff and are only now looking at the internet after some kind of six-month death march at work. (Probably not, though.)
Please state explicitly, for the record, why it's timely, so that anyone who cares can compare that reason against the posting guidelines and decide how seriously to take it.
A bit wordier, but hopefully harder to denounce on the pretext that it's playing dumb.