
Has this been used widely since?

I ran a comparison of DINOv2 with and without registers on some image embedding tasks for work; DINOv2+registers saw a performance metric bump of 2-3%. Not nothing, not transformative, but worth using since the only difference at inference time is the model name string you're loading.
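If it's useful to anyone, the swap really is just the hub entry-point name. A rough sketch via torch.hub (entry-point strings as listed in the facebookresearch/dinov2 README; double-check the repo if they've moved):

    import torch

    # Plain DINOv2 ViT-B/14 backbone
    model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")

    # Same backbone trained with registers -- only the name string changes
    model_reg = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14_reg")

    model_reg.eval()
    with torch.no_grad():
        x = torch.randn(1, 3, 224, 224)  # dummy image batch
        emb = model_reg(x)               # (1, 768) CLS embedding for ViT-B/14
    print(emb.shape)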

I can't speak for the whole industry, but we used it in older UForm <https://github.com/unum-cloud/uform> and saw good adoption, especially among those deploying on the Edge, where every little trick counts. It's hard to pin down exact numbers since most deployments didn't go through Hugging Face, but at the time, these models were likely among the more widely deployed by device count.

For example, it is used here: https://github.com/facebookresearch/vggt/

yes

Thank you so much for continuing to support Gemma 3 with these updates.


Amazing, I've been wishing for this! Do you have any estimates on how much accuracy is first lost then recovered compared to the original bf16 and the naively quantized models?


Sure, you can get better model performance by throwing more compute at the problem in different places. Does it improve perf on an isoflop basis?


There’s no one-size-fits-all answer here, but in my experience, for long contexts, conv-based methods outperform strictly attention-based methods. See Evo2:

“With the current implementation of Evo2, we do not have the heavily optimized kernels in place for convolution operators like we do for attention layers in a model like llama2. Even with this shortcoming, we see that the benefit from including more convolutional layers makes up for the earlier stage of optimization at around the 64k context length. Beyond that point we see an improvement in performance even compared to a highly optimized transformer model.”

https://docs.nvidia.com/bionemo-framework/latest/models/evo2...
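For intuition on why a crossover at some context length is plausible, here's a back-of-envelope FLOP count, not Evo2's actual kernels; the hidden size d, kernel width k, and constant factors are made-up illustrative numbers:

    # Rough per-layer FLOP scaling: self-attention grows ~quadratically in
    # sequence length n, a convolution over the sequence grows ~linearly.
    def attention_flops(n, d):
        # QKV + output projections ~ 8*n*d^2, score and value matmuls ~ 4*n^2*d
        return 8 * n * d**2 + 4 * n**2 * d

    def conv_flops(n, d, k):
        # depthwise-style conv over the sequence: ~ 2*n*d*k multiply-adds
        return 2 * n * d * k

    d, k = 4096, 128  # illustrative hidden size and kernel width
    for n in (4096, 65536, 1048576):
        print(f"n={n:>8}: attn {attention_flops(n, d):.2e}  conv {conv_flops(n, d, k):.2e}")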


It's a valid criticism that this method would increase compute requirements, but sometimes an improvement in the end result justifies the compute needed. For things like code generation in large datasets, many people would be willing to "pay" with more compute if the results were better. And this doesn't seem to require more memory bandwidth, so it could be particularly good for local models.


That's... not always a given for SOTA-sized models. When the ROI on more training runs out, it is nice to have alternatives, whether that is RL-tuned reasoning models or alternative architectures that improve specific areas of weakness.


I read the paper and the results don't really convince me that is the case. But the problem still remains of being able to use information from different parts of the model without squishing it to a single value with the softmax.
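To spell out the "squishing" part as I read it: in vanilla softmax attention every query's scores get normalized into one distribution and the values get collapsed into a single weighted average. Toy sketch in plain PyTorch (not from the paper):

    import torch
    import torch.nn.functional as F

    def softmax_attention(q, k, v):
        # q: (n_q, d), k and v: (n_kv, d)
        scores = q @ k.T / k.shape[-1] ** 0.5  # (n_q, n_kv)
        weights = F.softmax(scores, dim=-1)    # each row sums to 1
        return weights @ v                     # one blended vector per query

    q, k, v = torch.randn(2, 64), torch.randn(10, 64), torch.randn(10, 64)
    out = softmax_attention(q, k, v)
    print(out.shape)  # torch.Size([2, 64])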


Gemma 3 is.



"undo their disgusting propaganda, apply our beautiful correct opinions"

This is so cringe.


Why is it timely?


Google and Apple just updated the name of an international body of water based on the demand of a single person throwing his weight around. No one asked for this change. It has no backing from locals. It wasn’t even a thing. This wasn’t a culture war issue, it’s a flexing of power by an old delusional man. AP has been the only corporate organization I’ve seen stand up to this nonsense.

Not to mention the CDC scrubbing and such that judges are now overturning, demanding the information be returned to the public.


A great example of the many system abuses over the past several weeks.

People did not vote for the Dark Enlightenment nor the Butterfly Revolution. Want to gut USAID? Work with Congress and pass a law. Want to strip Social Security? Samesies.

That is the rule of law. If you are a representative of the people, you follow the law.


> Want to gut USAID? Work with Congress and pass a law. Want to strip Social Security? Samesies.

Heck, want to get rid of the useless penny (like we Canadians did years ago)? Change the law:

* https://www.law.cornell.edu/uscode/text/31/5111


[flagged]


That is not how the US system is set up. If it is done anyway, then the word coup applies.


Of all the wrong things Trump/Musk are doing, renaming the Gulf was arguably the most benign. But it does send a powerful signal of “I can do whatever the f I want”


Also kind of sends a signal, "I am even pettier than you imagined."

I ask again, where are the adults?


Dead. Fox News+Republican Orwellian state on one side, media competition for attention on the other.


are you sure about that? it makes it so that he can approve drilling in the gulf because it's not named the same. or at least that's his idea to subvert biden's drilling ban... we'll see if it is the case.


interesting; hadn't thought of that angle. but he could just do an EO to reverse the drilling ban, as he did with Alaska. I don't think he needs to rename the gulf of Mexico to accomplish that.


What better place than here, what better time than now?


Don't feign ignorance.


I mean, I guess it's possible they've been really busy with other stuff and are just now looking at the internet after some kind of six-month death march at work. (but probably not likely)


lol it's like a reverse Rip Van Winkle -- goes to bed and wakes up as if the revolution against monarchy never happened.


> six-month

ten-year


Please state explicitly for the record why it's timely, in order that anyone who cares can compare that reason against the posting guidelines to decide how seriously to take it.

.

A bit wordier, but hopefully harder to denounce on the pretext that it's playing dumb.


because this is the time


Have you seen the RFK confirmation hearing?


You are quoting Wikipedia, not US foreign policy :)

