I ran a comparison of DINOv2 with and without registers on some image embedding tasks for work; DINOv2+registers gave a 2-3% bump on our metrics. Not nothing, not transformative, but worth using when the only difference at inference time is the model name string you load.
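For reference, a minimal sketch of what I mean; the torch.hub entry points differ only by the _reg suffix (model names as published in the facebookresearch/dinov2 repo):

    import torch

    # Same repo, same preprocessing; the registers variant just adds
    # learned register tokens that soak up global information.
    plain = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
    with_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14_reg')

    x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
    emb_plain = plain(x)      # (1, 768) CLS embedding
    emb_reg = with_reg(x)     # same shape; registers are dropped at output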
I can't speak for the whole industry, but we used it in the older UForm models <https://github.com/unum-cloud/uform> and saw good adoption, especially among people deploying on the edge, where every little trick counts. It's hard to pin down exact numbers since most deployments didn't go through Hugging Face, but at the time these models were likely among the more widely deployed by device count.
Amazing, I've been wishing for this! Do you have any estimates of how much accuracy is first lost and then recovered, compared to the original bf16 and the naively quantized models?
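Not the authors, but the naive side of that comparison is cheap to measure yourself. A rough sketch, where `model` and `batch` are stand-ins for whatever embedding model and eval data you care about (dynamic int8 over Linear layers as the "naive" baseline):

    import torch
    import torch.nn.functional as F

    def avg_cosine(model_a, model_b, batch):
        # Mean cosine similarity between the two models' embeddings;
        # 1.0 means no measurable drift on this batch.
        with torch.no_grad():
            return F.cosine_similarity(
                model_a(batch), model_b(batch), dim=-1).mean().item()

    naive_int8 = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8)
    print(avg_cosine(model, naive_int8, batch))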
There’s no one-size-fits-all answer here, but in my experience, at long context lengths conv-based methods outperform strictly attention-based methods. See Evo2:
“With the current implementation of Evo2, we do not have the heavily optimized kernels in place for convolution operators like we do for attention layers in a model like llama2. Even with this shortcoming, we see that the benefit from including more convolutional layers makes up for the earlier stage of optimization at around the 64k context length. Beyond that point we see an improvement in performance even compared to a highly optimized transformer model.”
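You can see the shape of that crossover with a toy benchmark: attention cost grows roughly quadratically in sequence length, while a conv layer's grows roughly linearly. Sizes here are arbitrary and this is untuned CPU timing; the point is the scaling, not the absolute numbers:

    import time
    import torch
    import torch.nn.functional as F

    conv = torch.nn.Conv1d(512, 512, kernel_size=7, padding=3)

    for L in (1024, 4096, 8192):
        q = k = v = torch.randn(1, 1, L, 64)   # single head, head dim 64
        x = torch.randn(1, 512, L)
        t0 = time.perf_counter()
        F.scaled_dot_product_attention(q, k, v)  # ~O(L^2) work
        t1 = time.perf_counter()
        conv(x)                                  # ~O(L * kernel) work
        t2 = time.perf_counter()
        print(f"L={L}: attn {t1 - t0:.3f}s  conv {t2 - t1:.3f}s")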
It's a valid criticism that this method would increase compute requirements, but sometimes an improvement in the end result justifies the compute. For things like code generation over large codebases, many people would be willing to "pay" with more compute if the results were better. And this doesn't seem to require more memory bandwidth, so it could be particularly good for local models.
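Napkin math on the local-model point: single-stream decoding is usually memory-bandwidth-bound, so extra compute that reuses already-loaded weights is close to free. All numbers below are made up for illustration:

    # Hypothetical local box: 8B-param model at int8, consumer-ish specs.
    weights_bytes = 8e9      # bytes streamed from RAM per decoded token
    bandwidth = 100e9        # bytes/s of memory bandwidth
    compute = 20e12          # FLOP/s the chip can sustain

    toks_bw = bandwidth / weights_bytes      # ~12.5 tok/s, bandwidth cap
    flops_per_tok = 2 * 8e9                  # ~2 FLOPs per weight per token
    toks_compute = compute / flops_per_tok   # ~1250 tok/s, compute cap

    # Bandwidth binds by ~100x, so spending more FLOPs per token
    # barely moves tokens/sec until you close that gap.
    print(min(toks_bw, toks_compute))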
That's... not always a given for SOTA-sized models. When the ROI on more training flattens out, it's nice to have alternatives, whether that's RL-tuned reasoning models or alternative architectures that shore up specific weaknesses.
I read the paper and the results don't really convince me that that's the case. But the problem still remains: being able to use information from different parts of the model without squishing it down to a single value with the softmax.
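To make the "squishing" concrete: in standard softmax attention, each query's output is a single convex combination of the values, so everything the head saw collapses into one blended vector. A minimal sketch:

    import torch

    def softmax_head(q, k, v):
        # Each softmax row is a probability distribution, so the output
        # is one weighted average of v per query; the per-token detail
        # is collapsed into a single vector.
        scores = q @ k.T / k.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v

    q, k, v = (torch.randn(4, 64) for _ in range(3))
    out = softmax_head(q, k, v)   # (4, 64): one blended vector per query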
Google and Apple just updated the name of an international body of water on the demand of a single person throwing his weight around. No one asked for this change. It has no backing from locals. It wasn’t even a thing. This isn’t a culture-war issue; it’s a flexing of power by an old, delusional man. The AP is the only corporate organization I’ve seen stand up to this nonsense.
Not to mention the CDC scrubbing and such, which judges are now overturning, demanding the information be restored to the public.
A great example of the many system abuses over the past several weeks.
People did not vote for the Dark Enlightenment nor the Butterfly Revolution. Want to gut USAID? Work with Congress and pass a law. Want to strip Social Security? Samesies.
That is the rule of law. If you are a representative of the people, you follow the law.
Of all the wrong things Trump/Musk are doing, renaming the Gulf was arguably the most benign. But it does send a powerful signal of “I can do whatever the f I want.”
Are you sure about that? It makes it so that he can approve drilling in the Gulf because it's not named the same, or at least that's his idea for subverting Biden's drilling ban... we'll see if that's the case.
Interesting; I hadn't thought of that angle. But he could just issue an EO to reverse the drilling ban, as he did with Alaska. I don't think he needs to rename the Gulf of Mexico to accomplish that.
I mean, I guess it's possible they've been really busy with other stuff and are only now looking at the internet after some kind of six-month death march at work. (Probably not, though.)
Please state explicitly, for the record, why it's timely, so that anyone who cares can compare that reason against the posting guidelines and decide how seriously to take it.
A bit wordier, but hopefully harder to denounce on the pretext that it's playing dumb.