espadrine's comments

The blind test at lmarena.ai does give it a higher Elo than GPT-4o (API), Claude, and Gemini 1.5 Pro. It seems that people do enter real-life scenarios in the arena.


Hernan Moraldo is from Argentina. That may be all there is to it.


> I'm quite nervous for the future.

Videos like these were already achievable through VFX.

The only difference here is a reduction in costs. That does mean that more people will produce misinformation, but the problem is one that we have had time to tackle, and which gave rise to Snopes and many others.


Philosophically, it always bugged me that distros were so centrally in control of packaging: app developers are not allowed to package their apps in the way they see fit, which has caused some friction in the past[0]. It seems healthier and more scalable if each developer could package their app autonomously without work from the central maintainer group.

KDE Project Banana mentions little about packaging, apart from this:[1]

> Instead of legacy packages we target modern deployment systems such as flatpak and systemd-sysext

I hope they succeed in that respect. Flatpak has the promise of decentralizing distribution. That said, my experience of it so far has been brittle, with things like changing your locale or shell prompt causing your apps to freeze[2]. I hope they make a stable operating system nonetheless, but it is not easy.

[0]: https://news.ycombinator.com/item?id=41407768#41408628

[1]: https://community.kde.org/KDE_Linux

[2]: https://github.com/flatpak/flatpak/issues/5564


> It seems healthier and more scalable if each developer could package it autonomously without work from the central maintainer group.

> Flatpak has the promise of decentralizing distribution. That said, my experience of it so far has been brittle, with things like changing your locale or shell prompt causing your apps to freeze[2]. I hope they make a stable operating system nonetheless, but it is not easy.

That's not a coincidence; you just got rid of the pesky maintainers who made sure that the collective system worked, leaving only a bunch of separate apps that are free to do whatever they want.

On the same note, this ensures that security updates are decentralized, so instead of running 'apt upgrade' and being done, you get to hope that every dev providing you with packages updates their dependencies (which is also harder because flatpak encourages vendoring as much as possible).


When Meta prevented the EU from using meta.ai or even downloading its vision models, I buried my head in the AI legislation.

Here, I am honestly not sure which part they rely on, to say that what they made might be unlawful.

The closest thing I found for Meta was that “emotion recognition systems” are classified as high-risk (paragraph 54), and high-risk systems must have their training data disclosed (Art 11(1))[0]. In theory, you could upload photos to meta.ai and ask it what emotions are displayed, but it is already a stretch. For GenChess, I’m at a loss; it doesn’t sound like you can do that. (Not that it prevented any vision chatbot from releasing.)

If someone has a better guess as to why they might have restrained it here, I am curious.

[0]: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A...


It's simple. Big tech doesn't like regulation. By pretending the regulation won't let them release "all these cool things" they can turn public opinion against regulation.


See: Cookie Banners.

Nothing in the law requires a banner. It could even be handled in the browser by letting people choose what third-party cookies to accept (or none, hence the problem) and having that be negotiated during page load.

It's nice that the law is being interpreted to require that rejecting all local storage of others' data be as easy as accepting all of it.


> Furthermore, by rotating the vector, we have absolutely zero impact on the norm of the vector, which encodes the semantic information of our token.

Doesn’t the angle encode semantic information? Cosine similarity works for embeddings after all.
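To make the question concrete, here is a minimal 2D sketch in plain Python. The vectors `q` and `k` and the angles are made up for illustration. It shows that a rotation leaves the norm untouched, that rotating both vectors by the same angle leaves their cosine similarity untouched, and that rotating them by different angles changes it, with only the angle difference mattering.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector counter-clockwise by theta radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def norm(v):
    """Euclidean norm of a 2D vector."""
    return math.hypot(*v)

def cosine(a, b):
    """Cosine similarity between two 2D vectors."""
    return (a[0] * b[0] + a[1] * b[1]) / (norm(a) * norm(b))

# Illustrative vectors, not taken from any model.
q, k = (1.0, 2.0), (3.0, 0.5)

# A rotation leaves the norm unchanged.
assert math.isclose(norm(rotate(q, 0.7)), norm(q))

# Rotating both vectors by the same angle keeps their cosine similarity.
same = cosine(rotate(q, 0.7), rotate(k, 0.7))
assert math.isclose(same, cosine(q, k))

# Rotating them by different angles changes it; only the difference (0.5) matters.
diff = cosine(rotate(q, 0.7), rotate(k, 0.2))
assert math.isclose(diff, cosine(rotate(q, 0.5), k))
assert not math.isclose(diff, cosine(q, k))
```

So the angle of a single vector does change under rotation, while the norm does not; what survives a shared rotation is the relative angle between two vectors.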


The academic paper: https://www.nature.com/articles/s41586-024-08025-4

They hash the last N prefix tokens with a keyed hash and use the resulting random value to sample the next token through an 8-way tournament: each of the top 8 preferred tokens is assigned random bits, the tokens are compared pairwise, and the token with the larger bit advances. (Yes, it seems complicated, but apparently it increases the watermarking accuracy compared to straightforward nucleus sampling.)
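A rough sketch of the tournament as described in this comment, not the paper's exact algorithm. The names `g_bit` and `tournament_sample`, the key, the token ids, and the use of HMAC-SHA256 are all assumptions made for illustration. The candidate list is passed in directly and must have a power-of-two length; ties are broken at random.

```python
import hashlib
import hmac
import random

def g_bit(key: bytes, context: tuple, token: int, layer: int) -> int:
    """Pseudorandom bit derived from a keyed hash of (context, token, layer)."""
    msg = repr((context, token, layer)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1

def tournament_sample(key, context, candidates, rng):
    """Single-elimination tournament over the candidates.

    In each layer, candidates are paired up and the one with the larger
    keyed-hash bit advances; ties are broken at random.
    """
    layer = 0
    while len(candidates) > 1:
        winners = []
        for a, b in zip(candidates[0::2], candidates[1::2]):
            ga = g_bit(key, context, a, layer)
            gb = g_bit(key, context, b, layer)
            if ga == gb:
                winners.append(rng.choice((a, b)))
            else:
                winners.append(a if ga > gb else b)
        candidates = winners
        layer += 1
    return candidates[0]

key = b"hypothetical-secret-key"            # made-up watermarking key
context = (17, 42, 99, 3)                   # last N = 4 token ids (made up)
candidates = [5, 11, 23, 8, 42, 7, 19, 2]   # 8 candidate token ids (made up)
token = tournament_sample(key, context, candidates, random.Random(0))
```

With 8 candidates this runs three layers of pairwise comparisons, so the surviving token tends to carry more 1-bits under the key than a token drawn at random would.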

The downside of this approach is that you need to rerun the LLM, so you must keep all versions of all LLMs that you trained, forever.


They actually run a 2^30-way tournament (they derive an equivalent form that doesn't require 2B operations). You do not need to run the LLM; it only depends on the tokenizer.


You’re right. I understood it to require taking the top 2^30 tokens, but instead they sample 2^30 times with replacement.

Too bad they only formulate the detection positive rate empirically. I am curious what the exact probability would be mathematically.


Why do you need to rerun the LLM? Watermark detection only requires the hash functions (equation (1) from the paper).
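To illustrate why detection could work from the hash alone, here is a hedged sketch of a detector that averages keyed-hash bits over a token sequence. It is one simple scoring choice, not a transcription of the paper's equation. `g_bit`, `watermark_score`, the key, the context length `n`, and the number of `layers` are assumptions for illustration.

```python
import hashlib
import hmac

def g_bit(key: bytes, context: tuple, token: int, layer: int) -> int:
    """Pseudorandom bit derived from a keyed hash of (context, token, layer)."""
    msg = repr((context, token, layer)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1

def watermark_score(key, tokens, n=4, layers=3):
    """Mean keyed-hash bit over every token that has n tokens of context.

    Unwatermarked text should average close to 0.5; text sampled to favor
    1-bits under the same key should average higher. No model is needed,
    only the token ids and the key.
    """
    total, count = 0, 0
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])
        for layer in range(layers):
            total += g_bit(key, context, tokens[i], layer)
            count += 1
    return total / count if count else 0.5

key = b"hypothetical-secret-key"             # made-up watermarking key
tokens = [17, 42, 99, 3, 5, 11, 23, 8, 42]   # made-up token ids
score = watermark_score(key, tokens)
```

The only inputs are the tokenized text and the secret key, which matches the point that the LLM itself does not need to be rerun.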


That makes me wonder, though, what the best loss function was. I assume you used MSE on the log score. I wonder if a sigmoid on which of two articles has the higher score would yield better results for the downstream RLHF task.
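For concreteness, the two options could look like the following sketch. The function names are made up, and I am assuming the model outputs a single scalar per article and that true scores are strictly positive.

```python
import math

def mse_log_loss(pred_log_score, true_score):
    """Squared error between the predicted log score and log(true_score).

    Assumes true_score > 0.
    """
    return (pred_log_score - math.log(true_score)) ** 2

def pairwise_loss(pred_a, pred_b, a_is_higher):
    """Logistic loss on which of two articles has the higher score.

    The sigmoid of the prediction difference is read as the probability
    that article A outscores article B (a Bradley-Terry style objective).
    """
    p_a_wins = 1.0 / (1.0 + math.exp(-(pred_a - pred_b)))
    return -math.log(p_a_wins if a_is_higher else 1.0 - p_a_wins)

# Pointwise: predicting log(100) for an article that scored 100 costs nothing.
assert math.isclose(mse_log_loss(math.log(100), 100), 0.0, abs_tol=1e-12)

# Pairwise: ranking the pair correctly costs less than ranking it backwards.
assert pairwise_loss(2.0, 0.0, True) < pairwise_loss(0.0, 2.0, True)
```

The pairwise form only cares about ordering, which is the same shape as the preference loss typically used to train reward models.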


It takes no time at all to find other major mistakes. For instance, the Mixtral diagram in § 6.6.1 shows a single router that selects separate 32-layer transformers. Instead, Mixtral has one router per layer (inside of each block), and it doesn’t select a transformer block: it selects a feedforward network.
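A toy sketch of the per-layer routing described above, in plain Python. Attention is omitted, the experts are stand-in scaling functions, and all names and sizes (`top2_route`, `moe_feedforward`, `block`, `DIM`, `NUM_LAYERS`) are assumptions for illustration rather than Mixtral's actual code.

```python
import math
import random

def top2_route(router_logits):
    """Pick the two highest-scoring experts and softmax over those two logits."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:2]
    exps = [math.exp(router_logits[i] - router_logits[top[0]]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

def moe_feedforward(x, router, experts):
    """Route x to two feedforward experts and mix their outputs."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    for idx, weight in top2_route(logits):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

def block(x, layer):
    """One block: routed feedforward plus residual (attention omitted)."""
    ff = moe_feedforward(x, layer["router"], layer["experts"])
    return [xi + fi for xi, fi in zip(x, ff)]

def make_expert(scale):
    """Stand-in for a feedforward network: scales its input."""
    return lambda x: [scale * xi for xi in x]

rng = random.Random(0)
DIM, NUM_EXPERTS, NUM_LAYERS = 4, 8, 3  # toy sizes, not Mixtral's real dimensions

# Every layer owns its own router and its own set of experts.
layers = [
    {"router": [[rng.uniform(-1, 1) for _ in range(DIM)]
                for _ in range(NUM_EXPERTS)],
     "experts": [make_expert(rng.uniform(-1, 1)) for _ in range(NUM_EXPERTS)]}
    for _ in range(NUM_LAYERS)
]

x = [1.0, -0.5, 0.25, 2.0]
for layer in layers:
    x = block(x, layer)
```

The routing decision is made again in every layer, so a token can use different experts at different depths; there is no single up-front choice between whole transformers.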


Terminology.

Since they targeted very low risk, they did a geographically-segmented rollout, starting with Phoenix, which is one of the easiest places to drive: a lot of photons for visibility, very little rain, wide roads.

