Let me clear a huge misunderstanding (twitter.com/ylecun)
114 points by YeGoblynQueenne on Feb 18, 2024 | 104 comments



The main thing I'm confused about in these comments is that, as far as I understand, the Sora model (and many others like it) performs the diffusion process in latent space, and then translates the result to pixels.

So it's strange to me to claim that it doesn't have an abstract representation.

But, maybe the latent space of a diffusion-VQVAE pipeline is fundamentally different from that of JEPA, I haven't read the relevant papers for that. Curious if someone could explain if they are different ideas of representation.
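
For concreteness, my mental model of the diffusion-VQVAE side is roughly the sketch below: a generic DDIM-style sampler that denoises in latent space and only decodes to pixels at the very end. This is not Sora's actual (unpublished) pipeline; decoder, denoiser, and alphas_cumprod are placeholders for the autoencoder's decoder, the trained denoising network, and the noise schedule.

    import torch

    def sample_latent_diffusion(decoder, denoiser, alphas_cumprod, latent_shape, steps=50):
        # Placeholder components: decoder maps latents to pixels, denoiser predicts
        # the noise in a latent at step t, alphas_cumprod is the noise schedule.
        z = torch.randn(latent_shape)                  # start from noise in *latent* space
        for t in reversed(range(steps)):
            eps = denoiser(z, t)                       # predicted noise at step t
            a_t = torch.as_tensor(alphas_cumprod[t])
            z0 = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # estimated clean latent
            a_prev = torch.as_tensor(alphas_cumprod[t - 1]) if t > 0 else torch.tensor(1.0)
            z = a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps   # deterministic DDIM-style step
        return decoder(z)                              # only now translate latents to pixels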


The claim, with admittedly limited consensus, is that a difference in degree becomes a difference in kind: we’ve known “forever in AI years” that restricting the inputs, the loss, and the parameter dimensionality matters as much as the underlying architecture for the properties those latent geometries exhibit. It’s even kind of a meme: the fingers problem (which neither Sora nor V-JEPA partisans seem to be claiming is “solved”).

I’m a little oversubscribed at the moment so I haven’t downloaded the weights and played around with V-JEPA. The fact that I’m pointing out that I could should make it pretty clear which way I lean on this: everyone wants to make money, let’s be real, but some seem resigned to the fact that “infinite, government-enforced monopoly via OpenPhilanthropy bribery” isn’t going to fly, so let’s at least prove how the sausage is made.

It’s fairly uncontroversial that things in the neighborhood of a “bottlenecked” VAE are often forced to exploit structure if it’s there.

This (claimed) result is about a way to exploit more of this structure/symmetry economy (with a greedy-ish optimizer) to pull the latent representations into a higher “effective regime” than yet demonstrated, with excellent properties around machine economics.

Representation learning isn’t new (though LeCun is an acknowledged pioneer in it), and constraints as a powerful tool isn’t new (causal masking in attention architectures shouldn’t ruffle many feathers).

Self/semi/unsupervised learning isn’t novel. But neither are they interchangeable synonyms: Dean et al. were already distinguishing between Continuous Bag of Words and skip-gram in word2vec in 2013.
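
(To make the CBOW vs. skip-gram point concrete: the two objectives differ only in the direction of prediction. A toy full-softmax sketch of my own, not the original negative-sampling implementation:)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    V, d = 1000, 64                                   # toy vocab and embedding sizes
    emb_in, emb_out = nn.Embedding(V, d), nn.Embedding(V, d)

    def cbow_loss(context_ids, center_id):
        # CBOW: average the context embeddings and predict the single center word.
        h = emb_in(context_ids).mean(dim=0)           # (d,)
        logits = emb_out.weight @ h                   # (V,)
        return F.cross_entropy(logits.unsqueeze(0), center_id)   # center_id: shape (1,)

    def skipgram_loss(center_id, context_ids):
        # Skip-gram: use the center word's embedding and predict every context word.
        h = emb_in(center_id)[0]                      # (d,)
        logits = emb_out.weight @ h                   # (V,)
        return F.cross_entropy(logits.expand(len(context_ids), -1), context_ids)

    # e.g. window "the cat sat": center "cat", context ["the", "sat"]
    ctx, ctr = torch.tensor([3, 17]), torch.tensor([42])
    cbow_loss(ctx, ctr); skipgram_loss(ctr, ctx)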

But it does cool shit (I like the I-JEPA reconstructions as a go-to slide-deck raster), and the weights are available.

I think the truth shall soon emerge.


I feel LeCun got roped into debating the likes of Marcus and Yudkowsky. This has made his arguments lose nuance and become rigid. I also can't escape the feeling that if Facebook had been more tuned in to Transformers, they would have shipped earlier, so there must have been some resistance or underestimation that's now being repeated: "They can't reason", "They can't plan", "They can't understand the world", "They are a distraction / side road to AGI".

It is kind of ironic that researchers who claim LLMs lack adaptive intelligence seemingly refuse to adapt their intelligence to LLMs. If even GPT-3 can find logical holes or oversimplifications in your arguments about GPTs, at some point this starts becoming embarrassing and unbecoming.

> The generation of mostly realistic-looking videos from prompts does not indicate that a system understands the physical world.

While arguably true, it also does not indicate that a system does not understand the physical world (reflections, collision detection, gravity, object permanence, long-term scene coherence, etc.).

If LeCun wants to argue it does not understand the physical world, he should do so directly. Not attack something that is not directly stated, but rather convincingly and tentatively demo'd (I myself find it hard to argue that a system that generates novel pond reflections has not memorized/stored in weights some generalization program to apply to realistic scene generation).

This demo shows it is not even a wild prediction to guess that soon (consumer tech) we will be able to discuss visual scenes with conversational AIs.


> long-term scene coherence,

FWIW none of the video models released so far demonstrate any object coherence whatsoever, which suggests they don't have the higher level capabilities you mention yet.

In Sora, as soon as an object is obstructed by an obstacle or goes offscreen, it's likely to disappear or be radically transformed.


You've seen the demos of a couple holding hands and walking, or the museum shots where all the paintings maintain coherence, or a woman temporarily obscuring a street sign. Or you haven't seen those demos. Either way...


In the couple-holding-hands video you see the people walking in front duck into a wall and disappear, the girl in front walks into the fence and disappears, and another girl walks straight through that fence. These aren't just issues with forgetting; it completely doesn't understand how things work. It draws a fence but then doesn't understand that people can't walk through it.

Or the video with the dog: that dog phases straight through those window shutters as if they weren't there and were rendered in layers rather than in 3D. It doesn't understand the scenes it draws at all. It had shadows from those shutters, so they were drawn to have depth, but the dog was then rendered on top of those shutters anyway and moved straight through them. You even see their shadows overlap, since the shadow part is apparently treated differently, so it "knows" they overlap but also renders the dog on top. That tells me it doesn't really know any of that at all and is just guessing based on similar-looking data samples.

And this is in videos handpicked because they were especially good. We should expect the videos we are able to generate to be way worse than the demos in general. They didn't even manage to make a dog that moves between windows without such bugs; that was the best they got, and even that had a very egregious error for a very short clip.


The primary thing I noticed is that it doesn't quite seem to grasp that time runs forward either. In the video with the dogs in the snow you can see snow getting kicked up in reverse, i.e. snow arcs through the air and lands right as a paw gets placed.

Kind of made me wonder how these videos would look run backwards, but not enough to figure out how to make them run backwards.

EDIT: wow, the "backwards" physics is especially noticeable in the chair video[0]. Aside from the chair morphing wildly, notice how it floats and bounces around semi-physically. Clearly some issues grasping cause and effect.

[0] https://www.youtube.com/watch?v=lfbImB0_rKY


If the "couple holding hands and walking" one is the "Beautiful, snowy Tokyo city is bustling. ..." look at the traffic on the left side of the frame:

https://www.youtube.com/watch?v=ezaMd4l_5kw

We also have the spontaneous creation and annihilation of wolves and the shape-shifting chair:

https://www.youtube.com/watch?v=jspYKxFY7Sc

https://www.youtube.com/watch?v=lfbImB0_rKY


That chair is fucking wild.

The more I watch the cherry blossom one, the more I see how wrong it is; even the fact that there are cherry blossoms in the middle of winter is just totally wack. I've seen it snow in Tokyo before during spring when the cherry blossoms were out, but you don't have a foot of snow on the roof like in the clip.

Edit: I know the prompt asked for the cherry blossoms in snow, but it's still a wild amount of snow which is somehow not covering the trees.


If we are talking analogies, this is just Sora forgetting because of limitations in how the network handles the autoregressive dynamics. When they make a bigger version of Sora this will happen less. Sora already has unprecedented object permanence; see the woman-walking-in-Tokyo scene, where signs and people are reconstructed after two seconds of occlusion. Soon we will have object permanence following ten or more seconds of occlusion. Then a minute. Then three minutes. Then we will figure out a trick to store long-term memory. What will people say then?


Our brain can't do that with our long-term memory either, btw; that's why each time we remember something, we change minor aspects of it.


It's also what happens when we dream: everything is fluid. Things appear and disappear, people and places become someone or somewhere else, reading is difficult and hands are distorted.


Because these systems are dreaming about their datasets. Or hallucinating about them, as people have decided to call it lately. I won't say this is a dead end. I will say we are very, very short of any sort of actual intelligence.


More like: "Let me clear a huge misunderstanding here by using poorly-defined terms and cramming niche complex ideas elaborated elsewhere into this tweet."

Someone correct me, but it seems like he's saying:

1- generative models don't understand the real world

2- generative models that work off of just pixels are more expensive and less useful than a model that represents the contents of the frame with abstract representations, particularly when it comes to higher-level actions like applying logic to a scene rather than just interpolating between frames.

3- V-JEPA is such a model, and it performs particularly well against generative models when those representations are used as inputs to a much smaller, more easily trained model for a specific task (roughly the sketch below).
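
If I'm reading point 3 right, it's essentially the standard frozen-feature evaluation recipe: keep the big pretrained encoder fixed and train only a small head on top for each downstream task. A rough sketch of that recipe (placeholders throughout, not Meta's actual code):

    import torch
    import torch.nn as nn

    def train_probe(encoder, train_loader, feat_dim, num_classes, epochs=10):
        # Freeze the big pretrained encoder; only the tiny head gets gradients.
        encoder.eval()
        for p in encoder.parameters():
            p.requires_grad_(False)
        probe = nn.Linear(feat_dim, num_classes)      # small, cheap task-specific model
        opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for clips, labels in train_loader:
                with torch.no_grad():
                    feats = encoder(clips)            # abstract representation of the clip
                loss = loss_fn(probe(feats), labels)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return probe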


I'm nowhere near an expert, but it seemed like he was claiming that true understanding of latent space is needed for generating coherent continuations, yet the Sora demo already has longish videos that are coherent. It's hard for me not to think this is someone just trying to still be right when they are wrong, but I may misunderstand.


The way I understand it is that Sora is mostly just 'moving pictures' with no rhyme or reason. Yann LeCun is interested in videos that tell a 'story', with cause and effect, like a magician putting his hand into a top hat and pulling out a rabbit.


Yeah, the same way ChatGPT is only predicting the next word with no rhyme or reason. However, to actually predict the next word so that the entire sentence makes sense and is relevant for the context (e.g. answers a question), you probably must actually understand the meaning of the words and the language and have a world model. I can't imagine a NN moving around pixels in the shape of a cat with no understanding whatsoever of what a cat is, what it can do, how it is supposed to move, what it wants, etc.


Of course. But Stable Diffusion can do that. It understands what a cat is and can draw a cat wearing a hat. Videos are about actions, cause and effect, which is an entirely different thing from still pictures.


It doesn't understand a cat at all. Humans understand, models are deterministic functions with some randomness added in. Just because it appears to understand doesn't make it so.


What is understanding though? I 'appear to understand' what a cat is; why does being human mean that I actually do? What is the difference between making the correct associations and actually 'understanding' anyway?


Understanding, in this context, is what humans do, by definition. Until you have a concrete definition of what understanding is you can't apply it to anything else. Informal definitions of understanding by those who experience it aren't very useful at all.


I think that's my point? You were willing to say it doesn't apply without a definition?


I'm saying that if you want to apply it outside human experience you need a concrete definition, otherwise you can call anything you like 'understanding', which is what's happening.


If for you the definition of "understanding" is "something that only humans can do" then your statement about AI is totally pointless: of course AI doesn't "understand", but at the same time it might do something that is perfectly equivalent and that "only machines can do".


> something that is perfectly equivalent

So go ahead and define it, in concrete terms, external to humans. It can't be equivalent unless there is a definite basis for equivalence. Cat videos don't cut it.

My point is that understanding, as we know it, only exists in the human mind.

If you want to define something that is functionally equivalent but implemented in a machine, that is absolutely fine, but don't point to something a machine does and say "look it's understanding!" without having a concrete model of what understanding is, and how that machine is, in concrete terms, achieving it.


Nope, sorry. You said you have no idea of what understanding is, except that by definition it can only be done by humans.

Fine. Then I posit the existence of understanding-2, which is exactly identical to understanding, whatever it is, except for the fact that it can only be done by machines. And now I ask you to prove to me that AI doesn't have understanding-2.

This is just to show you the absurdity of trying to claim that AI doesn't have understanding because by definition only humans have it.


> Nope, sorry. You said you have no idea of what understanding is, except that by definition it can only be done by humans.

He said understanding is what humans do, not that only humans can do it. Stop arguing against a strawman.

Nobody would define understanding as something only humans can do. But it makes sense to define understanding based on what humans do, since that is our main example of an intelligence. If you want to make another definition of understanding, then you need to prove that such a definition doesn't include a lot of behaviors that fail to solve problems human understanding can solve, because then it isn't really on the same level as human understanding.


> He said understanding is what humans do, not that only humans can do it. Stop arguing against a strawman.

Ok, so his argument is:

> Humans understand, models are deterministic functions

> Until you have a concrete definition of what understanding is you can't apply it to anything else

> Informal definitions of understanding by those who experience it aren't very useful at all.

Basically he says: "I don't accept you using the term 'understanding' until you provide a formal definition of it, which none of us has. I don't need such a definition when I talk about people, because... I assume that they understand".

Which means: given two agents, I decide that I can apply the word "understanding" only to the human one, for no other reason than that it is human, and simply refuse to apply it to non-humans, just because.

Clearly there is absolutely nothing that can convince this person that an AI understands, precisely because it's a machine. Put in front of a computer terminal with a real person on the other side, but told it's a machine, he would refuse to call whatever the human on the other side does "understanding". Which makes the entire discussion rather pointless, don't you think?


...and measurable IMO :) We need a way to know how much something understands.


If we had that then education would be solved, but we still struggle to educate people and ensure fair testing that tests understanding instead of worthless things like effort or memorization.


I guess you understand wolves, do you think this video demonstrates understanding of how wolves work?

https://www.youtube.com/watch?v=jspYKxFY7Sc


Certainly. It demonstrates it knows how they look, how they move, what type of behaviour they show at a certain age, in what kind of environment you might see them. What it doesn't seem to have is enough persistence to remember how many there are. But the idea of an animate, acting wolf is there, no doubt about it.


Models can't understand anything but the dataset they are trained on. This is very obvious from the plausible looking wolf cubs moving plausibly while behaving in absurd ways when it comes to real life wolves, or real life anythings for that matter. Models are compressing huge planet scale datasets and spitting out plausible instances of bytes that could belong to the training dataset, but very obviously fail to grasp any real world understanding of what is represented by those bytes.


Diffusion models can reliably draw a cat when prompted with "a cat" and given random noise. Sure, it's deterministic, but it can work with any random noise in an explorative kind of way. I'd say it's very general in its 'understanding' of a cat.


It has no "understanding" of a cat. It's an associative store with soft edges that pulls out compressed cat representations when given the noun "cat". The key store includes nouns, adverbs, verbs, and adjectives, and style abstractions, and there are mappings into the store that link all of those.

But they're very limited, and if you prompt with a relationship that isn't defined you get best-guess, which will either be quite distant or contaminated with other values.

If you ask Dall-E for "a woman made of birds" you get a composite that also includes trees and/or leaves. Dall-E has values for "made of" and "birds" but its value representation for "birds" is contaminated with contextual trees and branches.

Leonardo doesn't have a value for "made of", so you get a woman surrounded by bird-like blobs.

To understand a cat in a human sense the store would have to include the shape, the movement dynamics in all possible situations, the textures, and a set of defining behaviours that is as complete as possible. It would also have to be able to provide and remember an object-constant instantiation of a specific cat that is clean of contamination.

SORA is maybe 10% of the way there. One of the examples doing the rounds shows some puppies playing in snow. It looks impressive until you realise the puppies are zombies. They have none of the expressions or emotions of a real puppy.

None of this is impossible, but training time, storage, and power consumption all explode the more information you try to include.
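
A caricature of what I mean by "an associative store with soft edges": everything is retrieved by similarity, so a prompt with no stored relation doesn't miss, it just degrades into a blend of whatever happens to be nearby. A toy sketch of my own, not any real model's internals:

    import torch
    import torch.nn.functional as F

    def soft_lookup(query, keys, values, temperature=0.1):
        # query: (d,) embedding of the prompt; keys/values: (N, d) stored associations.
        sims = F.cosine_similarity(query.unsqueeze(0), keys, dim=-1)   # (N,)
        weights = F.softmax(sims / temperature, dim=-1)
        # No hard miss: an unknown prompt still returns a weighted blend of neighbours,
        # which is where the "birds contaminated with trees" effect comes from.
        return weights @ values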


I don't see what's so problematic with it. I doubt the model is actually confusing trees and branches with birds. It has associations, but humans do too. If I ask a human to draw a demon, the background would not be an office?

Also, the complaint about 'made of' not being in the training data: humans who never saw a bird cannot draw a bird. Why is that saying something about the model?

I'm not saying that diffusion models act like humans. And I was talking specifically about image generation. My usage of the word understanding is in the task of image generation. I'm not even talking about 'made of', or 'birds'. Just 'cats' and 'hats'. If it can understand 1 thing, it can understand others, but they are not always in the training data.

This is all a non-problem. It kinda reminds me of the discussion of what constitutes a 'male' or 'female'. All I want is to refer to this one property that I observe in diffusion models. Which is what language is: reference. If you are so covetous of the word 'understand', then provide an alternative to refer to this property and I will gladly use it.

https://imgur.com/a/fuS8kcf


> It's an associative store with soft edges that pulls out compressed cat representations when given the noun "cat".

And how do you know that this is not what "understanding" is? To me, understanding the concept of a cat is exactly to immediately recall (or have ready) all the associations, the possibilities, the consequences of the "cat" concept. If you can make up correct sentences about cats and conduct a reasonable conversation about cats, it means that you understand cats.


> If you can make up correct sentences about cats and conduct a reasonable conversation about cats, it means that you understand cats.

No, plenty of humans can have reasonable conversations about things with zero understanding about them. We know they don't understand because when put in a situation in practice they fail to use the things they talked about. Understanding means you can apply what you know, not only talk about it.


It's interesting, can you give me some examples of such cases?


If you don't believe that is true, how can you explain programmer recruitment? If discussing something cogently is showing real understanding, any 10 minute discussion would be enough to make a hire decision.


Well, we definitely see actions, and actors, and causes and effects in the SORA demo videos. With imperfections and occasional mistakes, but they're undeniably there.


I want him to answer how Sora can do water simulations without having a model of part of physics.

How can Sora predict where the waves and ripples should go? Is it just "correlations, not causal", whatever that means?


Generative AI is already full of misunderstandings. From people claiming it "understands" to now that it "simulates".

I'm no math wiz and my training in statistics is severely lacking, but it feels like people need to review what they think is possible with generative AI, because we are so far from understanding and AGI that my head hurts every time these words show up in a discussion.


Geoffrey Hinton and Yoshua Bengio say it can understand and that we are close to AGI. Maybe you can explain why they're wrong instead of just saying it.


Extraordinary claims require extraordinary proof. I'm not the one making the claims.


Did you see the bling zoo demo? How can you say it doesn't have rhyme or reason?


On video, he says:

1. The next frame is easy, but multiple frames are not.

2. What works for text doesn't work for video.

Then Sora comes out, shows multiple frames, and someone tweets gotcha.

He then tweets, without saying he misspoke, and goes on about how the model doesn't understand physics.

And his project, V-JEPA, is the best

He keeps saying stuff about "sucks as a mental model" but doesn't say why that would not apply to text.

https://twitter.com/ylecun/with_replies

Me: If text doesn't need a mental model, I see no reason video needs it. His argument sucks or is badly worded.


>does not indicate that a system understands the physical world.

That's a bit like saying the chatbots aren't actually intelligent. Sure, but there is at least a plausible illusion of it, and even that has usefulness.

The same applies to the physical world. E.g., look at the reflections in the SORA demo. They're not right, but they're also not entirely wrong either. That to me suggests usefulness in approximating the physical world.


I feel like LeCun often gets it half right and then gets overzealous / has a conflict of interest. I think there are good reasons to be less than perfectly optimistic that OpenAI's claims will bear out (the kind of errors SORA makes seem to imply that "simulation" may be a stretch, in my opinion; I talked about this a little in the thread about that announcement, and honestly, when I first read this post I thought he hit the nail on the head until about the fifth sentence). But I don't think there's strong evidence to suggest that they definitely can't yet, and this seems more like a sales pitch for the model he's currently working on than an expert making a particularly compelling case via a better understanding of the theory or practice of this thing.

Like, in general I think there's a lot of hype around AGI and that skepticism toward OpenAI's claims isn't completely unwarranted, but the amount of public attention on the topic lately has caused everyone to commit to very hard lines that lack nuance about the implications of various advances. It's in some ways cool that AI is no longer just an academic curiosity, but it makes for a lot of nonsense to slog through, even from top researchers


I think one of the things that needs to be said about Sora is that the example videos we've seen have been impressive, but that is also how most of OpenAI's examples have looked on marketing pages. What matters is when you yourself are actually able to use it. In the past, it was only when I was actually able to try the tech myself that I could see how successful (or unsuccessful) it really was for doing real work.


No one is claiming Sora is a finished product

HN gets hung up on damning things that aren't perfect _right now_

You also see this with FSD...it isn't perfect today, so HN writes it off forever

Sora is a demo and a teaser of what will be a useful polished tool in three years, that's all


No, we've been writing off FSD for the last 6+ years it's been promised as "coming soon"

People have bought, owned and used, and sold their Tesla w/ FSD without once being able to use the feature.

It's not even any better or more relaxing: instead of driving, you babysit a black box.


It's a shame that Yann didn't explicitly say what the misunderstanding was or clear it up.

I think he's claiming that the misunderstanding is that SORA understands the physical world. He goes on to say that generating the next frame conditional on some action is much harder than generating an entire plausible video. This just doesn't make sense to me. Every frame in a plausible video clearly needs to be conditioned on the actions shown in previous frames.

The tweet is mostly incoherent and I'm left assuming he's upset about SORA and wants to say his ideas are/were better but hasn't managed to express why in a meaningful way.


As I understand it, diffusion-based video generation models simply are not causal in this way. They work by modifying earlier frames in the video to be consistent with later frames just as much as they modify later frames to be consistent with earlier ones. That's why Yann LeCun can argue that they do not have to be able to generate plausible continuations of a real video, just some arbitrary sample from the space of plausible-looking videos, and that the latter does not imply the ability to do the former. It's also why it's not possible to just generate videos of arbitrary length, and why lots of VRAM is required to create even a relatively short clip.
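
Schematically, the difference is something like the sketch below (my reading, with placeholder networks, not any particular model's implementation): the diffusion-style model refines every frame of the clip jointly at each denoising step, so frame 3 can be adjusted to fit frame 30, whereas a causal predictor has to commit to each frame before the next one exists.

    import torch

    def diffusion_style_clip(denoiser, num_frames, frame_shape, steps=50):
        # All frames are denoised together; later frames influence earlier ones.
        x = torch.randn(num_frames, *frame_shape)
        for t in reversed(range(steps)):
            x = denoiser(x, t)              # one joint update over the whole clip
        return x

    def causal_continuation(predictor, first_frame, num_frames):
        # Each frame is committed before the next exists; no peeking at the future.
        frames = [first_frame]
        for _ in range(num_frames - 1):
            frames.append(predictor(torch.stack(frames)))
        return torch.stack(frames)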


Thank you. That would make sense. Perhaps Yann's target audience are supposed to know this already, but your explanation actually cleared up a misunderstanding for me.


I quote Zen Mind Beginner's Mind a lot: "In the beginner's mind there are many possibilities, but in the expert's there are few."

It's like, the smarter someone gets, the more it trips them up when it comes to questions like the one Yann is trying to address. Because, "world modeling" doesn't have to mean that the system understands advanced physics and calculus. Look at our own brains – only analogies, I'm aware – and think about how much a child can infer about what will happen when you throw a ball at them.

Are they 'calculating' trajectories? Not consciously. That would be too slow, regardless. Their brains have become wired through experience and evolution. Formally explaining why the ball will do what it does? Well, that takes years of math and physics education.

Bottom line: in the months and years after a model comes out, research tends to uncover all sorts of things happening inside them that are unexpected. Best thing is to approach it with a beginner's mind, and say, "hey, it's doing _something_ interesting – let's try and understand what that is," rather than just forcing everything to conform to our existing worldview.


OpenAI willfully fuels hype of all sorts and lets people extrapolate without basis or limit, because it's hugely profitable for them.

LeCun is the voice of reason trying to point out the limitations of the current technology and fundamental questions that need to be answered.

Until there is something more than cute demos, like an actual path forward that can implement what people are hyping, or better still working examples, it's all speculative nonsense.


LeCun is the voice of reason trying to point out the limitations of the current technology and fundamental questions that need to be answered.

I think this is the more likely true, but less popular take as well.


"Furthermore, generating those continuations would be not only expensive but totally pointless."

Why would it be pointless? I think there are many creative uses of a video continuation model, considering the amount of control it gives you.


If we cut all the videos in the world in half, half of the videos would be continuations. Or, put another way, all video is continuation. So I'm with you; not sure why it would be pointless. In fact, a continuation-based workflow with prompting seems like the easiest way to get a specific effect.


Quick, feed in season 1 of Firefly.


As opposed to season 2? :(


It's not pointless. It can potentially generate world simulations. We might be inside one such long video.


I mean, there's a point here. Even in the videos OpenAI shows as "successes", you can spot uncanny coherence mistakes. There's one where a forklift morphs into one that's turned 90 degrees. The previous frame looks a bit ambiguous about the direction the squarish shape was pointing, and somehow what happens for several seconds before that wasn't a strong enough signal for the model. And in the "failure" videos, we see objects merging and splitting in similar ways to how DALL-E inserts extra arms into scenes with groups of people.

To me, it's clear that there's no real global understanding yet. Whether this person's preferred ML approach solves that, I'm not knowledgeable enough to tell.


Someone on here was claiming Sora understands physics, while I was watching steaming frozen snow, the lady in red skating down the street in Tokyo on wet concrete and a bunch of people having lunch next to miniature people visiting a mini market stand made from a bike.

Definitely felt like something was off to me.


It's interesting how far behind xAI already is in the current market. Their current advantage of having a corpus of text isn't even relevant.


Just a random thought - maybe Musk was/is so hell bent on removing censorship because he wants more robust data, covering a larger array of thoughts, for his AI plans.


I can’t tell if his comments came before or after the latest OpenAI product announcement showing off what looks like generative video.


The original video: before, this tweet: after.


World Government Summit was from 12-14 February 2024 in Dubai, so either a day before or at the same time, depending on timezones...


LeCun is such a hack and is guilty of exactly the same hype as OpenAI.

Firstly, his insistence on “self-supervised learning”, which is just a wrong and unhelpful rebranding of existing methodologies. Followed by talking about VICReg as if it’s a meaningful contribution and not just hacked-together crap that is not only theoretically unfounded but plain nonsensical. Followed again by his “JEPA” work, which again is just a rebranding of other works like BYOL and data2vec.

It’s so frustrating to see this guy pop up and try to claim credit for an entire field of representation learning that he hasn’t made a single meaningful contribution to in decades.


He won the fucking Turing Award. I think you can rock your CV alongside “he’s such a hack”.

I’ve met the guy a few times (briefly), and I’m aware of his general vibe. I don’t agree with him about everything, but “hack” is absurd, and I’m not posting about who is a hack (or even a crook) under a burner alt.

What’s your Fields medal for?


Just because he made a good thing decades ago doesn’t mean he has any clue about what is going on now.

Many of us actually work in the field and are tired of the mindless hype and self promotion. LeCun is just upset people aren’t hyping his crap rather than OpenAIs.

But I’m sure you can give a good reason why he calls it self supervised instead of unsupervised and JEPA instead of BYOL or data2vec and what his actual contributions to the field of modern representation learning are.

Pauling had a Nobel, didn’t stop him being a crank about vitamin C.


I didn’t say you can’t criticize household names with the highest honor in their field.

I’m saying that it takes some cheek to say there’s nothing original about results that many experts (who are nothing to do with me, and me in a narrow way) think are pretty hot shit.

I try (and sometimes fail) to confine my strident claims requiring strident evidence to topics I’m willing and able to get into the nitty gritty on, and bring my CV and reputation to the party as a courtesy. Anyone can post biased-sounding shit and hand-wave over the substance of the claim under an 11-month burner named with malice aforethought to talk shit and be able to disavow it.

I gave you a rather firm tap on the shoulder about how borderline at best this move is, and under the assumption you’ve got an HN main, you know this.

I had hoped you might put the effort that went into doubling down into explaining the apparently obvious prior art; the whataboutism kick flip into the Shockley race-IQ trope made zero people more enlightened and steered a contentious thread even further off the rails. Nobel laureates do in fact go off the friggin rails from time to time, but I don’t think LeCun is an interesting comparison to Kary Mullis past “AM Radio yelling”. I’m calling that one out.

Maybe elaborate on a pretty debatable claim instead?


This is the most hilarious internet tough guy shit I've ever seen.

Like literally just read the papers I mentioned. Or don't, I don't care. I have literally no clue who you are so maybe take your creepy-as-fuck "firm tap on the shoulder" and wanting to see my "cv" shit somewhere else.


We all go out of bounds on HN norms from time to time, and myself more than many.

I generally appreciate the feedback that I’ve landed there.

This isn’t seeming to be going anywhere useful and I’m going to respectfully bow out. If you’ve got to have the last word, go to town: but do it by posting the links to the papers for those who haven’t read them.

I’ve pulled some dumb stunts on HN, even some recently, but I’ll stand behind there being no Navy Seal Copypasta flex here for the record.

Be well ml-anon.


>“self supervised learning” which is just a wrong and unhelpful rebranding of existing methodologies

What are those methodologies?


Self-supervised learning is such a trivial concept that it doesn't need a new term. It's just marketing/PR speak to attract attention. You could just call it "generating labels from data".
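
Concretely, the whole trick amounts to something like this (a toy sketch, using masked video frames as the example):

    import torch

    def make_self_supervised_batch(clips, mask_ratio=0.5):
        # "Labels" are just pieces of the raw data we hide and then ask the model
        # to predict -- no human annotation anywhere.
        inputs = clips.clone()
        mask = torch.rand(clips.shape[:2]) < mask_ratio   # clips: (batch, frames, ...)
        inputs[mask] = 0.0                                # hide some frames
        targets = clips                                   # the data itself is the label
        return inputs, targets, mask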

All these stupid new terms for trivial concepts confuse people who are new to the field. That's nothing new though; research has always been like this. People love to make up new phrases that they can own and sell as novel ideas to reviewers, even if they're just trivial renamings. It's nothing more than a PR game.


If people didn't consider it a useful pointer to a certain set of ideas, it wouldn't have caught on in the ML community.

That's how language works.


In the ’90s and 2000s we called it unsupervised learning.

In fact, there is a direct formal equivalence between many so-called self-supervised learning techniques and matrix factorization. And literally no one would claim that matrix factorization is anything other than unsupervised.

Similarly, Yann makes such a big deal about not doing contrastive learning, when contrastive methods and his JEPA nonsense can also be shown to be formally equivalent. He’s a grifter like the rest of them. There’s a reason why he basically holds no power at Meta and hasn’t been in charge of AI research there in a long time.
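
For anyone who wants the concrete version of that equivalence: Levy & Goldberg (2014) showed that skip-gram with negative sampling implicitly factorizes a shifted PMI co-occurrence matrix, which you can approximate directly with an SVD. A toy sketch, assuming a small dense co-occurrence matrix in which every word appears at least once:

    import numpy as np

    def pmi_svd_embeddings(cooc, dim=100, shift=np.log(5)):
        # cooc[i, j]: co-occurrence count of word i with context word j.
        total = cooc.sum()
        p_w = cooc.sum(axis=1, keepdims=True) / total
        p_c = cooc.sum(axis=0, keepdims=True) / total
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log((cooc / total) / (p_w * p_c))
        sppmi = np.maximum(pmi - shift, 0)                # shifted positive PMI
        u, s, _ = np.linalg.svd(sppmi, full_matrices=False)
        return u[:, :dim] * np.sqrt(s[:dim])              # word vectors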


Just seeing an error message, and cannot sign in as well. Am I the only one?


Cleared the twitter web cache. Shitty software.


It's impossible to access this content (even more so now that nitter is dead). Should HN continue driving traffic to a site that is entirely inaccessible?


Is it really impossible? I clicked the link and saw the content fine. If you mean "impossible without signing up", I'd say it's within your rights to not want to sign up, but it's a very uncharitable definition of "impossible".


Impossible for me yes. Am getting prompted by a sign-up screen to join twitter.


Why is it impossible for you to sign up for Twitter?


I know @dang has commented on the subject before but now that Nitter is dead, it should be considered hardwalled and Twitter links should be flagged imo. There is no real workaround anymore that doesn't involve signing up.


HN should drive traffic to wherever the content is published/mirrored. If there are multiple options, choose the most public one.

We shouldn't censor ourselves and not discuss information that was posted on twitter.


Other paywalled content can at least be opened with archive.md and friends. Usually people here kindly post an archive link for convenience. With Twitter that no longer works.

So this type of content means that in order to follow the discussion I must also be a twitter subscriber.

Maybe people could consider first creating an archive.org snapshot that is then linked instead. Would be the kind thing to do and help the rest of us.


> Maybe people could consider first creating an archive.org snapshot that is then linked instead

Yes, that would be the best way to handle this.


Given tweets are so short, just paste the whole tweet into the text.

If it’s a whole thread though then just don’t bother. That approach to blogging is just dumb.


Microblogging is a failed experiment. What a detriment to humanity.


I agree with you in the sense that it isn’t a constructive form of communication: the short form prevents nuance and complexity, which inevitably leads to conflict.

It made Trump president and gives Musk control of the media narrative though so it certainly works for some people.


I don’t understand:

This link opens the post for me, while paywalled news is semi-routinely posted.

What’s the objection?


Sometimes Twitter links won’t open for me now without being logged in. I’ve loved the change because it has helped me not use Twitter at all. This one opened for me though.


The difference is that paywalled news sites require a login to access content that they themselves produced and published, while Twitter/X now requires a login to access content written by other people.

In other words, paywalled news is a first party content provider, but Twitter is just a third party who inject their mandatory login between me (the reader) and the writer so they can make some profit by showing ads on the content.

That's fine, but authors should consider if they want to publish on a site like that when there are so many other options.


[flagged]


Original content of comment, in case it gets flagged:

---

It is a huge stretch to assert that successful movie directors "understand the physical world".

They do a worse thing: understand the human psyche.

---

Unrelated but I really don't like these analogies. They use clever words to try to hand wave some vague ideas into existence, trying to appear smart but never really succeeding.


Love the analogy, stealing it.


Worse? It’s more direct.


It's more creepy.


This is important, even though the Twitter part is useless. Here's the paper, "Revisiting Feature Prediction for Learning Visual Representations from Video"[1]

A big problem with machine learning so far has been the lack of some underlying, more abstract model of the subject matter. I don't understand much of this, but apparently something called "representation", a purely mathematical concept, is able to help with this.[2] This turns some kinds of abstract problems into linear algebra problems, which means matrix operations, the things GPUs do so well, help.

Code is available, from links in [2].

Would someone comment on this, please. Is this the beginning of a huge breakthrough? One where, instead of working in text token space or image pixel space, it's possible to work in an automatically generated more abstract space?

[1] https://scontent-sjc3-1.xx.fbcdn.net/v/t39.2365-6/427986745_...

[2] https://en.wikipedia.org/wiki/Representation_theory


The "representation" in "representation theory" that the wikipedia article refers to is a term of art, referring to a class of one-to-one maps between groups (sets endowed with a binary operation with specific properties) and square matrices, in such a way that matrix multiplication preserves the properties of this binary operation between the original group elements being mapped to matrices. There is no mention of this sort of algebraic operation whatsoever in the paper you have linked, which, as far as I can tell, uses the word "representation" only in a nontechnical descriptive sense. I believe you might be misunderstanding the nature of this breakthrough.




