Hacker News | laughingcurve's comments

Possibly the greatest contribution to Claude Code in months. I am rushing to my terminal to install, test, and update.

Don’t be too harsh; it’s the most effort Gary has put into his criticism in a while.

I appreciate good critique but this is not it


Article is from 2018/19, and this hypothesis remains just that AFAIK, with plenty of evidence going against it.


I interviewed Jon (lead author on this paper) and yeah, he pretty much disowns it now: https://www.latent.space/p/mosaic-mpt-7b


Could you explain why you think that? I'm looking at the lottery ticket section and it seems like he doesn't disown it; the reason he gives, via Abhinav, for not pursuing it at his commercial job is just that that kind of sparsity is not hardware friendly (except with Cerebras). "It doesn't provide a speedup for normal commercial workloads on normal commercial GPUs and that's why I'm not following it up at my commercial job and don't want to talk about it" seems pretty far from "disowning the lottery ticket hypothesis [as wrong or false]".


I think that was pretty clear even when this paper came out: even if you could find these subnetworks, they wouldn't be faster on real hardware. I never thought much of this paper, but it sure did get a lot of people excited.


It was exciting because of what it implies about how a model learns, regardless of whether or not it's commercially applicable.


(Cerebras is real hardware.)


It is real in that it exists. It is not real in the sense that almost nobody has access to one. Unless you work at one of the handful of organizations with their hardware, it’s not a practical reality.


how long will that be the case?


They have a strange business model. Their chips are massive, so they necessarily only sell them to large customers. Also, because of the way they’re built (the entire wafer is a single chip), no two chips will be the same. Normally, imperfections in manufacturing mean some parts of a wafer are rejected and others are binned as fast or slow chips. If you use the whole wafer, you get what you get. So it’s necessarily a strange platform to work with: every device is slightly different.


At least for the foreseeable future (the next 50 years, say).


I saw how it nerd-sniped an extremely capable faculty member.


he pretty much always says it offline haha, but I may have mixed it up with the subsequent convo we had at NeurIPS: https://www.latent.space/p/neurips-2023-startups


cool beans, thanks for this -- I think it's easier to hear it directly from the authors. I was hesitant to start researchposting and come off like a dick.

also, note to self: if I publish and disown my papers, Shawn will interview me :)


What evidence against it do you have in mind? I think it's a result of little practical relevance without a way to identify winning tickets that doesn't require buying lots of tickets until you hit the jackpot (i.e. training a large, dense model to completion), but that doesn't make the observation itself incorrect.
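
For anyone who hasn't read the paper: the procedure for finding a ticket is exactly why it's expensive. Roughly: train the dense model to completion, prune by magnitude, rewind the surviving weights to their initialization. A minimal, hypothetical PyTorch sketch (the function names and the 90% sparsity are made up for illustration, not taken from the paper's code):

    import copy
    import torch

    def find_winning_ticket(model, train_fn, sparsity=0.9):
        # 1. remember the random initialization
        init_state = copy.deepcopy(model.state_dict())
        # 2. "buy the ticket": train the dense model to completion
        train_fn(model)
        # 3. prune: keep only the largest-magnitude weights in each tensor
        masks = {}
        for name, p in model.named_parameters():
            k = max(1, int(p.numel() * sparsity))  # how many weights to drop
            threshold = p.data.abs().flatten().kthvalue(k).values
            masks[name] = (p.data.abs() > threshold).float()
        # 4. rewind the surviving weights to their original init values
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, p in model.named_parameters():
                p.mul_(masks[name])  # the sparse subnetwork, i.e. the "ticket"
        return model, masks

Step 2 is the whole problem: you only learn which weights mattered after paying for the full dense training run.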


The observation itself is also partially incorrect. Here's a video I watched a few months ago that goes further into the whole question of how you deal with subnetworks:

https://youtu.be/WW1ksk-O5c0?list=PLCq6a7gpFdPgldPSBWqd2THZh... (timestamped)

At the timestamp they discuss how the original ICLR results actually only worked on extremely tiny models; larger ones didn't work. The adaptation you need to sort of fix it is to train densely for a few epochs first; only then can you start increasing sparsity.
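
To make that adaptation concrete, here's a rough sketch of what "dense warmup, then ramp up sparsity" could look like in PyTorch -- the helper names, schedule, and numbers are illustrative guesses, not taken from the talk:

    import torch

    def magnitude_mask(weight, sparsity):
        # keep the largest-magnitude weights; zero the smallest `sparsity` fraction
        k = int(weight.numel() * sparsity)
        if k == 0:
            return torch.ones_like(weight)
        threshold = weight.abs().flatten().kthvalue(k).values
        return (weight.abs() > threshold).float()

    def target_sparsity(epoch, dense_epochs=5, ramp_epochs=20, final_sparsity=0.9):
        # train fully dense for a few epochs, then increase sparsity gradually
        if epoch < dense_epochs:
            return 0.0
        progress = min(1.0, (epoch - dense_epochs) / ramp_epochs)
        return final_sparsity * progress

    # inside the training loop, after each optimizer step:
    #   s = target_sparsity(epoch)
    #   with torch.no_grad():
    #       for p in model.parameters():
    #           p.mul_(magnitude_mask(p, s))

The point is that the mask is computed from weights that have already had some dense training, which, per the talk, is what larger models need.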


Watched the video - thanks

Ioannou is saying the paper's idea for training a sparse subnetwork doesn't work in non-toy networks (the paper's method for selecting promising weights early doesn't improve the network).

BUT the term "lottery ticket" refers to the true observation that a small subset of weights drives functionality (see all the pruning papers). It's great terminology because those subsets truly are coincidences based on random numbers.

All that's been disproven is that paper's specific method for creating a sparse network based on this observation.


The article makes no compelling points to me as an avid user of these applications.

I would rather shove ice picks covered in lemon juice into my eyes than give Java or Ellison any more room in the digital ecosystem. And I'm not talking politics here w.r.t. Ellison; he's just awful.


Someone else on the page commented about Oracle. Why are people still hung up on Oracle or Ellison when, if anything, they've helped Java thrive?

The real threat has been, and continues to be, Google. They pulled the same move Microsoft got busted for, and got away with it. Google killed Eclipse as the IDE for Android development and threw that business over to their Russian buddies at JetBrains.

Google is the threat to Java, not Oracle.


I'm a progressive -- just as I am not dumping my climate-friendlier Tesla at a loss because Musk is a Nazi buffoon, there is no way I am walking away from my GraalVM-compiled babashka binary because another billionaire turd kicked Stephen Colbert off The Late Show. I can mourn and label both as petulant and stupid without having to bleed my back like Saint Thomas More.


Unironically, they are indeed somewhat safer -- however, if people are willing to accept AI-based fortune telling as a substitute good... which I have seen lately...


we automated fortune telling a long time ago

https://en.wikipedia.org/wiki/Fortune_teller_machine


Wow, I have not thought about OiNK in ages... great memories! OiNK and WhatCD did something very special for the music community.


Incentives don’t remove agency. They might have incentives… but these are awful scum who deserve nothing but contempt.


No, but it provides a framework for starting to think about ways we can protect the vulnerable from these contemptible but entirely predictable bad actors.

For example, families forced to publicly beg for money to provide their sick children with treatment. What societal structures enable this situation to occur? Who is profiting off of this structure?


You'd have to be unconscionably crass to profiteer off charities treating kids for cancer... https://en.wikipedia.org/wiki/Donald_J._Trump_Foundation ("Granting money to charities that rented Trump Organization facilities").


Well, that's a pretty bold stance. Scammers who steal from dying children are bad people? Geez...


I appreciate you doing this and sharing it. I had a similar experience with Rust and a tokenization library (BERTScore) and realized it was better to let the barely-worse method stand, because the effort wasn't worth the long-term maintenance.


I am a researcher in this field and would love to talk more about loyal agents.


By all means! I’m not sure whether Hacker News rules or norms permit us to talk here, but I’ll at least respond here as a start:

What about loyal agents would you like to talk about?


I love Typst and tried to use it recently when shipping publications to EMNLP, AAAI, and NeurIPS. While there were a lot of upsides, things got very bad once the team grew beyond a few people. Typst is incredible for a single person or a trio, but the web experience is not there yet for collaboration. I’m really hoping Typst continues, and I plan to use it whenever I can for smaller projects, or for work that won’t involve professors or students who aren’t interested in learning new things at publication time.


How did you deal with the templates that are usually supplied for TeX? Did you just try to imitate them as closely as possible?

(I've wanted to look at this at some point for replacing LaTeX for submissions to IEEE confs/CoRL etc.)


> Typst is incredible for a single person or a trio, but the web experience is not there yet for collaboration

How did you collaborate on LaTeX publications in the past?


Not OP, but likely overleaf.

