DavidSJ's comments

The bearer of that shirt knows that God wrote in Lisp (perhaps Scheme): https://youtu.be/WZCs4Eyalxc


Ostensibly, yes, but it was mostly hacked together with Perl: https://xkcd.com/224/


The article seems to suggest the false positive rate is only 38%:

The trial followed 25,000 adults from the US and Canada over a year, with nearly one in 100 getting a positive result. For 62% of these cases, cancer was later confirmed.

(It also had a false negative rate of 1%:)

The test correctly ruled out cancer in over 99% of those who tested negative.


If the stats were as good as the hyperbole in the article, it would clearly state the only 2 metrics that really matter: predictive value positive (what's the actual probability that you really have cancer if you test positive) and predictive value negative (what's the actual probability that you're cancer-free if you test negative). As tptacek points out, these metrics don't just depend on the sensitivity and specificity of the test; they are highly dependent on the underlying prevalence of the disease, which is why broad-based testing for relatively rare diseases often results in horrible PVP and PVN metrics.

Based on your quoted sections, we can infer:

1. About 250 people got a positive result ("nearly one in 100")

2. Of those 250 people, 155 (62%) actually had cancer, 95 did not.

3. About 24,750 people got a negative test result.

4. Assuming a false negative rate of 1% (the quote says "over 99%"), of those 24,750 people about 248 actually did have cancer, while about 24,502 did not.

When you write it out like that (and I know I'm making some rounding assumptions on the numbers), it means the test missed the majority of people who had cancer while subjecting over 1/3 of those who tested positive to fear and further expense.
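
To spell that arithmetic out, here is a minimal sketch; the cohort size, positive rate, PPV, and NPV are taken from the quoted figures, and the rounding is mine:

    # Back-of-envelope confusion matrix from the quoted figures.
    # Assumptions: 25,000 participants, ~1% positive rate, 62% of positives
    # confirmed (PPV), "over 99%" of negatives cancer-free (read as ~99% NPV).
    cohort = 25_000
    positives = round(cohort * 0.01)      # ~250 positive results
    true_pos = round(positives * 0.62)    # ~155 confirmed cancers
    false_pos = positives - true_pos      # ~95 false alarms

    negatives = cohort - positives        # ~24,750 negative results
    false_neg = round(negatives * 0.01)   # ~248 cancers the test missed
    true_neg = negatives - false_neg      # ~24,502 correctly ruled out

    sensitivity = true_pos / (true_pos + false_neg)
    print(positives, true_pos, false_pos, false_neg, true_neg)
    print(f"implied sensitivity: {sensitivity:.0%}")   # ~38%

Under these assumptions the test catches only about 155 of roughly 403 cancers in the cohort, which is what "missed the majority of people who had cancer" refers to.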


"only 2 metrics that really matter"

Nope, there is another important thing that matters: some of the cancers tested are really hard to detect early by other means, and very lethal when discovered late.

I would not be surprised if out of the 155 people who got detected early, about 50 lives were saved that would otherwise be lost.

That is quite a difference in the real world. Even if the statistics stay the same, the health consequences are very different when you test for something banal vs. for pancreatic cancer.


Careful: the stats you're reading are all-cancers, cancers aren't uniformly prevalent, and the specific cancers you're referring to might (and probably do) have much, much worse screening outcomes than the aggregate.


"and probably do"

Why probably?

I don't see where this "probably" comes from; it could well be the other way round. It is a new technology and its weak and strong points / applications may differ significantly from what we currently use.


You're overfocused on the "technology" and underfocused on the base rate of the cancers you're concerned about. Just do the math. What you need to know is "just how accurate would this test need to be in order for most of the positives it generates to be true positives". The numbers will be surprising.
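
To make the base-rate point concrete, here is a rough sketch; the sensitivity, specificity, and prevalence values are illustrative, not taken from the article:

    # Illustrative only: how PPV depends on prevalence for a fixed test.
    # The sensitivity/specificity numbers below are hypothetical.
    def ppv(prevalence, sensitivity, specificity):
        true_pos = prevalence * sensitivity
        false_pos = (1 - prevalence) * (1 - specificity)
        return true_pos / (true_pos + false_pos)

    for prev in (0.01, 0.001, 0.0001):   # 1 in 100, 1 in 1,000, 1 in 10,000
        print(f"prevalence {prev:.2%}: PPV = {ppv(prev, 0.99, 0.995):.1%}")
    # -> about 67%, 17%, and 2%: for a 1-in-10,000 cancer, even a
    #    99%-sensitive, 99.5%-specific test yields mostly false positives.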


Both the technology and the base rate DO matter.

Say that you are hunting the elusive snipe, one bird in a million. With standard techniques, you will have a lot of false positives.

But if you learn that the elusive snipe gives off a weird radio signal that other birds don't, your hunt will be a lot shorter.

Same with relatively rare cancers. If you can detect some very specific molecule or structure, your test will be quite reliable anyway. That is why I don't get your use of "probably". Unless you are really familiar with the underlying biochemistry, the probabilities cannot be guessed.

There is absolutely no reason why tests for rare diseases should have high false positive rates. In many other diseases, they don't. For example (although the underlying technology differs), Down syndrome is rare, but its detection barely has any false positives. You can test the entire pregnant population for Downs reliably, and many countries already do that.


I'm not saying that you couldn't develop a super-reliable test for a rare cancer! The point isn't that these cancers are impossible to screen for. The point is that the numbers given in this article do not suggest that such a test has been developed in this case.


It surely does not make sense to screen the entire population for pancreatic cancer, but we already do have screening for pancreatic cancer in Czechia for the "at risk" population.

https://www.cgs-cls.cz/screening/program-vyhledavani-rakovin...

For such programs, a blood test would be a huge boon and they could even expand the coverage a bit.


Maybe. Let's do a thought experiment.

Let's say you do have a positive test for pancreatic cancer. Overall 5-year survival rate is 12%, and unlike with many other cancers, people continue to die after that. Basically, it is almost a death sentence if it is a true positive. Early detection will increase your odds a bit and prolong your remaining expected lifetime, but even with stage 1 pancreatic cancer, only 17% survive to 10 years. Let's say you are one of the 99% of false positives, because everyone gets tested in this hypothetical scenario. Let's say imaging and biopsy look clean. No symptoms (which you typically don't have until stage 3 with pancreatic cancer, where it is far too late anyway). With the aforementioned odds, what would you do?

Panic? Certainly, given that if it is a real positive, you might as well order your headstone.

Panic more? Maybe people given that news will change their behaviour and engage in risky activities, get depressed, or attempt suicide (https://jamanetwork.com/journals/jamanetworkopen/fullarticle... ). All of which will kill some of those people.

Get surgery to remove your pancreas? Well, the anesthesia alone has a 0.1% chance of killing you, and the surgery might kill 0.3% in total. No pancreas means you will instantly have diabetes, which cuts your life expectancy by 20 years.

Start chemotherapy? Chemo is very dangerous, and there is no chemo mixture known to be effective against pancreatic cancer, usually you just go with the aggressive stuff. It is hard to come by numbers as to how many healthy people a round of chemo would kill, but in cancer patients, it seems that at least 2% and up to a quarter die in the 4 weeks following chemotherapy (https://www.nature.com/articles/s41408-023-00956-x ). And chemotherapy itself has a risk of causing cancers later on.

Start radiation therapy? Well, you don't have a solid tumor to irradiate, so that is not an option anyways. But if done, it would increase your cancer risk as well as damage the irradiated organ (in that case probably your pancreas).

So in all, from 100 positive tests you have 99 false positives in this scenario. If just one of those 99 false positives dies of any of the aforementioned causes, the test has already killed more people than the cancer ever would have. Even if no doctor would do surgery, chemotherapy or radiation treatment on those hypothetical false positives, the psychological effects are still there and maybe already too deadly.

So it is a very complex calculation to decide whether a test is harmful or good. Especially in extreme types of cancer.


"Let's say you are one of the 99% of false positives, because everyone gets tested in this hypothetical scenario."

This alone is a disqualifier for your scenario. A test with 99 percent false positives will not be widely used, if at all. (And the original Galleri test that the article was about is nowhere near that value, and it is not intended to be used in low-risk populations anyway.)

I am all for wargaming situations, but come up with some realistic parameters, not "Luxembourg decided to invade and conquer the USA" scenarios.


> Nope, there is another important thing that matters: some of the cancers tested are really hard to detect early by other means, and very lethal when discovered late.

You are arguing for testing everyone there. If you cannot detect them by other means, you need to test for them this way. And do it for everyone. You have already set up the unrealistic wargaming scenario. You picked pancreatic cancer as your example, where you would have to test at least every 6 months, because if you test less often, the disease progression is so fast that testing is useless. There are no specific risk groups for pancreatic cancer beyond a slight risk increase from "the usual all-cancer risk factors". Nothing to pick a test group by.

And a 99% overall false positive rate is easy to achieve; lots of tests that are in use have this property if you just test everyone very frequently. Each instance of testing has an inherent risk of producing a false positive, and if you repeat the test for each person, their personal false-positive risk goes up with it. For any test used frequently enough, the probability that a given person gets at least one false positive approaches 100%.
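
As a rough sketch of that cumulative effect (the per-test false positive rate below is hypothetical):

    # Illustrative: even a modest per-test false positive rate accumulates
    # when the same person is screened repeatedly (here: twice a year).
    per_test_fpr = 0.01   # hypothetical 1% false positive rate per test
    for years in (1, 10, 25, 50):
        n_tests = years * 2
        p_any_false_pos = 1 - (1 - per_test_fpr) ** n_tests
        print(f"{years:>2} years: P(>=1 false positive) = {p_any_false_pos:.0%}")
    # -> about 2%, 18%, 40%, 63%; it keeps climbing toward 100% with more tests.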


"You are arguing for testing everyone there."

Are you mistaking me for someone else? I never said or even implied that.

"And a 99% overall false positive rate is easy to achieve,"

Not in the real world; any such experiment will be shut down long before the asymptotic behavior kicks in. Real healthcare does not have unlimited resources to play such games. That is why I don't want to wargame them; it is a "Luxembourg attacks the US" scenario.

"There are no specific risk groups for pancreatic cancer"

This is just incorrect, people with chronic pancreatitis have massively increased risk of developing pancreatic cancer (16x IIRC). There also seems to be a hereditary factor.

The Czech healthcare system has, in fact, had a limited pancreatic cancer screening program since 2024, for people who were identified as high-risk.

https://www.cgs-cls.cz/screening/program-vyhledavani-rakovin...


Prolonging the expected lifetime by several years nontrivially improves chances of surviving until better drugs are found, and ultimately long term survival. Our ability to cure cancers is not constant, we're getting better at it every day.


Even so. Current first-line treatment for pancreatic cancer is surgery, because chemo doesn't really help a lot. Chemo alone is useless in this case. So any kind of treatment that does have a hope of treating anything involves removing the pancreas.

Take those 99% false positives. If you just remove the pancreas from everyone, you remove 20 years of lifetime through severe diabetes. In terms of lost life expectancy, you killed up to 25 people. Surgery complications might kill one more. In all, totally not worth it, because even if you manage to save every one of those 1% true positives, you still killed more than 20 (statistical) people.

And the detection rate might be increased by more testing. But it needs to be a whole lot more, and it won't help. Usually pancreatic cancer is detected in stage 3 or 4, when it becomes symptomatic, with a 5-year survival rate below 10% (let's make it 5% for easier maths). The progression from stage 1 to stage 3 takes less than a year if untreated. So you would need to test everyone every 6 months to get detections into the stage 1 and stage 2 cases, which are more treatable. Let's assume you get everyone down to stage 1, with a survival rate of roughly 50% at 5 years and 15% at 10 years. Say a miracle cure is developed after 10 years, with which everyone who is treated survives. So basically we get those 15% 10-year survivors all to survive to their normal life expectancy (minus 20 years because no more pancreas). Averaged out, they get an extra 10 years each.

Pancreatic cancer is diagnosed in 0.025% of the population each year. In the US, at 300 million people, that's 750k in 10 years. With our theoretical miracle cure after 10 years for 15% of them, that is a gain of 1.125 million years of lifetime. One hour of testing time for each of 300 million people, twice a year for 10 years, already wastes 685k years of lifetime, so half the gain is gone already. The calculation is already in "not worth it" territory if the waiting time for the blood-draw appointment increases, and it is already off once you add the additional strain on the healthcare system and the additional deaths that will cause.
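
For what it's worth, here is that back-of-the-envelope arithmetic written out; all inputs are the assumptions of this hypothetical scenario, not real data:

    # Reproducing the scenario's arithmetic. Inputs are the hypothetical
    # assumptions above (300M population, 0.025% diagnosed per year, miracle
    # cure after 10 years for the 15% stage-1 ten-year survivors, ~10 extra
    # years each, 1 hour per test, testing twice a year), not real data.
    population = 300_000_000
    diagnosed_10y = population * 0.00025 * 10        # 750,000 diagnoses in 10 years
    years_gained = diagnosed_10y * 0.15 * 10         # ~1.125M life-years gained

    test_hours = population * 2 * 10 * 1             # 6 billion hours spent testing
    years_spent_testing = test_hours / (24 * 365)    # ~685,000 life-years lost to testing

    print(f"{years_gained:,.0f} years gained vs {years_spent_testing:,.0f} years spent testing")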


> If the stats were as good as the hyperbole in the article, it would clearly state the only 2 metrics that really matter: predictive value positive (what's the actual probability that you really have cancer if you test positive) and predictive value negative (what's the actual probability that you're cancer free if you test negative). As tptacek points out, these metrics don't just depend on the sensitivity and specificity of the test

This is a bizarre thing to say in response to... a clear statement of the positive and negative predictive value. PPV is 62% and NPV is "over 99%".

Your calculations don't appear to have any connection to your criticism. You're trying to back into sensitivity ("the test missed the majority of people who had cancer") from reported PPV and NPV, while complaining that sensitivity is misleading and honest reporting would have stated the PPV and NPV.


So: possibly saving lives and avoiding late-stage-cancer-level medical expenses for 2/3 of positive results, vs. fear and lighter medical care 1/3 of the time. Is this not a win?


I believe, that you that sumes as mean.


Unfortunately, I don't think the mechanism is quite restricted enough. TFA says that repairing the BBB helps amyloid plaque clearance. Would the author of your blog post claim that as a win, or admit that the plaques are downstream of the problem, and that BBB integrity is closer to the root cause of the disease process?

The author of that blog post, for whom I am in an excellent position to speak, would point to the "sole intended mechanism" clause in the testable prediction. That is, if the therapeutic's developers do not claim any other intended pathway for clinical benefit from improved BBB integrity other than amyloid-β clearance, then it would count. If not, then it would not count, even if it's plausible or even likely that that's the main pathway by which the benefits are accruing.

However, because this is early preclinical research, it's not likely to reach a late-stage clinical trial within the 12-year window of the author's prediction. Furthermore, in every year there are about a dozen of these preclinical studies that go viral for some reason or other, often having little correlation with how promising the science is. I haven't had a chance to look into this one in detail, so this isn't a negative comment about it, but the base rate of this stuff panning out is low, even if it's good research.

The author of that article would also point out that the concept of "the root cause" isn't terribly well-defined, but that strong evidence points to amyloid pathology as the common entrypoint in all cases of Alzheimer's disease, even if multiple upstream factors (some possibly relating to the BBB) can feed into that, depending on the specific case. Similarly, calorie surplus causes obesity in nearly all cases, but the specific cause of calorie surplus may vary from person to person.

I can't guess what point you're trying to make with a long article that acknowledges the fraud in the beginning, and then rehashes the initial reasons for looking into the amyloid hypothesis. No one is claiming it was stupid to look into the amyloid hypothesis. They are complaining that it hasn't been the most promising theory in quite a long time, and it was fraudulently held as the most promising. Other theories, arguably more promising, are listed throughout your article.

A correction: the article does discuss other hypotheses, in pointing out that they can't account for crucial evidence, whereas there isn't any major evidence the amyloid hypothesis seems to have trouble accounting for, and it thus remains very strong.


To add to this, a moderate amount of turbulence (a type of chaotic fluid flow) is sometimes deliberately engineered into engines and over wing surfaces to improve combustion efficiency and lift, and chaotic flow can also induce better mixing in heat exchangers and microfluidic systems.


while Alzheimers drugs may be more effective earlier in the disease course, none of them are "effective" in the sense of meaningfully staving the disease off; the upside to early detection is not very strong.

One correction here: the amyloid antibodies that successfully clear out a large amount of plaque have yet to report data from intervention trials prior to symptom onset, so we can’t say this with confidence and in fact we have good reason to suspect they would be more effective at this disease stage.

I wrote about this and related topics here: https://www.astralcodexten.com/p/in-defense-of-the-amyloid-h...

Edited to add: the sort of test discussed in the OP wouldn’t be relevant to presymptomatic treatment, however, since it’s a test of symptoms rather than biomarkers for preclinical disease.


That is an amazing breakdown of AD, and I think it will be my go-to for sharing in the future.

Have you seen the research in phase-targeted auditory stimulation, memory, amyloid, and sleep? Do you have thoughts on that?

Acoustic stimulation during sleep predicts long-lasting increases in memory performance and beneficial amyloid response in older adults - https://doi.org/10.1093/ageing/afad228

Acoustic Stimulation to Improve Slow-Wave Sleep in Alzheimer's Disease: A Multiple Night At-Home Intervention https://doi.org/10.1016/j.jagp.2024.07.002


Thank you for your kind words.

I hadn’t seen that research, thanks for passing it along. It seems like an interesting approach to improve slow wave sleep, which is known to help with amyloid clearance.


If you're looking into this space further, I've posted the majority of the research on our website https://www.affectablesleep.com/how-it-works (bottom of the page)


> An LLM was only every meant to be a linguistics model, not a brain or cognitive architecture.

See https://gwern.net/doc/cs/algorithm/information/compression/1... from 1999.

Answering questions in the Turing test (What are roses?) seems to require the same type of real-world knowledge that people use in predicting characters in a stream of natural language text (Roses are ___?), or equivalently, estimating L(x) [the probability of x when written by a human] for compression.
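
The link the paper is drawing (better prediction = better compression) can be made concrete: a model that assigns probability p to the actual next character needs about -log2(p) bits to encode it, e.g. with an arithmetic coder. A toy sketch with a made-up character model (not a real language model):

    import math

    # Toy stand-in for a character model; the probabilities are invented.
    def toy_prob(context, next_char):
        return 0.9 if context.endswith("Roses") and next_char == " " else 0.05

    text = "Roses are red"
    bits = sum(-math.log2(toy_prob(text[:i], text[i])) for i in range(1, len(text)))
    print(f"~{bits:.1f} bits, {bits / (len(text) - 1):.2f} bits/char")
    # A model that predicts the text better (assigns higher probability to what
    # actually comes next) compresses it into fewer bits; the tasks are the same.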


I'm not sure what your point is?

Perhaps in 1999 it seemed reasonable to think that passing the Turing Test, or maximally compressing/predicting human text makes for a good AI/AGI test, but I'd say we now know better, and more to the point that does not appear to have been the motivation for designing the Transformer, or the other language models that preceded it.

The recent history leading to the Transformer was the development of first RNN then LSTM-based language models, then the addition of attention, with the primary practical application being for machine translation (but more generally for any sequence-to-sequence mapping task). The motivation for the Transformer was to build a more efficient and scalable language model by using parallel processing, not sequential (RNN/LSTM), to take advantage of GPU/TPU acceleration.

The conceptual design of what would become the Transformer came from Google employee Jakob Uszkoreit, who has been interviewed about this - we don't need to guess the motivation. There were two key ideas, originating from the way linguists use syntax trees to represent the hierarchical/grammatical structure of a sentence.

1) Language is as much parallel as sequential, as can be seen by multiple independent branches of the syntax tree, which only join together at the next level up the tree

2) Language is hierarchical, as indicated by the multiple levels of a branching syntax tree

Put together, these two considerations suggest processing the entire sentence in parallel, taking advantage of GPU parallelism (not sequentially like an LSTM), and having multiple layers of such parallel processing to hierarchically process the sentence. This eventually led to the stack of parallel-processing Transformer layers design, which did retain the successful idea of attention (thus the paper name "Attention is all you need [not RNNs/LSTMs]").

As far as the functional capability of this new architecture, the initial goal was just to be as good as the LSTM + attention language models it aimed to replace (but be more efficient to train & scale). The first realization of the "parallel + hierarchical" ideas by Uszkoreit was actually less capable than its predecessors, but then another Google employee, Noam Shazeer, got involved and eventually (after a process of experimentation and ablation) arrived at the Transformer design which did perform well on the language modelling task.

Even at this stage, nobody was saying "if we scale this up it'll be AGI-like". It took multiple steps of scaling, from Google's early Muppet-themed BERT (following the LSTM-based ELMo), to OpenAI's GPT-1, GPT-2 and GPT-3, for there to be a growing realization of how good a next-word predictor, with corresponding capabilities, this architecture was when scaled up. You can read the early GPT papers and see the growing level of realization - they were not expecting it to be this capable.

Note also that when Shazeer left Google, disappointed that they were not making better use of his Transformer baby, he did not go off and form an AGI company - he went and created Character.ai, making fantasy-themed ChatBots (similar to Google having experimented with ChatBot use, then abandoning it, since without OpenAI's innovation of RLHF, Transformer-based ChatBots were unpredictable and a corporate liability).


> I'm not sure what your point is?

I was just responding to this claim:

> An LLM was only every meant to be a linguistics model, not a brain or cognitive architecture.

Plenty of people did in fact see a language model as a potential path towards intelligence, whatever might be said about the beliefs of Mr. Uszkoreit specifically.

There's some ambiguity as to whether you're talking about the transformer specifically, or language models generally. The "recent history" of RNNs and LSTMs you refer to dates back to before the paper I linked. I won't speak to the motivations or views of the specific authors of Vaswani et al, but there's a long history, both distant and recent, of drawing connections between information theory, compression, prediction, and intelligence, including in the context of language modeling.


I was really talking about the Transformer specifically.

Maybe there was an implicit hope of a better/larger language model leading to new intelligent capabilities, but I've never seen the Transformer designers say they were targeting this or expecting any significant new capabilities even (to their credit) after it was already apparent how capable it was. Neither Google's initial fumbling of the tech nor Shazeer's entertainment chatbot foray seems to indicate that they had been targeting, and/or realized they had achieved, a more significant advance than the more efficient seq-2-seq model which had been their proximate goal.

To me it seems that the Transformer is really one of industry/science's great accidental discoveries. I don't think it's just the ability to scale that made it so powerful, but more the specifics of the architecture, including the emergent ability to learn "induction heads" which seem core to a lot of what they can do.

The Transformer precursors I had in mind were recent ones, in particular Sutskever et al.'s "Sequence to Sequence Learning with Neural Networks [LSTM]" from 2014, and Bahdanau et al.'s "Jointly Learning to Align and Translate" from 2015, then followed by the "Attention is all you need" Transformer paper in 2017.


Circling back to the original topic: at the end of the day, whether it makes sense to expect more brain-like behavior out of transformers than "mere" token prediction does not depend much on what the transformer's original creators thought, but rather on the strength of the collective arguments and evidence that have been brought to bear on the question, regardless of who from.

I think there has been a strong case that the "stochastic parrot" model sells language models short, but to what extent still seems to me an open question.


I'd say that whether to expect more brain-like capabilities out of Transformers is more an objective matter of architecture - what's missing - and learning algorithms, not "collective arguments". If a Transformer simply can't do something - has no mechanism to support it (e.g. learn at run time), then it can't do it, regardless of whether Sam Altman tells you it can, or tries to spin it as unimportant!

A Transformer is just a fixed size stack of transformer layers, with one-way data flow through this stack. It has no internal looping, no internal memory, no way to incrementally learn at runtime, no autonomy/curiosity/etc to cause it to explore and actively expose itself to learning situations (assuming it could learn, which it anyways can't), etc!
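
A minimal sketch of that structure (illustrative only, using PyTorch's stock encoder layer; positional encodings and masking are omitted for brevity):

    import torch
    import torch.nn as nn

    # A Transformer is a fixed stack of layers with a single one-way pass:
    # no recurrence, no persistent internal memory, no weight updates at
    # inference time. Dimensions below are arbitrary.
    d_model, n_heads, n_layers, vocab = 512, 8, 6, 32_000

    embed = nn.Embedding(vocab, d_model)
    layers = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        for _ in range(n_layers)
    )
    head = nn.Linear(d_model, vocab)

    def forward(token_ids):
        x = embed(token_ids)        # (batch, seq, d_model)
        for layer in layers:        # data flows one way through the fixed stack
            x = layer(x)
        return head(x)              # next-token logits; nothing persists afterwards

    logits = forward(torch.randint(0, vocab, (1, 16)))
    print(logits.shape)             # torch.Size([1, 16, 32000])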

These are just some of the most obvious major gaps between the Transformer architecture and even the most stripped down cognitive architecture (vs language model) one might design, let alone an actual human brain which has a lot more moving parts and complexity to it.

The whole Transformer journey has been fascinating to watch, and highly informative as to how far language and auto-regressive prediction can take you, but without things like incremental learning and the drive to learn, all you have is a huge, but fixed, repository of "knowledge" (language stats), so you are in effect building a giant expert system. It may be highly capable and sufficient for some tasks, but this is not AGI - it's not something that could replace an intern and learn on the job, or make independent discoveries outside of what is already deducible from what is in the training data.

One of the really major gaps between an LLM and something capable of learning about the world isn't even the architecture with all its limitations, but just the way they are trained. A human (and other intelligent animals) also learns by prediction, but the feedback loop when the prediction is wrong is essential - this is how you learn, and WHAT you can learn from incorrect predictions is limited by the feedback you receive. In the case of a human/animal the feedback comes from the real world, so what you are able to learn critically includes things like how your own actions affect the world - you learn how to be able to DO things.

An LLM also learns by prediction, but what it is predicting isn't real-world responses to its own actions, but instead just input continuations. It is being trained to be a passive observer of other people's "actions" (limited to the word sequences they generate) - to predict what they will do (say) next - as opposed to being an active entity that learns not to predict someone else's actions, but to predict its own actions and real-world responses - how to DO things itself (learn on the job, etc, etc).


Sometimes literally with that SI prefix.


This is the original video, for those looking: https://m.youtube.com/watch?v=77ubC4bcgRM


PSA: it's easy to miss on the first watch because the big action happens in the background behind the gate.


Thanks! On first watch, all I saw was the driveway crack appear. On a second pass, it could be mistaken for a parallax effect as the entire background shifts forward!


So, I recommend watching it in 3 passes. First pass: watch the right third of the video; it shows the two sides moving. Then watch the middle third; it shows both the movement and the rupture in the ground. Then watch the left third; it shows the rupture in the ground clearly.



Link added to the top text. Thanks!


I once got an email about the funeral arrangements for somebody's mother. I know this person very well, because he uses my email address for everything. I know what internet subscription he has. I know where he bought his e-bike. Where he goes on holiday. Etc.

I was expecting this person to be you.

