More

ceh123 · 2025-11-15T19:06:31 1763233591

Exactly! It's n+1 points in n dimensions (when finite). Another way to think about it (the way that I know because it extends into general Banach spaces and not just n dimensional spaces) is that each point inside is the unique weighted average of the extreme points (corners). So in 2d, if you have a square you can get that middle point by averaging all the corners, or averaging two opposing corners, so it's not a simplex.

ceh123 · 2025-11-15T17:19:51 1763227191

On the topic of simplices! I did my PhD in dynamical systems and the space of invariant measures [0] is (in the compact setting) always a simplex and the extreme points are the ergodic measures. It's because of this that you can kind of assume your system is ergodic do work there and frequently be able to generalize to the non-ergodic case (through ergodic decomposition).

But the real thing I wanted to mention here was the Poulsen Simplex [1]. This is the unique Choquet simplex [2] for which the extreme points are dense. This means that it's like an uncountably infinite dimensional triangle where no matter where you are inside the triangle, you're arbitrarily close to a corner. It's my favorite shape and absolutely wild and impossible to conceptualize (even though I worked with it daily for years!)

[0] https://en.wikipedia.org/wiki/Invariant_measure

[1] https://eudml.org/doc/74350

[2] https://en.wikipedia.org/wiki/Choquet_theory

ceh123 · 2025-11-13T15:28:26 1763047706

This paper is a theoretical analysis showing that the ridge regularization that optimizes the source task almost never optimizes transfer performance. Interestingly, in high SNR regimes (low noise) the optimal regularization for pre-training is higher than the task specific optimal regularization, and in low SNR regimes (high noise) it’s better to regularize less than you would if you were just optimizing for that task.

Although the proofs are in the world of (L2-SP) ridge regression, experiments were run using an MLP on MNIST and CNN on CIFAR-10 and suggest the SNR-regularization relationship persists in non-linear networks.

ceh123 · 2025-10-27T20:48:43 1761598123

Presenting false data to investors is fraud, doesn't matter how it was generated. In fact, humans are quite good at "generating plausible looking data", doesn't mean human generated spreadsheets are fraud.

On the other hand, presenting truthful data to investors is distinctly not fraud, and this again does not depend on the generation method.

alfalfasprout · 2025-10-27T20:52:15 1761598335

If humans "generate plausible looking data" despite any processes to ensure data quality they've likely engaged in willful fraud.

An LLM doing so needn't even be willful from the author's part. We're going to see issues with forecasts/slide decks full of inaccuracies that are hard to review.

ceh123 · 2025-10-28T01:06:42 1761613602

I think my main point is just because an LLM can lie, doesn’t necessarily mean an LLM generated slide is fraud. It could very easily be correct and verified/certified by the accountant and not fraud. Just cuz the text was generated first by an LLM doesn’t mean fraud.

That being said, oh for sure this will lead to more incidental fraud (and deliberate fraud) and I’m sure it already has. Would be curious to see the prevalence of em-dash’s in 10k’s over the years.

lionkor · 2025-10-27T21:44:53 1761601493

> doesn't matter how it was generated

is there precedent for this supposed ruling?

ceh123 · 2025-10-28T00:56:56 1761613016

US v Simon 1969, see [0] for a review.

Establishes that accountants who certify financials are liable if they are incorrect. In particular, if they have a reason to believe they might not be accurate and they certify anyway they are liable. And at this stage of development it’s pretty clear that you need to double check LLM generated numbers.

Obviously no clue if this would hold up with today’s court, but I also wasn’t making a legal statement before. I’m not a lawyer and I’m not trying to pretend to be one.

[0] https://scholarship.law.stjohns.edu/cgi/viewcontent.cgi?arti...

lionkor · 2025-10-28T07:21:11 1761636071

Fascinating thank you for the link

ceh123 · 2025-08-31T15:24:48 1756653888

Even one of these topics I would say it would take most PhDs at least 2-3 years to “master”. I feel like at the end of my math PhD (5 years, 3 focused solely on my research area) I had just scratched the surface of mastery in my sub field, and that’s with 3 published papers.

I guess you’re right though, defining “mastery” is the key missing point here.

ceh123 · 2025-08-08T13:04:52 1754658292

Right but for self improving AI, training new models does have a real world bottleneck: energy and hardware. (Even if the data bottleneck is solved too)

ceh123 · 2025-06-03T17:10:23 1748970623

ClearStride AI is building a comprehensive AI/ML powered platform for diagnostic radiology. Our initial focus is on equine radiographs, specifically targeting the unique needs of sports horse practitioners.

We are using deep learning to build a comprehensive diagnostic assistance platform that will enhance veterinary workflows and improve diagnostic accuracy. Our mission is to revolutionize the field of veterinary diagnostics, starting with automated annotations of radiographs and report generation.

We are looking for a founding SWE to help us finalize and deploy our MVP. The team is remote and based between CO and NY.

If you are interested please reach out to us through founders at clearstride dot ai.

ceh123 · 2025-05-07T13:20:57 1746624057

I'm not sure if this really says the truth is more complex? It is still doing next-token prediction, but it's prediction method is sufficiently complicated in terms of conditional probabilities that it recognizes that if you need to rhyme, you need to get to some future state, which then impacts the probabilities of the intermediate states.

At least in my view it's still inherently a next-token predictor, just with really good conditional probability understandings.

dymk · 2025-05-07T13:51:54 1746625914

Like the old saying goes, a sufficiently complex next token predictor is indistinguishable from your average software engineer

johnthewise · 2025-05-07T14:41:27 1746628887

A perfect next token predictor is equivalent to god

lanstin · 2025-05-07T20:36:15 1746650175

Not really - even my kids knew enough to interrupt my stream of words with running away or flinging the food from the fork.

Tadpole9181 · 2025-05-08T04:30:03 1746678603

That's entirely an implementation limitation from humans. There's no reason to believe a reasoning model could NOT be trained to stream multimodal input and perform a burst of reasoning on each step, interjecting when it feels appropriate.

We simply haven't.

lanstin · 2025-05-09T21:17:58 1746825478

Not sure training on language data will teach how to experiment with the social system like being a toddler will, but maybe. Where does the glance of assertive independence as the spoon turns get in there? Will the robot try to make its eyes gleam mischeviously as is written so often.

jermaustin1 · 2025-05-07T13:51:02 1746625862

But then so are we? We are just predicting the next word we are saying, are we not? Even when you add thoughts behind it (sure some people think differently - be it without an inner monologue, or be it just in colors and sounds and shapes, etc), but that "reasoning" is still going into the act of coming up with the next word we are speaking/writing.

spookie · 2025-05-07T18:19:03 1746641943

This type of response always irks me.

It shows that we, computer scientists, think of ourselves as experts on anything. Even though biological machines are well outside our expertise.

We should stop repeating things we don't understand.

BobaFloutist · 2025-05-07T18:57:10 1746644230

We're not predicting the next word we're most likely to say, we're actively choosing the word that we believe most successfully conveys what we want to communicate. This relies on a theory of mind of those around us and an intentionality of speech that aren't even remotely the same as "guessing what we would say if only we said it"

ijidak · 2025-05-07T23:45:06 1746661506

When you talk at full speed, are you really picking the next word?

I feel that we pick the next thought to convey. I don't feel like we actively think about the words we're going to use to get there.

Though we are capable of doing that when we stop to slowly explain an idea.

I feel that llms are the thought to text without the free-flowing thought.

As in, an llm won't just start talking, it doesn't have that always on conscious element.

But this is all philosophical, me trying to explain my own existence.

I've always marveled at how the brain picks the next word without me actively thinking about each word.

It just appears.

For example, there are times when a word I never use and couldn't even give you the explicit definition of pops into my head and it is the right word for that sentence, but I have no active understanding of that word. It's exactly as if my brain knows that the thought I'm trying to convey requires this word from some probability analysis.

It's why I feel we learn so much from reading.

We are learning the words that we will later re-utter and how they relate to each other.

I also agree with most who feel there's still something missing for llms, like the character from wizard of Oz that is talking while saying if he only had a brain...

There is some of that going on with llms.

But it feels like a major piece of what makes our minds work.

Or, at least what makes communication from mind-to-mind work.

It's like computers can now share thoughts with humans though still lacking some form of thought themselves.

But the set of puzzle pieces missing from full-blown human intelligence seems to be a lot smaller today.

thomastjeffery · 2025-05-07T14:24:20 1746627860

We are really only what we understand ourselves to be? We must have a pretty great understanding of that thing we can't explain then.

mensetmanusman · 2025-05-07T22:28:27 1746656907

I wouldn’t trust a next word guesser to make any claim like you attempt, ergo we aren’t, and the moment we think we are, we aren’t.

hadlock · 2025-05-07T17:01:19 1746637279

Humans and LLMs are built differently, it seems disingenuous to think we both use the same methods to arrive at the same general conclusion. I can inherently understand some proofs of pythagorean's theorem but an LLM might apply different ones for various reasons. But the output/result is still the same. If a next token generator run in parallel can generate a performant relational database that doesn't directly imply I am also a next token generator.

skywhopper · 2025-05-07T21:50:59 1746654659

Humans do far more than generate tokens.

Mahn · 2025-05-07T14:34:52 1746628492

At this point you have to start entertaining the question of what is the difference between general intelligence and a "sufficiently complicated" next token prediction algorithm.

dontlikeyoueith · 2025-05-07T19:11:36 1746645096

A sufficiently large lookup table in DB is mathematically indistinguishable from a sufficiently complicated next token prediction algorithm is mathematically indistinguishable from general intelligence.

All that means is that treating something as a black box doesn't tell you anything about what's inside the box.

int_19h · 2025-05-07T19:51:26 1746647486

Why do we care, so long as the box can genuinely reason about things?

chipsrafferty · 2025-05-07T22:29:19 1746656959

What if the box has spiders in it

dontlikeyoueith · 2025-05-08T00:09:12 1746662952

:facepalm:

I ... did you respond to the wrong comment?

Or do you actually think the DB table can genuinely reason about things?

int_19h · 2025-05-08T00:59:46 1746665986

Of course it can. Reasoning is algorithmic in nature, and algorithms can be encoded as sufficiently large state transition tables. I don't buy into Searle's "it can't reason because of course it can't" nonsense.

zeroonetwothree · 2025-05-08T02:51:14 1746672674

It can do something but I wouldn’t call it reasoning. IMO a reasoning algorithmic must be more complex than a lookup table.

int_19h · 2025-05-08T02:58:06 1746673086

We were talking about a "sufficiently large" table, which means that it can be larger than realistic hardware allows for. Any algorithm operating on bounded memory can be ultimately encoded as a finite state automaton with the table defining all valid state transitions.

dontlikeyoueith · 2025-05-08T18:50:07 1746730207

This is such a confusion of ideas that I don't even know how to respond any more.

Good luck.

Tadpole9181 · 2025-05-07T14:11:41 1746627101

But then this classifier is entirely useless because that's all humans are too? I have no reason to believe you are anything but a stochastic parrot.

Are we just now rediscovering hundred year-old philosophy in CS?

BalinKing · 2025-05-07T15:28:56 1746631736

There's a massive difference between "I have no reason to believe you are anything but a stochastic parrot" and "you are a stochastic parrot".

ToValueFunfetti · 2025-05-07T16:16:26 1746634586

If we're at the point where planning what I'm going to write, reasoning it out in language, or preparing a draft and editing it is insufficient to make me not a stochastic parrot, I think it's important to specify what massive differences could exist between appearing like one and being one. I don't see a distinction between this process and how I write everything, other than "I do it better"- I guess I can technically use visual reasoning, but mine is underdeveloped and goes unused. Is it just a dichotomy of stochastic parrot vs. conscious entity?

Tadpole9181 · 2025-05-08T00:20:30 1746663630

Then I'll just say you are a stochastic parrot. Again, solipsism is not a new premise. The philosophical zombie argument has been around over 50 years now.

ceh123 · 2025-01-28T13:40:39 1738071639

To add some extra numbers here just to showcase how little energy usage this is.

This means it's adding about 0.012% additional energy consumption to those users energy consumption.

From another angle: Average US house energy consumption is around 30kWh per day. 0.012% of that is 3.75 watt hours of energy per day. This is the equivalent amount of energy as streaming HD video to your iPhone on a 4G network for 1.5 seconds. [0]

So in other words, a 15s youtube ad you are forced to watch on your phone before watching the video you were going to watch anyway takes an order of magnitude more energy than the average AI user according to this article.

[0] https://www.statista.com/statistics/1109623/electricity-cons...

bumby · 2025-01-28T13:50:44 1738072244

Help me understand. The article claims 20MM users with an average of 24 prompts per day. Do we have any information on the average number of prompts per user? I suspect it would follow a Pareto distribution, with a small subset consuming much much more, but no clue as to where that average assumption holds.

Edit: a quick search points to 5-10 prompts per session but we’d still need to know the average number of sessions

datadrivenangel · 2025-01-28T14:28:59 1738074539

In December I did 61 images, and in November I did 121, according to Replicate's billing. Probably more like 10-20 images per session, which for me are usually 1-2 times per week?

visarga · 2025-01-28T14:28:16 1738074496

Yes, I searched this too. There probably are many single round sessions, but the average length is 8-12 messages.

ceh123 · on Nov 11, 2024

We don't need first principals thinking every time, but having an understanding of why you can't just test 100 variations of your hypothesis and accept p=0.05 as "statistically significant" is important.

Additionally it's quite useful to have the background to understand the differences between Pearson correlation and Spearman rank, or why you might want to use Welch's t-test vs students, etc.

Not that you should know all of these things off the top of your head necessarily, but you should have the foundation to be able to quickly learn them, and you should know what assumptions the tests you're using actually make.