Hacker News | ekez's comments

I implemented an algorithm for neural network verification called αβ-CROWN for a deep-learning library called tinygrad.

https://github.com/0xekez/tinyLIRPA

tinygrad’s small set of operations and laziness made it easy to implement. Though my overall sense is that neural network verification is currently more of a research interest than something practical.
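As a much simpler illustration of the same family of ideas as αβ-CROWN, here's a hedged sketch of plain interval bound propagation (IBP) through one affine+ReLU layer in pure Python — function names and shapes are mine, not from tinyLIRPA or tinygrad:

```python
# Sketch of interval bound propagation (IBP), the simplest relative of
# CROWN-style verifiers: push an input box [lo, hi] through an affine
# layer followed by ReLU. Names and shapes are illustrative only.

def affine_bounds(W, b, lo, hi):
    """Bound W @ x + b elementwise over the box lo <= x <= hi."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo_acc = hi_acc = bias
        for w, l, h in zip(row, lo, hi):
            # A positive weight maps [l, h] to [w*l, w*h];
            # a negative weight flips the interval.
            lo_acc += w * (l if w >= 0 else h)
            hi_acc += w * (h if w >= 0 else l)
        out_lo.append(lo_acc)
        out_hi.append(hi_acc)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]
```

For a single neuron computing x0 - x1 over the unit box, this gives the (loose but sound) pre-activation bounds [-1, 1], and [0, 1] after ReLU. αβ-CROWN tightens such bounds with linear relaxations and branch-and-bound, but the propagation skeleton is the same.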


I wonder if the authors have tried incorporating error feedback from Lean into their models.

Work from 2023 [1] showed that general-purpose models did better when they were able to incorporate error feedback. Humans incorporate error feedback, but none of the SOTA models on miniF2F seem to.

[1]: https://arxiv.org/abs/2310.04353


In a way, DeepSeek Prover's subgoal decomposition is a partial step towards error/proof-state feedback. (DS Prover breaks a proof down into subgoals, attacks each subgoal separately with batched sampling, then puts the pieces back together.)

This is distinct from the approach of the previous SOTA for an open-weights model (Kimina Prover) which generated at the full-proof level.

While it was very impressive to see Kimina's ability to generate medium-length proofs (think AIME-level problems) without sub-goals or feedback at intermediate steps, it's likely that at least subgoal decomposition will be required for longer proofs (think IMO-level problems.)

I certainly agree that where and how error/proof state feedback is best incorporated (training data synthesis / reward function / CoT during inference / etc.) is a fascinating area of research. (It's rumored that GDM's AlphaProof does use proof state / lean feedback already.)


That's surprising to learn.

I'm surprised those even use actual lean code instead of like raw type theory.


There’s something compelling about these, especially w.r.t. their ability to generalize. But what is the vision here? What might these be able to do in the future? Or even philosophically speaking, what do these teach us about the world? We know a 1D cellular automaton can be Turing complete, so, at least from one perspective, NCAs/these aren’t terribly surprising.


Potentially it would be useful if you could enter a grid from satellite images and simulate wildfire spread or pollution spread or similar problems.
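For a sense of what "simulate wildfire spread on a grid" might look like, here's a toy sketch — every rule and state code is invented for illustration, not a calibrated fire model:

```python
# Toy deterministic fire-spread automaton on a grid (e.g. rasterized
# from satellite data): 0 = bare ground, 1 = fuel, 2 = burning, 3 = burnt.
# All rules here are made-up placeholders for a calibrated model.

def step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                new[r][c] = 3  # a burning cell burns out
            elif grid[r][c] == 1:
                # fuel ignites if any 4-neighbour is burning
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 2:
                        new[r][c] = 2
                        break
    return new
```

A neural CA would replace the hand-written ignition rule with a learned local update, which is exactly where the appeal lies: fit the update to observed fire perimeters instead of guessing it.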


these are going to be the dominant lifeforms on earth exceeding bacteria, plants and humans in terms of energy consumption

cellular automata that interact with their environment, ones that interact with low level systems and high level institutions. to some approximation we, humans are just individual cells interacting in these networks. the future of intelligence aint llms, but systems of automata with metabolic aspects. automata that co-evolve, consume energy and produce value. ones that compete, ones that model each other.

we're not being replaced, we're just participants in a transformation where boundaries between technological and cellular systems blur and eventually dissolve. i'm very thankful to be here to witness it

see: https://x.com/zzznah/status/1803712504910020687


I'll have what this guy is smoking. Those visualizations are pretty, though.

I can imagine this being useful for implementing classifiers and little baby GenAI-adjacent tech on an extremely tiny scale, on the order of several hundred or several thousand transistors.

Example: right now, a lot of the leading-edge biosensors have to pull data from their PPG/ECG/etc chips and run it through big fp32 matrices to get heart rate. That's hideously inefficient when you consider that your data is usually coming in as an int16 and resolution any better than 1bpm isn't necessary. But, fp32 is what the MCU can do in hardware so it's what you gotta do. Training one of these things to take incoming int16 data and spit out a heart rate could reduce the software complexity and cost of development for those products by several orders of magnitude, assuming someone like Maxim could shove it into their existing COTS biosensor chips.
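To make the int16-in, bpm-out idea concrete, here's a hypothetical integer-only sketch — threshold, sample rate, and the crossing-count heuristic are all invented for illustration, not how any real PPG pipeline works:

```python
# Hypothetical integer-only heart-rate estimate from int16 PPG samples:
# count rising threshold crossings; no floating point until the final
# division. Threshold and sample rate are illustrative values.

def bpm_from_samples(samples, sample_rate_hz, threshold=1000):
    beats = 0
    above = False
    for s in samples:  # s is an int16-range integer
        if s > threshold and not above:
            beats += 1   # rising edge through the threshold = one beat
            above = True
        elif s <= threshold:
            above = False
    seconds = len(samples) / sample_rate_hz
    return round(beats * 60 / seconds)
```

The point isn't that this heuristic is good (it isn't), but that 1 bpm resolution needs nothing wider than the incoming int16s — which is the gap between what the task requires and what an fp32 matrix pipeline spends.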


yes absolutely: current systems are wildly inefficient. the future is one of extreme energy efficiency.

re smoking: sorry let me clarify my statement. these things will be the dominant life forms on earth in terms of metabolism, exceeding the energy consumption of biological systems, over 1k petawatt hours per year, dwarfing everything else

the lines between us may blur metaphorically, we'll be connected to them how we're connected to ecosystems of plants and bacteria. these systems will join and merge in the same way we've merged with smartphones -- but on a much deeper level


Okay so another way to put it is that these are gonna be the software we run on lots of computers in the future. Why this particular model of intelligence and not some other one?


So grandiose. It's a good thing the rapture is happening when you're alive to see it. You're just that important.


i wasn't around to see the first humans land on the moon. i feel a similar deep sense of awe and excitement to see this revolution


because the goal of life is to maximize metabolic throughput?

or to minimize energetic waste?


The self-healing properties suggest biological evolution to me.


If "we" all got together and reduced permitting costs, "our" homes would be much more valuable because a developer can build a highrise where the house is.


why is it impossible to decide whether x<0, x=0, or x>0 as in Example 1?


This is better known as the Table Maker's Dilemma.

Say you have some computable number p, that means you can compute a (rational) approximation p' to p within any given tolerance eps > 0 (i.e. you know |p - p'| < eps). To determine whether p > 0, p = 0, or p < 0, you compute an approximation p' to a certain tolerance eps. If p' > eps then you know p > 0, if p < -eps then you know p < 0, otherwise you need a better approximation... Without further knowledge about p, there is no point where you can assert p = 0.
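The refinement loop described above can be sketched directly; `approx(n)` is assumed to return a rational within 2^-n of p, and the loop halts iff p ≠ 0 (the bound on n here is only so the sketch terminates):

```python
# Sketch of the refinement loop: approx(n) is assumed to return a
# rational p' with |p - p'| < 2**-n. This semidecides the sign of p:
# it halts with an answer iff p != 0.

from fractions import Fraction

def sign(approx, max_n=64):
    """Semidecide the sign of p; give up (return None) after max_n refinements."""
    for n in range(1, max_n + 1):
        eps = Fraction(1, 2**n)
        p = approx(n)
        if p > eps:
            return +1   # p' - eps > 0, hence p > 0
        if p < -eps:
            return -1   # p' + eps < 0, hence p < 0
        # otherwise p could still be 0: refine and try again
    return None  # never constitutes a proof that p == 0
```

Feeding it an `approx` for a genuinely nonzero p always terminates with the right sign; feeding it p = 0 exhausts the budget, which is exactly the dilemma: no finite number of refinements certifies equality.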


Thank you for the response. (I assume you mean "p' < -eps" in the second conditional?)

How often are rational approximations computable within any given tolerance?


Only x=0 is undecidable. It's because you have to check an infinite number of digits past the decimal point to see if all of them are zero, and that will not halt.


> Only x=0 is undecidable.

That cannot be true, because if both x<0 and x>0 are decidable then x=0 is also decidable.


Equality of real numbers is undecidable. Specifically recursively undecidable, which is the same as uncomputable.

The Real numbers are topologically weird when you dig into it.

Random example.

https://mathworld.wolfram.com/RichardsonsTheorem.html


I agree that given two computable real numbers x,y, asking whether x=y is undecidable.

The point I'm making is that it can't be true that x<y or x>y are decidable, because then x=y would also be decidable, and it isn't.


A computable number is a real number that can be computed to any desired precision by a finite, terminating algorithm.

x<y meets that definition because you can compare digits until one doesn't match.

The same can be done for not equal, testing until there is a difference.

Real real numbers are infinite precision.

Showing that a number is not not-equal always requires you to normalize, truncate, etc...

No matter how many digits you test, there is an uncountable infinity of real numbers within your epsilon.

That means you do not have a finite terminating algorithm.


Is x < y decidable or semidecidable?

If x and y are the same number, then a Turing machine that compares the digits of x and y to determine whether x < y will never terminate. Doesn’t this mean that x < y is not decidable?


A Turing machine has an infinitely long tape, and even if it takes 10 times the time to the heat death of the universe, x<y will halt.

Finite time is not practical time.

Given two Turing machines A and B approximating numbers a and b, where a ≠ b, you can use an epsilon to prove a > b, that is 'sufficient' for a proof.

You are trying to convert that one-sided, semi-decidable 'a>b' into a two-sided test: a>b if yes, and a≤b if no.

That is undecidable.

Even if you change the ≤ into <, which does imply the existence of b, it doesn't prove equality.

You have to resort to tactics like normalization for that.

Type theory is probably a good path to understand why. You seem to be coming from a set perspective and you will run into the limits of ZFC that way.


> A Turing machine as an infinite long tape, and even if it takes 10 times the time to the heat death of the universe x<y will halt.

> Finite time is not practical time.

Sorry for my daftness but if x and y are both a transcendental number, for example pi, then x and y have infinitely many digits so the Turing machine that determines whether x < y is true will not halt, even with an infinitely long tape, or will it? What am I misunderstanding?

If you could recommend a source that explains the basics of these arguments, I’d appreciate it.


Pi is transcendental but computable.

For an introduction to real analysis, there is an old book typically called 'Baby Rudin': Principles of Mathematical Analysis by Walter Rudin.

It is tiny and expensive, as most college math textbooks are once you get past a certain level. No idea if there is something better now; it is from the 1950s and there are used copies in most used book shops for cheap (or there were). It has no definition of 'equals' but does have a concept of equivalence relations.

Dummit and Foote's Abstract Algebra book is a great reference, especially given the problem of AI pollution. Coming from a General Relativity background, it still took me two years to work through on my own.

I don't have any suggestions for a more modern approach starting with type or category theory.

I am not really interested in constructivist math; for me it is a problem to get around, so maybe someone who is can chime in with a better path.

While not rigorous, this paper does a good job at explaining just how muddy the concept of 'equality' is.

https://arxiv.org/abs/2405.10387


What if you want to decide if x<y when actually x==y? Then your comparator doesn't halt.


You need to move past decimal expansion, there are sufficient methods to prove one sided limits on the reals in finite time.


Haven't seen the paper, but I'd guess the argument is something like "if the domain for x includes 0, then [edit: whenever x = 0] the Turing machine will get into an infinite loop looking for the first inverted bit[1]. _None_ of the three states will ever be provably true or false. If the domain _excludes_ 0 then you can prove that the machine will hit a non-zero bit in finite time, thus terminate, and either x<0 or x>0 will be shown true. But if you exclude 0 then you've also shown that the =0 state will never be true."

[1] multiple zero bits and a one bit is >0, multiple 1 bits and a zero bit is <0. Infinite zero bits (even if followed by a one bit) is +0, infinite 1 bits (even if followed by a zero bit) is -0.


An exact comparison is undecidable. You can only say if the numbers sufficiently differ. This is equivalent to saying that x > 0 + ε or x < 0 - ε, where ε > 0. If you represent numbers in binary, ε = 2^(-n), where n is the maximum number of bits you allocate to represent a number. Obviously you have n < ∞ in any finite machine.


Nope. You're thinking about this mathematically, not algorithmically. The question is essentially to decide whether the output of a TM is all zeros or whether it will eventually output a 1. If it outputs a 1 you will know that in a finite amount of time by running the program, but if it never outputs a 1 then you can't tell that by running the program, and it's impossible in general to tell any other way because that would be tantamount to solving the halting problem.

Here is a concrete example: write a program that systematically tests all numbers for counterexamples to the Goldbach conjecture. Every time a number fails to be a counterexample, output a zero, otherwise output a 1. Consider the output of that program to be a real number expressed as a binary decimal. If that number is not zero you will know in a finite amount of time, but if it is zero you will not know that until someone proves the Goldbach conjecture.
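That program can be sketched in a few lines — truncated to the first k even numbers so it actually terminates here; the real construction runs forever:

```python
# The Goldbach bit-stream described above, bounded so the sketch halts.
# The real construction emits one bit per even number, forever.

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

def goldbach_holds(n):
    """Is the even number n >= 4 a sum of two primes?"""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def goldbach_bits(k):
    """One bit per even number starting at 4: 0 = not a counterexample."""
    return [0 if goldbach_holds(n) else 1 for n in range(4, 4 + 2 * k, 2)]
```

Every prefix anyone has ever computed is all zeros, so the real number this encodes looks exactly like 0 out to every digit checked — yet asserting it *is* 0 would settle Goldbach.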


The order relation on computable reals is undecidable - this is well known, and the parent's argument is correct. If both x<y and x>y were decidable then you could easily decide equality too by just checking them and seeing if the answer to both was "no".

Computability theory is mathematics.


Tracing back through this thread, the term "decidable" was introduced in a very unfortunate way:

ekez: why is it impossible to decide whether x<0, x=0, or x>0 as in Example 1?

lisper: Only x=0 is undecidable. It's because you have to check an infinite number of digits past the decimal point to see if all of them are zero, and that will not halt.

teraflop: That cannot be true, because if both x<0 and x>0 are decidable then x=0 is also decidable.

Both lisper (me) and teraflop are correct. The problem is that inequality is not decidable, it's semi-decidable. It's teraflop who introduced the word "decidable" into the discussion, not me, but I didn't pay enough attention to that at that time.

/u/scarmig got it right in this comment: https://news.ycombinator.com/item?id=41150519


Right, I agree that they are semidecidable, thanks for clarifying. My interpretation of "only x=0 is undecidable" was that you were claiming the other two were decidable (which is a reasonable reading imo). But it sounds like we don't actually disagree here.


It's my fault for not being more explicit. I just didn't think it all the way through.


> Nope. You're thinking about this mathematically, not algorithmically.

I don't know what you mean by this, and I'm not sure how it relates to what I said. I'm using "decidable" in the strict, computer-science sense of the word.

The statement in example 1 of the paper, which we're discussing, is about computable real numbers. A computable real number is one for which there is a Turing machine that can calculate it to any desired finite precision in finite time.

A semidecidable problem is one for which there is a program that halts if the answer is "yes", but may not halt if the answer is "no". The halting problem is in this category.

Given a computable real number x, asking whether x<0 or x>0 are semidecidable but not decidable problems.


Yes, you're right. See https://news.ycombinator.com/item?id=41154273 for my post-mortem.


x > 0 and x < 0 are both semidecidable. Which is to say, a "yes" answer can be confirmed in finite time, but their converses cannot. So if either of those is true, the procedure will halt (by running them in parallel, and halting once one halts), but if neither of them is, it may not halt.

If you remove 0 from possible inputs, you no longer need to worry about that possibility, and the problem is then decidable.
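The "run them in parallel" construction can be sketched with generators standing in for the two semideciders — names and the step budget are illustrative (the budget exists only so the sketch terminates):

```python
# Interleave steps of two semideciders, each modelled as a generator that
# yields None while undecided and True when it accepts. The combination
# halts iff at least one of them accepts.

def run_in_parallel(semi_a, semi_b, max_steps=10_000):
    for _ in range(max_steps):   # bounded here so the sketch terminates
        if next(semi_a, None) is True:
            return "a"
        if next(semi_b, None) is True:
            return "b"
    return None                  # neither accepted within the budget

def accept_after(k):
    """Toy semidecider that accepts after k steps (never rejects)."""
    for _ in range(k):
        yield None
    while True:
        yield True
```

With `semi_a` = "x > 0" and `semi_b` = "x < 0", this is exactly the parent's procedure: it halts whenever x ≠ 0, and runs forever (here: exhausts the budget) when x = 0.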


The metric the authors use confuses me.

Edit distance seems like a strange way to test whether the model understands arithmetic ([1], Figure 3). I think `1+3=3` would score the same as `1+1=9`?

Why not consider how far off the model is, i.e. `abs(actual - expected)`? I wonder if there is an inflection point with that metric.
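To make the contrast between the two metrics concrete, a quick sketch (plain Levenshtein distance; the helper name is mine):

```python
# String edit distance scores a near-miss and a wild miss the same;
# absolute numeric error does not.

def edit_distance(a, b):
    """Plain Levenshtein distance via a one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,       # insertion
                                     prev + (ca != cb))   # substitution
    return dp[-1]
```

Against the correct strings, `1+3=3` (true answer 4) and `1+1=9` (true answer 2) are both one edit away, yet their numeric errors are 1 and 7 — which is the comment's point.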

https://arxiv.org/abs/2206.07682


It depends on how you do arithmetic. If you're a human and you do column addition, 12345+35791=58136 is just as big of a mistake as 48146 (the actual result is 48136). It's just one mistaken column in both. Binary half-adders work the same way.

We don't really know how LLMs do arithmetic. Maybe token edit distance would be interesting, but either way it doesn't really change the claim of the paper.

Unrelated: The link is incorrect, the one you're referring to is here: https://arxiv.org/pdf/2304.15004.pdf


Yeah, and as an aside I wonder how hard it would be to train an LLM to do addition, multiplication, etc., human style? Presumably it should be possible, at least in step-by-step style (as a substitute for short-term memory), the same way that we do it.

Without using an algorithmic approach, it seems an LLM can only learn a bunch of partially correct heuristics, and attempt to generalize over examples.

I've played with this a bit in the past, and came to the conclusion that GPT-3 seems to have learnt to compare the size of numbers (whether accurately or via heuristics), and would get the approximate size of an answer right (depending on the task), even if not the actual value right. I seem to recall it also doing this for tasks like asking for a prime number greater than a particular value.
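A sketch of what such step-by-step "show your work" training data might look like — the trace format is entirely made up:

```python
# Generate a column-addition trace of the kind one might fine-tune on,
# externalizing the carry the way a human working memory would.

def addition_trace(a, b):
    xs, ys = str(a)[::-1], str(b)[::-1]
    carry, digits, steps = 0, [], []
    for i in range(max(len(xs), len(ys))):
        da = int(xs[i]) if i < len(xs) else 0
        db = int(ys[i]) if i < len(ys) else 0
        s = da + db + carry
        steps.append(f"col {i}: {da}+{db}+carry {carry} = {s} "
                     f"-> digit {s % 10}, carry {s // 10}")
        digits.append(str(s % 10))
        carry = s // 10
    if carry:
        digits.append(str(carry))
    steps.append(f"answer: {''.join(reversed(digits))}")
    return "\n".join(steps)
```

The hope would be that a model trained on traces like this learns the algorithm (loop over columns, propagate the carry) rather than a pile of per-magnitude heuristics.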


I mean, is it efficient to teach them addition human style instead of heuristics of when to call the right function?

Imagine you could say 'calc' in your brain, and some separate subcomponent of your brain that is far more power efficient could return an answer almost instantly? You would not focus on understanding addition/subtraction, more you would focus on on when to use addition/subtraction/multiplication or whatever.


> is it efficient to teach them addition human style

Not if math is your only goal, but there'd be value in making the models more powerful so that they could learn to do simple things like this (and not so simple things) by themselves. You can't have a tool for everything, and hopefully future AGI can itself do things that are more than just a mashup of existing tool capabilities.


> ChatGPT specifically seems to absolutely adore the phrase [complex and multifaceted], using it at every opportunity to explain higher level concepts.

I'm curious why this is. Before the echo chamber, did the LLM learn some ontology of speech?


The title is an allusion to the experiment which samples the output of the 12th layer of LLaMA-7b. Surprisingly, there is a fair bit of structure in the principal components of those samples.


Generally agree - the goal here was mostly to illustrate how simple a TCP proxy actually could be.

I shiver to think of how many lines of code actually get run with the tokio macro and green threads.
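Not the tokio version being discussed, but as a sketch of how little a TCP proxy actually needs: a blocking, thread-per-direction version in Python. Error handling and half-close semantics are deliberately simplified (a real proxy would shut down only the write half on EOF):

```python
# Minimal blocking TCP proxy: one thread per direction, copy bytes until EOF.

import socket
import threading

def pipe(src, dst):
    """Copy src -> dst until EOF, then close dst (simplified teardown)."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    finally:
        dst.close()

def serve(srv, upstream_addr):
    """Accept clients on a bound listening socket; splice each to upstream."""
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection(upstream_addr)
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()
```

That's the whole data plane; everything else in a production proxy (timeouts, backpressure, connection limits) is what the macro-expanded async versions are paying for.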


What are the advantages of tagged memory as opposed to using the unused bits in a regular pointer as a tag?


Usually, tagged memory at a hardware level goes along with support for tagged operations in the processor.

In the LISP machines, for example, you had an add instruction which would happily work correctly on pointers, floats, and integers, depending on the data type, at the machine code level. That offers safety and also makes the compiler simpler.

But where this really shines is in things like, well, lists: since the tags can distinguish atoms from pairs and values like nil, fairly complex list-walking operations can be done in hardware, and pretty quickly too. It also makes hardware implementation of garbage collection possible.

This is just my intuition, but I suspect, these days, it all works out to about the same thing in the end. You use some of the cache for code that implements pointer tagging, or you can sacrifice some die area away from cache for hardwired logic doing the same thing. It probably is in the same ballpark of complexity and speed.
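The software end of that trade-off can be sketched: steal the low bits of an aligned word for a type tag, and dispatch "add" on it — which is what the hardware does for free. Tag values and helper names here are invented for illustration:

```python
# Simulate pointer/value tagging in the low 3 bits of an 8-byte-aligned
# word, with a generic add that dispatches on the tags (and "faults"
# on a mismatch, as tagged hardware would).

TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
TAG_INT, TAG_PAIR = 0b001, 0b010   # invented tag assignments

def box_int(n):
    return (n << TAG_BITS) | TAG_INT

def tag_of(word):
    return word & TAG_MASK

def untag_int(word):
    assert tag_of(word) == TAG_INT
    return word >> TAG_BITS

def generic_add(a, b):
    """The 'hardware' add: behaviour depends on the operands' tags."""
    if tag_of(a) == TAG_INT and tag_of(b) == TAG_INT:
        return box_int(untag_int(a) + untag_int(b))
    raise TypeError("tag mismatch: tagged hardware would fault here")
```

In software this check-and-shift costs instructions (and icache) on every operation; on a tagged machine the same dispatch is wired into the ALU, which is the trade the parent describes.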


In addition to what retrac wrote, these tag bits would apply to immediates as well, not just pointers.


Speed, as is usual with hardware vs. software implementations of things. But the difference these days is less than it used to be, because processors spend most of their time stalled on all kinds of things and the ALU processing capacity is underused most of the time.

An interesting question for a modern ISA design would be to figure out how to make tagged words and memory work well with SIMD.


> An interesting question for a modern ISA design would be to figure out how to make tagged words and memory work well with SIMD.

Not sure what the issue might be. Let’s say you’re doing a multiply-add: you’d call the one for the data type you want and if any operand were of the wrong type you’d get a fault. Am I missing something?


That sounds like a lot of mode bits or instruction variants to me. But maybe it wouldn't be a problem.


Consider that your SIMD instruction might itself take a tag mask, and the ALU need only do equality on the tag field. In fact it could do that in parallel with the ALU op; on mismatch you could simply discard the current state and abort the operation. However, realistically you'd want the same set of SIMD instructions as in an untagged architecture anyway.

Also I expect any compiler would assume that the contents of an array subject to SIMD computation would be homogeneous anyway, perhaps trying to enforce it elsewhere.

In any case this doesn’t seem like a big deal to me…but I could be wrong!
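A toy model of that sketch — a "SIMD add" that checks every lane's tag against an expected tag before operating on the payloads. Entirely illustrative; lane widths, tag layout, and fault behavior are made up:

```python
# Simulated tagged-SIMD add: verify all lane tags match the expected tag
# (in hardware this check could run in parallel with the ALU op, with the
# result discarded on mismatch), then add the untagged payloads.

TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1

def simd_tagged_add(lanes_a, lanes_b, expected_tag):
    for w in (*lanes_a, *lanes_b):
        if w & TAG_MASK != expected_tag:
            raise ValueError("lane tag mismatch: fault")
    return [(((a >> TAG_BITS) + (b >> TAG_BITS)) << TAG_BITS) | expected_tag
            for a, b in zip(lanes_a, lanes_b)]
```

This also shows why the homogeneity assumption matters: a single mis-tagged lane faults the whole vector, so a compiler would want to guarantee homogeneous arrays before ever emitting the instruction.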


Yep, sounds like a good sketch.

I guess to really get into it a good start would be to work with existing SIMD and take a quantitative approach to what is actually the most hot part of it. I wonder if any existing language implementations (eg Common Lisp) attempt to do these kinds of things in the first place.

