> It has a massive phonemic inventory, with 44 unique items. Compare with Spanish's 24, or German's 25.
I'm not sure where you're getting these numbers from, but German has around 45 phonemes according to all sources I could find, depending on how you count: 17 vowels (including two different schwa sounds), 3 diphthongs, 25 consonants.
Or btrfs. I also think that filesystem snapshots are an underrated backup strategy, assuming your data fits on one disk (which should be the case for almost all applications outside of FAANG).
Why would btrfs or btrfs snapshots require a single disk?
My btrfs volume is a combination of different-sized disks bought over time (3T to 24T), and snapshots work just fine. I've configured it to use RAID with 2 copies for data and 3 for metadata.
I wonder to what extent this might be a case where the base model (the pure token prediction model without RLHF) is "taking over". This is a bit tongue-in-cheek, but if you see a chat transcript where an assistant makes 15 random wrong suggestions, the most likely continuation has to be yet another wrong suggestion.
People have also been reporting that ChatGPT's new "memory" feature is poisoning their context. But context is also useful. I think AI companies will have to put a lot of engineering effort into keeping those LLMs on the happy path even with larger and larger contexts.
I think this is at least somewhat true anecdotally. We do know that as context length increases, adherence to the system prompt decreases. Whether that de-adherence is reversion to the base model or not I'm not really qualified to say, but it certainly feels that way from observing the outputs.
Pure speculation on my part but it feels like this may be a major component of the recent stories of people being driven mad by ChatGPT - they have extremely long conversations with the chatbot where the outputs start seeming more like the "spicy autocomplete" fever dream creative writing of pre-RLHF models, which feeds and reinforces the user's delusions.
Many journalists have complained that they can't seem to replicate this kind of behavior in their own attempts, but maybe they just need a sufficiently long context window?
I also like the recursion. In essence, you're making a meta-proof about what PA proves, and given that you trust PA, you also trust this meta-proof.
> I think just PA+"PA is consistent" is enough?
It's not clear to me how. I believe PA+"PA is consistent" would allow a model where Goodstein's theorem is true for the standard natural numbers, but that also contains some nonstandard integer N for which Goodstein's theorem is false. I think that's exactly the case that's ruled out by the stronger statement of ω-consistency.
I'm seriously fed up with all this fact-free AI hype. Whenever an LLM regurgitates training data, it's heralded as the coming of AGI. Whenever it's shown that they can't solve any novel problem, the research is in bad faith (but please make sure to publish the questions so that the next model version can solve them -- of course completely by chance).
Here's a quote from the article:
> How many humans can sit down and correctly work out a thousand Tower of Hanoi steps? There are definitely many humans who could do this. But there are also many humans who can’t. Do those humans not have the ability to reason? Of course they do! They just don’t have the conscientiousness and patience required to correctly go through a thousand iterations of the algorithm by hand. (Footnote: I would like to sit down all the people who are smugly tweeting about this with a pen and paper and get them to produce every solution step for ten-disk Tower of Hanoi.)
In case someone imagines that fancy recursive reasoning is necessary to solve the Towers of Hanoi, here's the algorithm to move 10 (or any even number of) disks from peg A to peg C:
1. Move one disk from peg A to peg B or vice versa, whichever move is legal.
2. Move one disk from peg A to peg C or vice versa, whichever move is legal.
3. Move one disk from peg B to peg C or vice versa, whichever move is legal.
4. Goto 1.
Second-graders can follow that, if motivated enough.
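If it helps make the point concrete, here's a minimal sketch of that loop in TypeScript (the peg representation and function names are mine; it assumes the usual rule that only a smaller disk may go on a larger one, and an even number of disks so the tower ends up on C):

```typescript
// Pegs are arrays of disk sizes, largest at the bottom.
type Peg = number[];

// Make the one legal move between two pegs: the smaller top disk goes
// onto the larger one (or onto the empty peg).
function legalMove(x: Peg, y: Peg): void {
  const a: number | undefined = x[x.length - 1];
  const b: number | undefined = y[y.length - 1];
  if (a === undefined || (b !== undefined && b < a)) {
    x.push(y.pop()!);
  } else {
    y.push(x.pop()!);
  }
}

// Moves an even number n of disks from A to C in 2^n - 1 moves.
function hanoi(n: number): Peg {
  const A: Peg = [], B: Peg = [], C: Peg = [];
  for (let d = n; d >= 1; d--) A.push(d);
  while (C.length < n) {
    legalMove(A, B); // 1. A <-> B, whichever direction is legal
    legalMove(A, C); // 2. A <-> C
    legalMove(B, C); // 3. B <-> C
  }                  // 4. goto 1
  return C;
}
```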
There's now constant, nonstop, obnoxious shouting on every channel about how these AI models have solved the Turing test (one wonders just how stupid these "evaluators" were), are at the level of junior devs (LOL), and actually already have "PhD level" reasoning capabilities.
I don't know who is supposed to be fooled -- we have access to these things, we can try them. One can easily knock out any latest version of GPT-PhD-level-model-of-the-week with a trivial question. Nothing fundamentally changed about that since GPT-2.
The hype and the observable reality are now so far apart that one really has to wonder: Are people really this gullible? Or do so many people in tech benefit from the hype train that they don't want to rain on the parade?
I could be wrong, but it seems you have misunderstood something here, and you've even quoted the part that you've misunderstood. It isn't that the algorithm for solving the problem isn't known. The LLM knows it, just like you do. It is that the steps of following the algorithm are too verbose if you're just writing them down and trying to keep track of the state of the problem in your head. Could you do that for a large number of disks?
Please do correct me if the misunderstanding is mine.
I feel like practically anybody could solve Tower of Hanoi for any degree of complexity using this algorithm. It’s a four-step process that you just repeat over and over.
That's an algorithm to solve it, but you have to describe every move at every step, while doing this on paper or in your head, while keeping track of the state of the game, again, in your head. That is what they're challenging the LLM to do.
You're still misunderstanding. If it is easy, please feel free to demonstrate solving it by telling us what the 1300th step is from working it out in your head.
Yes, the whole "Towers of Hanoi is a bad test case" objection is a non-sequitur here. It would be a significant objection if the machines performed well, but not given the actual outcome - it is as if an alleged chess grandmaster almost always lost against opponents of unexceptional ability.
It is actually worse than that analogy: Towers of Hanoi is a bimodal puzzle, in which players who grasp the general solution do inordinately better than those who do not, and the machines here are performing like the latter.
Lest anyone think otherwise, this is not a case of setting up the machines to fail, any more than the chess analogy would be. The choice of Towers of Hanoi leaves it conceivable that they would do well on tough problems, but that is not very plausible and needs to be demonstrated before it can be assumed.
They set it up to fail the moment they ran it with a large number of disks and assumed the models would just continue as if it ran the same simple algorithm in a loop, and the moment they set temperature to 1.
I take your point that the absence of any discussion of the effect of temperature choice or justification for choosing 1 seems to be an issue with the paper (unless it is quite obviously the only rational choice to those working in the field?)
> Second-graders can follow that, if motivated enough.
Try to motivate them sufficiently to do so without error for a large number of disks, I dare you.
Now repeat this experiment while randomly refusing to accept the answer they're most confident in for any given iteration, and pick an answer they're less confident in on their behalf, and insist they still solve it without error.
(To make it equivalent to the researchers running this with temperature set to 1)
> obnoxious shouting on every channel about how these AI models have solved the Turing test (one wonders just how stupid these "evaluators" were)
Huh? Schoolteachers and university professors complaining about being unable to distinguish ChatGPT-written essay answers from student-written essay answers is literally ChatGPT passing the Turing test in real time.
No it's not. The traditional interpretation of the Turing test requires interactivity. That is, the evaluator is allowed to ask questions and will receive a response from both a person and a machine. The idea is that there should be no sequence of questions you can ask that would reliably identify the machine. That's not even close to true for these "AI" systems.
You're right about interactivity, something that I overlooked -- but I think it's nevertheless the case that a large fraction of human interrogators could not distinguish a human from a suitably-system-prompted ChatGPT even over the course of an interactive discussion.
ChatGPT 4.5 was judged to be the human 73% of the time in this RCT study, where human interrogators had 5-minute conversations with a human and an LLM: https://arxiv.org/pdf/2503.23674
This is kind of an irrelevant (and doubtless unoriginal) shower thought, but: if humans are judging the AI to be human much more often than the human, surely that means the AI is not faithfully reproducing human behaviour.
Sure, a non-human's performance "should" be capped at ~50% for a large sample size. I think seeing a much higher percentage, like 73%, indicates systematic error in the interrogator. This -- the fact that humans are not good at detecting genuine human behaviour -- is really a problem in the Turing test itself, but I don't see a good way to solve it.
LLaMa 3.1 with the same prompt "only" managed to be judged human 56% of the time, so perhaps it's actually closer to real human behaviour.
This comes down to the interpretation of the Turing test. Turing's original test actually pitted the two "unknowns" against each other. Put simply, both the human and the computer would try to make you believe they were the person. The objective of the game was to be seen as human, not to be indistinguishable from human.
This is obviously not quite what people understand the Turing test as anymore, and I think that interpretation confusion actually ends up weakening the linked paper. Your thought aptly describes a problem with the paper, but that problem is not present in the Turing test by its original formulation.
It's hard to say what a "bona fide 3-party Turing test" is. The paper even has a section trying to tackle that issue.
I think trying to discuss the minutiae of the rules is a path that leads only to madness. The Turing test was always meant to be a philosophical game. The point was to establish a scenario in which a computer could be indistinguishable from a human. Carrying it out in reality is meaningless, unless you're willing to abandon all intuitive morality.
Quite frankly, I find the paper you linked misguided. If it was undertaken by some college students, then it's good practice, but if it was carried out by seasoned professionals they should find something better to do.
The original Turing game was about testing for a male or female player.
If you want to know more about that, or this research, you could try asking AI for a no-fluff summary.
The Transformer architecture and algorithm and matrix multiplication are a bit more involved. It would be hard to keep those inside your chain-of-thought / working memory and still understand what is going on here.
I can't shake the feeling that this is a dream that was pursued by people who (at least for a time) didn't need the income, and not technology that was under any pressure to actually work. Something like a lifestyle business, but in this case, maybe a lifestyle charity.
The article is full of "community" this and "local people" that, and very low on details. The little that is there raises red flags. For example: The fact that their rented machine shop had to close down is given as an explanation for them having to sell all their machines below cost and then not having the money to buy the machines back when they found a new place. That doesn't add up: temporary storage spaces exist and aren't even expensive, given that you can choose a remote location. It seems like a crucial detail was left out, maybe one that would paint them in a bad light.
I gather that they sell (apparently unsafe?) wood chippers, presses and some injection moulds, probably at cost. I don't understand what else is there. The "version 4" release thing mentioned in the article might be their open-source "academy" [1, 2] that's supposed to teach you how to start your local recycling shop. It includes valuable tips like "add all your expenses" and "don't forget to include taxes" and comes complete with an empty Excel sheet -- I'm sorry, a "Business Calculator". No commits since 2020, so the "version 5" of this guide that they claim to have been working on for five years must be hosted somewhere on a private GitHub fork instead. I'm sure it's awesome. Best of luck.
Moving a machine shop is not easy. A decent knee mill is about 2000 lbs. Disassembly is possible but you need a hoist or crane of some sort. Then you have to lift that into a vehicle like a pickup truck, trailer or box/flat bed truck. Then repeat the process at the storage location, then all again to move it to the new location. It seems the people involved did not want to or have the ability to tackle this. Paying a rigger is possible but the cost is very high.
I'll just assume they sold below cost to get people to bring their own equipment to take the machinery away at zero cost to them.
Due diligence won't catch every single problem every time. Luck is as much a part of success in business as solving the right problem at the right time. Supporting small businesses matters because it ensures that enough people keep trying to get past the multitude of challenges that impede progress.
I imagine players of Bathroom Simulator 2025 find it extremely satisfying.
Seriously, I don’t know where this hate is coming from. Is it the idea that a “level” is a maze to be solved? Because there are other styles of gameplay, some where conversing with people is in fact part of the fun.
Don't disagree that compile times are slower compared to alternatives in this space, although I'm not sure I'd quite describe that as some sort of barrier to entry for contributing.
Rust compile times are *really* fast compared to running Zig or Go through a static checker and then a compiler, to arrive at similar confidence in the behavior of the code as a single run through rustc.
It sounds like maybe you're used to skipping part of the development process required for stable well-tested software. Maybe that works for UIs? Wouldn't want my runtime authors feeling that way.
People who aren't developers generally install pre-built binaries. I know that's how almost every Linux distribution I've used for going on 30 years has worked.
Developers, on the other hand, seem to need some help running static checkers regularly. Most don't seem to have well exercised CI toolchains doing it for them. Or aren't aware that they should be doing so - like yourself.
Rust makes use of the tight integration between language, compiler, and checker, which lets the language be checked more easily and more thoroughly than is possible with checkers built for other languages. Many steps performed for compilation are reused, which is what makes it so fast.
If you think "why can't I skip this?" you have missed the point and are exactly the target developer these checks are for.
I've written a lot of software. Never had a complaint about compile times in any language. The reality of developing with Rust is that after the first build, only changed files are rebuilt, which makes it just as fast to compile as any other language.
If compile times are what concerns you about a language, that tells me you're not very far along in your career. You have a lot more problems to discover that are much more serious. If you ever encounter the problems Rust solves, you'll grow to appreciate its solutions.
The smallest unit of compilation in Rust is the crate, not individual files unfortunately. That's why you'll see larger projects like Wezterm or uv use workspaces to try to tame compile times.
You're right that the crate, not the file, is the smallest unit of compilation.
My largest Rust project is still fairly small at around 8k lines across two dozen files. Still, it builds and links in 30 seconds on my workstation, which is several years old and not particularly fancy.
I could see this beginning to become an issue around 100k lines of code, but I'd think most people would be looking to split into multiple crates before that just for readability.
I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).
There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.
I agree with you. My point is that the additional feature (references) creates a new potential for UB that doesn’t exist in C, and that justifies the “doesn't really ever occupy my mind as a problem” statement being criticized upthread. You can’t compare C to Rust-without-references because no one writes Rust that way. It’s not like C++-without-exceptions which is a legitimate subset that people use.
Is there any advantage to this approach over publishing a separate "zod4" package? That would be just as opt-in and incremental, not bloat download sizes forever, and make it easier to not accidentally import the wrong thing.
Ecosystem libraries would need to switch from a single peer dependency on Zod to two optional peer dependencies. Despite "optional peer dependencies" technically being a thing, it's functionally impossible for a library to determine which of the two packages your user is actually bringing to the table.
Let's say a library is trying to implement an `acceptSchema` function that accepts `Zod3Type | Zod4Type`. For starters: those two interfaces won't both be available if Zod 3 and Zod 4 are in separate packages. So that's already a non-starter. And differentiating them at runtime requires knowing which package is installed, which is impossible in the general case (mostly because frontend bundlers generally have no affordance for optional peer dependencies).
Thanks a lot for zod, and really looking forward to trying out the new version! Also, as a primarily Go developer, this issue has just been really interesting to read about. Go avoids this issue by just not having peer dependencies and relying on the compiler to not bundle unused code - or just live with huge binaries when it doesn't work out :-)
I am still curious about the `zod4` idea though. Any thoughts on adding an `__api_version`-style property to all of zod's public types to distinguish them? Perhaps it's prohibitive code-size wise, but wondering if it's possible somehow. Then downstream libraries could differentiate by doing the property check.
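Roughly what I have in mind, as a purely hypothetical TypeScript sketch -- `__api_version`, `Zod3Like`, and `Zod4Like` are made-up names for illustration, not anything Zod actually exposes:

```typescript
// Hypothetical stand-ins for Zod 3 and Zod 4 schema types, each tagged
// with a made-up `__api_version` discriminant (not real Zod API).
interface Zod3Like {
  __api_version: 3;
  parse(data: unknown): unknown;
}
interface Zod4Like {
  __api_version: 4;
  parse(data: unknown): unknown;
}

// A downstream library could then branch on the tag at runtime without
// needing to know which zod package the user actually installed.
function acceptSchema(schema: Zod3Like | Zod4Like): void {
  if (schema.__api_version === 4) {
    // Zod 4 code path
  } else {
    // Zod 3 code path
  }
}
```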
Just wanted to share the idea but the current state seems ok too.