
Here's a use case that seems more science fictional to me (as the parent of a 2yo) than warp drive: a robot that can gently restrain an uncooperative human baby while changing its diaper, with everything that entails: identifying and eliminating all traces of waste from all crevices, applying diaper cream as necessary, unfolding and positioning the new diaper correctly and quickly, always using enough but never too much force... not to mention the nightmare of providing any guarantees about safety at mass-market scale. Even one maimed baby, or even just a baby some robot neglects to prevent from falling off the changing table, is game over for that line of robots.

Is there any research program that could claim to tackle this? It's so far beyond folding laundry and doing dishes, which are already quite difficult.

I wouldn't bet my life on this tech _never_ materializing, but I would mistrust anyone who claimed it was feasible with today's tech. It calls for an entirely different kind of robotic perception, feedback, and control.


This is a great one. The manipulation is hard, but we're probably on a trajectory to be able to do it in 1-3 years if you're tolerant of some risk to the baby; but, of course, your tolerance for injuring babies is basically zero. I think 'risk & reliability' is a good potential category: there is the bar of 'got it to do a task reliably enough that we got a video' and the bar of 'got it to do a task reliably enough that I'd risk an infant in its grippers'.


> but we're probably on a trajectory to be able to do it in 1-3 years

This is wildly optimistic. I quit working in robotics because I got tired of all the bullshit promises everybody made all the time. I'm not saying robotics isn't advancing or that the work is unimportant, but the spokespeople are about as reliable as Musk when it comes to timelines.

I doubt it will happen in 10 years, even with a constrained environment and hardware that costs well into 6 digits.


I think GP was basically talking about doing it on a doll. As in, a robot in 1-3 years might be able to change diapers with occasional success, but half the tries will result in a dismembered diaper user: we'd use dolls in this scenario, since dismembering babies is taboo and generally frowned upon within the robotics community.


It will have to be a robot doll. Changing my baby's diaper was a piece of cake until he learned how to escape midway through the process. Babies can be surprisingly hard to restrain!


Musk said something the other day about SpaceX being an operation that converts impossible to late.


After fiddling for 10 minutes with the baby, while being late for the day job, because it KEEPS ON MOVING while you're changing the FRIGGING diaper that is full of FECES, I can assure you that my tolerance is clearly above zero :-)


> your tolerance for injuring babies is basically zero.

Um, no it's not. It's absolutely zero tolerance. There are no weasel words out of this. If a robot were to cause any pain to the baby, there would be no remorse. There would be no front-of-mind thought about not repeating the same thing the next time. There would be no guilt for causing pain to the baby.

Why you would "basically" this the way you have is disturbing.


Sorry, this is me communicating like an engineer. In a technical sense, the risk of anything can only approach zero, never actually reach it. I meant that there should be essentially zero chance: similar to holding a baby in your arms or putting it in a high chair, and probably less chance of injury than driving in a car with a baby in a car seat. Basically zero.


I don't think the parent comment advocates for hurting babies. It just, probably correctly, states that cherry-picked examples won't be representative of robot safety with infants in the next few years, but that true safety will improve over time as well.


Real-world treatment of babies is very different from the zero tolerance you've described: from pregnant mothers smoking/drinking, to unavailable medical care, to doctor errors, to toxin-contaminated baby products and environments (Flint's leaded water comes to mind), to babies left in hot cars and other abuse, to poor availability of daycare (and even less availability of daycare that's good for mental development), to ...

Granted, most of this is unintentional. The same goes for injuries by robots: we're supposedly talking about unintentional injuries here. So, if robots save money/time/effort (like the Flint water switch did), I'm not sure that society would suddenly change its current approach to unintentional baby injuries and implement zero tolerance.

To illustrate: Uber's self-driving car killed a woman, and another self-driving car maimed a woman in SF. The Uber case was obvious criminal gross negligence (running with emergency braking explicitly disabled), and the company wiggled out of it in part by shutting down its self-driving program. Whereas in SF it was an obvious case of technology limitations and teething issues, so there were no real severe consequences; we're much more tolerant of honest technological accidents (at least when they happen not to us personally).


> From pregnant mothers smoking/drinking to medical care unavailability to doctor errors to various toxin contaminated baby products and the environment [...]

You don't even need to go to such extremes. Driving involves risk. And so does getting out of bed at all (or staying in bed...)

If the chance of the robot hurting your kid becomes orders of magnitude smaller than the chance of getting hit by a freak asteroid, you can probably call that safe enough, even if it's not strictly speaking zero.


> Zero tolerance

Well that is simply not possible. Even mothers drop their babies to the floor sometimes (very infrequently, I hope). Even for humans the tolerance isn't zero.


> It would require an entirely different kind of robotics.

I was 100% with you until suddenly this technical claim pops out. You might feel this way, and might be right, but why? Changing a diaper is crazy hard, I absolutely agree, but you seem to be just declaring from vibes that we 'require an entirely different kind of robotics'. Can you put your finger on why this is true?

Not nitpicking for the fun of it - I'm genuinely interested. Robot person.


The main limitation right now is that robots have a very limited sense of touch.

After that, they are limited in their understanding of physics. After that, perhaps understanding of physics and physiology would come into play; but perhaps superhuman perception and reaction time could reduce the need for an intuitive understanding of physics and physiology.


s/"physics and physiology"/"baby behavior and physiology"

in case anyone was wondering why I proffered "physics" in two different sentences.


I think it needs a water gun. Make the diaper a spray-on layered rubber (a sponge layer, then an impermeable layer), spray a solvent to clear the old diaper and poop, and then spray on a new one. You'd just need to slot the legs into stirrups briefly, or use some socks on strings to move them into a good position.

But can this be done with baby skin and lung safe chemicals at a reasonable temperature?

Point being humanoid designs for robots that manipulate objects designed for humans are an artificially hard problem we have decided to fail at solving.


Yeah I also think we should just replace the baby with a spherical squishy lump of celluloid-wrapped fat with no limbs. Much more convenient.


It's more common to replace them with dogs.


I can see dog-walking robots happen


A zero failure rate, not just 0.000…1, is a very different and unrealistic bar. Software must be treated as actively malicious from a hardware standpoint, due to multiple bit-flip errors etc. So it comes down to designing hardware capable of the task that's also incapable of causing harm even with hardware defects etc.

Meanwhile it must also be strong enough to move and restrain a range of infants, which is a level of force capable of harm, without any possibility of failing deadly.


You can't get to zero failures.

You might always get hit by a freak accident. Or an unlucky combination of cosmic rays replaces all your software (including all the redundant and fail safe systems) all at once.

This is all extremely unlikely, but not literally 0.

Note: I specifically mention an unlucky combination of cosmic rays. You can protect against a single or even a handful of cosmic rays just fine.


> You can't get to zero failures.

For this device, you agree with my position and the original poster's.

A diaper, however, can be designed not to risk grievous bodily harm to an infant when used correctly by a human. If someone doesn't change their kid's diaper, that's neglect by the parents, not a negative news story for the manufacturer. We're a long way from this point when it comes to robots.


Well, Mr Robot person, would you let today's robotics change your clothes right now? If you wouldn't, then why would you allow it anywhere near a baby? If you would, why? What robot with what tech would you allow?


I don't see why that would be so hard. This is potentially easier than reliably shooting guns at people.

That machine will look like a bean bag couch in the rough shape of a giant human hand, with a few collaborative robotic arms. The couch part hugs and secures all limbs of the baby into the party escort submission position, then the cobots move in to find the disassembly markers on the diaper and tear it open to remove it. Then a showerhead, then a hair dryer, then a baby powder sprayer can be brought out and run to clean any residue and take care of rashes. Finally, the new diaper can be brought in, the baby wrapped, and the double-sided tapes on it lightly pressed to secure it.

The entire machine would probably cost less than 10 million USD per unit if mass produced at reasonable scales, and most technological elements needed in such machines would be readily available.


How many diapers have you changed, out of curiosity? I've changed maybe 5,000 diapers (4 kids non-primary caregiver), and I feel confident that a shower head + hair dryer is not going to be safe or in fact work at all in many circumstances.


Never, actually... I think the key for the machine is to secure the baby in such a way that the pelvis stays in the same position in space, without breaking bones or tearing muscles. That's normally not possible because a human hand isn't big enough or grippy enough to hold them that way, which allows the baby to slip out or wiggle around. But if you could, and if I-shaped diapers are tolerated, then the problem reduces to washing a bottom floating in space and wrapping it with the diaper. Legs can probably be kept out of the way by some cushions.


Definitely recommend you do a few days of market research before you start the robot design


> washing the bottom

What about washing all the other places where baby's poop can get? (Legs, feet, hands, arms with some difficulty, all the way up the back, etc)


Showerhead can still do that.

You can pretty much put water everywhere on a human body, as long as you make sure that they can still breathe.

Legs and arms and the back etc are fairly trivial to clean with a showerhead. Skinfolds are where I would expect problems.

Btw, your system doesn't need to handle all corner cases of cleaning to be useful. It needs to not hurt anyone (in all cases); handle most common cases of cleaning; and ideally alert you when it can't clean some spots.

But even a system that can't alert you is still useful for a parent, because the parent can still look over the child afterwards.


do you have any idea how many folds a baby has, and how much poop they can hide in them?


I have a daughter and never had much trouble getting the poop off her.

But perhaps sons are worse?


> This is potentially easier than reliably shooting guns at people.

I suspect the gun-shooting robots will be used against populations the owner considers sub-human, and reliability (accuracy, in this context) is not a concern as long as the robot doesn't turn around 180 degrees.


You know they are adding AI to drones fighting in Ukraine (on both sides), mostly to deal with the signal jamming that prevents remote operators from controlling the drone.

Whether you consider your opponent in a war sub-human or not is completely irrelevant to all the engineering problems you have to solve here.

Reliability is absolutely important, because you want that opposing tank or helicopter or soldier etc to no longer be opposing you. (But, of course, reliability is only one aspect, and engineers make lots and lots of trade-offs.)

What context do you have in mind where you need a robot to shoot people?


> a baby some robot neglects to prevent from falling off the changing table

That is when we think about 2-handed robots. A 6-handed robot can easily have 2-3 hands assigned to tightly holding the baby. Humanoid robots are handicapped by their similarity to humans, which is really an artificial constraint. After all, we aren't building airplanes using birds as the blueprint.

On a similar note - while not about a baby, I was just rewatching an early Big Bang Theory season with the episode where Howard "falls right into the mechanical hand".


> Humanoid robots are handicapped by their similarity to humans which is really an artificial constraint.

YES, and I wish people would stop pretending we've unlocked some new generality by promoting generic humanoid robots over task-specific ones.

You can probably Rube-Goldberg your way to a diaper-changing robotic enclosure with a 3D baby bidet that uses many low-force robot arms to subdue (most) babies, but a humanoid robot is a very poor substitute for a human here.

Plus, a human can take personal responsibility for the baby's safety, which is not something a robot can ever do, unless we somehow make the robot fear for its life/freedom/employment the same way the overarching social/legal system does for humans who sign contracts or accept highly accountable roles.


Wow, you want to coordinate six hands? How are you going to do that? Are you going to get a spider to teleoperate the robot to train it?


won't the baby feel dis-abled by only having two arms?


On the other hand - the baby will from the beginning develop an instinct to keep track of 6 hands flying around instead of just 2. Will help in future street fights :)

In general, looking at the AI coding agents, I think we all either already feel or soon will feel disabled. And honestly, I think the human race, with its perception of itself as the "top of Creation", is due for a modesty lesson to help speed up evolution. We're spending tremendous resources unproductively, be it on wars or just ineffective economies, etc. We don't feel the urge to develop our civilization and to evolve ourselves in all aspects, from the mental and biological to cyber-integration. Mother Nature doesn't like such relaxed species.


Are you suggesting baby exoskeletons?


> It calls for an entirely different kind of robotic perception, feedback, and control.

Nearly every surgeon uses advanced robots to assist with surgeries. My uncle does kidney transplants and he uses robots; so do most surgeons.

For robots to be developed, someone must pay for it. Doctors get skill++ with the surgery robots. There may be vasectomy robots that assist with circumcision, or some actively being developed that doctors would pay for. That would be a far more interesting development than changing diapers. Unless extremely competent and skilled people ($1000+/hour) need them for their work, they won't be developed, so I don't think a diaper-changing robot is being developed.


I'm sure this can be done. Boss Baby already shows how to accomplish this: https://www.youtube.com/watch?v=rxkB20Tpvx0


This should be achievable once hand job robots are widely deployed and proven to be safe. Men are usually sacrificed first and I'm sure there will be volunteers.


Depends on your definition of a 'handjob'.

For some definitions, we already have the capability.

Also keep in mind that for the handjob robot, the user is expected to be cooperating and to be interested in self-preservation.

Small children defy these common sense expectations.


Why? There's nothing particularly special about this problem. I would bet a year for an alpha version, and a production version in 5 years. We are not exactly limited by mechanical engineering here; there's nothing particularly unique about the human hand that can't be replicated. Teleoperated surgical robotics have been a thing for decades. Give it a few months for the multimodal robotic VLMs/LAMs to catch up. In many ways this particular problem is a lot more well defined than e.g. self-driving cars.


> there's nothing particularly unique about the human hand that can't be replicated

Humanity is far from replicating or matching the performance of the human hand.


We are far from matching the performance of a human hand in general, but for special tasks we can totally match or exceed the performance.

For example, a nutcracker can crack nuts better than my hand. A dishwasher can wash dishes better than my hand.

A special robot might be able to change diapers better than my hand.


The point is that the success rate needs to be 99% and safety needs to be 100%. You're not allowed to take shortcuts. That's what makes it difficult.

Also VLMs/LAMs aren't going to cut it. You're going to need something like TDMPC.


It's good that you make a distinction between success rate and safety rate.

However, safety rate doesn't need to be 100%. If you can keep the failure probability as low as the probability to be hit by a random asteroid falling from the sky, that's good enough.


TDMPC for one of the downstream networks perhaps, but the upstream will have to be some sort of reasoning model if you want it to be general purpose w.r.t. babies.


Among other general advice, my CLAUDE.md insists Claude prove to me each unit of change works as expected, and I'm usually just hoping for it to write tests and convince me they're actually running and passing. A proof assistant seems overkill here, and yet Claude often struggles to assemble these informal proofs. I can see the benefit of a more formal proof language, along with adding a source of programmatic feedback, compared to open-ended verbal proof.
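For illustration, the relevant CLAUDE.md section looks something like this (paraphrased, not my exact file):

  ## Verifying changes
  - For each unit of change, prove it works: write or update tests, run
    them, and show me the passing output before moving on.
  - If a claim can't be tested, say so explicitly rather than asserting it.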

"Overkill" of course is an editorial word, and if you know about https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon... then you know many statically typed programming languages are essentially proof assistants, where the proof goal is producing well-typed programs. LLMs are already quite good at interacting with these programming language proof assistants, as you can see any time a competent LLM interacts with the Rust borrow checker, for example.


They say the e-ink display has "Unmatched Speed, Never Seen on ePaper", so it would be nice to know the actual refresh rate.

This is not an endorsement, but https://daylightcomputer.com/ claims 60fps, so that's the bar to meet in my opinion. Caveat: the daylight display is not true e-ink, but an e-ink-like LCD, IIUC.


I'm confused: how is a different technology the bar to meet for a given technology?

That feels like saying "my car can go 60mph so that's the bar to beat for a bicycle".

What's the battery like on the Daylight?


Both devices focus on readability and rely on a reflective screen. Both devices are monochrome.

This is like comparing OLED vs MicroLED. They're technically different technologies, and each has its own strengths. The OP is saying that "Never Seen on ePaper" is like saying "The Best iPhone Ever"...


> We still do not know, for instance, what makes a good tokeniser (Gowda and May, 2020; Cognetta et al., 2024): which characteristics should its produced subwords `s` have to be a good starting point for language modelling? If we knew this, then we could define an objective function which we could evaluate tokenisers with.

I don't see how the authors get past this true general statement from the first paragraph of the introduction. Finding a good tokenizer is not just NP-hard; we have no idea how hard it might be because we don't have theoretical agreement on what "good" means.

In order to have something to prove, the authors decide (somewhat arbitrarily):

> Specifically, we focus on finding tokenisers that maximise the compression of a text. Given this objective, we then define the tokenisation problem as the task of finding a tokeniser which compresses a dataset to at most δ symbols.

Is a tokenizer that maximizes the compression of text (e.g. by identifying longer tokens that tend to be used whole) necessarily a better tokenizer, in terms of overall model performance? Compression might be a useful property for an objective function to consider... but then again maybe not, if it makes the problem NP-hard.

I'm also not sure how realistic the limitation to "at most δ symbols" is. I mean, that limit is undeniably useful to make the proof of NP-completeness go through, because it's a similar mechanism to the minimum number of satisfied clauses in the MAX-2-SAT definition. But why not just keep adding tokens as needed, rather than imposing any preordained limit? IIRC OpenAI's tokenizer has a vocabulary of around 52k subword strings. When that tokenizer was being designed, I don't imagine anyone would have worried much if the final number had been 60k or even 100k. How could you possibly choose a meaningful δ from first principles?
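To make the objective concrete, here's a toy Python sketch (mine, not the paper's formulation) of the decision form: fix a subword vocabulary of size at most K, tokenize by greedy longest match, and ask whether the corpus comes out at no more than δ symbols:

  def tokenize_greedy(text, vocab):
      # longest-match tokenization; single characters are always available
      out, i = [], 0
      while i < len(text):
          match = next((text[i:i+n] for n in range(len(text) - i, 1, -1)
                        if text[i:i+n] in vocab), text[i])
          out.append(match)
          i += len(match)
      return out

  # decision form: does ANY vocab with |vocab| <= K yield <= delta symbols?
  tokens = tokenize_greedy("banana band", {"ban", "ana", "and"})
  print(tokens, len(tokens))  # ['ban', 'ana', ' ', 'ban', 'd'] -> 5 symbols

Checking one candidate vocabulary is easy; the hardness lives in the search over all vocabularies of size at most K.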

To put that point a different way, imagine the authors had proven NP-completeness by reduction from the Knapsack Problem, where the knapsack you're packing has some maximum capacity. If you can easily swap your knapsack out for a larger knapsack whenever it gets (close to) full, then the problem becomes trivial.

If the authors managed to prove that any arbitrary objective function would lead to an NP-hard tokenizer optimization problem, then their result would be more general. If the paper proves that somehow, I missed it.

I suppose this paper suggests "here be dragons" in an interesting if incomplete way, but I would also say there's no need to hurt yourself with an expensive optimization problem when you're not even sure it delivers the results you want.


NP is a category of decision problems - problems with boolean answers. Saying that it's NP-complete to find the tokeniser that produces the fewest symbols is meaningless. You have to convert it to the form "is there a tokenizer that produces fewer than N symbols?" before it even makes sense to ask whether it's NP-complete.


I fully agree with your final statement, but needing to constrain the problem in an artificial way to prove it's NP-complete doesn't mean the constraint was justified or realistic, because then you've only proved the constrained version of the decision problem is NP-hard.

There might be plenty of perfectly "good" tokenizers (whatever that ends up meaning) that can be found or generated without formulating their design as an NP-complete decision problem. Claiming "tokenization is NP-complete" (paper title) in general seems like an overstatement.


If it's NP-hard to even know whether the answer is bigger or smaller than a certain number, then it's obvious that in a non-formal way, finding the exact answer is at least as hard as NP-hard, whatever that means.


I don't know if this is related, but I work on a project with a language parser (and so a tokenizer), and one issue we have is that the parser/tokenizer was not designed to work on incomplete code, so it's not useful for autocomplete.

That property, how good the design is at helping you write and debug (useful error messages), seems like yet another metric that could be used.

I also don't know if tokenizers can be divorced from parsers. Some languages have constructs that require context that comes from the parser.


CSTML and BABLR are designed to close this gap completely for people in your position.

We invented a new kind of zero just so that you could write arbitrary parsers and deal with things that are missing


It's not optimal, but if you want to play it safe you can use 1 byte = 1 token, plus a few control tokens like stop, user turn, model turn, etc.
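A minimal sketch of that scheme in Python (the ids and names are mine, nothing standard):

  # byte-level "tokenizer": the 256 byte values plus a few control tokens
  STOP, USER, MODEL = 256, 257, 258  # control ids chosen arbitrarily

  def encode(text):
      return list(text.encode("utf-8"))

  def decode(ids):
      data = bytes(i for i in ids if i < 256)  # drop control tokens
      return data.decode("utf-8", errors="replace")

  ids = [USER] + encode("hi") + [STOP]  # -> [257, 104, 105, 256]

No training and no out-of-vocabulary problem; the price is paid in sequence length, as the replies below explain.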


Can anyone explain to a layman like myself why it wouldn't be better if we just used bytes as tokens?


It's two problems:

1) the sequence length increases too much. I don't know what the average token length is for Llama, but imagine it's 5+ bytes. Using individual bytes as tokens immediately makes the context ~5x longer, which is super bad for inference speed and memory requirements (since attention inference is quadratic in the length of the sequence); see the back-of-envelope sketch after point 2.

2) individual bytes have essentially no meaning, so byte embeddings are harder to learn. Subword tokens aren't a perfect solution, but they definitely often have some standalone meaning where embeddings make sense.
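A back-of-envelope illustration of point 1 in Python, with a made-up but plausible 5 bytes per token:

  seq_subword = 1000                    # tokens for some document
  seq_bytes = 5 * seq_subword           # same document as raw bytes
  print(seq_bytes**2 / seq_subword**2)  # 25.0: attention compute grows ~25x

So a 5x longer sequence doesn't cost 5x in attention, it costs ~25x (and the KV cache grows 5x on top of that).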

I'll give another example from a recent paper that tries to eliminate tokenizers (this is a popular research direction) [1].

Figure 4 is a really good example of why byte-level models are wasting computation. Once part of a word is generated, most of the remaining bytes are assigned basically probability 1. But a byte-level model would still have to spend time decoding them. With a subword-level model most of these easy-to-decode bytes would be packed together in a single token so you don't have to decode them individually.

When model APIs bill by the token, this is an important consideration.

[1]: https://arxiv.org/abs/2412.09871


Thank you very much for the thorough reply! I highly appreciate it.


Hi, I'm Cognetta from the above Cognetta et al. I can't answer all of your questions (and I can't speak for the authors of this paper ofc), but I will try to answer some.

> Is a tokenizer that maximizes the compression of text (e.g. by identifying longer tokens that tend to be used whole) necessarily a better tokenizer, in terms of overall model performance? Compression might be a useful property for an objective function to consider... but then again maybe not, if it makes the problem NP-hard.

Compression isn't necessarily the best metric for language modeling quality [1][2][3], but there are some papers that find a correlation between it and quality [4] and also it has one important benefit: it reduces inference time by making the input sequences shorter (this is particularly important for transformers, because the runtime is quadratic in the sequence length).

If you imagine that with enough data, basically any reasonable tokenization algorithm would be ok (I think this is mostly true; there are definitely bad and "better" tokenizers and you see this very clearly in small data settings, but once you get into the trillions-of-tokens and 10s-of-billion-of-parameters setting, other things are going to matter more), then optimizing the tokenizer for compression is a good choice as it will provide tangible, practical benefits in the sense of reduced inference time.

> I'm also not sure how realistic the limitation to "at most δ symbols" is. [...] But why not just keep adding tokens as needed, rather than imposing any preordained limit?

This is a pretty realistic limitation imo. Of course you can arbitrarily increase the vocabulary size, but there is a tradeoff between modeling quality, parameter count, and inference time. If you increase the vocabulary a bunch, your inference speed will probably improve (although now you have a much larger softmax at the end of your model, which isn't usually a bottleneck anymore, but still not great), parameter count will increase (due to the larger embedding table), and your modeling quality will go down (in that you have tokens which are so rare in the corpus that they are massively undertrained; this can cause big problems [5]).

So by constraining it to δ, you are basically setting a parameter budget for the vocabulary, and this is a pretty reasonable thing to do.

> IIRC OpenAI's tokenizer has a vocabulary of around 52k subword strings.

Yeah, the size of the vocabulary varies a lot across models, but it isn't unusual to see significantly larger vocabularies these days (e.g., gemma has ~256k). However, these are still finite and very small compared to the corpus size.

> How could you possibly choose a meaningful δ from first principles?

This is a really great question, and something that we don't know how to answer. A lot of work has tried to answer it [6][7], but it is very much an open question.

[1]: https://arxiv.org/abs/2310.08754

[2]: https://aclanthology.org/2023.acl-long.284/

[3]: https://aclanthology.org/2024.emnlp-main.40/

[4]: https://arxiv.org/abs/2403.06265

[5]: https://aclanthology.org/2024.emnlp-main.649/

[6]: https://aclanthology.org/2023.acl-long.284/

[7]: https://aclanthology.org/2020.findings-emnlp.352/


NB: Can't edit my original reply.

Sorry actually I misread part of your comment in relation to the paper and confused δ and another parameter, K.

To clarify, δ is the number of tokens in the tokenized corpus and K is the size of the vocabulary.

So, if you are asking about why would they limit _K_, then my answer still applies (after swapping δ for K). But if you still mean "why do they pick some arbitrary δ as the limit of the size of the tokenized corpus", then I think the answer is just "because that makes it a decision problem".


Thanks for these detailed replies! Now I really want to read your paper.


Thanks!

Our paper [1] is kind of a goofy adversarial thing where we thought "here's this cool metric, how can we break it?". The tokenizers we propose are definitely not tokenizers you should use in practice.

The original paper that proposes the metric is, imo, much more interesting theoretically [2].

[1]: https://aclanthology.org/2024.lrec-main.1469/

[2]: https://aclanthology.org/2023.acl-long.284/


On any macOS computer (or replace /usr/share/dict/words with your own word list):

  # words typeable using only the QWERTY top row, sorted by length
  grep '^[qwertyuiop]*$' /usr/share/dict/words | \
  awk '{ print length(), $0 }' | \
  sort -n
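Swap the character class for the other rows, e.g. the home row:

  grep '^[asdfghjkl]*$' /usr/share/dict/words | \
  awk '{ print length(), $0 }' | \
  sort -n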


Works for Ubuntu, too. My Colemak self can only get fluffy (6) from the front row; that's the longest word. The middle row really shines, though: I can get hardheartedness (15) or assassinations (14).


Interesting that your dictionary doesn't have "tenderheartedness", which is two letters longer.


"tenderheartedness" uses every row, not one row.


Not on a Colemak keyboard


Gulp, fluffy puppy pug! Yup. Fly, ugly pup, fly.

I note that hardheartedness and hotheadedness threaten the darnedest nonstandard assassinations. Such sordidness!


Nice.

Middle/second row results:

8 flagfall "Flagfall, or flag fall, is a common Australian expression for a fixed start fee, especially in the taxi, haulage, railway, and toll road industries."

8 galagala "A name in the Philippine Islands of Dammara Philippinensis, a coniferous tree yielding dammar-resin."

Lower/third row: none.

There are no vowels on the bottom row. So no words. I've been typing at ~ 50wpm for 30 years, and I don't think I'd ever actually consciously recognized this fact about the bottom row.

(standard US keyboard layout)


For QWERTY, I found two nine-letter words using only the middle row: halakhahs and haggadahs.

And yeah, nothing in the bottom row other than acronyms and similar pseudo-words.


Dvorak:

  ',.PY FGCRL   pry or Lyly
  AOEUI DHTNS   tendentiousness
  ;QJKX BMWVZ   xxxv, www, bbq or mm
After 'apt install wbritish-insane'

  pyrryl (a chemical group)
  unostentatiousnesses (and anaesthetisations is good too)
  mmmm


Knuth vs McIlroy all over again.



Are there any new cars / car brands that credibly promise not to track their drivers?

Any car with a network connection for software updates seems likely to be harvesting driver data, or is at least capable of doing so.


Essentially, no. Your best bets are the German manufacturers, some of whom (e.g. Porsche) have data opt outs for all services and exist within a legislative framework that's at least mildly protective of consumer privacy. Even then you should be wary about cars built for export to other markets like the US. It's common for the regional manufacturer to be a separate legal entity, with separate backend services, delivering cars with different configurations and software.

For American, Japanese, and Korean manufacturers? You should just assume they're harvesting all the data they can get on you. GM's recent decision to ditch Android auto and Carplay was motivated in large part by their desire to better control and monetize user data collection.


Japan's privacy laws are fairly strict and, to some degree, interoperate with GDPR.

For context, I have a Toyota GR86 where opting out of data sharing can be done via an app.

(Edits for some extra context)

https://www.toyota.com/privacyvts/images/doc/Toyota%20-%20Co...

> (i) the Connected Services Privacy Notice located at www.Toyota.com/privacyvts (“Privacy Notice”). Carefully review the Privacy Notice as it applies to your personal information and Vehicle (as defined below) data that we collect, use, store, share and secure to provide the Services. Please note that your Vehicle comes with a data communication module (“DCM”) that enables the Wireless/GPS Technology, as described in Section 11(a), and allows for the collection of data from you and your vehicle (e.g., location, health and driving data). BY DEFAULT, THE DCM IS ON/ACTIVE WHEN YOUR VEHICLE IS DELIVERED AND WILL REMAIN ON/ACTIVE (AND CONTINUE TO COLLECT DATA FROM YOU AND YOUR VEHICLE) UNTIL YOU CONTACT US AND REQUEST THAT IT BE DEACTIVATED.

> (b) DCM. If you cancel your Service Plan, we have the right (but, unless you ask us to, not the obligation) to turn off your DCM as of the effective date of cancellation. Once your DCM is turned off, the Vehicle will not send any data to Toyota. Depending on the connectivity to your Vehicle, your DCM may deactivate immediately, or it may take up to several days.

The GR86 forums have a few people also looking at ways to outright disconnect the hardware for this kind of stuff entirely, but I believe that's slow going - haven't kept up on it. I just have the above saved since I look at this stuff when buying any new car these days.

The obvious disclaimer applies that you may enter warranty hell if you're disabling it, because it's 2023 and I guess society is fine with this level of default tracking. You should confirm how it affects you before changing anything. Otherwise, hope people find it helpful!


> opting out of data sharing can be done via an app.

Oh, the irony.

As the owner of a new Toyota who received a phone call from our dealer because Toyota contacted them about a (routine) maintenance alert from our vehicle… the car / app combination is a privacy nightmare.

They offer an opt out, and that’s good, but by the gods their default is insane.


I mean, you can also contact them - per my quoted text. The app is just easier if you don't feel like dealing with tracking things down. ;P

But yes, in general, the defaults on all new modern vehicles are a nightmare.


> Your best bets are the German manufacturers, some of whom (e.g. Porsche) have data opt outs for all services and exist within a legislative framework

Funnily enough, the car in white at the center of the picture in the WIRED article is a Porsche (the most luxurious and least-sold of all the Porsche models: the Panamera).


You can pull a fuse (I think it's marked DCM?) in some Toyotas if you don't care about the entertainment/navigation system ever contacting a cell tower again. I pulled the one in my 2021 Tundra.


I pulled a fuse out of my car to disable the onstar module because they were sending me "monthly diagnostic report" emails. I had never made an onstar account, and I even declined the free trial.

If they hadn't sent me those diagnostics I would not have guessed they were tracking me in the first place. Too bad they outed themselves, this time at least :)


Are there any industries/companies who don't somehow track users these days?


Don't know about others, but European car manufacturers are under strict regulations and try their damn best to not record anything as it immediately becomes a liability.

Edit: I know at least a few companies have a test vehicle program that operates under different rules, but those are not sold to general public. So the company receive high quality data without getting into trouble with GDPR.


Yes, our 2021 Mazda lets you opt out in two ways.

1. Every time you start the car, you can navigate into the system menus to deactivate data collection and transmission.

2. Call customer support, give them your VIN, and follow up in a week or two. That’s what we did.

Alternatively, I suppose you could just pull the SIM card or its fuse.


Waiting for The Framework Car.


> Are there any new cars / car brands that credibly promise not to track their drivers?

Not that I can find. It's why I won't be buying any cars made this century.


I HIGHLY doubt there are any that make that promise.


Although these results are not exactly impressive or compelling (they don't make me want to change my exercise habits), it's reassuring to see researchers go through with publishing underwhelming results, rather than cherry-picking only the interesting results and sitting on the rest, which is a major contributing factor to the crisis of confidence/replication in the social sciences.

- https://en.wikipedia.org/wiki/Cherry_picking

- https://www.nature.com/news/scientific-method-statistical-er...

- https://fivethirtyeight.com/features/science-isnt-broken/

- etc...


Why do you say that the results aren't impressive? 49% reduction in all cause mortality when High intensity interval training is compared to moderate intensity continuous training seems like a very strong result.


Mostly because they admit those results are not statistically significant (for example, at the bottom of the summary diagram). That said, I admire their honesty, and I hope these results suggest other kinds of experiments to other researchers. Maybe the study just needs to be larger, or longer.

Focusing on specific kinds of mortality might also give stronger results than measuring "all cause" mortality. I say that because it seems like cancer was the biggest killer in these groups. I am not a doctor, but I wouldn't have thought cancer was causally related to (lack of) exercise, the way cardiovascular diseases are believed to be. I'd be interested to see larger studies with enough non-cancer deaths to say something statistically significant about the effect of exercise on those outcomes.


Why even bother doing a study with a sample size so small that a difference as large as 49% is still not significant?


They had 1567 participants. Compared to control (HIIT-like), HIIT reduced risk by 1.7% while moderate intensity training increased it by 1.2%. The overall mortality in the control group was 4.7%. For a controlled trial, that seems like a large number of participants (correct me if I'm wrong), they just were too healthy ;-)

I assume the significance problem is inherent to studies looking at mortality as the outcome is very binary and can take a long time to manifest. As an effect on all-cause mortality can be seen as the ultimate metric of how healthy something is, it's probably still worth it to investigate. In this case, they made it quite difficult for themselves by comparing active people to other active people.

It's also possible that they made other observations during the study that are or will be published separately.


Also, just adding some sugar to this, don't forget that this study was in Norway. Overall health levels would be higher than in, say, USA. I'm talking about weight specifically.

Assuming healthy / not overweight would be a very big assumption to make for the US (as a Norwegian I went to the USA, once, in 1999, and what I saw there shocked me), where another study could monitor the mortality outcomes of getting weight under control and exercising vs. not.

I'd also love to see impact of weight resistance training added to the mix surveyed.


And not just Norway, but specifically Trondheim. Correct me if I am wrong, but I expect their lifestyle to be even less sedentary than the average Oslo resident. I expect the results to have slim pickings (pun intended)!


Roger that. But even by e.g. Australian standards (which are a lot closer to the US ones), Oslo is far from sedentary; I didn't even think about buying a car in Norway before I left it, being happy to walk/run, ride, or PT it everywhere. Bad diet, alcohol use, etc. are a lot more common there than in Norway too.


HIIT reduced risk by 1.7 percentage points, mortality down to 3% for the high intensity group, from 4.7% for the control group.

The percentage decrease was 37%: percentage points are not percent!
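A quick sanity check in Python, using the study's rounded figures:

  control = 0.047  # 4.7% mortality in the control group
  hiit = 0.030     # 3.0% mortality in the HIIT group
  print(round(control - hiit, 3))              # 0.017 -> 1.7 percentage points (absolute)
  print(round((control - hiit) / control, 2))  # 0.36 -> roughly the reported 37% (relative)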


I read that wrong. I thought that the overall mortality rate in the groups was 1.2% and 1.7%. Didn't realize those were both differences from the control.


Because running a large enough study is expensive and it's cost-effective to perform a smaller study to determine an approximate size of the effect. Then, you can design the next study to have enough power to distinguish the expected effect size from the null hypothesis.

Additionally, these results could be aggregated with comparable results to yield a stronger result.


"an absolute risk reduction of 1.7 percentage points was observed after HIIT (hazard ratio 0.63, 95% confidence interval 0.33 to 1.20) and an absolute increased risk of 1.2 percentage points after MICT (1.24, 0.73 to 2.10)."

So, 1.7% vs 1.2% seems just above noise level. The OP is right, the numbers are not that impressive.

We are not talking about reducing overall mortality by double digits...

Summary: looks like exercising is good, and adding HIIT has an increasing effect of reducing mortality, but the overall effect is small.


Those are absolute risk reductions, not relative ones. The 1.7 percentage point reduction equates to a 37% reduction in all cause mortality, and a 1.2 percentage point increase is a 25% increase.

Those are huge numbers. The problem is that the confidence interval is really wide.


>but the overall effect is small.

Quibble: non-significant with a wide confidence interval doesn't mean "small"; it means _unknown_. It means the data is too sparse and/or too noisy to tell.


Doesn't this also imply that the effect is "small" even if real? After all, if there were a very strong correlation (e.g. a true and exclusive causal chain) then we would see a huge signal even in small and noisy datasets. Or am I missing something?


No. If you look closely at the data you might be able to draw such conclusions but lack of statistical significance often doesn't suggest or imply a small effect. Notice that in this case in particular, the confidence intervals are consistent with very large positive or fairly large negative effects. Don't underestimate the amount of noise often found in studies. Lack of significance usually just means that data is too noisy to tell us anything. If you get significance you get to say: the data is probably not pure noise but the effect could still be very tiny or caused by systematic measurement errors. Null hypothesis testing is pretty useless really.


Thanks for your detailed reply! I meant that even this dataset puts a limit on the effect size, if viewed as an "evidence of absence of clear and large effect".

Of course since "everything is correlated" [0] expecting such truly simple signals might be nonsensical/pointless.

I was just lamenting the lack of simple magical treatments basically.

[0] https://www.gwern.net/Everything


That gwern page is excellent!


>Participants were randomised to two sessions weekly of high intensity interval training at about 90% of peak heart rate (HIIT, n=400), moderate intensity continuous training at about 70% of peak heart rate (MICT, n=387), or to follow the national guidelines for physical activity (n=780; control group); all for five years.

I can't quite tell by your summary whether you are saying "exercising is good, but the overall effect on mortality is small", or "exercising is good, and the effect of adding HIIT to baseline exercise is small".

You cannot make the former statement from this study as control group were not non-exercisers (and adherence was decent). The latter statement, does seem to be supported.

Regarding exercise in general, the literature shows the mortality gap between exercisers and non-exercisers is absolutely massive.


2.9% seems like a significant risk reduction to me. HIIT results in a 1.7% risk REDUCTION and MICT results in a 1.2% risk INCREASE.


No difference between people managing their own exercise program compared with those who had a trainer.

"These differences were not statistically significant" according to the study.

The study itself says that they observed no difference. That is not a strong result. Which is OK.


"Potentially" is an understatement! A much better take, IMO.


This isn't exactly what you asked for, but it seems pretty relevant to your stated interests: https://www.jasondavies.com/maps/voronoi/capitals/


My head nearly exploded trying to position the globe in the rotation I wanted :-) Neat though, thanks for sharing!


Generally with orbit controls you can drag circles with your mouse to roll, taking advantage of the noncommutativity of SO(3)


I don't think you can have a proper Merkle tree if cycles are possible, and cycles definitely are possible in this AST representation: https://www.unisonweb.org/docs/faq#how-does-hashing-work-for...

