That is an artifact of the implementation. You could absolutely implement it using strict, deterministic floating point. But even if you don't, any given implementation still performs operations in a specific order, which can be documented. And if you're running quantized (including the KV cache), there's far less floating point involved anyway.
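As a toy illustration of why operation order matters (not from any particular inference engine):

```python
# IEEE-754 addition is not associative, so summing the same values
# in a different order can give a different result. This is why a
# fixed, documented operation order yields reproducible outputs.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False: 0.6000000000000001 vs. 0.6
```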
I used to own COM at Microsoft; I think that MCP is the current re-instantiation of the ideas from COM and that English is now the new scripting language.
Did you use Claude Code to write the post? I'm finding that I'm using it for 100% of my own writing because agentic editing of markdown files is so good (and miles better than what you get with claude.ai artifacts or chatgpt.com canvas). This is how you can do things like merge deep research or other files into the doc that you are writing.
Right. But you can copy paste that into a separate doc and have Claude Code merge it in (and not a literal merge - a semantic merge "integrate relevant parts of this research into this doc"). This is super powerful - try it!
The models are the same, but the actual prompts sent to the model are likely somewhat different because of the agentic loop, so I would imagine (without having run the experiments) there will be slight differences. It's unclear whether those will be larger or smaller than the variance across repeated responses from the same interface (e.g., Claude.ai variance vs. Claude Code variance vs. the variance between Claude.ai and Claude Code). Would be an interesting controlled experiment to try!
As a counterpoint to a lot of the speculation on this thread, if you're interested in learning more about how and why we designed Python in Excel, I wrote up a doc (that is quite old but captures the core design quite well) here [1]. Disclosure: I was a founding member of the design team for the feature.
I'm genuinely curious why Python instead of something like PowerShell for Excel specifically. It seems a little out of left field, but I also get that Python is a more widely adopted language.
Python is the most popular language for data analysis with a rich ecosystem of existing libraries for that task.
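For instance, a group-by aggregation that would be awkward as a spreadsheet formula is a one-liner with pandas (the data here is made up, just to illustrate the kind of analysis people reach for Python for):

```python
import pandas as pd

# Toy sales table; the data is invented for illustration.
df = pd.DataFrame({"region": ["East", "West", "East", "West"],
                   "sales": [100, 200, 150, 250]})

# Mean sales per region, a staple data-analysis operation.
print(df.groupby("region")["sales"].mean())
```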
Incidentally I've worked on many products in the past, and I've never seen anything that approaches the level of product-market-fit that this feature has.
Also, this is the work of many people at the company. To them goes the real credit for shipping it and getting it out the door to customers.
To associate Excel with all those third-party Python analytical packages. Monte Carlo comes to mind; in the distant past, that was an expensive third-party Excel plug-in.
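A sketch of what a Monte Carlo calculation looks like in plain Python (a toy pi estimate, standing in for whatever that plug-in computed):

```python
import random

def monte_carlo_pi(n=100_000, seed=42):
    # Estimate pi by sampling points uniformly in the unit square
    # and counting the fraction that land inside the quarter circle.
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4 * inside / n

print(monte_carlo_pi())  # close to 3.14159
```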
It seems like this is an orchestration layer that runs on Apple Silicon, given that the ChatGPT integration looks like an API call from it. It's not clear to me what is actually being computed on the "Private Cloud Compute" side.
If I understand correctly, there are three things here:
- on-device models, which will power any tasks they're able to, including summarisation and conversation with Siri
- private compute models (still controlled by Apple), for when it wants to do something bigger that requires more compute
- external LLM APIs (only ChatGPT for now), for when the layers above decide an external model would handle the given prompt better; the user is always asked for confirmation first
The second point makes sense: it gives Apple the optionality to cut off the external LLMs at a later date if they want to. I wonder what percentage of requests will be handled by the private cloud models vs. locally. I would imagine TTS and ASR are local for latency reasons, and natural-language classifiers would certainly run on-device. I wonder whether summarization and rewriting will, though; those are more complex and definitely benefit from larger models.
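A hypothetical sketch of that three-tier routing (names and thresholds are invented for illustration; the real decision logic is not public):

```python
def route(prompt_complexity: float, needs_world_knowledge: bool,
          user_approved_external: bool) -> str:
    # Tier 3: external LLM, only with explicit per-request consent.
    if needs_world_knowledge:
        return "external-llm" if user_approved_external else "declined"
    # Tier 1: simple tasks (classification, short rewrites) stay local.
    if prompt_complexity < 0.3:
        return "on-device"
    # Tier 2: heavier tasks go to Apple-controlled private cloud compute.
    return "private-cloud"

print(route(0.1, False, False))  # on-device
```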
This is a giant dataset of 536GB of embeddings. I wonder how much compression is possible by training or fine-tuning a transformer model directly using these embeddings, i.e., no tokenization/decoding steps? Could a 7B or 14B model "memorize" Wikipedia?
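Sketching the idea with NumPy: a model consuming stored embeddings directly would skip the token-embedding lookup and feed the vectors straight into attention (dimensions here are toy values, not the dataset's actual embedding width):

```python
import numpy as np

def attention_over_embeddings(E):
    # E: (seq_len, d) precomputed embedding vectors, used as-is in
    # place of a learned token-embedding lookup.
    d = E.shape[1]
    scores = E @ E.T / np.sqrt(d)                      # pairwise similarity
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # row-wise softmax
    return w @ E                                       # contextualized output

E = np.random.default_rng(0).standard_normal((8, 64))  # toy embeddings
print(attention_over_embeddings(E).shape)  # (8, 64)
```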
How large is the set of binaries needed to run this training job? The current PyTorch + CUDA ecosystem is so incredibly gigantic, and manipulating those container images is painful because they are so large. I was hopeful this might be the beginning of a much smaller training/fine-tuning stack.
That is 100% my intention and hope, and I think we are very close to deleting all of that. Right now on master I am already using Python only for the tokenization preprocessing. In principle the requirements for llm.c should be extremely minimal. I think this is a few days of work that is high on my mind.
Biggest problem right now is finding a place that can host the 135GB of tokens for FineWeb100B. Will probably use S3 or something.
I wonder if enhanced operation of the lymphatic system might be causal for better mental-health outcomes. The lymphatic system doesn't have a "pump"; it relies on muscle contraction to drive circulation. So more movement/exercise drives more lymphatic activity, which may lead to better outcomes, especially if mental disorders are correlated with a buildup of waste materials in the brain.
Well, the brain is a heavily vascularized organ. Proper sleep and exercise are both crucial for cleaning up waste products, supplying nutrients, and maintaining good blood flow.