I made the conversion of the Squeak version to Pharo many years ago, and I just tried to make it work in the latest version (which was not straightforward because Pharo has since deprecated and removed some of the Morphic parts it used). So it was mostly curiosity about whether it can still work and, if so, how well or poorly.
I propose a different lens to look at this problem.
A neural net can be defined in fewer than 100 lines of code. The knowledge is in the weights. What if we went from the source code of the web (HTML, CSS, JS, WASM) directly to a generated interactive simulation of said web? https://gamengen.github.io
What if this blob of weights could interpret way more stuff, not just the web?
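To make the "fewer than 100 lines, the knowledge is in the weights" point concrete, here is a minimal sketch of a feed-forward net in TypeScript. The names (`Layer`, `forward`, `xorNet`) are made up for illustration, and the weights are hand-picked rather than trained; the point is only that the generic code knows nothing about XOR, while the numbers do.

```typescript
// A layer is just a weight matrix (one row per output unit) plus biases.
type Layer = { weights: number[][]; biases: number[] };

const relu = (x: number): number => Math.max(0, x);

// Generic forward pass: ReLU on hidden layers, linear output layer.
function forward(layers: Layer[], input: number[]): number[] {
  return layers.reduce<number[]>((activations, layer, i) => {
    const isLast = i === layers.length - 1;
    return layer.weights.map((row, j) => {
      const sum = row.reduce((acc, w, k) => acc + w * activations[k], layer.biases[j]);
      return isLast ? sum : relu(sum);
    });
  }, input);
}

// Hand-picked (not trained) weights that make the generic code compute XOR:
// h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2 * h2.
const xorNet: Layer[] = [
  { weights: [[1, 1], [1, 1]], biases: [0, -1] },
  { weights: [[1, -2]], biases: [0] },
];

for (const [a, b] of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  console.log(a, b, forward(xorNet, [a, b])[0]);
}
```

Swap in a different set of numbers and the same `forward` function computes something else entirely, which is both the appeal and the opacity of the approach.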
Yes, what if instead of the computer being an Internet Communications Device (as Steve Jobs called the iPhone), it would just pretend to allow us to communicate with other humans while actually trapping us in a false reality, as if we were all in the Truman Show?
It might work, as indicated by the results in your link ("Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation."), but the result would be a horrific dystopian nightmare, so why would we do this to ourselves?
Anyway, there is one aspect where the STEPS work is similar to this idea, in that it tries to build a more concise model of the system. But it does this using domain-specific languages rather than lossy blobs of model weights, so the result is (ideally) the complete opposite of what you proposed: A less blobby, more transparent and more comprehensible expression of what the system does.
We already interact with a false reality through our "old school" computers – the internet is full of bots arguing with each other and with us. But my proposition doesn't have to distort the interpreted content.
Neural nets (RNNs) are Turing-complete, so they can simulate a web browser 1:1. In theory, of course. Let's say we find a way to train a neural net to simulate a web browser identically. The weights of this blob might at first seem like opaque nonsense, but in reality they could contain a more efficient implementation than whatever we have come up with.
Alan Kay believed computer science should take its cues from biology. Rather than constructing software like static buildings, we ought to cultivate it like living, evolving organisms.
After tinkering with some similar toy projects, I feel like iframes with a well-defined API to use with postMessage() (and maybe a small library to provide frame-internal matching toolbars/controls) are definitely the way to go here, since they remove the need to tightly couple your "OS" with your "apps".
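A rough sketch of what such a "well-defined API" could look like, in TypeScript. Everything here is hypothetical: the message types (`set-title`, `request-close`), the `HostWindowState` shape, and the function names are invented for illustration and do not come from any existing library. The host-side logic is written as pure functions over a minimal event shape (`origin` and `data`, the two `MessageEvent` fields it needs), so it can be exercised without a browser.

```typescript
// Hypothetical messages an "app" iframe may send to the host "OS".
type AppMessage =
  | { type: "set-title"; title: string }
  | { type: "request-close" };

// The only parts of a browser MessageEvent this sketch relies on.
interface IncomingEvent { origin: string; data: unknown }

// Hypothetical host-side state for one app window.
interface HostWindowState { title: string; open: boolean }

// Validate untrusted data coming out of the frame; unknown shapes yield null.
function parseAppMessage(data: unknown): AppMessage | null {
  if (typeof data !== "object" || data === null) return null;
  const msg = data as { type?: unknown; title?: unknown };
  if (msg.type === "set-title" && typeof msg.title === "string") {
    return { type: "set-title", title: msg.title };
  }
  if (msg.type === "request-close") return { type: "request-close" };
  return null;
}

// Apply one incoming event to the host state, ignoring other origins
// and anything that does not match the protocol.
function handleEvent(
  state: HostWindowState,
  event: IncomingEvent,
  allowedOrigin: string,
): HostWindowState {
  if (event.origin !== allowedOrigin) return state;
  const msg = parseAppMessage(event.data);
  if (msg === null) return state;
  if (msg.type === "set-title") return { ...state, title: msg.title };
  return { ...state, open: false }; // remaining case: "request-close"
}
```

In a browser, the host would feed this from `window.addEventListener("message", ...)`, and the app inside the iframe would send with something like `window.parent.postMessage({ type: "set-title", title: "Notes" }, hostOrigin)`. The only thing the two sides share is the message shape, which is where the loose coupling comes from.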