
>I think stuff like this, is trying to recreate a world that doesn't exist anymore

And that's fine. We should build the world as we want it to be, not accept whatever shit our era gives us.

This includes changing some things back to how they were in the past (if they were better) and changing other things toward how we envision the future.


>If he needed his app to be 30% faster he would have made it so

That still validates what the parent wrote: "In short, the maximum possible speed is the same (+/- some nitpicks), but there can be significant differences in typical code".


It would be marginally useful even at $500, annoying to use for long stretches, and very expensive.

In this economy it's dead in the water as anything other than a niche product for specific uses or an expensive geek toy. As is, it's not getting anywhere near iPod/iPhone status.


So they really do all look the same; it's not just prejudice!

>Yes, maybe you think that you worked so hard to learn coding, and now machines are doing it for you. But what was the fire inside you, when you coded till night to see your project working? It was building.

Nope. It was coding. Enjoying the process itself.

If I wanted to hand out specs and review code (which is what an AI jockey does), I'd have fucking project managers as role models, not coders...


A lot was lost then too.

>Why are you putting "value" above human decency?

Because human decency is often overrated and hard usable value is often underrated.

If we removed the value (changes, inventions, artworks, products, etc.) made by people who were lacking in "human decency" in this or that aspect, billions would be poorer, sicker, die sooner, and have much worse cultures.

>There are plenty people just the same, with the same capabilities without the quality of being a tarpit of suck.

Understanding is a great component of human decency too, as is not being a sanctimonious holier-than-thou type. For example, not labelling someone who "wrote something mean in a forum" as "a tarpit of suck", as if that defines them totally, or as if the shit of those making such statements doesn't smell.

Plus "plenty people just the same, with the same capabilities", really? As if the output of an artist is interchangeable with that of another, so that we can just discard those that have done such grave offenses as "being rude on a forum" and just listen to another?


There's also the fact that Miles Davis doesn't get to review our own behavior as human beings. He might not have liked us as his audience either. His behavior is publicized, and ours (whatever it is) is not.

>How strange it is that we so easily forgive bad behavior from people we love.

That's part of what loving someone means. It's easy to love someone convenient who never does anything to bother or hurt you.

Besides, he was trolling. It's not like it's a big deal. If you were on a mailing list or usenet group or forum in the 80s and 90s, everybody did that, and few if any had an issue with it; we could take it!

We not only forgive but tolerate 100000x worse stuff every day, stuff that directly fucks our lives and that we could prioritize not tolerating.


>despite not yet reaching bare-metal levels of performance and energy efficiency.

"Not yet"? It will never reach "bare-metal levels of performance and energy efficiency".


FWIW the native and WASM versions of my home computer emulators are within about 5% of each other (on an ARM Mac), i.e. more or less 'measurement noise':

https://floooh.github.io/tiny8bit/

You can squeeze out a bit more by building with -march=native, but then there's no reason that a WASM engine couldn't do the same.


SIMD and multithreading support really helped with closing the performance gap.

Still surprised about the 5% though; I've generally seen quite a bit more of a gap.


Maybe the emulator code is particularly WASM-friendly ... it's mostly bit twiddling on 64-bit integers with very little regular integer math (except incrementing counters) and relatively few memory loads/stores.

I'd have to take a contrary view on that. It'll take some time for the technologies to be developed, but ultimately managed JIT compilation has the potential to exceed native compiled speeds. It'll be a fun journey getting there though.

The initial order-of-magnitude jump in perf that JITs provided took us from the 2-5x overhead for managed runtimes down to some (1 + delta)x. That was driven by runtime type inference combined with a type-aware JIT compiler.

I expect that there's another significant, but smaller perf jump that we haven't really plumbed out - mostly to be gained from dynamic _value_ inference that's sensitive to _transient_ meta-stability in values flowing through the program.

Basically you can gather actual values flowing through code at runtime, look for patterns, and then inline / type-specialize those by deriving runtime types that are _tighter_ than the annotated types.
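
To make that concrete, here's a hand-written toy sketch (hypothetical; no engine literally emits this source) of what value specialization amounts to, with the guard and fallback spelled out:

    // Toy sketch: what a value-specializing JIT effectively does,
    // written out by hand. A real engine would profile `scale` at
    // runtime, notice it's almost always 2, and emit a guarded fast path.

    function scaleGeneric(pixels, scale) {
      // generic path: full multiply on every element
      for (let i = 0; i < pixels.length; i++) pixels[i] *= scale;
    }

    function scaleSpecializedForTwo(pixels) {
      // specialized path: knowing the *value* (not just the type)
      // lets the compiler strength-reduce the multiply into an add
      for (let i = 0; i < pixels.length; i++) pixels[i] += pixels[i];
    }

    function scale(pixels, s) {
      if (s === 2) {                     // guard on the profiled value
        scaleSpecializedForTwo(pixels);
      } else {
        scaleGeneric(pixels, s);         // "deopt" fallback for rare values
      }
    }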

I think there's a reasonable amount of juice left in combining those techniques with partial specialization and JIT compilation, and that should get us over the hump from "slightly slower than native" to "slightly faster than native".

I get it's an outlier viewpoint though. Whenever I hear "managed jitcode will never be as fast as native", I interpret that as a friendly bet :)


> JIT compilation has the potential to exceed native compiled speeds

The battlecry of Java developers riding their tortoises.

Don’t we have decades of real-world experience showing native code almost always performs better?

For most things it doesn't matter, but it always rubs me the wrong way when people mention this about JIT, since it almost never works that way in the real world (you can look at web framework benchmarks as an easy example).


It's not that surprising to people who are old enough to have lived through the "reality" of "interpreted languages will never be faster than about 2x compiled languages".

The idea that an absurdly dynamic language like JS, where all objects are arbitrary property bags with runtime-mutable prototype chains, would execute within a 2x budget of raw native performance was treated as a matter-of-fact impossibility.
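
To illustrate with a toy snippet (nothing engine-specific):

    // Everything the engine might want to assume can change at runtime:
    const obj = { x: 1 };
    obj.y = 2;                          // properties added at any time
    delete obj.x;                       // ...or removed
    Object.setPrototypeOf(obj, {        // even the prototype chain is mutable
      greet() { return "hi"; }
    });
    console.log(obj.greet(), obj.y);    // "hi" 2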

Until it wasn't. And the technology reason it ended up happening was research that was done in the 80s.

It's not surprising to me that it hasn't happened yet. This stuff is not easy to engineer and implement. Even the research isn't really there yet. Most of the modern dynamic language JIT ideas which came to the fore in the mid-2000s were directly adapting research work on Self from about two decades prior.

Dynamic runtime optimization isn't too hot in research right now, and it never was to be honest. Most of the language theory folks tend to lean more in the type theory direction.

The industry attention too has shifted away. Browsers were cutting edge a while back and there was a lot of investment in core research tech associated with that, but that's shifting more to the AI space now.

Overall the market value prop and the landscape for it just doesn't quite exist yet. Hard things are hard.


You nailed it -- the tech enabling JS to match native speed was Self research from the 80s, adapted two decades later. Let me fill in some specifics from people whose papers I highly recommend, and who I've asked questions of and had interesting discussions with!

Vanessa Freudenberg [1], Craig Latta [2], Dave Ungar [3], Dan Ingalls, and Alan Kay had some great historical and fresh insights. Vanessa passed recently -- here's a thread where we discussed these exact issues:

https://news.ycombinator.com/item?id=40917424

Vanessa had this exactly right. I asked her what she thought of using WASM with its new GC support for her SqueakJS [1] Smalltalk VM.

Everyone keeps asking why we don't just target WebAssembly instead of JavaScript. Vanessa's answer -- backed by real systems, not thought experiments -- was: why would you throw away the best dynamic runtime ever built?

To understand why, you need to know where V8 came from -- and it's not where JavaScript came from.

David Ungar and Randall B. Smith created Self [3] in 1986. Self was radical, but the radicalism was in service of simplicity: no classes, just objects with slots. Objects delegate to parent objects -- multiple parents, dynamically added and removed at runtime. That's it.

The Self team -- Ungar, Craig Chambers, Urs Hoelzle, Lars Bak -- invented most of what makes dynamic languages fast: maps (hidden classes), polymorphic inline caches, adaptive optimization, dynamic deoptimization [4], on-stack replacement. Hoelzle's 1992 deoptimization paper blew my mind -- they delivered simplicity AND performance AND debugging.
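
To make the hidden-class idea concrete, here's a toy JS snippet (illustrative only; exact heuristics vary by engine):

    // Objects built with the same property order share one hidden class,
    // so a call site that only ever sees that shape stays monomorphic:
    function makePoint(x, y) { return { x, y }; }

    function dist(p) { return Math.sqrt(p.x * p.x + p.y * p.y); }

    dist(makePoint(3, 4));   // inline cache records the {x, y} shape
    dist(makePoint(6, 8));   // same shape: cached fast path, no lookup

    // A different property order means a different hidden class, and
    // the call site's inline cache goes polymorphic:
    dist({ y: 8, x: 6 });    // still correct, just a slower lookup path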

That team built Strongtalk [5] (high-performance Smalltalk), got acquired by Sun and built HotSpot (why Java got fast), then Lars Bak went to Google and built V8 [6] (why JavaScript got fast). Same playbook: hidden classes, inline caching, tiered compilation. Self's legacy is inside every browser engine.

Brendan Eich claims JavaScript was inspired by Self. This is an exaggeration based on a deep misunderstanding that borders on insult. The whole point of Self was simplicity -- objects with slots, multiple parents, dynamic delegation, everything just another object.

JavaScript took "prototypes" and made them harder than classes: __proto__ vs .prototype (two different things that sound the same), constructor functions you must call with "new" (forget it and "this" binds wrong -- silent corruption), only one constructor per prototype, single inheritance only. And of course == -- type coercion so broken you need a separate === operator to get actual equality. Brendan has a pattern of not understanding equality.
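
A quick illustration of those pitfalls (plain JS; the missing-`new` behavior shown is the sloppy-mode one):

    function Point(x, y) { this.x = x; this.y = y; }

    const p = new Point(1, 2);
    // .prototype lives on the constructor; the instance's link is separate:
    Object.getPrototypeOf(p) === Point.prototype;  // true

    // Forget `new` (in sloppy mode) and `this` silently binds to the
    // global object -- no error, just corruption:
    const q = Point(3, 4);   // q is undefined; globalThis.x is now 3

    // And == coercion isn't even transitive:
    0 == '';    // true
    0 == '0';   // true
    '' == '0';  // false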

The ES6 "class" syntax was basically an admission that the prototype model was too confusing for anyone to use correctly. They bolted classes back on top -- but it's just syntax sugar over the same broken constructor/prototype mess underneath. Twenty years to arrive back at what Smalltalk had in 1980, except worse.

Self's simplicity was the point. JavaScript's prototype system is more complicated than classes, not less. It's prototype theater. The engines are brilliant -- Self's legacy. The language design fumbled the thing it claimed to borrow.

Vanessa Freudenberg worked for over two decades on live, self-supporting systems [9]. She contributed to Squeak EToys, Scratch, and Lively. She was co-founder of Croquet Corp and principal engineer of the Teatime client/server architecture that makes Croquet's replicated computation work. She brought Alan Kay's vision of computing into browsers and multiplayer worlds.

SqueakJS [7] was her masterpiece -- a bit-compatible Squeak/Smalltalk VM written entirely in JavaScript. Not a port, not a subset -- the real thing, running in your browser, with the image, the debugger, the inspector, live all the way down. It received the Dynamic Languages Symposium Most Notable Paper Award in 2024, ten years after publication [1].

The genius of her approach was the garbage collection integration. It amazed me how she pulled a rabbit out of a hat -- representing Squeak objects as plain JavaScript objects and cooperating with the host GC instead of fighting it. Most VM implementations end up with two garbage collectors in a knife fight over the heap. She made them cooperate through a hybrid scheme that allowed Squeak object enumeration without a dedicated object table. No dueling collectors. Just leverage the machinery you've already paid for.
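
For flavor only -- this is NOT her actual scheme, just a toy sketch of the basic idea of letting the host GC own the guest objects instead of running a second collector:

    // Hypothetical sketch: represent each guest (Smalltalk) object as a
    // plain JS object, so the host GC reclaims it when the image drops it.
    class GuestObject {
      constructor(className, slots) {
        this.guestClass = className;  // guest-level class name
        this.slots = slots;           // references to other GuestObjects
      }
    }

    // No second heap, no dueling collectors: once nothing in the guest
    // image points at `temp`, V8's GC frees it with zero VM bookkeeping.
    let temp = new GuestObject('Point', [3, 4]);
    temp = null;  // now eligible for host GC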

But it wasn't just technical cleverness -- it was philosophy. She wrote:

"I just love coding and debugging in a dynamic high-level language. The only thing we could potentially gain from WASM is speed, but we would lose a lot in readability, flexibility, and to be honest, fun."

"I'd much rather make the SqueakJS JIT produce code that the JavaScript JIT can optimize well. That would potentially give us more speed than even WASM."

Her guiding principle: do as little as necessary to leverage the enormous engineering achievements in modern JS runtimes [8]. Structure your generated code so the host JIT can optimize it. Don't fight the platform -- ride it.

She was clear-eyed about WASM: yes, it helps for tight inner loops like BitBlt. But for the VM as a whole? You gain some speed and lose readability, flexibility, debuggability, and joy. Bad trade.

This wasn't conservatism. It was confidence.

Vanessa understood that JS-the-engine isn't the enemy -- it's the substrate. Work with it instead of against it, and you can go faster than "native" while keeping the system alive and humane. Keep the debugger working. Keep the image snapshotable. Keep programming joyful. Vanessa knew that, and proved it!

[1] Freudenberg et al. SqueakJS paper (DLS 2014, Most Notable Paper Award 2024). https://freudenbergs.de/vanessa/publications/Freudenberg-201...

[2] Craig Latta, Caffeine. Smalltalk livecoding in the browser. https://thiscontext.com/

[3] Self programming language. Prototype-based OO with multiple inheritance. https://selflanguage.org/

[4] Hoelzle, Chambers & Ungar. Debugging Optimized Code with Dynamic Deoptimization (1992). https://bibliography.selflanguage.org/dynamic-deoptimization...

[5] Strongtalk. High-performance Smalltalk with optional types. http://strongtalk.org/

[6] Lars Bak. Architect of Self VM, Strongtalk, HotSpot, V8. https://en.wikipedia.org/wiki/Lars_Bak_(computer_programmer)

[7] SqueakJS. Bit-compatible Squeak/Smalltalk VM in pure JavaScript. https://squeak.js.org/

[8] SqueakJS JIT design notes. Leveraging the host JS JIT. https://squeak.js.org/docs/jit.md.html

[9] Vanessa Freudenberg. Profile and contributions. https://conf.researchr.org/profile/vanessafreudenberg


Only if it doesn't make use of dynamic linking or reflection, and is written to take advantage of value types.

AOT compilers without PGO data usually tend to perform worse when those conditions aren't met.

Which is why the best of both worlds is using JIT caches that survive execution runs.


Yeah, I've heard this my whole career, and while it sounds great, it's been long enough that we should be able to list some major examples by now.

What are the real world chances that a) one's compiled code benefits strongly from runtime data flow analysis AND b) no one did that analysis at the compilation stage?

Some sort of crazy off-label use is the only situation I think qualifies, and that's not enough.


Compiled Lua vs LuaJIT is a major example imho, but maybe it's not especially pertinent given the looseness of the Lua language. I do think it demonstrates that having a tighter type system at runtime than at compile time (which can in turn yield real performance benefits) is a sound concept, however.

The major Javascript engines already have the concept of a type system that applies at runtime. Their JITs will learn the 'shapes' of objects that commonly go through hot-path functions and will JIT against those with appropriate bailout paths to slower dynamic implementations in case a value with an unexpected 'shape' ends up being used instead.

There's a lot of lore you pick up with Javascript when you start getting into serious optimization with it; and one of the first things you learn in that area is to avoid changing the shapes of your objects because it invalidates JIT assumptions and results in your code running slower -- even though it's 100% valid Javascript.
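
For example (a toy sketch; the exact shape heuristics are engine-specific):

    // Initialize every property up front, in the same order, and never
    // add or delete properties later -- the JIT's shape assumptions hold:
    function makeParticle(x, y) {
      return { x, y, vx: 0, vy: 0 };   // one shape for every particle
    }

    // Valid JS, but a JIT hazard: the two branches produce two shapes,
    // so hot functions over these objects go polymorphic and slow down:
    function makeParticleBad(x, y, moving) {
      const p = { x, y };
      if (moving) { p.vx = 1; p.vy = 1; }  // shape transition at runtime
      return p;
    }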


Totally agree on JS, but it doesn't have the same easy same-language comparison that you get from compiled Lua vs LuaJIT. I suppose you could pre-compile JavaScript to a binary with e.g. QuickJS, but I don't think that's as apples-to-apples a comparison as compiled Lua to LuaJIT.

Any optimizations discovered at runtime by a JIT can also be applied to precompiled code. The precompiled code is then not spending runtime cycles looking for patterns, or only doing so in the minimally necessary way. So for projects which are maximally sensitive to performance, native will always be capable of outperforming JIT.

It's then just a matter of how your team values runtime performance vs other considerations such as workflow, binary portability, etc. Virtually all projects have an acceptable range of these competing values, which is where JIT shines, in giving you almost all of the performance with much better dev economics.


I think you can capture that constraint as "anything that requires finely deterministic high performance is out of reach of JIT-compiled outputs".

Obviously JITting means you'll have a compiler executing sometimes along with the program, which implies a runtime by construction, and some notion of warmup to get to a steady state.

Where I think there's probably untapped opportunity is in identifying these meta-stable situations in program execution. My expectation is that there are execution "modes" that cluster together more finely than static typing would allow you to infer. This would apply to runtimes like wasm too - where the modes of execution would be characterized by the actual clusters of numeric values flowing to different code locations and influencing different code-paths to pick different control flows.

You're right that, on the balance of things, trying to, say, allocate registers at runtime will necessarily allow for less optimization scope than doing it ahead of time.

But, if you can be clever enough to identify, at runtime, preferred code-paths with higher resolution than what (generic) PGO allows (because now you can respond to temporal changes in those code-path profiles), then you can actually eliminate entire codepaths from the compiler's consideration. That tends to greatly affect the register pressure (for the better).

It might be interesting just to profile some wasm executions of common programs, to see if there are transient clusterings of control flow paths that manifest during execution. It'd be a fun exercise...


Why? My only guess is that the instructions don't match x86 instructions well (way too few Wasm instructions) and the runtime doesn't have enough time to compile them to x86 instructions as well as, say, GCC could.

To be fair, x86 instructions don't match internal x86 processor architecture either.

How don't they? Most x86 instructions map to just one or two uops as you can see at https://uops.info
