That expectation is off by nearly a decade. In terms of performance, WASM games are closer to 2000 than 2010. Here's an open-source indie game from 2003 that I ported to WASM [1]. It struggles to hit 60 FPS on devices released a good 15 years after the game itself. Browsers still sometimes struggle to animate simple lines at 60 FPS to this day, because they're massive platforms with thousands of moving parts, leaving no room for big, complex apps on top of them.
Glancing at the project, one area where things could go terribly wrong is the translation layer from GL 1.x to WebGL. For instance, 'tricks' that are common on native GL implementations, like buffer orphaning, are actually harmful on WebGL and may cause stalls, which would produce exactly the issues you describe (e.g. struggling to hit a consistent frame rate despite very low CPU and GPU load).
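To illustrate what that 'trick' looks like, here's a minimal sketch in WebGL terms (the function and names are just for illustration, not code from the project). On many native drivers, re-specifying a buffer with no data hints the driver to hand out fresh storage while the GPU keeps using the old one; behind WebGL's validation layer the same pattern may turn into a per-frame allocation or synchronization point.

    // Sketch of the classic "buffer orphaning" pattern via the WebGL API.
    // On native GL this often avoids a CPU/GPU sync when streaming vertex data;
    // behind WebGL's validation layer (and ANGLE's D3D backend on Windows) it
    // can instead stall or reallocate every frame.
    function updateDynamicVertices(gl: WebGLRenderingContext,
                                   buffer: WebGLBuffer,
                                   vertices: Float32Array): void {
      gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
      // "Orphan" the old storage by allocating a new data store of the same size...
      gl.bufferData(gl.ARRAY_BUFFER, vertices.byteLength, gl.STREAM_DRAW);
      // ...then upload this frame's data into the fresh store.
      gl.bufferSubData(gl.ARRAY_BUFFER, 0, vertices);
    }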
Even though WebGL and WebGL2 are implementations of the GLES2 and GLES3 APIs, you usually can't just take a non-trivial GLES2/3 code base and expect it to work without hiccups on WebGL; what's going on under the hood is just too different.
A couple of years back I also wrote a pure WebGL renderer for Neverball levels [1]. No physics, no translation layer, just the 3D renderer part in direct WebGL. It also has a surprisingly low performance ceiling. I'd say I managed even worse than what the gl4es translation layer does for Neverball. By far the biggest performance boosts came from instanced arrays and vertex array objects - basically just reducing the number of calls into WebGL. It seems to me that WebGL has a lot of overhead, with or without a translation layer.
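For the curious, this is roughly what those two boosts look like in WebGL2 (a minimal sketch; the VAO setup and per-instance attribute layout are assumed to exist elsewhere, and the names are just illustrative):

    // One VAO bind replaces a pile of gl.bindBuffer/gl.vertexAttribPointer calls,
    // and one instanced draw replaces one draw call per object.
    function drawLevelGeometry(gl: WebGL2RenderingContext,
                               vao: WebGLVertexArrayObject,
                               vertexCount: number,
                               instanceCount: number): void {
      gl.bindVertexArray(vao);
      // Per-instance data (e.g. model matrices) would come from attributes
      // configured with gl.vertexAttribDivisor(loc, 1) when the VAO was built.
      gl.drawArraysInstanced(gl.TRIANGLES, 0, vertexCount, instanceCount);
      gl.bindVertexArray(null);
    }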
I did some GL benchmarking for simple draw calls (on desktop) a couple of years ago, and while it's true that WebGL came out lowest, the differences between native drivers were much bigger. I basically checked how many trivial draw calls (16-byte uniform update + draw call) were possible on Windows before the frame rate dropped below 60 fps: for WebGL in Chrome this was around 5k, Intel native was around 13k, and NVIDIA topped out at around 150k (not all that surprising, since NVIDIA has traditionally had the best GL drivers).
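The inner loop of that kind of micro-benchmark looks roughly like this (sketch only; program setup and the uniform location are assumed, and colorLocation is a made-up name):

    // One vec4 uniform update (16 bytes) plus one trivial draw per iteration;
    // callCount is raised until the frame rate drops below 60 fps.
    function drawTrivialCalls(gl: WebGLRenderingContext,
                              colorLocation: WebGLUniformLocation,
                              callCount: number): void {
      for (let i = 0; i < callCount; i++) {
        gl.uniform4f(colorLocation, i / callCount, 0.5, 0.5, 1.0);
        gl.drawArrays(gl.TRIANGLES, 0, 3);
      }
    }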
It is definitely true though that you need to employ old-school batching tricks to get any sort of performance out of WebGL. That's also not surprising, because WebGL is a GL in name only; internally it works entirely differently (for instance, on Windows, WebGL implementations are backed by D3D).
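To make "batching" concrete, a minimal sketch (the quad layout and names are illustrative): instead of one draw call per quad, append everything for the frame into a single vertex buffer and submit one call.

    interface Quad { x: number; y: number; w: number; h: number; }

    // Build one vertex array for all quads (two triangles each, xy only)
    // and issue a single draw call instead of quads.length calls.
    function drawBatched(gl: WebGLRenderingContext,
                         buffer: WebGLBuffer,
                         quads: Quad[]): void {
      const verts = new Float32Array(quads.length * 12);
      quads.forEach((q, i) => {
        const x0 = q.x, y0 = q.y, x1 = q.x + q.w, y1 = q.y + q.h;
        verts.set([x0, y0, x1, y0, x1, y1,   x0, y0, x1, y1, x0, y1], i * 12);
      });
      gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
      gl.bufferData(gl.ARRAY_BUFFER, verts, gl.STREAM_DRAW);
      gl.drawArrays(gl.TRIANGLES, 0, quads.length * 6);
    }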
Let's be clear about what Flash was, though. It was a native binary that would paint to a region owned by the browser. So yes, it was performant, because basically the only thing it did differently from any other native application was how it was hosted by the browser.
That's also why it was such a security nightmare. It was a complete escape from the browser sandbox.
It was a great development experience, destroyed with pitchforks and torches by the folks who, a decade later, have still failed to deliver a sound alternative.
Hence the gaming industry's focus on streaming instead.
Flash was destroyed by Apple, which had no interest in improving the web either. The iOS platform turned out to be a pretty good replacement for Flash though.
Sure, but what's the point if it's not supported by Safari? Adobe AIR was at best a porting aid to get your old Flash code wrapped in an iOS app, but the writing was on the wall, and nobody in their right mind would start new Flash projects.
PNaCl was a joke though; it suffered from much longer client-side compilation times than either asm.js or WASM ever had, and performance was at best on par with asm.js.
(NaCl was better in those areas, but required stamping out one binary for each target CPU architecture.)
There's definitely something strange going on here. I'm getting about 3x the CPU load on the WASM version compared to native. Still low enough for a solid 144 fps on my machine, but there shouldn't be this much overhead.
The number of calls needs to be lower for a WebGL application; you have to use as much instancing as possible. The security layer makes the calls themselves the slowest part of the pipeline. That's why you see amazing things on Shadertoy: when the whole shebang is in the shader, it runs smoothly.
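That Shadertoy pattern boils down to something like this on the CPU side (sketch; program and uniform setup assumed): a single draw of one fullscreen triangle per frame, so the per-call overhead of the validation layer is paid exactly once and all the interesting work happens in the fragment shader.

    function renderFrame(gl: WebGLRenderingContext,
                         program: WebGLProgram,
                         timeLocation: WebGLUniformLocation,
                         timeSeconds: number): void {
      gl.useProgram(program);
      gl.uniform1f(timeLocation, timeSeconds); // the only per-frame state change
      gl.drawArrays(gl.TRIANGLES, 0, 3);       // one oversized triangle covers the screen
    }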
Devtools, or F9 while in game. Type a famous magic word on the title screen to access all levels. Not sure how to do this on a phone, which is what I was referring to, but frame drops are pretty obvious.
I've done a quick test and it runs very smoothly on my original OnePlus One from 2014 (so 11 years after the game). This is on latest Chrome, Android 11 (LineageOS 18.1).
I still find it amazing that this game was never made with any of this in mind. The web tech available when the OnePlus One was released was nowhere near able to run this in a browser, and yet it works perfectly today!
To be fair, my reference browser is Firefox; WebGL is a fair bit slower there.
What blows my mind with this technology is the little things: porting the game to the browser gave me a half-working mobile port basically for free (I had to implement touch input handling in Neverball). On top of that, thanks to SDL2 and the web Gamepad API, I can attach a game controller to my phone and play the game in the browser with it. It just seems unreal that this combination of technology just works.
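For anyone curious, the browser side of that is the Gamepad API, which SDL2's Emscripten port ultimately polls. A minimal sketch of reading it directly (the axis/button indices are illustrative and depend on the controller):

    function pollGamepad(): { steer: number; jump: boolean } | null {
      const pad = navigator.getGamepads()[0];
      if (!pad) return null;                          // no controller attached
      return {
        steer: pad.axes[0] ?? 0,                      // left stick, horizontal axis
        jump: pad.buttons[0]?.pressed ?? false,       // typically the "A" button
      };
    }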
[1] https://neverball.github.io/