
Readers may prefer the original post by the programmer behind the rewrite: https://phoboslab.org/log/2023/08/rewriting-wipeout


Everything about that article is delightful. Author is way too modest to point out that it’s a stunning achievement.


I'm sorry, 6,000 FPS? How is that possible?


It means it's calculating 6000 frames per second, but not all of them need to make it to the screen. On a 60 Hz screen, only every 100th calculated frame would actually be displayed (if we assume a perfect 6000 fps; the video shows fluctuations between 4000 and 7000 fps).

If you just mean how it can be so high in general: old, well-optimized games run really well on faster modern hardware. I remember getting over 1,000 fps in Guild Wars 1 on a GTX 1060 when looking at an area with no monsters/NPCs.

edit: (this paragraph doesn't apply here) ~~The PS1 also doesn't have floating-point math; mostly everything is done in fixed point with integer math, which is obscenely fast compared to floating point (it could also simulate an FPU if the precision is absolutely necessary, but that's not suitable for realtime).~~

Just read further down the article that they converted it to float. I guess the game is just super optimized. I would have thought fixed point savings factored in here.
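
(For anyone curious what the fixed-point trick actually looks like, here's a minimal sketch in 16.16 format; the format choice is just for illustration, the PS1 hardware used its own layouts.)

```c
#include <stdint.h>

/* 16.16 fixed point: high 16 bits integer part, low 16 bits fraction.
   Format chosen purely for illustration. */
typedef int32_t fix16;

#define FIX_ONE (1 << 16)

static inline fix16 fix_from_int(int x)       { return (fix16)(x << 16); }
static inline fix16 fix_mul(fix16 a, fix16 b) { return (fix16)(((int64_t)a * b) >> 16); }
static inline fix16 fix_div(fix16 a, fix16 b) { return (fix16)(((int64_t)a << 16) / b); }

/* e.g. fix_mul(3 * FIX_ONE / 2, 9 * FIX_ONE / 4) == 3.375 * FIX_ONE,
   i.e. 1.5 * 2.25, done entirely in integer registers with no FPU. */
```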


> all of them don't need to make it to the screen

ah gotcha, thanks. The bandwidth would have to be terabytes/sec. Come to think of it, pretty dumb of me to think they did.


Nah, most people would probably not think of unseen frames being generated unless you're into studying your graphics performance.

It also technically does make a difference in some games. Even if the frames don't make it to the screen, if your mouse polls at 1000hz then you can get more precise input tracking by rendering up to 1000fps. It's something only really noticeable to professional gamers. But pro CSGO players are positive they can feel an improvement when running at fps more than double their monitor rate.


Also keep in mind that games don't run at a constant FPS. The "FPS" stat we talk about is the reciprocal of the average time each frame took to process and render (1/x). But we can also talk about deviation from that average, which is where the "1% low" and "0.1% low" stats come from. If your 1% or 0.1% lows dip below your monitor refresh rate, that becomes microstutter, which is way more noticeable than a drop in the average frame rate.

Keeping FPS at 2x your monitor's refresh rate will[0] provide a significant safety margin against frames that take too long to render. I will also point out that this 2x rule came into the community before we had adaptive sync, so the alternative was turning off vsync in the vain hope that the microstutter would get lost in the tearing.

For the record, "microstutter" wasn't really a common thing people talked about until fairly recently, so I'm kind of applying today's more scientific analysis of game performance to things that back then were largely superstitions that happened to work.

[0] Assuming a fast CPU and GPU drivers worth using
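
(To make the 1% / 0.1% low stats concrete, a minimal sketch, assuming you've recorded per-frame times in seconds; exact percentile conventions vary between tools.)

```c
#include <stdlib.h>

/* "1% low": sort frame times worst-first, average the worst 1%, report as FPS.
   (Some tools take the single percentile frame instead of averaging the bucket.) */
static int cmp_desc(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

double one_percent_low_fps(double *frame_times_s, size_t n) {
    qsort(frame_times_s, n, sizeof(double), cmp_desc);
    size_t worst = n >= 100 ? n / 100 : 1;   /* worst 1% of frames */
    double sum = 0.0;
    for (size_t i = 0; i < worst; i++)
        sum += frame_times_s[i];
    return 1.0 / (sum / worst);              /* average worst frame time -> FPS */
}
```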


Very true. I didn't want to go too deep in one comment, but I did try to mention that the video of the modded Wipeout fluctuates between 4000 fps and 6000 fps. Even that doesn't cover the fact that 5999 frames may have been calculated in 0.1 s while 1 frame took 0.9 s.

I actually tried increasing the input handling resolution of a game of mine by running the input polling and physics system 4x for each actual rendered frame. I thought it would be a great idea, you can get more precise input and physics without also calculating batching, draw calls, or sending anything to the GPU.

Only problem was that because the actual game logic and physics in my game was so simple, the visual frame would take about 5ms to calculate/render, and then the 4 non-visual frames would take about 0.2ms each to calculate. So even with a 1000hz mouse it was running the non-visual series of frames faster than the mouse would update, and then stalling for 5ms while the visual frame rendered.
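
(Roughly, the loop looked like this; the three calls are placeholders for the engine hooks, not my actual code.)

```c
/* Shape of the 4x sub-stepped loop described above. */
void poll_input(void); void step_physics(double dt); void render(void); /* engine hooks (assumed) */

#define SUBSTEPS 4

void frame(double frame_dt) {
    double sub_dt = frame_dt / SUBSTEPS;
    for (int i = 0; i < SUBSTEPS; i++) {
        poll_input();           /* ~0.2 ms each in my case...              */
        step_physics(sub_dt);
    }
    render();                   /* ...then ~5 ms here, so the extra input
                                   samples all cluster at the start of the
                                   frame instead of spreading across it.   */
}
```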


When trying to run one system at a higher rate than another (whether it's render, physics, or input), the principles of DSP come into play: you're running a bunch of things that can be coordinated for maximum smoothness if you define consistent sample rates and interpolate the sampled data appropriately. If you can run something very fast but don't define a set pace, you don't have a theoretically sound starting point, and that's where a lot of game timing systems (from basically every era of gaming) fall over and accidentally drop information or add timing artifacts.

So, like you, I made a system with a target physics refresh that was above render, but I did it with the goal of visually smooth operation - and therefore, I derived a frame pace from time-from-bootup, and guided the physics refresh not around the direct multiple of vsync, but "how many frames do I need to issue to keep pace with real time as defined by the system clock". Doing this naively creates disturbing rubberbanding effects, since the pace naturally oscillates, but adding a low-pass filter to reduce jitter and a "dropped frames" adjustment produced motion with a very satisfying quality.
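
(Very roughly, the pacing looked something like this; this is a simplified sketch rather than the real code, and the constants and helper names are placeholders.)

```c
/* Pace physics off the wall clock instead of off vsync, low-pass the step
   count to hide jitter, and clamp it when we fall too far behind. */
void step_physics(double dt);      /* placeholder engine hook */

#define TICK_DT     (1.0 / 240.0)  /* target physics rate (example value)   */
#define MAX_CATCHUP 8              /* the "dropped frames" adjustment       */
#define SMOOTH      0.1            /* low-pass factor                       */

static double sim_time = 0.0;      /* how far the simulation has advanced   */
static double smoothed_steps = 1.0;

void pump_physics(double now /* seconds since boot */) {
    double behind = (now - sim_time) / TICK_DT;           /* raw steps owed  */
    smoothed_steps += SMOOTH * (behind - smoothed_steps); /* filter the pace */
    int steps = (int)(smoothed_steps + 0.5);
    if (steps > MAX_CATCHUP) {     /* big hitch: drop time rather than spiral */
        steps = MAX_CATCHUP;
        sim_time = now - steps * TICK_DT;
    }
    for (int i = 0; i < steps; i++) {
        step_physics(TICK_DT);
        sim_time += TICK_DT;
    }
}
```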

I'm forgetting precisely what I did with the input, on the other hand, but I think I determined that "as fast as possible" was still an improvement because the way I was issuing frames was reducing the amount of aliasing of deadlines on the margin.

It's an area where you can definitely get pretty sophisticated. Many emulators for older systems now are emulating ahead, displaying that result, and then rolling back emulation state to create a configurable negative input latency.
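
(In rough pseudocode the run-ahead idea is something like this; the calls are placeholder names for a generic emulator core, not any particular emulator's API.)

```c
/* Run-ahead in a nutshell: emulate past the present with the freshest input,
   show that frame, then roll back so the real timeline stays correct. */
void poll_input(void); void emulate_frame(void);
void save_state(void); void load_state(void); void present_video(void);

void run_ahead(int lookahead) {
    poll_input();                      /* freshest controller state            */
    save_state();                      /* snapshot the true present            */
    for (int i = 0; i < lookahead; i++)
        emulate_frame();               /* speculative frames using that input  */
    present_video();                   /* display the "future" frame now       */
    load_state();                      /* restore the truth                    */
    emulate_frame();                   /* advance the real timeline one step   */
}
```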


Funny you mention DSP, that's a hobby of mine (for music) and I've never really connected DSP fundamentals and FPS outside of animation keyframes.

Your system sounds a lot like the current industry standard for deterministic physics engines, if the input were processed without regard to the physics or rendering speed (you just need to run the physics at a fixed tickrate and it's deterministic). Did you wait for the real time that each physics tick should occur at? Most of those engines don't actually run the physics ticks in real time: if you're not rendering them, you can process a bunch of them in a row and just simulate the clock stepping forward. For physics decoupling I normally just use Unity's built-in interpolation system, which works that way, but I was trying to get fancy here.

The issue in my case is that because it depends on external input, the physics processing would need to occur at specific real times. And unless I can know the time of the next frame in advance, that's difficult and not entirely possible (I didn't want to enforce vsync). It would have been fun to go down that rabbit hole, but at that point I decided to take the easy path and tie input polling to the render rate.

And then like 6 months later Unity released the new Input system which can be completely decoupled from any kind of framerate and just gives you realtime input timing values if you want.
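
(For reference, the fixed-tickrate decoupling usually looks like the classic accumulator pattern; this is a generic sketch, not Unity's internals.)

```c
/* Classic fixed-timestep loop: physics always advances in TICK_DT chunks,
   rendering interpolates between the previous and current physics states. */
double now_seconds(void); int running(void);                       /* placeholders */
void step_physics(double dt); void render_interpolated(double alpha);

#define TICK_DT (1.0 / 60.0)

void game_loop(void) {
    double accumulator = 0.0;
    double previous = now_seconds();
    while (running()) {
        double current = now_seconds();
        accumulator += current - previous;
        previous = current;

        while (accumulator >= TICK_DT) {  /* catch up in fixed, deterministic steps */
            step_physics(TICK_DT);
            accumulator -= TICK_DT;
        }
        render_interpolated(accumulator / TICK_DT); /* blend prev/current state */
    }
}
```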


Very much true, just wanted to add something. In the before times, games ran on pure forward renderers, which, in turn, mitigated many of the frame timing inconsistencies of the more complex pipelines of today! Some still do that, of course, with the exception that we aren't relying on fixed-function hardware anymore!


Wait, really? That's odd. I thought the whole point of a deferred rendering pipeline was to reduce inconsistency by doing all your lighting calculations in one pass on one quad. In forward rendering you have to worry about overdraw - i.e. if you have a model that's half-obscured by another, but you render it first, you still wind up drawing the whole model, including the expensive pixel shader material you attached to it[0]. With deferred rendering all your model is doing is drawing textures to various channels in the G-buffer, which is cheap.

I thought the major downside of deferred was memory bandwidth - i.e. you have to write and read the entire G-buffer with at least 11[1] channels in order to produce an RGB image. That's a cost you pay every frame so it wouldn't hurt frame time consistency.

Meanwhile in forward-land it was the case that your FPS was extremely viewport dependent. Like, I remember looking down at a floor would double FPS, looking at a large scene with a bunch of objects or people in it would tank FPS, etc.

[0] Unless you get lucky with drawing order or sorted everything from front-to-back so that you can rely on early depth testing to kill those pixel draws. Which would also ruin your frame time consistency.

[1] XYZ normal, depth, RGB diffuse, RGB specular, and some kind of 'shininess' parameter that controls the specular exponent. Most practical deferred implementations will also have either a "material ID" channel or some special-purpose channels for controlling various visual effects in the game.

This is also why Breath of the Wild has a weird column where if you stand inside of it Link stops getting toon shading.
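
(To make the channel count in [1] concrete, a rough per-pixel layout; field names are illustrative, and real engines pack these into a handful of render targets rather than a C struct.)

```c
/* Illustrative per-pixel G-buffer layout following the channel list in [1]. */
typedef struct {
    float normal[3];            /* XYZ normal                                    */
    float depth;                /* depth (often read back from the depth buffer) */
    float diffuse[3];           /* RGB albedo                                    */
    float specular[3];          /* RGB specular color                            */
    float shininess;            /* specular exponent                             */
    unsigned char material_id;  /* optional: selects shading model (toon, skin, ...) */
} GBufferTexel;                 /* written once in the geometry pass, read by lighting */
```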


That's true, and a very good counter argument. But, bear in mind, with fixed function hardware I was assuming no lights were involved.


Isn't it a waste of computing power to produce 6000 fps and only use a fraction of them?


Yes, but in practice, no. These games are usually coded as a loop that runs as fast as it can (unless capped, which the old one wasn't), using as much CPU as is available. In that case, the fps is a side effect of how long each pass through the loop takes (which is what happened here): you don't determine the fps, the fps is a result of how complicated or (in)efficient your code is. So going from 30 fps, because the code made such inefficient use of the CPU, to 6000 fps, because it now completes each loop pass that much faster, the CPU usage is actually the same.

Now if your code is so optimized that it can run at 6000 fps, at that point you can say "gee, I don't need this many updates a second, let me cap it to x frames per second." But how do you do that? The GPU is grabbing finished frames out of the buffer at its own pace, whether you are generating them at 6k/sec or just 5/sec. To cap your CPU consumption you would usually say "we need a new frame every 0.015 s to always have one ready for the GPU so that the screen updates sixty times a second, so if we finish a frame in 0.001 s instead, sleep (effectively yielding the CPU to other processes) for 0.01 s after we run through the loop." While that may work for some things, there is other stuff that needs to happen "in real time," such as refilling the audio buffer (to avoid pauses or corrupted/garbled audio), and you also can't rely on the system to actually wake you before 0.015 s even though you asked it to wake you after just 0.01 s to be extra safe.

Tl;dr: yes, once your code is running at 6k fps, capping it to reduce consumption is an option, but running at 6k fps doesn't actually increase CPU usage vs. inefficiently running at 30 fps.
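
(A minimal sketch of the capping approach, using the numbers from above; the timing calls stand in for whatever the platform provides.)

```c
/* Cap the loop by sleeping off the leftover frame time. As noted above, the OS
   may wake you late, so real limiters often sleep a bit less and spin the rest. */
double now_seconds(void); void sleep_seconds(double s);   /* platform placeholders */
int running(void); void update_and_render(void);

#define TARGET_DT 0.015   /* aim slightly under 1/60 s for safety */

void run_capped(void) {
    while (running()) {
        double start = now_seconds();
        update_and_render();                     /* maybe 0.001 s when uncapped  */
        double elapsed = now_seconds() - start;
        if (elapsed < TARGET_DT)
            sleep_seconds(TARGET_DT - elapsed);  /* yield CPU to other processes */
    }
}
```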


You can get a callback when a frame is going to get drawn and only render then. That way you don't render needless frames.

Your logic loop that controls the game state can be set to an optimal tick rate so it's not just maxing out a core.

The audio buffers I've worked with have also supported callbacks so they can remain optimally filled.


It's possible that going far above "6000fps" might be necessary someday for holographic/3D displays that need to render the scene from hundreds or thousands of different viewpoints for one single frame.

Say you need to render a scene from 1000 different angles for a 3D display, just to get to a 60hz refresh rate you would need to render the scene 60,000 times.


This is the game update loop, which excludes rendering (for some reason people still call it "FPS," which is confusing).

I'm not aware of any displays like that, but if there were, you could optimize by eye tracking each viewer and only rendering the direction they're seeing it from. The "New 3DS" (note: different from the regular 3DS) did this.


Reminds me of the Turbo button.


That is so absolutely false. Any game you run, if you don't cap the fps, uses 100% of your GPU and potentially your CPU. As soon as you cap the framerate to 60 fps it starts behaving normally.


I said as much, though. Read again. I had a caveat about when you start limiting your frame rate.


"rendered" and "sent to the screen" are very different.

there are nuances about all of this which make both of you correct in different situations.


I think this was just a case of optimized code running really fast, but sometimes the game will decouple the physics simulation from the graphics. I have seen this done in both directions: racing games where you want the physics to run faster than the graphics for nice smooth car control, and building games where you want the physics to run slower than the graphics, mainly because you have so much physics you cannot calculate it all every frame.


You need about the same impossible accuracy to play this. I'm only half joking; if you want to improve you will need to somehow respond more accurately. Seeing someone play it well is mind-blowing if you know how hard it gets.


It's not intended to run at 6000fps. That's just how quickly it will run without any form of limiter. You can use your GPU settings to limit the framerate, or many games have a built in frame-limiter.


They're talking about the speed of the internal engine, not the display. So the display could still only be showing 60 frames to the user each second (or 120, or anything) but the internal engine is running at 6000fps.

Plenty (maybe nearly all?) of games do this because modern engines decouple the engine speed from the display speed. In older systems where you knew the engine was only going to run for a specific game on specific hardware (e.g. a SNES or GameCube or PlayStation), and you knew you were always going to be targeting 30fps, no more no less, you could pretty safely assume the game would _always_ run at 30fps and could use a "frame" as a unit of time. So if you want some in-game action like a melee attack to take 1 second, you could just count 30 frames and you would know it was 1 second long. But if somehow this game was later run at 60fps, that same attack would now only take .5 seconds, since there were twice as many frames in a second now.

So if you took a game like this meant for 30fps and ran it at 60, everything would just run twice as fast. You wouldn't actually be able to play the original game at anything higher than the original frame rate.

What they're saying here is that they decoupled the two, where originally they were coupled. So now the game can run at high fps and feel smoother than the original lower fps rate, but the gameplay is still at the original intended speed.
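
(A toy comparison of the two timing styles, purely illustrative.)

```c
/* Frame-counted timing: only correct if the game really runs at 30 fps. */
int attack_frames_left = 30;          /* "1 second" -- but only at 30 fps      */
void update_frame_counted(void) {
    if (attack_frames_left > 0) attack_frames_left--;
}

/* Delta-time timing: correct at any frame rate, which is what decoupling
   the simulation from the display rate buys you. */
double attack_time_left = 1.0;        /* 1 second at any fps                   */
void update_with_dt(double dt) {
    if (attack_time_left > 0.0) attack_time_left -= dt;
}
```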


Interesting twist: Wipeout XL/2097 for PC was a terrifically bad port, and the game speed was proportional to how fast your video card could draw the 3D scene, just as you describe.

There was a patch at some point to fix this, but honestly it's just easier to load the game up in a PSX emulator these days.


The code was probably hyper-optimized for decade+ old hardware.


No, it wasn't hyper-optimized; it just doesn't draw many triangles or use many light sources. There is simply a lot less for the renderer to do than in a modern game.



