
Just make sure not to get caught in the pitfall that is maximum render speed, which can lead to missing out on efficiency during slow and partial rendering.

Missing damage tracking, always painting everything (even when the window is a full 4k monitor), etc. kills performance and input latency when dealing with realistic redraw workloads like text editing, blinking cursors and progress bars. Much too often, terminals worry only about the performance of `cat /dev/urandom`...



> kills performance

And battery.

I gave up on Alacritty because it always used my MacBook's dedicated graphics card, and there was no reasonable way to make it use the integrated one because that was considered “low performance”.


- Ghostty does vsync by default and supports variable refresh rates (DisplayLink) - see the sketch below. If you're on battery and macOS wants to slow Ghostty down, it can and we respect it.

- Ghostty picks your integrated GPU over dedicated/external

- Ghostty sets non-focused rendering threads as QoS background to go onto E-cores

- Ghostty slows down rendering significantly if the window is obscured completely (not visible)

No idea if Alacritty does this, I'm not commenting about that. They might! I'm just talking from the Ghostty side.
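
For the curious, a minimal sketch of the vsync point from the first bullet, assuming the "DisplayLink" mentioned refers to macOS CoreVideo's CVDisplayLink C API; this is illustrative, not Ghostty's actual code:

    /* Let the display (including variable refresh rates) pace frames
       instead of a free-running render loop.
       Build with: cc vsync.c -framework CoreVideo */
    #include <CoreVideo/CoreVideo.h>
    #include <unistd.h>

    static CVReturn on_vsync(CVDisplayLinkRef link, const CVTimeStamp *now,
                             const CVTimeStamp *out, CVOptionFlags flags_in,
                             CVOptionFlags *flags_out, void *ctx) {
        /* Wake the render thread here; draw only if something is dirty. */
        return kCVReturnSuccess;
    }

    int main(void) {
        CVDisplayLinkRef link;
        CVDisplayLinkCreateWithActiveCGDisplays(&link);
        CVDisplayLinkSetOutputCallback(link, on_vsync, NULL);
        CVDisplayLinkStart(link);   /* callback fires once per display refresh */
        sleep(2);                   /* let a few frames tick for the demo */
        CVDisplayLinkStop(link);
        CVDisplayLinkRelease(link);
        return 0;
    }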


That's a great approach.

Not sure on the current state of Alacritty, but a few years back the suggested solution for users interested in battery performance was to switch to a different terminal emulator: https://github.com/alacritty/alacritty/issues/3473#issuecomm...


I, a person who doesn't care about battery performance one iota (because my computer has no battery), love this answer and approach. Not all software is for everyone, and authors drawing a line in the sand like that works out better for everyone in the long term, instead of software that kind of works OK for everything.


In some cases yes. In this case, in my opinion it can be strictly wrong.

The GPU requirements of a terminal are _minuscule_ even under heavy load. We're not building AAA games here, we're building a thing that draws a text grid. There is no integrated GPU on the planet that wouldn't be able to keep a terminal going at an associated monitor's refresh rate.

From a technical standpoint, there is zero downside whatsoever to always using the integrated GPU (the stance Ghostty takes) and plenty of upside.


Because _my_ computer has no battery. There is a plethora of computers out there with batteries that can run Linux, Windows, and macOS. These computers can, on paper, run Alacritty.

The cherry on top is that I'm a former user of a 2010 MBP that would crash when using the discrete GPU (it was _the_ reason Apple went with AMD later on). And some apps insisted on using it, even when I disabled it.

I like Rust applications but I don't like this response. The dev sounds worn out, whereas the dev of Ghostty seems to be a pleasure to deal with.


More than happy for software authors to draw a line in the sand - I’ve done that myself too.

I just find myself on the other side of the line for Alacritty.


This is the Alacritty answer to a lot of queries. I took the advice, eventually.


Yep. That comment was when I stopped using Alacritty.


> - Ghostty slows down rendering significantly if the window is obscured completely (not visible)

About this. For whatever reason, I often end up with foreground windows (e.g. Chrome) covering the background window entirely, except for a sliver a few pixels wide.

Would Ghostty handle this case? I don't believe there's any point in full-speed rendering if less than a single line of text is shown, but the window isn't technically obscured completely.


We rely on the OS to tell us when we're obscured, and macOS will only tell us if the window is fully obscured (1 pixel showing is not obscured).


> - Ghostty sets non-focused rendering threads as QoS background to go onto E-cores

Assuming you're referring to Apple Silicon chips, how does Ghostty explicitly pin a thread to an E-core? IIRC there isn't an explicit way to do it, but I may be misremembering.


The QoS class influences which cores threads are placed on: https://developer.apple.com/documentation/apple-silicon/tuni....
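
A minimal sketch of what that looks like with the pthread QoS API, assuming macOS; this is illustrative, not Ghostty's actual code:

    /* Tag the calling thread as background QoS; on Apple Silicon the
       scheduler will then prefer E-cores for it. */
    #include <pthread.h>
    #include <pthread/qos.h>

    static void demote_current_thread(void) {
        /* second argument: relative priority offset within the class (0 = none) */
        pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
    }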


You can tell something to run on the E-cores, just not pin it to the P-cores.


Can I configure it to run on the dedicated GPU? I'm on a desktop; power consumption is not an issue.


I struggle to understand why any of the above approaches would cause any impact worth maintaining a configuration flag over.


> Missing damage tracking, always painting everything (even when the window is a full 4k monitor),

Out of curiosity, which GPU-accelerated terminal does this?


+1, no idea.

But maybe to add a little bit of context: "damage tracking" means, for example, that if there is an ongoing animation (like a spinner), only a small part of the screen will be re-rendered (with proper scissor rects, so only the relevant pixels are computed). I am not sure how much it matters in the context of a terminal emulator, but it's certainly a big issue for any non-toy GUI toolkit.

GPUs are incredibly fast parallel computers, so you will likely not observe any perf difference (unless you need order-dependent transparency which you don't), but it might improve your battery life significantly.
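
As a rough illustration of the "only relevant pixels" part above, assuming plain OpenGL (not any particular terminal's code):

    /* Restrict fragment work to the damaged rectangle before redrawing it. */
    #include <GL/gl.h>

    /* x, y, w, h: damaged region in framebuffer pixels (hypothetical values) */
    static void redraw_damaged_region(int x, int y, int w, int h) {
        glEnable(GL_SCISSOR_TEST);
        glScissor(x, y, w, h);   /* pixels outside the rect are left untouched */
        /* ... draw the spinner / changed cells here ... */
        glDisable(GL_SCISSOR_TEST);
    }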


No, damage tracking is important because it is about reporting that you only updated that spinner, which means that your display server also knows to redraw and repaint only that area, and in turn that your GPU knows during scanout that only that area changed.

Without it, even if you only redrew your spinner, your display server ends up having to redraw what might be a full-screen 4K window, as well as every intersecting window above and below it, until an opaque surface is hit to stop blending.
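
On Wayland, that reporting is a couple of calls. A sketch, assuming `surface` is an already-configured wl_surface for the terminal window:

    #include <wayland-client.h>

    /* Tell the compositor only this buffer-space rect changed; the damage
       takes effect on the next commit. */
    static void report_damage(struct wl_surface *surface,
                              int32_t x, int32_t y, int32_t w, int32_t h) {
        wl_surface_damage_buffer(surface, x, y, w, h);
        wl_surface_commit(surface);
    }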


Well, it sounds like Ghostty is like all the other major GPU terminal emulators (unless you know of a counterexample) and does a full redraw, though it appears to have some optimizations around how often that occurs.

The power issue might be true in some cases, but as even foot’s own benchmarks against Alacritty demonstrate, it’s hyperbolic to say it “kills performance”.


We do a full redraw but do damage tracking ("dirty tracking" for us) on the cell state so we only rebuild the GPU state that changed. The CPU time to rebuild a frame is way more expensive than the mostly non-existent GPU time to render a frame since our render pipeline is so cheap.

As I said in another thread, this ain't a AAA game. Shading a text grid is basically free on the GPU lol.
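
A generic sketch of that kind of dirty tracking over a cell grid (not Ghostty's actual data structures):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct grid {
        size_t rows, cols;
        uint32_t *cells;   /* rows * cols codepoints/attributes (simplified) */
        bool *row_dirty;   /* one flag per row */
    };

    static void set_cell(struct grid *g, size_t row, size_t col, uint32_t v) {
        if (g->cells[row * g->cols + col] != v) {
            g->cells[row * g->cols + col] = v;
            g->row_dirty[row] = true;        /* only this row needs a rebuild */
        }
    }

    static void rebuild_gpu_state(struct grid *g) {
        for (size_t r = 0; r < g->rows; r++) {
            if (!g->row_dirty[r])
                continue;                    /* skip untouched rows entirely */
            /* ... regenerate glyph quads/uniforms for row r ... */
            g->row_dirty[r] = false;
        }
        /* The draw itself can still cover the whole grid; only the
           CPU-side rebuild is limited to the dirty rows. */
    }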


It's actually not at all free, even if you're barely using the shading capacity. The issue is that you keep the power-hungry shader units on, whereas when truly idle their power is cut. Battery life is all about letting hardware turn off entirely.

Also, if you do damage tracking, make sure to report it to the display server so they can avoid doing more expensive work blending several sources together, and in case of certain GPUs and certain scanout modes, also more efficient scanout. Depending on your choice of APIs, that would be something like eglSwapBuffersWithDamage, wl_surface_damage_buffer, and so forth.
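
For the EGL side, a sketch using the EGL_KHR_swap_buffers_with_damage extension (availability has to be checked at runtime; `dpy` and `surface` are a hypothetical existing EGL setup):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    static void present_with_damage(EGLDisplay dpy, EGLSurface surface,
                                    EGLint x, EGLint y, EGLint w, EGLint h) {
        static PFNEGLSWAPBUFFERSWITHDAMAGEKHRPROC swap_damage = NULL;
        if (!swap_damage)
            swap_damage = (PFNEGLSWAPBUFFERSWITHDAMAGEKHRPROC)
                eglGetProcAddress("eglSwapBuffersWithDamageKHR");

        EGLint rect[4] = { x, y, w, h };     /* one damage rectangle */
        if (swap_damage)
            swap_damage(dpy, surface, rect, 1);
        else
            eglSwapBuffers(dpy, surface);    /* fall back to full swaps */
    }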


> The issue is that you keep the power-hungry shader units on, whereas when truly idle their power is cut.

Even in the most perverse scenario of a single-cell update, the load for a terminal is still bursty enough that the GPU does enter some power-saving states. Running intel_gpu_top in kitty with a 100ms update is at least suggestive: it never drops below 90% RC6 (even at 50ms, a completely uselessly fast update rate, we're still in the high 80s). If you're updating faster than 100ms legitimately, it's probably video or an animation that is updating a large percentage of the display area anyway. The overall time my terminal is doing some animation while on battery is low enough that in practice it just doesn't matter.

https://en.wikipedia.org/wiki/Amdahl%27s_law

The problem you're up against is that, maybe, if this were optimized, most people would get 2 or 3 (or even 10) more minutes on a 12-hour battery life or something. No one really cares. Maybe they should, but they don't. And there's plenty of other suck in their power budget.

You make it sound like a binary power saving scenario, but it tends to be more nuanced in practice.

Most people run their terminal opaque, most display systems optimize this common case.

> Also, if you do damage tracking, make sure to report it to the display server

I'm not unsympathetic to your point of view. But I am skeptical that the power savings end up being much of a big deal in practice for most people and most use cases (even accepting there may be some edge cases that annoy some). I am interested in this topic, but I am still awaiting an example of a GPU-accelerated terminal emulator that works this way, to even make a real-world comparison.
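
For reference on the opaque point above: on Wayland that common-case optimization typically relies on the client declaring its opaque region (or using an opaque buffer format). A sketch, assuming an existing `compositor` and `surface`:

    #include <wayland-client.h>

    static void declare_opaque(struct wl_compositor *compositor,
                               struct wl_surface *surface,
                               int32_t width, int32_t height) {
        struct wl_region *region = wl_compositor_create_region(compositor);
        wl_region_add(region, 0, 0, width, height);  /* whole surface is opaque */
        wl_surface_set_opaque_region(surface, region);
        wl_region_destroy(region);
        wl_surface_commit(surface);
    }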


It is very nuanced, but it's important to realize how small the power budget is and how just tens of milliwatts here and there make a huge difference.

To get over 12 hours of battery life out of a 60 Wh battery - which isn't impressive nowadays with laptops rocking 20+ hours - you need to stay below 5 watts of battery draw on average, and considering that the machine will likely do some actual computation occasionally, you'll need to idle closer to 2-3 watts at the battery including the monitor and any conversion losses.

The really big gains in battery life come from cutting tens to hundreds of mW off things at the bottom by keeping hardware off and using fixed-function hardware, e.g. avoiding rendering by doing direct scanout and using partial panel self-refresh. Execution units do not turn on and off instantly, so pinging them even briefly is bad, and the entire system needs to be aligned with the goal of only using them when necessary for them to stay off.

Efforts like libliftoff to do efficient plane offload to avoid render steps in the display server can save in the area of half a watt or more, but it's not a whole lot of help if applications don't do their part.

Bigger GPUs than your iGPU (or even just later iGPUs) will also likely see even bigger impacts, as their bigger shader units are likely much hungrier.

(As an aside, I am not a fan of kitty - they have really weird frame management and terrible recommendations on their wiki. Foot, Alacritty, or, if Ghostty turns out good, maybe even that, would be better suggestions. Note that comparing to foot can give a misleading picture, as CPU-based rendering pushes work to the display server and gives the illusion of being faster and more efficient than it really is.)


Well, I would be very interested in some of this, but it all seems theoretical and mythical. Seriously, what terminal is giving 20% better battery life (or whatever number people will notice) than kitty?

How can I observe any of these claims in practice? You’ve put down some bold claims about how things should be done, but offered no way to verify or validate them at all. Put up some real power benchmarks, or this is just crackpot.

> To get over 12 hours of battery life out of a 60 Wh battery - which isn't impressive nowadays with laptops rocking 20+ hours

I used 12 hours to be nice. The sell of getting another 10 minutes or so out of 20 hours is even more stark.

In the cases where you push a line and scroll, you're repainting most of it anyway. The cases where you're not end up being infrequent enough that optimizing them in the ways suggested makes an unnoticeable impact. Build it and they will come, maybe?

> Bigger GPUs than your iGPU (or even just later iGPUs) will also likely see even bigger impacts.

In most cases people can get by with an iGPU for battery-powered laptop use cases. If you're in a case where you must pull down more graphical power, you're often plugged in, and few care about tens of milliwatts then.

> (As an aside, I am not a fan of kitty - they have really weird frame management and terrible recommendations on their wiki. Foot, Alacritty, or, if Ghostty turns out good, maybe even that, would be better suggestions. Note that comparing to foot can give a misleading picture, as CPU-based rendering pushes work to the display server and gives the illusion of being faster and more efficient than it really is.)

Once again, what is the exemplar of an efficient terminal, then? We've already established Ghostty doesn't operate the way you think it should, so how can it turn out good?


Perhaps that came across wrong; no shade intended. What was meant was that Ghostty seems to be like all the other mature GPU-based emulators, which means there's no damage reporting to a display server or anything like that. I don't think it's quite the deal-breaker the GGP implies.


That's what I meant to say. There are 2 parts, and both need to work correctly, otherwise you're wasting power.


Right, input latency is what matters to me. I'm not seeing whether they've measured that in the docs or on GitHub.


He mentions input latency[1] as one of four aspects of being fast that were considered during development. I’m not aware of how that was tested, but I would trust that it outperforms iTerm2 in that regard.

[1] https://www.youtube.com/watch?v=cPaGkEesw20&t=3015s



