Fixing stutters in Papers Please on Linux (jhm.dev)
726 points by rdpintqogeogsaa on Jan 2, 2022 | 190 comments




What the proposed patch does is defer a specific slow operation (freeing some memory) to an asynchronous context so that close() doesn’t block on it.

The proposed patch isn’t a comprehensive fix; it admits there are still other sources of relatively high close() latency.

So that got me thinking: there is no way to fix this “bug,” because there is no specification for how long close() should take to complete. As far as user-land is promised anything, close() is not an instantaneous operation. close() is a blocking operation! Even worse, it’s an IO operation.

So now I think the bug is in the application. If you want to avoid the latency of close() you should do it asynchronously in another thread. This is similar to the rule that you should not do blocking IO on the main thread in an event-loop based application.
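
As a rough illustration of what I mean (a minimal sketch, not the game's actual code; deferred_close() and the worker are hypothetical names), the main loop hands the fd off and never waits:

    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Hypothetical helper: hand the fd to a detached worker so the main
       loop never waits on close(). close() errors are deliberately ignored. */
    static void *close_worker(void *arg) {
        close((int)(intptr_t)arg);
        return NULL;
    }

    static void deferred_close(int fd) {
        pthread_t t;
        if (pthread_create(&t, NULL, close_worker, (void *)(intptr_t)fd) == 0)
            pthread_detach(t);
        else
            close(fd);  /* worker couldn't be created: close inline */
    }

In practice you'd push fds onto a queue serviced by one long-lived worker rather than spawning a thread per close, but the point is the same: the main loop never pays for close().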


Patch author here.

It is important not to conflate POSIX requirements with expected behavior, especially for device files, which require very specific knowledge of their implementation to use (DRM ioctls and resources, anyone?).

You might think that, since a well-behaved game should not be opening/closing evdev fds during gameplay at all, this is clearly just an application bug. However, games are not the main user of evdev devices - your display server is! This bug causes input device closure during session switching (e.g. VT switching) to take abnormally long - on the machine I discovered the bug on, it ends up adding over a second to the session switch time, significantly impacting responsiveness.

This is absolutely a kernel bug. I did not push the patch further as I had other priorities, and testing this kind of patch is quite time-consuming when it only reproduces in a measurable way on a single physical machine. Other machines end up with a much shorter synchronize_rcu wait and often have many fewer input devices, explaining why the issue was not discovered/fixed earlier.

call_rcu is intended to be used wherever you do not want the writer to block, while alternative fixes involve synchronize_rcu_expedited (very fast but expensive), identifying whether the long synchronize_rcu wait is itself a bug that could be fixed (might be correct), or possibly refactoring evdev (which is quite a simple device file).
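
For readers unfamiliar with the API difference: synchronize_rcu() makes the caller wait out a full grace period, while call_rcu() queues a callback to run after it and returns immediately. Roughly (a generic sketch of the pattern, not the actual evdev patch):

    #include <linux/kernel.h>
    #include <linux/list.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
        struct list_head node;
        struct rcu_head rcu;
    };

    static void foo_free_rcu(struct rcu_head *head)
    {
        /* Runs once all pre-existing RCU readers are done. */
        kfree(container_of(head, struct foo, rcu));
    }

    static void foo_del(struct foo *f)
    {
        list_del_rcu(&f->node);
        /* synchronize_rcu() would block here for a full grace period;
           call_rcu() defers the free and lets the writer return now. */
        call_rcu(&f->rcu, foo_free_rcu);
    }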

As for putting things in threads, I would consider it a huge hack to move open/close. Threads are not and will never be mandatory to have great responsiveness.


> As for putting things in threads, I would consider it a huge hack to move open/close. Threads are not and will never be mandatory to have great responsiveness.

The POSIX interface was invented for batch processing: long-running, non-interactive jobs. This is why it lacks timing requirements. All well-designed interactive GUI applications do not interact with the file system on their main thread. This is especially true for game display loops. The fundamental problem here is that they are doing unbounded work on a thread that has specific timing requirements (usually 16.6ms per loop). As I’ve said elsewhere, this bug will still manifest itself no matter how fast you make close(); it just depends on how many device files are present on that particular system. It’s a poor design. Well-designed games account for every line of code run in their drawing loop.

> This is absolutely a kernel bug.

I don’t think that is proven unless the original author can chime in. It’s your best guess and opinion that the author did not intend to block on synchronize_rcu, but it’s perfectly possible they did indeed intend the code as written. synchronize_rcu is used in plenty of other critical system call paths in similar ways; not every one of those uses is a bug. I would guess you might be suffering from a bit of tunnel vision here given how the behavior was discovered.

If it is indeed the case that synchronize_rcu is taking up to 50ms, I would suspect there is a deeper issue at play on this machine. By search/replacing the call with call_rcu or similar you may just be masking the problem. RCU updates should not be taking that long.


> All well-designed interactive GUI applications do not interact with the file system on their main thread

I strongly disagree. A well-designed interactive GUI application can absolutely interact with the filesystem on its main thread without any impact to responsiveness whatsoever. You only need threads once you need more CPU time.

The POSIX interfaces provide sufficient non-blocking functionality for this to be true, and the (as per the documentation, "brief") blocking allowed by things like open/close is not an issue.

(io_uring is still a nice improvement though.)

> I don’t think that is proven unless the original author can chime in.

This argument is nonsense. Whether or not code is buggy does not depend on whether or not the author comments on the matter. This is especially true for a project as vast as the Linux kernel with its massive number of ever-changing authors.

> If it is indeed the case that synchronize_rcu is taking up to 50ms, I would suspect there is a deeper issue at play on this machine. By search/replacing the call with call_rcu or similar you may just be masking the problem. RCU updates should not be taking that long.

synchronize_rcu is designed to block for a significant amount of time, but I did not push the patch further exactly because I would like to dig deeper into the issue rather than making a textbook RCU fix.


> A well-designed interactive GUI application can absolutely interact with the filesystem on its main thread without any impact to responsiveness whatsoever. You only need threads once you need more CPU time.

The "well-designed" argument here is a bit No True Scotsman, and absolutely not true. Consider a lagging NFS mount. Or old hard drives; a disk seek could take milliseconds!

Real time computing isn't about what is normal or average, it's about the worst case. Filesystem IO can block, therefore you must assume it will.


> The "well-designed" argument here is a bit No True Scotsman, and absolutely not true.

This counter-argument can be interpreted as a mere No True Scotsman of "responsiveness", so this is not a very productive line of argument.

Should one be interested in having a discussion like this again, I would suggest strictly establishing what "responsive" means (which is a subjective experience), including defining when a "responsive" application may be "unresponsive" (swapping to disk, no CPU/GPU time, the cat ate the RAM), and avoiding terms like "well-designed" (I included it in protest of its use in the comment I responded to).

For example, failing to process input or skipping frames in gameplay would be bad, but no one would see a skipped frame in a config menu, and frames cannot even be skipped if there are no frames to be rendered.


> Should one be interested in having a discussion like this again, I would suggest strictly establishing what "responsive" means (which is a subjective experience)

This has been established for years. This is the basis of building real time systems. For example, flight control systems absolutely must be responsive, no exceptions. What does that mean? That the system is guaranteed to respond to an input within a maximum time limit. POSIX applications may generally give the appearance of being responsive but absolutely are not unless specially configured. There is no upper bound on how long any operation may take to complete. This will be apparent the minute your entire system starts to choke because of a misbehaving application. Responsive systems have a hard bound on worst-case behavior.


> A well-designed interactive GUI application can absolutely interact with the filesystem on its main thread without any impact to responsiveness whatsoever. You only need threads once you need more CPU time.

Hmm. If you call open()/read()/close() on the main thread and it causes a high latency network operation because that user happens to have their home directory on a network file system like NFS or SMB, your application will appear to hang. When you design applications you can’t just assume your users have the same setup as you.

> The POSIX interfaces provide sufficient non-blocking functionality for this to be true

POSIX file system IO is always blocking, even with O_NONBLOCK. You can use something like io_uring to do non-blocking file system IO, but that would no longer be POSIX.

> Whether or not code is buggy does not depend on whether or not the author comments on the matter.

That would depend on if you knew more about how the code is intended to work than the original author of the code. Do you presume to know more about how this code is intended to work than the original author?


> That would depend on if you knew more about how the code is intended to work than the original author of the code. Do you presume to know more about how this code is intended to work than the original author?

I am not sure if you are suggesting that only the author can know how code is supposed to work, that finding bugs requires an understanding of the code strictly superior to the author's, or that the author is infallible and intended every behavior of the current implementation.

Either way, this attitude would not have made for a healthy open source contribution environment.


> that finding bugs requires an understanding of the code strictly superior to the author's,

Evaluating whether or not something is a bug in a specific part of a system absolutely requires an understanding of the intent of the code equal to the author's. You have found undesirable application-level behavior and have attributed the cause to a specific line of code in the kernel, but it’s possible you are missing the bigger picture of how everything is intended to work. Just because latency has been tracked down to that line of code does not mean the root source of that latency is that line of code. Symptoms vs. root causes.


close() is typically a blocking operation. But when it happens in devfs, procfs, tmpfs, or some other ram only filesystem I expect it to be fast unless documented otherwise.


Especially when you are in devfs you should not assume anything at all! Close in devfs is just a function pointer which is overridden by each of the myriad device drivers that expose files in /dev. Your close() could be the final one which lets the driver perform some cleanup. It might decide to borrow your thread to do it. Maybe some device was about to be ejected/disabled but could not previously because you were holding an FD to it.

The same goes for /proc and /sys which are very similar to /dev in that they represent various entry points into the kernel.
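
To make that concrete for anyone who hasn't looked at the driver side: each character device registers its own file_operations, and the ->release hook is what your final close() ends up running, typically on the thread that closed the last reference. A bare-bones sketch (not any particular driver):

    #include <linux/fs.h>
    #include <linux/module.h>

    /* Whatever this function does -- freeing buffers, powering down
       hardware, waiting on RCU -- borrows the closing caller's thread. */
    static int mydev_release(struct inode *inode, struct file *file)
    {
        /* driver-specific cleanup goes here */
        return 0;
    }

    static const struct file_operations mydev_fops = {
        .owner   = THIS_MODULE,
        .release = mydev_release,
        /* .open, .read, .unlocked_ioctl, ... */
    };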


It can be slow every time if your AV software hooks close() to do an expensive scan operation, as Windows Defender does.


> I expect it to be fast unless documented otherwise.

Logically you should expect it to block indefinitely unless documented otherwise. The exception would be completing within a time bound, the rule is blocking indefinitely.


> Logically you should expect it to block indefinitely

Frankly, that’s completely insane. It should block if and only if there is actual io in flight which could produce a failure return that an application needs. Syscalls should be fast unless there is a very good reason not to be.


> It should block if and only if there is actual io in flight which could produce a failure return that an application needs.

Blocking simply means that the specification does not guarantee an upper bound on the completion time. There is no other meaningful definition. POSIX is not an RTOS therefore nearly all system calls block. The alternative is that the specification guarantees an upper bound on completion time. In that case what is an acceptable upper bound for close() to complete in? 1ms? 10ms? 100ms? Any answer diminishes the versatility of the POSIX VFS.

> Syscalls should be fast unless there is a very good reason not to be.

I think this is an instance of confusing what should be with what is. We’ve been through this before with O_PONIES. The reality is that system calls aren’t “fast” and they can’t portably or dynamically be guaranteed to be fast. So far the only exception to this is gettimeofday() and friends.

Robust systems aren’t built on undocumented assumptions. Again, POSIX is not an RTOS. Anything you build that assumes a deterministic upper bound on a blocking system call's execution time will inevitably break, as evidenced by the OP.


> The reality is that system calls aren’t “fast” and they can’t portably or dynamically be guaranteed to be fast.

Perhaps, but the reality is also that the vast majority of games and other interactive applications routinely make blocking system calls in a tight main loop and expect these calls to take an unspecified but reasonable amount of time.

“It’s a blocking syscall so if it takes 1s to close a file, that’s technically not a bug” is correct, but is any player of “Papers, Please” going to be sympathetic to that explanation? Probably not; they’ll think “Linux is slow,” “Linux is buggy,” “why can’t Linux run basic applications correctly that I have no problem running on Windows or OS X?,” etc.

“Syscalls should be fast unless there is a very good reason not to be” strikes me as a wise operating principle, which weights usability and usefulness of the operating system alongside being technically correct.


> “It’s a blocking syscall so if it takes 1s to close a file, that’s technically not a bug” is correct, but is any player of “Papers, Please” going to be sympathetic to that explanation? Probably not; they’ll think “Linux is slow,” “Linux is buggy,” “why can’t Linux run basic applications correctly that I have no problem running on Windows or OS X?,” etc.

I don’t agree with this logic. Windows and macOS system calls also block. The issue of people considering Linux to be slow is not relevant to the fact that its systems calls block. The poorer quality of Linux games, and commercial Linux software in general, is more likely due to smaller market size / profit opportunity and the consequential lack of effort / investment into the Linux desktop/gaming ecosystem.

Now if your argument is that we should work around buggy applications and distribute hacked patches when the developers have abandoned them, for the sake of improving user experience, then I agree with that.

> “Syscalls should be fast unless there is a very good reason not to be” strikes me as a wise operating principle, which weights usability and usefulness of the operating system alongside being technically correct.

Linux already operates by this principle. We are examining a situation where best effort was not good enough to hide poor application design.


> Linux already operates by this principle. We are examining a situation where best effort was not good enough to hide poor application design.

Linux has this principle as a goal, but it's probably not checked often.

I would say this code fails the principle, independent of particular application problems.


> I would say this code fails the principle, independent of particular application problems.

For every system call you determine satisfies that principle, I could come up with an application-level algorithm that is broken because of it. The principle is aspirational; Linux does a best effort, as all Unix systems do, not because Linux is buggy but because it can never be 100% given the spec. The core issue here was not close() taking 100ms or whatever it took; the core issue was doing unbounded work on the main drawing thread, which has strict timing requirements.


They're both problems.

This slowness is approaching the point where even checking for joysticks on a dedicated thread would start having delay problems. And spawning a thread per file would be ridiculous and would get even more scorn if it was slow, "why are you spawning so many threads, of course that's not efficient".


> This slowness is approaching the point where even checking for joysticks on a dedicated thread would start having delay problems.

Poorly designed code will perform poorly. Well designed code won’t have delay problems.

> And spawning a thread per file would be ridiculous and would get even more scorn if it was slow,

Where in this entire thread was it suggested to spawn a thread per file? Threads are able to perform more than a single unit of work.


> Poorly designed code will perform poorly. Well designed code won’t have delay problems.

If I need to open and close 20 files every few seconds, and they all might have unpredictable latencies, even the best designed code in the world could have delay problems.

> Where in this entire thread was it suggested to spawn a thread per file?

You just implied that checking all the files on a dedicated thread is still 'poorly designed code', didn't you?

So if a dedicated thread for the whole group of files isn't enough, sounds like you need to move to a thread per file. Unless it's wrong to use close() at all, or something? You can only blame the code so much.


> Blocking simply means that the specification does not guarantee an upper bound on the completion time.

I don't think that's a commonly-accepted (or useful) definition of "blocking." By that definition, getpid(2) is blocking.

> I think this is an instance of confusing what should be with what is.

Who is doing the confusing? I said "should be." Are you saying they're fast now but should be slow? Why?

> The reality is that system calls aren’t “fast” and they can’t portably or dynamically be guaranteed to be fast.

This isn't a portable program; it's a Linux program. The problem isn't that close can't be portably guaranteed to complete in some time bound; it's that Linux is adding what is essentially an extra usleep(100000), with very high probability, to close() on its devfs synthetic filesystem.

This is entirely an own-goal; Linux has historically explicitly aimed to complete system calls quickly, when that does not break other functionality. It is a bug that can be fixed, e.g., with the proposed patch(es).

POSIX does not mandate that close blocks on anything other than removing the index from the fd table -- it's even allowed to leave associated IO in-flight and silently ignore errors. It makes little sense for a synthetic filesystem without real IO to block close so grossly.


CyberRabbi's definition of blocking is correct and what I've always seen commonly accepted.

Blocking means you don't know how long it'll take, and you want to wait for it to finish. The only safe assumption is that you cannot guarantee how long it'll take.

getpid is accurately therefore a blocking call. You don't know how long it'll take. You can profile and make best guesses, but you can never assuredly say how long it'll take.


Every operation in a non-RTOS is blocking by this definition, even local function calls that don’t enter the kernel, because the kernel may switch to another thread at any time. It’s utterly useless as a definition. Much more common is to divide system calls into ones that depend on some external actor and those that don’t. E.g., recv() on a socket, blocking on a futex held by some other process, or waiting on IO to some disk controller. getpid() is synchronous but does not block.


Blocking in that sense is usually used in relation to some event. E.g. sleep() blocks on a timer, read() blocks on IO, etc.

In the general sense, it means that the call has an indefinite run time. E.g. “this call blocks” = “this call could take an arbitrarily long amount of time”

getpid() is blocking but it likely does not block on IO (though it could as that is allowed by the spec).


If you call getpid, or even local functions, can the rest of your code (in a single thread) continue before getpid returns?

E.g. if you do this inside a function (useless code):

    int pid = getpid(); std::cout << pid + 2 << std::endl;

Will the output print even if the hypothetical call to getpid takes a second?

If the answer is the print will wait, then it's a blocking call.

If it was an async call, then it could happen concurrently or in parallel, and unless you waited, it would continue on in a non blocking fashion.

Waiting for a return == blocking. It may be quick but unless the spec specifies that it must be synchronous+non-blocking, the distinction between the two is moot.


But with such an extreme definition, can you even show me what an async non-blocking syscall would look like?

Because I'm going to point at the assembly instructions that pass the parameters, and say "an interrupt happens here, delaying it for 1 second".

Any definition of blocking that includes "int fifty() {return 50;}" strikes me as having problems.

More specifically, I'd say there's some amount of "kernel does a thing" that needs to be excusable when you're talking about whether a syscall is blocking or not, otherwise everything is blocking.

Unless we want to say that 'nonblocking' is fake on non-RTOS systems, and not even try to define the term in that context.


There are two points that I've made a couple times that are perhaps getting lost:

1. It's about blocking your logic flow, not about how the system is actually executing it or what the machine code resolves to. If a subsequent call is blocked on a previous one, then it's blocking. Spawning an async function or creating a new thread etc can be blocking, whereas what runs on it isn't (for your current thread).

2. Being blocking or not is independent of performance. A blocking function call can be near instant, it may get inlined, it may take a year to run. Similarly an async or non blocking call can also have the same time complexity. The issue is that if the spec doesn't say it returns instantly, or you don't know for sure that it does, you can't guarantee that the blocking time will be short enough to be acceptable. So while getpid or close will almost always return instantly, it's still blocking. And if the spec doesn't say it's guaranteed, then the performance acceptability in the hot path can change.

End of the day it's all just (often pedantic) semantics to let people describe the execution nature of things so devs can make the best decisions for their performance needs.


I think you replied before I added "Unless we want to say that 'nonblocking' is fake on non-RTOS systems, and not even try to define the term in that context."

Sure, the spec doesn't give a guarantee. But let's say it's impossible to give a guarantee on Linux. Is it really the best option to give up on defining 'nonblocking' entirely? Maybe we should formulate guarantees with an escape hatch for non-RTOS hazards. If we can do that, then getpid deserves one of those conditional guarantees.

And since I'm pretty sure the intent of mentioning getpid was to talk about the code, not the documentation, I think that would make it nonblocking.

> End of the day it's all just (often pedantic) semantics to let people describe the execution nature of things so devs can make the best decisions for their performance needs.

Which is why you don't want to label everything blocking. Nobody can have a useful discussion then.

And also why it's useful to talk about the execution nature of code, even when no spec exists. You don't want to get stuck on implementation details but you shouldn't ignore implementation either.

Edit:

> Spawning an async function or creating a new thread etc can be blocking, whereas what runs on it isn't (for your current thread).

There's some value in talking about functions that way, but for a syscall in particular you need a nonblocking spawn for the syscall to be nonblocking. If that's definitionally impossible, then something bad has happened to the definitions being used.


The only reason I mentioned that spawning threads/creating an async future is blocking is because you had mentioned that async would generate blocking assembly by my definition.

And I agree, it would and therefore the definition is potentially meaningless. But pedantically it is blocking (but the functions called within it aren't to the current thread).

In a colloquial, everyday sense, I'd not be this pedantic. But this is a thread specifically about that pedantry.

End of the day, if I were talking colloquially, I'd only talk about expensive blocking calls as being blocking, regardless of IO when responsiveness is important. Otherwise it doesn't matter unless it's parallelizable and there are performance gains to be had.


> And I agree, it would and therefore the definition is potentially meaningless. But pedantically it is blocking (but the functions called within it aren't to the current thread).

If I was going for maximally pedantic but still useful definitions, I'd say that a "[non-]blocking syscall" is a different concept from how you'd describe running functions synchronously or asynchronously. And to elaborate, something like: Code that runs asynchronously is non-blocking, code that runs synchronously can be either blocking or non-blocking, and a syscall always has at least some synchronous code.

I like the idea of saying a syscall is non-blocking if the spec says it returns instantly. But I would add on to that, and say that if "this is not a real-time-OS" is the only reason the spec doesn't say it returns instantly, then we should call that non-blocking too. Or "non-blocking*" with a footnote that mentions RTOS issues.

You ask about getpid() taking a second. I'd say that within the model of "put those RTOS issues aside", that doesn't happen and can't happen. Just like we usually exclude unplugging the computer from our execution model, so too we exclude "linux isn't RTOS" from our execution model. getpid can't get stuck waiting on any resources, and does only trivial computation, so it will return immediately.


> I like the idea of saying a syscall is non-blocking if the spec says it returns instantly.

"Instantly" is not a strong enough guarantee to call the syscall non-blocking. The caller needs to know exactly how the callee will perform in terms of run time. Most high-level RTOSes spec this by saying the call will take a constant amount of time, allowing you to measure the call once during your testing and use that to estimate future runs.

Words like “fast” “slow” “instantly” are not useful in the domain of building real time systems at all. It’s about specifying a predictable run time.

Without providing any spec on the runtime of a system call, the only robust assumption is to assume it blocks indefinitely. When you assume a run time spec for a call where one is not spec’d (e.g. close()) that will inevitably result in unexpected behavior. Using calls that take unbounded time in a process that has strict time requirements is a recipe for failure. The domain of real-time interactive systems is not the same as the domain of batch processing.

> You ask about getpid() taking a second. I'd say that within the model of "put those RTOS issues aside", that doesn't happen and can't happen. Just like we usually exclude unplugging the computer from our execution model, so too we exclude "linux isn't RTOS" from our execution model. getpid can't get stuck waiting on any resources, and does only trivial computation, so it will return immediately.

This further shows that there is a fundamental misunderstanding of how POSIX systems operate. It’s very possible for getpid() to take longer than one second during normal operation because it’s stuck on a resource, and POSIX allows for that on purpose. Every entry into a system call invokes a litany of bookkeeping tasks by the kernel before returning to user space, with the exception of vDSO calls like gettimeofday(). Please see exit_to_user_mode_loop(), which gets called before every syscall returns to user space, for all the potential sources of additional latency a call like getpid() may incur: https://github.com/torvalds/linux/blob/c9e6606c7fe92b50a02ce...

Again this is not by accident, this is on purpose. You’ll find a similar loop in all POSIX kernel system call entry/exit code.


Pretend I said 10 microseconds everywhere I said instantly, then. Same argument, more or less.

Anything that could make getpid take too long is outside the scope of what linux could guarantee.

But inside that scope, it's still worthwhile to distinguish between "blocking" and "nonblocking with very specific exceptions"

> It’s very possible for getpid() to take longer than one second during normal operation being stuck on a resource

What resource? I did my best to look at the implementation, but the source code is complicated and scattered. I can't really process your link by itself. How often are these things causing delays?

"Being rescheduled" is already part of the model of any process, anyway. If a system call doesn't make it any more likely that my process stops compared to the baseline, then I think "nonblocking" is a reasonable term to want to use.


> What resource? I did my best to look at the implementation, but the source code is complicated and scattered. I can't really process your link by itself. How often are these things causing delays?

A signal may need to be invoked and that could cause paging to disk. The point is that the kernel is allowed to do a non-predictable amount of work on most system calls and therefore you cannot assume getpid() completes in any amount of time. If you’re building a real time interactive system, then this matters. If you’re building a system that’s allowed to be non-responsive (for running batch processes, network servers) then it doesn’t.


People are going to keep using non-realtime systems to run soft realtime UIs.

We can't make them stop, so it's still important to distinguish between "this syscall might hit a signal or an interrupt, just like every single line of code in the program" and "this syscall might hit a signal or an interrupt, but also it might get stuck waiting on a resource in a way that couldn't have otherwise happened".

If you want to suggest different terms from "nonblocking" and "blocking" I'm open to change. But in the absence of better terms, I'm going to keep using those, with an asterisk that says I'm inside linux and literally anything could technically block.


> People are going to keep using non-realtime systems to run soft realtime UIs.

Very true, and if they want their applications to work well they should write their applications correctly!


The best way to help them write applications correctly is not to say "all syscalls are blocking, none are nonblocking, no other categories".

I'd say that the commonly accepted definition for a blocking call is one that may depend on I/O to complete, releasing control of the CPU core while waiting.

By that definition, getpid() is definitely nonblocking, though it doesn't have an upper bound in execution time. POSIX does not offer hard realtime guarantees.

close() in general would probably be blocking (as a filesystem may need to do I/O), but I'd expect it to behave nonblocking in most cases, especially when operating on virtual files opened read-only. Unfortunately, I don't think those kinds of behavioral details are documented.


A function that sleeps for 5 seconds is blocking. No IO involved.

Blocking just means that you're blocking your current code till you return out of the called function.

Anything else regarding a function call is an assumption unless you know the exact implementation.


> I don't think that's a commonly-accepted (or useful) definition of "blocking." By that definition, getpid(2) is blocking.

When it comes to expecting a specific duration, getpid() is blocking. If you run getpid() in a tight loop and then have performance issues you can’t reasonably blame the system.

> This isn't a portable program; it's a Linux program

But the interface is a portable interface

> POSIX does not mandate that close blocks on anything other than removing the index from the fd table

And what if the fd-table is a very large hash table with a high collision rate? How do you then specify how quickly close() should complete? 1ms/open fd? 10ms/open fd? Etc.

It should be clear that the problem here is that the author of the code had a faulty understanding of the system in which their code runs. Today the issue was that close() just happened to be too “slow.” If the number of input devices were higher, let’s say 2x more, then the same issue would have manifested even if close() were 2x “faster.” No matter how fast you make close(), there is a situation in which this issue would manifest itself. I.e., the application has a design flaw.


> Today the issue was that close() just happened to be too “slow.” If the number of input devices were higher, let’s say 2x more, then the same issue would have manifested even if close() were 2x “faster.” No matter how fast you make close(), there is a situation in which this issue would manifest itself.

Close, on an fd for which no asynchronous IO has occurred, should be 10000x faster, or more. It’s unlikely a user will have even 100 real input devices. I agree the algorithm leaves something to be desired, but the only reason it is user-visible is the performance bug in Linux.

I’ve worked on performance in both userspace and the kernel and I think you’re fundamentally way off-base in a way we’ll never reconcile.


> I agree the algorithm leaves something to be desired, but the only reason it is user-visible is the performance bug in Linux.

The only reason it wasn’t user-visible was luck. Robust applications don’t depend on luck.

Something tells me you’ll think twice before calling close() in a time-sensitive context in your future performance engineering endeavors. That’s because both you and I now know that no implementation of POSIX makes any guarantee on the runtime of close() nor will likely do so in the future. That’s just reality kicking in. Welcome to the club :)


There's no guarantee for the runtime of any function. It's perfectly valid for the OS to swap your program instructions to disk, and then take seconds or even minutes to load it back.

It's effectively impossible to avoid depending on what you call "luck". The OS does not provide nearly enough guarantees to build useful interactive applications without also depending on other reasonable performance expectations.


> It's perfectly valid for the OS to swap your program instructions to disk, and then take seconds or even minutes to load it back.

It’s not valid to swap your program instructions to disk if you call mlock() on your executable pages. Indeed, performance sensitive applications do just that. https://man7.org/linux/man-pages/man2/mlock.2.html
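
For reference, the whole-process variant is essentially a one-liner at startup; a minimal sketch (error handling kept to a perror, and it assumes sufficient privileges or RLIMIT_MEMLOCK):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Pin current and future mappings (code, heap, stacks) in RAM
           so the kernel will not page them out. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");

        /* ... run the latency-sensitive loop ... */
        return 0;
    }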

> It's effectively impossible to avoid depending on what you call "luck". The OS does not provide nearly enough guarantees to build useful interactive applications without also depending on other reasonable performance expectations.

This is all self-evidently false. You likely wrote your comment on a POSIX-based interactive application. It just takes knowledge of how the system works and what the specifications are. Well-designed programs are hard to come by but they do exist.


Does mlock itself have a guaranteed maximum execution time? Is it guaranteed to return success under the relevant conditions? While that is an excellent way to address the problem I mentioned, you still have to depend on more than just the guaranteed behaviour of the OS.

> You likely wrote your comment on a POSIX-based interactive application. It just takes knowledge of how the system works and what the specifications are.

I wrote my comment on an interactive POSIX application, yes, but I believe my browser depends on "reasonable performance" of OS-provided functions in order to be usable.

It would be a fun exercise to evaluate such a program that supposedly did not. For any given program, I suspect I could patch the Linux kernel in such a way that the kernel still fulfilled all guaranteed behaviour while still making the program unusable.


I agree the application should not have done this. On the other hand, I also agree that "indefinite block time" is not a useful definition despite being correct in theory; perhaps a more pragmatic one would be some percentile per unit of time/compute? That way a consistent 100ms close() call which is proven to be a bug won't get lost in the definition.


The machine is not running POSIX; it's running Linux, which is POSIX-ey, and an RTOS does not guarantee that system calls do not block. The insistence on only referring to POSIX was what caused the O_PONIES debate in the first place.

If one assumes that "there is no upper bound on the completion time", then that also means assuming that a poll/read/write will never return within the lifetime of the machine as it could block for that long (maybe you're using this computer: https://www.youtube.com/watch?v=nm0POwEtiqE), and so it is impossible to implement a functioning, responsive application, much less a game.

In the real-world you need to make slightly more reasonable assumptions. And, again, when interacting with device files you must refer to the kernel documentation rather than POSIX, as POSIX does not describe how these files work in any meaningful way or form.


> poll/read/write

The “non-blocking” nature of those calls was invented for network servers, not for video games. Not only is jitter tolerable there but high latency is allowed from the lowest layers of the stack. It’s not uncommon to simply get no response from a network request.

A video game should never ever do arbitrary system calls on its main drawing thread unless those system calls are specifically intended for that use case. Jitter is not tolerable in this use case since the timing requirements are so strict. The code must produce a frame every 16.6ms, no exceptions. The interface must never become unresponsive.

> RTOS does not guarantee that system calls do not block

RTOSes do indeed provide upper bounds for all calls.

> And, again, when interacting with device files you must refer to the kernel documentation rather than POSIX

Yes that would be a relevant point if it were the case that the kernel documentation for these devices specified that close() should complete within some time bound.


Very similar to people using process.env in hot sections of code and then not understanding what's happening.

https://github.com/nodejs/node/issues/3104

When you call out to the system or libc, things are going to happen, and you should try to be aware of what those are.


Sorry... what? Why the hell was an application using the environment to carry application state?!

The environment list is created at init; it's literally placed right behind the C argument list as an array -- see AUXV in the ABI specification if you want to go read up on it.

Therefore, anything you grab using getenv() can be considered static (barring use of setenv), so the proper and correct thing to do is to shove the things you need into a variable at init. Even if you are editing it yourself, you should still use a variable, because variables are typed and getenv is not (think of storing port information, where you need to format it into a string to get it into the environment and then parse it back out of a string). For things like $HOME, those only ever change once, and you should really have a list of the variables you check, because you will want to check XDG_CONFIG_HOME and a few other areas as well. So you will want those in a list anyway; might as well read them at creation time when the data is fresh.
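
i.e. something like this at startup, and plain variable reads afterwards (hypothetical names, just to show the shape):

    #include <stdlib.h>

    /* Hypothetical: resolved once at startup, read from plain variables
       afterwards instead of calling getenv() in hot paths. */
    static const char *g_config_dir;
    static int g_port;

    static void init_config(void)
    {
        const char *xdg = getenv("XDG_CONFIG_HOME");
        g_config_dir = xdg ? xdg : getenv("HOME");   /* fall back to $HOME */

        const char *port = getenv("PORT");
        g_port = port ? atoi(port) : 8080;           /* parse the string once */
    }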

Anything you set with setenv() only alters your own environment state, and that will carry down to newly created children at creation time. So the only reason I can think of why anyone would do this would be to communicate data to child processes. Except there are so, so many better and non-stringly-typed ways to do this, including global variables. Child processes inherit copies of their parent's state; you can just use that, so there is literally NO reason ever to do this.


… unless you intend to exec after forking


Sure, but just use execve (passing the environment explicitly) and it's a damn sight safer, because then you know exactly the state of your child's environment. You can see this in the CERT C coding guidelines: https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanit...

ENV02-C comes into effect as well, if your program is invoked with

    SOME_INTERNAL_VARIABLE=1 PORT=2000 ./prog
then you try to invoke your child with:

    setenv("SOME_INTERNAL_VARIABLE", "2", 1);
    (fork blah blah)


u/CyberRabbi is absolutely correct. It's true that for _some_ kinds of devices you could expect fast close(2) IF the device documents that. But as you can see, implementing this can be hard even for devices where you'd think close(2) has to be fast. Even a tmpfs might have trouble making close(2) fast due to concurrency issues.

The correct thing to do when you don't care about the result of close(2) is to call it in a worker thread. Ideally there would be async system calls for everything including closing open resources. Ideally there would be only async system calls except for a "wait for next event on this(these) handle(s)" system call.


Or, io_uring the thing. One could probably wrap close() with LD_PRELOAD and not touch the binary...
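
The interposition half is easy enough; a minimal LD_PRELOAD shim for glibc/Linux might look like the sketch below (it just forwards to the real close() - actually deferring the work runs into the issues described in the reply below):

    #define _GNU_SOURCE
    #include <dlfcn.h>

    /* Build: gcc -shared -fPIC -o shim.so shim.c -ldl
       Run:   LD_PRELOAD=./shim.so ./game
       This version only forwards to the real close(); the interesting
       (and dangerous) part would be what you do instead of forwarding. */
    int close(int fd)
    {
        static int (*real_close)(int);
        if (!real_close)
            real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
        return real_close(fd);
    }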


While tempting, you can’t generally fix this by simply patching close() with some function that converts it to an unchecked asynchronous operation. If that were the case, you could just do that in the kernel. close() is expected to complete synchronously. This matters because POSIX guarantees that open()/pipe() etc. will return the lowest file descriptor not in use[1]. I.e. this should work:

    close(0);
    fd = open("/foo/bar", ...);
    // fd is guaranteed to be 0
If you made close() just dispatch an asynchronous operation and not wait on the result, then the code above would break. Any code that uses dup() likely has code that expects close() to behave that way.

The other issue is that close() can return errors. Most applications ignore close errors but to be a robust solution you’d need to ensure the target application ignores those errors as well.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V...


I did not know this, and for some reason it really annoys me. Why are our process contexts littered with useless little synchronous properties? How many other tedious and slow bookkeeping tasks does the OS have to do just to meet some outdated spec that was probably just an ossified implementation detail in the first place? I feel compelled to make it so that new fds are explicitly randomized just so you can't do this, like how Go randomizes map iteration order.


This argument does not make sense - the kernel already needs to track per-process file descriptors. It just looks for the first hole instead of giving the "next" value.

Go's random map iteration does not apply here. Not only is this not an iterable map, the kernel has no problem providing this insertion guarantee so adding additional costly randomization has no benefit and just burns additional cycles.

Go would also be better off without, but they are catering to a different audience and different degree of specification, and apparently need to actively deter developers from ignoring documentation.


The correct term for this is not "developers ignoring documentation"; it's "ossification", or Hyrum's Law:

    With a sufficient number of users of an API,
    it does not matter what you promise in the contract:
    all observable behaviors of your system
    will be depended on by somebody.

I guess that we got this "lowest available" rule because that's what the first implementation happened to do (it's the obvious thing to do if you have a single core), then someone 'clever' noticed that they could save 3 cycles by hard coding and reusing the fd in their IO-bound loop, and anyone that tried to implement fd allocation differently was instantly met by "your OS breaks my app", and thus the first implementation was permanently ossified in stone. To be clear I'm not making any historical claims and this is pure speculation.

"Stupid developers should have rtfm humph" is not a useful position because it ignores this behavior ossification.

The Go map example is actually very relevant, it's an "anti-ossification" feature that makes the behavior match the spec. If the spec says iteration order is not guaranteed, but in practice people can rely on it being the same in some specific situation (say, in a unit test on a particular version of Go) then the spec is ignored and it breaks people's programs when the situation changes (e.g. Go version updates). This actually happened. Instead of giving in and ossifying the first implementation's details into the spec, Go chose the only other approach: Make the behavior match the spec: "iteration order is not guaranteed" == "iteration order is explicitly randomized". (They do it pretty efficiently actually.)


As mentioned elsewhere, the file descriptor table is an array and a bitmask - finding the next fd is a matter of finding the first unset bit, which is extremely efficient. And that's before we ignore that the file descriptor table is read-heavy, not write-heavy.

Should you want to have per-process file descriptor tables, you can do just that: Just create a process without CLONE_FILES. You can still maintain other thread-like behaviors if you want. I doubt you'll ever sit with a profile that shows fd allocation as main culprit however.

> If the spec says iteration order is not guaranteed, but in practice people can rely on it being the same in some specific situation ... This actually happened.

If Hyrum's law held, the API would already be "ossified" at this point.

Instead, the Go developers decided to make a statement: "The language spec rather than implementation is authoritative". They broke this misuse permanently by making the API actively hostile, not by making it "match the spec" as it already did.

While one could interpret the current implementation as "anti-ossification", I interpret the action as anti-Hyrum's Law by choosing to break existing users in the name of the contract.


There you can cue the 'workflow' xkcd https://xkcd.com/1172 and while the joke is funny I wish everyone would stop breaking my workflow.

Maybe I'm getting old, or maybe I find the permanent useless change tiring. I'm looking at GNOME, Android, Windows in particular.


If we ignore POSIX for a moment, the kernel could avoid contending on the one-per-process fd map by sharding the integers into distinct allocation ranges per thread. This would eliminate a source of contention between threads.

In addition to violating POSIX’ lowest hole rule, it would break select(2) (more than it’s already broken).


This sounds like premature optimization. FD availability is tracked in a bitmask, and finding the next available slot is a matter of scanning for the first unset bit under a spinlock. This is going to be extremely fast.
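
To illustrate why that's cheap, here is a userspace toy version of the idea (the kernel's actual find_next_zero_bit() works word-at-a-time like this, just more carefully optimized); __builtin_ctzll is a GCC/Clang builtin:

    #include <stdint.h>

    /* Toy "find first unset bit": skip full words, then pick the lowest
       free bit in the first word that still has one. */
    static int first_free_fd(const uint64_t *bitmap, int nwords)
    {
        for (int w = 0; w < nwords; w++) {
            if (bitmap[w] != UINT64_MAX) {
                uint64_t free_bits = ~bitmap[w];
                return w * 64 + __builtin_ctzll(free_bits);
            }
        }
        return -1;  /* table full */
    }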

While you could shard the file descriptor tables for CLONE_FILES processes such as threads, you would likely complicate file descriptor table management and harm the much more important read performance (which is currently just a plain array index and pretty hard to beat).

You could also just create your processes (or threads) without CLONE_FILES so that they get their own file descriptor table.

The fdtable can be seen here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin..., and alloc_fd and __fget can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin....


Are you sure such code exists? Doesn't the standard tell you to always treat the fd type as opaque anyway?

Referring to exactly the point you cite, the standard seems to be making no strong statement at all. It says to allocate from the lowest fd but that calls which may return multiple fds do not need to guarantee they are adjacent. I always took this to mean the values should pack downward and should not be e.g. allocated randomly, though it never seemed clear to me why, as the standard seems to be planning for multithreaded code.

So you are interpreting it one way, but the same statement seems to imply that fds are not meant to be introspected and should always be taken at face value from a call that generates a valid fd.


> I always took this to mean the values should pack downward and should not be e.g. allocated randomly, though it never seemed clear to me why,

The reason for this requirement is that early versions of Unix did not have dup2(), only dup(). It has nothing to do with multi threading as this predates pthreads by more than two decades. The shell (sh) makes use of the lowest numbered property to redirect standard in/out/error when setting up pipelines:

    int pipes[2];
    /* ignore errors */
    (void) pipe(pipes);
    if (fork()) {
        close(0);
        /* guaranteed to return 0 */
        (void) dup(pipes[0]);
        close(pipes[0]);
        close(pipes[1]);
        exec_child();
    } else {
        close(1);
        /* guaranteed to return 1, we know 0 is taken */
        (void) dup(pipes[1]);
        close(pipes[0]);
        close(pipes[1]);
        exec_parent();
    }
   
Code like this exists in literally every POSIX shell. Anyone saying code like this isn’t common has no idea what they’re talking about.


> says to allocate from the lowest fd but that calls which may return multiple fds do not need to guarantee they are adjacent.

If the program has fd 0-3 and 5 open, socketpair should return 4 and 6, which are not adjacent. If socketpair is called again, while close(N) (N < 7) is being called in a separate thread, you could get {7, 8}, {N, 7}, or {7, N}, depending on kernel and timing details. All of those returns fit the requirement that the fds be allocated lowest first, but may or may not be adjacent or in absolute order.


>This matters because POSIX guarantees that open()/pipe() etc. will return the lowest file descriptor not in use[1]. I.e. this should work: close(0); fd = open("/foo/bar", ...); // fd is guaranteed to be 0

On a multi threaded system that isn't guaranteed is it? Meaning, another thread could call open in-between your close & open.


It is guaranteed whether multi-threaded or not. It’s a process level guarantee. If your application is designed such that you don’t know what your other threads are doing then POSIX cannot help you.


What you’re getting at is that an individual thread cannot really use this property without some form of synchronization with other threads in the process. Eg, to use this property, other threads either do not allocate fds, or you take some central lock around all fd allocations. Most well-written programs do not rely on it.


Oooh yes, thanks, you're right. That would make for tricky shadow fd accounting... Ugh.


You should use io_uring to open and close files asynchronously, instead of open/close.


You would not use io_uring for things like that. Not only will you still use regular file operations on device files for various reasons, but should you choose to use io_uring you would want it to run your entire event loop and all your I/O rather than single operations here and there. Otherwise it just adds complexity with no benefit.


I don't see the big issue. There is no other way in Linux or POSIX to open a file asynchronously (not sure about closing). Dan Bernstein complained about that 20 years ago(?) and io_uring finally fixes it. Before that, runtimes with lightweight processes/threads (Erlang, GHC) used a POSIX thread pool to open files in the background. That seems just as messy as using io_uring, which at least keeps everything in the same thread.

http://cr.yp.to/unix/asyncdisk.html
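
For anyone who hasn't used it, the liburing flow is short; a sketch of an async open followed by a queued close (error handling elided, /etc/hostname is just a placeholder path):

    #include <fcntl.h>
    #include <liburing.h>

    int main(void)
    {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);

        /* Queue an openat(); the blocking work happens off our thread. */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_openat(sqe, AT_FDCWD, "/etc/hostname", O_RDONLY, 0);
        io_uring_submit(&ring);

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);  /* a real loop would poll instead */
        int fd = cqe->res;               /* negative errno on failure */
        io_uring_cqe_seen(&ring, cqe);

        if (fd >= 0) {
            /* The close can be queued the same way, fire-and-forget. */
            sqe = io_uring_get_sqe(&ring);
            io_uring_prep_close(sqe, fd);
            io_uring_submit(&ring);
        }

        io_uring_queue_exit(&ring);
        return 0;
    }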


Just curious, how did you nail it down to that specific issue and patch? That seems like a great skill to have.


With git, the command git bisect provides a mechanism for using binary search on a range of commits.


I just did a quick check: the posted fix is not in the most recent -rc branch in the public git repo.


This is the issue with using mailing lists... Large numbers of perfectly good fixes, embodying many hours of effort, just get missed and forgotten about.

At least with GitHub PRs, every request either needs to be merged or rejected.


I’m no fan of mailing lists, but GitHub PRs get ignored in much the same way.


There’s ignored, and there is “not aware that it’s unresolved”. How do mailing list flows handle the “give me a list of open patches”?


People treat emails like tickets, representing things that should be done. They put them in particular email directories depending on their personal workflow. When they are either done or rejected, they delete the email, archive it, or mark it as read.

It's not unlike the github workflow except that it's up to each person to define the way they prefer to work. Not a single policy that's decided by the project owner. Planning/tracking also happens more in private instead of public. You may think that's worse, but perhaps you can also see why some might prefer it?


The modern trend is auto-closing PRs after they haven't had any activity in X months, so there's not much difference with mailing lists anymore...


I really enjoyed the debugging process here, and am glad to have learnt about the -k flag which seems to only be available on systems with strace version 5.5, at least for me.

As for the patch (and my love for all things Frida [1]), I think a call to Interceptor.replace() after locating the symbol with Module.getExportByName() [2] would make for a simpler patch (at the cost of installing Frida). For example:

  const sym = Module.getExportByName("lime.ndll", "SDL_SemWait");
  Interceptor.replace(sym, new NativeCallback(function (sem) {
    // stub out the semaphore wait entirely
    return 0;
  }, 'int', ['pointer']));
[1] https://frida.re/

[2] https://frida.re/docs/javascript-api/#module


From the description of the problem (a freeze every 3 seconds) I knew exactly what it was. You can fix it by simply upgrading SDL as they fixed this bug 2 years ago.

https://github.com/spurious/SDL-mirror/commit/59728f9802c786...


From one of the printouts in the post, it seems that Papers Please is using a bundled and statically linked SDL.

So it would be the game developer that would have to update the version of the SDL library. The binary patching done seems like a good-enough alternative in the meantime.

I have the feeling such bundling of dependencies is fairly common when porting games for Linux.


SDL has its own "dynapi" layer, where you can override it with your own copy of SDL even if it was statically linked: https://github.com/libsdl-org/SDL/blob/main/docs/README-dyna...


Just pray they didn't alter SDL like factorio does: https://news.ycombinator.com/item?id=27246164


Is the version statically linked recent enough to support it? Also, can’t decide if it’s genius or insane, that extra layer of dynamic linking…


According to [1] it was added (but not released) January 8th, 2014. Papers Please came out on Linux February 12, 2014 so I'd figure it's not in there unless the version of SDL was updated in a later update.

[1] https://old.reddit.com/r/linux_gaming/comments/1upn39/sdl2_a...

Edit: (what I believe to be) This freezing bug was only added 3 years ago, so it might actually have it.


At least for Steam you are recommended to link against Valve's Steam Linux Runtime which is a set of dynamic libraries including SDL.


It's common in applications of a certain size and with certain compatibility expectations. Windows and Mac games will bundle their dependencies as much as possible as well. Same for large apps. Nobody wants to end up in a situation where their relatively expensive purchase doesn't work because of the version of local libs.


Does bundling dependencies imply static linking, though? Why can’t they just dynamically link to the bundled dependency?


Not really, it can be bundled either as static or dynamic.


Why is it even doing this on the main thread at all? The obvious thing would be to have a background thread polling for changes and then sending messages asynchronously to the main thread if a change actually occurred...


It is appropriate to use your main thread for your OS interaction - polling fds, talking to the display server, whatever I/O you need, etc. An open/close call should never take this long, and you should never need to make a large number of them in sequence after startup.

What should not be on your main thread is any long blocking compute, which is why rendering/game logic often goes to another thread - although simple games could easily be single-threaded.


> An open/close call should never take this long

> What should not be on your main thread is any long blocking compute

Isn't that contradicting yourself? I'm pretty sure open() can block.


> Isn't that contradicting yourself? I'm pretty sure open() can block.

No, blocking compute would be you doing something for a long period of time.

"open/close can block" means very little. You only need a mitigation if it ends up blocking long enough to be a problem in reasonable setups.

You do not have to care about what happens if someone runs the code on an intentionally terrible/horribly slow but technically in-spec toy filesystem. No need to prematurely optimize for this scenario.

And especially with devfs you cannot just listen to POSIX and must know what the kernel is providing you - knowing how long operations take on such fds is normal design input.


"You only need a mitigation if it ends up blocking long enough to be a problem in reasonable setups."

That criterion is established here - the OP is about an issue affecting paying end-users! Premature optimisation is not relevant - the software is failing.

It is unsafe to make fair-weather assumptions about customer systems.

Consider a common software failure: where the user is saving data to a SMB or NFS partition, and then there is a loss of connectivity to that file-server, and the developer has done whatever I/O they need on the main thread. This causes data loss.

You /should/ be able to assume (1) reliably fast return from core async-coordination syscalls (e.g. select, poll), and (2) that you will not suffer process or thread starvation caused by someone else. Respecting those constraints, it is good practice to isolate sync calls to a non-main thread in order to catch when they are not returning. This robustly covers both common scenarios (like the missing-filesystem) and obscure scenarios like the one in blog post.


> That criteria is established here - the OP is about an issue affecting paying end-users! Premature optimisation is not relevant - the software is failing.

The software is not failing, it is experiencing performance degradation: a 500ms pause whenever excessive and entirely unnecessary work is done.

The solution is to not do the work. Moving open/close to a different thread is premature optimization, as no necessary call has been profiled to cause issues on any known system.

Performance 101, do not do things that you do not need done. Even if you want live hotplug and input reconfiguration during gameplay without touching any menus, you only open a device when it appears.

> It is unsafe to make fair-weather assumptions about customer systems.

It is more pointless to optimize for worst-case scenarios - experiencing performance degradation on a faulty system is fine.

All applications have a minimum performance requirement to remain responsive, which is equivalent to always making a certain degree of "fair-weather assumptions".

> Consider a common software failure: where the user is saving data to a SMB or NFS partition, and then there is a loss of connectivity to that file-server, and the developer has done whatever I/O they need on the main thread. This causes data loss.

This is a non sequitur - doing something on the main thread does not cause data loss. Losing connectivity causes data loss.

Heck, as main-thread I/O with an event loop implies non-blocking fds, you would not even be blocked by this unless you call fsync(2) to explicitly block until flush is complete, which a normal application does not need to do. The other (horrible) side-effects of network filesystems will cause problems for your application no matter how you interact with the fd.

Furthermore, you cannot use device files without reasoning about their exact implementation. They are not basic files.

> Respecting those constraints, it is good practice to isolate sync calls to a non-main thread in order to catch when they are not returning.

That's a hack, and is not even a solution. What are you going to do when they don't return? Accumulate dead threads and inconsistent shared application state?


"implies non-blocking fds"

I don't think the open and close calls respect non-blocking on linux when operating on files (specifically files, not sockets - for sockets the return is always quick as far as I know). From man open(2), "I/O operations will (briefly) block when device activity is required, regardless of whether O_NONBLOCK is set". And my recollection is that they will non-briefly block if the problem is a hanging NFS mount.

"What are you going to do when they don't return? Accumulate dead threads and inconsistent shared application state?"

Good point. My practice is to use child processes. These can be killed, and so I do not run into this. But using subprocesses ratchets up the amount of work to be done, because you then need to do async IPC. So it's now a lot of extra work. It's even worse for multiplatform stuff because now you are exposed to platform differences (e.g. select is suboptimal on Linux, but poll is not available on Windows).

As you say, using threads within same proc would lead to stale threads. In some contexts this would be tolerable but it is not nearly as simple+clean as I presented.

Thinking hard about addressing this on Linux brings me down, every time. I hope io_uring will make pure async practical within a single process. Even if it does, the multiplatform story will remain complex. I am not fond of your disregard for user data in the (tangential) discussion about disappearing filesystems, but you have won me over to embracing the main thread in this context.


You shouldn't be doing IO on your game thread though. (Main and game thread may differ)


Because architectural simplicity is valuable and some operating systems deliver input events on a particular thread.


The problem probably only shows up on some machines and hasn't been noticed during development and testing. And TBH, polling what input devices are connected should never take more than a few microseconds, no matter how much operating system code sits between the hardware and game code.


The simple answer is that it wasn't needed for udev. It may not have blown up on the dev's machine because their input devices were different. It might be tested less than the udev version. As the other commenter stated it's simpler to just check every 3 seconds instead of adding threading.


This is one of the reasons I use Gentoo on my desktops: you can run a "stable" (as in old) system, but pull a more recent version of a library or application if you need it. For example, I remember having problems accessing files that I had stored in my mobile phone. Solution: updating libmtp and libmtp only. I suppose you can't do this in a distro such as Debian without upgrading half the packages.

Are there more distros that allow you to do this?


If you are advanced enough to run Gentoo, you should be able to use Debian and force an install of (or re-compile yourself) a new package of the newer version, working around the fact that the official newer version would otherwise require other new packages.


You could. But the amount of time that takes on Debian is significantly more than a simple yay -S <package> or emerge <package> (not to mention the dependency hell you can run into).


I don't know the situation on Gentoo, but partial updates are explicitly unsupported on Arch, not least because they don't do stable ABIs; Debian should have a much easier time upgrading just one package.


It's not necessarily recommended to mix stable and testing but it mostly works fine in my experience. I'd guess Gentoo gets around quite a few problems as everything is compiled from source. So updating a single library would cause a rebuild of everything that depends on it.

Gentoo also has the concept of "Slots", so you could have multiple versions of the same library installed and packages will choose their version to build against accordingly.


Having used Arch and Debian, I've definitely had an easier time installing the latest version of arbitrary packages on Arch. Something like SDL is part of base and thus is already running latest.


Why is udev not used in this case?


Probably because a commercial developer is not going to want to link an LGPL library with their program. [0]

https://github.com/systemd/systemd/blob/main/src/libudev/lib...


Because this is the fallback when you compile without udev support


I can guess that, but I was wondering why ... Is there a distro without udev? The steam/whatever sandbox does not support udev?


It's not that the system doesn't support udev; it's the game developers' choice of how they compiled SDL: without dependencies, and so without udev.


That's standard for game devs in the Linux world. The fewer dependencies you have to rely on the distribution for, the better - Windows stuff is either already present or shared OS-wide with binary backwards compatibility (=DirectX), so you can get away with shipping stuff that has a chance to run even 25 years in the future without major modification.


This reminds me of the recent Linus Tech Tips series on gaming on Linux[1]. Their conclusion is that although many games work out of the box (although usually not at launch), Linux is not ready for mainstream gamers. Not many people would have the expertise or the interest to troubleshoot the problem as OP did.

[1] https://www.youtube.com/watch?v=Rlg4K16ujFw


It definitely isn't. I've been a huge linux nerd since my preteens in the late 2000s, I jumped on to squeeze more performance out of the thoroughly mediocre hardware I had access to. I wanted to program, and I found Visual Studio to be incomprehensibly dense and confusing, while Linux tools were so much simpler, with GCC, GEdit, makefiles and the like being more to my liking. I fell deep into the rabbit hole, learned emacs, then vim (it was more responsive on my intel atom-powered netbook), became a "shell guru", eventually went to college at 16 and started doing cybersecurity work/pentesting professionally. I've even made a tiny contribution to the Linux kernel, which I'm pretty proud of.

All this anecdata to say, I consider myself pretty okay at using Linux, I "prefer" Linux, but I don't use Linux for gaming. Not unless it makes sense. I play Minecraft on Linux, and FOSS games that were developed on Linux. There's a POWER9 desktop on my desk that runs Linux, and all my professional and hobby work goes there. I love it.

But any commercial games? They go on my old college-days Intel desktop, running Win10. I can do the work to get games running on Linux, but why bother? Like Linus says in that video, when I have time to play video games, I really don't want to pull out a debugger and strace and crap to do more $DAYJOB work.

Not to say I never do that for fun. I do. I've done some work with https://github.com/ptitSeb/box86, and that involves a similar process. But I just frankly don't find doing it to your average Steam game to be very fun. Sometimes the muse strikes, usually it doesn't.

And for your average Linux user, much less your average computer user overall, you can forget about it. IMO, unless you have a strong ideological reason to only use FOSS OSes (and all the power to you!), the reason you use Linux is because it's a vastly superior tool for certain problems.

Playing your average commercial game is not one of them.


I disagree; in my experience, with the recent improvements to Wine, drivers, etc, if you have a bit of Linux knowledge you can get games working just as well as on Windows - with the possible exception of games that use anticheat malware (because frankly, software that relies on kernel-level hacks, etc is basically malware). Though personally i stick with SP games anyway.

Also...

> Like Linus says in that video, when I have time to play video games, I really don't want to pull out a debugger

...Linus is most likely blind to all the issues he may have to get games to work under Windows because he's used to them. I've been playing games on Windows for decades and it was never a plug-and-play experience (...or it was, if you consider the original PnP experience back in the 90s :-P). If games worked perfectly under Windows you wouldn't have sites like pcgamingwiki.

Hell, i remember buying Tomb Raider 2013 back when it was new and having to trace through its registry calls on Windows to get it to work properly because its launcher was broken on my PC at the time. Incidentally that was supposed to be my relaxation time when i went home after work.

(of course i do not expect Linus -or most people- to do the same, they'd most likely just drop it for some other game and wait for a fix - but i had just bought the game and i wanted to play it right then)

In my experience it is rare to have a PC game on Windows play right away without any issues. If anything when it comes to slightly older games on Windows i had to use wrappers like DXVK to get games working properly due to the broken AMD Windows drivers.

At the past i might have written here that if you want a problem-free experience stick with consoles, but judging from videos i see from channels like DigitalFoundry, it seems consoles have a ton of issues nowadays too (it isn't common but i found it amusing that some people seem to jailbreak their Switches to make games work better :-P). And these come with their own issues anyway, personally i wouldn't touch any locked down DRM riddled system anyway.


Have you tried Steam's Proton compatibility layer yet? I was surprised to find many (not all) games in my library running with near 1:1 performance and stability to Windows.


It's not, but it really has come a far way and I'm extremely impressed. I'm kind of the other way around, I've never been more than a very casual gamer and I'm simply not interested in keeping a separate Windows pc or dual boot install for games. If I can't get it working on Linux I'm not bothering with it. Right now I can play any game I want to play with very minimal tinkering (that probably says more about me than the state of Wine/Proton, but still).


Linux has absolutely come very far, don't get me wrong!

I'm also mostly a casual gamer, and only have my "dedicated gaming PC" because it's 7 year old hardware I've replaced with a dedicated "workstation" I bought after getting a job and saving some money.

On all my other hardware, I just run Linux, and I pretty much do the same as you -- most of my games work fine on Linux, a surprising number natively!

Linus brought this up in his video as well, that if you don't really care which games you play, you'll be fine. The problem is there are a few games I like playing or like to play with my local friend circle that just don't work well on Linux.

The problems are really stupid, too, and often not the fault of Linux per se. Garbage like Elder Scrolls Online still relying on a TLS cert signed by a CA that's been almost universally revoked, so the launcher will silently hang on linux. Bypassing this relies on either adding the (revoked for security reasons) CAs to your system's trust chain, or man-in-the-middleing the game process to force the updates regardless of the cert problems.

It's not like I really care that much about this specific game, but it's nice to play every now and again with friends. Maybe my long rambly point is, if you primarily use Linux for other reasons and occasionally play games with it, it's awesome. But if you're the average "PC gamer" whose computer is primarily for gaming, Linux will probably disappoint you.


Where does this sentiment come from that Linux has come very far when it comes to gaming? When Doom 3 released in 2004 I had to use a hex editor to hand patch the executable to get sound working. Luckily someone did what OP did and posted the instructions on a forum. I've used Linux/Unix for 20+ years but I wouldn't recommend it for gaming unless you enjoy debugging Linux software and want to do more of it. Frankly, you can learn a ton by doing so, it's not a waste of time, but priorities and frustration tolerances change as people get older. Then the demographic that no longer wants to debug their games tends to also be the cohort that has more money and is more willing to spend for a better experience, which means more money is being allocated to the platforms they are on as opposed to Linux.


>Where does this sentiment come from that Linux has come very far when it comes to gaming?

Look at what you could run in Wine in 2004 and what you had to do to get it working and what you can do now in Proton. It might technically not be "native Linux" but I couldn't care less as long as it works.

>I've used Linux/Unix for 20+ years but I wouldn't recommend it for gaming unless you enjoy debugging Linux software and want to do more of it.

I game a little bit on Linux and have never touched a hex editor to do it. Right now I only have one game where I have to manually download a dll, all the rest I want to play works flawlessly. I only have to enable Proton in Steam and that's it. Yes, they're mostly older games and I still certainly wouldn't recommend Linux for gaming, but there is definitely progress.


> Not many people would have the expertise or the interest to troubleshoot the problem as OP did

Not many people have the expertise to do this on Windows either (I would expect even fewer). Bear in mind that the developer sells this game as supported on Linux. The main reason is that developers understandably don't care much about bugs encountered by 2% of users, which makes it a completely different discussion.


Then again I play Papers Please through Steam and have no problems, so maybe they're using the 32 bit version.

I agree that Linux use in general requires troubleshooting skill. We shouldn't assume there will never be any issues worth troubleshooting and recommend Linux to novices as a Microsoft killer. We should instead assume problems will happen and therefore a robust restore process is much more valuable to a novice Linux user.

This is why I believe in the btrfs setup that Fedora uses. Imagine having the powerful restore options many Windows computers ship with. Just press a key, go into a menu, select a point-in-time recovery, restore.

But that said, what I really wanted to say was that I play exclusively on Linux now thanks to Proton and it's amazing. I can play big titles like Witcher 3, RDR2 and more, but I mostly play smaller titles like Oxygen not included, Rimworld and Ostriv.


strace tip of the day: you don't need lsof, strace can keep track of open fds quite well, just use the -y flag.

       -y
       --decode-fds
       --decode-fds=path
              Print paths associated with file descriptor arguments.

       -yy
       --decode-fds=all
              Print all available information associated with file
              descriptors: protocol-specific information associated with
              socket file descriptors, block/character device number
              associated with device file descriptors, and PIDs
              associated with pidfd file descriptors.


That's a great tip, thanks.


Hey, I know this issue! I ran into it in CK3 when it launched. You can also work around it by running chmod go-rx /dev/input/ while playing your game. Whether this is more or less invasive than binary-patching the game is up for debate.


Fun write up. Here’s another example of a binary patch to fix a Linux game issue:

https://steamcommunity.com/app/333300/discussions/0/26463606...


There was also an Age of Empires 2 patch that fixed a bug in Linux that made the game pan indefinitely to some corner after running the game and alt-tabbing to another window momentarily.


Gotta remember the byte pattern search using grep (and replacement using printf/dd). Clever use of Unix tools.


Interesting tidbit about Papers, Please: it's written with Haxe (a programming language), on top of OpenFL (an open-source Flash alternative), which uses Lime as its foundational cross-platform library. The nice stack traces you see in the article, such as:

openfl::display::Application_obj::__construct

are because Haxe actually compiles down to C++! (That's also why it has nice interop with C/C++ libraries like SDL.) Haxe boasts an insane amount of compile targets -- just taking a glance at its Github page, it can compile to Javascript, C#, C++, Java, Lua, PHP, Python, and Flash, plus a couple different Haxe-specific interpreters.

Not many game developers use Haxe, so it has a small, tight-knit community. Other well-known games written with Haxe include Dead Cells, Northgard, and Dicey Dungeons.


Why is the engine even checking input devices so often? Shouldn't the input device be registered via settings and then assumed to exist when the game runs? It seems wasteful to check all input devices every few seconds.


A lot of games will automatically switch between keyboard and gamepad when a gamepad is connected. Perhaps this is some automatic background function that SDL handles.


Windows uses a similar polling-based approach, with a warning that you shouldn't poll for new gamepads every frame for performance reasons:

https://docs.microsoft.com/en-us/windows/win32/xinput/gettin...

In general, "notifying the program when a new device has become available" seems to be a surprisingly difficult problem. I've encountered trouble with multiple device types across multiple platforms.


There’s a reason Plug’n’Play and USB were big deals back then. The concept of plugging in a new peripheral and it just working, without rebooting, was rather revolutionary. Even though the former was more aptly called “Plug’n’Pray” in the early years…


SDL has an event for when a new gamepad is detected or removed: http://wiki.libsdl.org/SDL_ControllerDeviceEvent although I don’t know what it does internally in order to detect this (well, the article describes what it does).

But you can also manually enumerate devices as in the example here: http://wiki.libsdl.org/SDL_GameControllerOpen
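
Roughly, handling hotplug with those events looks something like this (just a sketch; it assumes an SDL2 build whose hotplug backend actually delivers these events, which per the article the game's bundled SDL may not):

  #include <SDL2/SDL.h>

  /* Drain SDL's event queue once per frame and react to pads coming and going. */
  void handle_events(void) {
      SDL_Event e;
      while (SDL_PollEvent(&e)) {
          switch (e.type) {
          case SDL_CONTROLLERDEVICEADDED:
              /* e.cdevice.which is a device index here */
              SDL_GameControllerOpen(e.cdevice.which);
              break;
          case SDL_CONTROLLERDEVICEREMOVED:
              /* e.cdevice.which is the instance id of the pad that went away */
              break;
          }
      }
  }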


SDL should probably use inotify on Linux so the kernel can let it know when /dev/input has changed, rather than polling it.
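
Something along these lines (a rough sketch of the idea, not necessarily how SDL implements it):

  #include <stdio.h>
  #include <sys/inotify.h>
  #include <unistd.h>

  int main(void) {
      /* Watch /dev/input so the kernel reports changes instead of the
         program re-enumerating the directory every few seconds. */
      int fd = inotify_init1(IN_NONBLOCK);
      if (fd < 0 || inotify_add_watch(fd, "/dev/input",
                                      IN_CREATE | IN_DELETE | IN_ATTRIB) < 0) {
          perror("inotify");
          return 1;
      }
      /* In a real program this fd would go into the existing poll loop;
         a readable fd means a device node appeared, vanished or changed. */
      char buf[4096];
      ssize_t n = read(fd, buf, sizeof buf);  /* -1/EAGAIN if nothing happened yet */
      printf("read %zd bytes of inotify events\n", n);
      close(fd);
      return 0;
  }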


SDL has three methods for detecting input devices [1]: udev, inotify, and, as a fallback, enumerating /dev/input.

It seems like Papers, Please uses a statically linked version of SDL, without udev or inotify support compiled in.

1: https://github.com/libsdl-org/SDL/blob/d0de4c625ad26ef540166...


I've been wondering... is it possible to write something to override the statically linked functions? In this case, most (if not all) functions have an SDL_ prefix. Would it be possible to LD_PRELOAD a library that loads a shared version of SDL and goes over all the function pointers to point them to a new location? Is there a tool for this?


> I've been wondering.. is it possible to write something to override the statically linked functions?

SDL does have a built-in way to do that trick. A quick web search tells me it's called SDL_DYNAMIC_API.


Cool, I never knew! Somehow the game where I thought it would add a feature is still lacking it. For some reason rumble on my xbox joystick with Enter the Gungeon never worked. I thought it was because of an old SDL version, because experimentation suggested that. But even using the SDL_DYNAMIC_API env and loading my system SDL, the game still didn't add rumble for my joystick. Oh well.


SDL_DYNAMIC_API is a relatively recent addition (IIRC 2014), so static SDL2 builds from before that won't work this way.


It looks like SDL's public symbols are all global in lime.ndll so LD_PRELOADing SDL should do what you want. Of course it is possible that lime.ndll was built with -fno-semantic-interposition or equivalent in which case the functions might be called directly without going through the dynamic linker or even (partially) inlined.
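
If you want to check whether interposition actually happens before building a full shim, a tiny hypothetical probe for one exported symbol is enough - compile it with gcc -shared -fPIC probe.c -o probe.so -ldl and run the game with LD_PRELOAD=./probe.so. Whether it fires depends on exactly the PLT/inlining question above:

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stdio.h>

  /* If calls to this symbol resolve through the dynamic linker, this copy
     wins because LD_PRELOAD'ed objects are searched first; log the call and
     forward to the next definition in search order. */
  int SDL_NumJoysticks(void) {
      static int (*next)(void);
      if (!next)
          next = (int (*)(void))dlsym(RTLD_NEXT, "SDL_NumJoysticks");
      fputs("SDL_NumJoysticks intercepted\n", stderr);
      return next ? next() : 0;
  }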


Well if you know where to fork, you could use Intel Pin and divert the CFG, favorite tool for binary 'patching'.

Edit: though here, if it's a problem of file enumeration and access, I'd probably just LD_PRELOAD something to bypass libc file access functions and return the same result as the first time, with no delay.


Static linking means the features of the Linux dynamic loader, like using the environment variable LD_PRELOAD to pre-load a dynamic library, are not going to have any effect.


Actually, I think the truly preferred path is to just monitor for udev events, which SDL supports but is presumably not enabled for Papers, Please for one reason or another.


Exactly. It should enumerate them when the player opens settings. Or at startup.

But even if it wants to do this, why is it doing it on the main thread!? :(


If I start the game without a gamepad attached to the computer, and then attach the gamepad, I'd like to use the gamepad without restarting the game. And one would expect that polling the attached input devices should never take hundreds or thousands of milliseconds, there must be something seriously wrong in the Linux input device stack or maybe in one of the input device drivers.


The input stack is fine, this game disabled the things that would let it work nicely (see upthread discussion of SDL supporting udev and inotify)


Or maybe just poll for new input devices when the game is paused, or before the game starts.


Then you still have problems to handle like accidentally disconnecting/reconnecting the gamepad when somebody stumbles over the cable (for instance the game might want to automatically pause if the gamepad suddenly 'disappears' for any reason). Gamepads should be automatically detected at any time in the game as they are connected or disconnected. That's how it works on game consoles, and PC games shouldn't behave any different in that regard IMHO.


SDL wants to support hotplugging where you can plug a controller in after you have already started the game.


Operating systems let you know when a device change has happened. You can even cache this, where you poll the first time for the initial set of data and then you just check when a device is plugged in to update your knowledge of the state of the world.

I would imagine that’s what’s done if you’re running udev but maybe SDL doesn’t do that.


As has been noted in another subthread, SDL can do this but apparently the particular statically linked version shipped with Papers Please is compiled without udev or inotify support and has to fallback to manual checking.


Yes, and even if it checks for new devices, it should only need to check devices it hasn't already checked.


Ah, but are /dev/input entries reusable?

Let's say you have /dev/input/event{0,10}, event5 is a USB keyboard, you unplug it, I assume event5 goes away.

But then you plug in a controller, does this get mapped to event11, or does event5 get reused? Is the behaviour reliable in all versions of linux?

You might argue that metadata should do the trick, but in my experience, on device files, anything beyond read/write is a crapshoot, whether metadata makes any sense is basically a roll of the dice.

So if you have to open device files in order to check their identity, you might as well skip the identity bit and just check if you're a gamepad.

edit: per charcircuit's comment below, it looks like the metadata of /dev/input at least are considered reliable, and this was used to mitigate the issue by checking the mtime of /dev/input itself against a stored timestamp: https://github.com/spurious/SDL-mirror/commit/59728f9802c786...


The actual SDL fix was even simpler, they now just check if the mtime of the /dev/input directory changed: https://github.com/spurious/SDL-mirror/commit/59728f9802c786...
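
The gist of it, as a rough sketch (the fix relies on device node creation/removal updating the directory's mtime):

  #include <stdbool.h>
  #include <sys/stat.h>
  #include <time.h>

  static time_t last_mtime;

  /* Only do the expensive open/close scan of /dev/input/event* when the
     directory itself reports a change. */
  bool input_dir_changed(void) {
      struct stat st;
      if (stat("/dev/input", &st) != 0)
          return true;                  /* can't tell, fall back to a full rescan */
      if (st.st_mtime == last_mtime)
          return false;
      last_mtime = st.st_mtime;
      return true;
  }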


Upstream fixes are nice, but since the game statically links SDL you can't just put a newer version of libSDL.so in the game path and have it patched like that. Are there other ways of patching statically linked binaries with updated functions?


Isn’t this the whole point of using file descriptors? As long as you have an open file descriptor, the kernel resource it references should remain stable. And if the resource is unexpectedly destroyed from under the process’s nose, the file descriptor should report an I/O error the next time you try to read or write from it.


> Isn’t this the whole point of using file descriptors?

Opening the same file multiple times will yield different fds, and the paths can be modified independently of the fd.

The goal here is to find if:

1. there are new input devices

2. which are joysticks (a category which, for SDL, includes gamepads, so basically "has the user plugged in a new gamepad they might want to use for the game")

How would keeping a bunch of fds around help?


Some time ago I also stumbled upon this issue and found the workaround that I posted here: https://www.gog.com/forum/papers_please/terrible_lags_on_lin...

Thanks for the more thorough digging.


This looks very like a problem I encountered some time ago running the closed source 3DO emulator "Phoenix Project", and similarly the open source "FreeDO" project that it was forked from. I narrowed it down (also using strace, IIRC) to these programs repeatedly opening and closing the /dev/input/event* files, and that being weirdly slow. I made a separate little test program just to open and close those files to confirm it. It was only slow on my main desktop machine, while on my lesser-powered laptop, running a practically identical Arch Linux setup, those file operations were quick and the programs ran fine. None of these programs use SDL. I couldn't/didn't progress any further then, but it's good to find some pointers here for further investigation (ie. the libinput issue).

Funnily enough I'm pretty sure I played Papers Please on this machine at length without problems but I think that was probably the Windows version through Wine.


Wait, why is `close` in libpthread.so?


> why is `close` in libpthread.so?

That's because close() is a pthreads "cancellation point" (see https://man7.org/linux/man-pages/man7/pthreads.7.html for details), so it needs special handling when the process is using pthreads. If the process does not link to libpthread.so, the implementation in libc.so (which probably doesn't have cancellation point support) will be used.


I believe that a few libc functions are reimplemented in libpthread, the idea being that if you don’t link to pthreads, you don’t need the overhead (locking, etc.) that is needed in multithreaded situations. Feels a bit antiquated now…

As for why close specifically though, that’s a good question. I wonder if it has something to do with special libc treatment of the standard fds or anything like that.


As you can see in the disassembly, it has to do with implementing async cancellation. I think they wrap many blocking syscalls in the same way. https://man7.org/linux/man-pages/man3/pthread_cancel.3.html


Why would you not let another thread do this nasty kind of polling and let the main loop check for a changed result, if at all?


Close, on a synthetic fd with no I/O performed, should not take 100ms per call. This is a Linux performance bug.


close can take arbitrarily long, it's a blocking operation.

Don't ever call close on the hot path.
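
If a close really must happen while the loop is running, one possible mitigation - admittedly the kind of thing the patch author calls a hack upthread - is to hand the fd to a throwaway thread, roughly like this sketch:

  #include <pthread.h>
  #include <stdint.h>
  #include <unistd.h>

  static void *close_worker(void *arg) {
      close((int)(intptr_t)arg);        /* the potentially slow part runs off the hot path */
      return NULL;
  }

  void close_async(int fd) {
      pthread_t t;
      if (pthread_create(&t, NULL, close_worker, (void *)(intptr_t)fd) == 0)
          pthread_detach(t);
      else
          close(fd);                    /* fall back to closing inline */
  }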


I wonder if using the Windows version with Wine/Proton does work better than using the Linux x64 version.


This is interesting. I never noticed these pauses using the native port from GOG on Ubuntu. I'm very sensitive to this kind of thing (low refresh rates on CRT monitors used to drive me crazy when nobody else noticed).

Will have to fire up my copy again. A good excuse to play this marvelous game again.


Interesting investigation. I don't remember having this problem when playing the GOG version, but maybe I didn't have a lot of input devices?


Great post, learned a bunch of things. Only one blog post, and no RSS (yet?) unfortunately.


Glory to Arstotzka


I guess that is the kind of challenge that comes with having Windows games on GNU/Linux: fixing them instead of playing them.


https://www.pcgamer.com/indie-dev-finds-that-linux-users-gen...

This dev found that Linux users returned high-quality bug reports.

>Only 3 of the roughly 400 bug reports submitted by Linux users were platform specific, that is, would only happen on Linux.

While this post is about a Linux-specific bug, in general Linux users can end up identifying underlying bugs that affect all platforms.


Isn't this a native port though?

I remember playing Papers Please even before it had a native port, and enjoying it with zero problems.


Apparently I got that wrong, point still stands though.


It does if you sell on platforms that you don't intend to support.


The goal of stuff like WINE and Proton is exactly to enable using games on a platform that their creators never intended to support.


It's not like Windows doesn't have its own issues to fix. Except developers do it anyway, because of the market size. Windows isn't perfect or better for gaming.


> Windows isn't perfect or better for gaming.

This assessment depends entirely on the perspective.

From a developer's POV, Windows definitely is the better platform, as it's very monolithic in that you can rely on the presence and longevity of APIs. Depending on the dev's influence on the market and the success of the game, you even get free optimisation, support, and bug fixes from h/w vendors in the form of game-specific driver patches.

From a gamer's POV, Windows has advantages as well, since bugs are rarely OS-related and h/w vendors offer a lot more features OOTB.

If you love tinkering with the OS and don't care if some titles or features just won't work, Linux is a valid option for gaming. Otherwise Windows is objectively the better option by default, since I can rely on the games working with all available features (e.g. multiplayer).


> you can rely on the presence and longevity of APIs

That can be moot. Arguably, I can run more Windows games on Linux using Wine than on actual Windows, especially the older those games are.

Optimizations or work on drivers done by outside developers isn't unusual for Linux too. In fact something like Cyberpunk 2077 became playable on Linux without CDPR getting involved, except for them providing the game to Mesa and Wine developers before the release. And they even added a whole Vulkan extension to make it more playable without CDPR lifting a finger.

Overall I'd say Windows offers no advantages besides being more entrenched among gaming developers for historic reasons.

If Linux provided the same market size as Windows, developers would work with it no matter the OS-specific idiosyncrasies, same as they do with Windows now.


> Overall I'd say Windows offers no advantages besides being more entrenched among gaming developers for historic reasons.

So if you were a game developer you wouldn't consider the fact that for every 1 Linux gamer on Steam there are 99 Windows gamers and thus optimize for the much larger market?


That's exactly what I said above, Windows is addressed not because it's better for gaming or is somehow superior to Linux in avoiding issues like above, but because developers don't want to ignore its market size.

With a comparable market size, Linux wouldn't be ignored either, regardless of its issues.



