> I don't think that's a commonly-accepted (or useful) definition of "blocking." By that definition, getpid(2) is blocking.
When it comes to expecting a specific duration, getpid() is blocking. If you run getpid() in a tight loop and then have performance issues, you can’t reasonably blame the system.
> This isn't a portable program; it's a Linux program
But the interface is a portable interface
> POSIX does not mandate that close blocks on anything other than removing the index from the fd table
And what if the fd-table is a very large hash table with high collision rate? How do you then specify how quickly close() should complete? 1ms/open fd? 10ms/open fd? Etc.
It should be clear that the problem here is that the author of the code had a faulty understanding of the system their code runs in. Today the issue was that close() just happened to be too “slow.” If the number of input devices were higher, say 2x more, the same issue would have manifested even if close() were 2x “faster.” No matter how fast you make close(), there is a situation in which this issue manifests itself. I.e., the application has a design flaw.
> Today the issue was that close() just happened to be too “slow.” If the number of input devices were higher, say 2x more, the same issue would have manifested even if close() were 2x “faster.” No matter how fast you make close(), there is a situation in which this issue manifests itself.
Close, on an fd for which no asynchronous IO has occurred, should be 10000x faster, or more. It’s unlikely a user will have even 100 real input devices. I agree the algorithm leaves something to be desired, but the only reason it is user-visible is the performance bug in Linux.
I’ve worked on performance in both userspace and the kernel and I think you’re fundamentally way off-base in a way we’ll never reconcile.
> I agree the algorithm leaves something to be desired, but the only reason it is user-visible is the performance bug in Linux.
The only reason it wasn’t user-visible was luck. Robust applications don’t depend on luck.
Something tells me you’ll think twice before calling close() in a time-sensitive context in your future performance-engineering endeavors. That’s because both you and I now know that no implementation of POSIX makes any guarantee about the runtime of close(), nor is any likely to do so in the future. That’s just reality kicking in. Welcome to the club :)
There's no guarantee for the runtime of any function. It's perfectly valid for the OS to swap your program instructions to disk, and then take seconds or even minutes to load it back.
It's effectively impossible to avoid depending on what you call "luck". The OS does not provide nearly enough guarantees to build useful interactive applications without also depending on other reasonable performance expectations.
> It's perfectly valid for the OS to swap your program instructions to disk, and then take seconds or even minutes to load it back.
It’s not valid to swap your program instructions to disk if you call mlock() on your executable pages. Indeed, performance sensitive applications do just that. https://man7.org/linux/man-pages/man2/mlock.2.html
> It's effectively impossible to avoid depending on what you call "luck". The OS does not provide nearly enough guarantees to build useful interactive applications without also depending on other reasonable performance expectations.
This is all self-evidently false. You likely wrote your comment on a POSIX-based interactive application. It just takes knowledge of how the system works and what the specifications are. Well-designed programs are hard to come by, but they do exist.
Does mlock itself have a guaranteed maximum execution time? Is it guaranteed to return success under the relevant conditions? While that is an excellent way to address the problem I mentioned, you still have to depend on more than just the guaranteed behaviour of the OS.
> You likely wrote your comment on a POSIX-based interactive application. It just takes knowledge of how the system works and what the specifications are.
I wrote my comment on an interactive POSIX application, yes, but I believe my browser depends on "reasonable performance" of OS-provided functions in order to be usable.
It would be a fun exercise to evaluate such a program that supposedly did not. For any given program, I suspect I could patch the Linux kernel in such a way that it still fulfilled all guaranteed behaviour while making the program unusable.
I agree the application should not have done this.
On the other hand, I also agree that "blocks for an indefinite time" is not a useful definition, despite being correct in theory. Perhaps a more pragmatic one would be a percentile per unit of time or compute, so that a consistently 100ms close() call, which is provably a bug, doesn't get lost in the definition.