Hacker News

GP is aware. mmap makes files act like memory. Memory is always synchronous, thus blocking, so mmaped files are always blocking. I'm surprised OP even found this surprising. It should be completely obvious.



The term "blocking" has diverged between various communities and it is important to recognize those differences or you'll have dozens of people talking past each other for hundreds of messages as they all say "blocking" and think they mean the same thing, and then get very confused and angry at all the other people who are so obviously wrong (and in their context, they are) but just can't see it.

It is obvious that a given "execution context" (my generalized term for a thread, an async job, and anything else of a similar nature) will, when it reaches for a value from an mmap'd file, be blocked until that value is available. However, different communities have different ideas of an "execution context".

Threaded language users tend to see them as threads, so while a given thread may be blocked the rest of the program can generally proceed. (Although historically the full story around file operations and what other threads can proceed past has been quite complicated.)

Async users on the other hand are surprised here because the operation is blocking their entire executor, even though in principle it ought to be able to proceed with some other context. Because it's invisible to the executor, it isn't able to context-switch.

In this case, the threaded world view is reasonably "obvious" but it can be non-obvious that a given async environment may not be able to task switch and it may freeze an entire executor, and since "one executor" is still a fairly common scenario, the entire OS process.

(I am expressing no opinion about whether it must block an executor. System calls come with a lot of flags nowadays and for all I know there's some way an async executor could "catch" a mapped access and have an opportunity to switch. I am taking the original article's implicit claim that there isn't one happening in their particular environment at face value.)
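To make the executor-level view concrete, here is a minimal Python asyncio sketch (the names and the toy data are mine for illustration; the same dynamic applies to tokio or any cooperative executor): a task that does only plain memory reads contains no suspension point, so a sibling task cannot run until it finishes, which is exactly what happens when an access to mmap'd memory stalls the task.

```python
import asyncio

async def ticker(events):
    # Cooperative task: it only gets control at await points.
    for _ in range(3):
        events.append("tick")
        await asyncio.sleep(0)

async def sync_reader(data, events):
    # A plain memory read with no await inside: the executor cannot switch
    # away, even if a page fault were to stall this read mid-way.
    events.append(f"read done: {sum(data)}")

async def main():
    events = []
    data = bytes(range(10))
    # sync_reader runs to completion before ticker ever ticks, because it
    # contains no suspension point for the executor to use.
    await asyncio.gather(sync_reader(data, events), ticker(events))
    return events

events = asyncio.run(main())
print(events)
```

The read finishes before a single tick happens, even though both tasks were started "concurrently".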

As long as you do not distinguish how various communities use the term "blocking", you will get very, very deeply nested threads full of arguments about something that, if you just are careful with your terminology, isn't complicated for anyone from any subculture to understand.


At first I thought the title meant the mmap() call itself blocks, which I figured could be slightly surprising. But it seems they're referring to I/O on the mapped file? I'm also baffled, how could it possibly not block?


Well, the OP could probably get their benchmarks to run faster if they passed MAP_POPULATE, which would make the mmap call block for longer.

In a pedantic sense, the mmap call is already blocking, because any system call takes longer than e.g. stuffing an sqe onto a queue and then making 0 syscalls, and it could take a variable amount of time depending on factors beyond that one process's control. I don't think anyone actually needs to offload their non-MAP_POPULATE mmaps to a separate thread or whatever though.
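As a sketch of the MAP_POPULATE trade-off (Python for brevity; the constant is Linux-only and exposed by Python 3.10+, so the code falls back gracefully where it's absent): the mmap() call itself blocks for longer, and in exchange later reads are plain memory accesses.

```python
import mmap
import os
import tempfile

# Small stand-in file to map.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))  # 1 MiB
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # MAP_POPULATE asks the kernel to fault every page in during the
    # mmap() call itself: the syscall blocks longer, but subsequent
    # reads are much less likely to fault.
    flags = mmap.MAP_SHARED | getattr(mmap, "MAP_POPULATE", 0)
    m = mmap.mmap(fd, 0, flags=flags, prot=mmap.PROT_READ)
    first = m[:4]
    m.close()
finally:
    os.close(fd)
    os.unlink(path)

print(first)  # b'xxxx'
```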


Yes, it couldn't not block, it's obvious... but I've encountered non-trivial amounts of "magical thinking" around both async/await ("go-fast juice") and mmap ("go-fast juice") separately, so the intersection surely has a bunch of magical thinking too, where people haven't taken the time to properly think through what's going on.

Hence, my investigation to try to make the "obvious" conclusion obvious to more people.

(Author here)


One trick is to read the file into memory at application startup. All data is paged in, so it's hot and ready to go: no page faults. In the early 2000s, I worked on a near-real-time system that used memory-mapped I/O. At app startup, several gigabytes were read into memory. It never blocked under normal circumstances (in production) since the systems were provisioned with enough memory.
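The startup pre-faulting trick can be sketched like this (Python, stand-in file sizes; touching one byte per page is enough to force every page resident):

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Stand-in data file (8 pages); the system described used gigabytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(8 * PAGE))
    path = f.name

fd = os.open(path, os.O_RDONLY)
m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)

# Warm the mapping at startup: touch one byte per page so every page is
# faulted in now, rather than at some latency-sensitive moment later.
checksum = 0
for off in range(0, len(m), PAGE):
    checksum ^= m[off]

pages = len(m) // PAGE
print(pages)  # 8

m.close()
os.close(fd)
os.unlink(path)
```

Of course, as the reply below notes, the pages stay hot only while there's enough memory for them to stay resident.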


But that requires knowing your RAM is big enough to fit the file. It can't work in general.


> how could it possibly not block?

When the bytes being read are already in the cache. Hence the later part of the article where the author shows that reading mapped memory can be significantly faster.


It still blocks. It just completes orders of magnitudes faster.


No, if the memory-mapped page you're accessing is in RAM, then you're just reading the RAM; there is no page fault and no syscall and nothing blocks.

You could say that any non-register memory access "blocks" but I feel that's needlessly confusing. Normal async code doesn't "block" in any relevant sense when it accesses the heap.


When dealing with async I think it is very relevant to think of exactly the points where control can be switched.

As such, a regular memory read is blocking, in that control will not switch while you're doing the read (ie you're not doing anything else while it's copying). This is unlike issuing an async read, which is exactly a point where control can switch.

edit: As an example, consider synchronous memory copy vs asynchronous DMA-based memory copy. From the point of view of your thread, the synchronous copying blocks, while with the DMA-based copying the thread can do other stuff while the copying progresses.
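In user space without DMA, the usual analogue is to hand the copy to a worker thread and await it; a sketch with asyncio (`asyncio.to_thread`, Python 3.9+; the task names are mine):

```python
import asyncio

async def main():
    src = bytes(range(256)) * 4096  # ~1 MiB of source data
    order = []

    async def other_work():
        order.append("other work ran")

    # Synchronous copy: the event loop cannot switch until it finishes.
    sync_copy = bytes(src)

    # "DMA-like" copy: offload to a worker thread and await the result;
    # the await is a switch point, so other tasks can run meanwhile.
    copy_task = asyncio.create_task(asyncio.to_thread(bytes, src))
    await other_work()  # runs while the worker thread copies
    async_copy = await copy_task
    order.append("copy done")

    assert sync_copy == async_copy == src
    return order

order = asyncio.run(main())
print(order)  # ['other work ran', 'copy done']
```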


So what is the definition of "blocking" here? That it takes more than 1 µs?


As the author, I don't think there's a clear definition of "blocking" in this space, other than some vibes about an async task not switching back to the executor for too long, for some context-dependent definition of "too long".

It's all fuzzy and my understanding is that what one use-case considers being blocked for too long might be fine for another. For instance, a web server trying to juggle many requests might use async/await for performance and find 0.1ms of blocking too much, vs. a local app that uses async/await for its programming model might be fine with 10ms of "blocking"!

https://tokio.rs/blog/2020-04-preemption#a-note-on-blocking discusses this in more detail.


That the process/thread enters kernel mode and then is suspended waiting for IO or for some other event. As long as the thread is running your code (or is schedulable) it's not blocked. And then the async implementation can ensure your code cooperatively gives up the CPU for other code.


If your memory is paged out and you then access it, using your definition, it would block.

So, in the context of async code, there's no difference from the application perspective between reading mmap'ed data and reading "regular" data (ie memory from the regular paged pool), as both could incur blocking IO.

If you're lucky and the mmap'ed data is in the system cache, then reading that data will not block and is fast. If you're unlucky and your process has been swapped out, then doing a regular memory read will block and is slow.


Do you consider reading from a normal array (one not backed by a memory mapped file) to also be blocking?


If the memory has been paged to disk, I guess so?


In the languages and platforms I use, absolutely yes. Do you have some examples where a normal memory read is async?


Your definition of blocking is a bit different from my own. Synchronous is not always blocking. If the data is there, ready to go, there is no "blocking."

If you consider all memory reads to be "blocking", then everything must be "blocking". The executable code must, after all, be read by the processor. In an extreme case, the entire executable could be paged out to disk! This interpretation is not what most people mean by "blocking."


Fair point. I guess I conflate the two, because what's interesting to me, most of the time, is where does the control flow switch.

I never rely on synchronous IO being non-blocking when writing regular code (ie not embedded). As such reading from cache (non-blocking) vs disk (blocking) doesn't matter that much, as such. It's synchronous and that's all I need to reason about how it behaves.

If I need it to be non-blocking, ie playing audio from a file, then I need to ensure it via other means (pre-loading buffer in a background thread, etc etc).

edit: And if I really need it not to block, the buffer needs to reside in the non-paged pool. Otherwise it can get swapped to disk.


You don't yield to your cooperative-multitasking runtime while reading from it, which is obviously what everyone in this thread means, so it's not helpful to start telling them "you're using the word blocking wrong" apropos of nothing.


Why would it yield when reading from local memory? Are there any cooperative environments that do that? Seems like an unusual expectation.


> Do you have some examples where a normal memory read is async?

This hints at a way to make it work, but it would need the compiler (or explicit syntax) to make it clear you want to be able to switch to another task when the page fault triggers the disk read, and then return to a blocking access that resolves the read from memory once the I/O part has concluded.

It could look like a memory read but would include a preparation step.


Ha. Exactly.


Yeah, of course a synchronous call that might block the thread is blocking, I agree... but, if I didn't have the context of "we're in a comment thread about a blog post about mmap", I'm pretty sure I wouldn't flag `x[i]` on `x: &[u8]` (or any other access) as a synchronous call that is worth worrying about.

Hence the discussion of subtlety in https://huonw.github.io/blog/2024/08/async-hazard-mmap/#at-a...

It's obvious when pointed out, but I don't think it's obvious without the context of "there's an mmap nearby".

(Author here, this behaviour wasn't surprising to me, but felt subtle enough to be worth investigating, and get some concrete data.)





