
Yes, after reading through the article it's not very clear to me what the actual problem is with using Python/Gunicorn/Gevent.

The author seems to be saying that if a worker is busy doing CPU-intensive work (is decoding JSON really that intensive?) then other requests accepted by that worker have to wait for that work to complete before they can respond, and the client might time out while waiting?

If that's the case:

1. Wouldn't this affect any language/framework that uses a cooperative concurrency model, including node.js and ASP.NET or even Python's async/await based frameworks? How is this problem specific to Python/Gunicorn/Gevent?

2. What would be a better alternative? The author says something about using actual OS-level threads, but I thought the whole point of green threads was that they are cheaper than switching OS threads?



1. Yes, it would affect other things. This is just an illustrative example.

2. Green threads have lower overhead, but it's a false economy if it causes you to needlessly redo work because of timeouts that could have been avoided.

Which it seems it must, because the kernel has no way to know whether a green thread is making that epoll call because it's ACTUALLY idle, or because it's merely willing to juggle a second (or third) thing while it has something on the back burner. So the kernel indiscriminately assigns work to threads without regard for whether they're already juggling a lot or nothing at all.

Whereas native threads never ask the kernel for more work while they're blocked on something else, because they are literally blocked and thus won't be making that epoll system call.

(The article also mentions something about LIFO policy, which exacerbates the problem because it favors assigning work to the process which is likely to already have most of it.)
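
To make the starvation effect concrete, here's a minimal gevent sketch (my own, not from the article), assuming gevent is installed:

    import time
    import gevent

    def io_bound(i):
        t0 = time.time()
        gevent.sleep(0.1)  # "I/O" wait: should return after ~0.1s
        print("greenlet %d waited %.2fs" % (i, time.time() - t0))

    def cpu_bound():
        t0 = time.time()
        while time.time() - t0 < 2:  # pure-Python work: never yields to the hub
            pass

    # Spawn the I/O greenlets first, then the CPU hog. The sleep timers
    # expire at 0.1s, but nothing can run until cpu_bound() gives up
    # control, so each io_bound greenlet reports ~2s instead.
    gevent.joinall([gevent.spawn(io_bound, i) for i in range(3)] +
                   [gevent.spawn(cpu_bound)])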


How come there's no work stealing? Green threads are supposed to be backed by some N:M thread pool, no?

Also, isn't the problem that JSON decoding (or whatever computation) simply blocks the thread, so the other green threads cannot proceed at all, because there are simply no safepoints (yield points) inside these low-level functions?

And in all these cases, shouldn't the application estimate the work (e.g. in the case of JSON, whether the string is longer than 100K), and if it's too big just put it on a dedicated heavy-compute N:N thread pool?

For Python that's best practice anyway because of the GIL, no?
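
Something like this, roughly; the 100K cutoff is the number from above and the pool size is arbitrary. Note it has to be a process pool, precisely because of the GIL:

    import json
    from concurrent.futures import ProcessPoolExecutor

    pool = ProcessPoolExecutor(max_workers=4)  # pool size is an assumption

    def decode(payload):
        # Route big documents to another process; the GIL means a *thread*
        # pool wouldn't add parallelism for pure-Python decoding.
        if len(payload) > 100_000:  # the 100K threshold from above
            return pool.submit(json.loads, payload).result()
        return json.loads(payload)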


Is there an n:m green-threading mode for Python? What does gunicorn/gevent do? Sounds like “not that”.

Yes best practices and all that, but that really doesn’t sound like what’s happening.


Python was always more about breadth than depth. (CPython is full of known inefficiencies, but it's been with us since 1989, and the core dev who has worked most on performance - Victor Stinner - thinks the best way forward is to introduce subinterpreters: https://github.com/vstinner/talks/blob/master/2019-EuroPytho... )

Oh, that PDF is interesting: Python 3.8 has shared memory for multiprocessing, so no more pipe objects between processes.
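
For illustration, a tiny sketch of that 3.8 addition (my example, not from the PDF):

    # multiprocessing.shared_memory, new in Python 3.8: a block of memory
    # addressable by name from any process, no pipes or pickling involved.
    from multiprocessing import shared_memory

    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:5] = b"hello"

    # A second process (or this one) attaches by name instead of
    # inheriting a pipe from the parent.
    other = shared_memory.SharedMemory(name=shm.name)
    print(bytes(other.buf[:5]))  # b'hello'

    other.close()
    shm.close()
    shm.unlink()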

Furthermore, extensions and internal stuff have always had the ability to release the GIL and do their own thing (for example, on a thread pool, or using async/non-blocking I/O). But I have no idea about Gevent. I never liked it. (Just like Twisted/Tornado, it was too much magic for too little benefit.)


I don’t know what gevent is doing. But if you have a global interpreter lock then M:N might not be worth it, since only one thread will make progress (outside of syscalls, which are non-blocking).


Surprised nobody mentioned asyncio.run_in_executor yet. It's designed to offload the event loop from long-running CPU-bound tasks by moving them to a thread pool (or a process pool if you're worried about the GIL). Eventually that pool will obviously also get starved given enough load, but at least you won't have CPU work blocking IO and vice versa. The tricky thing is knowing when an operation might grow too slow for the IO thread, given dynamic inputs.

https://docs.python.org/3/library/asyncio-eventloop.html#asy...
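
A minimal sketch of the pattern; the executor choice and the demo payload are my assumptions, not from the linked docs:

    import asyncio
    import json
    from concurrent.futures import ProcessPoolExecutor

    cpu_pool = ProcessPoolExecutor()  # processes, to sidestep the GIL

    async def handle(payload):
        loop = asyncio.get_running_loop()
        # Passing None here would use the default *thread* pool instead.
        return await loop.run_in_executor(cpu_pool, json.loads, payload)

    if __name__ == "__main__":
        print(asyncio.run(handle('{"ok": true}')))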


That's because `run_in_executor` doesn't spread CPU usage by default. All it does is wrap functions in threads so you can call them async. It doesn't create multiple processes, so you're still limited to a single core in Python.


See example 3 in the link above.


Decoding JSON is surprisingly intensive. Check this out to see what's going on with it in .NET: https://michaelscodingspot.com/the-battle-of-c-to-json-seria...

Node.js will have that issue, and in fact the stdlib JSON encoding/decoding can't even be paused, so once you start processing something you're stuck until it's done. You could, however, write an incremental serializer/deserializer that spreads processing out across many event-loop cycles to mitigate this (rough sketch below).

Go, ASP.NET, and others not so much, depending, because their schedulers can pause and resume tasks (on top of being threaded).
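
Here's a rough Python analogue of that incremental idea (invented example; the stdlib json module can't do this, so this hand-rolls a trivial encoder):

    import asyncio

    async def encode_numbers(items, chunk=10_000):
        # Do one slice of work per event-loop pass instead of all at once.
        parts = []
        for i in range(0, len(items), chunk):
            parts.append(",".join(map(str, items[i:i + chunk])))
            await asyncio.sleep(0)  # yield so other tasks get a turn
        return "[" + ",".join(parts) + "]"

    print(asyncio.run(encode_numbers(list(range(5)))))  # prints [0,1,2,3,4]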


> 1. Wouldn't this affect any language/framework that uses a cooperative concurrency model, including node.js and ASP.NET or even Python's async/await based frameworks? How is this problem specific to Python/Gunicorn/Gevent?

I think she's against anything that has that problem, but not every green-thread implementation does. Go, for example, doesn't: there were 4 CPU threads (I think) and only 2 things needing to be done, and with Go's M:N scheduling those 2 things would be sure to both be running.


> The author seems to be saying that if a worker is busy doing CPU-intensive work (is decoding JSON really that intensive?) then other requests accepted by that worker have to wait for that work to complete before they can respond, and the client might time out while waiting?

Yes. Decoding JSON with Python is CPU-intensive.

This is a very simple shell script around Python that is designed from the get-go to crash with an exception. However, it may not be the exception you expect:

    n="$(python3 -c 'import math; import sys; sys.stdout.write(str(math.floor(sys.getrecursionlimit() - 4)))')"
    left="$(yes [ | head -n "$n" | tr -d '\n')"
    echo "$left" | python3 -c 'import json; print(json.loads(input()))'
Python's docs suggest you should arrive at one of two errors here: MemoryError (we are trying to parse something sizeable) or json.JSONDecodeError (the JSON is invalid).

You won't. You'll hit RecursionError.

Because, despite how badly Python deals with recursion, the JSON library depends on it extensively. That means a huge call stack gets built every time you decode nested JSON in Python (dictionaries, function-call overhead, etc.), only for it to be thrown away.
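
You can reproduce the same failure without the shell wrapper:

    import json
    import sys

    try:
        # Nest deeper than the interpreter allows: the parser recurses once
        # per '[' and trips the recursion check before it ever notices the
        # JSON is invalid.
        json.loads("[" * (sys.getrecursionlimit() * 2))
    except RecursionError:
        print("RecursionError, not JSONDecodeError")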


Decoding JSON in Python without using a C library is indeed CPU-intensive.


Python's standard library json module uses a C extension module for the CPU intensive stuff.


Yes, decoding JSON in python is much more efficient if you are not actually decoding JSON in python.


It's damn slow too compared to protobufs, or, especially, FlatBuffers.


Yeah but I think it's reasonable to assume no one using Python in production actually does this.


(We detached this subthread from https://news.ycombinator.com/item?id=22514448, which it's more on-topic than.)


> (is decoding JSON really that intensive?)

In Python, everything is generally CPU-intensive compared to what it would be in compiled languages. Even though things like JSON decoding usually happen in a C library, Python programs that do close to nothing still use way more CPU than you would if you were running on the JVM, or in Go, C, whatever.

> Wouldn't this affect any language/framework that uses a cooperative concurrency model, including node.js and ASP.NET or even Python's async/await based frameworks? How is this problem specific to Python/Gunicorn/Gevent?

CPU-boundness affects all of these platforms, yes. It affects Python and other interpreted languages the most, however, because those platforms become CPU-bound the most quickly. Also, applications written in scripting languages tend to have a lot of business logic going on in the first place; after all, if you just wanted to serve static pages you could use Apache with the event MPM, and if you wanted to proxy HTTP requests you'd use HAProxy; both are event-based systems that are very much not CPU-bound.

But yes, most importantly, Python's asyncio system is just as impacted by these same issues, and I would have preferred she address that, as asyncio is part of the standard library now and is way more popular than gevent.

> What would be a better alternative? The author says something about using actual OS-level threads, but I thought the whole point of green threads was that they are cheaper than switching OS threads?

I will grant she lost me a bit with the "use a real RPC system with <feature> <feature> <feature>" thing. The "load the application in the child process" thing is also pretty typical: a worker process should obviously have either threads or green threads in use so that each process can handle multiple concurrent requests, but only as many as you'd want handled by one core, since the GIL is going to enforce that (another thing you wouldn't have to deal with in compiled languages like those mentioned above). But it's typical that child processes are going to have a mostly original copy of things.

But as far as the "context switching" thing goes, I've yet to see benchmarks showing that the overhead of OS-level context switching is actually more of a performance burden than the less frequent, but more work-intensive, context switching that user-space schemes like asyncio have to do. If you are writing a logic-heavy, or even logic-just-a-bit, service that handles requests in Python, you will have to worry about CPU-bound issues all the time anyway. Using regular threads with processes, like what you get with something like mod_wsgi, will allow individual processes to attend to web requests more evenly. With mod_wsgi you can configure worker daemons that run multiple OS-level threads, and you can also have multiple daemon processes.
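
For reference, the mod_wsgi shape I mean is roughly this (names, paths, and counts are placeholders, not a recommendation):

    # Apache httpd config sketch: four daemon processes, each running
    # five real OS threads.
    WSGIDaemonProcess myapp processes=4 threads=5
    WSGIProcessGroup myapp
    WSGIScriptAlias / /path/to/app.wsgi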

I'm not sure if the multi-process model used by mod_wsgi has solved the accept() problem. In my experience, however, the bigger problem is a service configuring itself to allow 1000 greenlets in each process when each process is realistically capable, from a CPU perspective, of handling maybe 5 or 10 concurrent requests; there's no mechanism that ensures each process gets an even balance of requests. That is, you might have all your requests waiting in one process, because you told it that it can handle 1000 at a time, while other processes sit idle.
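
Concretely, this is the gunicorn knob in question; the numbers below are illustrative, and the point is just to stop advertising capacity you don't have:

    # gunicorn.conf.py sketch: cap per-process concurrency near what a
    # core can actually chew through.
    workers = 8                # illustrative; usually tied to core count
    worker_class = "gevent"
    worker_connections = 10    # greenlets per worker; the default is 1000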

TL;DR I'm in the "event based programming is extremely overrated in Python" camp.


> Python programs that do close to nothing still use way more CPU than you would if you were running on the JVM, or in Go, C, whatever.

Yeah, but that's not exactly shocking news to anyone, is it? People generally choose Python for other reasons (productivity, library ecosystem, etc.) because the performance is "good enough" for most web apps, and if you reach a traffic level where performance becomes an issue, that's a good problem to have that you can optimize for later. (Like Facebook did with PHP, Twitter did with Ruby, etc.)

> But yes, most importantly, Python's asyncio system is just as impacted by these same issues, and I would have preferred she address that, as asyncio is part of the standard library now and is way more popular than gevent.

Right, the blog post gave me the impression she was calling out the combination of Python/Gunicorn/Gevent specifically for some reason. But if the underlying goal was just to point out that Python is slow, then I'm curious what people think the right solution is. Just switch away from Python and use Go or something else?


I love working in Python, and I came here just to point out that it's pretty obvious one should not use it if raw performance is a concern (which it isn't in many situations). I was reminded of this Cal Henderson talk at DjangoCon:

https://i.postimg.cc/Dy81R2QQ/yourmom.png

That said, I wrote a bit of Go recently and the experience was pleasant enough that I'd consider it for future work (should the needed libraries exist), as the extra performance and ease of deployment come with very little effort from the developer.


The underlying goal was to show that Python gets CPU-bound very easily, which is not at all the same as saying "Python is slow". If I wanted slow, I'd use Ruby.


Having used both, my experience is that I can make Ruby fast, while making Python fast starts with "let's stop using python in this code path" ...


It's a shame this is not a top-voted comment. Not many people understand the intricacies and tradeoffs involved in all these approaches; they get burned, then blame the tools.



