
Sigh. Yes. I have been there and done that (more or less) and it sucks. The root problem is that data scientists really want to use Python for machine learning, but wrapping a Python model in a service that uses CPU and memory efficiently is really difficult.

Because of the GIL, you can't make predictions at the same time you're processing network IO, which means that you need multiple processes to respond to clients quickly and keep the CPU busy. But models use a lot of memory and so you can't run all THAT many processes.

I actually did get the load-then-fork, copy-on-write thing to work, but Python's reference counting and garbage collector write into the object headers, which dirties the shared pages and triggers copying, so the processes gradually consume more and more memory as the model becomes less and less shared. OK, so then you can terminate and re-fork the processes periodically and avoid OOM errors, but there's still a lot of memory overhead, and CPU usage is pretty low even when there are lots of clients waiting and...
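Roughly, the setup looked like this (a minimal sketch, assuming the fork start method as on Linux, with a toy stand-in for the real model):

  import multiprocessing as mp

  class Model:
      # Stand-in for a large model; in practice this would be something
      # like pickle.load(open("model.pkl", "rb")) on a scikit-learn estimator.
      def __init__(self):
          self.weights = list(range(1_000_000))  # pretend this is hundreds of MB

      def predict(self, batch):
          return [x * self.weights[x % len(self.weights)] for x in batch]

  # Load once in the parent, BEFORE forking, so the workers initially share
  # these pages copy-on-write instead of each holding a private copy.
  MODEL = Model()

  def predict(batch):
      # Runs in a forked worker; only reads MODEL.
      return MODEL.predict(batch)

  if __name__ == "__main__":
      # maxtasksperchild is the "terminate and re-fork periodically" trick:
      # each worker is replaced after N tasks, which bounds how much of the
      # initially shared memory it has dirtied and re-copied before it dies.
      with mp.Pool(processes=4, maxtasksperchild=1000) as pool:
          print(pool.map(predict, [[1, 2, 3], [4, 5, 6]]))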

You know, I hear Julia is pretty mature these days, and hey, didn't Google release this nifty C++ library for ML, and notebooks aren't THAT much easier. Between the GIL and the complete insanity that is Python packaging, I think it's actually the worst possible language to use for ML.



She's talking about green threads, which are different from regular threading in Python. Under NodeJS/Python-style green threads, only IO calls run concurrently with a single computation task. There is no parallelism under either style of threading, unless you count concurrent IO as parallelism.

She is basically complaining about a pattern that was popularized by NodeJS and emulated in Python by older libraries like gevent, Twisted, and Tornado. Python 3 now uses the async/await keywords as an API around the same concepts those older libraries implemented.
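For what it's worth, here's a bare-bones asyncio sketch of that pattern (no real web framework; the names and timings are made up). The IO waits overlap on a single thread, but any CPU-bound Python call blocks the whole event loop:

  import asyncio

  async def handle_request(i):
      # Each await is a point where the event loop can switch to another
      # request, so the IO waits overlap on one thread.
      await asyncio.sleep(1.0)  # stands in for a network or database call
      return f"response {i}"

  def predict(batch):
      # Plain CPU-bound Python: nothing yields to the event loop here,
      # so no other coroutine makes progress while it runs.
      return sum(x * x for x in batch)

  async def main():
      # Ten "requests" finish in about 1 second total, not 10,
      # because their IO waits are concurrent...
      responses = await asyncio.gather(*(handle_request(i) for i in range(10)))
      # ...but a single CPU-heavy call still stalls everything else.
      print(len(responses), predict(range(100_000)))

  asyncio.run(main())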

This has nothing to do with the GIL.


In the case of the article, you are correct. I have a slightly different case where I'm wrapping a scikit-learn model. We're NOT just calling another service and waiting for a response; we're doing computation, in Python. So the GIL actually is a problem.


> Because of the GIL, you can't make predictions at the same time you're processing network IO

Why not? If the model is a Python wrapper around some C/C++ library, then the GIL can be released, and this is actually the recommended approach, used by almost all CPU-intensive Python libraries - https://docs.python.org/3/c-api/init.html#releasing-the-gil-... You can have parallel computations inside your wrapped C extension while the Python interpreter is processing IO.
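A rough sketch of what that looks like from the Python side (assuming the heavy work lands in something like NumPy/BLAS, which drops the GIL during the computation; the shapes and names here are made up):

  import asyncio
  import numpy as np

  # Stand-in "model" weights. The assumption is that the matrix multiply
  # dispatches to BLAS, which releases the GIL while it runs.
  WEIGHTS = np.random.rand(2000, 2000)

  def predict(batch):
      # CPU-heavy, but spends most of its time inside C code with the
      # GIL released.
      return batch @ WEIGHTS

  async def handle_request(batch):
      loop = asyncio.get_running_loop()
      # Push the computation onto a worker thread. Because the heavy part
      # releases the GIL, the event loop keeps servicing network IO while
      # the prediction runs.
      return await loop.run_in_executor(None, predict, batch)

  async def main():
      batches = [np.random.rand(64, 2000) for _ in range(8)]
      results = await asyncio.gather(*(handle_request(b) for b in batches))
      print(len(results), results[0].shape)

  asyncio.run(main())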



