> Go back and look. I said that it forks and then it imports your app. Your app ...

cakoose · on March 7, 2020

Even if you manage to preload everything you need, Python's reference counting mechanism will cause everything to be copied anyway.

Every time you access an object, its reference count is mutated, which will cause the memory page to be copied.

There are workarounds if you're willing to mess with the Python interpreter: https://instagram-engineering.com/copy-on-write-friendly-pyt...
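
A minimal sketch of the refcount point, assuming CPython (`sys.getrefcount` is CPython-specific, and `refcount_delta` is a hypothetical helper name). It only shows that taking another reference to an object changes the count stored in the object itself; the page copy after a fork follows from that write and is not measured here.

```python
import sys


def refcount_delta(obj):
    """Return how much obj's refcount changes when one more name refers to it."""
    before = sys.getrefcount(obj)
    alias = obj                      # a "read-only" use: just another reference
    after = sys.getrefcount(obj)
    del alias                        # drop the extra reference again
    return after - before


if __name__ == "__main__":
    config = {"debug": False}        # stands in for data preloaded before the fork
    print("refcount change from one extra reference:", refcount_delta(config))
```

The count lives in the object's header, so even code that never modifies `config` still writes to the memory that holds it.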

pas · on March 8, 2020

It still helps with the loaded .so files and whatnot. The code objects and other things that are immutable. (CPython refcounts those too?)

pdonis · on March 8, 2020

> The code objects and other things that are immutable. (CPython refcounts those too?)

CPython refcounts all objects. Refcounting is not required because of mutability; it's required because the interpreter needs to know when an object's memory can be reclaimed for something else.

I don't know if code objects specifically would have their refcounts mutated a lot, since typically they're only referenced by one object, the function that they're the code for. But function objects will have their refcounts mutated every time the function is called, since that sets up a stack frame that grabs a reference to the function object and then releases it when the function returns.

rcxdude · on March 8, 2020

The code and read-only data of .so files will be shared anyway

ghostwriter · on March 7, 2020

> Python's reference counting mechanism will cause everything to be copied anyway.

isn't it CPython specific though, and may not be observed when running on PyPy?

dfox · on March 8, 2020

It is. And in practice it is far from everything, only stuff that is actively directly referenced by the service code.

lozenge · on March 7, 2020

gc.freeze() reached Python 3.7 as mentioned there, I'm not sure how many web frameworks are using it though.
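
A rough sketch of how `gc.freeze()` might be used around a fork, following the pattern in the `gc` module documentation: disable the collector early in the parent, freeze right before `fork()`, re-enable in the child. `fork_worker` and `work` are hypothetical names, the sketch needs a platform with `os.fork`, and it is not how any particular web framework or server does it.

```python
import gc
import os


def fork_worker(work):
    """Fork a child that runs work(); return the child's exit status."""
    gc.freeze()                      # move all tracked objects to the permanent generation
    pid = os.fork()
    if pid == 0:                     # child process
        gc.enable()                  # collections here ignore the frozen objects
        try:
            work()
        except BaseException:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)   # parent: wait for the child to finish
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    gc.disable()                     # parent: no collections before the fork
    preloaded = [{"id": i} for i in range(100_000)]   # stands in for preloaded app state
    status = fork_worker(lambda: len(preloaded))
    print("child exit status:", status)
    print("frozen objects:", gc.get_freeze_count())
```

Note that this only keeps the garbage collector from touching the frozen objects. Ordinary refcount updates on them still happen, so it reduces copying rather than eliminating it.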

nopurpose · on March 7, 2020

> If you're using a standard framework like Django or Flask then this works really well and without much effort.

I dug into it ~year ago, Django loads almost everything lazily, so simple --preload did next to nothing. I had to write code to load app for real at import time, exact thing article and common wisdom tells us not to do.
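
A sketch of what "load the app for real at import time" could look like, under the assumption that the WSGI entry module is what gets imported in the master process when `--preload` is on. `warm_up` and `EAGER_MODULES` are hypothetical names, and the module list is a placeholder, not anything Django provides.

```python
import importlib


def warm_up(module_names):
    """Import each named module immediately and return the names that loaded."""
    loaded = []
    for name in module_names:
        importlib.import_module(name)   # forces the import now instead of on first request
        loaded.append(name)
    return loaded


# At module level in the WSGI entry module, so it runs once before workers fork.
EAGER_MODULES = ["json", "decimal"]     # placeholders; a real app would list its own modules
warm_up(EAGER_MODULES)
```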

orf · on March 8, 2020

Django loads some things lazily but not almost everything. Nearly all your imports should be loaded by the preloaded application and shared across forks (COW semantics aside), and this usually takes up a non-trivial amount of memory. The things that are lazy are usually lazy for a reason - database connections, caching etc.

I believe the i18n system is also lazily loaded and depending on the languages you configure it can take up a fair bit of memory.
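
To illustrate why things like connections are usually left lazy, here is a small sketch of a per-process lazy holder. `PerProcess` is a hypothetical helper, not part of Django, and the `factory` stands in for whatever opens the real connection. The idea is that a resource created in the preloading parent should not be silently reused by a forked worker.

```python
import os


class PerProcess:
    """Build a resource on first use, once per process (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory
        self._pid = None
        self._value = None

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            # First use in this process, including first use after a fork:
            # build a fresh resource instead of reusing the parent's.
            self._value = self._factory()
            self._pid = pid
        return self._value


if __name__ == "__main__":
    connection = PerProcess(lambda: object())   # object() stands in for a real connection
    assert connection.get() is connection.get() # same process, same resource
```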

closeparen · on March 7, 2020

The whole discipline Rachel writes about is clearly intended for mature, scaled operations where outages and inefficiencies are legitimately worth much more than the systems wizards to stop them. There’s a time and a place for “move fast and break things” and if that’s where you are, it’s probably not for you.

worik · on March 7, 2020

"The whole discipline Rachel writes about is clearly intended for mature, scaled operations where outages..."

That is not true. RotB is describing software inefficiencies that we learnt to do without 20-plus years ago.

Because the Python hackers that built all these tools did not pay attention, when they built new tools they recreated the old problems.

Worik's 23.6918th rule of creativity: It is easier to write than read.

FridgeSeal · on March 8, 2020

I don’t think you guys are saying different things here.

The article in this case is describing a bunch of common processes/optimisations/features that we have learnt to be critical for effective and efficient running of software. The author does this because the audience she writes for is, as the previous comment puts it “mature, scaled operations where outages...” etc etc

ary · on March 7, 2020

> You can just pass `--preload` to have gunicorn load the application once. If you're using a standard framework like Django or Flask and not doing anything obviously insane then this works really well and without much effort. Yeah I'm sure some dumb libraries do some dumb things, but that's on them, and you for using those libraries.

It's not always trivial to ensure none of your dependencies have import-time side effects. Sometimes the productivity/business benefit provided by the dependency outweighs the pain introduced by the side effects.

Skunkleton · on March 7, 2020

If spinning up a few more workers will solve a performance problem for you, it’s probably worth the time to throw the preload flag on there and see what it does to your test suite. Since you are already cost optimizing at this point you probably have the time.
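
For reference, a sketch of the same flag expressed in a gunicorn config file, which is plain Python. `preload_app` and `workers` are gunicorn settings; the worker count and the file name are assumptions for illustration.

```python
# gunicorn.conf.py (assumed file name)
preload_app = True   # config-file equivalent of passing --preload
workers = 4          # assumed value; tune for the host
```

It would be picked up with something like `gunicorn -c gunicorn.conf.py myapp.wsgi`, where `myapp.wsgi` is a placeholder module path.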

kodablah · on March 7, 2020

> everyone else will keep quickly shipping things with it while you worry about five processes waking up from a system call at once or an extra 150mb of memory usage

With the current state of ecosystems, this quality-vs-quantity mutual exclusivity is much less pronounced. These days, you can fire up these services as quick or quicker than in Python with better performance and resource usage that is also more maintainable. Unless you speak of highly ecosystem-dependent libraries (e.g. ML), Python defenses that rely on time to market say more about the author's narrow comfort than general expediency.

nimish · on March 7, 2020

Of course weird domain specific libraries are a major reason to use python. I can get my oddball service up in a few days vs weeks or months to replicate some bizarre library.

If all you're doing is basic web stuff then sure, you can do it in Lang du jour, but even after a decade Go doesn't have a competent XML library, for example.

Time to market is killer. Python buys you time to determine whether your product is even worth building. Deployment sucks though.

e: I literally could not have written the services in my current role in a language other than Java or Python without replicating 100kloc libraries. Java would have required a bunch of work to integrate with the other services we had. So: python. If that costs me an extra $1k/mo for servers but gets us a customer paying $100k a month, was it wasteful?

lmm · on March 7, 2020

Why was it easier to integrate with those other services from Python than from Java? Only because you already picked Python for those, right?

(I agree with you about time to market being the important thing. I don't think Python is winning that game any more though: its dependency management and deployment has fallen behind the rest of the industry, and newer languages have largely caught up with its conciseness without having to make the same compromises)

nimish · on March 8, 2020

Not really. Setting up a scalable Java service is complicated even with tooling like spring boot, and Java has better libraries for our use case, but the feedback cycle and general code velocity was much slower since we'd be on java 8 among other things. Plus learning spring or a Java ee framework is tantamount to learning a whole new language, so it wasn't worth the time then.

lmm · on March 8, 2020

> Setting up a scalable Java service is complicated even with tooling like spring boot

How so? You can use much the same techniques you would in Python, or you can deploy a .war to a bunch of application servers and achieve what you'd do with docker/kubernetes/etc. in a much simpler way. You've also got a much better chance of scaling up with a single instance and not needing to scale horizontally.

> the feedback cycle and general code velocity was much slower since we'd be on java 8 among other things

What's keeping you on Java 8? Major JVM version upgrades are much easier and safer than even minor Python upgrades. I'm not doubting your situation, but old version of one language versus new version of another is not really a fair basis for comparison.

> Plus learning spring or a Java ee framework is tantamount to learning a whole new language it wasn't worth the time then.

Sure. I'm not saying it's wrong to choose to stick with the technology you're currently using - there's definitely a cost to switching or learning something new. But it's worth being conscious of whether your technology choices are being driven by legacy constraints and whether you'd want to make a different choice on a green-field project.

nimish · on March 8, 2020

1. "Deploying a war to an application server" is a giant pain in the ass when you don't have people deeply familiar with EE app servers and tuning them. Python is not the best choice here but little can beat go's scp deploy

2. Tell that to every library that relied on java EE being shipped with the JDK. A lot of the reason to use Java was these libraries that are, if you're lucky, in maintenance mode. They're still stuck on 8 since it's not worth putting the time to migrate them to 11+ (I don't have that luxury, unfortunately)

3. > technology choices are being driven by legacy constraints

This project was greenfield but the problem domain is plagued by legacy constraints (all the way back to mainframes). "Rebuild from scratch" in a nicer, newer language would take years compared to a runway measured in months, so you do the math.

lmm · on March 8, 2020

> Python is not the best choice here but little can beat go's scp deploy

Java is pretty damn close to that if you take the route of building a shaded jar (and embedding jetty if you need a web server). You need to install the JVM on the target server but that's all.

> Tell that to every library that relied on java EE being shipped with the JDK. A lot of the reason to use Java was these libraries that are, if you're lucky, in maintenance mode.

I don't think I ever saw a library like that? There's a huge, high-quality ecosystem of open-source Java libraries and I've never heard of any of them being reliant on Java EE.

> This project was greenfield but the problem domain is plagued by legacy constraints (all the way back to mainframes). "Rebuild from scratch" in a nicer, newer language would take years compared to a runway measured in months, so you do the math.

Sure, but you can't say this project is an example of when Python is a good technology choice if really the main reason you were using Python was because of legacy constraints.

orf · on March 7, 2020

Citation needed. You can write anything you want in any language you want, but if your team is experienced with Python then they will continue to ship value quickly. Sure, maybe if they were all well versed in brainfuck they could ship things quicker.

Narrowing in on the rather specific point about shipping things with Python and ignoring the larger argument that it doesn't freaking matter if things are not as efficient as they could possibly be is quite odd to be honest. I'm sure some of the arguments in the blog post would apply to whatever language you had in mind while writing your reply.

sneak · on March 7, 2020

> a damning condemnation of the language and it's ecosystem from Rachel By The Bay, an all-knowing and experienced higher power

If I were her, I’d put that on my CV. :D