People always get up in arms about this, but as someone who has used Python as her daily driver for years it's really... never been this serious of an issue for me?
I have used virtualenv/venv and pip to install dependencies for years and years, since I was a teen hacking around with Python. Packaging files with setup.py doesn't really seem that hard. I've published a few packages on pypi for my own personal use and it's not been too frustrating.
A lot of the issues people have with Python packaging seem like they could be solved with a couple of shell aliases. Dependency hell from too many dependencies becomes unruly in any package manager I've tried.
Is the "silent majority" just productive with the status quo and getting work done with Python behind the scenes? Why is my experience apparently so atypical?
I think it depends on the use case. If I'm developing my own stuff, my usual package management is fine.
If I'm trying to run various existing Python programs to analyze biology data, I soon run into problems: Is this a Conda thing, or can I use my own Python environment? Which version of Python will let me run it, and what libraries do I need? Does this break in that version?
Sometimes I feel that one kinda-OK way of doing things would be better than having six ways, one of which will suit my use case perfectly.
> Is this a Conda thing, or can I use my own Python environment?
Can you elaborate a bit there? I use conda because I like some of its features over standard virtualenv (being able to specify a Python version when I create my venv), but I've never had a problem running code in envs created by one vs. the other.
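Concretely, the feature I mean is something like this (the environment name is just a placeholder):

    conda create -n myenv python=3.8   # pick the interpreter version at creation time
    conda activate myenv

    # plain venv, by contrast, just reuses whichever interpreter you invoke it with
    python3.8 -m venv myenv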
Sometimes developers distribute their software via Conda installs, and in those cases they sometimes don't provide instructions for running it any other way (e.g. using the pyenv environment that is my default). I'm OK with this, but when conda fails, as sometimes happens, it can mean some digging to get the install to work.
I was thinking of my latest install, which was CRISPResso2; it installs via Docker or Bioconda. I was able to get it going, but it took a while on some systems (Python 2.7, old libraries, etc.).
Docker didn't seem to work.
I like virtualenv, but sometimes I feel I have to have a new environment for each piece of software I'm running, which feels weird.
This sort of attitude is the reason why the world doesn't move away from awful solutions. It is a testament to the lack of ability to see beyond your own nose.
A lot of people who use Python, don't have the luxury of it being their "daily driver for years", so the conflicting documentation, decision paralysis and other problems that come with it end up being a huge time sink.
A lot of non-programmers are being forced to use Python for various automation tasks. A lot of the CAD software that construction engineers use supports Python plugins. Network admins who have been configuring switches and routers on the CLI for decades now have to configure them using Python.
Look at "cargo" to see what the world could be like.
Still, it's worth keeping in mind that Rust was born 20 years after Python was. Python was being written before Mosaic, Netscape, and Yahoo! were around. I think it can be forgiven for failing to conceive of a perfect package management system in the 1990s. There were bigger fish to fry back then, so to speak.
Over the decades (!) there have been many, well-documented attempts at coming up with a package management story. pip and virtualenv have been the obvious winners here for years.
So, in conclusion, again you're right. But 30 years of history produces a lot of "conflicting documentation". It's only in the last 10 years or so that people have fought over the superiority of one language's package management ecosystem or another.
This comment is rewriting Python history quite a bit.
First of all, Python was created around 1989, but Python 1.0 wasn't released until 1994. Secondly, Python was a pretty obscure language until Python 2.0 (and even long after that...), released in 2000. So realistically, Python had "only" about 15 years of historical baggage :-)
Also, you can dismiss cargo because it's "new", but there was a lot of prior art in the area of good language-specific package managers. CPAN (Perl) was launched in 1993. Maven (Java) was launched in 2004.
Python just botched its package management story, that's it. Sometimes stuff happens just because it happens, there's no good excuse for how things are. Sad, but true.
Python's first posting to Usenet (v 0.9) was in 1991. The 1.0 Misc/ACKS from 1994 includes 50 or so external contributions to that point, showing that "1.0" is a somewhat artificial point.
Rust's 1.0 was in 2015, which is indeed "20 years after" Python's 1.0, so how is gen220's comment a rewrite?
In 2000 I helped a company with the minor work to port their 1.5 code base to 2.x.
So I certainly didn't see it as obscure in the 1.x days.
But sure, I'm part of that environment so have a different view on things. If I use your definition, I'll argue that Rust is still "a pretty obscure language".
Until around 2005 at least, it was known as a friendly scripting language with a few web frameworks which were not that popular (Django was first released in 2005), and as the language that was starting to be adopted by distributions for scripting tools (the first Ubuntu version was launched in 2004 and was one of the first distros to use it extensively). It wasn't really present for development work in most cases; DevOps was the domain of bash/Perl (for older stuff) or Ruby (for newer stuff).
People tend to forget how obscure Python was before 2000, compared to the mainstream language it is today. And I say that as someone who likes Python ;-)
Chiming back in to say that, while everything you're saying is correct (i.e. Python was not a ubiquitous language until "relatively" recently), it doesn't change the point: the best packaging solutions, done right, need to be done early in a language's history.
To illustrate the point with an example, you could invent cargo for Python yesterday or in 2005, but it wouldn't have solved the problem, because you would still have decades' worth of third-party libraries that wouldn't comply with py-cargo's packaging requirements.
In contexts like these, it's the package manager with the fewest hard-asks (i.e. pip, or npm for node) that wins.
Go, for example, endured major controversies over migrating away from GOPATH-managed-with-third-party-dep-managers to go modules. Even though `go mod` would have been the best solution to start with from scratch, inertia and breaking changes are a real thing.
Rust is a pretty obscure language now in pretty much the same way that Python was an obscure language then.
Of course the world of programmers was smaller in the 1990s. But if your baseline is the entire world, then probably every programming language outside of Basic, C/C++, and Pascal was obscure in the 1990s. Just like Rust is now.
It feels very much like you have shifted baselines to determine what "obscure" means.
From my view, Python's popularity took off around 2000. That's when I no longer had to tell people what Python was, and when people in my field (cheminformatics) started shifting new code development from Perl to Python. It's also about when I co-founded the Biopython project for bioinformatics. And SWIG in the mid-1990s included Python support because Python was being used to steer supercomputing calculations at LANL.
So your statement that Python's popularity and use in science in general started only in 2010 sounds like revisionism which distorts the actual history with an artificial baseline.
You wrote "with a few web frameworks which were not that popular".
Ummm.... what? Zope was quite popular. The 2001 Python conference had its own Zope track, and the 2002 conferences felt like it was 50% Zope programmers.
Quoting its Wikipedia entry, "Zope has been called a Python killer app, an application that helped put Python in the spotlight". One of the citations is from 2000, at https://web.archive.org/web/20000302033606/http://www.byte.c... , with "there's no killer app that leads people to Perl in the same way that Zope leads people to Python."
As an individual that probably works just fine. In a team setup, it takes a lot of training and effort for everyone to consistently follow a manual pip/venv workflow, so it becomes valuable to minimize and standardize it.
Especially if you have to deploy to production and you want fast, reproducible builds, or you don't want to run a bunch of tests for things that haven't changed.
Let me start by saying: I love Python, and I love developing in it. It's the "a pleasure to have in class" of languages: phenomenal library support, not too painful to develop in, nice and lightweight so it's easy to throw together test scripts in the shell (contrast that with Java!), and easy to specify simple dependencies and install them (contrast that with C!).
That said... if you work on software that is distributed to less-technical users and has any number of dependencies, Python package management is a nightmare. Specifying dependencies is just a minefield of bad results.
- If you specify a version that's too unbounded, users will often find themselves unable to install previous versions of your software with a simple `pip install foo==version`, because some dependency has revved in some incompatible way, or, even worse, specified a different dependency version that conflicts with another dependency. pip does a breadth-first search on dependencies and will happily install totally incompatible combinations even when a valid satisfying set exists.[1]
- If you specify strict version bounds to avoid that problem (there's a rough setup.py sketch after this list), users will whine about not getting the newest version, or about conflicting packages that they also want to install. Obviously you just ignore them or explain it, but it's much more of a time sink than anyone wants.
- In theory you can use virtualenvs to solve that problem, but explaining how those work to a frustrated Windows user who just spent hours struggling to get Python installed and into their `PATH` is no fun for anyone. Python's made great strides here with their Windows installers, but it's frankly still amateur hour over there.
- Binary packages are hell. Wheels were supposed to make Conda obsolete but as a packager, it's no fun at all to have to build binary wheels for every Python version/OS/bitness combination. `manylinux` and the decline of 32-bit OSes has helped here, but it's still super painful. Having a hard time tracking down a Windows machine in your CI env that supports Python 3.9? Too bad, no wheels for them. When a user installs with the wrong version, Python spits out a big ugly error message about compilers because it found the sdist instead of a wheel. It's super easy as a maintainer to just make a mistake and not get a wheel uploaded and cut out some part of your user base from getting a valid update, and screw over everyone downstream.
- Heaven help you if you have to link with any C libraries you don't have control over and have shitty stability policies (looking at you, OpenSSL[2]). Users will experience your package breaking because of simple OS updates. Catalina made this about a million times worse on macos.
- Python has two setup libraries (`distutils` and `setuptools`) and on a project of any real complexity you'll find yourself importing both of them in your setup.py file. I guess I should be grateful it's just the two of them.
- Optional dependencies are very poorly implemented. It still isn't possible to say "users can opt in to just a specific dependency, but by default get all options". This is such an obvious feature; instead you're supposed to write a post-install hook or something into distutils.
- Sometimes it feels like nobody in the Python packaging ecosystem has ever written a project using PEP 420 namespaces. It's been, what, 8 years now, and we're just starting to get real support. Ridiculous.
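To make the version-bounds and optional-dependency points concrete, here's a rough sketch of the kind of setup.py I mean (package names and version numbers are invented for illustration):

    # setup.py -- illustrative only
    from setuptools import setup, find_packages

    setup(
        name="myapp",
        version="1.2.3",
        packages=find_packages(),
        install_requires=[
            # too loose: any future release of somelib can break old installs of myapp
            "somelib>=1.0",
            # too strict: users complain when this conflicts with their other packages
            "otherlib>=2.1,<2.2",
        ],
        # extras are opt-IN only; there's no supported way to say "everything by
        # default, let users opt OUT of the heavy bits"
        extras_require={
            "plots": ["matplotlib>=3.0"],
        },
    )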
I could go on about this for days. Nothing makes me feel more like finding a new job in a language with a functioning dependency manager than finding out that someone updated a dependency's dependency's dependency and therefore I have to spend half my day tracking down obscure OS-specific build issues to add version bounds instead of adding actual features or fixing real bugs. I have to put tons of dependencies' dependencies into my package's setup.py, not because I care about the version, but because otherwise pip will just fuck it up every time for some percentage of my users.
[1] I am told that this is "in progress", and if you look at pip's codebase the current code is indeed in a folder marked "legacy".
[2] I 100% understand the OpenSSL team's opinion on this and as an open source maintainer I even support it to some degree, but man oh man is it a frustrating situation to be in from a user perspective. Similarly, as someone who cares about security, I understand Apple's perspective on the versioned dylib matter, but that doesn't make it suck any less to develop against.
> struggling to get Python installed and into their `PATH` ... it's frankly still amateur hour over there
But that has been solved on Windows for quite a while hasn't it?
Python installs the "py" launcher on the path, which allows you to run whichever version you want of those you have installed. Just type "py" instead of "python". Or "py -3.5-32" to specifically run 32-bit Python 3.5, or "py -0" to list the available versions.
It's gotten a lot better, but we still hit tons of issues with users who don't know which Python version they installed their application into. Oh, and of course our "binaries" in Scripts/bin don't seem to show up in the PATH by default. So I get to tell people "py -3.8-64 -m foo" on Windows, and just "foo" everywhere else.
This gets much much worse when a new version of Python comes out and we don't support it yet (because of the build system issues I mentioned). I spent several weeks teaching people how to uninstall 3.8 and install 3.7 before we finally got a functioning package out for 3.8.
Sure, but telling people to run "py -3.7" seems a lot easier than walking them through uninstalling and reinstalling Python, as you would have had to in the bad old days. It's reliable and consistent and doesn't depend on what's installed where or how it's configured. If you run "py -3.7 -m venv my_env", it just works, always, with no special context required.
Although I don't handle user support for Python packages, if I did, that would be my go-to approach.
If only there was some graphical tool that allows the user to see conflicts, relax version dependencies, and of course rollback changes if things didn't work out.
Or an error message like:
    There's a version conflict. In order to resolve, try one of the following:
    pip relax-dep package1 >= 1.0
    pip relax-dep package2 >= 2.0
    pip remove package3
It depends! Sometimes I have to lock a dependency at minor releases because every.single.release from the author breaks something new, and I've already worked around the locked version's failings. Sometimes I have to lock a dependency at a major version and everything is fine after that. Usually when the latter happens, eventually the developer releases something that fits within the version bounds and breaks. Sometimes they fix it in the next release, but then I have to deal with a week of bug reports from users that "I couldn't pip install the latest release!". A big complaint I'll get with a flask/werkzeug app is that something or other broke because they installed something else with strict version requirements alongside it (because the authors of that program have experienced the same bullshit, I assume).
Maybe I'm spoiled from working with cargo and npm (I have almost no ruby experience so I can't comment there), but both of them have way fewer such version conflicts in my experience. Obviously there are tradeoffs and I don't want the node_modules experience for my users, but often it seems that would be a much better experience than pip for everyone. With either of those, I just "npm install" or "cargo install" and all my dependencies end up there working.
You can generate a requirements.txt file using "pip freeze" on a functioning system, but then you have to figure out a way to point users at it instead of using "pip install myapp". Also, you might have to do it for each OS, since Windows vs. macOS vs. Linux can have different package dependencies specified, and even if you don't do that, a dependency doing it means you have to account for it.
You can copy+paste the "pip freeze" output into your setup.py and add quotes+commas, but then you're back to breaking side-by-side packages.
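For reference, the fully pinned workflow I'm describing is basically just this (nothing clever, which is sort of the point):

    # on a machine where the app currently works
    pip freeze > requirements.txt

    # what you then have to tell users to run, instead of a plain `pip install myapp`
    pip install -r requirements.txt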
So what am I, a developer trying to distribute my command-line application to less-technical users, supposed to do? Distribute two entirely different packages, "myapp-locked" and "myapp"? Tell people to install from a copy+pasted "requirements.txt" file? I've started distributing docker containers that have the application installed via the requirements.txt method, which is fucking stupid but at least the users of that complain less about versioning issues... until the day someone yanks a package I guess.
I recently reported a bug on the Xmonad GitHub; they have:
### Checklist
- [ ] I've read [CONTRIBUTING.md](https://github.com/xmonad/xmonad/blob/master/CONTRIBUTING.md)
- [ ] I tested my configuration with [xmonad-testing](https://github.com/xmonad/xmonad-testing)
I think it is a brilliant idea; it made me immediately check the latest git versions. I assume you could add:
- [ ] I tested my application with [latest stable requirements.txt](...)
And something about triangulation and reporting to another repo too.
Sorry to hear about the breaks on major versions. Ruby gems (libraries) pin dependencies at the major, sometimes minor, version; see [0] for an example. But applications ship with a Gemfile and Gemfile.lock [1], [2], so `bundle install` is reproducible [3]:
> The presence of a `Gemfile.lock` in a gem's repository ensures that a fresh checkout of the repository uses the exact same set of dependencies every time. We believe this makes repositories more friendly towards new and existing contributors. Ideally, anyone should be able to clone the repo, run `bundle install`, and have passing tests. If you don't check in your `Gemfile.lock`, new contributors can get different versions of your dependencies, and run into failing tests that they don't know how to fix.
Yes: Docker, MSI, Flatpak, AppImage - whatever works for you and your users. It is sad that we can't easily compile scripting languages statically into a single file.
When it comes to shipping Python server-side apps, Pipenv is a godsend. Before discovering it, I had 3 requirements.txt files (common, dev, prod) which I had to edit manually. This often meant forgetting to include something that I just installed and only finding out after a full round of QA.
It also meant a separate couple of steps for full-tree dependency freezing, which never worked quite properly anyway. Pipenv just... works. Dependencies are saved as I install them, I only have to deal with the top-level ones, but the whole tree is locked.
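For anyone who hasn't tried it, the day-to-day flow is roughly this (package names are just examples):

    pipenv install requests        # recorded in Pipfile, full tree pinned in Pipfile.lock
    pipenv install --dev pytest    # dev-only dependency, no separate requirements-dev.txt
    pipenv lock                    # re-resolve and re-pin the whole tree
    pipenv sync                    # install exactly what Pipfile.lock says (e.g. in CI/prod)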
To be blunt, maybe you just don't know what you're missing out on? Of course, Python's package management system works and is merely an annoyance to those of us who are used to more modern package managers. By the way, your comment reminded me of this classic: https://news.ycombinator.com/item?id=9224 :)
I mean, possibly? What's considered the gold standard in package management these days?
I use yarn for managing javascript dependencies and do a lot of work with Cargo too. The community seems to love both these tools outside of slow compile and install times.
Cargo is my ideal, but really anything that doesn't make me manage virtualenvs or take 30 minutes to resolve dependencies. Note that "managing my own virtualenvs" is tricky because you have to make sure everyone has all of the same versions of the same dependencies in their virtualenv across your entire team (including production). I'm sure there are workflows that allow for this (probably with some tradeoffs), but we haven't figured it out. For a while we used Docker, but performance degraded exponentially as our test base grew (Docker for Mac filesystem problems, probably). Eventually we settled on pantsbuild.org, which has a lot of problems, is super buggy, and no one can figure out its plugin architecture, but as long as you stay on the happy path it generally works okay, which puts it somewhere between every other Python dependency management scheme I've tried and Go/Rust/etc. package management.
Great experience report: thank you! Wanted to point out that the Pants project has been focusing on widening that happy path recently (...by narrowing its focus to Python-only in the short term), and is ramping up to ship a 2.0.
This page covers some of the differences between v1 and v2 of the engine, and particularly its impact on Python: https://pants.readme.io/docs/pants-v1-vs-v2 ... We're using Rust and haven't bootstrapped yet, so we also appreciate Cargo and think that there is a lot to learn there.
That’s great to hear. Is there any page that documents the architecture of Pants? I understand build systems of various kinds quite well, but I can’t tease out the design philosophy behind Pants, especially where the different target types / plugins end and the “core” begins.
There isn't, but that's a good idea. At a very high level, the v2 engine is "Skyframe but using async Rust". All filesystem access, snapshotting, UI, and networking is in Rust, and we use the Bazel remote execution API to allow for remote execution. The v2 plugin API is side-effect free Python 3 async functions (called `@rules`) with a statically checked, monomorphized, dependency-injected graph connecting plugins (ie, we know before executing whether the combination of plugins forms a valid graph). Unlike plugins in Bazel, `@rules` are monadic, so they can inspect the outputs of other `@rules` in order to decide what to do.
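This is not the actual Pants API, but the "monadic rules" idea is roughly this shape in plain Python (all names below are invented for illustration):

    import asyncio

    # hypothetical stand-ins for engine-provided rules
    async def resolve_sources(target):
        # pretend the engine snapshots the filesystem here
        return [f"{target}/app.py"]

    async def run_linter(target):
        # a rule can await another rule's output and branch on it --
        # that's the "monadic" part: what happens next depends on prior results
        sources = await resolve_sources(target)
        if not sources:
            return "skipped: no sources"
        return f"linted {len(sources)} file(s)"

    print(asyncio.run(run_linter("src/python/myproject")))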
yarn. fast and correct. every package.json is a virtualenv, but you still have a global cache and it just symlinks. (sort of like the global wheels cache for pip, but even more deduplication.)
cargo is great because it manages the build flow and it's extensible (clippy). The global cache thing is a bit harder because of Rust package features (and other knobs like RUSTFLAGS), and it's not done by default, but it's as easy as setting CARGO_TARGET_DIR as far as I know.
The only time I run into problems is when someone else is trying to use Conda. Then it can be hell trying to get their code running in standard pip/venv or vice versa.
I'm sure Anaconda filled a niche at some point, but we have wheels now; can we all just agree to stop using Conda? What value does it actually bring now that makes it worth screwing up the standard distribution tools?
Isn't a conda environment just python installed into an isolated directory where someone can run pip? One can just run pip and pretend it isn't a conda environment.
It's way more than that. Firstly, most Anaconda installations come shipped with libraries like Matplotlib, numpy, etc. So a lot of people that use conda write software that assumes those libraries are always available, e.g. leaving them out of requirements.txt or setup.py.
Then there's the issue of Anaconda using its own package repos, so even if you do manage to figure out what packages a piece of Anaconda-developed software needs, you're getting a subtly or maybe not-so-subtly different version of it using standard pip, which creates the worst kind of hard-to-trace bugs.
Lastly, certain installations of Anaconda overwrite the system Python version with their own (so you can just use numpy or whatever anywhere), causing a huge headache with other system software and making using the standard distribution tools even harder.
I get that it's convenient for scientists that just want to write scripts and have them work, but if you're creating any kind of collaborative software, especially if you'll be working with SW engineers down the line, avoid Conda at all costs.
> So a lot of people that use conda write software that assumes those libraries are always available, e.g. leaving them out of requirements.txt or setup.py.
How is that any different than using python.org python? You'd still be unaware of what versions to use.
> you're getting a subtly or maybe not-so-subtly different version of it using standard pip, which creates the worst kind of hard-to-trace bugs.
That's way more of a problem with pip. You have no idea what versions a pip package is pulling in until install and then what binary actually gets installed depends on your compilers.
> certain installations of Anaconda overwrite the system Python version with their own (so you can just use numpy or whatever anywhere), causing a huge headache with other system software and making using the standard distribution tools even harder.
That's impossible unless one is actually copying binaries manually overtop of system binaries. You'd have to be root or use sudo to overwrite the system python manually. The whole point of isolation is to keep system python isolated and stable for system stability. That can happen if someone installs python from python.org and copies it into place.
> but if you're creating any kind of collaborative software, especially if you'll be working with SW engineers down the line, avoid Conda at all costs.
If you are working with SW engineers, you better know what versions you are pulling in, because you are going to be in serious pain using pip and trying to understand the provenance of your packages. Conda is way more powerful here for serious engineers to specify exact versions and reproducible and exact builds.
> How is that any different than using python.org python? You'd still be unaware of what versions to use.
Because python.org doesn't ship with numpy, matplotlib, or any of those other packages. Anaconda does, which makes it possible to import those libraries in projects without explicitly listing them as dependencies.
> That's way more of a problem with pip. You have no idea what versions a pip package is pulling in until install and then what binary actually gets installed depends on your compilers.
What? The problem here is that conda has its own repos, which contain different packages than are in PyPI. What exactly do you mean by "no idea what versions a pip package is pulling"? You realize you can set versions, right? numpy==1.13.2. The problem is that numpy 1.13 on Anaconda can be different from numpy 1.13 on PyPI.
> That's impossible unless one is actually copying binaries manually overtop of system binaries. You'd have to be root or use sudo to overwrite the system python manually. The whole point of isolation is to keep system python isolated and stable for system stability. That can happen if someone installs python from python.org and copies it into place.
This is just wrong. Anaconda overwrites the system python by messing with the user's $PATH regardless of whether you are in a conda environment or not (it's probably easy to disable this "feature", but I've seen a lot of people with this setup). This causes major headaches.
> If you are working with SW engineers, you better know what versions you are pulling in, because you are going to be in serious pain using pip and trying to understand the provenance of your packages. Conda is way more powerful here for serious engineers to specify exact versions and reproducible and exact builds.
I'm not sure why you think you can't specify exact versions with pip. Projects like Pipfile take it even further. The issue with conda is its different package repos, not the ability to lock package versions.
> Anaconda does, which makes it possible to import those libraries in projects without explicitly listing them as dependencies.
I think your main problems are very naive users of conda. If you bring years of experience using pip, but use conda thoughtlessly, I can see your point.
If you don't want packages included, just use miniconda and install the ones you like. You could just create a new empty environment: `conda create -n py36 python=3.6`
Either way, it's completely reproducible.
When not using wheels, pip can be pulling in various versions of dependencies. Conda makes it easy to see all of them before they are dumped into your environment.
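For what it's worth, the reproducibility story here is basically just this (commands as I use them; adjust names to taste):

    conda install --dry-run numpy           # preview the full dependency set before committing
    conda env export > environment.yml      # pin every package (and channel) in the env
    conda env create -f environment.yml     # recreate the same env somewhere else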
> Anaconda overwrites the system python by messing with the user's $PATH regardless
I understand what you are saying now. It's covering up system python in the PATH, but it isn't overwritten. Using `type python` (or `which python`, which will be correct 99% of the time) shows what you're actually getting.
> The issue with conda is its different package repos, not the ability to lock package versions.
I thought this was your major argument: "collaboration is difficult", when in fact it is much, much easier. You are getting the same binary every time, without slight differences in how it ends up compiled on the user's system.
I develop Python exclusively on Windows (that then is deployed on Linux) and my experience is identical to the original poster. It's not a perfect system, but it's good enough and I have dealt with dependency management systems.