This is fantastic - though firsthand, I've noticed this trend making scripting languages less friendly to work with.
For a Node example: with esbuild, I can no longer install packages and assume they'll work fine both on the host (macOS) and in a Docker VM (Linux).
Unless Pydantic is downloading all OS binaries with the package and loading the right one at runtime, this would become a "problem" as well. For those coming strictly from scripting language land, this could screw up their workflow quite a bit. More of these people than you'd think have never worked with native binaries.
I, like many here, understand this basic nuance and would rather my tools be fast. However, I think this is quickly eroding the value of using a dynamic scripting language. The platform portability advantage is completely removed as stuff like this spreads, and it makes working with those scripting languages a pain in the ass, to the point that I wouldn't even want to use a scripting language anymore. It becomes totally opaque whether or not your code can actually run on another target.
In my case this is OK, but I think I'm in the minority.
Welcome to the problem the Scientific Python community has been struggling with for years. Providing binaries for a plurality of platforms brings challenges with it, but I reckon Pydantic will be fine on the major platforms, as it won't have hard-to-compile parts.
But Rust isn't available everywhere, so surely someone will be losing out.
Rust is available for almost all common platforms. There are crates that don't run on some platforms (e.g. ones relying on CPU intrinsics that aren't available cross-platform), but Rust itself runs on Linux, macOS, Windows, FreeBSD, NetBSD, and more. It's supported on x64, ARM, MIPS, PowerPC, RISC-V, and s390x. There's even a rustup build for running the compiler on i686 Android.
If you're running software on a Pentium 2 or a DEC machine then maybe you're out of luck, but there aren't many platforms still in use that will run into problems here.
Perhaps if you're stuck on a long-unsupported version of Windows or CentOS and need the latest versions of Python packages, you're in trouble - but you already were when you got stuck on a legacy platform. I doubt you'd be updating your dependencies to a version where this is a problem if your OS is that outdated. You'd have the same problem with dependencies that need a C compiler more recent than GCC 4.
But with Python the custom is to ship prebuilt binaries. And there won't be prebuilt binaries for plenty of normal-ish platforms, like Alpine Linux. So those users will now have to set up local compilation for the first time.
This is no big deal for us, but it can be quite confusing and frustrating for users who haven't worked with compiled code before.
It looks like pydantic-core is distributing musllinux wheels, which should work fine on Alpine. FWIW, tooling like cibuildwheel makes building and publishing wheels for all the common platforms fairly straightforward now.
My experience is that though the error messages Python produces are quite obscure and verbose, installing the necessary packages went quite smoothly. I've pkg-installed Rust in Termux on my phone, and if that works, I don't think getting stuff built will be all that challenging.
What Python dependency managers need to improve, in my opinion, is having their toolsets inform users about native dependencies when compilation is required.
> Unless Pydantic is downloading all OS binaries with the package and loading the right one at runtime, this would become a "problem" as well.
Nah, it's not that bad. I built a Rust-backed Python library used by many [0], and with setuptools-rust (maturin wasn't flexible enough at the time), cibuildwheel, and GH Actions, the wheels are built and shipped with the shared libraries embedded - the end user never has to worry about, or even be aware of, their presence.
Pydantic has already been shipping a binary mode with an option for pure Python, so maybe they'll keep the pure Python mode around.
I think that you've been lucky, since even for mostly-pure Python packages, you often can't install in one env and just use it in another, because there might be some deeper dependencies.
My wish would be for there to be a way to get "most of the installation" working in a portable way and for OS-specific stuff to somehow be isolated into a final step, but in the world of Docker images that feels like a rare user story.
Try out shiv[1], which will package up your source and dependencies into a single file, though you still need the Python interpreter on the target to run it.
Maybe I'm just a little skeptical today, but I'm not super enthusiastic about this.
1.) Adding another language significantly increases complexity in general. Now I see several packages (pydantic, pydantic-core, pydantic-settings). Issues can crop up in either language, or in the translation.
2.) Your typical users are Python developers, many of whom don't know Rust and would therefore be unable or unwilling to contribute or help debug issues. Debugging an issue in pure Python is easier (ctrl-click in my IDE takes me to the code, where I can see what's going on, add print statements, etc., even in external libraries).
3.) At least in my use cases (and we use pydantic quite a bit), performance increases there will be negligible compared to things like data transfer (network, db) and the rest of my code that's written in python.
Anyway, I do love pydantic, and hope this is just a drop-in replacement and I don't have too many issues.
I think your objections are fair, but I also think it's not that much of a difference. Most common Python projects seem to have a native component somewhere (often written in C) because pure Python has rather bad performance.
Separating the packages shouldn't be too much of a problem because dependency management should keep the versions in sync. Broken releases can occur, but that also happens in single package models.
I think the performance benefit is significant enough to excuse the barrier to entry for maintainers. Large projects will be unlikely to see much performance improvement because there are so many other slowdowns in other layers of code and dependencies, but every project that fixes its performance reduces the overall impact.
I think Python shines as a language when it's used to glue together native code. Writing correct C and Rust is hard and most applications are usually just applying some custom logic to existing code. In my opinion, the more packages are available in native, compiled packages, the better; it doesn't matter if the native code was written in Rust or hand-crafted assembly, as long as the Python calls work without having to restructure the entire project.
It looks like this rewrite is a drop-in replacement with some minor renames and API changes. It's still a major version upgrade, so there may be more differences that I can't see, but the migration guide (https://docs.pydantic.dev/blog/pydantic-v2-alpha/#migration-...) seems simple enough. There's nothing in there I wouldn't expect from a major version bump, even if they hadn't rewritten the entire thing in a different language.
For a while, I was meaning to write a package manager for Python, out of total frustration with existing ones.
If I ever find the time and will to actually commit to such a project, I will never use Python for it. There's an obvious reason, which in my case can work as an excuse: not wanting circular dependencies (both pip and conda today depend on the Python version and a bunch of Python packages, which makes them flaky / in need of multiple installs depending on which Python version is used). But that's just part of the problem.
The Python community today houses the worst programmers I know to exist. The quality of community-produced code was and still is degrading fast. I think Python overtook PHP on this metric a while ago; we just didn't notice.
So, if I ever make another publicly distributed library for Python, I'm not writing it in Python. Both because I don't want any input from people writing in Python (exclusively) and because with each new version of the language it gets worse. Each new Python release is a step forward and ten steps back: they do fix some bugs, and every now and then clean things up, but they add ten times more garbage to the language than the work they put to make it genuinely better. I feel like if I'm going to chase Python language iterations, I'll be just plugging holes on a sinking ship, and eventually will just run out of stuff to plug the holes with.
Finally, Python doesn't have its own strategy for making Python code go fast. What I'm talking about is stuff like const correctness in C, or avoiding dynamic memory allocation in C, or eliminating dynamic dispatch in C++ or Java, and so on. Many languages have their own "mini-game" developers play to make programs faster. In Python, the name of this mini-game is "write in C". It doesn't have to be C specifically, but it's quite obvious that no tricks and backflips you can do in Python alone will ever be comparable, performance-wise, to simply rewriting even not-very-efficient code in C.
Python is a delivery mechanism today. It's a good delivery mechanism because many people use it. But it's an awful platform for programming. So, if you want comfort in programming and have many users: write in C, and add bindings in Python.
> The Python community today houses the worst programmers I know to exist. The quality of community-produced code was and still is degrading fast. I think Python overtook PHP on this metric a while ago; we just didn't notice.
JS topped that metric a long time ago.
But it's a curse of every language that gets popular as a new developer's first language.
A long time ago I remember people saying that JavaScript is the most popular language that programmers don't learn. A language with a sufficiently low bar for mass adoption by non-programmers should, in theory, be usable without knowing its ins and outs. Yet I would argue that the world's most popular programming language isn't even JS or Python or PHP - it's Excel. When we constrain a programming environment and language elegantly, it's quite impressive what can be accomplished. We as programmers often forget that computers and all that stuff are just a means to an end.
Can you write extensions / modules for VBA used in Excel? -- It's an honest question. I simply haven't used any Microsoft product in a very long time... If not, it's probably not a good delivery mechanism, even if very popular.
There are various macro extensions and plugins available, including .NET, but these options have a long history of horrible security problems because they're general-purpose languages. While it's possible to crash a spreadsheet with simple formulas, formulas probably aren't going to steal your local credentials off disk.
I'm not entirely sure that's the case. I agree, JavaScript is very popular as a first language and an introduction to the world of programming, especially with all of the frontend bootcamp-type courses. But most of these courses embed a lot of healthy "best practices". Sure, there's often a sense of cargo-cult appeal rather than a full explanation, but, for example, most new developers seem to at least have a basic knowledge of git and version control (even if it's often treated as a set of mystical incantations that must be performed before programming). The tooling also helps a lot here, for example by installing dependencies locally and saving a lock file to prevent automagical updates.
Meanwhile, a lot of people who write Python are not necessarily developers; they are scientists or data analysts who just want to turn their data into conclusions. So things like version control are kept largely to one side as unnecessary complexity, and dependency management is often global, ad-hoc, and prone to breaking. Even things like Jupyter - which by itself can be a great tool - tend to push less experienced developers towards spaghetti code and difficult-to-reproduce results.
In fairness, I think it's difficult to compare these situations and draw hard conclusions about which one is better. Ultimately, in both ecosystems, there are some fantastic developers producing excellent tools, frameworks, and libraries. The problem, to the extent that it exists, lies in the low barrier to entry and the accessibility of these languages - and that's hard to complain about!
I remember JavaScript from the time it wasn't considered a language anyone would write by hand (RAD controls? Something similar generated by MSVS?). It was supposed to be generated, because most people involved with it looked at it with disgust. This tradition, sort of, continues to this day, with the most prominent member of the family being TypeScript. So, I'd say, the JavaScript community is more aware of the awfulness of its language, and has created an ecosystem where people constantly make efforts not to write JavaScript (of course, there's also the opposite camp of people who really want to write JavaScript).
No one should be writing C unless they have a very specific use case. There are so many better, safer options, like Rust, or compile-to-C languages like Zig, Odin, and Nim.
For my part, I wish somebody would write a data modeling library for Python where the underlying objects are themselves Rust objects, such that they could be handed off to Rust code with no translation penalty, but where the data model definition happens completely on the Python side.
This would be helpful on larger teams where everyone needs to be able to contribute to the data model but not everyone is comfortable writing Rust.
All Python objects are C structs already... that's in part why binding between Rust and Python is possible.
Rust doesn't really have objects, though. It has structs, but those definitions only exist at compile time; they disappear into the ether at runtime, so I'm not sure how such communication between Python and Rust would happen. Or, in other words: did you mean communication at run time or at compile time? Does Python, in your version of events, generate Rust source code, or does it talk to a program written in Rust?
It's not possible to do with no translation penalty. Rust (and other languages like it) needs the sizes of objects known at compile time in order to allocate space on the stack (and within struct fields).
One way you could do it would be to add dynamically dispatched ".get_attribute(str) -> Option<...>" methods which adds a lot of overhead.
Another way would be to have some bastardised `build.rs` script that constructs Rust structs from Python class definitions. This keeps the definitions in Python but obviously doesn't track dynamic changes to it at runtime.
2) From my experience maintaining open source packages, many users report issues with little to no troubleshooting. A very small subset will look at the project code. The pydantic v1 code is not the easiest to digest, and moving v2 to Rust will likely not change much with regard to user reports.
> Adding another language significantly increases complexity in general.
True, but no language -- not even Lisp -- sparks joy the way Rust does. Developers are more than willing to endure the additional complexity for the benefits Rust provides: fearless performance and low-level capability. Unlike C or C++, safe Rust is within the grasp of the average Python developer.
I'd be curious if Cython was evaluated as an alternative. With less of an impedance mismatch with Python, and capable of similar speedups, it might be a good fit for this use case.
Congratulations to the team, Pydantic is an amazing library.
If you find JSON serialization/deserialization a bottleneck, another interesting library (with much less features) for Python is msgspec: https://github.com/jcrist/msgspec
Are there any necessary features that you've found missing in msgspec?
One of the design goals for msgspec (besides much higher performance) was simpler usage: fewer concepts to wrap your head around, fewer config options to learn about. I personally find that pydantic's kitchen-sink approach means I sometimes have a hard time understanding what a model will do with a given JSON structure. IMO the serialization/validation part of your code shouldn't be the most complicated part.
The biggest feature missing from most conversion and validation libraries is creating models from JSON Schema. JSON Schema is ideal as a central, platform-agnostic single source of truth for data structures.
In my use case, I find the lack of features in msgspec more freeing in the long run. Pydantic is good for prototyping, but with msgspec I can build nimble, domain-specific interfaces with fast serialisation/deserialisation without having to fight the library. YMMV!
Soooo, we go from a schema definition (my model) to a pydantic-core schema (for use in Rust) to a JSON schema (to send to the user).
I feel like this is a huge leap backwards… I was a huge fan of pydantic when merging your models with SQLAlchemy to build REST APIs, but having to have Rust around to build is a show stopper for me. Good luck.