
Quite frankly, as someone with 20+ years of experience, I have no idea what the basis is for what you're saying here.

> not everything is necessarily backed by the same kind of heap-allocated memory object.

`None`, `True` and `False` are; integers are; functions, bound methods, properties, classes and modules are... what sort of "everything" do you have in mind? Primitive objects don't get stored in some special way that avoids the per-object memory overhead, and Python even allocates objects for things that, in many other languages, can't be used as values at all (e.g. passed to or returned from a function).
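
To make that overhead concrete (byte counts are CPython- and platform-specific, not guaranteed):

```python
import sys

# Even tiny values carry full PyObject overhead (refcount, type pointer,
# payload). Sizes below are typical of 64-bit CPython 3.x and vary by
# version/build.
print(sys.getsizeof(1))      # e.g. 28 bytes
print(sys.getsizeof(True))   # bool is a full object too, e.g. 28 bytes
print(sys.getsizeof(None))   # e.g. 16 bytes
```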

Some types use their `tp_*` slots in special ways; the underlying implementation of their methods is in C, and the method attributes aren't looked up in the normal way. But there is still a PyObject* there.
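
For example (type names observed on CPython 3.x; they're implementation details):

```python
# C-implemented methods surface as wrapper/descriptor types rather than
# ordinary Python functions, but every one of them is still a PyObject:
print(type(int.__add__))   # <class 'wrapper_descriptor'>
print(type((1).__add__))   # <class 'method-wrapper'>
print(type(list.append))   # <class 'method_descriptor'>
print(type(len))           # <class 'builtin_function_or_method'>
```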

... On further investigation (considering the values reported by `id`) I suppose that (at least in modern versions) the pre-created objects may be in static storage... ? Perhaps that has something to do with the implementation of https://peps.python.org/pep-0683/ ?
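
One observable hint, assuming CPython 3.12+ where PEP 683 landed:

```python
import sys

# PEP 683 immortal objects report a fixed sentinel refcount rather than
# a real count; the exact value is an implementation detail.
print(sys.getrefcount(None))   # a huge constant on 3.12+, e.g. 4294967295
```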

> Some builtins and bytecode-level optimizations

String concatenation may mutate a string object that's presented to Python as immutable, so long as there aren't other references to it. But it's still a heap-allocated object. Similarly, Python pre-allocates objects for small integer values, but they're still stored on the heap. Very short strings may, along with those integers, go in a special memory pool, but that pool is still on the heap and the allocation still contains all the standard PyObject fields.
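
The small-integer cache is easy to observe (a CPython implementation detail, not a language guarantee):

```python
# CPython pre-allocates ints in roughly [-5, 256]; values in that range
# are shared singletons, while larger values get fresh heap objects.
a = int("256")       # built at runtime so the compiler can't share consts
b = int("256")
print(a is b)        # True: both are the cached small-int object

c = int("257")
d = int("257")
print(c is d)        # usually False: two separate heap allocations
```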

> people still regularly have to care about making plain Python code fast.

Can you give examples of the people involved, or explain why their use cases are not satisfied by native libraries?



> what sort of "everything" do you have in mind?

I’ll answer in two parts.

First, in my original claim:

> not everything is necessarily backed by the same kind of heap-allocated memory object.

the emphasized part is important. Not all PyObjects are equivalent. Most things in Python are indeed PyObjects of some stripe, but they act very, very differently in some common and performance-sensitive situations. The optimizations discussed in the article are one example: allocations can be re-used, and some more complex data structures are interned/cached (like the short-tuple/short-list optimization, which is similar to, but not the same as, how floats/numbers are reused via allocator pools: https://rushter.com/blog/python-lists-and-tuples/). As you point out, immortal objects and fully preallocated objects are also often special cases vis-a-vis the performance of creating/discarding objects.
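
To make the allocation-reuse point concrete (id() reuse like this is common on CPython but never guaranteed):

```python
# CPython keeps free lists for lists and for small tuples (one per tuple
# length), so a just-freed allocation is often handed straight back out.
t = tuple([1, 2, 3])    # built at runtime, not a code-object constant
addr = id(t)
del t                   # the freed tuple goes on the length-3 free list
u = tuple([4, 5, 6])    # same length: often reuses that allocation
print(id(u) == addr)    # frequently True on CPython, never guaranteed
```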

Second, you’re wrong. PyObjects do not come into play in a wide variety of cases:

- In some (non-default even today, I believe) interpreter configurations, tagged pointers are used for some numbers, removing pointer chasing entirely: https://rushter.com/blog/python-lists-and-tuples/

- Bound method objects can be cached or bypassed, meaning that, while there are PyObjects for the object being called on and for the function itself, the intermediate bound-method PyObject is sometimes never created (see the sketch after this list): https://github.com/python/cpython/issues/70298, with additional/somewhat related info in PEP-580 and PEP-590. PyPy improves on this further: https://doc.pypy.org/en/latest/interpreter-optimizations.htm...

- None, False, and the other singletons are, as you say, PyObjects. But they're often not accessed as PyObjects and have much lower overhead in some cases. Even beyond the performance savings of "is" comparing by address (technically by id(), but that maps to address in CPython), special bytecode instructions like POP_JUMP_IF_NOT_NONE mean the None PyObject isn't even touched during the comparison on some paths. Compare the "dis" output for an "x is None" check versus an "x is object()" check to see the tip of the iceberg here (sketched after this list).

- New/rolling-out features like JIT compilation and the tail-calling interpreter further invalidate the historical wisdom of treating everything as an object: calling a function may not create PyObjects for argument stacks, accessing known-unchanged object fields may serve cached accessor results, and so on. But that's all very new/not fully released yet; this isn't meant as a "gotcha" that disproves your claim on its own.

- While-True/while-1 loops don't load or test the always-truthy constant's PyObject at all while they run, in many cases (see the sketch after this list): https://github.com/python/cpython/blob/main/Lib/test/test_pe...

- Constant folding (which happens at a higher level than interning and allocation caching) deduplicates PyObjects, allowing identity/equality caches to be used more frequently and invalidating some of the historical advice to consider non-scalar data structure comparison arbitrarily expensive (sketched after this list).

- Opcodes like LOAD_ATTR (field/method access) retain their own inline caches, invalidating the wisdom that "obj.thing" always pays a pointer-chasing cost or a binding-creation cost. In many looping cases (I'd guess "most", though I don't have data), attribute access on builtins/slotted classes returns straight from the cache without touching any PyObject fields deeper than a single pointer, which is very different from the overhead of a deep dive through the MRO to make a lookup (sketched after this list). Is it still a PyObject, still on the heap? Sure! But are you paying the same performance cost as accessing uncached, seven-pointers-away heap data? That answer is much less cut and dried.

- Many, many more examples I didn't think of off the cuff.
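
For the bound-method bullet above, a minimal `dis` sketch (opcode names vary: CPython 3.7-3.11 show LOAD_METHOD/CALL-style opcodes, and 3.12+ fold this into a specialized LOAD_ATTR plus CALL):

```python
import dis

class C:
    def m(self):
        return 1

def call(c):
    return c.m()

# The method-call fast path avoids materializing a bound-method PyObject
# for the common obj.method() shape; dis shows the specialized opcodes.
dis.dis(call)
```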
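
For the `is None` bullet (CPython 3.11+; older versions emit a generic compare-and-jump sequence):

```python
import dis

def f(x):
    if x is None:          # compiled to a dedicated None-check jump
        return 1
    return 0

def g(x):
    if x is object():      # generic path: call, IS_OP, conditional jump
        return 1
    return 0

dis.dis(f)   # a POP_JUMP_*_IF_NOT_NONE-style opcode; None isn't loaded
dis.dis(g)   # LOAD_GLOBAL/CALL, IS_OP, then a generic conditional jump
```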
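
For the while-True bullet (exact opcodes vary by version; the point is the absence of a per-iteration truth test):

```python
import dis

def spin(n):
    i = 0
    while True:            # no truth test of True per iteration
        i += 1
        if i >= n:
            break
    return i

# The disassembly shows an unconditional backward jump for the loop;
# the True constant's PyObject is never loaded or tested at runtime.
dis.dis(spin)
```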
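
For the constant-folding bullet (observed on modern CPython; folding behavior is version-dependent):

```python
import dis

# Both the arithmetic and the tuple literal are folded at compile time
# into single constants; no per-execution PyObject construction happens.
dis.dis(compile("x = 60 * 60 * 24; y = (1, 2, 3)", "<demo>", "exec"))
# ...the output includes LOAD_CONST 86400 and LOAD_CONST (1, 2, 3)
```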
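
And for the LOAD_ATTR bullet, the specializing interpreter's inline caches are visible via `dis`'s adaptive mode (CPython 3.11+; specialized names like LOAD_ATTR_SLOT are implementation details):

```python
import dis

class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

def total(points):
    s = 0
    for p in points:
        s += p.x          # hot attribute access gets specialized
    return s

total([Point(i, i) for i in range(10_000)])   # warm up the specializer

# After warm-up, adaptive disassembly typically shows LOAD_ATTR quickened
# into a cached, specialized form instead of a full MRO lookup.
dis.dis(total, adaptive=True)
```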

The usual caveats apply: the above applies largely to CPython, and optimization behavior can and will change between interpreter versions.

> Can you give examples of the people involved, or explain why their use cases are not satisfied by native libraries?

I mean, the most significant fact in support of the claim is that the above optimizations exist at all. They were made in response to identified needs. Even if you take a highly uncharitable view of CPython maintainers' priorities, the volume of optimization work speaks to real value delivered by making pure Python fast.

Beyond that, anecdotally, everywhere I've worked with Python--big SaaS platforms, million+ RPS distributed job/queueing systems, scientific research projects--over the last 15 years (hello, fellow veteran!) has routinely had engineers who needed to optimize pure-Python ops as part of ordinary, mundane, day-job tasks.

Even after you set aside the optimizations they performed that would have been necessary in any language (stop copying large data objects, aggregate I/O, and so on), I've worked with … hell, probably hundreds of engineers at this point who occasionally needed to do work like "okay, I'm creating tons of tuples in a loop and it's slow; how can I recycle/resize things so that the allocation caches handle more of my code?". Silly tricks like "make some of the tuples longer than they need to be so that the per-length free list gets more hits" are sometimes useful and necessary! Yes, that's seriously funky code and should not be anyone's first resort (and it merits a significant comment and tests validating the behavior on future interpreters), but sometimes those things produce point fixes that yield multiple percentage points of improvement in memory/GC pressure without introducing new libraries or more pervasive refactors. That's not nothing!
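
A hedged sketch of that padding trick (the record shape, `make_record`, and the `PAD` sentinel are hypothetical names for illustration; the per-length tuple free list is a CPython implementation detail that needs re-verifying on every interpreter upgrade):

```python
# CPython recycles freed small tuples through per-length free lists, so
# normalizing hot-path records to one fixed length lets tuples discarded
# at one site be recycled at another.
PAD = None          # hypothetical sentinel; pick one that can't collide

def make_record(*fields, width=4):
    # Pad every record to `width` so they all share one free list.
    assert len(fields) <= width
    return fields + (PAD,) * (width - len(fields))

r1 = make_record("GET", "/home")          # length-4 tuple
r2 = make_record("POST", "/api", 201)     # also length-4
```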

Sure, many of those cases would have been faster if numpy or another native-code library were in play. But lots of that code didn't have or need numpy/extensions for anything else; pulling them in would (depending on how long ago we're talking) have required arduous installation/compatibility acrobatics; third-party modules are sometimes difficult to get approved; and, well, optimization needs were routinely met by applying a little knowledge of CPython's optimizations anyway. So I'd say optimizing pure Python is both a valid approach and a common need.


Correction on my first bullet, regarding tagged pointers: the link was wrong (a mistaken duplication of the link used above), and the tagged-pointer optimization is not present in CPython, only in PyPy, so it largely does not pertain to this discussion: https://doc.pypy.org/en/latest/config/objspace.std.withsmall...


