It’d be annoying for every type used in a cache to need to properly implement __deepcopy__(), and it would impose a significant performance cost, especially if the cached objects are large (and there’s a good chance they are, given you’ve felt the need to cache them rather than build them from scratch).
Much better to use the same assignment semantics used throughout Python and let people choose to deepcopy() the objects they read from the cache if they really, really want to modify them later; it would even work as a simple decorator stacked onto the existing one for such cases.
Aside: if anything in Python is missing deep copying by default, it’s definitely default parameters :) Deep copying caches makes much less sense than that.
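To make the "simple decorator to stack onto the existing one" concrete, here is a minimal sketch. The decorator name `deepcopied` and the example function are illustrative, not anything from the standard library:

```python
from copy import deepcopy
from functools import lru_cache, wraps

def deepcopied(func):
    """Illustrative decorator: deep-copy whatever func returns."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return deepcopy(func(*args, **kwargs))
    return wrapper

@deepcopied               # copies happen on the way out of the cache
@lru_cache(maxsize=None)  # the cache itself stores the one original
def build_config(name):
    return {"name": name, "options": []}

a = build_config("x")
b = build_config("x")
assert a == b and a is not b  # equal contents, independent objects
a["options"].append("mutated")
assert build_config("x")["options"] == []  # cached original untouched
```

Because `deepcopied` sits outside the cache, the cached object is created once and every caller gets a fresh copy; callers who don't want the copying cost simply don't stack the extra decorator.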
> An important fact about assignment: assignment never copies data.
Is that really what's going on here? I'm in way too deep to be sure what's best for beginner programmers, but I feel like Python must surely optimise...
sheep = 1
goats = sheep
sheep = sheep + 10
... by simply copying that 1 into goats, rather than tracking that goats is for now an alias to the same value as sheep and then updating that information when sheep changes on the next line.
Now, if we imagine those numbers are way bigger (say 200 digits), Python still just works, whereas the low-level languages I spend most time with will object because 200-digit integers don't fit in a machine word. You could imagine that copying isn't cheaper in that case, but I don't think I buy it. The 200-digit integer is a relatively small data structure, still probably cheaper to copy than to mess about with small objects which need garbage collecting.
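It's easy to check directly that assignment doesn't copy even a 200-digit integer — both names end up bound to the one object, and rebinding one name later leaves the other alone (the identity check reflects CPython behavior):

```python
# A 200-digit integer; assignment binds a second name to the same object.
big = 10 ** 199 + 7
alias = big
assert alias is big            # no copy was made by `alias = big`

# Rebinding `big` creates a new object; `alias` is untouched.
big = big + 1
assert alias == 10 ** 199 + 7
assert alias is not big
```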
The semantics of assignments in Python are not the same as assignment in C. When you assign a local like `x = some_expression` in Python, you can read it as, “Evaluate `some_expression` now, and call that result `x` in this local namespace.”
The behavior that results from your example follows from this rule. First, evaluate `1` and call it `sheep`. Then evaluate whatever `sheep` is, once, to get `1` (the same object in memory as every other literal `1` in Python) and call it `goats`.
The last line is where the rule matters: The statement `sheep = sheep + 10` can be read as, “Evaluate `sheep + 10` and call the result `sheep`.” The statement reassigns the name `sheep` in the local namespace to point to a different object, one created by evaluating `sheep + 10`. The actual memory location that `sheep` referred to previously (containing the `int` object `1`) is not changed at all — assignment to a local will never change the value of any other local.
This is easy to remember if you recall that a local namespace is effectively just a `dict`. Your example is equivalent to:
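Concretely, filling in the dict version of the sheep/goats example:

```python
d = {}
d["sheep"] = 1
d["goats"] = d["sheep"]        # right-hand side evaluated once: 1
d["sheep"] = d["sheep"] + 10   # rebinds d["sheep"]; d["goats"] unchanged
assert d["sheep"] == 11
assert d["goats"] == 1
```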
It should be clear even to beginners that `d["goats"]` has a final value of `1`, not `11`, because the right-hand side of `d["goats"] = d["sheep"]` is only evaluated once, and at that time it evaluates to `1`. Assignment using locals behaves in exactly the same way.
>Is that really what's going on here? I'm in way too deep to be sure what's best for beginner programmers, but I feel like Python must surely optimise...
For these particular numbers, CPython has one optimization I know of: Small integers (from -5 to 256) are pre-initialized and shared.
On my system these "cached" integers each appear to be 32-byte objects, so over 8 kB of RAM is used by CPython to "cache" the integers -5 through 256 in this way.
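This is observable from Python, though it's a CPython implementation detail, not a language guarantee. Constructing the ints with `int("...")` sidesteps the compiler's constant sharing within a code object, so we see the runtime cache itself:

```python
# CPython pre-allocates ints in [-5, 256]; int("...") forces runtime
# construction so compile-time constant sharing doesn't hide the effect.
a = int("100")
b = int("100")
assert a is b          # both names point at the one cached 100

c = int("257")
d = int("257")
assert c == d
assert c is not d      # outside the cache: two distinct objects
```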
There's similar craziness over in String town. If I mint the exact same string a dozen times from a constant, those all have the same id; presumably Python has a similar "cache" of such constant strings. But if I assemble the same result string with concatenation, each of the identical strings has a different id (this is in 3.9).
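The same effect in a few lines — again CPython-specific behavior, since the language spec doesn't promise any of these identity results:

```python
import sys

# Identical string literals in one code object are shared (CPython
# interns identifier-like constants at compile time).
a = "hello"
b = "hello"
assert a is b

# Building the "same" string at runtime gives a distinct object...
c = "".join(["he", "llo"])
assert c == a
assert c is not a      # equal value, different id

# ...unless you intern it explicitly.
assert sys.intern(c) is a
```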
So, my model of what's going on in a Python program was completely wrong. But the simple pedagogic model in the article was also wrong, just not in a way that's going to trip up new Python programmers.
This isn't a pre-made list of certain strings that should be cached, this is the compiler noticing that you mentioned the same constant a bunch of times.
Also in general you would see a lot of things with the same id because python uses references all over the place. E.g. assignment never copies.
Might be interesting as an option for the @lru_cache decorator to be able to specify a function to call before handing a cached value to the user. Then you could just do
@lru_cache(post=deepcopy, ...)
to have a new (copied) instance in cases where that's required. Or do whatever else you needed. Maybe you happen to know some details about the returned value that would let you get away with copying less, and could call my_copy instead.
Something something power of function composition.
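To be clear, `lru_cache` has no `post=` parameter today; but the proposed behavior composes out of existing pieces in a few lines. This is a sketch, with `lru_cache_post` an invented name:

```python
from copy import deepcopy
from functools import lru_cache, wraps

def lru_cache_post(post, **lru_kwargs):
    """Hypothetical: like lru_cache, but pass each cached value
    through `post` before handing it to the caller."""
    def decorator(func):
        cached = lru_cache(**lru_kwargs)(func)
        @wraps(func)
        def wrapper(*args, **kwargs):
            return post(cached(*args, **kwargs))
        # Expose the real cache's introspection helpers.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper
    return decorator

@lru_cache_post(post=deepcopy, maxsize=128)
def expensive(n):
    return {"n": n, "data": list(range(n))}

x = expensive(3)
x["data"].append(99)  # safe: we mutate the copy, not the cached value
assert expensive(3) == {"n": 3, "data": [0, 1, 2]}
```

Swapping `deepcopy` for a cheaper `my_copy` is then just a different argument — which is the function-composition point exactly.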