It’d be annoying for every type used in a cache to need to properly implement __deepcopy__(), and it would impose a significant performance cost, especially if the cached objects are large (and there’s a good chance they are, given you’ve felt the need to cache them rather than build them from scratch).
Much better to use the same assignment semantics used throughout Python and let people choose to deepcopy() the objects they read from the cache if they really, really want to modify them later; it would even work as a simple decorator stacked onto the existing one for such cases.
Aside: if anything in Python is missing deep copying by default, it’s definitely default parameters :) Deep copying caches makes much less sense than that.
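To make the "simple decorator to stack onto the existing one" concrete, here is a minimal sketch. The decorator name `deepcopied` and the example function are illustrative, not anything from the standard library:

```python
from copy import deepcopy
from functools import lru_cache, wraps

def deepcopied(func):
    """Illustrative decorator: deep-copy whatever func returns."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return deepcopy(func(*args, **kwargs))
    return wrapper

@deepcopied               # copies happen on the way out of the cache
@lru_cache(maxsize=None)  # the cache itself stores the one original
def build_config(name):
    return {"name": name, "options": []}

a = build_config("x")
b = build_config("x")
assert a == b and a is not b  # equal contents, independent objects
a["options"].append("mutated")
assert build_config("x")["options"] == []  # cached original untouched
```

Because `deepcopied` sits outside the cache, the cached object is created once and every caller gets a fresh copy; callers who don't want the copying cost simply don't stack the extra decorator.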
> An important fact about assignment: assignment never copies data.
Is that really what's going on here? I'm in way too deep to be sure what's best for beginner programmers, but I feel like Python must surely optimise...
sheep = 1
goats = sheep
sheep = sheep + 10
... by simply copying that 1 into goats, rather than tracking that goats is for now an alias to the same value as sheep and then updating that information when sheep changes on the next line.
Now, if we imagine those numbers are way bigger (say 200 digits), Python still just works, whereas the low-level languages I spend most time with will object because 200-digit integers don't fit in a machine word. You could imagine that copying isn't cheaper in that case, but I don't think I buy it. The 200-digit integer is a relatively small data structure, still probably cheaper to copy than to mess about with small objects which need garbage collecting.
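It's easy to check directly that assignment doesn't copy even a 200-digit integer — both names end up bound to the one object, and rebinding one name later leaves the other alone (the identity check reflects CPython behavior):

```python
# A 200-digit integer; assignment binds a second name to the same object.
big = 10 ** 199 + 7
alias = big
assert alias is big            # no copy was made by `alias = big`

# Rebinding `big` creates a new object; `alias` is untouched.
big = big + 1
assert alias == 10 ** 199 + 7
assert alias is not big
```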
The semantics of assignments in Python are not the same as assignment in C. When you assign a local like `x = some_expression` in Python, you can read it as, “Evaluate `some_expression` now, and call that result `x` in this local namespace.”
The behavior that results from your example follows from this rule. First, evaluate `1` and call it `sheep`. Then evaluate whatever `sheep` is, once, to get `1` (the same object in memory as every other literal `1` in Python) and call it `goats`.
The last line is where the rule matters: The statement `sheep = sheep + 10` can be read as, “Evaluate `sheep + 10` and call the result `sheep`.” The statement reassigns the name `sheep` in the local namespace to point to a different object, one created by evaluating `sheep + 10`. The actual memory location that `sheep` referred to previously (containing the `int` object `1`) is not changed at all — assignment to a local will never change the value of any other local.
This is easy to remember if you recall that a local namespace is effectively just a `dict`. Your example is equivalent to:
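Concretely, filling in the dict version of the sheep/goats example:

```python
d = {}
d["sheep"] = 1
d["goats"] = d["sheep"]        # right-hand side evaluated once: 1
d["sheep"] = d["sheep"] + 10   # rebinds d["sheep"]; d["goats"] unchanged
assert d["sheep"] == 11
assert d["goats"] == 1
```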
It should be clear even to beginners that `d["goats"]` has a final value of `1`, not `11`, because the right-hand side of `d["goats"] = d["sheep"]` is only evaluated once, and at that time it evaluates to `1`. Assignment using locals behaves in exactly the same way.
>Is that really what's going on here? I'm in way too deep to be sure what's best for beginner programmers, but I feel like Python must surely optimise...
For these particular numbers, CPython has one optimization I know of: Small integers (from -5 to 256) are pre-initialized and shared.
On my system these "cached" integers each appear to be 32-byte objects, so over 8 kB of RAM is used by CPython to "cache" the integers -5 through 256 in this way.
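This is observable from Python, though it's a CPython implementation detail, not a language guarantee. Constructing the ints with `int("...")` sidesteps the compiler's constant sharing within a code object, so we see the runtime cache itself:

```python
# CPython pre-allocates ints in [-5, 256]; int("...") forces runtime
# construction so compile-time constant sharing doesn't hide the effect.
a = int("100")
b = int("100")
assert a is b          # both names point at the one cached 100

c = int("257")
d = int("257")
assert c == d
assert c is not d      # outside the cache: two distinct objects
```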
There's similar craziness over in String town. If I mint the exact same string a dozen times from a constant, those all have the same id; presumably Python has a similar "cache" of such constant strings. But if I assemble the same result string with concatenation, each of the identical strings has a different id (this is in 3.9).
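The same effect in a few lines — again CPython-specific behavior, since the language spec doesn't promise any of these identity results:

```python
import sys

# Identical string literals in one code object are shared (CPython
# interns identifier-like constants at compile time).
a = "hello"
b = "hello"
assert a is b

# Building the "same" string at runtime gives a distinct object...
c = "".join(["he", "llo"])
assert c == a
assert c is not a      # equal value, different id

# ...unless you intern it explicitly.
assert sys.intern(c) is a
```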
So, my model of what's going on in a Python program was completely wrong. But the simple pedagogic model in the article was also wrong, just not in a way that's going to trip up new Python programmers.
This isn't a pre-made list of certain strings that should be cached, this is the compiler noticing that you mentioned the same constant a bunch of times.
Also in general you would see a lot of things with the same id because python uses references all over the place. E.g. assignment never copies.
Might be interesting as an option for the @lru_cache decorator to be able to specify a function to call before handing a cached value to the user. Then you could just do
@lru_cache(post=deepcopy, ...)
to have a new (copied) instance in cases where that's required. Or do whatever else you needed. Maybe you happen to know some details about the returned value that would let you get away with copying less, and could call my_copy instead.
Something something power of function composition.
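To be clear, `lru_cache` has no `post=` parameter today; but the proposed behavior composes out of existing pieces in a few lines. This is a sketch, with `lru_cache_post` an invented name:

```python
from copy import deepcopy
from functools import lru_cache, wraps

def lru_cache_post(post, **lru_kwargs):
    """Hypothetical: like lru_cache, but pass each cached value
    through `post` before handing it to the caller."""
    def decorator(func):
        cached = lru_cache(**lru_kwargs)(func)
        @wraps(func)
        def wrapper(*args, **kwargs):
            return post(cached(*args, **kwargs))
        # Expose the real cache's introspection helpers.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper
    return decorator

@lru_cache_post(post=deepcopy, maxsize=128)
def expensive(n):
    return {"n": n, "data": list(range(n))}

x = expensive(3)
x["data"].append(99)  # safe: we mutate the copy, not the cached value
assert expensive(3) == {"n": 3, "data": [0, 1, 2]}
```

Swapping `deepcopy` for a cheaper `my_copy` is then just a different argument — which is the function-composition point exactly.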