Nicely written article! - Slightly off topic: I love seeing and reading Python code. I used to see many flaws in the language (things like `.append` changing the object, returning `None` instead of creating a copy and returning that), but after ten years of working with Python I really appreciate its versatility, its ubiquitous availability, the large number of libraries and the community. There's nothing I can't solve with it. It's the Swiss army knife in my pocket, available whenever I need to solve something by coding.
> (things like `.append` changing the object, returning `None` instead of creating a copy and returning that)
This would be horrendously inefficient without immutable data structures like Clojure's. Very few languages have that, so it's a strange assumption to make, especially for a language as old as Python.
It's also worth noting that Clojure sacrificed quite a bit to make this as efficient as possible.
“Persistent vectors” are certainly an interesting data structure: they strike a compromise between fast indexing and being able to create, relatively quickly, a copy in which only one element changes, but indexing is made slower to allow for the latter. They also take up more memory on their own, though they are allowed to share memory with their copies.
I will say that my ideal language contains them in the standard library alongside standard vectors that index in constant time.
Further, it should be noted that much of the performance talk rests on the assumption that memory access is truly random access; with CPU caches that assumption is not entirely accurate, and accessing contiguous rather than scattered memory is considerably cheaper in practice, so one also pays a price for the data being scattered around in memory.
Random access into a Clojure vector is going to need more memory lookups than a conventional contiguous array (I don't recall the constants used in the implementation; I think it's either 4 or 8 lookups).
But when you're indexing into the vector sequentially, the memory layout plays rather well with memory caching behavior, and most lookups are going to be in L1 cache, just like they would be in a conventional array.
So lookups are a bit more expensive, but not as much more expensive as one might imagine.
The actual data of a persistent vector is not in contiguous memory but scattered however the JVM wills it, and on top of that, to find which address to retrieve, an algorithm that runs in logarithmic time with respect to the length of the vector must be used, as opposed to a constant-time one.
How can most lookups end up in L1 cache if an element that is 32 indices removed is statistically likely to be arbitrarily far removed in memory?
Of course, all of that is not that material to begin with, given that most elements will be pointers anyway, so the actual objects they point to will already be arbitrarily scattered and this simply adds one more pointer indirection; but for unboxed types such as integers it does play a factor.
So I don't recall what size of chunks Clojure's implementation uses, but I'll assume it uses 64-bit indices with 16-word chunks, because I want to use numbers.
Assuming no part of my-vector is cached at first, the first iteration needs to make a full 16 round trips to main memory — quite bad. But on the next iteration, all of that is cached, and we don't hit main memory at all, and the same until i=16, which requires one round trip. Then when i=16², we need to hit main memory twice, etc.
No doubt this is quite a bit worse than having everything nicely laid out sequentially in memory, but it's not as bad as you're describing.
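To make the shape of the lookup concrete, here is a toy Python sketch of a chunked trie with 16-element nodes, matching the assumption above; it is not Clojure's actual implementation (the real PersistentVector branches 32 ways and also keeps a tail buffer), just an illustration of why indexing costs one pointer hop per level:

CHUNK = 16

def build(items):
    # pack items into leaf chunks, then chunks into parent nodes, and so on
    # (assumes a non-empty input)
    level = [items[i:i + CHUNK] for i in range(0, len(items), CHUNK)]
    depth = 1
    while len(level) > 1:
        level = [level[i:i + CHUNK] for i in range(0, len(level), CHUNK)]
        depth += 1
    return level[0], depth  # root node and number of levels

def lookup(root, depth, i):
    # walk one node per level, consuming 4 bits of the index per step
    node = root
    for shift in range(4 * (depth - 1), -1, -4):
        node = node[(i >> shift) & 0xF]
    return node

root, depth = build(list(range(1000)))
assert lookup(root, depth, 123) == 123

In this sketch the tree is only as deep as the data needs, but the access pattern is the same: a few pointer hops per lookup, with the upper levels staying hot in cache during sequential iteration.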
> Of course, all of that is not that material to begin with given that most elements will be pointers to begin with
I guess this is sort of true. If you're doing random lookups, then using a persistent vector instead of an array list means 17 trips to main memory instead of just 1, so it's not totally inconsequential.
But I think (hope?) that modern JVMs can optimize collections of small immutable objects so that they're not represented as pointers to the heap. Surely ArrayList<Integer> x gets represented as int x[], and not int *x[], at least with the most optimizing JIT level.
I think you are talking about arbitrary stride sequential access, and the comment you were responding to meant stride-1 access. But conventional arrays also fare poorly with arbitrary stride access, though probably they have an advantage with small strides close to the Clojure persistent vector chunk size.
Things like `list.append` modifying in-place might feel like a flaw to some, but I think Python is really consistent when it comes to its behaviour. If you ask a person who comes from an object-oriented world, they'll say it only makes sense for a method on an object to modify that object's data directly.
There are always ways to do things the other way; for example, you can use `x = [*x, item]` to append and create a new copy, while being quite a bit more explicit that a new list is being created.
One related area where Python is not consistent is operators like +=.
In pretty much all other languages that have them, the expected behavior of `A += B` is exactly the same as `A = A + B`, except that A is only evaluated once. Now let's look at lists in Python:
xs = [1, 2]
ys = xs
ys = ys + [3]
print(xs, ys)
This prints [1, 2] [1, 2, 3], because the third line created a new list, and made ys reference that. On the other hand, this:
xs = [1, 2]
ys = xs
ys += [3]
print(xs, ys)
prints [1, 2, 3] [1, 2, 3], because += changes the list itself, and both xs and ys refer to that same list.
(Note that this is not the same as C++, because in the latter, the variables store values directly, while in Python, all variables are references to values.)
The worst part of it is that Python isn't even self-consistent here. If you only define `__add__` in your custom class, you can use both `+` and `+=` with its instances, with the latter behaving normally. But if you define `__iadd__`, as list does, then you can do whatever you want - and the idiomatic behavior is to modify the instance!
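For example, a minimal sketch with a hypothetical Bag class:

class Bag:
    def __init__(self, items):
        self.items = list(items)
    def __add__(self, other):
        # + builds and returns a new Bag
        return Bag(self.items + other.items)
    def __iadd__(self, other):
        # += mutates self and returns it, just like list does
        self.items.extend(other.items)
        return self

a = Bag([1, 2])
b = a
b += Bag([3])
print(a.items)  # [1, 2, 3] -- the += on b reached through to a as well

If Bag defined only `__add__`, the `b += Bag([3])` line would quietly fall back to `b = b + Bag([3])` and a would stay [1, 2].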
For comparison, C# lets you overload + but not +=, and automatically synthesizes the latter from the former to enforce the correct behavior.
You'll get an exception saying that the local variable xs was used before it was assigned - precisely because the assignment hidden inside `+=` makes xs a local variable inside foo.
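The original snippet isn't quoted here, but a hypothetical foo along these lines reproduces it:

xs = [1, 2]

def foo():
    # the assignment makes xs local to foo, so this line raises UnboundLocalError
    xs += [3]

foo()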
But that's good. A string just needs to know about iterables to perform that operation, whereas every iterable would need to implement its own join if you had it the other way around.
It may be necessary in Python, but in general, a language could allow you to define a join method on the iterable superclass/trait/interface that iterates over the elements, converting each to a string and inserting the separator between each of them.
Yet Ruby and JS manage to do it somehow. To me it seems natural that join should be a method on the iterable, and I always have to pause to remember Python is different.
I don't think it should be a method at all. It's just a function: join(iterable, separator). It can also be implemented with reduce naturally: `reduce(lambda x, y: x + separator + y, iterable)`.
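Spelled out as a runnable sketch (note that this repeated concatenation is quadratic, unlike str.join's single pass, and reduce needs an import and raises on an empty iterable unless you pass an initial value):

from functools import reduce

def join(iterable, separator):
    # fold the separator in between successive elements
    return reduce(lambda x, y: x + separator + y, iterable)

print(join(["a", "b", "c"], ", "))  # a, b, c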
Oh yeah, it's horrendous, my point was just that it's functionally equivalent and makes more sense as a function than a method on either object. You can actually call it like this if you want, though: `str.join(separator, iterable)`.
The way it's managed in JS, digging the function out of the prototype to apply it, can be done in Python as well. But unlike JS you won't normally have to, thanks to the method not being defined only on one specific type of iterable.
Ruby does it by having a mixin (Enumerable) that anything meeting a basic contract (roughly equivalent to the Python iterable protocol) can include to get an enormous block of functionality; Python doesn't have (or at least doesn't idiomatically use as freely; ISTR that there is a way to do it) mixins like Ruby does.
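For what it's worth, a rough Python sketch of that Enumerable-style approach is possible with an ordinary mixin class (hypothetical names, nothing from the stdlib):

class JoinableMixin:
    # anything that is iterable can mix this in
    def join(self, separator=""):
        return separator.join(str(x) for x in self)

class MySet(set, JoinableMixin):
    pass

print(MySet(["a", "b", "c"]).join(", "))  # e.g. "a, b, c" (set order is arbitrary)

It's just not idiomatic in Python to bolt such mixins onto the built-in collection types the way Ruby's core classes include Enumerable.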
' '.join() makes more sense to me, and it's more universal too if done right: you accept anything resembling a "sequence" (which Python does), and the individual objects of the sequence just need a sensible str(). And as language maintainer, you only have to maintain one such implementation, not one per collection type.
Javascript, on the other hand, kinda does it worst, at least of the languages I regularly use... .join() is an instance method on Arrays and TypedArrays. But they forgot to add any kind of join for Sets, for example.
(["a", "b", "c"]).join("")
"abc" # alright
(new Set(["a", "b", "c"])).join("")
Uncaught TypeError: (intermediate value).join is not a function
([...new Set(["a", "b", "c"])]).join("")
"abc" # grmpf, have to materialize it into an array first.
That illustrates the drawback: if you make it a method on the concrete sequence types you've got, you'd better not forget any, and you have to make sure the different APIs are consistent, too. If Javascript had a String.join(sep, <anything implementing the iterator protocol>), this wouldn't have been an issue.
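For contrast, Python's spelling of the same Set example works unchanged, because str.join takes any iterable of strings:

print("".join(["a", "b", "c"]))           # abc
print("".join({"a", "b", "c"}))           # abc in some order (sets are unordered)
print("".join(c for c in "abc"))          # abc, generators work too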
Python isn't alone either, by the way. C# has the static string.Join(...) that accepts "enumerables" (IEnumerable<T>), but no array.Join() or list.Join() or dictionary.Join(). Combined with LINQ, especially .Select, that becomes quite handy. There have been plenty of times I've done print-debugging by adding a one-liner along those lines.
In the cases you give, the original list is not being mutated; a new object (a string, not a list) is being created. So it does make sense not to have it be a method call on the list.
Huh, I never even thought we would need to create a copy of an object when adding a new item to it (like a new item to a list, for example). Is there any drawback to doing that in the standard Pythonic way? I actually learned to program using Python and it was my first language. Since then I've only used JS. In both I like using functions a lot and rarely dabble in OOP, since functions are more convenient to me.
You often lose performance in traditional imperative languages when aiming for persistence.
When you have immutability guarantees (like in many functional programming languages like ML or Haskell) you can avoid making copies by sharing the parts of the data structure that don't change.
If this kind of thing interests you, you should check out Chris Okasaki's book "Purely Functional Data Structures".
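As a minimal illustration of the sharing idea in plain Python (a toy persistent linked list, not any particular library): "adding" an element builds one new node and reuses the existing tail instead of copying it.

from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Node:
    value: Any
    rest: Optional["Node"] = None

def prepend(lst, value):
    # build one new node; the existing list becomes the shared tail
    return Node(value, lst)

xs = prepend(prepend(None, 2), 1)  # the list 1 -> 2
ys = prepend(xs, 0)                # the list 0 -> 1 -> 2
assert ys.rest is xs               # ys shares xs's nodes rather than copying them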
Whether mutating data is better than creating a new copy for everything is a really long debate about immutability and functional programming, with good points on either side, but that's really beside the point here.
In my opinion, you should use whichever method makes your code easy to read and understand for your usecase.
> things like `.append` changing the object, returning `None` instead of creating a copy and returning that
The obvious question is why it can't return a reference to the list instead of returning None. I feel like if I've been using the language on an almost daily basis for ten years now and I still get burned by that all the time, then it's just a poorly designed feature.
The advantage of mutating operations always returning None is that you can easily tell whether a mutation is happening by looking at the code. If you see y = f(x) that means x is unchanged, whereas if you see just f(x) on a line that means something stateful is happening.
Agreed. JavaScript's Array.sort is an example of this. Most of JavaScript's other array methods return a new array and people get used to chaining them, but sort mutates the array and also returns a reference to it. You can actually get pretty far before being bitten by this so long as you're sorting already-copied arrays. But then one day you hit a bizarre bug caused by behavior that's been sneaking past your radar the whole time.
I'll just point out that this is originally from Scheme (I think... Maybe Scheme got it from a previous Lisp) but borrowed by Ruby. Neither Scheme nor Ruby do a perfect job with sticking to the naming convention, at least if we include popular libraries, but it is very handy and intuitive.
Python has a naming convention as well: `sorted(arr)` returns a sorted copy and `arr.sort()` works in-place (and returns None). However, I've always thought it's a bit odd that one is a function and the other is a method.
The ! convention is ok but I don't think it's optimal because, in the presence of higher-order functions and related concepts, it's often not clear if a function should be marked as !.
For example if I have a map function that applies a function f to a sequence, should I call it map! because I might pass in a function f that mutates the input? If so then it seems like any function that takes a function as input, or any function that might call a method on an object, should get marked with ! just in case. But if I don't mark it that way then the ! marking is not as informative: I might end up with a line consisting only of non-! functions which still mutates the input.
Note that “!” in Ruby doesn’t conventionally mean “modifies the receiver”, it means “does something that is similar but less likely to be safe than a method with the same name without ‘!’”.
A very common case of this is mutating versions of non-mutating methods, but (1) mutating methods (in stdlib or other idiomatic code bases) that have no non-mutating equivalent are not named with a “!”, and (2) methods are sometimes named with “!” because they do dangerous things compared to a base method that are not mutating the receiver.
map! would mean a function that performs a map in-place on the array by replacing the values. So it would depend on if the callback was encouraged to mutate the array or discouraged from doing so.
Array#map! actually exists in Ruby, too. It mutates the array item by item. Enumerable#each doesn't have a bang, because it doesn't change the enumerable itself, even though it can mutate the objects contained in the enumerable. This is overall consistent -- there's a distinction between mutating the receiving object and mutating objects contained in or referred to by the receiving object.
random.shuffle() has bitten me that way a few times too:
array = random.shuffle(array)
because I expected it to return a copy or a reference, but instead it made my array None.
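The working spellings, for the record, are either shuffling in place without reassigning, or asking for a shuffled copy explicitly:

import random

array = [3, 1, 2]
random.shuffle(array)                               # shuffles in place, returns None
shuffled_copy = random.sample(array, k=len(array))  # new shuffled list, original untouched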
It would also enable chaining operations:
array = array.append(A).append(B).sort()
In-place vs immutable copy is a language design choice with tradeoffs on both sides, but there's no reason that I can see to not return a reference to the list.
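To illustrate, here's a hypothetical list subclass whose mutating methods return self (which is not how the builtin list behaves):

class ChainList(list):
    def append(self, item):
        super().append(item)
        return self
    def sort(self, **kwargs):
        super().sort(**kwargs)
        return self

array = ChainList([3, 1]).append(2).append(0).sort()
print(array)  # [0, 1, 2, 3]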
Perhaps recognizing this is really the job of an external linter. Sometimes I wonder if the future of enforcing canonical formatting on save like "gofmt" or "black" will extend to auto-correcting certain goofy errors on each save.
mypy would yell at you about this, but afaik type-checked python still isn't the norm.
In Python, a function that makes and returns a copy would be idiomatically named shuffled(). Consider sorting:
xs.sort() # in-place
ys = sorted(xs) # copy
As for functions returning the object - I think it's a hack around the absence of direct support for such repetition in the language itself. E.g. in Object Pascal, you'd write:
with array do
begin
append(A);
append(B);
sort;
end;
> Used to see many flaws in the language (things like `.append` changing the object, returning `None` instead of creating a copy and returning that)
I think it's pretty off-base to call this a "flaw". Immutable structures have their place and can be very helpful where appropriate, but making its core primitives work this way is far outside the scope or the philosophy of Python. If you want otherwise, you're really wanting an entirely different language. And there's nothing wrong with that! But I think it would be a "flaw" for Python to make these operations immutable, even though I love immutability personally.
And also like a Swiss army knife, it's not particularly great at anything, can be awkward even when functional, and there's always a better tool for any specific job.