
Thank you!

Things like `list.append` modifying in-place might feel like a flaw to some, but I think Python is really consistent when it comes to its behaviour. If you ask a person who comes from an object-oriented world, they'll say it only makes sense for a method on an object to modify that object's data directly.

There are always ways to do things the other way. For example, you can use x = [*x, item] to append by creating a new list, which is also quite a bit more explicit that a new list is being created.
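To make the difference concrete, here is a minimal sketch. The names `alias` and `in_place_shared` are only there for illustration:

```python
x = [1, 2]
alias = x                 # a second name for the same list object

x.append(3)               # mutates in place, so alias sees the change too
in_place_shared = alias is x and alias == [1, 2, 3]

x = [*x, 4]               # builds a new list and rebinds x; alias is untouched

print(x, alias, alias is x)
```

After the last assignment, `x` and `alias` refer to different lists, which is exactly the explicitness mentioned above.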



One related area where Python is not consistent is operators like +=.

In pretty much all other languages that have them, the expected behavior of A+=B is exactly the same as A=A+B, except that A is only evaluated once. Now let's look at lists in Python:

   xs = [1, 2]
   ys = xs
   ys = ys + [3]
   print(xs, ys)
This prints [1, 2] [1, 2, 3], because the third line created a new list, and made ys reference that. On the other hand, this:

   xs = [1, 2]
   ys = xs
   ys += [3]
   print(xs, ys)
prints [1, 2, 3] [1, 2, 3], because += changes the list itself, and both xs and ys refer to that same list.

(Note that this is not the same as C++, because in the latter, the variables store values directly, while in Python, all variables are references to values.)

The worst part of it is that Python isn't even self-consistent here. If you only define __add__ in your custom class, you can use both + and += with its instances, with the latter behaving normally. But if you define __iadd__, as list does, then you can do whatever you want - and the idiomatic behavior is to modify the instance!
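A rough sketch of the two cases, assuming two made-up classes (`AddOnly` and `InPlace` are hypothetical names, not anything from the standard library):

```python
class AddOnly:
    """Defines only __add__, so `a += b` falls back to `a = a + b`."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return AddOnly(self.items + list(other))


class InPlace(AddOnly):
    """Also defines __iadd__ and mutates itself, the way list does."""

    def __iadd__(self, other):
        self.items.extend(other)
        return self   # the returned object is what gets bound to the name


a = AddOnly([1, 2])
a_alias = a
a += [3]              # new object; a_alias still points at the old one

b = InPlace([1, 2])
b_alias = b
b += [3]              # same object, modified in place

print(a is a_alias, a_alias.items)   # False [1, 2]
print(b is b_alias, b_alias.items)   # True [1, 2, 3]
```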

For comparison, C# lets you overload + but not +=, and automatically synthesizes the latter from the former to enforce the correct behavior.


The rule in Python is that `=` creates a name binding, whereas `+=` does not. That's pretty consistent as far as I can tell.


The inconsistency is that for immutable objects (or more generally, objects that have __add__ but not __iadd__), '+=' does create a name binding.

After 'a = 300; b = 50000',

    a += b
is exactly the same as

    a = a + b
In this case, a new object is created with the value of a + b which then gets bound to the name a.


Nope, += also creates a binding! Try this:

   xs = [1, 2]

   def foo():
      xs += [3]

   foo()
You'll get an exception saying that local variable xs was used before it was assigned - precisely because += created a new local binding for xs inside foo.
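Here is a runnable version of that, plus a second illustration of the same point that I am adding myself: a list stored inside a tuple. The names `foo_error` and `tuple_error` are just for the demo.

```python
xs = [1, 2]

def foo():
    xs += [3]          # += is an assignment, so xs is local to foo

try:
    foo()
except UnboundLocalError as exc:
    foo_error = exc    # reading the local xs fails before any list is touched

# += always ends with an assignment, even when __iadd__ mutated in place.
# Inside a tuple, the in-place extend succeeds and the assignment then fails.
t = ([1, 2],)
try:
    t[0] += [3]
except TypeError as exc:
    tuple_error = exc

print(xs)     # [1, 2]
print(t[0])   # [1, 2, 3]
```

The tuple case shows both halves at once: the list was modified, and the rebinding step was still attempted.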


>Python is really consistent when it comes to its behaviour

True, though you end up with things like:

  ' '.join(thelist)
Instead of

  thelist.join(' ')
Because of the somewhat aggressive mantra to be consistent.


It's more about flexibility than consistency.

str.join() and bytes.join() can support all iterable type arguments.

Better than trying to implement a join (or two) on all iterables.
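A quick sketch of what that flexibility looks like in practice. The generator `squares` is a made-up example:

```python
def squares():
    """A plain generator; it has no join method of its own."""
    for n in range(4):
        yield str(n * n)

from_list = ", ".join(["a", "b", "c"])
from_tuple = ", ".join(("a", "b"))
from_generator = ", ".join(squares())
from_dict_keys = ", ".join({"x": 1, "y": 2})   # iterating a dict yields its keys
from_bytes = b"-".join([b"ab", b"cd"])

print(from_list, from_tuple, from_generator, from_dict_keys, from_bytes, sep="\n")
```

Note that the elements still have to be strings (or bytes for bytes.join); join does not call str() on them for you.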


But that's good. Because a string just needs to know about iterables to perform that operation, whereas every iterable would need to implement its own join if you had it the other way around.


It may be necessary in Python, but in general, a language could allow you to define a join method on the iterable superclass/trait/interface that iterates over the elements, converting each to a string, and inserting the separator between each of them.

For example, Scala has Iterable#mkString (https://docs.scala-lang.org/overviews/collections-2.13/trait...)


Yet Ruby and JS manage to do it somehow. To me it seems natural that join should be a method on the iterable, and I always have to pause to remember Python is different.


How does that work? Don’t you have to effectively convert your general iterable to an array and then join on that? Array.from(iterable).join(…)?


I don't think it should be a method at all. It's just a function: join(iterable, separator). It can also be implemented with reduce naturally: `reduce(lambda x, y: x + separator + y, iterable)`.


Reduce sounds like a really slow way to do string building


Oh yeah, it's horrendous, my point was just that it's functionally equivalent and makes more sense as a function than a method on either object. You can actually call it like this if you want, though: `str.join(separator, iterable)`.
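For what it's worth, a small sketch of the equivalence, with one caveat I know of: the two differ on an empty iterable. The variable names are just for illustration.

```python
from functools import reduce

separator = "|"
words = ["one", "two", "three"]

# Builds a new intermediate string at every step.
via_reduce = reduce(lambda x, y: x + separator + y, words)
via_join = separator.join(words)

# Edge case: join returns "" for an empty iterable, while reduce without
# an initial value raises TypeError.
empty_join = separator.join([])
try:
    reduce(lambda x, y: x + separator + y, [])
    empty_reduce_failed = False
except TypeError:
    empty_reduce_failed = True

print(via_reduce, via_join)
```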


The way it's managed in JS, digging the function out of the prototype to apply it, can be done in Python as well. But unlike JS you won't normally have to, thanks to the method not being defined only on one specific type of iterable.

JS:

  Array.prototype.join.call(["one", "two", "three"], "|")
Python:

  str.join("|", ["one", "two", "three"])


> Yet Ruby and JS manage to do it somehow.

Ruby does it by having a mixin (Enumerable) that anything meeting a basic contract (roughly equivalent to the Python iterable protocol) can include to get an enormous block of functionality; Python doesn’t have (or at least idiomatically use as freely; ISTR that there is a way to do it) mixins like Ruby does.
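For the curious, a mixin along those lines might look roughly like this in Python. This is only a sketch: `JoinMixin`, `Countdown` and `JoinableList` are hypothetical names, and nothing like this ships with the language.

```python
class JoinMixin:
    """Hypothetical mixin: adds join() to any class that defines __iter__."""

    def join(self, sep=""):
        # Convert each element to a string, as Ruby's join does.
        return sep.join(str(item) for item in self)


class Countdown(JoinMixin):
    """A custom iterable that gets join() for free from the mixin."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


class JoinableList(JoinMixin, list):
    """A list subclass that picks up the same method."""


print(Countdown(3).join("-"))              # 3-2-1
print(JoinableList([1, 2, 3]).join(", "))  # 1, 2, 3
```

The catch is that the built-in types do not include it, so you only get the method on classes you define or subclass yourself.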


' '.join() makes more sense to me, and it's more universal too if done right and you accept anything resembling a "sequence" (which Python does) and individual objects of the sequence have a sensible str(). And as a language maintainer, you only have to maintain one such implementation, not one per collection type.

JavaScript, on the other hand, kinda does it worst, at least of the languages I regularly use... .join() is an instance method on Arrays and TypedArrays. But they forgot to add any kind of join for Sets, for example.

    (["a", "b", "c"]).join("")
    "abc" // alright
    (new Set(["a", "b", "c"])).join("")
    Uncaught TypeError: (intermediate value).join is not a function
    ([...new Set(["a", "b", "c"])]).join("")
    "abc" // grmpf, have to materialize it into an array first.
That illustrates the drawback: if you make it a method on the concrete sequence types, you had better not forget any of them, and you have to keep the different APIs consistent too. If JavaScript had a String.join(sep, <anything implementing the iterator protocol>), this wouldn't have been an issue.

Python isn't alone either, by the way. C# has the static string.Join(...) that accepts "enumerables" (IEnumerable<T>), but no array.Join() or list.Join() or dictionary.Join(). Combined with LINQ, especially .Select, that becomes quite handy. Plenty of times I have done print-debugging by adding a one-liner along the lines of

    Console.WriteLine(string.Join("\n", dictionary.Select(kv => $"{kv.Key} = {kv.Value.SomeProperty}")));
I find the C# way of having a string.Join(sep, ...) instead of python's "some string".join(...) nicer to read because it's more obvious.


> I find the C# way of having a string.Join(sep, ...) instead of python's "some string".join(...) nicer to read because it's more obvious.

`str.join(sep, ...)` works in Python as well, because `a.f(...)` and `type(a).f(a, ...)` are (almost) equivalent.
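A short sketch of both the equivalence and one case behind the "almost" (there may be others). The `Greeter` class is a made-up example:

```python
sep = "|"
parts = ["one", "two", "three"]

bound = sep.join(parts)                 # a.f(...)
unbound = type(sep).join(sep, parts)    # type(a).f(a, ...)


class Greeter:
    def hello(self):
        return "class hello"


g = Greeter()
g.hello = lambda: "instance hello"      # instance attribute shadows the method

via_instance = g.hello()                # finds the attribute on the instance
via_type = type(g).hello(g)             # goes straight to the class

print(bound == unbound, via_instance, via_type)
```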


In the cases you give, the original list is not being mutated; a new object (a string, not a list) is being created. So it does make sense not to have it be a method call on the list.


Huh, I never even thought we would need to create a copy of an object when adding a new item to it (like a new item to a list, for example). Is there any drawback to doing that in the standard Pythonic way? I actually learned to program using Python and it was my first language. Since then I have only used JS. In both I like using functions a lot and rarely dabble in OOP, since that is more convenient for me.


You often lose performance in traditional imperative languages when aiming for persistence.

When you have immutability guarantees (like in many functional programming languages like ML or Haskell) you can avoid making copies by sharing the parts of the data structure that don't change.

If this kind of thing interests you, you should check out Chris Okasaki's book "Purely Functional Data Structures".
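To give a flavour of the sharing idea in Python terms, here is a toy immutable linked list. It is only a sketch: `Node`, `prepend` and `to_list` are names I made up, not taken from the book or any library.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a singly linked list."""
    head: int
    tail: Optional["Node"]


def prepend(value, lst):
    """Return a new list with value in front; lst itself is reused, not copied."""
    return Node(value, lst)


def to_list(lst):
    """Flatten into a regular Python list, for display only."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


base = prepend(2, prepend(3, None))
a = prepend(1, base)
b = prepend(0, base)

print(to_list(a), to_list(b))     # [1, 2, 3] [0, 2, 3]
print(a.tail is b.tail is base)   # True: both share the same tail
```

Because no cell can ever change, sharing `base` between `a` and `b` is safe, and each "modified" list costs a single new cell.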


"avoid making copies" does not always equal "performance". Depending on your access patterns, having the data colocated can be more important.

But immutability sure is nice when you can have it.


Whether mutating data is better than creating a new copy for everything is a really long debate about immutability and functional programming, with good points on either side, but that's really beside the point here.

In my opinion, you should use whichever method makes your code easy to read and understand for your use case.


You can control the behavior manually, like:

  first = [0,1,2]
  second = [*first,3] # first is unchanged, second = [0,1,2,3]
Or second=itertools.chain(first, [3]), which avoids the copy (but gives you a lazy, one-shot iterator rather than a list).

Though, to me, it's asking for trouble later.
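One way the trouble can show up, as a small sketch (the variable names are illustrative):

```python
import itertools

first = [0, 1, 2]

copied = [*first, 3]                 # snapshot taken now
lazy = itertools.chain(first, [3])   # nothing is read from first yet

first.append(99)                     # mutate the source before consuming lazy

lazy_result = list(lazy)             # sees the mutation
second_pass = list(lazy)             # the iterator is already exhausted

print(copied)        # [0, 1, 2, 3]
print(lazy_result)   # [0, 1, 2, 99, 3]
print(second_pass)   # []
```

So the copy is avoided, but the result depends on when you consume it, and you can only consume it once.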



