What a beautiful, understated piece of writing. This particular problem seems to come up a lot when thinking about how to use new Python classes (defaultdict, Counter) and affordances (list comprehensions, +=). It's nice to see it captured in timeline format.
The author seems to misunderstand one part of The Zen of Python:
| Simple is better than complex.
by saying:
> Our code is more complex (O(n²) instead of O(n)), less beautiful, and less readable
The Zen is not about computational complexity! It's about the complexity of the source code.
The code in question is:
color_counts = dict((c, colors.count(c)) for c in set(colors))
And while I agree that this is inefficient and shouldn't be used in a library, I find this to be very readable and would always prefer that code if I know that I'm dealing only with small lists. It translates neatly to a natural-language description of the problem:
Give me a dictionary that maps each distinct color
to the number of times it occurs in the list.
So I don't think this violates the guide "Simple is better than complex". Rather, it is a good example where it makes sense to introduce additional complexity to improve the performance of an often-used helper function.
I don't think you should get into the habit of writing O(n^2) algorithms if the O(n) solution is not much more complex. Unless the constants are very dissimilar, you probably hurt your performance already for a few hundred elements. Writing reusable code includes using algorithms that are ok for a wide range of input sizes.
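For comparison, a minimal sketch of the linear-time alternatives from the standard library (the colors list here is just an illustrative input, echoing the examples further down the thread):

    from collections import Counter, defaultdict

    colors = ['brown', 'red', 'green', 'yellow', 'yellow', 'brown', 'brown', 'black']

    # O(n): Counter makes a single pass over the list.
    color_counts = Counter(colors)

    # O(n) without Counter: defaultdict(int) supplies a 0 for missing keys.
    counts = defaultdict(int)
    for c in colors:
        counts[c] += 1

    # The quoted O(n^2) version calls list.count() -- a full scan -- once
    # per distinct color:
    quadratic = dict((c, colors.count(c)) for c in set(colors))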
I have no idea how the author found this code more complex. Probably he's still more used to imperative programming, but functional styles and one-liners usually greatly simplify the understanding of the code.
The try block method would be considered bad in most languages, and I hope it is considered bad in Python as well. Using exception handling as part of a normal flow of control is bad style, bad taste, and bad for performance.
Essentially, in CPython, you're already paying for an exception check on either a lot of the opcodes or all of them, so you might as well use it. Statements are very rich in Python, so generally to get the best CPython performance the name of the game is to get as much out of each one as you can, and minimize the total number of them. (I have to specify CPython because PyPy can be much more complicated.)
Certainly nowadays that would be considered bad style, but back in 1.5 it wouldn't have been so bad. Also, when 1.5 was out, 200MHz was still a pretty decent machine! The little things could matter quite a lot.
It's not considered quite that bad. In many languages, using exceptions for non-exceptional flow control is bad style. Python is a bit more ambivalent about this, for instance take a look at the iterator protocol which uses an exception to signal the iterator is done.
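For instance, driving an iterator by hand makes the exception-based termination visible (a minimal sketch, not code from the article):

    it = iter([1, 2])
    print(next(it))        # 1
    print(next(it))        # 2
    try:
        next(it)           # exhaustion is signalled by raising StopIteration;
    except StopIteration:  # for-loops catch this internally
        print("done")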
Yep, the fundamental termination mechanism seems to be the same. Years and years ago I got into some silly irc nerdgument with a python expert (I think one of the twisted people) about the ugliness of this design and for a moment I thought I got to triumphantly yell 'Told you so!' a decade later. Alas, not the case.
I take it you prefer the explicit test for the end of iteration?
As a historical note, the StopIteration form grew out of the earlier iterator form, which called __getitem__ with successive integers until there was an IndexError. That may explain a bias towards an exception-based approach.
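A minimal sketch of that legacy protocol, which still works today: with no __iter__ defined, iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError ends the loop.

    class Squares:
        def __getitem__(self, i):
            # Old-style iteration: successive integer indexes, terminated
            # by raising IndexError.
            if i >= 5:
                raise IndexError(i)
            return i * i

    for x in Squares():
        print(x)   # 0 1 4 9 16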
I probably do, although in practice, given how well python comprehensions/iterators/generators have turned out (and how composable they are), getting all worked up about some implementation detail now seems a bit churlish and pointless.
There was a paper at the 1997 Python conference on this topic titled "Standard Class Exceptions in Python". A copy is at https://web.archive.org/web/20030610173145/http://barry.wars... . It evaluated the performance of try/except vs. has_key and concluded:
> This indicates that the has_key() idiom is usually the best one to choose, both because it is usually faster than the exception idiom, and because its costs are less variable.
The take-home lesson is that actually raising an exception in Python 1.5 was about 10x more expensive than a function call, but the try/except block when there is no exception was not expensive.
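For anyone who wants to repeat a rough modern equivalent of that measurement (has_key() is long gone, so "key in dict" stands in for it; the names here are purely illustrative):

    import timeit

    d = {'present': 1}

    def with_try(key):
        try:
            return d[key]
        except KeyError:
            return None

    def with_in(key):
        return d[key] if key in d else None

    for key in ('present', 'missing'):
        for fn in (with_try, with_in):
            t = timeit.timeit(lambda: fn(key), number=1_000_000)
            print(fn.__name__, key, round(t, 3))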
Interesting. My information was not only out of date; it was also wrong. The dangers of cargo-culting, although in my case, more theoretical than real, as I never had performance-sensitive python code in production.
Instead of checking the type of some_string, or seeing if it has a split method. The reason: it's more straightforward, and it instantly tells the reader this function is meant to handle strings and will split them. If it gets a non-string for some reason, oh well, it'll still handle it.
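Presumably something along these lines (a sketch; the function name is made up):

    def split_words(some_string):
        # Just call .split(): the call itself tells the reader the function
        # is meant for strings, and anything else that happens to have a
        # .split() method will still be handled (duck typing).
        return some_string.split()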
You would check for values in a circumstance like this:
def dict_breaker(some_dict):
    if 'items' in some_dict:
        return parse_some_items(some_dict['items'])
These versions may be more efficient (they only need to hash the key and interact with the hashmap once, though the difference probably won't be visible with integers/short strings), and they don't suffer from race conditions in concurrent code where the hashmap is modified between the check and the use (yes, this can happen even with the GIL).
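Concretely (a sketch, assuming "these versions" means the try/except and .get() idioms discussed upthread; some_dict is just a stand-in):

    some_dict = {'items': [1, 2, 3]}

    # Check-then-use: the key is hashed and looked up twice, and another
    # thread can delete it between the check and the access.
    if 'items' in some_dict:
        value = some_dict['items']

    # A single interaction with the hashmap, and no check-then-use race:
    try:
        value = some_dict['items']
    except KeyError:
        value = None

    # Likewise a single lookup:
    value = some_dict.get('items')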
Is there a reason you chose to `pass` instead of the more explicit `return None`? The former seems like it would be less idiomatic since its return value is not explicitly stated.
> Per the Zen of Python, “there should be one– and preferably only one– obvious way to do it”. This is an aspirational message.
"aspirational" is an optimistic way to describe it. As the article illustrates, every ~2 years the "pythonic" approach is different. Innovation is good, but it hurts re-use of code and of skills.
Yes! This was one of the biggest things I missed when I moved to Ruby. I try and tell people how great autovivification is but unless they've coded with it the feature just sounds strange. But it lets you build some really great data structures on the fly!
I'm well aware of exists, and how it functions, and that it in no way solves the problem presented. Exists tests for the existence of a hash key, so you can tell if it exists but is possibly undefined, but it does not stop autovivification in any way.
Your example doesn't cause autovivification even without exists. Autovivification is the automatic creation of the underlying hashes and arrays in a multiple level data structure when they are used while accessing a nested data-structure.
For example, given an empty hash %hash, $hash{foo} does not cause autovivification, but $hash{foo}{bar} will automatically create an empty hash and assign a reference to it to $hash{foo}.
Fwiw Perl 6 only autovivifies anything, including intermediary data structure levels, when writing to a data structure. So nothing happens in this case:
my %h;
dd %h;   # empty hash
if %h<foo><bar><baz> { say "Never happens" }
dd %h;   # still empty: the read did not autovivify anything
Nice, I wasn't aware of that feature, but the features I'm still not aware of in Perl 6 could fill a small book, since I still haven't gotten around to using it for much. :/
I'm a Python ignoramus, but I think they're the same, except for Perl's being more concise. How would you do a 3-level hash in Python, though?
What you call "value to use depends on the operation" is called casting. Depending on the context, a variable can have different values. For example, a variable that has been declared but not assigned anything is treated as 0 if you try to add a number to it, and as an empty string if you use it in a string operation.
Wikipedia explains it well: autovivification is the automatic creation of new arrays and hashes as required every time an undefined value is dereferenced. Perl autovivification allows a programmer to refer to a structured variable, and arbitrary sub-elements of that structured variable, without expressly declaring the existence of the variable and its complete structure beforehand.
> I'm a Python ignoramus, but I think they're the same, except for Perl's being more concise.
Isn't everything always more concise in Perl? No matter what language it's compared to :)
> How would you do a 3-level hash though in Python ?
You might/should be able to wrangle that with some customisation of the default_factory or a defaultdict subclass, but yeah the arbitrary depth wouldn't be as 'plug n play' as with Perl.
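For example, the usual recursive-defaultdict trick (a sketch; note that, unlike the Perl 6 behaviour described above, this autovivifies on reads as well as writes):

    from collections import defaultdict

    def tree():
        # Every missing key yields another tree, so nesting works at any depth.
        return defaultdict(tree)

    h = tree()
    h['foo']['bar']['baz'] = 1   # intermediate dicts spring into existence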
> What you call "value to use depends on the operation" is called casting. Depending on the context, a variable can be different values. For example a variable that has been defined but nothing else is treated as 0 if you try to add a number to it. It's an empty string if you use it in a string operation.
Yeah Python's strong typing (usually) balks at automatic casting. You'd need to choose what the default is.
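A trivial illustration of that (not from the thread):

    # Python refuses to coerce between str and int implicitly:
    1 + "1"        # TypeError: unsupported operand type(s) for +: 'int' and 'str'
    int("1") + 1   # the cast has to be explicit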
Perl uses separate operators for mathematical addition and string concatenation.
1 + 1 == 2;
1 . 1 eq 11;
It uses separate operators for most mathematical and string operations, for that matter. There isn't much of a meaningful difference between strings and numbers in Perl -- it converts between them depending on the operation.
$ txr
This is the TXR Lisp interactive listener of TXR 123.
Use the :quit command or type Ctrl-D on empty line to exit.
1> [hash-update [group-by identity
                  '(brown red green yellow yellow
                    brown brown black)]
     length]
#H(() (green 1) (red 1) (brown 3) (black 1) (yellow 2))
Form a hash by grouping like items into lists. The identity function is the key in the hash and the basis for equality, so the keys are colors, and the values are lists of colors.
Then update the hash values by passing each one through the length function.
It's better (by default) to optimize for programmer time and succinct expression unless something else takes priority, like running time, memory use, image size, etc.
The TXR version of the iterative approach:
1> (let ((h (hash)))
     (each ((c '(brown red green yellow yellow brown brown black)))
       (inc [h c 0]))  ;; (inc (gethash c h 0)) can be used
     h)
#H(() (black 1) (yellow 2) (green 1) (red 1) (brown 3))
Or:
2> (let ((h (hash)))
     (mapdo (do inc [h @1 0])
            '(brown red green yellow yellow brown brown black))
     h)
#H(() (black 1) (yellow 2) (green 1) (red 1) (brown 3))
I toyed with porting some loop macro implementation to TXR Lisp. I looked at some historic sources and just went "gack". Followed by, "what am I thinking".
I think in the end I would go for something like iterate, but even more Lispy. (Iterate still has "for x in y" type syntax in the clauses, just with parentheses around it; I would drop the infix "in" cruft and just have a symbol followed by nothing but semantic arguments)
I've come up with a function group-reduce which generalizes the construction of hashes from sequences, while streamlining the syntax. The idea is that the hash table entries are accumulators, and we can combine the grouping operation which distributes items into hash keys with reduction over the accumulators. For instance, take the integers 1 through 10, group them into even and odd, and sum the groups:
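Something like this, based on the group-reduce signature shown below (my reconstruction of the omitted example; exact listener output and key order may differ):

    [group-reduce (hash) evenp + (range 1 10) 0]
    ;; => #H(() (t 30) (nil 25))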
Now with this, we can obtain a histogram easily, because a left (or right) reduce/fold can count items:
2> [reduce-left (do inc @1)
     '(brown red green yellow yellow brown brown black) 0]
8
The (do inc @1) gives us (lambda (blah . rest) (inc blah)) which can take two arguments (suitable for reduce) and just returns (+ 1 blah). We can use this reducer function with group-reduce:
3> [group-reduce (hash) identity (do inc @1)
     '(brown red green yellow yellow brown brown black) 0]
#H(() (black 1) (yellow 2) (green 1) (red 1) (brown 3))
group-reduce will be in the next release of TXR (124).
I deliberately made it accept an existing hash so it can be called repeatedly on the same hash to accumulate multiple jobs. That also answers the question of where to specify the hash table attributes.
One last detail is that there is an optional argument we are omitting in the group-reduce call: a filter function which is applied to each element in the table after the accumulation is done. For instance, we can specify this as nreverse, and then we can express group-by using group-reduce, as sketched below.
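Roughly (a hedged sketch of that equivalence, untested, assuming the binary function receives the accumulated value first as described above):

    ;; group-by expressed via group-reduce: cons each element onto its
    ;; group's list, then nreverse each list once accumulation is done.
    [group-reduce (hash) identity (op cons @2 @1)
     '(brown red green yellow yellow brown brown black) nil nreverse]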
First naive attempt: oops cons arguments wrong way:
> <brown red green yellow yellow brown brown black>.Mix
mix(red, yellow(2), brown(3), black, green)
See http://doc.perl6.org/type-composite.html for a little more info about Mix and other composite types shipped in stock Perl 6 distributions. (Warning: this end user doc is still very incomplete and immature.)
It would be interesting to do the same exercise for Java as well. We've been through Hashtable, HashMap, basic for loops, Enumeration, Iterator, foreach loops, generics, and now we have lambdas.