Time Complexity of Python Data Structures (python.org)
82 points by iamelgringo on Jan 22, 2011 | 16 comments



This article is a great example of what I love about the Python ecosystem. I don't think I've ever seen something like this for Ruby's data structures (except in the Hamster data structure project [1], which has solid performance data).

[1] https://github.com/harukizaemon/hamster


You should program in C++ then. Every single data structure in the standard library, Boost, and third-party libraries has been dissected, analyzed, benchmarked, extended, rewritten, and optimized :)

But there's a reason you don't. I chose C, others chose Ruby, and you chose Python :)


I chose Ruby then Python [1]. I use them both for the same kinds of things (especially functional programming [2]), but I'm surprised almost daily at how these similar languages drive such different ecosystems.

[1] http://news.ycombinator.com/item?id=1947723

[2] http://news.ycombinator.com/item?id=2130725


Isn't the computational complexity of STL data structures defined by the spec?


STL data structures are explicitly defined as to their internals, which in turn implies their performance. For example, when you're using an STL map, you know it's internally a tree and has the upsides, downsides, and performance characteristics of that structure. (For a hash map you'd use unordered_map.)

Or at least that's how I picture it.


Actually, it's the exact opposite. The STL specification lays out the complexity but not the underlying data structure.

For instance, if you'll refer to section 23.1.2 of the C++ 2003 standard ("Associative Containers"), you'll find that nowhere does it mention "tree," though it does explicitly lay out the complexity requirements for associative containers such as map and multimap, requirements that, at the end of the day, can really only be met with one kind of balanced tree or another. Table 69 goes through all the functions and operations and the complexities allotted for each. Some are even explicitly specified as "compile time" operations. No C++ implementation is allowed to deviate from this, even to provide better performance.


I think of it in these terms: If you're implementing a key for a std::map, you have to implement operator<, which in turn implies that std::map is tree-based. Similarly for unordered_map, which wants operator== and a hash function.

Each of these interfaces effectively describes what the internals are going to look like, even if the official standard decides to be coy about using the terms "tree" or "hash map" when describing them.

In the case of Python, the fact that {}.keys() has no guaranteed order (for example) already implies that dictionaries are hash-based and would have similar characteristics to any other hash map.
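
A minimal Python sketch of the same argument (the Point class here is just for illustration): a dict key only has to supply __hash__ and __eq__, never an ordering, which is the signature of a hash table in exactly the way operator< is the signature of an ordered structure:

    # Sketch: a dict key needs only __hash__ and __eq__, not __lt__.
    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __eq__(self, other):
            return (self.x, self.y) == (other.x, other.y)

        def __hash__(self):
            return hash((self.x, self.y))   # delegate to tuple hashing

    d = {Point(1, 2): "a"}
    print(d[Point(1, 2)])    # "a" -- found by hash, confirmed by ==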


> I think of it in these terms: If you're implementing a key for a std::map, you have to implement operator<, which in turn implies that std::map is tree-based.

Not necessarily. For example, a skip list can satisfy the same requirements, but it is not a tree.
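
To make that concrete without writing a whole skip list, here's a rough Python sketch of yet another non-tree structure that serves ordered keys using nothing but < (it misses std::map's O(log n) insert bound, so it's illustrative only):

    # Sketch: an ordered map backed by a plain sorted list. Lookup is
    # O(log n) via binary search; insertion is O(n) due to shifting,
    # which is why a real std::map needs a tree or skip list instead.
    import bisect

    class SortedListMap:
        def __init__(self):
            self._keys, self._vals = [], []

        def __setitem__(self, key, val):
            i = bisect.bisect_left(self._keys, key)   # uses only <
            if i < len(self._keys) and self._keys[i] == key:
                self._vals[i] = val
            else:
                self._keys.insert(i, key)
                self._vals.insert(i, val)

        def __getitem__(self, key):
            i = bisect.bisect_left(self._keys, key)
            if i < len(self._keys) and self._keys[i] == key:
                return self._vals[i]
            raise KeyError(key)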


True, but you still end up with similar performance to a tree.


The great majority of them should be intuitive to any competent programmer, IMO, at least within a ballpark. Some of it is implementation detail (list append isn't required to be constant time, and could be O(N) in a naive implementation, for example), but generally you should have an intuitive feel for this stuff across languages.
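
For example, a quick timing sketch (numbers will vary by machine and CPython version) shows append staying effectively flat while insert at the front degrades linearly, exactly as an over-allocating dynamic array would predict:

    # Rough sketch: append is amortized O(1); insert(0, ...) is O(n).
    import timeit

    for n in (10_000, 100_000, 1_000_000):
        t_app = timeit.timeit("lst.append(0)",
                              setup=f"lst = [0] * {n}", number=1000)
        t_ins = timeit.timeit("lst.insert(0, 0)",
                              setup=f"lst = [0] * {n}", number=1000)
        print(f"n={n:>9,}  append: {t_app:.5f}s  insert(0): {t_ins:.5f}s")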

Sometimes you can almost guess how a language is implemented just from profiling data.


I went looking for exactly this information a week ago when I was learning Python. It wasn't clear to me how a list was implemented, and in particular whether random access and deletes were constant or linear time.
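
For the record (these are CPython implementation details rather than language guarantees): a list is a contiguous array of pointers, so random access is O(1), while deleting anywhere but the end is O(n) because the tail is shifted down. A quick check:

    # Sketch: indexing is O(1); del from the front is O(n); del from
    # the end is O(1).
    import timeit

    n = 1_000_000
    print(timeit.timeit("lst[n // 2]",
                        setup=f"n = {n}; lst = [0] * n", number=10_000))
    print(timeit.timeit("del lst[0]",       # slow: shifts ~n pointers
                        setup=f"lst = [0] * {n}", number=1_000))
    print(timeit.timeit("del lst[-1]",      # fast: nothing to shift
                        setup=f"lst = [0] * {n}", number=1_000))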


Which is exactly why it should never have been called "list". If it were "vector" or "array" you'd never have even wondered.


Having not thought deeply about big-O runtimes since freshman year, are set operations usually harder to evaluate? Half of them don't have "worst-case" times, and Symmetric Difference just has a "?" for the average case.


It is interesting that get-slice on a list does not use copy-on-write, so it is in fact Theta(k) even if you never modify the slice.
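
Easy to confirm with a rough sketch: the time to take the slice grows with its length, and mutating the slice leaves the original alone, so it's a genuine eager copy:

    # Sketch: get-slice is Theta(k) and produces an independent copy.
    import timeit

    big = list(range(10_000_000))
    for k in (1_000, 100_000, 10_000_000):
        t = timeit.timeit(lambda: big[:k], number=10)
        print(f"big[:{k:>10,}]  {t:.4f}s")   # grows with k

    s = big[:3]
    s[0] = -1
    print(big[0])   # 0 -- the original is untouched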


Copy-on-write can be very dangerous if not used properly. Besides concurrency issues (not a problem for Python specifically, because of the GIL), a copy-on-write slice would keep a reference to the big list it was taken from even when that list is no longer needed anywhere else. In pathological cases this can be as bad as a memory leak.
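
Python's memoryview demonstrates the exact retention hazard (a small sketch; memoryview is a deliberate view type rather than copy-on-write, but the pinning behavior is the same one described above):

    # Sketch: a 16-byte view keeps the whole 10 MB buffer alive.
    big = bytearray(10 * 1024 * 1024)    # ~10 MB
    tiny = memoryview(big)[:16]          # tiny view into it

    del big                              # does NOT free the buffer:
    print(len(tiny.obj))                 # 10485760 -- still pinned

    copied = bytes(tiny)                 # take an explicit copy, then
    tiny.release()                       # let the big buffer be collected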


Thanks for this. It's a bloody goldmine.



