Time Complexity of Python Data Structures (python.org)
82 points by iamelgringo on Jan 22, 2011 | 16 comments



This article is a great example of what I love about the Python ecosystem. I don't think I've ever seen something like this for Ruby's data structures (except in the Hamster data structure project [1], which has solid performance data).

[1] https://github.com/harukizaemon/hamster


You should program in C++ then. Every single data structure in the standard library, Boost, and third-party libraries has been dissected, analyzed, benchmarked, extended, rewritten, and optimized :)

But there's a reason you don't. I chose C, others chose Ruby, and you chose Python :)


I chose Ruby then Python [1]. I use them both for the same kinds of things (especially functional programming [2]), but I'm surprised almost daily at how these similar languages drive such different ecosystems.

[1] http://news.ycombinator.com/item?id=1947723

[2] http://news.ycombinator.com/item?id=2130725


Isn't the computational complexity of STL data structures defined by the spec?


STL data structures are explicitly defined as to their internals, which in turn implies their performance. For example, when you're using an STL map, you know it's internally a tree and has the upsides, downsides, and performance characteristics of that structure. (For a hash map you'd use unordered_map.)

Or at least that's how I picture it.


Actually, it's the exact opposite. The STL specification lays out the complexity but not the underlying data structure.

For instance, if you'll refer to section 23.1.2 of the C++ 2003 standard ("Associative Containers"), you'll find that nowhere does it mention "tree," though it does explicitly lay out the complexity requirements for associative containers such as map and multimap, requirements that, at the end of the day, can really only be met with one kind of balanced tree or another. Table 69 goes through all the functions and operations and the complexities allotted for each. Some are even explicitly specified as "compile time" operations. No C++ implementation is allowed to deviate from this, even to provide better performance.


I think of it in these terms: If you're implementing a key for a std::map, you have to implement operator<, which in turn implies that std::map is tree-based. Similarly for unordered_map, which wants operator== and a hash function.

Each of these interfaces effectively describes what the internals are going to look like, even if the official standard decides to be coy about using the terms "tree" or "hash map" when describing them.

In the case of Python, the fact that {}.keys() has no guaranteed order (for example) already implies that dictionaries are hash-based and would have similar characteristics to any other hash map.
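
A minimal Python sketch of the same argument (the Point class here is just for illustration): a dict key only has to supply __hash__ and __eq__, never an ordering, which is the signature of a hash table in exactly the way operator< is the signature of an ordered structure:

    # Sketch: a dict key needs only __hash__ and __eq__, not __lt__.
    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __eq__(self, other):
            return (self.x, self.y) == (other.x, other.y)

        def __hash__(self):
            return hash((self.x, self.y))   # delegate to tuple hashing

    d = {Point(1, 2): "a"}
    print(d[Point(1, 2)])    # "a" -- found by hash, confirmed by ==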


> I think of it in these terms: If you're implementing a key for a std::map, you have to implement operator<, which in turn implies that std::map is tree-based.

Not necessarily. For example, a skip list can satisfy the same requirements, but it is not a tree.
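
To make that concrete without writing a whole skip list, here's a rough Python sketch of yet another non-tree structure that serves ordered keys using nothing but < (it misses std::map's O(log n) insert bound, so it's illustrative only):

    # Sketch: an ordered map backed by a plain sorted list. Lookup is
    # O(log n) via binary search; insertion is O(n) due to shifting,
    # which is why a real std::map needs a tree or skip list instead.
    import bisect

    class SortedListMap:
        def __init__(self):
            self._keys, self._vals = [], []

        def __setitem__(self, key, val):
            i = bisect.bisect_left(self._keys, key)   # uses only <
            if i < len(self._keys) and self._keys[i] == key:
                self._vals[i] = val
            else:
                self._keys.insert(i, key)
                self._vals.insert(i, val)

        def __getitem__(self, key):
            i = bisect.bisect_left(self._keys, key)
            if i < len(self._keys) and self._keys[i] == key:
                return self._vals[i]
            raise KeyError(key)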


True, but you still end up with similar performance to a tree.


The great majority of them should be intuitive to any competent programmer, IMO, at least within a ballpark. Some of it is implementation detail (list append isn't required to be constant time, and could be O(N) in a naive implementation, for example), but generally you should have an intuitive feel for this stuff across languages.
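
For example, a quick timing sketch (numbers will vary by machine and CPython version) shows append staying effectively flat while insert at the front degrades linearly, exactly as an over-allocating dynamic array would predict:

    # Rough sketch: append is amortized O(1); insert(0, ...) is O(n).
    import timeit

    for n in (10_000, 100_000, 1_000_000):
        t_app = timeit.timeit("lst.append(0)",
                              setup=f"lst = [0] * {n}", number=1000)
        t_ins = timeit.timeit("lst.insert(0, 0)",
                              setup=f"lst = [0] * {n}", number=1000)
        print(f"n={n:>9,}  append: {t_app:.5f}s  insert(0): {t_ins:.5f}s")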

Sometimes you can almost guess how a language is implemented just from profiling data.


I went looking for exactly this information a week ago when I was learning Python. It wasn't clear to me how a list was implemented, and in particular whether random access and deletes were constant or linear time.
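
For the record (these are CPython implementation details rather than language guarantees): a list is a contiguous array of pointers, so random access is O(1), while deleting anywhere but the end is O(n) because the tail is shifted down. A quick check:

    # Sketch: indexing is O(1); del from the front is O(n); del from
    # the end is O(1).
    import timeit

    n = 1_000_000
    print(timeit.timeit("lst[n // 2]",
                        setup=f"n = {n}; lst = [0] * n", number=10_000))
    print(timeit.timeit("del lst[0]",       # slow: shifts ~n pointers
                        setup=f"lst = [0] * {n}", number=1_000))
    print(timeit.timeit("del lst[-1]",      # fast: nothing to shift
                        setup=f"lst = [0] * {n}", number=1_000))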


Which is exactly why it should never have been called "list". If it were "vector" or "array" you'd never have even wondered.


Having not thought deeply about big-O runtimes since freshman year, are set operations usually harder to evaluate? Half of them don't have "worst-case" times, and Symmetric Difference just has a "?" for the average case.


It is interesting that get-slice on a list does not use copy-on-write, so it is in fact Theta(k) even if you never modify the slice.
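
Easy to confirm with a rough sketch: the time to take the slice grows with its length, and mutating the slice leaves the original alone, so it's a genuine eager copy:

    # Sketch: get-slice is Theta(k) and produces an independent copy.
    import timeit

    big = list(range(10_000_000))
    for k in (1_000, 100_000, 10_000_000):
        t = timeit.timeit(lambda: big[:k], number=10)
        print(f"big[:{k:>10,}]  {t:.4f}s")   # grows with k

    s = big[:3]
    s[0] = -1
    print(big[0])   # 0 -- the original is untouched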


Copy-on-write can be very dangerous if not used properly. Besides concurrency issues (not a problem for Python specifically, because of the GIL), a copy-on-write slice would keep a reference to the big list it was taken from even when that list is no longer needed anywhere else. In pathological cases this can be as bad as a memory leak.
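
Python's memoryview demonstrates the exact retention hazard (a small sketch; memoryview is a deliberate view type rather than copy-on-write, but the pinning behavior is the same one described above):

    # Sketch: a 16-byte view keeps the whole 10 MB buffer alive.
    big = bytearray(10 * 1024 * 1024)    # ~10 MB
    tiny = memoryview(big)[:16]          # tiny view into it

    del big                              # does NOT free the buffer:
    print(len(tiny.obj))                 # 10485760 -- still pinned

    copied = bytes(tiny)                 # take an explicit copy, then
    tiny.release()                       # let the big buffer be collected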


Thanks for this. It's a bloody goldmine.



