Users of JSON, probably the most common data interchange format on the planet, frequently have implicit requirements about key ordering. It is highly convenient to be able to parse a JSON string into a native Python data structure, add a field, emit it back, and preserve the ordering.
From what I recall, the JSON standard itself makes no guarantee that key ordering is significant. If you're diffing serialized output to compare two JSON objects, you need to serialize in a consistent format; otherwise even whitespace is going to throw you off.
If a human is inspecting serialized JSON using pen and paper, the human is presumably clever enough to match up key for key regardless of ordering.
If the human is using a computer to compare two JSON payloads (as the use of a diffing algorithm suggests), the human and computer should be clever enough as a team to realize that they could just deserialize and reserialize each JSON payload such that the keys were lexicographically sorted and the data was pretty-printed in the exact same way before running it through the diffing algorithm. `jq -Sc` would do the trick.
Most diff programs don't have what you describe. And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.
> Most diff programs don't have what you describe.
That’s why pipes were invented.
> And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.
Well, in that case you’re not going to be able to meaningfully compare two JSON payloads, because neither order nor whitespace has any semantic meaning in JSON. I’m really curious what you’re talking about, though, since if you’re working in the shell you can easily use jq to do as I describe.
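In Python terms, the `jq -Sc` normalization is a one-liner. A sketch (the payload strings here are made up):

```python
import json

def canonicalize(payload: str) -> str:
    """Parse JSON and re-emit it sorted and compact, like `jq -Sc`."""
    return json.dumps(json.loads(payload), sort_keys=True, separators=(",", ":"))

a = '{"b": 2,  "a": 1}'
b = '{"a": 1, "b": 2}'
# Different key order and whitespace, same data: canonical forms match.
print(canonicalize(a) == canonicalize(b))  # True
```

Run both payloads through this before diffing and only semantic differences remain.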
I tracked down a bug once where someone was hashing state that happened to include Python dicts serialized to JSON, which caused different hash values for the same entity. I know, I know, insert tears-of-blood emoji, and any engineer worth his salt wouldn't be so sloppy. But you don't always get to choose to work only with top talent that understands why you should design things elegantly as a baseline[1]. In these cases, removing implicit gotchas, so that the same code reliably produces the same behavior (i.e. iterating over the same dict yields the same order), is valuable.
[1] Well you _can_ choose to do so, and I recently have, but it's not without its costs.
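That class of bug is easy to reproduce. A sketch with made-up data, showing why hashing a dict's default JSON serialization is unstable and how `sort_keys` pins it down (in CPython 3.7+ the difference comes from insertion order; in older interpreters the order could even vary between runs):

```python
import hashlib
import json

d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}   # equal dict, different insertion order
assert d1 == d2

def naive_hash(d):
    # Serialization reflects insertion order, so equal dicts can hash differently.
    return hashlib.sha256(json.dumps(d).encode()).hexdigest()

def stable_hash(d):
    # sort_keys makes the serialization, and hence the hash, deterministic.
    return hashlib.sha256(json.dumps(d, sort_keys=True).encode()).hexdigest()

print(naive_hash(d1) == naive_hash(d2))    # False
print(stable_hash(d1) == stable_hash(d2))  # True
```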
I can't speak for all "other" languages but Ruby made the change years ago from 1.8 to 1.9. Ruby calls them hashes but they're standard maps. Obviously Ruby and Python are very similar but I think the point stands.
Convert a deck of cards ([]Card?) to a map (map[Card]bool?) and back just to shuffle? That's unlikely to be faster or more idiomatic than a straightforward implementation of the Fisher-Yates shuffle[1]. Try writing the code to do it both ways and compare.
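For reference, the straightforward version in Python (since TFA is about Python), with a made-up string representation for the cards; `random.shuffle` does essentially this internally:

```python
import random

def fisher_yates(deck):
    """In-place Fisher-Yates shuffle: O(n), every permutation equally likely."""
    for i in range(len(deck) - 1, 0, -1):
        j = random.randint(0, i)          # uniform index in [0, i]
        deck[i], deck[j] = deck[j], deck[i]
    return deck

cards = [f"{suit}{rank}" for suit in "SHDC"
         for rank in ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]]
fisher_yates(cards)
```

No map conversion, no extra allocation, and the distribution is provably uniform.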
I think "idiomatic" would be using rand.Perm, which implements the Fisher–Yates shuffle. But aside from whether it's idiomatic, converting to a map isn't random enough.
With Golang 1.11, I get orderings like this. S is spades, H is hearts, D is diamonds, and C is clubs, because HN eats the Unicode characters I was actually using.
S9 SJ SK HQ D9 S4 S5 S10 HJ S8 H6 DA D2
D4 DQ C6 C8 CJ H3 H9 DK C3 CQ SA S2 S3
S6 SQ H4 H10 D8 D10 C4 CK S7 H7 D5 D6 D7
C7 H2 HK D3 DJ C2 C5 C9 HA H5 H8 CA C10
On average you would expect about one card in a shuffled deck to be (cyclically) followed in the deck by its succeeding card, and on about one out of 50 shuffles, that card would itself be followed by its successor. Here we have S4 S5, DA D2, SA S2 S3, and D5 D6 D7.
In particular it seems like there ought to be a less verbose way to express the equivalent of the Python list({card: True for card in deck}) in Golang.
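That back-of-the-envelope expectation is easy to check with a quick simulation (a sketch: cards modeled as (suit, rank) pairs, "successor" meaning the next rank in the same suit):

```python
import random

def successor_pairs(deck):
    # Count cards that are (cyclically) followed by the next rank of the same suit.
    n = len(deck)
    return sum(1 for i in range(n)
               if deck[(i + 1) % n] == (deck[i][0], deck[i][1] + 1))

deck = [(suit, rank) for suit in "SHDC" for rank in range(13)]
trials = 3000
total = 0
for _ in range(trials):
    random.shuffle(deck)
    total += successor_pairs(deck)

# 48 of the 52 cards have a successor, and each is followed by it with
# probability 1/51, so the expected count per shuffle is 48/51, roughly 0.94.
print(total / trials)
```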
IIRC it used to just start iteration at a pseudorandom index and then iterate normally. I looked at it a couple of years ago; I don't know if they've changed it since.
It seems to do that, but I think it also tweaks the hash function for each newly created map, because in the code linked from https://news.ycombinator.com/item?id=22278753 I'm not getting rotations of the same iteration order when I generate two maps. There doesn't seem to be any randomness in mapiternext() itself.
You could imagine that the hash function itself might do an adequate job of randomizing the order of the cards, though, especially if salted with a per-map salt. SipHash, for example, would probably not have any detectable biases in the distribution of the permutations thus produced. But whatever hash function Golang is using for my structs has an easily visible bias, as described in that comment.
What does it say? That Python is willing to spend more computation time for simplicity, without waiting for the programmer to ask for it? That Python will make semantic changes in minor version bumps of the language?
The ordering property is a side effect of the new, more CPU- and memory-efficient data structure. It would be very surprising to say no to a 2x performance jump just to avoid the (sometimes useful) ordering.
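The side effect is easy to see: since CPython 3.7 (and 3.6 as an implementation detail), a plain dict hands keys back in insertion order:

```python
d = {}
for key in ["banana", "apple", "cherry"]:
    d[key] = len(key)

# Insertion order, not sorted order and not hash order.
print(list(d))  # ['banana', 'apple', 'cherry']
```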
Java added LinkedHashMap a while ago. I tend to default to it except for small temporary use cases where order absolutely won't matter or have an outside impact...
LinkedHashMap is not Java's "standard map", though. If you tend to use it by default, that's really a personal idiosyncrasy, especially as LHM tends to be larger and slower than a regular HashMap due to having to maintain a doubly linked list.
Python has had one such in the standard library for a decade or so.
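Presumably that's collections.OrderedDict. A quick sketch of what it adds over a plain (already insertion-ordered) dict:

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")              # explicit reordering; plain dicts can't do this
print(list(od))                  # ['b', 'c', 'a']

# Unlike plain dicts, OrderedDict equality is order-sensitive.
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))                # True
```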
LHM is not slower for iteration (it's faster actually).
LHM indeed pays 2 references per node, but they are well worth it: it gives deterministic ordering/iteration, and I have witnessed numerous cases where bugs with HashMap showed up in production only due to iteration order (especially after rehashing). The code was broken, but the cases did not reproduce during testing...
Now, if the two extra references are an issue, consider that HashMap is quite space-inefficient anyway, since it allocates a dedicated node per entry; that's the main price. The (simple) node memory footprint is 36 bytes on heaps smaller than 32GB (i.e. with compressed pointers). The extra references add another 8 bytes for keeping insertion or access order.
If the goal is a compact, low memory footprint, HashMap is not fit for purpose. Overall it's the jack of all trades, and it even got reworked (Java 8) to support tree-based collision resolution for keys that implement Comparable.
A couple of years back I wrote CompactHashMap (under CC0), which has an average cost of 10 bytes per entry and is extremely compact at small sizes, having only 2 fields of its own, so even small/empty maps are tiny. In microbenchmarks (the same ones used in OpenJDK) it handily (2x) beats java.util.HashMap on "Traverse key or value", get/put have similar performance, and "Search Absent" is worse.
The point is: LHM should be the go-to hashmap for Java, since its node-based design is actually justified (unlike java.util.HashMap's).
Yep. Also, LinkedHashMap maintains ordering based on insertion order, so iff you insert your data in the order you want, it behaves as an ordered map. If you want a true, self-reordering map, what you want is a TreeMap. The beauty of that one is that you have full control over the custom ordering function, because we're not always indexing a map by primitives or autoboxed types. I try to use it sparingly, though, as you're paying O(log n) on pretty much all map operations.
While we're here, another tidbit that's often overlooked in the Java collections:
If you really care about iteration performance, your data is without nulls, your data is already ordered how you like it or you don't care about ordering, you have >= 10 items, and you don't need random access, then ArrayDeque is gonna be your horse, because of how much better it co-locates its contents in memory and how much less overhead is required to maintain it during each operation compared to the List implementations, including ArrayList and LinkedList.
Interestingly enough here it was kinda but kinda not the other way around: historically PHP used a closed-addressing hash map and threaded a doubly linked list through it to maintain the insertion order.
But the dict TFA talks about doesn't use a doubly linked list or closed addressing; its ordering is a side effect of its implementation, not originally a core goal (memory saving and iteration speed were). It had probably been proposed by others before, but it came to wider attention after Raymond Hettinger (a core dev) proposed it a few years back[0]. PHP actually released it first[1], closely followed by PyPy[2]; CPython only got around to it some time later[3].
Well, you can convert any sorted dictionary to an insertion-order dictionary by copying it over, but that does make the original example pretty clunky.
I guess a better use case is if you needed some sort of priority/FIFO logic along with indexing. Hard to think of something that doesn’t feel contrived, but I guess imagine having a queue where determining membership or updating fields of an element based on ID can be done in O(1) for any element?
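Since Python dicts now preserve insertion order, a plain dict already gives you that queue: FIFO iteration plus O(1) membership and field updates by ID. A sketch with a made-up job-record shape:

```python
queue = {}                        # id -> record, FIFO by insertion order

def enqueue(job_id, record):
    queue[job_id] = record        # new keys land at the back

def update(job_id, **fields):
    # O(1) lookup by id; mutating the value doesn't move the key's position.
    queue[job_id].update(fields)

def dequeue():
    job_id = next(iter(queue))    # front of the queue = oldest insertion
    return job_id, queue.pop(job_id)

enqueue("a", {"prio": 1})
enqueue("b", {"prio": 2})
update("a", prio=9)               # "a" keeps its place at the front
print(dequeue())                  # ('a', {'prio': 9})
```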
As mentioned in The Friendly Orange Glow, it was started by people who worked on the PLATO system at the University of Illinois which added its Notes discussion system in 1973. PLATO ran until 2015.
Notice the "if". My point was that his working for Google shouldn't matter if those conditions are met. Whether or not they were met is another matter.
No, actually.
I've been quite clear about my position on this many times on HN (it's just another fun form of bias people use to avoid engaging in real discussion on the merits and to convince themselves their position is right), and I'm being totally consistent with that.
Do you have anything substantive to add to the conversation?
Do you want to disagree with anything I said?
In practice, my comment comes more from being in DC for 7 years than from working for any company.
What I said applies across just about every media outlet these days.
Every unnamed White House advisor (sometimes even "senior advisor", because why not!) who isn't an "anonymous source" is usually unnamed because otherwise you'd be able to point out that their job title matches what they were actually doing and is unrelated to the White House (or that their attachment to the White House is "tenuous at best").
It's true the White House sometimes has people with the job title "advisor", but you'll notice that when articles mention real names, if you were to go looking, that was usually their real title (or the article failed at fact-checking).
That's not just the White House, either. Random low-level people at companies become "senior executives", etc.
Unfortunately, this even spreads: those stories get used as sources, and later the claims often become "truth".
Here, I'll take a totally non-political thing as an example for you:
https://en.wikipedia.org/wiki/Dan_Fredinburg (RIP) was a great person and all-around good guy (IME), but he was 100% not a Google executive. This is an objective fact by any definition of executive (including Google's). You can see that the sources cited for these claims are news articles.
All of which, in order to play up their narrative on a tragedy, decided to give him that title (and various others!)[1]
Thinking that anything else is happening here, in this or really any other situation, is just naive (or a gross application of some variant of Gell-Mann amnesia).
In the situation in the original article, do you seriously believe that if the current administration had the chance to nail a real former Obama White House advisor doing something illegal or untoward, they wouldn't jump at the chance to do it?
It's not like they don't have White House visitor logs and plenty of people around the DOJ/FTC/etc. who can tell them what happened!
[1] As an aside, I find it pretty sad that people apparently consider "Google executive" a more impressive way to memorialize someone than "good person". If I pass away in a newsworthy event, can y'all make sure it says "good father" or something?
Very fascinating, thanks. Do you have any reading resources I could peruse about these kinds of topics? I read The Power Broker and thought it was an amazing insight into the nuances of politics and media.