Parsing 63k items in a 10 MB JSON string is pretty much a breeze on any modern system, including a Raspberry Pi. I wouldn't even consider JSON an anti-pattern for storing that much data if it's going over the wire (compressed with gzip).
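Something that repetitive compresses really well, too. A quick ballpark sketch (field names and padding invented here, just to get ~63k similar objects totalling roughly 10 MB):

    import gzip
    import json

    # Invented catalogue-ish entries: ~63k small objects, roughly 10 MB serialised.
    items = [
        {"key": f"item_{i:05d}", "price": i % 5000, "data": "x" * 100}
        for i in range(63_000)
    ]
    raw = json.dumps(items).encode("utf-8")
    packed = gzip.compress(raw)
    print(f"raw: {len(raw) / 1e6:.1f} MB  gzipped: {len(packed) / 1e6:.2f} MB")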
Scroll down a little in the article and you'll see one of the real issues:
> But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless.
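In other words, a full list scan where a set lookup would do. A rough illustration of the difference (made-up keys, nothing to do with the actual game code):

    import time

    # ~63k unique keys standing in for the per-item hashes the article describes.
    keys = [f"hash_{i:05d}" for i in range(63_000)]

    # The quadratic version from the quote: scan everything stored so far before
    # each insert. Roughly (n^2 + n) / 2 string comparisons -- expect this loop
    # to grind for a minute or more even though list.__contains__ runs in C.
    start = time.perf_counter()
    stored = []
    for k in keys:
        if k not in stored:
            stored.append(k)
    print(f"list scan: {time.perf_counter() - start:.1f}s")

    # The same dedup with a set: one O(1) membership test per item.
    start = time.perf_counter()
    seen = set()
    for k in keys:
        if k not in seen:
            seen.add(k)
    print(f"set:       {time.perf_counter() - start:.3f}s")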
The JSON parser patch took out more of the elapsed time. Granted, it was a terrible parser. But I still think JSON is a poor choice here. 63k × X checks for colons, balanced quotes/braces and so on just aren't needed.
Time with only duplication check patch: 4m 30s
Time with only JSON parser patch: 2m 50s
It’s an irrelevant one. The JSON parser from the Python stdlib parses a 10 MB document patterned after the sample in a few dozen ms. And it’s hardly a fast parser.
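Something along these lines is easy enough to try yourself (field names and padding invented, just to get ~63k objects totalling roughly 10 MB):

    import json
    import time

    # Synthetic stand-in for the catalogue: ~63k small objects, ~10 MB serialised.
    items = [
        {"key": f"item_{i:05d}", "price": i % 5000, "flags": [1, 2, 3], "data": "x" * 100}
        for i in range(63_000)
    ]
    doc = json.dumps(items)
    print(f"document size: {len(doc) / 1e6:.1f} MB")

    start = time.perf_counter()
    parsed = json.loads(doc)
    print(f"parsed {len(parsed):,} items in {(time.perf_counter() - start) * 1000:.0f} ms")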
I did a very, very ugly quick hack in Python. Took the example JSON, made the one list entry a string (lazy hack), and repeated it 56,000 times. That resulted in a JSON doc that weighed in at 10 MB. My initial guess of 60,000 repetitions was a pure fluke!
Dumped it into a very simple SQLite db:
$ du -hs gta.db
5.2M gta.db
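Roughly along these lines, except this sketch uses a made-up entry instead of the real sample:

    import json
    import sqlite3

    # One invented catalogue-style entry, repeated ~56,000 times with distinct keys.
    entry = {"key": "item_00000", "price": 1234, "data": "x" * 100}
    items = [dict(entry, key=f"item_{i:05d}") for i in range(56_000)]
    print(f"JSON size: {len(json.dumps(items)) / 1e6:.1f} MB")

    # Store the same data as rows in a simple SQLite table.
    con = sqlite3.connect("gta.db")
    con.execute("CREATE TABLE IF NOT EXISTS catalogue (key TEXT PRIMARY KEY, price INTEGER, data TEXT)")
    con.executemany(
        "INSERT OR REPLACE INTO catalogue VALUES (?, ?, ?)",
        [(it["key"], it["price"], it["data"]) for it in items],
    )
    con.commit()
    con.close()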
Even 10 MB is peanuts for most of their target audience. Stick it in an SQLite db punted across the wire and they'd cut out all of the parsing time too.
Ahh. Modern software rocks.