
That's a good pattern for straight data retrieval.

Unfortunately, if you need to run aggregate queries across all of those SQLite databases, things can get challenging.

But if you could somehow connect Spark to a folder (on a distributed FS) of these SQLite files...
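Short of Spark, here is a single-node sketch of the same idea for a handful of files. SQLite's `ATTACH DATABASE` lets one connection see several files at once, so a `UNION ALL` over the attached copies gives you one SQL aggregate. The `events(kind, value)` schema and the `make_shard`/`aggregate_attached` helpers are hypothetical, made up for illustration. Also note that SQLite caps attached databases (10 by default, via `SQLITE_MAX_ATTACHED`), so this won't scale to a folder of thousands of files.

```python
import os
import sqlite3
import tempfile


def make_shard(path, rows):
    # Hypothetical schema: every shard file holds an "events" table.
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE events (kind TEXT, value REAL)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    con.commit()
    con.close()


def aggregate_attached(paths):
    """Run one GROUP BY across several SQLite files via ATTACH + UNION ALL."""
    con = sqlite3.connect(":memory:")
    aliases = []
    for i, p in enumerate(paths):
        alias = f"s{i}"  # generated internally, so safe to splice into SQL
        # Default build limit is 10 attached databases per connection.
        con.execute(f"ATTACH DATABASE ? AS {alias}", (p,))
        aliases.append(alias)
    union = " UNION ALL ".join(f"SELECT kind, value FROM {a}.events" for a in aliases)
    sql = (
        f"SELECT kind, SUM(value), COUNT(*) FROM ({union}) "
        "GROUP BY kind ORDER BY kind"
    )
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    paths = [os.path.join(tmp, f"shard{i}.db") for i in range(3)]
    make_shard(paths[0], [("a", 1.0), ("b", 2.0)])
    make_shard(paths[1], [("a", 3.0)])
    make_shard(paths[2], [("b", 4.0), ("b", 6.0)])
    print(aggregate_attached(paths))  # [('a', 4.0, 2), ('b', 12.0, 3)]
```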

Edit: Also, SQLite only lets one writer (i.e. one process/connection) write to a database file at a time. For this particular use case that shouldn't be a problem, unless you have rewrites coming from various sources (which can happen when correcting data).
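A quick way to see the single-writer rule, as a sketch: two connections in one Python process stand in for two processes here, since SQLite's file locking should treat them the same way. The first connection takes the write lock with `BEGIN IMMEDIATE`; the second, opened with `timeout=0` so it fails instead of waiting, should get `sqlite3.OperationalError` ("database is locked"). The `second_writer_blocked` helper is hypothetical.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked(path):
    """True if a second connection can't start a write while the first holds the lock."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    a = sqlite3.connect(path, isolation_level=None)
    # timeout=0: give up immediately on a busy lock instead of retrying.
    b = sqlite3.connect(path, isolation_level=None, timeout=0)
    a.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    a.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
    a.execute("INSERT INTO t VALUES (1)")
    try:
        b.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        blocked = False
        b.execute("ROLLBACK")
    except sqlite3.OperationalError:  # "database is locked"
        blocked = True
    a.execute("COMMIT")
    a.close()
    b.close()
    return blocked


if __name__ == "__main__":
    print(second_writer_blocked(os.path.join(tempfile.mkdtemp(), "w.db")))
```

With a nonzero `timeout` (Python's default is 5 seconds) the second writer waits for the lock instead of failing, which is usually enough when correction jobs are occasional rather than constant.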




I mean, it depends on the aggregation you need, imo. It shouldn't be too hard (tm) to rig up some distributed query pipeline, as long as you're OK with hand-coding each query instead of having the convenience of SQL.
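Roughly what "coding per query" could look like, as a map/reduce sketch: each shard returns partial results, and a reduce step merges them. A thread pool stands in for a real distributed worker pool here, and the `events(kind, value)` schema and helper names are hypothetical. The example computes a per-kind mean, which shows why this is per-query work: you can't just average the per-shard averages, so the map step has to emit sums and counts instead.

```python
import os
import sqlite3
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_shard(path):
    """Map step: partial (sum, count) per kind from one SQLite file."""
    con = sqlite3.connect(path)  # one connection per worker, never shared
    try:
        rows = con.execute(
            "SELECT kind, SUM(value), COUNT(*) FROM events GROUP BY kind"
        ).fetchall()
    finally:
        con.close()
    return {kind: (total, n) for kind, total, n in rows}


def reduce_partials(partials):
    """Reduce step: merge partial sums/counts, then compute the mean per kind."""
    totals = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for kind, (total, n) in part.items():
            totals[kind][0] += total
            totals[kind][1] += n
    return {kind: total / n for kind, (total, n) in sorted(totals.items())}


def mean_by_kind(paths, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return reduce_partials(pool.map(map_shard, paths))


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    data = [[("a", 1.0), ("b", 2.0)], [("a", 3.0)], [("b", 4.0), ("b", 6.0)]]
    paths = []
    for i, rows in enumerate(data):
        p = os.path.join(tmp, f"shard{i}.db")
        con = sqlite3.connect(p)
        con.execute("CREATE TABLE events (kind TEXT, value REAL)")
        con.executemany("INSERT INTO events VALUES (?, ?)", rows)
        con.commit()
        con.close()
        paths.append(p)
    print(mean_by_kind(paths))  # {'a': 2.0, 'b': 4.0}
```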



