1 - If we were sticking with the JVM, I do wonder whether Lucene would be the right choice.
2 - It's a great tool with a lot of tuneability and support!
3 - We've been using it for K8s logs and OTEL (with Jaeger). Seems good so far, though I do wonder how the future of this will play out with the $DDOG acquisition.
These are great projects, we use DuckDB to inspect our data lake and for quick munging.
We will have some more blog posts in the future describing different parts of the system in more detail. We were worried too much density in a single post would make it hard to read.
It's a bit difficult right now, given we have a lot of proprietary data and a lot of the logic follows it. I'm hoping we can get it to a state where it can index and serve OSM data, but that is going to take some time.
That being said, we are currently working on getting our Google S2 Rust bindings open-sourced. This is a geo-hashing library that makes it very easy to write a reverse geocoder, even from a point-in-polygon or polygon-intersection perspective.
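To illustrate the general idea only (a minimal sketch, not S2 and not the system described here): a fixed lat/lng grid stands in for S2's hierarchical cells, and a ray-casting test does the final point-in-polygon check. The names `CELL_DEG`, `build_index`, and `reverse_geocode` are hypothetical, and the sketch assumes simple polygons that don't cross the antimeridian.

```python
from collections import defaultdict
from math import floor

CELL_DEG = 1.0  # assumed cell size in degrees; S2 uses hierarchical cells instead


def cell_of(lat, lng):
    """Map a coordinate to the grid cell that contains it."""
    return (floor(lat / CELL_DEG), floor(lng / CELL_DEG))


def point_in_polygon(lat, lng, ring):
    """Ray-casting test; ring is a list of (lat, lng) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lng1 = ring[i]
        lat2, lng2 = ring[(i + 1) % n]
        # Only edges that straddle the query latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            cross_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < cross_lng:
                inside = not inside
    return inside


def build_index(regions):
    """Map every grid cell touched by a region's bounding box to candidate region names."""
    index = defaultdict(list)
    for name, ring in regions.items():
        lats = [p[0] for p in ring]
        lngs = [p[1] for p in ring]
        lo = cell_of(min(lats), min(lngs))
        hi = cell_of(max(lats), max(lngs))
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                index[(i, j)].append(name)
    return index


def reverse_geocode(lat, lng, index, regions):
    """Look up candidates by cell, then confirm with the exact polygon test."""
    for name in index.get(cell_of(lat, lng), []):
        if point_in_polygon(lat, lng, regions[name]):
            return name
    return None
```

The cell lookup narrows the search to a handful of candidate polygons, so the more expensive exact test only runs on those.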
Author here! We were really motivated to turn a "distributed system" problem into a "monolithic system" from an operations perspective and felt this was achievable with current hardware, which is why we went with in-process, embedded storage systems like RocksDB and Tantivy.
Memory-mapping lets us get pretty far, even with global coverage. We are always able to add more RAM, especially since we're running in the cloud.
Backfills and data updates are also trivial and can be performed in an "immutable" way without having to reason about what's currently in ES/Mongo. We just re-index everything with the same binary on a separate node and ship the final assets to S3.
Why not just use an open source solution like ParadeDB ...
ParadeDB = Postgres with the pg_search plugin (the base is Tantivy). Need anything else, like vectors or whatever? Get the plugins for Postgres.
The only thing you're missing is an LSM solution like RocksDB. See OrioleDB, which is supposed to become a pluggable storage engine for Postgres but is not yet out of beta.
Companies like Netflix with bigger market caps are still on AWS.
I can imagine the productivity of spinning up elastic cloud resources vs fixed data center resourcing being more important, especially considering how frequently a company like Figma ships new features.
A bit of an aside, I'm a big fan of how Mike Perham managed to start a business around Sidekiq. I think it really gives us Hacker News folks hope that it's possible to start a viable bootstrapped business around open-source infrastructure.
The idea is going in the right direction, but I would also love to see a skyrocketing price for a second car per household.
So if I need a car and my wife does as well, I would make the second car around four times as expensive. Also, I don't like basing it on weight and size that much, because a functional vehicle like a minivan, or a pickup or van used for work, shouldn't be punished.
A combination of HP, weight and a functionality factor (roomy, family-friendly cars and vehicles used purely or mainly for work vs. big luxury SUVs or small overpowered sports cars) would be an adequate calculation.
Motorbikes would need an adjustment along the same lines. Nobody can tell me they need a 120 hp motorbike for a proper commute. They are just noise pollution.
One area where an external monitor (or high resolution) helps a lot is UI or frontend programming. It's a huge pain to play with the Chrome or Firefox debugger and have to jump between windows to get feedback.
When it comes to backend or non-UI work, a laptop is just fine.