Seems like they know about ParadeDB, but for some reason don't publish benchmarks:
"""Another solution is ParadeDB, which pushes full-text search queries down to Tantivy for results. It supports BM25 scoring and complex query patterns like negative terms, aiming to be a complete replacement for ElasticSearch. However, it uses its own unique syntax for filtering and querying and delegates filtering operations to Tantivy instead of relying on Postgres directly. Its implementation requires several hooks into Postgres' query planning and storage, potentially leading to compatibility issues."""
So it's more an apples-to-oranges comparison than an equal one.
Hi folks, ParadeDB author here. We had benchmarks, but they were very outdated. We've just made new ones and will soon make a big announcement with them. You can see some existing benchmarks vs Lucene here: https://www.paradedb.com/blog/case_study_alibaba
This comparison isn't quite fair -- ParadeDB doesn't have compatibility issues with Postgres; rather, it is directly integrated into Postgres' block storage, query planner, and query executor.
But think of the status symbolism! To know that someone has had to spend so much $$$ just to send you a message! Next they should do it for phone calls too!
If I remember correctly, Dropbox was one of the first big tech companies to publish its career framework openly, and a lot of organizations (especially smaller ones) based their career frameworks on Dropbox's.
It would be reductive to look at it as a multiplier, but really? You don't see synchronizing terabytes of data for millions of people as complex? Where, if you make mistakes, you're losing people's photos of their children or their company's financial charts? With hooks into the filesystem to make this all work?
The SQLite development team doesn't run and operate a scaled set of SQLite instances serving 100,000 enterprise customers and 200 million consumer accounts. They don't have to run a billing and account management system at scale and provide support for a sales and customer service organization. They aren't responsible for managing an exabyte or so of other people's data.
The core storage/syncing engine of Dropbox is pretty simple, sure. Running Dropbox is a lot more than just building that piece of software.
When your product is basically a wrapper around S3 (figuratively - I believe they migrated off it), you can bet there's a lot of politics in their engineering org.
They've made many acquisitions to expand functionality (tons of integrations, APIs, search), and they do a very good job of integrating those acquisitions too.
It depends on the workload. This tool generates a recommended config for that specific machine's workload. App nodes can get completely different recommendations than database nodes, and a workstation will be different again.
Sure, but the kernel could just do the same. Of course, the kernel is already too big. Is BPF the right level at which to make it more modular? Just thinking out loud; I don't think I have the answer.
Our initial approach is to do full-table re-syncs periodically. Our next step is to enable incremental data syncing by supporting insert/update/delete according to the Iceberg spec. In short, it'd produce "diff" Parquet files and "stitch" them using metadata (enabling time travel queries, schema evolution, etc.).
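To make the "diff and stitch" idea concrete, here's a minimal Go sketch of Iceberg-style snapshot metadata. Everything here (type names, file naming, the `CommitIncremental`/`FilesAsOf` helpers) is hypothetical for illustration, not their implementation or the actual Iceberg library API; it just shows how incremental commits append diff/delete files and how a reader stitches them together per snapshot.

    // Hypothetical sketch of Iceberg-style incremental sync metadata.
    // Not the poster's implementation or a real Iceberg client API.
    package main

    import "fmt"

    // A snapshot references immutable data files plus "diff" delete files.
    type Snapshot struct {
        ID          int64
        DataFiles   []string // Parquet files with inserted/updated rows
        DeleteFiles []string // Parquet files listing deleted/replaced row keys
        ParentID    int64    // chain of snapshots enables time travel
    }

    type Table struct {
        Snapshots []Snapshot
    }

    // Commit an incremental sync: instead of rewriting the whole table,
    // append one diff data file (and optionally a delete file) and record
    // a new snapshot pointing at its parent.
    func (t *Table) CommitIncremental(dataFile, deleteFile string) Snapshot {
        var parent int64
        if n := len(t.Snapshots); n > 0 {
            parent = t.Snapshots[n-1].ID
        }
        snap := Snapshot{
            ID:        parent + 1,
            ParentID:  parent,
            DataFiles: []string{dataFile},
        }
        if deleteFile != "" {
            snap.DeleteFiles = []string{deleteFile}
        }
        t.Snapshots = append(t.Snapshots, snap)
        return snap
    }

    // "Stitch" a view of the table as of a given snapshot: union all data
    // files up to that snapshot, minus rows covered by the delete files
    // (the actual row filtering is done by the reader).
    func (t *Table) FilesAsOf(snapshotID int64) (data, deletes []string) {
        for _, s := range t.Snapshots {
            if s.ID > snapshotID {
                break
            }
            data = append(data, s.DataFiles...)
            deletes = append(deletes, s.DeleteFiles...)
        }
        return data, deletes
    }

    func main() {
        var t Table
        t.CommitIncremental("full-resync-0001.parquet", "")
        t.CommitIncremental("diff-0002.parquet", "deletes-0002.parquet")
        data, deletes := t.FilesAsOf(2)
        fmt.Println("scan:", data, "minus:", deletes)
    }

Querying `FilesAsOf` with an older snapshot ID is what makes time travel cheap: the older files are never rewritten, only superseded by later diffs.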
If there were only one alert criterion, that'd be simple. Our alerts can be configured with arbitrary data filters (e.g. only matching log rows with column `level='error'`); we would have to create a unique MV for each alert's filter condition.
You could have the alert ID be part of the MV's primary key?
An MV is really more like a trigger: it translates an insert into table A into an insert into table B, evaluating, filtering, and grouping each batch of A inserts to determine which rows to insert into B. Those inserts can be grouped by an alert ID in order to segregate the state columns by alert. To me this sounds like exactly what you're doing with manual inserts?
That said, while MVs are super powerful and convenient, they're a convenience more than a core function. If you have an ingest flow expressed as Go code (as opposed to, say, Vector or Kafka Connect), then you're basically just "lifting" the convenience of MVs into Go code. You don't get the benefit of MVs' ability to efficiently evaluate against a batch of inserts (which gives you access to joins and dictionaries and so on), but it's functionally very similar.
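A rough Go sketch of that "lifting", assuming made-up type and field names (this isn't anyone's actual ingest code): each batch of log rows is evaluated against every alert's filter, and one aggregate state row per matching alert is produced, keyed by alert ID, roughly what an MV grouped by alert ID would emit on insert.

    // Illustrative only: lifting the MV pattern into the ingest path.
    package main

    import "fmt"

    type LogRow struct {
        Level   string
        Service string
    }

    type Alert struct {
        ID     string
        Filter func(LogRow) bool // e.g. only rows with level='error'
    }

    // One state row per (alert, batch), analogous to the MV's target table
    // with the alert ID as part of the key.
    type AlertStateRow struct {
        AlertID    string
        MatchCount int
    }

    // Evaluate a batch once per alert and emit grouped state rows, which
    // the caller would then insert into the alert-state table.
    func evaluateBatch(batch []LogRow, alerts []Alert) []AlertStateRow {
        var out []AlertStateRow
        for _, a := range alerts {
            count := 0
            for _, row := range batch {
                if a.Filter(row) {
                    count++
                }
            }
            if count > 0 {
                out = append(out, AlertStateRow{AlertID: a.ID, MatchCount: count})
            }
        }
        return out
    }

    func main() {
        alerts := []Alert{
            {ID: "errors", Filter: func(r LogRow) bool { return r.Level == "error" }},
            {ID: "api-errors", Filter: func(r LogRow) bool { return r.Level == "error" && r.Service == "api" }},
        }
        batch := []LogRow{
            {Level: "info", Service: "api"},
            {Level: "error", Service: "api"},
            {Level: "error", Service: "worker"},
        }
        fmt.Println(evaluateBatch(batch, alerts))
    }

The trade-off mentioned above shows up here: the filters are plain Go predicates, so there's no access to SQL joins or dictionaries during evaluation, but the output shape is the same as what a per-alert MV would produce.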
I try to use most tools (on Linux) as standard as possible, without customisation, including shortcut keys. The problem is, once you're on remote servers or dev-ops boxes, you can't have fancy tools or fancy shortcuts. It's better to train your mind on standard tools as much as possible.
Just because we don't have access to great tools when working on a remote server doesn't mean we shouldn't use them locally.
I use Vim with lots of plugins on my personal projects, and I use IntelliJ at work. But if I need to ssh in and use vi, it's OK; I know how to do it efficiently.
With Fleet or VSCode you can easily use your dev environment, with your tools, plugins, and shortcuts, to work on a remote codebase via SSH.
I agree, and fzf is a good example - on my local box it speeds up my reverse search, whereas on a remote server I use the same Ctrl+R I've used for decades, and the final result is similar, so there's no additional cognitive load.
That only really applies at a small scale. At some point you either stop logging into them, or do it just to run some automation. I can't remember the last time I did something non-trivial in a remote terminal. (Apart from my home server, which has everything I want.)
This completely depends on your company's system architecture and your job role; scale has nothing to do with it. There are plenty of giant Unix shops out there with people herding them day in, day out.
Agreed. With how easy it is to copy over a standalone binary for things like rg and fd, I find it hard to justify taking the time to learn the much clunkier standard tools.
I don't need to access servers often though. I'm sure for others the situation is different.