
Are there any comparison results, in terms of performance vs. accuracy, with https://github.com/paradedb/paradedb/tree/dev/pg_search?


Seems like they know about ParadeDB, but for some reason don't publish benchmarks:

"""Another solution is ParadeDB, which pushes full-text search queries down to Tantivy for results. It supports BM25 scoring and complex query patterns like negative terms, aiming to be a complete replacement for ElasticSearch. However, it uses its own unique syntax for filtering and querying and delegates filtering operations to Tantivy instead of relying on Postgres directly. Its implementation requires several hooks into Postgres' query planning and storage, potentially leading to compatibility issues."""

So it is more an apples-to-oranges comparison than an equal one.


Hi folks, ParadeDB author here. We had benchmarks, but they were super outdated. We just made new ones, and will soon make a biiiig announcement with big new benchmarks. You can see some existing benchmarks vs Lucene here: https://www.paradedb.com/blog/case_study_alibaba

This comparison isn't super fair -- ParadeDB does not have compatibility issues with Postgres; rather, it is directly integrated into Postgres' block storage, query planner, and query executor.


This is the first thing I noticed, and I uninstalled it.


But think of the status symbolism! To know that someone has had to spend so much $$$ just to send you a message! Next they should do it for phone calls too!


Do you really think status symbolism is what they were going for when the cheapest iCloud+ plan is $0.99 USD per month?


Pretty much the same here. I didn't find it doing anything substantial beyond kitty, and it copied almost all of kitty's good features.


Is there any connection between the Windows kitty and the mac kitty?


Nope, they just have the same (similar?) name. Kitty is for Linux too btw.


I don't know what to read or interpret from this. It is very generic and almost the same at any Big Tech company.


If I remember correctly, Dropbox was one of the first big tech companies to publish its career framework openly, and a lot of organizations (especially smaller ones) based their career ladders on Dropbox's.


Dropbox has 2,693 employees. 37signals has ~80.

I can't see Dropbox having a product or sales operation that is ~30 times as complex. However, I can easily see an organization that has become overly political.

Dropbox should be run as a small company, not as "big tech".

What are all these people doing besides political games?


It would be reductive to look at it as a multiplier, but really? You don't think synchronizing terabytes of data for millions of people is complex? When a mistake means losing people's photos of their children or their company's financial charts? With hooks in the filesystem to make this all work?


Synchronizing is software that scales well. You need a handful of very good developers, not an ever-growing number.

Look at how few developers SQLite has. That isn't simpler than the Dropbox software.


The SQLite development team doesn’t run and operate a scaled set of sqlite instances serving 100,000 enterprise customers and 200 million consumer accounts. They don’t have to run a billing and account management system at scale and provide support for a sales and customer service organization. They aren’t responsible for managing an exabyte or so of other people’s data.

The core storage/syncing engine of Dropbox is pretty simple, sure. Running Dropbox is a lot more than just building that piece of software.


When your product is basically a wrapper around s3 (figuratively - I believe they migrated), you bet there’s a lot of politics in their engineering org


This sums up the original HN post where several people made fun of Dropbox.

https://news.ycombinator.com/item?id=9224

They’ve done many acquisitions to expand functionality, plus tons of integrations, APIs, and search, and they do a very good job when they integrate those acquisitions too.


The first post says "this shouldn't be a product at all".

I'm not saying that. I'm saying that it should be a small focused company like FastMail or 37signals.


That’s how they started, but it turns out that good syncing storage has a lot of potential integration points for creating a more complete product.


You are comparing the most unique company there is with Dropbox. Bad start.


There is likely some reason why they all look the same and also happen to be some of the best performing companies there are.


There was also likely a reason why everyone used to believe the Earth was the center of the Universe. What exactly is your point?


Why shouldn't it be? Is there a rule book that defines this?


It depends on the workload. This tool generates a recommended config for that specific machine's workload. App nodes can have completely different recommendations than database nodes, and it will be completely different again for a workstation.
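
As a rough sketch of why per-workload output matters (the knob names below are real sysctls, but the values and the selection logic are hypothetical illustrations, not taken from this tool):

    package main

    import "fmt"

    // Workload is a coarse classification of what a machine does.
    type Workload int

    const (
        AppNode Workload = iota
        DatabaseNode
        Workstation
    )

    // recommend returns suggested kernel settings for a workload.
    // Knob names are real sysctls; the values are placeholder guesses.
    func recommend(w Workload) map[string]string {
        switch w {
        case DatabaseNode:
            // A database box usually wants minimal swapping and tighter
            // dirty-page limits than a general-purpose machine.
            return map[string]string{"vm.swappiness": "1", "vm.dirty_ratio": "10"}
        case AppNode:
            return map[string]string{"vm.swappiness": "10", "vm.dirty_ratio": "20"}
        default: // Workstation
            return map[string]string{"vm.swappiness": "60", "vm.dirty_ratio": "20"}
        }
    }

    func main() {
        fmt.Println(recommend(DatabaseNode))
    }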


Sure, but the kernel could just do the same. Of course the kernel is already too big. Is BPF the right level to make it more modular? Just thinking, I don't think I have the answer.


How do updates or continuous inserts get written to the Parquet files? The architecture diagram doesn't show it, and there's nothing in the docs.

1. Most benchmarks (and most companies) assume the data already exists one time and then try querying/compressing it in different formats, which is far from reality

2. Do you rewrite the Parquet data every time new data comes in? Or is it partitioned by something? There are no examples

3. How do update/delete work? Update might be a niche case, but deletion/data retention/truncation is a must, and I don't see how you support that


Our initial approach is to do full table re-syncs periodically. Our next step is to enable incremental data syncing by supporting insert/update/delete according to the Iceberg spec. In short, it'd produce "diff" Parquet files and "stitch" them using metadata (enabling time travel queries, schema evolution, etc.)
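
Roughly, the stitching idea looks like this (a toy sketch only, not the actual implementation; real Iceberg tracks data and delete files as Parquet/Avro referenced from manifest metadata):

    package main

    import "fmt"

    // Row stands in for a record in a base ("data") Parquet file.
    type Row struct {
        ID    int
        Value string
    }

    // Snapshot holds base data files plus "diff" files: newly inserted
    // rows and deleted row IDs. This only shows the stitching idea.
    type Snapshot struct {
        DataFiles [][]Row
        Inserts   []Row
        Deletes   map[int]bool
    }

    // Read stitches the base files with the diffs: deleted rows are
    // dropped, inserted rows are appended.
    func (s Snapshot) Read() []Row {
        var out []Row
        for _, f := range s.DataFiles {
            for _, r := range f {
                if !s.Deletes[r.ID] {
                    out = append(out, r)
                }
            }
        }
        return append(out, s.Inserts...)
    }

    func main() {
        snap := Snapshot{
            DataFiles: [][]Row{{{1, "a"}, {2, "b"}}},
            Inserts:   []Row{{3, "c"}},
            Deletes:   map[int]bool{2: true},
        }
        fmt.Println(snap.Read()) // [{1 a} {3 c}]
    }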


Precisely. What's stopping them from using a simple incremental materialized view?


If there were only one alert criterion, that'd be simple. Our alerts can be configured with arbitrary data filters (e.g. only matching logs where column `level='error'`); we would have to create a unique MV for each alert's filter condition.


You could have an alert ID be part of the MV primary key?

An MV is really more like a trigger, which translates an insert into table A into an insert into table B, evaluating, filtering, and grouping each batch of A inserts to determine what rows to insert into B. Those inserts can be grouped by an alert ID in order to segregate the state columns by alert. To me this sounds like exactly what you're doing with manual inserts?

That said, while MVs are super powerful and convenient, they're a convenience more than a core function. If you have an ingest flow expressed as Go code (as opposed to, say, Vector or Kafka Connect), then you're basically just "lifting" the convenience of MVs into Go code. You don't get the benefit of an MV's ability to efficiently evaluate against a batch of inserts (which gives you access to joins and dictionaries and so on), but it's functionally very similar.
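
For what it's worth, a toy sketch of that "lifted MV" shape in Go (the names here are made up for illustration, not their code): each batch of source-table inserts is evaluated against every alert's filter, and matches are accumulated keyed by alert ID, much like an MV inserting grouped rows into a target table.

    package main

    import "fmt"

    // LogRow is one inserted row in the source table.
    type LogRow struct {
        Level   string
        Message string
    }

    // Alert is a configured alert with an arbitrary row filter,
    // e.g. "level = 'error'".
    type Alert struct {
        ID     string
        Filter func(LogRow) bool
    }

    // applyBatch plays the role of a materialized view: for each batch of
    // inserts into the source table, it bumps a per-alert match counter
    // (the "target table", keyed by alert ID).
    func applyBatch(batch []LogRow, alerts []Alert, counts map[string]int) {
        for _, row := range batch {
            for _, a := range alerts {
                if a.Filter(row) {
                    counts[a.ID]++
                }
            }
        }
    }

    func main() {
        alerts := []Alert{{
            ID:     "errors",
            Filter: func(r LogRow) bool { return r.Level == "error" },
        }}
        counts := map[string]int{}
        applyBatch([]LogRow{{"error", "disk full"}, {"info", "ok"}}, alerts, counts)
        fmt.Println(counts) // map[errors:1]
    }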


I try to use most of the tools (on Linux) as standard as possible, without customisation, including shortcut keys. The problem is that once you are on remote servers/devops boxes, you can't have fancy tools or fancy shortcuts. It's better to train your mind on standard tools as much as possible.


Just because we don't have access to great tools when working on a remote server doesn't mean we shouldn't use them locally. I use Vim with lots of plugins on my personal projects, and I use IntelliJ at work. But if I need to ssh in and use vi, it's OK, I know how to do it efficiently. With Fleet or VS Code you can easily use your dev environment, with your tools, plugins, and shortcuts, to work on a remote codebase via SSH.


I agree, and fzf is a good example - on my local box it speeds up my reverse search, whereas when I'm on a remote server I use the same Ctrl+R I've used for decades, and the final result is similar, so there's no additional cognitive load.


That only really applies at a small scale. At some point you either stop logging into them, or do it just to run some automation. I can't remember the last time I did something non-trivial in a remote terminal. (Apart from my home server, which has everything I want.)


This completely depends on your company's system architecture and your job role; scale has nothing to do with it. There are so many giant Unix shops out there with people herding them day in, day out.


Or bring your tools with you. Or use them remotely - for example, TRAMP mode in Emacs.

There's no need to walk barefoot your entire life just in case some day your shoes break and you have to hobble to the store.


Agreed. With how easy it is to copy over a standalone binary for things like rg and fd, I find it hard to justify taking the time to learn the much more clunky standard tools.

I don't need to access servers often though. I'm sure for others the situation is different.


Ack. AMD GPUs broke left and right with this version. Lots of issues reported: https://gitlab.freedesktop.org/drm/amd/-/issues

Even the latest Linux kernel, 6.11, still has the same issues (at least for me).

