
hazaskull · on Dec 8, 2024

I was under the impression that Yugabyte requires signing a CLA to contribute, which leads me to avoid it for fear of them relicensing it when the VCs start squeezing. Also: it is quite unique and driven by a single vendor. It seems like too much of a risk in the longer term, but that is just my take.

EDIT: in response to your question, I did run a PoC of it, but it had issues: I wasn't able to create very large indexes without the statement timing out on me. Basic hand-benchmarking of complex joins on very large tables was very slow, if the queries finished at all. I suppose systems like this and CockroachDB really need short, simple statements and high client concurrency rather than large, complex queries.

sgarland · on Dec 8, 2024

> DDL timeouts

That’s normal for building indices on large tables, regardless of the RDBMS. Increase the timeout, and build them with the CONCURRENTLY option.

> Query speed

Without knowing your schema and query I can’t say with any certainty, but it shouldn’t be dramatically slower than single-node Postgres, assuming your table statistics are accurate (have you run ANALYZE <table>?), necessary indices are in place, and there aren’t some horrendously wrong parameters set.

fweimer · on Dec 8, 2024

Not sure about the CLA process, but the database is already under a restrictive, proprietary license:

    ## Free Trial
    
    Use to evaluate whether the software suits a particular
    application for less than 32 consecutive calendar days, on
    behalf of you or your company, is use for a permitted purpose.

https://github.com/yugabyte/yugabyte-db/blob/master/licenses...

It's not really clear what this means (what is a permitted purpose?), but it seems the intent is that after 32 days you are expected to pay up, or at least that they are preparing for a future in which the infrastructure to charge customers is in place (if it isn't there yet).

hazaskull · on Dec 8, 2024

Thanks. I think that only covers the commercial bits they run themselves, though:

  "The entire database with all its features (including the enterprise ones) is licensed under the Apache License 2.0

  The binaries that contain -managed in the artifact and help run a managed service are licensed under the Polyform Free Trial License 1.0.0."

EDIT: formatting

senorsmile · on Dec 19, 2024

It also mentions:

> By default, the build options generate only the Apache License 2.0 binaries.

So, it seems like the proprietary builds are for the managed services that they host themselves, which makes sense.

franckpachot · on Dec 9, 2024

Index creation should not be controlled by the statement timeout but by backfill_index_client_rpc_timeout_ms, which defaults to 24 hours. It may have been lower in older versions.

hazaskull · on June 2, 2024

While it is fun to see how to creatively solve such issues, it does raise the question of manageability. When sharding data into loosely (FDW) coupled silos, it would become tricky to make consistent backups, ensure locking mechanisms work when sharded data might sometimes be directly related, handle zone/region failures gracefully, prevent hot spots, perform multi-region schema changes reliably, etc. I suppose this pattern principally only works when the data is in fact not strongly related and the silos are quite independent. I wouldn't call that a distributed system at all, really. This may be a matter of opinion, of course.

It does give a "When all you have is a hammer..." vibe to me and begs the question: why not use a system that's designed for use-cases like this and do it reliably and securely? E.g.: https://www.cockroachlabs.com/docs/stable/multiregion-overvi... (yes, I know full data domiciling requires something even stricter, but I currently don't know of any system that can transparently span the globe and stay performant while not sharing any metadata or caching between regions)

tudorg · on June 2, 2024

> It does give a "When all you have is a hammer..." vibe to me and begs the question: why not use a system that's designed for use-cases like this and do it reliably and securely?

(disclaimer: blog post author)

A reason would be that you want to stick to pure Postgres, for example because you want to use Postgres extensions, or prefer the liberal Postgres license.

It can also be a matter of performance: distributed transactions are necessarily slower, so if you can avoid them almost all the time by connecting to a single node that has all the data the transaction needs, you will get better performance.
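A minimal sketch of that routing decision, assuming rows are placed on nodes by hashing a shard key such as a tenant ID. The node names and the `node_for`/`plan_transaction` helpers are hypothetical, not part of pgzx or Postgres: if every key a transaction touches hashes to the same node, it can run as an ordinary single-node transaction; otherwise it needs cross-node coordination (for example two-phase commit), which is where the extra latency comes from.

```python
import hashlib

NODES = ("node-a", "node-b", "node-c")  # hypothetical cluster members


def node_for(shard_key: str, nodes=NODES) -> str:
    """Map a shard key to a node with a stable hash (hypothetical placement scheme)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def plan_transaction(shard_keys, nodes=NODES):
    """Return ("local", node) if one node holds every key, else ("distributed", nodes)."""
    targets = {node_for(key, nodes) for key in shard_keys}
    if len(targets) == 1:
        # Fast path: a plain local transaction on the one node that owns the data.
        return ("local", targets.pop())
    # Several nodes involved: a coordinator (e.g. two-phase commit) is needed,
    # which adds network round trips before the commit is durable.
    return ("distributed", sorted(targets))
```

Choosing the shard key so that a typical transaction only touches one key's rows (e.g. a single tenant) is what keeps the distributed path rare.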

hazaskull · on June 2, 2024

Hi there! Thank you for the post and your work on pgzx! Though it depends on the system (CockroachDB can place leaders on specific nodes to speed up local queries, it has global tables, and otherwise there are always follower reads), those are of course valid reasons. Admittedly, if you want to keep data "pinned", you're into manual placement rather than horizontal scaling, but that's nitpicking and there are pros and cons. I do enjoy the freedom of Postgres and am hopeful that its virtually prehistoric storage design becomes a non-issue thanks to tech such as Neon and OrioleDB. The option to decouple storage would provide wonderful flexibility for solutions like yours too, I think.

tudorg · on June 2, 2024

Thanks for noticing pgzx! We are working on a project building on top of it and going into the direction hinted by this blog post.

I agree that the separation of storage and compute complements this nicely. In fact, we take advantage of it in the Xata platform which uses Aurora.

hazaskull · on Aug 25, 2023

Not Postgres-based (but wire- and mostly syntax-compatible): CockroachDB using column families is much like a columnar MPP. Yugabyte is PG-based and MPP but not columnar.

refset · on Aug 25, 2023

The presence and use of column families is only half of the puzzle - it doesn't strictly imply that the execution engine is capable of working in a vectorized columnar style (which is necessary for competitive OLAP).

hazaskull · on Aug 25, 2023

I was unable to edit my previous comment. It does use vectorization: https://www.cockroachlabs.com/docs/stable/vectorized-executi...

refset · on Sept 2, 2023

Thanks for sharing that - TIL! These blog posts go into more detail:

https://www.cockroachlabs.com/blog/vectorized-hash-joiner/

https://www.cockroachlabs.com/blog/vectorizing-the-merge-joi...

...it seems the distinction here is that vectorization is only present in the execution layer and not also in the storage layer. I would guess that from a storage perspective, even with column families in play, everything is streamed out of a sorted LSM engine regardless. So there isn't additionally some highly tuned buffer pool serving up batches of compressed column files, etc.
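A toy sketch of the distinction described above: rows stream out of a row-oriented store, get transposed into column batches, and the operators then work on whole column vectors using a selection vector. It only shows the shape of the technique; the function names and batch size are hypothetical, and pure Python will not reproduce the speedup a compiled engine gets from tight per-column loops.

```python
from typing import Iterable, Iterator

BATCH_SIZE = 1024  # hypothetical; real engines tune this


def row_at_a_time_sum(rows: Iterable[tuple], filter_col: int, sum_col: int, threshold: int) -> int:
    """Volcano-style loop: one row tuple per iteration, per-row branching."""
    total = 0
    for row in rows:
        if row[filter_col] > threshold:
            total += row[sum_col]
    return total


def to_column_batches(rows: Iterable[tuple], width: int, batch_size: int = BATCH_SIZE) -> Iterator[list]:
    """Transpose a stream of rows (e.g. read from a sorted KV/LSM store) into column vectors."""
    batch = [[] for _ in range(width)]
    count = 0
    for row in rows:
        for i, value in enumerate(row):
            batch[i].append(value)
        count += 1
        if count == batch_size:
            yield batch
            batch = [[] for _ in range(width)]
            count = 0
    if count:  # flush the final, partially filled batch
        yield batch


def vectorized_sum(batches: Iterable[list], filter_col: int, sum_col: int, threshold: int) -> int:
    """Batch-at-a-time: evaluate the predicate over a whole column, then aggregate."""
    total = 0
    for columns in batches:
        # Step 1: run the filter over the entire column, producing a selection vector.
        selection = [i for i, v in enumerate(columns[filter_col]) if v > threshold]
        # Step 2: aggregate only the selected positions of the other column.
        values = columns[sum_col]
        total += sum(values[i] for i in selection)
    return total
```

Both paths compute the same answer; the difference is that the vectorized one makes one pass per column per batch instead of interpreting every row individually, which is what the execution layer gains even when storage is row-oriented.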

hazaskull · on Aug 25, 2023

Indeed. As I commented elsewhere, this is just about the general design. It is not targeting OLAP in this case (even though I do believe CockroachDB employs vectorization for reads).

riku_iki · on Aug 25, 2023

> CockroachDB using column families is much like a columnar MPP.

I wonder why they say it is not meant for OLAP workloads.

hazaskull · on Aug 25, 2023

They don't optimize for it, and I suppose the data distribution is primarily aimed at parallel OLTP rather than OLAP. I just wanted to mention that design-wise it is similar, but that's indeed not all there is to it. I'd be hesitant to store large volumes of data on a single PG instance; I don't see how a single-writer, filesystem-based database is suitable at all for data that is large enough to warrant columnar storage.

riku_iki · on Aug 25, 2023

So, what would be your DB choice for OLAP?

esafak · on Aug 25, 2023

You can also go HTAP with TiDB which has TiKV for OLTP and TiFlash (Raft-based columnar replicas) for OLAP.

riku_iki · on Aug 25, 2023

I am more interested in actual OLAP than HTAP, and I don't see a strong OSS OLAP offering on the market right now. My rants in a previous discussion: https://news.ycombinator.com/item?id=36992039

But I should look at TiDB; it looks like an interesting and relatively mature project.

esafak · on Aug 25, 2023

https://www.starrocks.io/ is on my shortlist for OLAP

riku_iki · on Aug 25, 2023

It's also on my list, but it doesn't look easy to try; the setup procedure is complicated.

hazaskull · on Aug 25, 2023

I'm no expert, but I'd say you'd rather be looking at BigQuery, Redshift, ClickHouse, Snowflake, etc.

ddorian43 · on Aug 25, 2023

Note that column families have nothing to do with columnar storage.

Another example: Cassandra uses column families but is not column-oriented.
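A simplified, hypothetical model of that difference. In the key-value layout, loosely modeled on how CockroachDB stores each column family of a row as a separate key-value pair, a family groups columns within a row, and rows remain interleaved in sorted primary-key order. In a columnar layout, each column is one contiguous vector across all rows, which is what enables tight scans and per-column compression. All function names are illustrative only.

```python
def kv_layout(rows, families):
    """One KV entry per (primary key, family); the value holds only that family's columns.

    rows: list of dicts with an integer "id"; families: {family_id: [column names]}.
    """
    kv = {}
    for row in rows:
        for fam_id, cols in families.items():
            kv[(row["id"], fam_id)] = {c: row[c] for c in cols}
    # Sorted by key, as in an LSM/KV store: families of different rows interleave.
    return dict(sorted(kv.items()))


def columnar_layout(rows, columns):
    """One contiguous vector per column, in row order."""
    return {c: [row[c] for row in rows] for c in columns}


def scan_column_kv(kv, families, column):
    """Walks the sorted key space; skips other families' values but still visits each row's entry."""
    fam_id = next((f for f, cols in families.items() if column in cols), None)
    if fam_id is None:
        raise KeyError(column)
    return [value[column] for (pk, f), value in kv.items() if f == fam_id]


def scan_column_columnar(store, column):
    """A columnar scan is just the stored vector."""
    return store[column]
```

Both scans return the same values; only the columnar layout stores them contiguously, so families reduce how much is read per row without making the storage columnar.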

hazaskull · on Aug 25, 2023

Thank you for the correction. Indeed, it is not entirely the same thing. Still, I'd expect that the benefit of not having to read columns outside the family would help (I haven't tried it in earnest). I suppose compression is not an option, though.

hazaskull · on March 5, 2023

...and then there is me using

  echo "75000000" | sudo tee /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_1_power_limit_uw

to cap my i7-10700 at 75 W (the value is in microwatts) so that it doesn't overwhelm the system fan by peaking at 200+ watts.
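The same write can be sketched in Python, assuming the standard Linux powercap sysfs files (`constraint_N_name`, `constraint_N_power_limit_uw`) under the path from the comment, without the shell's backslash before the colon. The helper names are hypothetical. Limits are given in microwatts, so 75000000 corresponds to 75 W. On many Intel systems constraint 0 is the `long_term` limit and constraint 1 the `short_term` one, so the sketch reads the name file rather than assuming.

```python
from pathlib import Path

# Path from the comment above; ":" needs no escaping outside the shell.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl/intel-rapl:0")


def watts_to_microwatts(watts: float) -> int:
    """powercap sysfs limits are expressed in microwatts."""
    return int(round(watts * 1_000_000))


def set_power_limit(watts: float, constraint: int = 1, rapl_dir: Path = RAPL_DIR) -> None:
    """Write a RAPL power limit; on a real system this needs root, like `sudo tee`."""
    if watts <= 0:
        raise ValueError("power limit must be positive")
    # Report which limit (e.g. long_term / short_term) this constraint index refers to.
    name = (rapl_dir / f"constraint_{constraint}_name").read_text().strip()
    target = rapl_dir / f"constraint_{constraint}_power_limit_uw"
    target.write_text(f"{watts_to_microwatts(watts)}\n")
    print(f"set {name} limit in {target} to {watts} W")
```

`set_power_limit(75)` mirrors the shell one-liner. As with the `tee` version, a sysfs write like this does not survive a reboot.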

hazaskull · on Feb 26, 2023

Not to say that I disagree with you in principle, but I'd think there is a large difference between disagreeing with someone's opinions and disagreeing with someone's values. You generally can't have a meaningful conversation in the latter case.

hazaskull · on Jan 7, 2023

Personally I find the idea of a very small cartel of companies holding all the cards quite dystopian. There are certain people that I very much would not like to be in control of all that. Guess I'm not ready to accept the supposed inevitability of this end game.

hazaskull · on Jan 7, 2023

The way I read it is simply as a broadly applicable notion that a certain type of person will demonstrate their power and (perceived) higher status over others by openly breaking the others' rules without consequence.

EDIT: for better wording

hazaskull · on Nov 6, 2022

Ah well, we're all human, with our own blind focus on specific wants and needs. Most managers are not maniacs. Our capitalist society just has a lot of bad incentives that are hard to change without damaging its good parts. Spreading power helps to balance things out. I know unions have a bad rep in the US, but they do have real merit where government is laissez-faire, and no, we do not need to go heavily socialist (not meant to imply you were advocating for that). In general I wish we'd focus a bit more on societal stability than on economic output, but we need both.

hazaskull · on Nov 6, 2022

Honestly, I think your comment shows part of the problem here, though I do understand your point. But if "mere" employees feel that they are personally in competition with basically the rest of the world, it easily becomes a war of attrition, and employees become something like soldier ants. I'd rather be my own person, thank you very much. People can in fact be loyal and productive in 8 hours per day without being slackers, and maybe have some energy left to take part in society outside of work. Nobody likes people with a bad work ethic, but if a good work ethic is considered to require regular overtime, then maybe it's time to reconsider that concept.

subradios · on Nov 10, 2022

The unfortunate part is that your company is in competition with the rest of the world.

I made the comment about slackers because having moved from retail to military to tech, the amount of privilege shown unironically by tech employees on "crunch" is borderline unbelievable.

hazaskull · on Nov 6, 2022

52. I haven't been pulling all-nighters (okay, maybe once or twice in exceptional circumstances, which management also appreciated as such afterwards), as I've always believed your point to be right. If you don't (at least partly) own the company, then it should not be your life, unless you don't have one and don't care. To each his own, but an environment where overtime is the norm (even if driven by the employees themselves) is not a healthy place to be when you have a family. Besides, when a company is so successful that it needs more than the normal hours available, the onus is on the leadership to hire more people; it's in their best interest not to have to rely on such devotion. ...and if crunch time is instead caused by the company _not_ being successful and not having the money to hire people, then toxicity is pretty much guaranteed in short order.