The manifesto [1] is the most interesting thing. I agree that DuckDB has the largest potential to disrupt the current order with Iceberg.
However, this mostly reads to me as a thought experiment:
> what if the backend service of an Iceberg catalog was just a SQL database?
The manifesto says that maintaining a data lake catalog is easier, which I agree with in theory. s3-files-as-information-schema presents real challenges!
But what I most want to know is: what's the end-user benefit?
What does someone get with this if they're already using Apache Polaris or Lakekeeper as their Iceberg REST catalog?
For users, it adds the following features to a data lake:
- multi-statement & multi-table transactions
- SQL views
- delta queries
- encryption
- low latency: no S3 metadata & inlining: store small inserts in-catalog
and more!
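To make the "catalog is just a SQL database" idea concrete, here is a minimal sketch of a multi-table commit. It uses SQLite from Python's standard library as a stand-in for the catalog database. The `snapshot` and `data_file` tables and the `commit_files` helper are hypothetical names for illustration, not DuckLake's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the catalog database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY);
    CREATE TABLE data_file (
        table_name  TEXT NOT NULL,
        path        TEXT NOT NULL UNIQUE,
        snapshot_id INTEGER NOT NULL REFERENCES snapshot(snapshot_id)
    );
""")


def commit_files(conn, snapshot_id, files):
    """Register files for several tables under one snapshot, atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO snapshot VALUES (?)", (snapshot_id,))
        conn.executemany(
            "INSERT INTO data_file VALUES (?, ?, ?)",
            [(table, path, snapshot_id) for table, path in files],
        )


# One commit that touches two tables.
commit_files(conn, 1, [("orders", "orders-1.parquet"),
                       ("customers", "customers-1.parquet")])

try:
    # The second path is a duplicate, so the whole commit is rolled back,
    # including the snapshot row and the first file.
    commit_files(conn, 2, [("orders", "orders-2.parquet"),
                           ("customers", "customers-1.parquet")])
except sqlite3.IntegrityError:
    pass

snapshots = [row[0] for row in conn.execute("SELECT snapshot_id FROM snapshot")]
paths = sorted(row[0] for row in conn.execute("SELECT path FROM data_file"))
```

After the failed second commit, neither table shows any trace of snapshot 2. That all-or-nothing behavior across tables is what the database's own transactions give you for free.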
One thing to add to this:
Snapshots can be retained (though rewritten) even through compaction
In a format like Iceberg, compaction deletes the buildup of many small add/delete files, so you would lose the ability to time travel to those earlier states.
With DuckLake's ability to refer to parts of Parquet files, we can preserve the ability to time travel, even after deleting the old Parquet files.
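A toy model of that idea, as I understand it: after compaction, each old snapshot is rewritten to point at a row prefix of the merged file. Everything here is a sketch under assumptions. `FileSlice`, `read_snapshot`, and `compact` are hypothetical names, plain Python lists stand in for Parquet files, and snapshots are assumed to be append-only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSlice:
    """Hypothetical catalog entry: the first `row_count` rows of `path`."""
    path: str
    row_count: int


def read_snapshot(catalog, storage, snapshot_id):
    """Materialize a snapshot by reading each referenced file prefix."""
    rows = []
    for ref in catalog[snapshot_id]:
        rows.extend(storage[ref.path][:ref.row_count])
    return rows


def compact(catalog, storage, merged_path):
    """Merge all small files into one file and rewrite every snapshot to
    reference a prefix of it. Assumes append-only snapshots."""
    merged = read_snapshot(catalog, storage, max(catalog))
    new_catalog = {
        snap_id: [FileSlice(merged_path, sum(ref.row_count for ref in refs))]
        for snap_id, refs in catalog.items()
    }
    # The old small files are dropped; only the merged file remains.
    return new_catalog, {merged_path: merged}


storage = {"a.parquet": [1, 2], "b.parquet": [3], "c.parquet": [4, 5]}
catalog = {
    1: [FileSlice("a.parquet", 2)],
    2: [FileSlice("a.parquet", 2), FileSlice("b.parquet", 1)],
    3: [FileSlice("a.parquet", 2), FileSlice("b.parquet", 1),
        FileSlice("c.parquet", 2)],
}

before = {s: read_snapshot(catalog, storage, s) for s in catalog}
catalog, storage = compact(catalog, storage, "merged.parquet")
after = {s: read_snapshot(catalog, storage, s) for s in catalog}
```

Every snapshot reads the same rows before and after compaction, even though the small files are gone. Note that in this toy the trick depends on the merged file keeping rows in insertion order.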
Does this trick preclude the ability to sort your data within a partition? You wouldn’t be able to rely on the row IDs being sequential anymore to be able to just refer to a prefix of them within a newly created file.
I hope he gets the help and relief he needs. In some ways a brain with a penchant for software causes more trouble during mental health crises than one without. Especially if they’re into crypto.
Software engineers are just normal people. Our brains are not "special." This (objectively incorrect) mental model of software devs is very harmful if you want to actually understand things going on around you.
People are attracted to things based on their personality traits. The average software engineer is not the same as the average person, because the average person is not attracted to software engineering.
In my observations, the average software engineer is more likely to be persnickety, caught up in small details, and obsessive than the average person. We're more likely to split hairs than most people would care to, and proper labeling and categorization is much more important to us than to the average person.
It's not that software engineers are somehow magical or special, it's that there is a selection bias to become a software engineer. It's extra not special, because the same thing happens for virtually any specialized field. There are famous stereotypes about the personalities of psychologists, for instance.
> In my observations, the average software engineer is more likely to be persnickety, caught up in small details, and obsessive than the average person. We're more likely to split hairs than most people would care to, and proper labeling and categorization is much more important to us than to the average person.
It sounds like a fancy way of saying that people on the spectrum are more likely to become software engineers than some other arbitrary person.
It had occurred to me, but I didn't want to say it that way because I'm not a psychological professional and it seems out of my area of expertise to make that assumption based on my passive observations.
I have software friends and non-software friends. There is no particular correlation. This is some weird flavor of biological essentialism going on here.
My 30-year-old daughter is still driving the Toyota version, the Matrix, also 2008, that we bought in about 2013. She loves the thing. If she didn't have it, I'm sure I would still be driving it.
I find it hilarious that it's a limited-edition M Theory model. It has a badge glued to the dash that says "1926 of 5000." For a Toyota econobox.
> how so many organizations end up investing humongous amounts of effort rolling their own databases from scratch because none of the off-the-shelf solutions meet all their requirements. But in most of these cases, it's because some of the "requirements" were actually "nice-to-haves" and they could have gotten by fine with an off-the-shelf database, but they talked themselves into building one from scratch.
I love the term "arbitrary uniqueness" for this too. Like how different are your needs, really?
conda user for 10 years and uv skeptic for 18 months.
I get it! I loved my long-lived curated conda envs.
I finally tried uv to manage an environment and it’s got me hooked. That a project's dependencies can be so declarative and separated from the venv really sings for me! No more meticulous tracking of an env.yml or requirements.txt, just `uv add` and `uv sync` and that’s it! I just don’t think about it anymore.
I'm also a longtime conda user and have recently switched to pixi (https://pixi.sh/), which gives a very similar experience for conda packages (and uses uv under the hood if you want to mix in dependencies from PyPI). It's been great, and it also has a `pixi global` command, similar to `pipx`, that makes it easy to grab general tools like ripgrep, ruff, etc. and make them widely available but still managed.