
A souffle has not been made

Indeed, more of a frambled egg. Let's see what happens in two years' time.

Prescient

Yeah, they often call this to mind: "Better ingredients. Better pizza. Papa John's."

EPUB is the ebook standard, outside of Amazon-land, so it has staying power in its space. I think it would be good for the ecosystem if it broke containment and got tooling in enough places to challenge PDF.

Man, Access could've been so good if they just made an app around SQLite. Or since it's Microsoft and they need to do everything their own way, it would've been so good if they made a flat file DB à la SQLite, but with T-SQL (or a subset thereof) instead of JET-SQL.

Increase interoperability. Funnel data people from Excel into real DB technologies.

And if they did more to blur the lines between spreadsheets and databases: make it seamless to work out of both Excel and Access, add more spreadsheet features to the data views, etc.


PII sniffers are pretty good at dealing with Excel files. Excel is seen more as an analyst tool than a dev tool. Any place that bans Excel needs to let analysts use some other Turing-complete data tool, like Python or R or something, or they'll have trouble attracting analyst talent. They'll have devs and data entry users and that's it.

The only way that works is if the dev team is large enough to be responsive to business needs, which almost never happens because devs are expensive. The juniors who are tweaking business logic every day are functionally doing a job analysts could do if you just gave them a sane API and data tools.


Why are you making my screen look dirty? lol

lol I kind of thought it made it look like a hot sauce label. but maybe not.

We're not going to have a rehash of the McDonald's coffee settlement argument here, are we?

https://en.wikipedia.org/wiki/Punitive_damages


She deserved way more than that for the way they tried to smear her afterward!


Seriously, the reporting on that was so terribly biased that many people still think it was a frivolous lawsuit.


It's honestly kind of chilling just how effective smear campaigns can be.

I don't think there's any reasonable person who could read the full medical description of the injuries sustained and think "yeah 2.7 mill was too much".


The result was wrong. And yes, I read the contra arguments put forward, and they were not convincing.


Your entire opinion is based on an expensive propaganda campaign. Is that who you want to be?


Stella Liebeck was awarded 2.7 million in punitive damages; that seems like a much more reasonable number than 1.4 billion.


It was considerably less on appeal, the McDonald's lawyers didn't antagonize the court every chance they got, it was literally 30 years ago, and there was only one victim.

Just with inflation (6.4m) and the number of victims (22?), you get a much larger number real quick.


6.4m * 22 = 140.8m, an entire order of magnitude less.


Sure, and the McDonald's case was a famously low penalty amount.


You're blaming a lot of normal ETL problems on DSVs.

Like, specifying date as a type for a field in JSON isn't going to ensure that people format it correctly and uniformly. You still have parsing issues, except now you're duplicating the ignored schema for every data point. The benefit you get for all of that overhead is more useful for network issues than ensuring a file is well formed before sending it. The people who send garbage will be more likely to send garbage when the format isn't tabular.

There are types and there is a spec WHEN YOU DEFINE IT.

You define a spec. You deal with garbage that doesn't match the spec. You adjust your tools if the garbage-sending account is big. You warn or fire them if they're small. You shit-talk the garbage senders after hours to blow off steam. That's what ETL is.

DSVs aren't the problem. Or maybe they are for you because you're unable to address problems in your process, so you need a heavy unreadable format that enforces things that could be handled elsewhere.


I would kind of disagree.

We are talking here in the context of scientific datasets. Of course ETL plays a part here. However, it is really more about the interplay of Excel with CSV, which is often output by scientific instruments or scientific assistants.

You get your raw sensor data as a CSV and just want to take a look in Excel. It understandably mangles the data in an attempt to infer column types, because of course it does, it's CSV! Then you mistakenly hit save and boom, all your data on disk is now an unrecoverably mangled mess.

Of course this is also the fault of not having good, clean data practices, but with CSV and Excel it is just so, so easy to hold it wrong, simply because there is no right way to hold it.

> so you need a heavy unreadable format

I prefer human unreadable if it means I get machine readable without any guesswork.


That's Excel's type inference causing problems. Not an issue with CSV or any other type of DSV.

It is possible to import a CSV into Excel without type conversion. I just tested it two different ways.

While possible, it's not Excel's default way of doing things. Not always obvious or easy. Not enough people who use Excel really know how to use it.

Regardless, Excel mangling files via type inference is an Excel problem. It's not the fault of the file formats Excel reads in.


The file format being ambiguous and underspecified enough to mangle is, though.


No, it's Excel trying to be too clever. It does the same thing with manual input if you don't proactively change the field type.

You can import a DSV into Excel without mangling datatypes in a few different ways. Probably the best way is using Power Query.

A DSV generally does have a schema. It's just not in the file format itself. Just because it isn't self-describing doesn't mean it isn't described. It just means the schema is communicated outside of the data interchange.
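
To make that concrete, here's a minimal sketch of what "schema outside the file" looks like in practice. This is Python/pandas, and orders.csv plus its columns are made up for illustration; the point is only that the types live in code (or a shared doc) that both sides agree on, not in the CSV itself:

    import pandas as pd

    # The schema is agreed on out of band, not embedded in the CSV.
    SCHEMA = {
        "order_id": "string",   # keep leading zeros
        "sku": "string",
        "quantity": "Int64",    # nullable integer
        "unit_price": "float64",
    }
    DATE_COLS = ["ordered_at"]

    df = pd.read_csv("orders.csv", dtype=SCHEMA, parse_dates=DATE_COLS)
    print(df.dtypes)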


If you get an .xls which doesn't have very esoteric functions, I expect it to open about the same way in any version of Excel and in any other office suite.

With CSV I do not have that expectation. I know that for some random user-submitted CSVs, I will have to fiddle. Even if that means finding the one row in a thousand which has some null-value placeholder, messing up the whole automatic inference.


You're just saying when there's no filetype transfer, you don't have to deal with issues related to filetype transfer.


No. That's not at all what I'm saying. I am saying that a fixed CSV file will open differently depending on the program you open it with.

Don't even need to transfer it. Opening a CSV in pandas can be different from opening it with polars, which can be different from DuckDB, which can be different from Excel.

You've got no guarantees. There's no spec, and how edge cases (if you want to call serializing and deserializing a float an edge case) are handled is open to the implementation.
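
One concrete flavor of that, as a sketch (contrived file, and the exact behavior shifts between library versions): a numeric column with a stray "NA" in it. pandas treats "NA" as missing by default, so you'll likely get a float column with a NaN; polars doesn't treat "NA" as null by default, so it will typically fall back to a string column for the same bytes.

    import io
    import pandas as pd
    import polars as pl

    csv_data = "reading\n1.5\nNA\n2.0\n"

    # pandas: "NA" is in the default na_values, so the column usually parses as float64
    print(pd.read_csv(io.StringIO(csv_data))["reading"].dtype)

    # polars: "NA" is not a default null marker, so inference typically
    # falls back to a string column
    print(pl.read_csv(io.StringIO(csv_data)).schema)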


It's both of their faults. CSV is not blameless here: Excel is doing something users broadly expect, showing dates as dates and numbers as numbers, not everything as strings. If CSV had types, then Excel would not have to guess what they are.


It does have types if you define them in the schema. Not every format needs to be self-describing. It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.

It's completely Excel's fault for pushing its type inference and making it difficult for users to define or supply their own.

Power Query does a better job handling it, but you should be able to just supply a schema on import, like you can with Polars or DuckDB.
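
Roughly what I mean, as a sketch rather than gospel (orders.csv and its columns are made up, and the polars keyword has moved around between versions; older releases called it dtypes):

    import duckdb
    import polars as pl

    # polars: override inference for specific columns at read time
    df = pl.read_csv("orders.csv", schema_overrides={"order_id": pl.Utf8})

    # DuckDB: spell out every column's type up front instead of letting the sniffer guess
    rel = duckdb.sql("""
        SELECT *
        FROM read_csv('orders.csv',
                      header = true,
                      columns = {'order_id': 'VARCHAR',
                                 'ordered_at': 'TIMESTAMP',
                                 'unit_price': 'DOUBLE'})
    """)
    print(df.schema, rel.types)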

It's another example of MS babying their userbase too much. Like how VBA is single-threaded only because threads are hard. They're making their product less usable and making it harder for their users to learn how stuff works.


CSV doesn't have a schema; it has a barely-adhered-to, post-hoc “not a specification,” and everything is strings.

That you can solve some of these problems by using something alongside the CSV file is not anywhere near as helpful, and it's a clear problem with CSV files. There is no universally followed schema, for a start, so now we're at unique solutions all over the place.

> It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.

Surely you cannot be suggesting that CSV files are efficient; they're atrociously inefficient. Having the same format with a tied-in schema would solve a lot and add barely any overhead. If you want efficiency, do not use CSV.

Asking users to manually load in the right schema every time they open a file is asking for trouble. Why wouldn’t you combine them?

> It's completely Excel's fault for pushing their type-inference and making it difficult for users to define or supply their own.

It's not entirely Excel's fault that CSV doesn't have types. They didn't invent and promote a new standard, but then why would you? There are better formats out there. I'm sure they would argue that Excel files are a better format, for a start.

And people did make better formats. That's why I think CSV should be consigned to the bin of history.


While some may lament the departure of the phallus bird, nobody can be sad about the arrival of the giant fierce mythical bird.

