
A souffle has not been made

Indeed, more of a frambled egg. Let's see what happens in two years' time.

Prescient

Yeah, they often call this to mind: "Better ingredients. Better pizza. Papa John's."

EPUB is the ebook standard, outside of Amazon-land, so it has staying power in its space. I think it would be good for the ecosystem if it broke containment and got tooling in enough places to challenge PDF.

Man, Access could've been so good if they just made an app around SQLite. Or since it's Microsoft and they need to do everything their own way, it would've been so good if they made a flat file DB à la SQLite, but with T-SQL (or a subset thereof) instead of JET-SQL.

Increase interoperability. Funnel data people from Excel into real DB technologies.

And if they did more to blur the lines between spreadsheets and databases: make it seamless to work out of both Excel and Access, add more spreadsheet features to the data views, etc.


PII sniffers are pretty good at dealing with Excel files. Excel is seen more as an analyst tool than a dev tool. Any place that bans Excel needs to let analysts use some other Turing-complete data tool, like Python or R or something, or they'll have trouble attracting analyst talent. They'll have devs and data entry users and that's it.

The only way that works is if the dev team is large enough to be responsive to business needs, which almost never happens because devs are expensive. The juniors who are tweaking business logic every day are functionally doing a job analysts could do if you just gave them a sane API and data tools.


Why are you making my screen look dirty? lol

lol I kind of thought it made it look like a hot sauce label. but maybe not.

We're not going to have a rehash of the McDonald's coffee settlement argument here, are we?

https://en.wikipedia.org/wiki/Punitive_damages


She deserved way more than that for the way they tried to smear her afterward!


Seriously, the reporting on that was so terribly biased that many people still think it was a frivolous lawsuit.


It's honestly kind of chilling just how effective smear campaigns can be.

I don't think there's any reasonable person who could read the full medical description of the injuries sustained and think "yeah 2.7 mill was too much".


The result was wrong. And yes, I read the contra arguments put forward, and they were not convincing.


Your entire opinion is based on an expensive propaganda campaign. Is that who you want to be?


Stella Liebeck was awarded 2.7 million in punitive damages; that seems like a much more reasonable number than 1.4 billion.


It was considerably less on appeal, the McDonald's lawyers didn't antagonize the court every chance they got, it was literally 30 years ago, and there was only one victim.

Just with inflation (6.4m) and the number of victims (22?), you get a much larger number real quick.


6.4m * 22 = 140.8m, an entire order of magnitude less.


Sure, and the McDonald's case was a famously low penalty amount.


You're blaming a lot of normal ETL problems on DSVs.

Like, specifying date as a type for a field in JSON isn't going to ensure that people format it correctly and uniformly. You still have parsing issues, except now you're duplicating the ignored schema for every data point. The benefit you get for all of that overhead is more useful for network issues than ensuring a file is well formed before sending it. The people who send garbage will be more likely to send garbage when the format isn't tabular.

There are types and there is a spec WHEN YOU DEFINE IT.

You define a spec. You deal with garbage that doesn't match the spec. You adjust your tools if the garbage-sending account is big. You warn or fire them if they're small. You shit-talk the garbage senders after hours to blow off steam. That's what ETL is.

DSVs aren't the problem. Or maybe they are for you because you're unable to address problems in your process, so you need a heavy unreadable format that enforces things that could be handled elsewhere.


I would kind of disagree.

We are talking here in the context of scientific datasets. Of course ETL plays a part here. However, it is really more about the interplay of Excel with CSV, which is often output by scientific instruments or scientific assistants.

You get your raw sensor data as a CSV and just want to take a look in Excel. It understandably mangles the data in an attempt to infer column types, because of course it does, it's CSV! Then you mistakenly hit save and boom, all your data on disk is now an unrecoverably mangled mess.

Of course this is also the fault of not having good, clean data practices, but with CSV and Excel it is just so, so easy to hold it wrong, simply because there is no right way to hold it.

> so you need a heavy unreadable format

I prefer human unreadable if it means I get machine readable without any guesswork.


That's Excel's type inference causing problems. Not an issue with CSV or any other type of DSV.

It is possible to import a CSV into Excel without type conversion. I just tested it two different ways.

While possible, it's not Excel's default way of doing things. Not always obvious or easy. Not enough people who use Excel really know how to use it.

Regardless, Excel mangling files via type inference is an Excel problem. It's not the fault of the file formats Excel reads in.


The file format being ambiguous and underspecified enough to mangle is, though.


No, it's Excel trying to be too clever. It does the same thing with manual input if you don't proactively change the field type.

You can import a DSV into Excel without mangling datatypes in a few different ways. Probably the best way is using Power Query.

A DSV generally does have a schema. It's just not in the file format itself. Just because it isn't self-describing doesn't mean it isn't described. It just means the schema is communicated outside of the data interchange.
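
To make that concrete, here's a minimal sketch of what "schema outside the file" looks like in practice. This is Python/pandas, and orders.csv plus its columns are made up for illustration; the point is only that the types live in code (or a shared doc) that both sides agree on, not in the CSV itself:

    import pandas as pd

    # The schema is agreed on out of band, not embedded in the CSV.
    SCHEMA = {
        "order_id": "string",   # keep leading zeros
        "sku": "string",
        "quantity": "Int64",    # nullable integer
        "unit_price": "float64",
    }
    DATE_COLS = ["ordered_at"]

    df = pd.read_csv("orders.csv", dtype=SCHEMA, parse_dates=DATE_COLS)
    print(df.dtypes)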


If you get an .xls which doesn't have very esoteric functions, I expect it to open about the same way in any version of Excel and in any other office suite.

With CSV I do not have that expectation. I know that for some random user-submitted CSVs, I will have to fiddle. Even if that means finding the one row in a thousand which has some null-value placeholder, messing up the whole automatic inference.


You're just saying when there's no filetype transfer, you don't have to deal with issues related to filetype transfer.


No. That's not at all what I'm saying. I am saying that a fixed CSV file will open differently depending on the program you open it with.

Don't even need to transfer it. Opening a CSV in pandas can be different from opening it with polars, which can be different from DuckDB, which can be different from Excel.

You've got no guarantees. There's no spec, and how edge cases (if you want to call serializing and deserializing a float an edge case) are handled is open to the implementation.
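
One concrete flavor of that, as a sketch (contrived file, and the exact behavior shifts between library versions): a numeric column with a stray "NA" in it. pandas treats "NA" as missing by default, so you'll likely get a float column with a NaN; polars doesn't treat "NA" as null by default, so it will typically fall back to a string column for the same bytes.

    import io
    import pandas as pd
    import polars as pl

    csv_data = "reading\n1.5\nNA\n2.0\n"

    # pandas: "NA" is in the default na_values, so the column usually parses as float64
    print(pd.read_csv(io.StringIO(csv_data))["reading"].dtype)

    # polars: "NA" is not a default null marker, so inference typically
    # falls back to a string column
    print(pl.read_csv(io.StringIO(csv_data)).schema)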


It's both of their faults. CSV is not blameless here: Excel is doing something users broadly expect, showing dates as dates and numbers as numbers, not everything as strings. If CSV had types, then Excel would not have to guess what they are.


It does have types if you define them in the schema. Not every format needs to be self-describing. It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.

It's completely Excel's fault for pushing its type inference and making it difficult for users to define or supply their own.

Power Query does a better job handling it, but you should be able to just supply a schema on import, like you can with Polars or DuckDB.
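
Roughly what I mean, as a sketch rather than gospel (orders.csv and its columns are made up, and the polars keyword has moved around between versions; older releases called it dtypes):

    import duckdb
    import polars as pl

    # polars: override inference for specific columns at read time
    df = pl.read_csv("orders.csv", schema_overrides={"order_id": pl.Utf8})

    # DuckDB: spell out every column's type up front instead of letting the sniffer guess
    rel = duckdb.sql("""
        SELECT *
        FROM read_csv('orders.csv',
                      header = true,
                      columns = {'order_id': 'VARCHAR',
                                 'ordered_at': 'TIMESTAMP',
                                 'unit_price': 'DOUBLE'})
    """)
    print(df.schema, rel.types)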

It's another example of MS babying their userbase too much. Like how VBA is single-threaded only because threads are hard. They're making their product less usable and making it harder for their users to learn how stuff works.


CSV doesn't have a schema; it has a barely-adhered-to, post-hoc “not a specification,” and everything is strings.

That you can solve some of these problems by using something alongside the CSV file is not anywhere near as helpful, and it's a clear problem with CSV files. There is no universally followed schema, for a start, so now we're at unique solutions all over the place.

> It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.

Surely you cannot be suggesting that CSV files are efficient; they're atrociously inefficient. Having the same format with a tied-in schema would solve a lot and add barely any overhead. If you want efficiency, do not use CSV.

Asking users to manually load in the right schema every time they open a file is asking for trouble. Why wouldn’t you combine them?

> It's completely Excel's fault for pushing their type-inference and making it difficult for users to define or supply their own.

It's not entirely Excel's fault that CSV doesn't have types. They didn't invent and promote a new standard, but then why would you? There are better formats out there. I'm sure they would argue that Excel files are a better format, for a start.

And people did make better formats. That's why I think CSV should be consigned to the bin of history.


While some may lament the departure of the phallus bird, nobody can be sad about the arrival of the giant fierce mythical bird.

