Hacker Newsnew | past | comments | ask | show | jobs | submit | more sheetjs's commentslogin

Article doesn't explain the issue, but OFFSET lets you access cells that aren't referenced in the formula.

For example, OFFSET(B2,1,1,1,1) is a live reference to cell C3, which means you can use functions like COLUMN to investigate the range. C3 shows up nowhere in the formula, so there's no non-volatile way to implement it.

The first argument to INDEX is a "sqref" (cell, range, or set of ranges) and INDEX will error if you try to reference a cell outside of the sqref, so use of INDEX doesn't break the obvious dependency structure


The correct way to generate a CSV cell with a leading 0 is

    ="01"
You can verify this with

    01,"01",="01"


Perhaps by "correct way" you meant "dodgy hack to make Excel happy and risk breaking more sensible implementations"?

Excel may predate the RFC but AFAIK MS didn't invent or coin the term CSV, so you can't just say whatever Excel does is correct. The RFC is loose because of nonsense like this, it doesn't mean it was ever a good idea.


How in the world is Excel supposed to know which fields you want to be numbers and which to be strings? CSV doesn't have that info built in unless you surround the number with quotes and you select the right process. Excel isn't just a CSV importer, it reads all sorts of files, and it needs some help if you expect it to work. What ever happened to process and a sense of responsibility and craft in your work?


Choosing based on whether it's in quotes seems like a better solution than using an equals sign. Not everything follows that convention admittedly, but not everything understands the '=' either. Or it could just treat everything as text until a user tells it otherwise.

But none of that is really the point. Because CSV files aren't just for importing into Excel. One of their main benefits is their portability. In other situations column types might be specified out of band, but even if not, putting equals signs before values is unconventional, so more likely to hurt than help. And in the cases it might help, i.e. when you only care about loading into Excel, then you have options other than CSV, rather than contorting CSV files for Excel's sake.

> What ever happened to process and a sense of responsibility and craft in your work?

I actually have no idea what you are on about. I'm talking about the "responsibility and craft" of not producing screwed up CSV files. Why do some people find that so offensive? Yes, it is not inconceivable that there could be some situation working with legacy systems where putting `="..."` in CSVs is, unfortunately, your best option. Sometimes you do have to put in a hack to get something done. But don't go around telling people (or yourself) that it is "the correct way".


[flagged]


I sure don't want to have to deal with people putting `="01"` in CSV files.


We have multiple legacy systems where I work, communicating via batch csv files, some of which are authored & curated by staff in Excel. I can confirm that doing this would be a very bad thing indeed and make our systems grind to a halt.


For CSV that isn’t solely targeting Excel, hacks around the way Excel works are useless.


The reason you use CSV is that you want to use it with software that ISN'T Excel. Otherwise you'd just use .xlsx. No other software uses this convention, and it's not correct CSV.


I guess the HN audience has voted. The result is: hacking around Excel stupidities is itself a stupid idea.

If you want Excel to understand your output, perhaps use a library which can write xlsx files.


“Correct” is a strong word. It isn’t defined in https://tools.ietf.org/html/rfc4180 so Excel should not try to be smart and add extensions only they support.


Excel predates RFC4180 by nearly 20 years (RFC4180 is October 2005, Excel 1.0 was September 1985) and this behavior was already cemented when the RFC was written.

As for the actual RFC, it's worth taking a read. Any sort of value interpretation is left up to the implementation, to the extent that Excel's behavior in interpreting formulae is 100% in compliance with the spec.


CSV predates Excel, and other CSV implementations don't have this behavior


What spec?

Anyway the RFC doesn't mandate any value interpretation IIRC.


If CSV were being used just to exchange data with Excel, we probably wouldn't be using CSV. Many systems neither need nor know that ="01" should be treated as the string "01".

If Excel were the only intended consumer, .xlsx would be a preferable file format. At least it's mostly unambiguous.


I ran into this last week with a UK bank. I was offered a CSV file. What I got was a CSV file with excel formulae in it.

I actually wanted a CSV file – preferably without having to resort to sed to strip out excel formulae.


And now the csv parser (or downstream process) has to guess whether to interpret that as the raw string or as the eval'd value.


If you want to write a bespoke CSV generator for an application where you know for sure that the file is only ever going to go straight to an Excel instance, sure.

For all the other uses in the world, that's a breaking change.


I have a list of companies I'd like you to consult for. Coincidentally, they're companies I'd like to work for, and I've quite enjoyed building proper database solutions which replace incredibly hacky/terrible Excel (or Excel adjacent) solutions.


Not even Excel uses that syntax when exporting to CSV (at least by default).


It was literally cited by Britain's finance minister George Osborne as justification for UK government austerity:

> As Ken Rogoff himself puts it, "there's no question that the most significant vulnerability as we emerge from recession is the soaring government debt. It's very likely that will trigger the next crisis as governments have been stretched so wide."

https://web.archive.org/web/20100414205630/http://www.conser...


People often cite papers which support their positions, but correlation does not necessarily imply causation, and the paper is rarely decisive.


Not the paper on its own, maybe, but it bolsters other arguments. If someone already wants austerity, an authoritative paper saying that's better will make them want austerity more, and gives them a strong argument in discussions with opponents, and may sway people who are unsure.


> If you spend some time and effort removing whatever obstacles you have in place that are keeping you from being able to do that

This is literally impossible for many JS libraries. Chromium / NodeJS / other JS environments are themselves constantly changing. Irrespective of the evolving timezone info, the core MomentJS can only be "done" for a particular set of browser versions. Each bug pertaining to dates, like https://bugs.chromium.org/p/v8/issues/detail?id=7863 , is a potential browser/engine version for which Moment needs a fix.

> this world does in fact exist

It only exists for certain proprietary software and SaaS developers because of the hard work of open source developers to keep up with the changing landscape. If everyone adopted your attitude, you would be forced to contend with the true nature of the ecosystem directly.


This is true, but also it would be good if the JS environments stopped doing that.


If the Web, browsers, the JS lang, sandboxing, Internet protocols, TLS, etc. were all perfect and done, then JS envs would be too.


CSV is a very bad example. Yes, it is easy to throw together a simple regex to parse simple RFC4180 CSV strings, but Excel is its own black box with a huge number of hacks.

For example, en-US excel will automagically parse TRUE and "TRUE" to be the logical value TRUE. The way to get Excel to see a literal string TRUE is to make a formula ="TRUE". Many CSV writers implement this hack specifically assuming files will be read back in Excel. So now your parser, if you're trying to process data like Excel, has to do the same.

So then you discover that this is actually localized! If you set your UI language to French (France), Excel will treat VRAI and FAUX as booleans while TRUE and FALSE are treated as literal strings.

What you thought was a simple CSV parser now has to handle localization as well. So that CSV parser library can roll its own dodgy localization support, use a tried and true solution, or just choose not to support the feature. Each choice has its own drawbacks


2020 feels like a new age because of the browser ecosystem. The actual endpoint of the "second age" is probably the combined forces of:

1) Windows 7 Support officially ended

2) Edge now switched over to Chromium

With the exception of highly niche applications and markets, sites meant to be consumed on computers can aim for Chromium support and developers can rest assured they are covering the overwhelming majority of the market. Many slower-moving enterprises are standardizing on Chromium-powered browsers. Even NetFront (browsers used by video game consoles) is switching over to Chromium

The "Third Age of JavaScript" isn't marked by tooling or frameworks or code structure, but rather by developers able to focus on one browser instead of multiple. This new age will be marked by innovative use of APIs that people disregarded in the past because they only worked in Chrome.


> developers able to focus on one browser instead of multiple. This new age will be marked by innovative use of APIs that people disregarded in the past because they only worked in Chrome.

Ah, just like the good old days of `new ActiveXObject("Microsoft.XMLHTTP");` ;)


I see the opposite. I've talked to a lot of developers who have move to Firefox out of concern for things Google has been doing with Chrome recently. Though, developers are a minority for sure. Most people don't follow tech news closely enough to understand why they should be concerned.


The only place I can really see this making a difference is webassembly (which is not JavaScript). People have been transpiling their code to erase js discrepancies for years.


I doubt it, iOS is still a thing and will drag down support for new browser features.


It's even worse than the author is suggesting. For most people, "RFC4180" is meaningless, all that matters is what Excel does. And that means you need to handle a bunch of cases if you are reading AND if you are writing files. A few cases not discussed in the blog post:

- if your file starts with \x49\x44 ("ID"), Excel will interpret the file as their symbolic link .SLK format. So if you're writing files, the ID should be wrapped in double quotes even if it isn't necessary according to RFC4180

- Excel will proactively try to "evaluate" fields that start with \x3d ("="). You can see this in action with the sample file

    1,2,3
    =SUM(A1:C1)
- Excel will aggressively interpret values as dates when possible. For example, SEPT1 issues https://genomebiology.biomedcentral.com/articles/10.1186/s13...

CSV parsing / writing certainly isn't going to be a value driver for most companies (if you're supporting user imports, you really care about XLSX/XLSB/XLS files and Google Sheets import), but it's not a trivial problem.


Similar story here and with many of our customers. It's been radio silence with larger banks (Chase) and community banks.

It's rumored that, just like how BofA prioritized companies with active business loans, other banks are giving money to ensure their loan customers can continue to pay existing loans.


Performance is affected by a myriad of factors, many of which aren't visible by looking at the worksheet.

For example, XLS/XLSX support a special "shared formula" representation which tells Excel that a formula is structurally similar to other formulae. This is not always written, depending on how the file was generated and other factors. Without that, Excel will take an "inefficient" calculation approach. You can't figure it out from the UI or by inspecting the formula text -- you actually have to dig into the file to see it.


That's really cool! How did you figure this out and what does this representation look like?


https://docs.microsoft.com/en-us/dotnet/api/documentformat.o...

The OpenXML basis format for XSLX is an ISO standard. Most things are documented albeit in a somewhat obtuse way.


Shameless plug: we (https://sheetjs.com/) offer that as one of our paid builds. In conjunction with our JS parser https://github.com/SheetJS/js-xlsx , you can modify cell values / formulae and recalculate the workbook (or individual expressions) in the web browser or nodejs


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: