> Convert a continuous string of letters and numbers to a date.
This has been a massive bugbear of mine. Particularly when it inexplicably chooses USA date formats even when faced with a column containing values like 15-07-75. It would frequently convert half the values into US date format where possible and leave others like above unconverted.
Yes, I had big issues with Excel converting a number range like "1-3" to "3rd January" when importing property data. Eventually switched to Libre Office which works much more intuitively.
And leading zeros, which is awesome when postal codes have them and shipping software uses csv imports. Somebody only has to open the file once, and it isn't immediately obvious unless people know to expect that.
I used to work with some data that would come across as CSV files but from non-technical people and required leading zeros on some fields. Trying to explain the difference between csv and excel ... or why you should not open those files in excel ... was difficult to say the least.
CSVs need an option to open as text.
Or at least to tell people "hey, they butchered your data for ya" with an undo option.
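For what it's worth, a plain-text read is exactly what you get from, e.g., Python's standard csv module: every field comes back as a string and nothing is guessed. A minimal sketch:

```python
import csv
import io

# A CSV with a leading-zero postal code and an ambiguous date-like value.
data = "postcode,range\n00501,1-3\n"

rows = list(csv.DictReader(io.StringIO(data)))

# Every field is returned as a plain string; no guessing, no conversion.
print(rows[0]["postcode"])  # "00501" - leading zero intact
print(rows[0]["range"])     # "1-3"   - not January 3rd
```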
I got a ticket from IT dept (lol) that a CSV is the wrong format. It wasn't.
This article says the user will be notified if any data is automatically converted when opening a CSV. This should have been done a decade ago but I'm still glad to see it.
US date formats, REGARDLESS of any regional setting you might have for dates. So even if they were dates being auto-formatted, they were still doing it wrong.
Anyone done this? Open a .csv in excel to fix/edit a item, then save it without realizing that it autoformatted a bunch of columns. Now it doesn't work in the parent program anymore.
Excel isn't the only MS product with US date insanity. Outlook set up for AU region only supports date filtering in searches using US date formats, which is the icing on the turd sandwich of date searching as Outlook requires you to hand write in text search queries to search for before/after x date.
Damn near 50 years late. Did the memo that literally everyone hates this feature, and it’s spoiled many scientific studies and brought down multiple businesses just get delivered to their developers?
It's annoying, but to be fair we don't hear about the people auto-date-detection has helped or the time it has saved, because those people mostly remain blissfully unaware of it.
On some level, I would love it if people didn't use Excel at all for science. The potential damage to your own reputation from using Excel features is enormous. It also wrecks the work of scientists sharing data https://genomebiology.biomedcentral.com/articles/10.1186/s13...
Ehhh, everyone in biology has seen the effects of Excel date conversion and has probably experienced it in some of their own data workups. While I have certainly been annoyed that it happens, I do not look down upon other researchers who have a few adulterated septins slip into their work.
Until the day where everyone is computationally savvy and can do their processing in Python/R/Julia, it is the state of the world. As a matter of fact, HUGO agreed to rename some of the worst offender genes to Excel-friendly formats[0].
“The problem of Excel software (Microsoft Corp., Redmond, WA, USA) inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004.”
“RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’).”
We keep a log of PCs that come in for repair at work, it's not mission critical and excel does the job well enough
One extremely aggravating thing it does repeatedly is strip leading 0s off phone numbers (local phone numbers here follow the format 0XX-XXX-XXXX); we constantly need to work around it by remembering to add a dash in the middle.
As it is, Excel needs to keep around a) the value of a cell, b) its format (ignoring formulae, comments, etc.).
In your proposal, Excel would need to keep around a) the value of a cell, b) its format, and c) whatever you originally typed in, with c) being potentially more than a)+b), without very much benefit. And remember, Excel was released in 1987 (years before Windows 3.1) on machines that had less storage than they have today. I'd say the design decision made back then was the right one.
It's not 1987 any more and Excel doesn't use the file format that it used back then. The price of main memory and long-term storage has declined by many orders of magnitude since 1987.
Well, yes, obviously, but do you realise what a headache it is (would be) if the default behaviour of Excel changes? How many people would complain if things that used to work suddenly stop working with a new version?
The problem was that this default behavior was unchangeable. Like, let me get into settings and turn this shit off (like they are now doing); they could have done this decades ago.
The value of the cell is what I typed in. How that is displayed is determined at render time or whatever, based on the data type.
This is already kind of the case. If you change the date rendering format, the "real" date text in the cell stays the same. The only issue is that Excel decides what an acceptable "real" value is and modifies my input to match that.
This is of course correct, but there are also people who do not understand formatting. They get confused when they want to edit the displayed value and find the typed value to be different from it.
Indeed. Every time I opened a .csv and had Excel courteously convert all UPCs into scientific notation, I wondered "Who would ever want that to happen?" Finally there is a way to avoid this.
Import CSVs into excel; don't double click to open. Been telling this to colleagues since excel 2007 days; still have never managed to get anyone to follow this advice.
Add a ' prefix to them so they are treated as text. We've had problems with many-digit integers being converted to floating point approximations instead.
How did people keep up with that? Is there really no competitor who gets this right?
These unexpected conversions must have cost billions in the problems they created and the time spent fixing them.
And now after so many years they just remove some types of auto conversion and leave the others?
Sounds like some kind of Stockholm syndrome...
I don't think that's a fair comparison.
Here we talk about a tool that fucks up your data once you load it.
With javascript there is no such thing. There are some auto type conversions when passing a wrong type into a function that can yield unexpected results. But I would argue this is different, as it's an environment for power users only, and there are types at play that should make you, as a programmer, aware.
And if you want a string in Excel you must apostrophe-prefix it. If you think “serial number” == number, telephone number == number, then try to use the number handling, you get problems because “numbers” don’t have leading zeros or spaces or parens or hashes or plus symbols.
I would like to store, then retrieve exactly 1234567890123456789 apples instead of 1.070816993713379 × 2⁶⁰ = 1234567939550609408. Unfortunately due to IEEE-754 conversions I can only do this up to 7-8 digits or so. This is a number, just not one that can be represented as a Number without loss of precision towards the end.
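To make the loss concrete, here is a similar round-trip in Python, whose float is an IEEE-754 double (64-bit; the 7-8 digit limit mentioned above corresponds to a 32-bit float, while a double preserves 15-17 digits):

```python
n = 1234567890123456789

# Round-tripping through a 64-bit float silently changes the value:
# doubles have a 53-bit significand, so integers near 2**60 are only
# representable in steps of 256.
roundtrip = int(float(n))

print(roundtrip)       # 1234567890123456768
print(roundtrip == n)  # False
```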
Dynamic typing is orthogonal to implicit type conversion, though admittedly dynamic environments do it more often. Watch out for (void*) in C or accidentally inferred union types in TypeScript.
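A minimal illustration of that orthogonality in Python, which is dynamically typed yet refuses the implicit string/number coercion JavaScript performs:

```python
# Dynamically typed: the same name can hold values of any type.
x = "1"
x = 1

# ...but no implicit coercion between unrelated types, unlike
# JavaScript, where "1" + 1 silently evaluates to "11".
try:
    result = "1" + 1
except TypeError:
    result = "TypeError"

print(result)  # "TypeError"
```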
> Is there really no competitor who gets this right?
Clearly excel has offered value far in excess of this annoyance. The competitors such that they existed aside from google sheets clearly fail to deliver sufficient value even in hypothetical absence of this bug.
You say "clearly" twice, but it's not clear at all to me that this is the case. Can you elaborate what makes it clear that no other competitor exists who can provide similar value?
And isn't LibreOffice Calc closer to Excel than Google Sheets in terms of features?
I use libre at home and excel at work. Having worked in various departments of a somewhat regulated industry, I can say that the value of Excel comes from familiarity and 'not having to retrain everyone'. It seems silly to people like me, but I personally experienced a person unable to complete their daily routine because a version change moved a command from one menu to another.
I guess my point is: it is not features. It is humans. MS cracked the code on that one. Get people familiar with their stuff and the rest will follow.
> person unable to complete their daily routine because a version change moved a command from one menu to another.
> I guess my point is: it is not features.
Discoverability of commands, customization, as well as the stability of UI are all features. As would be "full UI compatibility with Excel sans stupid bugs like data conversion"
The main reason people continue using it is due to the network effect (xlsx files are considered shareable) and the fact that people are trained on its features and interface. That does make it valuable, but that doesn't mean it is more usable than competitors.
I don't think it has anything of the kind. "Office" is no longer a must for any home user, and most businesses starting today have a clear choice between (at the very least) Office and gSuite.
I haven't had Excel installed on my work laptop for the current or previous job. Though if I didn't have access to Sheets and had to use Libre by default, I'd petition for Excel in a second...
With CSV files you get a much better experience on Windows when you use "Data -> From Text/CSV" instead of just opening the CSV file. You get a proper preview, can tweak the field types, set decimal separators etc.
This also creates a connection to the CSV file and you can also easily refresh data. With pivot tables this makes it possible to build simple reporting solutions. Just place updated csv files to known location, refresh data and tables.
That's Power Query, which is basically an ETL tool, and with PowerPivot (the xVelocity engine) you basically have a powerful self-service BI coupled with an actual spreadsheet.
Uh, my experience is that the opened CSV file will get all converted - date formats changed, trailing zeroes trimmed, umlauts broken, you name it. So "better" means at maximum "less shitty".
There is a difference between double clicking the CSV file vs importing from within. Importing via Get Data > CSV brings up PowerQuery, which only makes a copy of the source file and allows me to modify the data without affecting the original file. If I make a mistake after transforming, I can go back to PowerQuery (without importing again), undo the step, then adjust the modifications. That is the beauty of PowerQuery. Even in the wizard, it allows me to set the data type of the columns before opening. Users generally skip past that step and get straight to the data.
The problem with importing is that laypeople are not familiar with PowerQuery, and it can be overwhelming for a lot of users.
It didn't make it less shitty. The problem is between the keyboard and the chair. If users took the time to get familiar with the wizard and PowerQuery, a lot of miscorrections would be avoided in the first place.
Oh I see. I definitely don't want to go through all these hoops. If it doesn't open it properly on click, it's only a failure which needs to be corrected by hand - be it with PowerQuery or whatever. And getting familiar with a shitty workaround doesn't make it less shitty or less workaround.
This isn't opening the csv file directly, it's importing it into an existing (though possibly not yet saved) workbook. I can attest that it works correctly though it is kind of annoying if you don't want to do more than just some simple manipulation.
Also I believe excel for mac either doesn't have this feature or it's not as feature rich as the windows counterpart.
Or at the very least turned off by default for CSV.
CSV is a data interchange format. Unlike Excel's XLS/XLSX, it cannot even store information about cell formats, so when you open one and hit "save", all this "feature" does is cause irreversible data loss to the CSV.
There is absolutely no excuse, and never was an excuse, to ever auto-convert a CSV. I always felt like they kept doing this to under-cut CSV in order to force users to use an XLS/XLSX instead. But even with this sabotage Microsoft lost this war and yet continues to destroy data.
It is great I can turn this off, but until the people I'm sending the file to have also done so, it isn't enough. Still a high risk someone in the chain will cause data-loss.
Talking about under-cutting the value of CSV as interchange format... Instead of "comma-separated values", Excel treats CSV as "values separated by locale-specific characters". In continental European locales Excel CSV files are actually semicolon separated, and entirely incompatible with UK or American CSV files, or CSV files in non-Excel software.
I don’t even care that Excel by default will transform “00001234” into “1234”. It seems like a sensible default. But…if only I could figure out how to turn that back to “00001234” for the few times I do not want it. And I don’t mean one instance, as double clicking the cell usually does it. I mean for 10,000 numbers at once.
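If the zeros are already stripped and the column is out of Excel, padding back is a one-liner in, say, Python; the eight-digit width here is just an assumption taken from the "00001234" example. (Inside Excel, a helper column with =TEXT(A1,"00000000") does the same for display.)

```python
# Restore stripped leading zeros for a whole column at once,
# assuming every value was originally 8 digits wide.
mangled = [1234, 98765, 1]

restored = [str(v).zfill(8) for v in mangled]
print(restored)  # ['00001234', '00098765', '00000001']
```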
I've said for years that whenever Microsoft guesses what I want, it's wrong. My context is as a software developer, so I get that I'm not their target demographic with an office suite, but I stand by my point.
By the Nine Divines! I'm still just a spreadsheet initiate, but that auto formatting was steadily becoming my nemesis. I can hardly conceive of the headaches this has caused in people who make serious use of them. It's hard to believe it took this long to implement a feature that seems like it should have been there from the start.
I wouldn't get too excited; it's yet another way Excel can mess up your data without you realizing it. Clippy never died, he just lives silently in Excel, messing with all the Clippy haters.
Great! I've been saying for the last 10 years: all I need is a button in Excel called "I am a power user, don't do anything unless told"
We are getting there :)
It looks like they are rolling it out in stages. I have Excel for Mac on two devices. One has it already, the other one not yet. It's also marked as "BETA". On macOS you can find it in Preferences and then in the "Edit" category.
I'm glad they finally introduced this option. Unfortunately the previous behavior is still the default. If someone sends you data and does not know about this, your data will still be subject to the old conversion rules.
Thanks, I get it. I guess I lost track of the naming after Office 365 arrived, vs Office 2010/2016/2019. Still not sure I can explain the order. Or the Mac versions.
I still remember DNA data being saved and edited inside Excel, and somehow parts of those huge strings of data were randomly wrong because of automatic conversion to arbitrary data formats.
Control+Backspace is a common shortcut for deleting the previous 'word' (the previous block of text to some form of white space).
It's very common in IDEs; even Firefox supports it if you type words in the address bar and press Control+Backspace, and the behavior happens there.
Asking ChatGPT about the origins of this, it points to Control+W originating from Unix Terminals, and how it's been adopted by most IDE's as Control+Backspace.
Microsoft has very poor support for it in their tools, and I use MS products for the bulk of my work.
Moving to non-Microsoft products like Google Sheets, and voilà, Control+Backspace works.
Even muscle memory stuff like Control+Shift and left arrow to select previous words; also not supported by Excel, but it is by Google Sheets and IDE's.
> How is what you want different from pressing delete?
Compact and laptop keyboards: the Delete key is sometimes excluded, and often poorly/inconsistently sized/positioned.
There's a much better chance of Ctrl+Backspace being a consistent movement independent of keyboard layout, so I can appreciate where the parent is coming from.
Those compact layouts that exclude a delete key typically replace it with Fn + Backspace.
I'm not saying there aren't scenarios where ctrl + backspace would be useful, however the majority of the time I'd argue that delete is available and should be used.
Is the desire for a ctrl + backspace chord coming from some other system where this is the standard keying?
I just want paste to work as a human might expect. Normal paste in Excel ... brings color and font information in; Ctrl+Shift+V, which should just be plain text ... does something else? Idk wtf it does, but it pastes strings from further back in the history. At least for me on Mac.
ALT+E, S, V is paste as values.
ALT+E, S, T is paste formats.
ALT+E, S, F is paste formulas.
Those are the ones I use most frequently and these keystrokes are now deeply embedded in my muscle memory.
BTW, I would take slight issue with your expectations of what a human might expect... I actually think the standard CTRL+V paste does what 90% of Excel users expect.
[Edited to add: this is on Windows, didn't read your comment properly about your experience on a Mac. My apologies.]
This works until you have to use Excel localized in languages other than English, since Microsoft thought having localized shortcuts was a good idea.
at least you can rebind these types of keybinds in an external keyboard remapping tool, but in general it's a disgrace that you still can't have convenient remapping
So, not that LibreOffice Calc is as feature-ful as Excel, but AFAIK it gives you options for how to auto-convert data every time you import or copy-and-paste table-like data into it, including not converting at all.
This will help us. Not because we work with genetics; we have far more mundane requirements: occasionally inspecting CSV import files that contain externally relevant ids of the format "00012345678912345". Turning those into "1.2345E13" is no help at all.
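A quick Python sketch of why this particular mangling is irreversible: both the leading zeros and the low-order digits are destroyed, so distinct ids collapse to the same display value. The format string here is only an approximation of Excel's rendering:

```python
raw_id = "00012345678912345"

# Roughly what Excel does: parse as a number, display in scientific
# notation. The leading zeros vanish and the mantissa is truncated.
mangled = "{:.4E}".format(float(raw_id))
print(mangled)  # "1.2346E+13"

# No way back: a different id produces the same display value.
other = "{:.4E}".format(float("00012345678912399"))
print(other == mangled)  # True
```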
Yes, VSCode with an appropriate plugin is IMHO better than excel at this, but some people (e.g. business analysts) will automatically reach for excel and have to be walked through setting up VSCode and the plugin.
Ids are not really numbers, even if they look like them.
One would think. I have yet to find a spreadsheet program that doesn't mimic Excel's behavior for automatic conversion, up to the point of making it non-optional.
It would be nice if Excel had a builtin "double bookkeeping" feature: given a formula, ensure results are also calculated or accounted for in another place.
It would be cumbersome to always have it enabled, so it could certainly be disabled by default. But Microsoft could market the feature with suggestive tooltips.
One example is if I have the formula `=A1+B1` in cell C1, I can go to a separate worksheet and generate a constraint like `=MUST(W1!C:C<1337)`; then Excel would flag any rows where the calculation is false (≥1,337).
Of course, this kind of goes into treating derived cells as constrained types, but it seems sanity is achieved with the easier checks.
Constraints or properties are nice in that they are not unit tests; they could be added at the "moment of instantiation" like an object constructor--but in this case, the violation occurs as a post hoc check: the bad value has to happen first.
You might say, "I always triple-check my models and ensure worksheets are equal in multiple ways." Maybe it's possible to do it already. Sometimes, quality is about introducing frictive utility with minimal overhead.
The problems solved are usually not handled with only an integer primitive, but hand in hand with a domain component that makes us pause and go, "Okay, I guess a person's age won't be MAX_VALUE or negative."
I am no excel wizard (have been using LibreOffice for a decade now) but I guess you can do that already (in its own row)
=IF(EXACT(ValueA;ValueB); ValueA; "Mismatch!")
So you basically just compare two other rows (which you can hide if you like) and if they are exactly the same you display the value, otherwise you display an error.
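For what it's worth, the MUST-style constraint proposed upthread boils down to flagging rows whose derived value fails a predicate. A hypothetical Python sketch (the A+B formula and the <1337 bound are taken from the earlier example, not from anything Excel provides):

```python
# Each dict mimics a spreadsheet row: A and B are inputs, C is derived.
rows = [
    {"A": 100, "B": 200},
    {"A": 1000, "B": 400},
]

def check(rows, derive, constraint):
    """Return the rows whose derived value violates the constraint."""
    return [r for r in rows if not constraint(derive(r))]

# C = A + B, with the constraint C < 1337.
violations = check(rows, lambda r: r["A"] + r["B"], lambda c: c < 1337)
print(violations)  # [{'A': 1000, 'B': 400}] - 1400 >= 1337 gets flagged
```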
The next step may be Don't Repeat Yourself (DRY): even N nearly duplicate formulae for N rows is N-1 more copies than needed.
It's not so bad with one column, but after rearranging a dozen columns and copy-pasting some corrections, the question becomes, "Is everything still working okay, or do I need to skip lunch?"
There are ways around it, like a `CURRENT_ROW()` function. That makes it generic.
Understandably, it can be a hassle to type extra functions all the time. Boilerplate for one-liners isn't fun; the whole point is rapid iteration and prototyping.
Just saying, if a model is important enough to keep around and maintain, put in some pragmatic checks--just like your example.
If I'm doing a lot of moving around of things, and wanting to keep more complicated formulas working, I'll usually just use named ranges for my own sanity (and as a convenience for whoever may need to take over a workbook after me) - eg, ```=SUMIF(TransactionProductSold,"Product1",TransactionAmount)```
I wish someone built a tool to extract any model from Excel and help annotation and conversion to code with clean separation of input, logic, output. The amount of creepy legacy Excel is killing my organization.
Yes I recognize this problem in my organization as well. I think this is feasible, and tools in this direction exist already, like https://formulas.readthedocs.io/en/stable/doc.html. I think one challenge is that the variable names in Excel (B3, B2-10 for a list) are not easily converted to descriptive names.
>I think one challenge is that the variable names in Excel (B3, B2-10 for a list) are not easily converted to descriptive names.
It takes some getting used to, but you can pretty easily create a named range for an individual cell by modifying the value immediately left of the formula bar. You can also set up a table to hold data (insert -> table).
Tables can be renamed and allow formulas like =sum(tbl_salaries[salary]).
With named ranges, your formula can look like =purchase_price*sales_tax
Thank you! I was hoping for a reply like this. The naming is actually (in big corp) a non-problem. I even think many finance workers dislike these models as well, so this isn’t even a bad chore. The models we run have tops a few hundred inputs, many ranged. That’s an hour or two of puzzling.
I know it's a fad, but isn't that the exact kind of thing that AI-enthusiasts claim ChatGPT/LLaMA/Copilot will be capable of doing within a year or two?
You think Excel is bad, just think of the legacy those will leave behind in 20 years... Anything cloud or SaaS-related, good luck trying to dust that off down the line!
I mean, some are claiming that AI will be capable of anything. But I don't think extracting Excel formulas is currently a focus of LLM applications? Do you know of startups or other attempts at exploring this?
Power Apps is pushing in the AI direction. And it does use AI to parse Excel files. Moreover, Power Apps itself has the Power Fx engine, which uses Excel formulas for apps + more.
Yes, it could just be "Hi there, you're using a formula depending on another formula. Do you want to add a check on these rows?"
If the user elects to add a check, then expand on constraints and such.
If the user selects no, remind the next user that "inadvertent modifications could result in indeterminate results."
Eventually, someone receiving the attachment enables it, and a discussion starts for the group as a whole.
One camp may deride the change: it will never be useful, and data is always in bounds. The other may point out some assumptions that were unclear, and now a check exists. Adding it cost nearly nothing, but coverage reduces chances of regression.
I work technical support at a startup where we have a self-service Uploader for some data imports. Can't count how many hours and emails I spent going back and forth to tell people how to turn auto conversion to scientific notation off. Long overdue feature/setting for them to release.
I hate when excel converts UPCs to decimal and no amount of formatting will fix it, other than to insert a colon before the UPC, which is apparently the official answer on what to do
I think the "oh, finally" here is somewhat overblown. It would be much more annoying to lots of people if Excel did not recognize a number as a number, or a date as a date. This convenient conversion feature could always be overridden, if I'm not mistaken, by either pre-formatting cells as text (when you're entering manually), or designating a column as text (when importing, and converting from, a CSV file).
One could criticise that instead of 1. trying to determine the type of a column in the CSV, then 2. treating all values of the column as instances of that type, Excel goes through row by row and decides ad hoc which values to auto-convert. That might have been done because people don't just store relations in CSV, but use it as a lingua franca format to move things between applications.
The designers of Excel were not idiots, and tried to build a tool usable by the average user.
It’s really sad that we all basically have to design code and data fields around the behavior of tools like Excel. R will do the same thing for numeric fields (strip leading 0s), so it’s not just Excel here. But in the R case, you rarely read/write to the same file, so it’s less of an issue.
Then again, these issues have been known for decades, so a lot of things like your best practice are around for a reason…
More like we can’t name things (eg. gene names) the way scientists want because of needing to work with the data in Excel. So instead of OCT4 we get a different name because Excel mangled the name into a date for decades.
Excel is still the easiest way to look at tabular data, even if it isn’t part of the production workflow. And sadly, even if you save the file as txt, Excel would always mangle certain fields.
So yes… users have been working around Excel-isms for years.