I've learned products designed to replace spreadsheets have a huge hurdle because the people who use spreadsheets treat operating the sheet as their job. Replacing them removes their autonomy and control over an information process, and subsumes the value they bring to their employers - so they will resist products that threaten that. Excel is a complete management subculture.
The other advice I give: if you are generating analytics, have a PowerBI connector of some kind, because the people who make decisions (managers, etc.) make them based on PowerBI, not from an interface that their staff use as peers and likely control. In enterprise, they want data in metrics their staff can't see, hence a separate tool.
Spreadsheets will always be with us, I think. The opportunity may be in creating one that has sufficient work-alike features with legacy ones, plus new power features (Python, etc.), with a connector between the high-power open development environment and the familiar Excel one managers use. The key thing is not asking managers or senior employees to change.
It's not about autonomy in an abstract sense. It's the fact that regular people can customize, hack, modify, automate, and otherwise program the spreadsheet. Data analysts and managers are more than trained monkeys; they actually need to do those kinds of things (at least sometimes) in order to do their jobs.
I agree that the ideal world is one where you can connect the spreadsheets to other data sources, so you get the best of both.
This is a double-edged sword, because oftentimes there is no one validating the results of the spreadsheets...
"The number looks ok" is not a good validation, and there have been some very public data errors as a result of bad spreadsheets.
I have often wondered what would happen if even 10% of the spreadsheets in the average business were actually audited... I suspect the results would be rather shocking.
I'm not sure if I should be proud of this or not, but once I had some pretty critical field values in an Excel spreadsheet that I had to make sure were compatible with my code. I ended up adding some derived formulas in a new locked sheet within the file, committed the XLSX to Git, and then created a JUnit test to make sure everything was in sync. In a nutshell, the XLSX became the source of truth.
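For anyone curious what that kind of sync check can look like, here is a minimal sketch. It is not the original JUnit test: it is written in Python, it reads a CSV export rather than the XLSX itself so that it needs only the standard library, and the field names, limits, and helper functions are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the committed spreadsheet. The story above used an
# XLSX file and JUnit; a CSV export keeps this sketch standard-library only.
SHEET_EXPORT = """field_name,max_length
customer_id,12
order_ref,20
status_code,3
"""

# Hypothetical constants that the application code relies on.
CODE_FIELD_LIMITS = {"customer_id": 12, "order_ref": 20, "status_code": 3}


def load_sheet_limits(text):
    """Parse the exported sheet into {field_name: max_length}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["field_name"]: int(row["max_length"]) for row in reader}


def find_mismatches(sheet_limits, code_limits):
    """Return the sorted field names that differ or exist on only one side."""
    names = set(sheet_limits) | set(code_limits)
    return sorted(n for n in names if sheet_limits.get(n) != code_limits.get(n))


# The check itself: fail loudly if the sheet and the code have drifted apart.
mismatches = find_mismatches(load_sheet_limits(SHEET_EXPORT), CODE_FIELD_LIMITS)
assert mismatches == [], f"sheet and code disagree on: {mismatches}"
```

Run on every commit, a check like this makes the spreadsheet the reference and the code the thing that has to keep up.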
Yup, and this is frequently a total disaster. Like, hundreds of thousands of dollars get lost either in spreadsheet errors or by paying the salaries of people who manage spreadsheets when their jobs could probably be replaced by a much better solution.
But, whatever - more work for everyone! Certainly suits me. It's like wanting the average person to keep programming C because you're in infosec.
Indeed, and I think this is the real problem with non-programmer/data people working with spreadsheets. The problem isn't the spreadsheet interface itself. The problem is that people are doing what amounts to engineering without any training in engineering and no sense of best practices.
> I've learned products designed to replace spreadsheets have a huge hurdle because the people who use spreadsheets treat operating the sheet as their job. Replacing them removes their autonomy and control over an information process, and subsumes the value they bring to their employers - so they will resist products that threaten that. Excel is a complete management subculture.
Spreadsheets are better because, as you say, it is the owner's job to maintain them. If you replace one with an IT process, the new "owner" is likely a below-average developer who is probably uninterested in the business. A few years down the road the usefulness of the replacement will suffer.
> Spreadsheets are better because, as you say, it is the owner's job to maintain them. If you replace one with an IT process, the new "owner" is likely a below-average developer who is probably uninterested in the business.
I disagree. Spreadsheets are incomparably worse, because what you charitably described as "it is the owner's job to maintain them" is reflected in real life as a single employee who abused a first-mover advantage to monopolize, exert undue control over, and even hijack key operational areas.
We all heard horror stories of how employees screwed over their former bosses because only they had control over things like key spreadsheets. Advocating for spreadsheets is advocating for these vulnerabilities.
I have heard these horror stories so many times but never witnessed them in reality. From my point of view, excel is a great way to ensure knowledge is not hidden, as you have a file format that embeds calculations, data and outputs. It can be ugly, but nothing that a seasoned excel warrior cannot parse and there’s plenty of them around. Now if someone as an employer does not even have copies of the files, they have bigger problems than Excel itself.
The other horror stories, of errors in spreadsheets: those, yes, I have witnessed regularly.
> Advocating for spreadsheets is advocating for these vulnerabilities.
Spreadsheets will exist regardless of what developers think of them. Ironically, that's a good thing.
Many projects that put food on devs' tables started out as out-of-control Excel monstrosities that were created and operated for long spans of time by well-meaning and productive folks. They start as simple manual spreadsheets with some formulas and then evolve into much more involved beasts. Work gets done and it's all nicely contained in somebody's cube and they look good and can be rightfully proud of their accomplishment.
Things just get done. For a while. Sometimes a LONG while. Until the bitter realities that software developers have learned to deal with over the decades start to seep into these projects and drown the unwitting folks who created them, slowly but surely, like an ever-increasing number of small holes in the bottom of a boat. That's when things break or become unmanageable and that's when developers start getting engaged-- assuming these Excel masterpieces have actually become mission-critical.
There's a guy at work that operates one of these excel monstrosities. It's been going for ~7 years now. It's a monster excel spreadsheet that, among an ever-growing list of things, does dubious probabilistic forecasting of future PO's based on shit ripped from salesforce (not even using the api). He has a dedicated laptop behind him pulling in data from multiple sources, like clockwork, and has recently started making attractive Power-BI dashboards using his excel worksheets as the data sources. And you know what? He looks GOOD to the people that matter. Does the forecasting actually work? Not really, but being so immersed in all that data has made him knowledgeable about many details of the operation. He's able to keep track of costs and stay on top of things. It doesn't matter (to him) that the whole thing will vaporize when he leaves, or that he could tire of it and just foist it upon some hapless supply-chain person who's just learned to use formulas in excel.
Most forecasting doesn't really work, as in, most falls between "not useful for predicting the future" and "outright wrong." You could get your forecasts for free or you could pay millions of dollars, pretty much same result. Not an Excel thing in any way.
Indeed, the spreadsheet I am thinking of is particularly hot garbage, but slap some dorky corporate bar charts on there and it's like beer goggles for suits.
But why are you associating that with Excel? Every analytics tool features charts front and center regardless of how it is delivered, web, BI, dashboard, Jupyter, etc, and no matter what language it is written in (R: ggplot, Py: matplotlib, JS: highcharts etc.).
The guy's spreadsheet seems to work. He's delivering what his bosses want to see. You might have an issue with the final output but they apparently don't. What exactly is the problem you think you can fix?
I am on the side of excel being used like this, even if it's hot-garbage. The worst that can happen is that it collapses upon itself and then others need to come in and do it right, or migrate the thing to something else entirely.
A counterpoint to this is that spreadsheets bottleneck the sharing of data and introduce data cleaning issues.
Most spreadsheets are built with the mindset that it is the end of the dataflow. However, at some point, this data needs to be shared forward. This might not be the original intention, but the more important the report is, the more important downstream use-cases become.
This is when spreadsheets become problematic. One can say that it's the owner's job to keep it compatible, but thinking of keeping it compatible isn't what normal spreadsheet users do.
Some issues I've seen:
* One can easily add or rename a column. This breaks data sharing, because downstream reports now break. (In many cases, the data could have been added as a row instead of a column: a new status code, etc.)
* Data types are not enforced. Nothing prevents someone from entering text into what should be a number, or even creating a completely new status code. Again, automation downstream breaks.
* Important info is usually not included. The spreadsheet is the latest representation of the data, so in many cases, attributes like the time-period (because it's implied) and unique identifiers (skus most frequently) aren't included.
* Maintaining compatible dimensions across different domains is not a priority for a spreadsheet owner. Finance may group countries differently than Supply-chain, which means they'll always see different numbers and argue that their number is correct.
Source: work at a Fortune 500 company, that has way too many excel reports (with critical performance metrics) and combining them to get an accurate view of the company performance is very labor-intensive and error-prone.
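Several of the issues listed above (renamed columns, unenforced types, invented status codes, missing periods and SKUs) can be caught mechanically before a report is passed downstream. Below is a minimal sketch of such a check on a CSV export. The column names, the allowed status codes, and the `validate_export` helper are assumptions made up for illustration, not anything from a real reporting pipeline.

```python
import csv
import io

# Hypothetical contract that downstream reports depend on.
REQUIRED_COLUMNS = ["sku", "period", "status", "units"]
ALLOWED_STATUS = {"OPEN", "SHIPPED", "CLOSED"}


def validate_export(text):
    """Return a list of human-readable problems found in a CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in header]
    if problems:
        return problems  # row checks are meaningless without the expected columns
    for row_no, row in enumerate(reader, start=1):
        # Short rows yield None for missing cells, so normalise to "" first.
        sku = (row["sku"] or "").strip()
        period = (row["period"] or "").strip()
        status = (row["status"] or "").strip()
        units = (row["units"] or "").strip()
        if not sku:
            problems.append(f"row {row_no}: empty sku")
        if not period:
            problems.append(f"row {row_no}: empty period")
        if status not in ALLOWED_STATUS:
            problems.append(f"row {row_no}: unknown status {status!r}")
        try:
            float(units)
        except ValueError:
            problems.append(f"row {row_no}: units is not a number: {units!r}")
    return problems


good = "sku,period,status,units\nA1,2021-03,OPEN,10\n"
renamed = "sku,month,status,units\nA1,2021-03,OPEN,10\n"
bad = "sku,period,status,units\nA1,2021-03,PENDING,ten\n"

for name, export in [("good", good), ("renamed", renamed), ("bad", bad)]:
    print(name, validate_export(export))
```

This does nothing for the fourth issue (Finance and Supply-chain grouping countries differently); that needs an agreed set of dimensions rather than a script.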
Excel can be replaced, but it won't be replaced by the current crop of VC-funded SaaS.
> Airtable really ought to be killing Excel, but the SaaS model combined with a stupidly low artificial row count limit (over 50000 rows is listed as "contact us for pricing") means that it will never achieve penetration into weird and wonderful use cases like Excel has.
Many Excel processes are 20+ years old. No SaaS could replace the stability and pricing.
The ability to email a spreadsheet and have it just work on the other side (and become the recipient's copy, which they can fork and "own") is huge. In a SaaS situation, you have to solve IAM and security, versus leveraging what users already have as a sunk cost in email and Windows.
When we think of document based workflows as a problem (vs. say just the data/info), we tend to think of them as inefficient and prone to duplication, forking, editing, versioning problems - but I'd argue these are valuable features because they create levers for managing. Maybe I've spent too much time staring into the enterprise abyss and this is the inner deadness of a consultant speaking, but what documents facilitate (e.g. MS Office) is flexibility of ownership, provenance, authenticity, sources of truth, authority, and other qualities.
When you solve a problem, it becomes inert. There is nothing about it to manage anymore, which means someone can't extract value from it, and that's value destruction to them. SaaS problematizes these document features and then "solves" them, which in fact just constrains managers by concretizing data and workflows. A document, by contrast, is a tool that provides some data in support of a narrative conversation, without being a forcing function on the dynamic of ongoing "problems" that is producing value for the business.
I'd suggest this is the quiet part your SaaS prospect customers can't say out loud, because managing isn't solving problems, it's extracting value from them, and using tech to collapse dynamics that are producing value is anti-value from that perspective.
Another way to see the same problem is that the IT system replacing the spreadsheet will take a long time to build and will replicate the process in the narrowest possible way. Then something changes, and updating the system becomes an uphill battle that takes years of fighting for budget and prioritisation, so you have to revert to spreadsheets.
Plus the gap between a spreadsheet and an application can be huge in a large organization. No one needs a permit-to-build process or a CI/CD pipeline to start a new spreadsheet and make changes to it.
> I've learned products designed to replace spreadsheets have a huge hurdle because the people who use spreadsheets treat operating the sheet as their job.
In general? Not really. Just as frequently, the fancy tool promoters don't care to understand the subtleties of the job and when it requires flexibility or judgment that the spreadsheet accommodates better. They have their hammer -- software formally engineered by software experts for disempowered "users" -- and everything looks like a nail.
"It's faster and more reliable (when everything goes as planned)" isn't really the slam dunk these folks think it is.
Give these users a more flexible tool like Alteryx, that actually lets them do their job, and I've seen that they'll happily migrate off of Excel.
I agree. One of the great things about Excel is that it’s massively flexible. It’s almost the antithesis of what most development environments want to be. Programming is about considering all the code paths. Spreadsheets are about “what if”.
The flip side of this is that understanding the behaviour of a spreadsheet is generally a specialist job, which is why we have people whose job it is to “run” the spreadsheet. The spreadsheet has rules and boundaries and it will stop working if you just start plugging random values into formula cells.
I would add that any tool looking to replace Excel will have to be built on some very powerful but very primitive foundations, in order to compete with that flexibility. It’ll never be about adding a special “view” like AirTable or just tacking on Python.
Excel is like Emacs; most users will write some Elisp at some point, it’s designed to be meddled with from the ground up. AirTable is the VSCode; most users will never write a line of plugin code and when you do you’ll find you can’t extend much.
> the people who use spreadsheets treat operating the sheet as their job
Yes, observed the same. And "The new tool is faster and more reliable!" does not help either. They have their workflow and have coped with it for years.
The only time I saw people adopt new tools (in banking) was when they could deliver their assignments to their superiors much faster. The personal advantage must be visible.
Spreadsheets are malleable by design. Being able to modify how it functions on the fly is an enormous power for these users. And the better/faster/smarter SaaS replacement is also extremely rigid and inflexible. So while it works to replace the current version of their god spreadsheet, it can’t adjust on the fly as the user’s needs change like the spreadsheet can.
I’m convinced that there is a “better spreadsheet” that hasn’t been built yet, one that treats power users as exactly that and incorporates things from the software engineering world like version control, modularity, reusability, sharability, etc.
I’m trying my best to create such a product. What you’re describing is something that (1) amateur spreadsheeters can use, (2) the power users, and (3) programmers, will be able to use and not feel out of their depth or patronized. I personally think that given that Excel is a pure-ish declarative language, then building on a declarative language that programmers use is a solid attack on this, which is why I’ve chosen pure functional programming as a basis for my work. It’s been an enjoyable design process to include or throw out ideas that would alienate either end of the spectrum.
Elsewhere: Making all cells CAS gives you a strong foundation for version control, modularity, resizability and share-ability.
Happy to share more thoughts on this. I’m two years into the process.
I realized I left that open-ended, so I’ll touch on a few points that I think are relevant here:
* There’s a language which is like Haskell/Elm/PureScript in terms of being purely functional and statically typed. But with syntax that looks more like Excel.
* Purity gets us fearless recalculation.
* Static types let us build UI elements automatically based on the inferred types of code.
* It’s content-addressable like Unison. That means every expression and “cell” has a unique SHA512 hash of it which refers to only that expression.
* Content addressability makes cache invalidation of results trivial.
* It also makes it easy to say “I want exactly this version of that person’s cell and for all time.” Makes it impossible to break someone else’s code once it’s working.
* It also lets you fearlessly federate, if ever needed.
* Content addressed also means you can write tests against code and have them run on every change. Only the tests whose dependencies changed will be rerun. That’s not normal in Python or Haskell, but in a spreadsheet it is.
There are other design choices related to your comment but I don’t want to ramble on.
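To make the content-addressing points above concrete, here is a toy sketch. It is not the design described in the comment, nor Unison's: it just hashes a textual form of each expression with SHA-512, and the `Store` class, the `const`/`add`/`mul` operations, and the evaluation counter are all made up for illustration.

```python
import hashlib

# Hypothetical operation table for this toy example.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


class Store:
    """Toy content-addressed cell store: a cell's key is a hash of its expression."""

    def __init__(self):
        self.cells = {}        # hash -> (op, args)
        self.cache = {}        # hash -> computed value
        self.evaluations = 0   # how many cells were actually computed

    def put(self, op, *args):
        """Store an expression. For 'const', args holds a literal; otherwise hashes of other cells."""
        key = hashlib.sha512(repr((op, args)).encode("utf-8")).hexdigest()
        self.cells[key] = (op, args)
        return key

    def value(self, key):
        """Evaluate a cell, reusing cached results for any hash seen before."""
        if key not in self.cache:
            op, args = self.cells[key]
            if op == "const":
                result = args[0]
            else:
                result = OPS[op](*(self.value(a) for a in args))
            self.cache[key] = result
            self.evaluations += 1
        return self.cache[key]


store = Store()
price = store.put("const", 20)
qty = store.put("const", 3)
total = store.put("mul", price, qty)
assert store.value(total) == 60

# "Editing" the price creates a new cell with a new hash. The old total still
# refers to exactly the old price, so nothing that depended on it can break.
new_price = store.put("const", 25)
new_total = store.put("mul", new_price, qty)
assert store.value(new_total) == 75
assert store.value(total) == 60
assert new_total != total
```

Because a dependent's hash includes the hashes of its inputs, a change propagates as new keys rather than as invalidation: `qty` is never recomputed, and anything keyed by an unchanged hash (including a test result) can be reused as-is.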
Truly Excel could be improved in a vast number of ways. But would anyone use it? Excel is a kind of product that is perfectly understandable to the average business person.
The more complicated you make it, the more it becomes like real software development. And that is a skill that most people don’t possess.
Excel is ridiculously complicated. I think you're vastly underestimating the ability of these "average business people" because they don't know how "real software development" works.
I think there is scope to replace Excel, but it’s hard. You’re seeing a gradual spread of “data science” tools in finance, as the more technical analysts start to use Python over Excel. But as you say, the management layer still uses Excel to poke and prod a model.
I think that there will be a generational shift here - you are not going to train an SVP to use Python, but the next generation of SVPs might have more exposure and be willing to use Numpy in a Jupyter notebook.
And in the other direction - there is definitely scope to come up with an “Excel isomorphic” Python framework for data science. It’s fairly easy to generate an Excel sheet from Python computations, but maintaining bi-directionality is Hard, and would require restrictions on the Excel side. I think with the right UI, you could do this though.
I have a colleague who is a PhD in Applied Math. Pretty bright guy, huge Python/Jupyter lover with many years of experience. Loves to use git, loves to write dozens of unit tests. A true Man of the Future, according to Excel haters.
He wrote some Python to solve some mildly complex business problem. I told him to translate it into Excel for stakeholders. He did, and the answer came out completely different.
It turned out he had made multiple catastrophic errors in the Python. This is not the first, second, or third time this has happened.
Python, and tech-beyond-Excel in general, just isn't the silver bullet software types often seem to think it is. Even experts sometimes seem to do a worse job in it than in Excel.
I love both spreadsheets and Python/Jupyter. But Excel isn't the silver bullet either. The reason the error was caught (and the real lesson) has more to do with him trying to reproduce his work than with a particular technology stack.
I think Excel has something to do with it. In Excel, you're usually forced to do computations in small steps and look at the intermediate results. In traditional coding, you don't see anything but the final result (unless you ask to see it). It's much easier to assume everything is working as intended when it's not.
Excel has its problems too. Different tools for different jobs. Tech boosters need to understand this and not just cynically assume that spreadsheet lovers are old fogeys who are afraid of their jobs being automated away.
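As a rough illustration of the small-steps point, here is one way to get spreadsheet-like visibility in code: run the computation as named steps and keep every intermediate result, the way a sheet keeps every column. The `run_steps` helper, the step names, and the numbers are invented for the example.

```python
def run_steps(inputs, steps):
    """Apply named steps in order, keeping every intermediate result."""
    trace = [("input", inputs)]
    current = inputs
    for name, fn in steps:
        current = fn(current)
        trace.append((name, current))
    return trace


# Hypothetical calculation: subtract a flat return allowance, add 20% tax, total.
steps = [
    ("net_of_returns", lambda xs: [x - 5 for x in xs]),
    ("with_tax", lambda xs: [x * 120 // 100 for x in xs]),
    ("total", sum),
]

trace = run_steps([100, 250, 40], steps)
for name, value in trace:
    print(f"{name:>15}: {value}")
```

The difference from Excel is that this is opt-in: nothing stops you from collapsing the steps into one expression and only ever looking at the total.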
Certainly if you do traditional TDD or something like REPL-driven development, you see the intermediate results and validate the correctness of your code as you go.
Those are great! But they are coding disciplines one must choose to follow, and continue following every step of the way. You aren't inherently following them because that's how the tool works. Sometimes the tool-enforced discipline has advantages.
How would one write tests for a complex, mission-critical Excel spreadsheet? Or use version control?
Spreadsheets often mix the data and code/formulas and the formulas are hidden behind the sheet view and sprinkled across many cells. At least Python scripts separate the code from the data so you can write tests using known good or fuzzing data. And you can use version control to track and review code changes.
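A small sketch of what that separation buys you: the calculation lives in a function, and the test data lives outside it, either as known-good cases or as random ("fuzz") inputs checked against properties. The `reorder_quantity` function and its cases are hypothetical, chosen only to keep the example short.

```python
import random


def reorder_quantity(on_hand, daily_demand, lead_time_days):
    """Units to order so stock covers demand during the lead time (never negative)."""
    needed = daily_demand * lead_time_days
    return max(0, needed - on_hand)


# Known-good cases, worked out by hand: ((on_hand, demand, lead_time), expected).
KNOWN_GOOD = [
    ((50, 10, 7), 20),
    ((100, 10, 7), 0),
    ((0, 3, 4), 12),
]
for args, expected in KNOWN_GOOD:
    assert reorder_quantity(*args) == expected

# Fuzz test: random inputs, checked against properties rather than exact values.
rng = random.Random(0)  # fixed seed so a failure can be reproduced
for _ in range(1000):
    on_hand = rng.randint(0, 1000)
    demand = rng.randint(0, 50)
    lead = rng.randint(0, 30)
    qty = reorder_quantity(on_hand, demand, lead)
    assert qty >= 0
    assert on_hand + qty >= demand * lead  # stock always covers the lead time
```

Since the function is plain text in one file, a change to it also shows up as a readable diff in version control, which is much harder to get from formulas scattered across cells.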
I agree that, in the hands of a hypothetical ideal person, Python should be better than Excel for almost everything. But as my story above shows, even people you'd expect to be experts, seemingly following best practice, don't end up being very close to this ideal person.
Honestly, your story just indicates that someone senior needs to have a discussion with the applied mathematician about their coding practices (and they may even need a bit of training in that regard).
His code style is fine. His problem is that he is short on the discipline and focus to validate his results in meaningful ways. An exhortation to follow some vague "coding practices" won't fix that - he'll respond e.g. by writing lots of tests, none of which catch the problem his code actually has.
The two ways I know to get correct work out of him are (1) review it and kick it back to him when problems are found, (2) have him implement what he's trying to do in Excel.
So I was mainly a Haskell programmer in grad school before getting into scientific computing, and the main good habit I picked up was working at the REPL and breaking my code up into small, testable functions/components, and thinking very hard about state. The main bad habit I had was expecting the Julia compiler to be as helpful as GHC when refactoring code.
While I was writing my thesis and job hunting, I attended a few workshops aimed at “grad students breaking into industry”. The main thing I noticed from applied mathematics students in particular was they would write out long functions (really hard to debug) or they worked exclusively in Jupyter notebooks (these have super complicated state, so it takes a lot of discipline to be able to translate these into usable code).
Sometimes it's hard to think of the right tests, especially when you're solving a mathematical model that hasn't been solved before.
Even for things that are intuitive and have been implemented thousands of times before, like web logins and shopping carts, where the tests one should do are not hard to think of... even so, software engineers rarely develop tests that catch all possible bugs on the first try.
Replacing excel is like replacing paper. People always talk about formulas, but most spreadsheets don’t have any. They’re todo lists, or more commonly, static database exports.
Or they have formulas, but the author calculates a few numbers and throws it away. In this case they’re like a scratch pad, or a calculator with a visible memory.
I think it’s certainly possible to make better business modeling tools, and I have played with some designs in that space. But they’ll never be Excel. And that’s ok.