I’m intrigued as well. My experience is that notebooks struggle as a format for production code. We encourage people who work heavily in notebooks to use them for exploratory work, but to choose other tools when it comes time to ship.
When you are exploring, experimenting, or showing something, it’s great: a train-of-thought structure, and APIs like Pandas that are optimised for writing and terseness.
But when you have a piece of code that will lose a million dollars a minute if someone ships a bug, and which will be maintained by many engineers over many years, then you really want a format that’s optimised for long-term maintenance, incremental change, testability, and APIs optimised for readers.
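To make that contrast concrete, here is a minimal sketch in plain Python. The order data and the `total_by_customer` function are made up for illustration. The only point is that logic pulled out of a cell into a named function gets a contract that can be tested and reviewed on its own.

```python
# Made-up order data: (customer, amount) pairs.
orders = [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)]

# Notebook style: quick, inline, and fine for a first look at the data.
totals = {}
for name, amount in orders:
    totals[name] = totals.get(name, 0.0) + amount


# Shippable style: the same logic as a named function with a stated contract.
def total_by_customer(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Sum amounts per customer, rejecting negative amounts."""
    result: dict[str, float] = {}
    for name, amount in rows:
        if amount < 0:
            raise ValueError(f"negative amount for {name!r}: {amount}")
        result[name] = result.get(name, 0.0) + amount
    return result


# A check that can live in a test suite and run on every change.
assert total_by_customer(orders) == totals == {"alice": 37.5, "bob": 12.5}
```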
I write production code, and I also work a lot in Jupyter notebooks.
Personally, I think the fact that notebooks are usually easier and more fun for me to work with is a big problem. I'm by no means a Clojure expert, but I did do a semi-large project in Clojure a few years ago, and some of the ideas of true REPL-driven development that exist there are things I wish Python supported.
It's hard to explain without actually learning it for real (and most Python devs mistakenly think Python has REPL-driven development; I sure did before learning Clojure!). But once you get used to being able to interact with your actual source code, and at any point just being able to write new code and immediately print out its value, then with one shortcut make it part of the regular codebase... that just blurs the distinction that exists between Jupyter Notebooks and production code in a way that makes everything much better.
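As a rough illustration of the gap (this is not how Clojure does it), the closest thing stock Python offers is `importlib.reload`. The sketch below writes a throwaway module, edits it while the session is still running, and reloads it. The names `scratch_mod` and `price` are hypothetical and exist only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so the edited source is always recompiled on reload.
sys.dont_write_bytecode = True

# Write a throwaway module to a temp directory and make it importable.
workdir = Path(tempfile.mkdtemp())
source = workdir / "scratch_mod.py"
source.write_text("def price(qty):\n    return qty * 10\n")
sys.path.insert(0, str(workdir))

import scratch_mod
from scratch_mod import price as old_price  # a reference taken before the edit

before = scratch_mod.price(3)

# Simulate editing the file in your editor while the session stays alive.
source.write_text("def price(qty):\n    return qty * 12\n")
importlib.reload(scratch_mod)

after = scratch_mod.price(3)  # looked up through the module: new code
stale = old_price(3)          # the old function object: still the old code

print(before, after, stale)   # prints: 30 36 30
```

The reload re-runs the module, but any reference taken earlier, such as `old_price`, keeps pointing at the old function. That is one reason a live Python session drifts away from the source files.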
I'd love to hear more about this REPL-driven development. I've heard people bring it up from time to time, but it's clearly very different from the typical "stateless horizontal micro-service" that has become common practice.
What tools are used for "write new code and immediately print out its value, then with one shortcut make it part of the regular codebase" and how does that square with working on a team and getting code reviewed?
My book, Effective Pandas 2, has many of them. There are a few conference talks of mine floating around on YouTube that also mention some.
Writing clean data code is one aspect. Filling in knowledge gaps is another. Covertly teaching software engineering best practices to folks who "aren't programmers" yet sit down and write code in Jupyter all day is another.
I should probably write a blog post or record a short video. (I just taught a week-long course on this for a client last week.)
Do you have this written down or on video anywhere? I'd love to learn more about what you mean.