For anyone unfamiliar with the title's origin, it is a play on words of a 1959 Richard Feynman lecture, "There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics".
And Feynman's title was a play on Daniel Webster's comment "There's plenty of room at the top" when told that he shouldn't become a lawyer because the field was too crowded.
Anecdotal - One of my favorite YouTube live-streaming personalities, Jon Gjengset[1], is a Ph.D. student at MIT CSAIL. His talks are very detailed[4] and captivating[3].
I always interpret the end of Moore's Law to mean that what was once a curve of exponential growth has passed its inflection point, and now we're on the other side of the S-curve. Yes, there are still gains to be had, but they're harder and harder to get in ways that aren't ameliorated by simply increasing funding. There need to be changes in how we compute, and a rejection of the idea that specialization is beaten out by the economies of scale of generic solutions.
If you want something faster and more energy-efficient for any particular task, you do want an ASIC. You can try to tell physics that you'd prefer something else but physics is not a good listener.
> Unfortunately, Feynman’s “room at the bottom” is no longer plentiful.
There's plenty of room at the bottom. Another million times the transistors is hardly implausible; a thousand times is practically a given.
> Although other manufacturers continued to miniaturize—for example, with the Samsung Exynos 9825 (8) and the Apple A13 Bionic (9)—they also failed to meet the Moore cadence.
This just factually isn't fair; transistor density has been on an unwavering straight line (on a log scale) since 1970.
Imagine being a CPU architect and being told to find a good use for yet another billion transistors. It is hard to get much benefit from the N+1th multiplier when the ones you have are already starved for input, so it's not hard to see why now we mainly get more cores and cache. Cache is good because most of it is idle at every moment, so doesn't produce much heat. Cores are easy to count, and people will pay for them even if whenever they all get busy, you have to slow the clock after a moment. Often a moment is enough.
Productive uses for a billion transistors that can be mostly turned off most of the time are hard to find.
I feel like more and more transistors will be dedicated to machine learning tasks. You'll eventually have phones and computers constantly running neural networks to accomplish advanced but relatively mundane tasks (e.g. text-to-speech, speech-to-text, facial recognition, human-seeming AI in videogames, neural rendering (see NeRF from Berkeley), and tons of other random stuff).
All of it is parallel and benefits from a sheer increase in transistors and density. Maybe in a few generations we'll see chips that are 99% ML compute and 1% traditional CPU.
In those chips, I think practically all the transistors have to be active all the time, so the density must be quite limited unless you have really serious cooling capacity.
I feel like there's a ton of unexplored area in the hardware space for extremely low-power circuits used for ML. As we know, you can do ML with very low precision, which is why chips that can do fast INT8 and FP16 are getting more popular.
However, these lower power chips are still designed like classical computers and are deterministic. What if you had something like a few thousand analog multipliers in parallel computing matrix products? An analog multiply can use far less power than a digital one at the expense of accuracy and determinism, which might not matter for some ML tasks.
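As a toy illustration of that tradeoff, here is a minimal sketch (plain NumPy, not real analog hardware, with made-up noise levels and matrix shapes) that models each analog multiply as an exact multiply plus a couple of percent of Gaussian noise, then measures how much a matrix product degrades:

```python
# Simulate an "analog" matrix multiply: every elementwise product picks up
# ~2% relative Gaussian noise before accumulation. All numbers here are
# illustrative assumptions, not measurements of any real circuit.
import numpy as np

rng = np.random.default_rng(0)

def noisy_matmul(A, B, rel_noise=0.02):
    # A: (m, k), B: (k, n) -> elementwise products shaped (m, k, n)
    prods = A[:, :, None] * B[None, :, :]
    prods *= 1.0 + rel_noise * rng.standard_normal(prods.shape)
    return prods.sum(axis=1)  # accumulate over the shared dimension

A = rng.standard_normal((64, 128))
B = rng.standard_normal((128, 32))

exact = A @ B
noisy = noisy_matmul(A, B)
rel_err = np.linalg.norm(noisy - exact) / np.linalg.norm(exact)
print(f"relative error of the noisy product: {rel_err:.3%}")
```

For workloads that already tolerate INT8 quantization, errors on this order may well be acceptable; whether real analog circuits behave like additive Gaussian noise is, of course, another question.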
To be fair, it's not necessarily clear what to do with those transistors. When people think of Moore's Law, they usually think of the whole bundle: CMOS scaling, Dennard scaling, faster clock speeds, etc.
Power may soon become the bottleneck, as it's not clear that energy per op is going down as fast as transistor counts are going up.
Power is the bottleneck right now. We have 16-core processors where each core is individually capable of 4.7GHz. But we can't run all of the cores simultaneously at those speeds because we have trouble with power delivery, and especially with dealing with the waste heat that comes from using that much power.
The only way we can really use all that transistor density today is to artificially limit clock speeds to conserve power.
I doubt we should read too much into the difference between 1.8x and 2x. If there's a long-term trend, it'll take more than one data point to see it.
TSMC 3nm does sound less ambitious, but it's also supposedly coming early, and it sounds way too early to call for sure, at least for anyone not under NDA.
I was always jealous of the stories about programmers in the 80s who wrote games in insanely small amounts of RAM. I kind of want to have that same kind of challenge to overcome in my lifetime.
Write apps for Garmin watches, especially supporting ones a generation or two old and on the lower end. Welcome to the world of removing code to shave a couple KB off the space the code itself takes up in memory so you can use it for data.
Oh and for some reason their language and APIs are OO despite the fairly tight memory limitations, and despite not having any real need to do that (nor enough memory to really take advantage of any OO features even if you wanted to). So you've really got to watch (haha) yourself.
There are a lot of crappy third party apps on the Garmin Connect IQ store which burn battery life or cause random crashes or interfere with activity recording. On the Garmin forum another user found that a third party data field causes erratic GPS tracks for open water swimming.
Some of that might just be sloppy development, but I have a feeling that part of it's the SDK's fault. It's a weird mix of being too abstract to let you take tight control of performance but not abstract enough that I ever felt confident it was doing a reasonable job of taking care of that for me. Though maybe I'm wrong and that part's fine; I don't think I ever benchmarked power use (another thing: the tooling is about as mediocre as you'd expect for a proprietary platform like that).
I tell you what, though: having a declarative UI where "declarative" is "now draw text at coords [some coords]. Then draw a square of dimensions [dimensions] at coords [other coords]" is a fucking refreshing break from webdev. Even more so than regular mobile dev. There's simply no room on the devices for much screwing about. You've got your lifecycle methods and some calls that draw things. You can listen for button presses. That's about it as far as the UI goes.
The operating system is working as designed. Garmin wearable device customers demand maximum battery life with minimum size and weight. Within those constraints it simply isn't feasible for the OS to provide a high level of protection.
I don't understand what power constraints have to do with the operating system providing proper process control, but I'm not an OS expert. Could you maybe add a few sentences of explanation?
The Garmin Watches don't have DRAM for power reasons. The processor is a memory constrained microcontroller without a hardware memory management unit. Without the MMU you don't get as powerful memory virtualization or process isolation.
If the CPU is doing work, it'll consume battery life. You get longer battery life from a system that's mostly in a low-power state. An app will get the cycles it requires - it's a trade-off, which usually means that doing more work means fewer low-power states. Less efficient apps will not enable low power as often. There is cooperation here, and it's why Apple didn't originally allow background apps on the iPad.
Expectations for what software should be able to do were quite a bit lower in the 80s too :)
I started learning to code as a kid in the 80s on 8-bit machines. Seemed like there was rapid progress in the availability of CPU power, RAM, etc., up until the early 2000s and then things seemed to... slow down.
Which is about the time everybody started to focus hard on horizontal scalability.
My first paid job was in the late 80's. I had to do something about a system that was basically a lot of menus written in Oracle Forms running in terminals on SCO Unix or maybe Xenix.
It was unbearably slow. Users had to wait 5+ seconds every time they pressed a key.
And it was a nightmare to try to optimize it. But then I scrapped it and rewrote the whole thing using plain C and curses and system() calls to do the actual work. Hard-coding the menu hierarchy structure which nobody was changing anyway.
That made the system very fast. The company was very surprised that navigating menus could be so fast.
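For flavor, here's a minimal sketch of that approach in Python (the original was plain C): a hard-coded menu hierarchy drawn with curses, shelling out to run the actual work. The menu entries and commands are made up for illustration.

```python
# Hard-coded curses menu that shells out for the real work, in the spirit of
# the C-and-system() rewrite described above. Entries/commands are placeholders.
import curses
import os

MENU = [
    ("Reports", "echo running reports; sleep 1"),
    ("Billing", "echo running billing; sleep 1"),
    ("Quit", None),
]

def main(stdscr):
    curses.curs_set(0)
    selected = 0
    while True:
        stdscr.clear()
        for i, (label, _) in enumerate(MENU):
            attr = curses.A_REVERSE if i == selected else curses.A_NORMAL
            stdscr.addstr(i + 1, 2, label, attr)
        stdscr.refresh()
        key = stdscr.getch()
        if key == curses.KEY_UP:
            selected = (selected - 1) % len(MENU)
        elif key == curses.KEY_DOWN:
            selected = (selected + 1) % len(MENU)
        elif key in (curses.KEY_ENTER, 10, 13):
            cmd = MENU[selected][1]
            if cmd is None:
                return
            curses.endwin()   # leave curses mode...
            os.system(cmd)    # ...run the external command, like system() in C...
            stdscr.refresh()  # ...and restore the screen

curses.wrapper(main)
```

No forms engine, no database round-trips, no dynamic menu lookup: every keypress is handled locally, which is why this style feels instantaneous even on slow hardware.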
Were they? My recollection is that software back then had more features. Software today is neutered in comparison. (Compare Google Docs to WordPerfect from the early 1990s.)
Think about Windows and how easy it was to change settings: you went to the Control Panel. Now in Windows 10 you have to dig around to find things. It's like they are trying to hide stuff.
That's a smart alec answer, but there's a bit of truth to it - back in the 80's when WordPerfect had its heyday, there wasn't a better answer. The Internet was still barely out of research labs, computers were slow and memory was very limited. Video calling was still the realm of science fiction.
Google Docs isn't a great example. In WordPerfect's case, WordPerfect was the product and there was an incentive to make it great.
Google Docs isn't the product. It's free. You're the product. And it only needs to be good enough to keep competitors from creating something so great that it removes their first-mover advantage. It's clear that Google has given up developing new features for it.
One of the nice things about GSuite (for me) is that it implements a nice spectrum of features that many people need without a lot of other stuff. I certainly get that some people need that other stuff, whether that's sophisticated revision tracking, pivot tables, etc. But for my regular but generally unsophisticated use of GSuite, it's better than anything I've used in years.
[ADDED: And as others have mentioned, while the collaborative editing of Google Docs doesn't scale all that well, it's a great remote collaboration tool. Never want to go back to sending docs around.]
I actually never much cared for WordPerfect with all its formatting codes and so forth. I was actually a bigger fan of early DOS-based Microsoft Word at the time.
I can only remember a single time the bad grammar marker on google docs wasn't hilariously wrong, contorting a perfectly normal sentence into something weird. And that one time it was catching a doubled word.
By the mid 2000s, they just started adding more cores. Machines kept getting more memory, albeit slower. SSDs have been a game changer, and they really only got big in the last 10 years.
We never went past 64 bits because 64 bits is enough to address memory for the foreseeable future (2^64 bytes is 16 exabytes), and applications needing more are rare. It's mostly cryptography.
Those challenges are everywhere in data science/engineering. It's enormously expensive to do things shoddily in those domains and the resource constraints are very much bottlenecking what's practically achievable.
Well, microcontroller programming is like that to this day - many MCUs have performance and memory pretty similar to the early classic microcomputers - and in some cases they actually use the same chip design. :)
It's all relative. There's nothing stopping you from trying to make a complex 3D game that runs well on a low-end laptop. Think of how much waste there is in games made with popular tools like Unity. They trade performance for ease of development. See what happens when you go the other way.
You can just go into some embedded domain. I work in automotive and in the current project, we have a budget of ~2MB RAM.
Also, you still worry about small amounts at Google scale:
> “It’s pretty slow,” Jeff said. He leaned forward, still relaxed. “So that one was a hundred twenty kilobytes,” Sanjay said, “and it was, like, eight seconds.” “A hundred twenty thousand stack calls,” Jeff said, “not kilobytes.” [...] In a sense, they had been occupied by minutiae. Their code, however, is executed at Google’s scale. The kilobits and microseconds they worry over are multiplied as much as a billionfold in data centers around the world. – https://www.newyorker.com/magazine/2018/12/10/the-friendship...
If you are in a startup where you'd rather worry about product-market fit, of course such minutiae is not relevant (yet).
Wrote a game for a PDP-11 ... the operating system, all the drivers, compiler, editor and your code had to share the same 65536 bytes of memory. Try that, Elon Musk!
Only at the scale of the big tech companies do the machine costs start to seriously stack up against the labor costs. (Well, also smaller machine-learning shops, but they are in trouble anyway.) And I don't think they face enough competitive pressure to be forced to make the massive foundational and educational investments they would need to pull this off.
It won't happen with this generation of the industry.
Most of the software that I use I would consider to be unacceptably slow on my top of the line workstation: emacs, webbrowsers, intellij; the list goes on and on. Something needs to be done, and fast. Pretty much everyone I know regularly complains about how slow and crappy the software they use is, especially my non-technical friends.
We need to start thinking about performance as a necessary and basic feature, not some nice-to-have that can be worked on later. A new program needs to be designed from the ground up to be fast, from the language choice to the data structures.
And I think this shift is being made, except in web development circles. Most new and upcoming languages list performance as a basic feature: Rust, Julia, Nim, Zig, etc. Also, native compilation is often listed as a feature, as there's been a huge backlash against the VM cargo cult of the late 90s and 2000s.
What's your problem with your emacs? Mine runs pretty snappy. Sometimes you get a runaway process in it which may be a problem but if you have problems every time, something else is wrong.
As for browsers, shrug: turn off JS and add a blocklist and they'll run an order of magnitude faster.
I didn't even feel that emacs was slow when I was running it on a 25MHz 386DX. The only real change between then and now is that I'm using a TrueType font with it.
First I've heard of company-mode, thanks for that. A DDG search gives https://company-mode.github.io/ as its first result; on there it refers you to its issue tracker - perhaps let them know if you haven't already?
Large CSVs: perhaps use LFV (large file viewer) if you want a read-only look.
Otherwise, I just loaded a 26MB binary file in less than a second. I created a 2.46 gigabyte (not MB) CSV and opened it in a new emacs; it took less than 15 seconds (fundamental mode). I think your problem lies elsewhere - it may be an unnecessary mode being invoked.
How did the browser run when you turned off JS, a whole lot faster I imagine?
The tooling infrastructure of code development itself needs to be optimized (not necessarily in speed terms) to provide coders with the information & tools necessary to write faster code. It seems like IDEs are currently optimized to facilitate rapid code development, but not development of rapid code.
The issue is, most code runs so few times that it doesn't really matter. Only when the code requires so many cloud resources that it's worth spending a few hours or days of an engineer's salary to optimize it should you do so.
Or if your software aims to be successful on end-user devices. Just because you don't pay for the electricity or the faster machines doesn't mean that no carbon is emitted to produce them.
I use PHPStorm and a plugin called EA Inspections that will gladly point out optimizations. I deal with a lot of legacy code written 10 years ago that I've been slowly optimizing when possible. Of course we've been working to burn down the legacy code base.
With respect to legacy code, it's interesting: I work a bit with ERP systems, and 10-year old legacy code from them is often significantly slower than the 25/30-year old legacy code in a prior ERP version from the same vendor. I think it shows that when developers are forced to work within tighter operating limits, they figure it out.
Yeah, it's a shame people say screw it to doing slightly more work to make a system as efficient as possible. Even some stuff like using $i++ vs ++$i: it only shaves one opcode per pass, but it doesn't cost any extra time to type. Also, people underestimate SQL indexes.
Indexes are actually one of the reasons the old code ran faster. Or the equivalent for that database type: I forget what it was called, but it wasn't relational. It was some type of file-based system where a file could be like a normal table, but it could also have "segments", where each segment might have a one-many relationship with the primary segment. Or you might "join" one file to sub-segment of another file, never going through the primary segment.
So that was a factor, but so was the fact that the entire subsystem was pretty much written in COBOL, though I would occasionally write small C programs to work around limitations. Even newer versions of the ERP had a lot of the old COBOL still in them, just with a shinier new interface, but they were gradually replacing that code with (I think) Java. Probably because of the difficulty of finding & retaining COBOL talent, along with the need to make the product more readily interfaced via the web rather than through a custom presentation layer from the vendor (which was a green-screen terminal-based interface, and significantly faster than any interface you'd see today. And with modern terminal emulators you weren't limited to green, you could choose any color!).
There is even more room at the top. The physical limits are as follows. A computer "with a mass of one kilogram confined to a volume of a liter" can perform at most 2.71 * 10^50 operations per second on 2.13 * 10^31 bits. See:
Lloyd, S. Ultimate physical limits to computation. Nature 406, 1047–1054 (2000)
Bremermann, H. J. Minimum energy requirements of information transfer and computing. Int. J. Theor. Phys. 21, 203–217 (1982)
Bekenstein, J. D. Energy cost of information transfer. Phys. Rev. Lett. 46, 623–626 (1981)
It would become very hot indeed. But that does not necessarily imply a runaway explosion. Just like the center of the sun won't explode anytime soon. Nevertheless, I'll take the much cooler model with only 10^44 operations per second.
I quote from the Conclusions section of the actual paper linked by dang:
"As miniaturization wanes, the silicon-fabrication improvements at the Bottom will no longer provide the predictable, broad-based gains in computer performance that society has enjoyed for more than 50 years. Performance-engineering of software, development of algorithms, and
hardware streamlining at the Top can continue to make computer applications faster in the post-Moore era, rivaling the gains accrued over many years by Moore’s law. Unlike the
historical gains at the Bottom, however, the gains at the Top will be opportunistic, uneven, sporadic, and subject to diminishing returns as problems become better explored. But even where opportunities exist, it may be hard to exploit them if the necessary modifications to a component require compatibility with other components. Big components can allow their owners to capture the economic advantages from performance gains at the Top while minimizing external disruptions."
(My emphasis on "economic advantages from performance gains.")
Databases are still getting faster. You don't get a faster DB by buying a new server and then running a 10 year old version of SQL Server or PostgreSQL on it.
Language runtimes are still getting faster. You don't get better Java or JavaScript performance by running a 10 year old release of the HotSpot JVM or V8.
3D renderers, video encoders, maximum flow solvers (one example examined in depth in the article): they're all getting faster over time at producing the same outputs from the same inputs.
The key is incentives. There are probably people in your organization who care a lot if database operations slow down 10x. But practically nobody in your organization cares if Slack responds to key presses 10x slower or uses 100x as much memory as your favorite lightweight IRC client. (I'm trying to make a neutral observation here. Looking at it from 10,000 feet, I get annoyed when I see how much memory Slack takes on my own machine, but that annoyance wouldn't crack the top 20 priorities for things that would improve the productivity of the business I'm in.)
The incentives problem is also why the Web seems slow. Browsers too are still getting faster. I used to run multiple browsers for testing and old Firefox and IE releases were actually much slower at rendering identical pages than current stable releases. But pages are getting heavier over time. Mostly it's not even a problem of people trying to make "too fancy" sites that are applications-in-a-browser. It's mostly analytics and advertising that makes everything painfully slow and battery-draining. I run a web site that has had the same ad-free, analytics-free, JS-light design since the early 2000s. It renders faster than ever on modern browsers. It's not modern browsers that are the problem -- it's the economic incentive to stuff scripts in a page until it is just this side of unbearable.
For certain kinds of human-computer interaction, people will pay a lot to reduce latency. Competitive gamers will pay, for example. Sometimes people will invest a lot of effort to reduce memory footprint -- either because they're shaving a penny off of a million embedded devices or because they're bumping up against the memory you can fit in a 10 million dollar cluster. But the annoyances that dominate Wirth's Law discussions on HN -- why do we have to use Electron apps?! -- are unlikely to get fixed because few people are willing to pay for better.
I remember in the Windows 98 days, there was a program called GuitarFX that blew my mind. IIRC, it was only a couple MB in size, didn't use a lot of RAM, and was fairly low latency. I think I was running it on a PIII 500Mhz with 128MB of RAM.
Now we have apps that are like 40+ MB, eat RAM and CPU cycles to do fairly simple things.
What's mind-boggling is the change in relative demand. Back in the 1990s, an IRC or AIM client was a little background app that took a tiny fraction even of a computer with 64MB of RAM. But Slack takes a larger share of a computer with 8GB of RAM.
Slack is an Electron app, which is why it needs GBs to run. Native apps can run in a fraction of that. But the faster time to market is likely worth even the most bloated implementation.
From my experiments with the JVM that is not really true. Keeping memory usage below 1GB is pretty easy. It's only getting tricky when you want to go below 200MB. You'll have to deal with the overhead the JVM introduces. As a result I often just use lua for really simple programs and I am rewarded by creating programs that use less than 2MB of RAM.
Or good old Minecraft with 300 mods. It takes 20 minutes to start and even if you run the server embedded within the Minecraft client it will take 6GB at a minimum. Well, RAM is cheap so I just bought 32GB for $110 out of necessity but I still keep paying attention to how much RAM my programs need and avoid wasting it needlessly (using 4.6GB out of 32GB right now).
There were functional spreadsheets that ran on an Apple ][ with 48k of RAM. Not to mention schematic capture and PCB layout programs that would run 'fine' on a 486 with 4 MB of RAM.
So yeah, it's disheartening to see programs with the same functionality as an old 16-bit x86 program using hundreds of MBs of RAM.
I wonder how much of this is simply due to increasing expectations around interface and portability. Icons, borders, and fonts were tiny in the 90s and looked tiny. Programs ran on particular platforms and were often coded against the native platform's APIs and linked to specific binaries of a specific OS version. Now, with expectations of high-res iconography, detailed and smooth fonts, as well as the expectation that an OS update shouldn't affect any of my apps, we inherently have bigger programs. The alternative approach could be taken, but it wouldn't be viewed as a mark of quality.
A game like Destiny weighs in at >100GB with its latest updates due to map and image needs, and the Slack client runs smoothly across 5 operating systems.
"High-res" icons are still quite tiny. Expectations have definitely grown wrt. the complexity of text rendering: things like Unicode multi-language support, emojis and high-quality typography are taken for granted nowadays, and even something as simple as that would have required fairly high-end hardware back in the early-to-mid 1990s.
A bunch of you seem interested in taking up the challenge: having to write smarter software, wistful for the old days when it was forced on you.
Turns out, you can! There's a big domain that needs your help, today: deep learning.
A lot of the software is currently crude. For example, to train a StyleGAN model, the official way to do it is to encode each photo as uncompressed(!) RGB, resulting in a 10-20x size explosion.
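A quick back-of-the-envelope sketch of that explosion, using assumed numbers (an FFHQ-sized set of 70,000 images at 1024x1024 and a guessed ~15x JPEG compression ratio; the real ratio depends on content):

```python
# Rough size estimate: raw 8-bit RGB vs. JPEG for a StyleGAN-style training set.
# All figures are illustrative assumptions, not measurements.
num_images = 70_000            # roughly FFHQ-sized dataset
width = height = 1024
bytes_per_pixel = 3            # uncompressed 8-bit RGB

raw_bytes = num_images * width * height * bytes_per_pixel
jpeg_bytes = raw_bytes / 15    # assume ~15x JPEG compression

print(f"raw RGB: {raw_bytes / 1e9:.0f} GB")   # ~220 GB
print(f"as JPEG: {jpeg_bytes / 1e9:.0f} GB")  # ~15 GB
```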
There's plenty of room at the top, and never more so than in AI software. Consider it! Every one of you can pivot to deep learning, if you want to. There's really nothing special or magical in terms of knowledge that you need to study. A lot of it is just "Get these bits from point A to point B efficiently."
There's also room for beautiful tools. It reminds me a lot of Javascript in 2008. I'm sure that will sound repugnant for a majority of devs. But for a certain type of dev, you'll hear that ringing noise of opportunity knocking.
In response to your edit: I can see how GUI changes would be a problem for RPAs, but that seems solvable. For minor GUI changes, an RPA can theoretically identify where salient features have remained and just re-train on where they were subjected to change. If that approach fails, you can "show" the computer the new GUI and manually identify what changed. If that fails, you can restart training from scratch with the new GUI. That should take less time than training on the old GUI if the new GUI is any good. Maybe that's the best/cleanest approach in most cases and just becomes annoying with complex processes that need a lot of training data.
Keep in mind, though, that RPAs often link together different pieces of software, and if the GUI changes for one of them you'd only have to retrain on that piece. I wouldn't be surprised if enterprise software vendors start optimizing their programs for RPAs so that they don't have to rely on hacky GUI monitoring as much.
The end game is that large, frequently run RPAs will serve as a flag for the organization to develop, find, or outsource an end-to-end program that executes the same process without an RPA. RPAs, then, will always be the scout at the frontier of automatable office work, finding who can be freed from rote drudgery next and serving as a bridge to the best programmatic solution.
RPA is something like "macros on steroids". It tries to be something like shell scripting for script-hostile GUI applications, but at least the one product I've gotten to play with is still way more clunky than proper shell scripting.
Only in the sense that it automates stuff. RPA is differentiated by its use of machine learning to "observe" your workflow and automatically create automations, even where it's hard to create a standard macro.
Very heartening to hear. You think some of us tired enterprise/product devs who come from non-traditional (read: didn't study CS) backgrounds can make it? Any recommended starting resources? I've heard supposedly good things about fast.ai as a "guided tour".
Definitely try fast.ai, I also have no formal CS background, am bad at math, etc... and with a little hard work I was able to keep up and do things I never thought possible. You will truly be blown away after even just a month of the class as to what you understand and can build. It is not a domain solely for geniuses, roll up your sleeves and you can be great.
Admittedly, I took the first iteration like 4 years ago, and I've heard it's much improved since then, but the fun part is they get you building real things right away! You build an image classifier in the first lesson.
With other ML resources I find they get bogged down trying to explain the concepts/details, and I would usually lose interest before getting to the implementation of things.
They've also added a "theory" portion after the practical, which I think is a great way to solidify the concepts scaffolded in building the thing initially. Thank you for your input.
What you need is determination. There's no substitute for this.
If you have that, there are all kinds of resources. Here are a few...
Resource 1: a community. We've set up a discord server for AI dev. It has 360 users. At any given time, around ~50 people are online, of which ~10 are skilled devs. Come join! https://discordapp.com/invite/x52Xz3y
Resource 2: Find something fun for you, and pursue that. I like generative AI, so for me that's been GPT-2 and StyleGAN. Gwern has some lovely tutorial-type articles on both.
The reason to follow them is, whenever you see something that seems interesting or fun, tweet at them and say so! Ask questions. Ask how to get started. Everyone is shockingly nice and helpful. My theory is, the software is so crude and often hard to use, that we all like to celebrate together whenever one of us gets it working, and we're happy to share that knowledge however we can. (Twitter is a bit chaotic right now due to world events, but I imagine it might return to normal within a couple weeks.)
And yes, you're right about fast.ai and other courses. You can go that route if you like it. I found it more exciting to dive into the deep end, though, and try to tinker with stuff.
But have you achieved something significant in the context of deep learning, as per your earlier comment? There's no information about that in your profile, and a cursory glance at DDG and Google results for "shawn presser" doesn't turn up anything very relevant.
So, I have to ask: without having studied CS, what contributions have you made in deep learning that are widely recognised?
I hope you agree this is a reasonable question to ask, and that you are not offended by it. Otherwise, I apologise because it's not my intention to offend you.
To be honest, and again without any intention to be harsh, those are not what I'd call "widely recognised contributions to deep learning". They're mainly articles in the lay press and an honourable mention in a DeepMind blog post. They certainly sound like contributions to Shawn Presser's reputation, but "contributions to deep learning"?
To clarify, what I was hoping to see is, at best, an article published at a reputable venue for AI research, a conference or a journal, or at a minimum an arxiv article that at least looks like it was meant to be submitted to a conference or journal. And at worst, a software tool that can be used in deep learning research. But it seems to me that your achievements are mainly having fun with and in one case finding an interesting use for tools that are already available.
Again, I'm not trying to be harsh, nor do I want to say that all this is not worth the trouble. But it should not be held up as an example of what people can achieve without studying CS. Because, I think you'll agree, they are kind of underwhelming when compared to what people who have studied CS routinely achieve.
I _super_ appreciate this, thank you for sharing. I've had to deal with a lot of gate-keeping in my career, so it's great to interact with people who are so willing to invite others in. We could use more with such a positive attitude.
May I ask, why are you interested in deep learning? What are you trying to achieve? If you don't have a traditional background in CS, might you not be better served by working to acquire such a background first, before thinking of making any contributions to more advanced sub-fields of CS?
For instance, I'm sure that self-studying some fundamental subjects in CS, like formal languages and automata and complexity theory, will give you important tools to tackle any CS-related problem, be it programming a more efficient deep learning implementation or really, anything else you like.
Spending the same effort to get into deep learning instead will most likely leave you with only superficial knowledge that will not transfer to other subjects.
I think it's safe to say I'm looking to see if this field is something I'd be interested in pursuing more deeply, and to get a feel for the types of problems that might or might not put bread on the table should I go further.
I have no idea if I would or would not be better served by a traditional CS background in this case, so part of this is to find out if that is or is not the case.
If I understand correctly, your main interest is in finding ways to up your game when it comes to placing yourself professionally?
In that case, it's very difficult to go wrong with acquiring a traditional CS background. There is really nothing you can do with computers that will not benefit from a CS background, including anything that may have to do with deep learning, etc. On the other hand, learning about deep learning will only help you when it comes to working with deep learning.
So, regardless of whether getting into deep learning may help "put bread on the table", getting a well-rounded CS education, certainly will.
The highest-performance deep learning framework is most likely Theano, which still has to be ported to current Python 3.x. There's quite a bit of low-hanging work to be found in fixing things like that.
I am old. Computer hardware has advanced a thousandfold over the last 30 years. But computer software has degraded at the same rate. We started with Pascal and C, and ended up with PHP and JavaScript ... ughh.
> "For tech giants like Google and Amazon, the huge scale of their data centers means that even small improvements in software performance can result in large financial returns,"
And really this is only for software that runs inside their servers. The cost of software running on client machines is just distributed to the users.
If google makes a change that makes Chrome run less efficiently that means I'm paying for it with my time and electricity. If I want it to run better I'm shelling out money from my own pocket for new hardware.
FYI, I optimised the "python3" code by using numpy. Total runtime on a laptop CPU was 1 s.
This is why all classes for scientists I've seen teach numpy.
(Incidentally: total time to "optimise" and TEST is less than 5 minutes.
Total time to rewrite in Rust: ongoing after 2 hours, with issues filed on the best available crate for the task, whose documentation examples fail to compile. Not even going to try in bare Rust due to the issues with arrays longer than 32 elements.)
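For reference, a minimal sketch of the kind of swap being described, assuming the "python3" code in question resembles the paper's nested-loop matrix-multiply example (timings vary by machine):

```python
# Replace a pure-Python triple loop with a single numpy call that dispatches
# to an optimized BLAS. The naive version is shown only as the slow baseline.
import time
import numpy as np

n = 4096
A = np.random.rand(n, n)
B = np.random.rand(n, n)

def naive_matmul(A, B):
    """Pure-Python triple loop -- takes hours at n=4096, so it isn't run here."""
    n = A.shape[0]
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C[i][j] = s
    return C

start = time.time()
C = A @ B  # the whole "optimisation"
print(f"numpy matmul at n={n}: {time.time() - start:.2f} s")
```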
The paper's URL is https://science.sciencemag.org/content/368/6495/eaam9744 but the text seems to be paywalled. We've put its title above.