Hacker News
Japan to Unveil Pascal GPU-Based AI Supercomputer (nextplatform.com)
137 points by jonbaer on March 12, 2017 | 65 comments



As someone who works at a deep learning chip startup, this is great news! Looks like there's a market for our chips ;)

If anyone wants to learn more about AI chips, I'd be happy to answer questions.


NVIDIA is making a big bet on GPUs being the right abstraction layer for research/production, and is all but pivoting the entire company around it. The software/developer-tools layer is really important too. Custom AI chip makers (e.g. Nervana) hope to compete with NVIDIA despite having fewer resources. What are your thoughts on how a small AI chip company can best NVIDIA and the other competitors?


We compete with them on efficiency. In a self-driving car, a 1000W machine for 3D lidar data crunching isn't really feasible. If we can provide that at 10W, which is what we're aiming for, then we have a selling point.

As for our training chips, without divulging too much on a public forum (I'd be happy to talk more in private or over email), I think we provide the right level of abstraction and precision, which would allow a researcher to one-click port a TensorFlow model (we plan to support a few others like Caffe, Torch, MXNet, etc. out of the gate as well) to our chips.


I hate to bring this to your attention, but Nvidia is not pushing to use their full desktop GPUs in the car. In fact, they have a system specially designed for in-car use, which they "claim" (I have not scrutinized the numbers) is extremely efficient on a performance-per-watt basis (the Nvidia DRIVE PX2). I'd keep an eye on that platform myself.


Copy pasting my comment from below:

I'm aware (excruciatingly aware...) of that. Our numbers are ~10X better despite a process disadvantage. And as always, Nvidia is twiddling the numbers: they exclude the cost of memory access when calculating 1 TOPS/W, whereas we include it.


Like they don't know.


It's ok, as long as their investors don't know ;)


I expected that 1kW would be a drop in the bucket for a car, since moving that huge hunk of metal from A to B seems like it should totally dominate. But I worked it out from first principles and sanity-checked against Tesla anecdotes [0]. At usual speeds and 2.5-3 mi/kWh [0], that's a power consumption in the tens of kW [1], so call 1kW a ~5% bump. According to CNN [2], that could be around ~$20/mo from super rough back-of-the-envelope math (sketched as a snippet after the references).

If that estimate is on the money or high, I'd totally pay <= $20/mo for my car to drive itself. If it's low, I probably wouldn't pay $100+/mo for it.

[0] https://forums.tesla.com/forum/forums/miles-kwh

[1] X mph / (Y mi / kWh) = X/Y kW

[2] http://money.cnn.com/2011/05/05/news/economy/gas_prices_inco...
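For anyone who wants to fiddle with the assumptions, here's the same back-of-the-envelope math as a tiny Python snippet. The speed, efficiency, and monthly fuel spend are my own placeholder numbers (the fuel figure is just a rough stand-in for whatever the CNN piece reports), not anything authoritative:

    # Back-of-the-envelope: how big a deal is a 1 kW computer in an EV?
    speed_mph = 60.0                 # assumed cruising speed
    efficiency_mi_per_kwh = 3.0      # ~2.5-3 mi/kWh per the Tesla forum thread [0]
    compute_kw = 1.0                 # the hypothetical lidar-crunching box

    drive_kw = speed_mph / efficiency_mi_per_kwh   # X mph / (Y mi/kWh) = X/Y kW
    bump = compute_kw / drive_kw                   # fractional increase in draw

    monthly_fuel_spend = 370.0       # placeholder monthly fuel budget, USD
    print(f"driving draw ~{drive_kw:.0f} kW, so +1 kW is a ~{bump:.0%} bump")
    print(f"roughly ${bump * monthly_fuel_spend:.0f}/mo on a ${monthly_fuel_spend:.0f}/mo fuel budget")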


You might be willing to pay $20/mo, but how much do you think Tesla would be willing to pay to increase their range by 5%?


I wrote out the numbers I worked out for a Tesla, but I had also done it from basic physics (plus published efficiencies) for a gas car, and it's not horribly different. I didn't include that because I have no idea whatsoever how big a deal it would be to add a 1kW electrical load to a gas car (which would still draw its power from the engine), or what the losses look like. The order of magnitude remains about the same. I drive a Prius, and if my estimate is close enough, range isn't a factor for cars like mine. I was mostly curious about order-of-magnitude kW stuff, and only tangentially about the $ stuff.


Also, consider whatever you value your time at per hour. You can work or do whatever you want for an extra hour in a driverless car on your way to work, but you can't do that in a traditional car. There are other cost factors here besides electricity/gas.


So you put an extra 1kW heater somewhere in the car. Now how much energy do you spend on cooling?


Internal combustion engines are 20-45% efficient. That means a 150 hp car dissipates 55 kW at peak. Assuming half load, that's over 25 kW to dissipate continuously. Cars usually dissipate heat by way of a ~100 kg radiator, ram air, and a couple of 20W fans. Brakes dissipate far larger amounts of heat in short bursts, and are passively cooled by ram air. For EVs I'm not sure, but I think a 100 kWh li-ion pack will continuously emit somewhere between 1-5 kW of heat when charging or discharging.

1 kW is a small fraction of a typical car's heat budget.


I agree with your general point, but to sanity check your numbers: car radiators don't even weigh a tenth of that. The entire cooling system, including fluid, is in the sub-20kg range. The entire drivetrain of a common 4-banger is in the 150-160 kilo range.


Except it only takes about 20hp to keep a car at highway speed. The rest is for acceleration, hills, headwinds and looking cool.


Driving autonomously is pretty cool don't you think...


Well, in a world where all the cars were automated and could talk to each other (coordinated merges, for instance), you'd probably be fine with 60 hp motors. Where all the vehicles are hybrid, 30 might be enough for highways.

Hilly neighborhoods might be something else entirely.


So the first self-driving car should be a 20hp Volkswagen with a computer under the hood.

... I'd buy it!


The brain can drive a car and uses a fraction of that amount of energy.


For self-driving cars in motion you don't need a monster GPU - the hard part, training, is where you need a supercomputer; the in-driving requirements are mostly about making a prediction in a short time on a pre-trained network (if you use deep learning). As for LiDAR, you reduce the scanning radius to a distance you can handle in real time.


I'm unaware of what else can run a 3D Wide-ResNet on lidar data at 480FPS. If you do find something like that, let me know!


Clearly you've never run any of the object detection nets. State of the art is still 70 fps on Pascal for a 300x300 image.


Just to be clear that I understand: in the car, you only need to run the forward pass, right? One frame of data at a time through the network, get the result, pass it on. The 1000W machine would more likely be used for training on large datasets, no? Aren't these different platforms for different purposes?
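i.e., roughly this split, in toy NumPy terms (my own sketch of the distinction, not anyone's actual pipeline):

    import numpy as np

    # In the car: inference is a single forward pass per frame.
    def forward(frame, W):
        return np.argmax(W @ frame)              # e.g. pick the most likely class

    # In the datacenter: training is forward + gradient + update, looped over
    # a large dataset many times.
    def train(frames, labels, W, lr=0.01, epochs=10):
        for _ in range(epochs):
            for x, y in zip(frames, labels):
                scores = W @ x
                probs = np.exp(scores - scores.max())
                probs /= probs.sum()
                probs[y] -= 1.0                  # softmax cross-entropy gradient
                W -= lr * np.outer(probs, x)     # "backward pass" + weight update
        return W

    rng = np.random.default_rng(0)
    W = train(rng.standard_normal((100, 8)), rng.integers(0, 3, 100),
              rng.standard_normal((3, 8)) * 0.01)
    print(forward(rng.standard_normal(8), W))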


How does 1000W compare to the rest of the car? Also, I can't find any concrete numbers from NVIDIA, but the great discussion on https://news.ycombinator.com/item?id=12938016 pointed at an Anandtech article suggesting the full Drive PX2 is 250W (and suggested that a Model S chews up about 3600W).


For reference, a reasonably big alternator on a mid-2000s petrol car is ~2200 W. Alternators are perhaps getting bigger as we stuff more electronics into cars?

That 3600 W for the Model S sounds high if I understand it's just for the brainbox; that alone would consume all your 85 kWh in less than 24h.


I'm aware (excruciatingly aware...) of that. Our numbers are ~10X better despite a process disadvantage.

And as always, Nvidia is twiddling the numbers: they exclude the cost of memory access when calculating 1 TOPS/W, whereas we include it.


"On a self driving car, a 1000W machine for 3d lidar data crunching isn't really feasible."

How so? That's 1.34 horsepower. Even assuming catastrophic losses in converting fuel to electrical power, say 90%, that's still only 13.4 horsepower to generate 1000W.


Electric car. You lose a lot of power that way, which translates into less range, and that's before you start redirecting air to cool it, which will eat up even more range.


Internal combustion engines vent tens of kilowatts of waste heat to atmosphere.

Dealing with a few hundred watts more isn't a deal breaker.


Cooling 1000W of electronics in a vehicle is a tough problem. I don't know why you'd think it's easy.


Because most vehicles have a pressurised liquid-cooling heat-transfer system bolted in at the factory, plus a refrigerant cooling system.

HVAC in automotive and industrial machines is a solved problem. I don't know why you would think otherwise.


Because it's not a solved problem.


Could have fooled me. I work with industrial machines and kilowatts of electronics; it's all fairly low-maintenance.

1000 watts isn't really that much heat to deal with.


Would love to learn more. What's your email and company?


If you don't mind me using my personal email, sixsamuraisoldier[at]gmail[dot]com will work


Nervana was acquired by the 800 lb gorilla of silicon: Intel.

AMD has announced a major AI initiative, MIOpen.

My point is that all the big chip players are entering this battle. It's going to be tough for any small player to make a dent.


Nervana is a strong competitor, true. No comment on that.

MIOpen, however, is basically just a rebranded Fiji/Polaris/Vega.


Well..

There seems to be a veritable flood of deep learning chip papers and startups at the moment. The common theme seems to be some kind of processor-in-memory (PIM) style architecture, to reduce the energy cost of memory access, coupled with massive numbers of low-precision arithmetic units, since neural nets seem to be able to get by without the traditional FP64.
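To make the low-precision point concrete, here's a toy NumPy sketch of my own (nothing vendor-specific): an 8-bit quantized matrix multiply that comes out very close to the float32 result.

    import numpy as np

    # Symmetric per-tensor quantization: x ~ scale * q, with q an int8.
    def quantize(x):
        scale = np.abs(x).max() / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256)).astype(np.float32)   # "weights"
    x = rng.standard_normal(256).astype(np.float32)          # "activations"

    qW, sW = quantize(W)
    qx, sx = quantize(x)

    # int8 x int8 products accumulated in int32, rescaled to float at the end
    y_int8 = (qW.astype(np.int32) @ qx.astype(np.int32)) * (sW * sx)
    y_fp32 = W @ x

    # relative error is small (roughly percent-level here)
    print(np.max(np.abs(y_int8 - y_fp32)) / np.max(np.abs(y_fp32)))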

Am I completely wrong in the above assessment? And what are you doing differently from all the other players? In any case, good luck!


That's the common way to approach the design of deep learning chips. Our approach differs slightly, and is more efficient, but there is the worry that people will just call the other chips "good enough".


> As someone who works at a deep learning chip startup, this is great news! Looks like there's a market for our chips ;)

While there may be a market for your chips, I'm curious why you think the K computer is in that market? National supercomputers, like Japan’s RIKEN "K" supercomputer, are used for many different applications (for example, physics and engineering simulations) - not just "AI." The multi-purpose use of such machines is what justifies their multi-billion dollar budgets in the first place. I can't imagine a government spending billions of dollars on a machine that only has one function (e.g. neural net training).

The history of HPC hardware is littered with special-purpose HPC microarchitectures that were eventually abandoned in favor of general-purpose processors. The one lasting exception to this has been GPUs, which have proven to be a boon to HPC applications and sparked the Deep Learning renaissance in machine learning. The difference with GPUs is that they were not strictly aimed at HPC applications. Obviously, they are used for graphics rendering in gaming, professional graphics and CAD. There are hundreds of millions of GPUs deployed for gaming and other graphics applications. The application of GPUs to HPC came later, and the specific application to deep neural networks came later still. GPUs are successful because they are a form of commodity hardware and have a wide range of applications. In a sense, hard-core gamers have become the R&D funding source for state-of-the-art HPC processors. This healthy and diversified ecosystem is what allows for the long-term sustainability of the microarchitecture.

You can always build a more efficient machine by specializing it to a narrow application. In the extreme case, you can just build a custom ASIC that has some fixed function. That would be the ultimate solution in efficiency, but things become less sustainable when you need to continuously compete with alternative solutions - the cost of competing in this space is astronomical, and there needs to be sustainable source of funding for that activity. This is why the HPC industry is completely dominated by Intel/AMD/NVIDIA processors, instead of custom ASICs that (for example) could perform some fixed matrix operations.

Having said that, there is a vague opportunity on the horizon if and when Moore's Law scaling completely fizzles out. Conceivably, after processor node scaling completely ends and the established microarchitectures have been completely optimized to death, the industry will reach a state where competition on performance/efficiency stalls because there is no major next-gen CPU or GPU because nothing more can be done to improve the product while maintaining its general-purpose applicability. At that stage, a significant opportunity could open up for special-purpose processors and it could be sustainable since the field would be far less competitive.


The last paragraph is the gist of the opportunity. "The party isn't over yet, but the cops have been called and the music has been turned down". Dennard scaling has already slowed, and Intel recently (famously?) stumbled on 10nm.

For the record, many of the improvements we were able to squeeze out for deep learning have also led us to create two new designs, one for a GPU and one for a CPU. Although we're focusing on the deep learning processor for now, the ultimate goal is to develop all three and put them on an SoC. That's too ambitious at our current stage, though, so for now it's the deep learning processor.


Given that a performant, efficient GPU would be appealing for a lot of existing systems, if y'all can get that far and be competitive, that would be an interesting development.

Phones are pretty homogeneous right now, with only a couple of real competitive hardware solutions in use, but if Windows 10 proper can really make the transition to phones then we may see an explosion in the processing demands of those devices over the next 7 years.

Best of luck!


Thanks!

Our primary problem is that GPUs tend to be the epitome of a bad initial market for a startup to pursue, since they require very large volumes and very large fixed costs. If you happen to have a better business model for getting into the GPU market, drop me a line :)


I'm interested in knowing more about your chips. If smart sensors are one of the next steps for DL, as it seems from the high demand I get for embedding DL into cameras, etc., then the existing offering from Nvidia (TK and TX) is still not enough. We have had conversations with an FPGA DL startup and some others based on Nvidia chips, so I'd be interested in yours if you can share here. Otherwise, PM me at deepdetect. Exciting times :)


deepdetect the API? We used that API and loved it :)

We'd love to collaborate!

You can contact me at my personal email at sixsamuraisoldier[at]gmail[dot]com


Curious how you interface your chip (ASIC or FPGA?) with a processor? Or is it a self-contained solution?


Were I designing it, I'd look at a bus like PCIe, since that will provide more than enough bandwidth, and most of the ARM (or x86) chips that are likely to be doing the decision making based on the data from these chips will already support it. That flexibility, along with PCIe already being a known quantity when spinning boards, would be a HUGE plus.


Does your product address both the learning and "feedforward" stages of deep-learning?

Does your product let you run generic frameworks like Theano or TensorFlow?

By the way, I'd love to read more hardware-startup-related news on HN.


Yes, we have architectures to address both needs. And yes, anything that can be converted into a computational graph can be intercepted and run.
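(For anyone unfamiliar with the term: a computational graph is just a DAG of named operations evaluated in dependency order. A purely illustrative toy evaluator, nothing to do with our actual toolchain, might look like this:)

    import numpy as np

    # Toy "computational graph": each node names an op and its inputs;
    # evaluation is a walk of the DAG in dependency order.
    graph = {
        "x":   ("input", []),
        "W":   ("input", []),
        "b":   ("input", []),
        "mm":  ("matmul", ["W", "x"]),
        "add": ("add",    ["mm", "b"]),
        "out": ("relu",   ["add"]),
    }
    ops = {
        "matmul": lambda a, b: a @ b,
        "add":    lambda a, b: a + b,
        "relu":   lambda a: np.maximum(a, 0),
    }

    def evaluate(graph, feeds, node):
        op, inputs = graph[node]
        if op == "input":
            return feeds[node]
        args = [evaluate(graph, feeds, name) for name in inputs]
        return ops[op](*args)

    feeds = {"x": np.ones(4), "W": np.eye(4), "b": np.zeros(4)}
    print(evaluate(graph, feeds, "out"))   # a single dense layer, as a graph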


In that case, couldn't your product address a much wider scope than just AI/deep-learning? I'd guess that computational graphs have a much wider applicability.


You're absolutely correct, and we've come up with early versions of a (much) better GPU and CPU. However, these are really hard markets to crack as a startup, since you need early volume. As a result, we are focusing on the deep learning processor for now.


Would love to learn more; how can I get in touch with you?


If you don't mind me using my personal email, sixsamuraisoldier[at]gmail[dot]com will work


192 GPUs, eh. Interesting to compare that with the numbers being dropped in some of the Google Brain and DeepMind papers, like 800 GPUs...


AFAICT, this is a single supercomputer, not a cluster like the Google systems.


It's 24 Nvidia DGX-1 servers, which contain 8 GPUs each. It's worth noting that Nvidia already have their own 124-node DGX-1 installation, which would have 992 GPUs.


What's the difference?


Parallelizing SGD across multiple machines is a highly non-trivial task.
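To give a flavor of even the simplest scheme, here's a toy sketch (made-up numbers, plain NumPy, not how any production cluster actually does it) of synchronous data-parallel SGD, where each worker computes a gradient on its own shard and the gradients are averaged before every replica applies the same update:

    import numpy as np

    def grad(w, X, y):
        # gradient of mean squared error for a linear model
        return 2.0 * X.T @ (X @ w - y) / len(y)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1024, 10))
    w_true = rng.standard_normal(10)
    y = X @ w_true

    w = np.zeros(10)
    n_workers, lr = 8, 0.1
    for step in range(200):
        shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
        grads = [grad(w, Xs, ys) for Xs, ys in shards]   # in parallel on real hardware
        w -= lr * np.mean(grads, axis=0)                 # "all-reduce" + identical update

    print(np.linalg.norm(w - w_true))   # converges toward the true weights

The hard parts in practice are exactly the bits this toy glosses over: the all-reduce over a real network, stragglers, and keeping replicas in sync.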


I don't think parallelizing SGD is that important. There are steeply diminishing and even negative returns (see the sharp minima paper) to increasing minibatch size, so you wouldn't want to use hundreds of GPUs to run huge minibatches up to the size of the dataset; and you wouldn't want to split a model across multiple GPUs because no one has enough data+compute to train monstrous models which are dozens or hundreds of GB in size (if anything, one trend has been towards appreciating how powerful small well-trained NNs are and how grossly overparameterized many past NNs have been). What you would use that many GPUs for is hyperparameter optimization training many models in parallel using deep RL or evolutionary computation, asynchronous RL exploring many environments simultaneously, or supporting many researchers working on their own individual projects. None of which needs super networking parallelism stuff.


Sorry, I still don't understand: what's the difference between implementing SGD on a "cluster" and implementing it on a "supercomputer"?

Is a supercomputer not a cluster of multiple machines?


Yep, he said while eyeballing a dashboard; there are plenty of larger private machines...


I know this isn't that kind of site, but I'd love a picture.


To address everyone's questions about whether or not 1000W is too much for a car: I should have clarified that the power itself is not too big of a concern. But having a large machine (I'm aware of the PX2 and its successor, but that's simply way too weak for what we need) in a car requires a lot of space, energy to move, and energy to cool.


Slightly off topic:

How far behind is AMD on GPGPU, or AI? It seems OpenCL is a dead end. CUDA won. AMD announced some CUDA-to-(x) code conversion effort which never really caught on.


I guess I could be wrong, but I don't see OpenCL as dead. You usually hear about CUDA more because it's marketed heavily and there are agreements signed between Nvidia and the companies using it. I know quite a few organizations that utilize OpenCL.



