
Google has been using its own TPU silicon for machine learning since 2015.

I think they do all deep learning for Gemini on their own silicon.

But they also invented AI as we know it when they introduced the transformer architecture, and they've been more invested in machine learning than most companies for a very long time.



Not that it matters, but Microsoft has been doing AI accelerators for a bit too - project Brainwave has been around since 2018 - https://blogs.microsoft.com/ai/build-2018-project-brainwave/


Yeah, I worked in the hardware org around this time. We got moved from under the Xbox org to Azure and our main work became AI-related accelerators.


> We got moved from under the Xbox org to Azure and our main work became AI-related accelerators

Since I prefer games to AI, this makes me rather sad.


You know, I prefer games too, but I suspect the AI boom has improved hardware for games as well as AI, rather than just AI in isolation.


Very cool! Catapult/Brainwave is what got me into hardware & ML stuff :)


The first revisions were made by Qualcomm, right? I don't think we have much data on how much customization they do and where they get their IP from, but given how much of the Tensor cores comes from Samsung, I think it's safe to assume that a decent amount comes from some of the big vendors.


For TPUs I believe it is Broadcom: https://www.theregister.com/2023/09/22/google_broadcom_tpus/

Not sure about the mobile SoCs


Broadcom fills the same role for Google's TPU that Marvell fills for Trainium at Amazon.


To be fair, that's a pretty good approach if you look at Apple's progression from assembled IP blocks in the first iPhone CPU to the A and M series.


Apple generally tries to erase info about acquisitions from their official company story; they want it to look like internal Apple innovation.

When it comes to CPUs, they bought P.A. Semi back in 2008 and got a lot of smart people with decades of relevant experience who were doing cutting-edge stuff at the time.

This was immensely important to be able to deliver current Apple CPUs.


PA Semi was kind of an interesting acquisition. The team was definitely very skilled, but there are always gotchas. Before the acquisition happened we were on the receiving end of their dual-core PPC and it was not great at all. We had a lot of issues with board bring-up, power, and heat, and more errata than I've ever seen. We eventually had to go with x86 for the project instead, which was more performant and certainly a lot easier overall at the time.

I had previously encountered some of that team with the SiByte MIPS in an embedded context. I know they were highly skilled and had tons of pedigree, but PA Semi itself was a strange beast.


Yeah, but at least when it comes to mobile CPUs, Apple seemed vastly more competent in how they approached it.


It's "made by" TSMC as usual. Their customization comes from identifying which compute operations that want optimized in hardware and do it themselves. And then they buy non-compute IP like HBM from Broadcom. And Broadcom also does things like physical design.


I thought they used GPUs for training and TPUs for inference; I'm open to being corrected.


The first TPU they made was inference-only. Everything since has been used for training. I think that means they weren't using it for training in 2015 but rather 2017, based on Wikipedia.


The first TPU they *announced* was for inference.



No. For internal training, most work is done on TPUs, which have been explicitly designed for high-performance training.


I've heard it's a mixture because they can't source enough in-house compute.


I'm 99.999% sure that the claim of "all deep learning for Gemini on their own silicon" is not true.

Maybe if you restrict it, similarly to the DeepSeek paper, to "Gemini uses TPUs for the final successful training run and for scaled inference" you might be correct, but there's no way that GPUs aren't involved, at minimum for comparability and faster iteration, during the extremely buggy and error-prone process of getting to the final training run. Certainly the theoretical and algorithmic innovations that often happen at Google, and that do make their way into Gemini, also sometimes use Nvidia GPUs.

GCP has a lot of GPUs, likely on the order of at least 1 million in its fleet today (I'm likely underestimating). Some of that is used internally and is made available to their engineering staff. What constitutes "deep learning for Gemini" is very much open to interpretation.


That's a strange position to take with such high certainty. Google has been talking about training on TPUs for a long time. Many former and current employees have gone on the record about how much nicer Google's internal training infra using TPUs is. GPU is an afterthought for Google's inference and non-existent in training.


Internally, TPU is much cheaper for the same amount of compute compared to GPU, so I don't see much reason why they would need to use GPUs. Probably >99% of compute budgets are spent on TPU. It might be true if you say that <1% still counts, but I think it is pretty safe to say all of the meaningful production workloads are running on TPU. It is simply too expensive to run a meaningful amount of compute on anything other than TPU.

Just to clarify, the TPU has been in development for a decade and is quite mature these days. Years ago internal consumers had to accept the CPU/GPU vs. TPU duality, but I think that case is getting rarer. I'd guess this is even more true for DeepMind, since it owns an ML infra team itself; they can likely get most issues fixed with high priority.


You seem to think GPUs are better than TPUs for rapid iteration. Why is that? There's no inherent reason why one is more suited to rapid iteration than the other; it's entirely a matter of developer tooling and infrastructure, and Google famously has excellent tooling. Furthermore, the tooling Google exposes to the outside world is usually poorer than the tooling used internally by Googlers.


It's been a few years since I last played with Google's hardware, but IIRC TPUs were inflexible and very fast. They worked well for linear and convolutional layers but could not accelerate certain LSTM configurations; for such networks GPUs were faster. It wouldn't surprise me in the least if TPU hardware support lagged behind what the latest and greatest LLMs require for training.


Google is the creator of JAX and XLA. Maybe the developer laptops have Nvidia GPUs and they do some testing there, but for Google there is literally no point in bothering with CUDA, PyTorch, or any other ecosystem strongly focused on Nvidia GPUs.

In my experience JAX is way more flexible than PyTorch the moment you want to do things that aren't training ML models, e.g. when you want to build an optimizer that uses the derivative of your model with respect to the input.
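A minimal sketch of what that looks like in JAX; the toy model, shapes, and step size are made up for illustration, the point is just that jax.grad with argnums=0 gives you the derivative with respect to the input rather than the parameters:

  import jax
  import jax.numpy as jnp

  # Toy "model": a single dense layer (hypothetical params/shapes).
  def model(params, x):
      w, b = params
      return jnp.tanh(x @ w + b)

  def objective(x, params, target):
      # Scalar score we want to minimize by changing the *input*, not the weights.
      return jnp.sum((model(params, x) - target) ** 2)

  # Differentiate with respect to argument 0 (the input x).
  grad_wrt_input = jax.jit(jax.grad(objective, argnums=0))

  key = jax.random.PRNGKey(0)
  params = (jax.random.normal(key, (3, 2)), jnp.zeros(2))
  target = jnp.zeros(2)

  # Plain gradient descent on the input itself.
  x = jnp.ones(3)
  for _ in range(100):
      x = x - 0.1 * grad_wrt_input(x, params, target)

Swapping argnums trades differentiating w.r.t. the input for the parameters, so the same machinery covers both ordinary training and this kind of input optimization.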


Honestly, PyTorch is weird IMO; I'm surprised people love it so much.

loss.backward()? tensor.grad? optimizer.zero_grad()? with torch.no_grad()?

What is with all these objects holding pointers to stuff? An ndarray is a pointer to memory and a shape, my dudes. A gradient is the change in a scalar function w.r.t. some inputs.
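For context, a minimal sketch of the incantations being referenced; the toy model, data, and learning rate are made up for illustration:

  import torch

  model = torch.nn.Linear(4, 1)
  opt = torch.optim.SGD(model.parameters(), lr=1e-2)
  x, y = torch.randn(8, 4), torch.randn(8, 1)

  opt.zero_grad()                        # clear the .grad tensors stashed on each parameter
  loss = torch.nn.functional.mse_loss(model(x), y)
  loss.backward()                        # walk the recorded graph, populate p.grad on each parameter
  opt.step()                             # the optimizer reads p.grad and mutates p in place

  with torch.no_grad():                  # suspend graph recording, e.g. for evaluation
      preds = model(x)

Each of those calls reads or mutates state hanging off the parameter tensors, which is the "objects holding pointers to stuff" being complained about; the functional view above treats the gradient as just another function of the inputs.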



