Agree. The OP is picking from dated and not exactly applicable data. I estimate you could be down to 20% of that by now if you were optimizing for costs. An issue that is real for you guys is software stack tractability, i.e. the ability of your team to bring models on board in a timely manner. Maybe that's because all models are optimized for GPUs, but it's something I would get on top of if it's fixable. Obviously, you must also be taking these issues and competitive performance into account in future iterations of your chips.
I wonder why he (Andrew Feldman) didn't respond to the incorrect SRAM vs HBM memory assumption made by the OP comment; maybe he was so busy that he couldn't even cite the sibling comment? That's a bigger wrong assumption than being off by maybe 30-50% at most on Cerebras's single-server price (it definitely doesn't cost less than $1.5-2M).
I have followed Cerebras for some time. Several comments:
1. Yes, I think that is Feldman. I have seen him intervene on Hacker News before, though I don't remember the handle specifically.
2. Yes, the OP's technical assumption is generally correct. Cerebras loads the model onto the wafer to get the speed; minimizing the distance between memory and compute is the whole point of their architecture. They could do otherwise in a "low cost" mode, and they announced something like that in a partnership with Qualcomm that AFAIK has never been implemented. But it would not be a high-speed mode.
3. The OP is also incorrect on the costs. They pick these costs up from a dated customer quotation seen online (which Cerebras had an incentive to jack up), but that is not how anything works commercially, and at that time Cerebras was operating at a much smaller scale. But you wouldn't expect Feldman to tell you what their actual costs are. That would be nuts. My thinking is the number could be off by up to 80% by now, assuming Cerebras was making progress on their cost curve and the original number had very high margins (which it must have).
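To make the "off by up to 80%" guess concrete, here's the back-of-envelope arithmetic. Every number below (the quoted price, the assumed margin, the assumed cost-curve decline) is my own illustrative assumption, not a Cerebras figure:

```python
# Back-of-envelope estimate; all inputs are assumptions, not Cerebras data.
list_price = 2_000_000        # dated quote seen online (upper end of $1.5-2M)
gross_margin = 0.60           # assumed high margin baked into that quote
cost_curve_reduction = 0.50   # assumed manufacturing cost decline since then

# Strip the margin to get the unit cost implied at the time of the quote,
# then apply the assumed cost-curve improvement.
original_unit_cost = list_price * (1 - gross_margin)
current_unit_cost = original_unit_cost * (1 - cost_curve_reduction)

print(f"estimated current unit cost: ${current_unit_cost:,.0f}")
print(f"fraction of the quoted price: {current_unit_cost / list_price:.0%}")
```

With those (made-up) inputs, the current unit cost comes out to 20% of the quoted price, i.e. the original number would be off by 80%. Tweak the two assumed rates and the estimate moves accordingly; the point is only that margin plus cost-curve progress can compound into a very large gap from an old quote.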