Agree. The OP is picking from dated and not exactly applicable data. I estimate you could be down to 20% of that by now if you were optimizing for costs. An issue that is real for you guys is software stack tractability, i.e. the ability of your team to bring models on board in a timely manner. Maybe that's because all models are optimized for GPUs, but it's something I would get on top of if it's fixable. Obviously, you must also be taking these issues and competitive performance into account in future iterations of your chips.
I wonder why he (Andrew Feldman) didn't respond to the incorrect SRAM vs HBM memory assumption made by the OP comment; maybe he was so busy that he couldn't even cite the sibling comment? That's a bigger wrong assumption than being off by maybe 30-50% at most on Cerebras's single-server price (it definitely doesn't cost less than $1.5-2M).
I have followed Cerebras for some time. Several comments:
1. Yes, I think that is Feldman. I have seen him intervene on Hacker News before, though I don't remember the handle specifically.
2. Yes, the OP's technical assumption is generally correct. Cerebras loads the model onto the wafer to get the speed; minimizing the distance between memory and compute is the whole point of their architecture. They could do otherwise in a "low cost" mode, and they announced something like that in a partnership with Qualcomm that AFAIK has never been implemented. But it would not be a high-speed mode.
3. The OP is also incorrect on the costs. They pick these costs up from a dated customer quotation seen online (which Cerebras had an incentive to jack up), but that is not how anything works commercially, and at that time Cerebras was operating at a much smaller scale. But you wouldn't expect Feldman to tell you what their actual costs are. That would be nuts. My thinking is the number could be off by up to 80% by now, assuming Cerebras was making progress on their cost curve and the original number had very high margins (which it must have).
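To make the "off by up to 80%" guess concrete, here's the back-of-envelope arithmetic. Every number below (the quoted price, the assumed margin, the assumed cost-curve decline) is my own illustrative assumption, not a Cerebras figure:

```python
# Back-of-envelope estimate; all inputs are assumptions, not Cerebras data.
list_price = 2_000_000        # dated quote seen online (upper end of $1.5-2M)
gross_margin = 0.60           # assumed high margin baked into that quote
cost_curve_reduction = 0.50   # assumed manufacturing cost decline since then

# Strip the margin to get the unit cost implied at the time of the quote,
# then apply the assumed cost-curve improvement.
original_unit_cost = list_price * (1 - gross_margin)
current_unit_cost = original_unit_cost * (1 - cost_curve_reduction)

print(f"estimated current unit cost: ${current_unit_cost:,.0f}")
print(f"fraction of the quoted price: {current_unit_cost / list_price:.0%}")
```

With those (made-up) inputs, the current unit cost comes out to 20% of the quoted price, i.e. the original number would be off by 80%. Tweak the two assumed rates and the estimate moves accordingly; the point is only that margin plus cost-curve progress can compound into a very large gap from an old quote.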