> Per an analysis by TD Cowen cited by Bloomberg, higher prices for server racks, cooling systems, chips, and other components could push overall build costs up by 5-15% on average.
I can agree that infrastructure generally has much lower margins than software businesses, but I don't know of any mega-projects that didn't turn out costlier during construction, and a 15% overrun has hardly ever stopped them.
If I recall correctly from the recent YC interview, the Windsurf founder noted their team leans more toward GTM than engineering. That makes this less likely to be a classic acquihire (as with Rockset) and more plausibly a data play rather than a product integration.
My current read is that this is a frontier lab acquiring large-scale training data—cheaply—from a community of “vibe coders”, instead of paying professional annotators. In that light, it feels more like a “you are the product” scenario, which likely won’t sit well with Windsurf’s paying customers.
Agreed. It seems like a data play and a hedge to shore up vibe-coding competitiveness against upcoming Google and MS models, so OpenAI doesn't lose API revenue. I would assume vibe coding consumes more tokens than most other text-based API usage.
Of all the billion-scale investment and acquisition news of the last 24 hours, this is the only one that makes sense, especially after the record-breaking $15B round that Databricks closed last year.
The article covers extremely important CUDA warp-level synchronization/exchange primitives, but that's not what is generally called SIMD in CUDA land.
Oh wow, TIL, thanks. I usually call stuff like that SWAR, and every now and then I try to think of a way to (fruitfully) use it.
The "SIMD" in this case was just an allusion to warp-wide functions that read like SIMD in CPU code, as opposed to typical SIMT-style CUDA.
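To make the analogy concrete, here's a minimal sketch of the kind of warp-wide function I mean: a butterfly reduction with `__shfl_xor_sync` that reads much like a horizontal SIMD reduction on a CPU, even though each "lane" is actually a thread.

```cuda
// Warp-wide sum via butterfly shuffles: every lane contributes its value,
// and after five exchange steps every lane holds the 32-lane total.
__device__ int warp_reduce_sum(int value) {
    // 0xFFFFFFFF: all 32 lanes of the warp participate in each shuffle.
    for (int offset = 16; offset > 0; offset >>= 1)
        value += __shfl_xor_sync(0xFFFFFFFFu, value, offset);
    return value;
}
```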
Also, StringZilla looks amazing -- I just became your 1000th GitHub follower :)
Traditional SWAR on GPUs is a fascinating topic. I've begun assembling a set of synthetic benchmarks to compare DP4A vs. DPX (<https://github.com/ashvardanian/less_slow.cpp/pull/35>), but it feels incomplete without SWAR. My working hypothesis is that 64-bit SWAR on properly aligned data could be very useful in GPGPU, though FMA/MIN/MAX operations in that PR might not be the clearest showcase of its strengths. Do you have a better example or use case in mind?
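For reference, the kind of 64-bit SWAR primitive I have in mind looks roughly like this. It's a sketch, not code from the PR: per-lane u8 addition, where the carries are masked off so lanes can't contaminate each other. Whether it beats DP4A/DPX on real workloads is exactly what the benchmarks would need to show.

```cuda
#include <cstdint>

// Adds eight packed u8 lanes in one 64-bit op, wrapping per lane.
// The low 7 bits of each lane are added directly; the top bit is
// patched back in with XOR so carries never cross a lane boundary.
__host__ __device__ inline uint64_t swar_add_u8x8(uint64_t a, uint64_t b) {
    const uint64_t high = 0x8080808080808080ull; // top bit of every byte
    uint64_t low_sum = (a & ~high) + (b & ~high);
    return low_sum ^ ((a ^ b) & high);
}
```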
I don't -- unfortunately I'm not too well-versed in this field! But I was a bit fascinated with SWAR after I randomly thought of how to prefix-sum with int multiplication, later finding out that it is indeed an old trick, as I suspected (I'm definitely not on this thread btw): https://mastodon.social/@dougall/109913251096277108
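For anyone curious, the trick (as I understand it), sketched for eight u8 lanes: multiplying by 0x01 repeated in every byte makes byte k of the product equal to the sum of input bytes 0..k, i.e. an inclusive prefix sum, as long as no partial sum overflows 255.

```cuda
#include <cstdint>
#include <cstdio>

// Inclusive prefix sum over eight packed u8 lanes via one multiplication.
uint64_t prefix_sum_u8x8(uint64_t x) {
    return x * 0x0101010101010101ull;
}

int main() {
    // Lanes (low to high byte): 1, 2, 3, 4 -> prefix sums 1, 3, 6, 10.
    uint64_t sums = prefix_sum_u8x8(0x0000000004030201ull);
    printf("%016llx\n", (unsigned long long)sums); // 0a0a0a0a0a060301
    return 0;
}
```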
As for 64-bit... well, I mostly avoid using high-end GPUs, but I was under the impression that i64 arithmetic is just emulated on top of 32-bit operations.
In fact, I was thinking of using the full warp as a "pipeline" to implement u32 division (mostly as a joke), almost like anti-SWAR. There was some old-ish paper detailing arithmetic latencies on GPUs, and division was somewhere north of 32x the cost of multiplication (...or I could be misremembering).
Has anyone done/shared a recent benchmark comparing JNI call latency across Java runtimes? I’m exploring the idea of bringing my strings library to the JVM ecosystem, but in the past, JNI overhead has made this impractical.
Java now has the Project Panama FFM API as a successor to JNI, and depending on your use case it can perform quite a bit better than JNI used to. The Vector API is still stuck in incubator and a bit rough around the edges, though, so SIMD might be trickier.
At this point, it doesn’t provide much novel functionality, but it should be faster than the standard libraries of most (or maybe all) programming languages.
I can't speak for the whole industry, but we used it in older UForm <https://github.com/unum-cloud/uform> and saw good adoption, especially among those deploying on the Edge, where every little trick counts. It's hard to pin down exact numbers since most deployments didn't go through Hugging Face, but at the time, these models were likely among the more widely deployed by device count.