
> just because the code hasn’t been ported,

Seems stupid to use millions of dollars of supercomputer time just because you can't be bothered to get a few PhD students to spend a few months rewriting it in CUDA...



>> just because the code hasn’t been ported, sometimes because it’s just not something that a GPU can do well.

> Seems stupid to use millions of dollars of supercomputer time just because you can't be bothered to get a few PhD students to spend a few months rewriting it in CUDA...

Rewriting code in CUDA won’t magically make workloads well suited to GPGPU.


It's highly likely that a workload suitable for running on hundreds of disparate computers with thousands of CPU cores will be equally well suited to running on tens of thousands of GPU compute threads.


Not necessarily. GPUs simply aren't optimized around branch-heavy or pointer-chasey code. If that describes the inner loop of your workload, it just doesn't matter how well you can parallelize it at a higher level, CPU cores are going to be better than GPU cores at it.
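
A hypothetical sketch of the kind of inner loop that resists a GPU port (the struct and function are illustrative, not from any real code):

    /* Each work item walks a data-dependent chain of nodes.  A CPU core's
       branch predictor and caches cope with this; on a GPU, threads in a
       warp diverge on the loop condition and the scattered loads defeat
       memory coalescing. */
    struct node { double value; struct node *next; };

    double chase(const struct node *n) {
        double sum = 0.0;
        while (n != NULL) {       /* data-dependent branch */
            sum += n->value;
            n = n->next;          /* pointer chase: irregular, high-latency load */
        }
        return sum;
    }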


They're not that disparate; the workloads are normally very dependent on the low-latency interconnect of most supercomputers.


A supercomputer might cost $200M and use $6M of electricity per year.

Amortizing the supercomputer over 5 years, a 12-hour job on it may cost about $63k.
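
(Rough arithmetic behind that figure: $200M over 5 years is ~$40M/year; add $6M/year of electricity for ~$46M/year; a 12-hour slot is 1/730 of a year, so roughly $63k.)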

If you want it cheaper, your choices are:

A) run on the supercomputer as-is, and get your answer in 12 hours (+ scheduling time based on priority)

B) run on a cheaper computer for longer-- an already-amortized supercomputer, or non-supercomputing resources (pay calendar time to save cost)

C) try to optimize the code (pay human time and calendar time to save cost) -- how much you benefit depends upon labor cost, performance uplift, and how much calendar time matters.

Not all kinds of problems get much uplift from CUDA, anyways.


>> A supercomputer might cost $200M and use $6M of electricity per year.

I'm curious, what university has a $200MM supercomputer?

I know governments have numerous supercomputers that blow past $200MM in build price, but which universities do?


> I know governments have numerous supercomputers that blow past $200MM in build price, but which universities do?

Even when individual universities don't, governments have supercomputing centers whose primary users are universities; the value of computing time is either charged back to the university or is a separate item that is competitively granted.

Here we're talking about Jupiter, a ~$300M supercomputer of which research universities will be a primary user.


University of Illinois had Blue Waters ($200+MM, built in ~2012, decommissioned in the last couple of years).

https://www.ncsa.illinois.edu/research/project-highlights/bl...

https://en.wikipedia.org/wiki/Blue_Waters

They have always had a lot of big compute around.


CUDA is buggy proprietary shit that doesn't work half the time or segfaults with compiler errors.

Basically, unless you have a very specific workload that NVidia has specifically tested, I wouldn't bother with it.


Sometimes the code is deeply complex stuff that has accumulated for over 30 years. To _just_ rewrite it in CUDA can be a massive undertaking that could easily produce subtly incorrect results, which end up in papers and propagate far into the future by way of citations etc.


All the more reason to rewrite it... You don't want some mistake in 30-year-old COBOL code giving your 2023 experiment wrong results.


That's the complete opposite of what is actually the case: some of that really old code in these programs is battle-tested and verified. Any rewrite of such parts would just destroy that work for no good reason.


Why don't YOU take some old code and rewrite it? I tried it for some 30+ year old HPC code and it was a grim experience and I failed hard. So why not keep your lazy, fatuous suggestions to yourself.


The whole point of these older numerical codes is that they're proven and there's a long history of results to compare against.


*FORTRAN.


Sounds like a great job for LLMs. Are there any public repositories of this code? I want to try.


Sounds like a -terrible- job for LLMs, because this is all about attention to detail. Order of operations and the specific floating-point constructs used in the codes in question are usually critical.
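
A minimal C illustration of the order-of-operations point (the values are chosen purely to make the effect visible; this is a sketch, not taken from any of the codes in question):

    /* Summing the same three doubles in two orders gives two different
       answers, so a "harmless" reordering during a port can change results. */
    #include <stdio.h>

    int main(void) {
        double a = 1e16, b = -1e16, c = 1.0;
        printf("%.1f\n", (a + b) + c);   /* prints 1.0 */
        printf("%.1f\n", a + (b + c));   /* prints 0.0 */
        return 0;
    }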

Have fun: https://www.qsl.net/m5aiq/nec-code/nec2-1.2.1.2.f


Attention to detail can come later, when there's something humans can get started with. I did not mean that an LLM could do it all alone.


A human has to have the knowledge of what the code is trying to do and what the requisites are for accuracy and numerical stability. There's no substitute for that. Having a translation aid doesn't help at all unless it's perfect: it's more work to verify the output from a flawed tool than to do it right in this case.


The JSC employs a good number of people doing exactly this.


CUDA? I thought Rust was the future. /s



