
Have you heard of SYCL[1]? :)

[1]: https://www.khronos.org/sycl/


Hi, Celerity dev here!

> Weird. A library that wraps SYCL within MPI, yet requires all processes to hold a copy of all the memory ?

> One of the main reasons to use MPI is to solve problems that do not fit within the memory available in a single cluster node.

You are absolutely right; this is a problem that Celerity currently has. However, I'm happy to say that we are actively working on a solution that should cover a good portion of use cases, and which will hopefully be available quite soon!

Of course, as was already pointed out in this thread (and on the website), this is a research project, and we only have limited resources. If you look a little closer, I'm sure you will find all sorts of issues ;-). However, we are committed to continuously improving Celerity, and ultimately strive to build a robust and modern HPC programming framework that is usable by non-experts (i.e., domain scientists) while also delivering solid performance.

> Doing a distributed MatMul using MPI-CUDA is trivial. I wonder how the cyclomatic complexity and performance compares for that case.

I would expect the programmability metrics to land somewhere between OpenCL and SYCL: CUDA is a bit less verbose than OpenCL, but you still have to deal with MPI manually (which is also the main reason MPI+SYCL has a much higher cyclomatic complexity than Celerity). Performance-wise I would also not expect much of a difference, given that we are talking about a naive matmul here.

In general, we're not trying to beat [insert your favorite BLAS library]. At this level of abstraction, that would be pretty much impossible. We are showing off results for matmul because everybody knows it and has at least some understanding of the MPI operations required to do it in a distributed setting. You can see it as a stand-in for whatever domain-specific algorithm you want to run in a distributed setting.

> Instead, they only compare doing one MatMul per process using MPI-OpenCL... that's... a two liner with CUDA (just call MPI_Init followed by a cuBLAS call).

That's not quite true. What we showed here are several successive matrix multiplications, each one computed in a distributed fashion across all participating GPUs. The splitting of the work and all intermediate data transfers happen completely transparently to the user (save for having to provide a "range mapper"), which I think is one of the main selling points of the project!
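
To give a feel for the API, a single multiplication step looks roughly like the sketch below (condensed from our matmul example; treat the details as illustrative rather than canonical):

    #include <celerity.h>

    // Sketch of one distributed multiplication step in Celerity. The range
    // mappers (slice, one_to_one) declare which parts of each buffer a given
    // chunk of the kernel reads/writes, so the runtime can split the work
    // across nodes and generate the necessary transfers on its own.
    void multiply(celerity::distr_queue queue, celerity::buffer<float, 2> mat_a,
        celerity::buffer<float, 2> mat_b, celerity::buffer<float, 2> mat_c, size_t n) {
      queue.submit([=](celerity::handler& cgh) {
        celerity::accessor a{mat_a, cgh, celerity::access::slice<2>(1), celerity::read_only};
        celerity::accessor b{mat_b, cgh, celerity::access::slice<2>(0), celerity::read_only};
        celerity::accessor c{mat_c, cgh, celerity::access::one_to_one{},
            celerity::write_only, celerity::no_init};
        cgh.parallel_for<class mat_mul>(celerity::range<2>(n, n), [=](celerity::item<2> item) {
          float sum = 0.f;
          for(size_t k = 0; k < n; ++k) {
            sum += a[{item[0], k}] * b[{k, item[1]}];
          }
          c[item] = sum;
        });
      });
    }

Chaining several such calls, with one result feeding into the next, is essentially what the benchmark does; note that no MPI appears anywhere in user code.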


Great points, especially 1-3. For me it is also often not so much about optimizations as about additional use/edge cases that might need consideration; I can't decide yet because I haven't, e.g., fully fleshed out an interface. Oftentimes I will ultimately just delete the note (going back to your third point), but other times it might result in an additional test case, for example. Whenever I'm starting out on a new project (which might just be a new module in an existing code base), my mind tends to bombard me with these hypotheticals, so it is tremendously relaxing to just whip up a quick note and know that it will be dealt with by future me :).

To actually enforce this, I've been using NOCOMMIT comments for the past couple of years, together with a git hook that prevents me from committing those lines. I rely on this workflow so much now that I can't even imagine going back. I also get really nervous in pair-programming scenarios when someone says "Yeah, I'll change/fix/adjust that later" - all I'm thinking is "but will you remember?!".
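
In case anyone wants to try this, the hook itself is tiny. A minimal sketch (not my exact hook; the marker name and message are of course up to you):

    #!/bin/sh
    # .git/hooks/pre-commit -- abort if the staged changes add a NOCOMMIT marker.
    # This checks only newly added lines; scanning entire staged files is a
    # small variation on the same idea.
    if git diff --cached -U0 | grep '^+' | grep -v '^+++' | grep -q 'NOCOMMIT'; then
      echo 'Aborting commit: staged changes contain NOCOMMIT.' >&2
      exit 1
    fi

Note that local hooks can always be bypassed with git commit --no-verify if you genuinely want the line in.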


Instead of NOCOMMIT, I keep the "is it final" bit in short-term memory (helped along by reviewing staged changes) and start the commit message with "WIP:" if necessary. It's a convention from the Qt project that its CI understands. The advantage is that it doesn't block other work until it's resolved. Sometimes what's missing is even just a good/better commit message, which would be awkward to remark on in the code.


Yes, automatic enforcement is invaluable. Instead of a commit hook, you can also use a pre-receive hook on the server. That allows rebasing, WIP commits, etc. locally, and doesn't require any client-side configuration.
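
A minimal sketch of such a server-side hook, assuming the same NOCOMMIT marker as above (untested; notably, newly created branches need smarter handling than shown here):

    #!/bin/sh
    # pre-receive -- runs once per push on the server; each stdin line is
    # "<old-sha> <new-sha> <refname>" for one ref being updated.
    zero=0000000000000000000000000000000000000000
    while read old new ref; do
      [ "$new" = "$zero" ] && continue  # ref deletion, nothing to scan
      [ "$old" = "$zero" ] && continue  # new ref; a real hook would pick a base to diff against
      if git diff "$old" "$new" | grep '^+' | grep -v '^+++' | grep -q 'NOCOMMIT'; then
        echo "Push rejected: $ref introduces a NOCOMMIT marker." >&2
        exit 1
      fi
    done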


In case anyone was wondering: It's 106.




Now this is just starting to get ridiculous...


No, we can go further, like: "Or v the letter".


Or V the sign.


or V the television show


or V for Vendetta


