Agreed. It'd be really interesting to see such a language.
One language that I think is woefully underappreciated, given how ubiquitous it is, is GLSL. With OpenGL 4 compute shaders you get surprisingly close to general-purpose use for the type of tasks that benefit from massive parallelism. And GLSL is really quite a nice language; driver bugs are the main thing holding it back.
Sure, it's not very good for task parallelism (though some of the extensions that AMD is introducing for APUs are very interesting!). But if you've got an embarrassingly data-parallel problem, you can't beat its performance.
That performance is largely dependent on drivers + HW though, right?
Then again, I'm used to mobile GPUs, where any conditional statement used to cause the shader to evaluate every path (2^n of them for n conditionals) and then gather the results at the end (aka forget about any branching).
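To make the "evaluate every path, then gather" point concrete, here's a rough CPU-side sketch in Rust. It is only an analogy, not how any particular GPU or driver actually works; the function names and the 0.5 threshold are made up for illustration.

```rust
/// Branchy version: each input takes exactly one path.
/// (Hypothetical example function, not from any real shader.)
pub fn shade_branchy(x: f32) -> f32 {
    if x > 0.5 {
        x * 2.0
    } else {
        x * 0.5
    }
}

/// Predicated version: both paths are always computed, then a 0/1 mask
/// picks the result. This mimics the "evaluate everything and gather at
/// the end" behaviour described above.
pub fn shade_predicated(x: f32) -> f32 {
    let bright = x * 2.0; // the "then" path, always evaluated
    let dark = x * 0.5; // the "else" path, always evaluated
    let mask = (x > 0.5) as u32 as f32; // 1.0 if the condition holds, else 0.0
    mask * bright + (1.0 - mask) * dark
}
```

With one conditional that's two paths of work per element; nest n of them and you're paying for up to 2^n, which is why branching was best avoided.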
For my 2c, I'm a fan of Elixir + Rust; Rust has a nice C ABI that should make it easy to embed.
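As a minimal sketch of what the C ABI side looks like (the function name `demo_scale_add` and its signature are hypothetical, just for illustration):

```rust
/// A function using the C calling convention, so a C host could call it.
/// To export it under this exact symbol name from a library, you would
/// also add the `no_mangle` attribute (spelled `#[unsafe(no_mangle)]` in
/// the 2024 edition); it is left out here so the snippet builds anywhere.
pub extern "C" fn demo_scale_add(a: i32, b: i32, factor: i32) -> i32 {
    // Wrapping arithmetic avoids a panic on overflow crossing the FFI boundary.
    a.wrapping_add(b.wrapping_mul(factor))
}

/// The type a C caller would see: a plain C function pointer.
pub type ScaleAddFn = extern "C" fn(i32, i32, i32) -> i32;

/// Returns the function as a C-compatible pointer, e.g. to hand to a host.
pub fn demo_callback() -> ScaleAddFn {
    demo_scale_add
}
```

On the C side this would correspond to a declaration like `int32_t demo_scale_add(int32_t a, int32_t b, int32_t factor);`, assuming the symbol is exported as noted in the comment.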