I'm really glad to see a post like this come out. I've seen so many discussions online about customizing models -- this post really does cut through the noise.
Really like the evaluation methodology, and it seems well-written as well.
By code distribution, I mean something like cloudpickle, which Ray has. But I wonder about an equivalent to Spark files (torrented and cached). A Kubernetes runner is also important ...
I wish the Ray peeps would consider trying to merge some of their work with the Spark RDD API. Reynold (at Databricks) is kinda hard to deal with, but so far it looks to me like the Ray team's aim of building things from the ground up has simply re-validated a lot of the systems work (but not all) that's already in Spark.
Can you say more?