Yeah, this is really, really far from an apples-to-apples comparison. First off, the test dataset size is trivially small for the use cases where big data systems are typically applied. I don't know why you'd introduce all the complexity and overhead of a distributed MapReduce framework to ETL a dataset that would fit in memory on consumer-grade hardware. It's not exactly fair to compare a framework running on a single node against one where you've artificially introduced multiple nodes and network overhead for a dataset that would easily fit on one. You'll also notice a pretty stark difference between the level of detail provided for the BlazingSQL test setup and the Spark one, which (unless I'm missing something) is lacking any code or configuration details. I've dipped my toes in the big data space long enough, and seen enough "${FANCY NEW FRAMEWORK} beats ${INDUSTRY-STANDARD FRAMEWORK} by 123x!!" posts, to recognize this as a gigantic red flag. How you manage partition sizes, the order and choice of operations, and tuning parameters can make orders-of-magnitude differences in performance.
Maybe the future of frameworks like this will be on the GPU. I'm just not seeing any evidence of it yet. Right now, Spark fills the space where you can throw globs of memory at TB- to PB-scale problems. I could very well be wrong, but I don't see how this is going to be cost-effective on GPUs given the current cost of memory there.
1. You don't have to fit your whole workload on the GPU; you can process it in batches, just as you would for a workload that doesn't fit into memory in a non-GPU solution. You don't need petabytes of GPU memory to run PB-scale workloads.
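To illustrate the batching idea: stream a dataset that is larger than device memory through a fixed-size buffer, aggregating a partial result per batch. This is a hypothetical sketch, not BlazingSQL code; `BATCH_ROWS` and `process_batch` are invented names, and a plain Python sum stands in for the GPU kernel.

```python
BATCH_ROWS = 1_000  # pretend this is all that fits in GPU memory at once

def process_batch(rows):
    # stand-in for a per-batch GPU kernel, e.g. a filtered aggregation
    return sum(r for r in rows if r % 2 == 0)

def process_in_batches(rows, batch_rows=BATCH_ROWS):
    # walk the "large" dataset one device-sized slice at a time,
    # combining partial results on the host
    total = 0
    for start in range(0, len(rows), batch_rows):
        total += process_batch(rows[start:start + batch_rows])
    return total

data = list(range(10_000))  # dataset living in host memory / on disk
print(process_in_batches(data))  # same answer as processing it in one pass
```

The point is that device memory bounds the batch size, not the workload size.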
2. The dataset is trivially small because this is a new engine built for the RAPIDS ecosystem, and it is limited for the time being to a single node. We are releasing our distributed version for GTC (mid-March) and will be able to give you more reasonable comparisons then. This is a similar path of development to our pre-RAPIDS engine, which went from single node to distributed in about a month, because we have built this engine to be distributed. Right now we are finishing up UCX integration, which is the layer we will be using to communicate between all the nodes.
3. You can always try it out. It's on Docker Hub (see links in this post), and if you want to run distributed workloads right now you can manage that process using Dask by handling the splitting up of the job yourself. In a few weeks you will be able to have the job split up for you automatically, without the user needing to be aware of the size of the cluster or how to distribute data across it.
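"Handling the splitting up of the job yourself" follows the same scatter/compute/gather shape regardless of the scheduler. Here's a rough sketch of that pattern using the stdlib's `concurrent.futures` as a stand-in for Dask so it stays self-contained; `word_count` and the partition count are invented for illustration, and Dask's delayed/futures API plays the role of the executor in a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(chunk):
    # stand-in for the per-partition query each worker would run
    return len(chunk.split())

def run_partitioned(text, n_partitions=4):
    # 1. split the input into partitions yourself
    words = text.split()
    step = max(1, len(words) // n_partitions)
    chunks = [" ".join(words[i:i + step]) for i in range(0, len(words), step)]
    # 2. run one task per partition
    with ThreadPoolExecutor() as pool:
        partials = pool.map(word_count, chunks)
    # 3. combine the partial results
    return sum(partials)

print(run_partitioned("the quick brown fox jumps over the lazy dog"))
```

The automatic version they describe would fold steps 1 and 3 into the engine, so the user never touches partitioning.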
We're pretty excited near-term about getting to sub-second / sub-100ms interactive times on real GB-scale workloads. That's pretty normal in GPU land. More so, where this is pretty clearly going is multi-GPU boxes like the DGX-2, which already has 2 TB/s of memory bandwidth. Unlike multi-node CPU systems, I'd expect better scaling because there's no need to leave the node.
With GPUs, the software progression is single GPU -> multi-GPU -> multi-node multi-GPU. By far the hardest step is single GPU. That's what they're showing here.
1. If you process it in batches, then you have to count the time it takes to transfer each batch's data to and from the GPU.
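To make that objection concrete, here's a hypothetical back-of-envelope model where bus transfer time is charged for every batch. All the figures (dataset size, batch size, bandwidths) are made-up assumptions for illustration, not measurements of any real system.

```python
import math

dataset_gb = 100.0          # total data to process (assumed)
batch_gb = 16.0             # what fits on the GPU at once (assumed)
pcie_gb_s = 12.0            # assumed effective host<->device bandwidth
gpu_compute_gb_s = 200.0    # assumed on-GPU processing throughput

n_batches = math.ceil(dataset_gb / batch_gb)
transfer_s = 2 * dataset_gb / pcie_gb_s   # every byte crosses the bus twice
compute_s = dataset_gb / gpu_compute_gb_s
total_s = transfer_s + compute_s

print(f"{n_batches} batches, transfer {transfer_s:.1f}s, "
      f"compute {compute_s:.1f}s, total {total_s:.1f}s")
```

With these particular numbers the bus, not the GPU, dominates end-to-end time, which is why a benchmark that skips transfer time can look much faster than the job really is.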
2. It's fair to start out with small datasets, but then you shouldn't compare against distributed frameworks like Spark; compare against single-node solutions instead.
Also - Spark is very slow compared to analytic distributed DBMSes.
I used to think that experience meant the difference between believing benchmarks and being skeptical of them. Now I know it's the difference between being skeptical and ignoring them.