> what's amazing is just how close a CPU-powered solution gets by turning it into a memory-streaming problem (which the GPU needs to do anyway).
Yes, it turns out the algorithmic approach you use to solve the problem tends to dwarf other factors.
> There are a lot of differences between a JVM-powered business solution and a KDB-powered business solution; however, one striking difference is the cache effect.
Wait, you looked at those benchmarks and came to the conclusion that the language runtimes were the key to the differences?
> However the question remains: What exactly do we get by having a big runtime? That we get to write loops?
There is absolutely no intrinsic value in a big runtime.
Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?
> Wait, you looked at those benchmarks and came to the conclusion that the language runtimes were the key to the differences?
At the risk of repeating myself: I don't have any conclusions.
> There is absolutely no intrinsic value in a big runtime.
And yet there is cost. It is unclear if that cost is a factor.
> Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?
Because they are not useful.
We are looking at a business problem, thinking about the ways people can solve that problem, and cross-comparing the tooling used by those different solutions.
Is there really nothing to be gained here?
The memory-central approach clearly wins out by a wide margin, and the fact that we can map-reduce across cores or machines as our problem gets bigger is a huge advantage for the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.
Is this Spark-based solution not the typical way Spark is used?
Could a 10MB solution do the same if it can't fit into L1? Is it worth trying to figure out how to make Spark work correctly if the JVM has a size limit? Is that a size limit?
There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?
If I've got a faster tool that encourages the correct approach, why should I bother trying to figure these things out? Or, put perhaps more clearly: What do I gain with that 10MB?
That CUDA solution is exciting... There is stuff to think about there.
> At the risk of repeating myself: I don't have any conclusions.
For someone who doesn't have any conclusions, you're making a lot of assertions that don't jibe with reality.
> And yet there is cost. It is unclear if that cost is a factor.
It's a factor... just not the factor you think it is.
> Because they are not useful.
I think you grokked it.
> The memory-central approach clearly wins out by a wide margin, and the fact that we can map-reduce across cores or machines as our problem gets bigger is a huge advantage for the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.
KDB is a great tool, but you are sadly mistaken if you think the trick to its success is the runtime. That its runtime is so small is impressive, and a reflection of its craftsmanship, but it isn't why it is efficient. For most data problems, the runtime is dwarfed by the data, so the efficiency with which the runtime organizes and manipulates the data dominates other factors, like the size of the runtime. This should be obvious, as this is a central purpose of a database.
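To make the layout point concrete, here is a back-of-the-envelope sketch (not how KDB itself is implemented) that counts the cache lines touched when summing one 8-byte field across a table: first with fixed-width records packed row by row, then with that field stored as its own contiguous column, the way column-oriented stores lay data out. The 64-byte line size, the cache-line-aligned start of the array, and the function names are assumptions for illustration. It ignores prefetching and only counts lines, so treat it as a model rather than a benchmark.

```python
# Back-of-the-envelope model (not a benchmark): cache lines touched when
# summing one 8-byte field over every record of a table.
# Assumptions for illustration only: 64-byte cache lines, the array starts
# on a cache-line boundary, no prefetching, fixed-width packed records.

CACHE_LINE = 64  # bytes; assumed, typical of current x86-64 parts


def lines_touched_row(n_rows: int, record_size: int,
                      field_offset: int = 0, field_size: int = 8) -> int:
    """Row layout (array of structs): walk every record and collect the
    distinct cache lines that the target field falls into."""
    if field_offset + field_size > record_size:
        raise ValueError("field does not fit inside the record")
    touched = set()
    for i in range(n_rows):
        start = i * record_size + field_offset
        end = start + field_size - 1  # last byte of the field
        touched.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(touched)


def lines_touched_column(n_rows: int, field_size: int = 8) -> int:
    """Column layout (struct of arrays): the field is one contiguous run of
    n_rows * field_size bytes, so every line fetched is fully used."""
    total_bytes = n_rows * field_size
    return (total_bytes + CACHE_LINE - 1) // CACHE_LINE  # ceiling division


if __name__ == "__main__":
    n = 10_000
    for fields in (4, 8):  # number of 8-byte fields per record
        row = lines_touched_row(n, record_size=fields * 8)
        col = lines_touched_column(n)
        print(f"{fields} fields/record: row layout {row * CACHE_LINE} B, "
              f"column layout {col * CACHE_LINE} B")
```

For 10,000 rows this reports 320,000 vs 80,000 bytes with 4-field records and 640,000 vs 80,000 bytes with 8-field records: the wider the record, the more unused bytes a row-wise scan drags through the cache, regardless of how big the runtime doing the scan is.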
> There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?
Yes, you almost certainly shouldn't bother.
Spark/Hadoop/etc. are intended for massively distributed compute jobs, where the runtime overhead on an individual machine is trivial compared to the inefficiencies you might encounter from failing to orchestrate the work efficiently. They're designed to tolerate cheap heterogeneous hardware that fails regularly, so they make a lot of trade-offs that hamper getting to anything resembling peak hardware efficiency. You're talking about a runtime fitting in L1, but these are distributed systems that orchestrate work over a network... Your compute might run in L1, but the orchestration sure as heck doesn't. Consequently, they're not terribly efficient for smaller jobs. People tend to use them for tasks that are better addressed in other ways, which is unfortunate and frustrating.
Until you are dealing with such a problem, they're actually quite inefficient for the job... but that inefficiency is not a function of the JVM.
Measuring the JVM's efficiency with Spark is like measuring C++'s efficiency with Firefox.
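A toy cost model can illustrate why orchestration overhead swamps small jobs but gets amortized on large ones. Everything here is hypothetical: the function name, the even split of work into tasks, the per-task and fixed overhead figures, and the one-core, zero-overhead local baseline. None of it is measured from Spark or any real cluster.

```python
import math

# Toy wall-clock model of a distributed job versus running it on one core.
# Every number fed into it is hypothetical; nothing here is measured from
# Spark, Hadoop, or any real cluster.


def cluster_job_seconds(work_seconds: float, n_workers: int, n_tasks: int,
                        per_task_overhead: float, fixed_overhead: float) -> float:
    """Split the work evenly into n_tasks, run them in waves of at most
    n_workers at a time, charge each task a per-task orchestration cost
    (scheduling, serialization, network transfer) and the whole job a
    fixed start-up cost."""
    if n_workers < 1 or n_tasks < 1:
        raise ValueError("need at least one worker and one task")
    waves = math.ceil(n_tasks / n_workers)
    per_task_work = work_seconds / n_tasks
    return fixed_overhead + waves * (per_task_work + per_task_overhead)


if __name__ == "__main__":
    overhead = dict(per_task_overhead=0.5, fixed_overhead=10.0)
    # Small job: 2 s of single-core work, 4 tasks on 4 workers.
    print("small:", cluster_job_seconds(2.0, 4, 4, **overhead),
          "s vs 2.0 s locally")
    # Large job: 40,000 s of single-core work, 1,000 tasks on 100 workers.
    print("large:", cluster_job_seconds(40_000.0, 100, 1_000, **overhead),
          "s vs 40000.0 s locally")
```

With these made-up numbers the small job takes 11 s on the cluster against 2 s locally, while the large job drops from 40,000 s to 415 s. Where the crossover lands depends entirely on the overhead figures you plug in, not on the size of any one machine's runtime.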
> If I've got a faster tool that encourages the correct approach, why should I bother trying to figure these things out? Or, put perhaps more clearly: What do I gain with that 10MB?
If you read the documentation, the gains should be clear. If you are asking the question, likely the gains are irrelevant to your problem. I would, however, caution you to worry less about the runtime size and more about the runtime efficiency. The two are often at best tenuously related.