Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Make Lisp 15x faster than Python or 4x faster than Java (postabon.com)
113 points by smanek on Jan 28, 2010 | hide | past | favorite | 94 comments


I ran the Python script through Shedskin, a partial Python implementation that compiles Python to machine code via an intermediate layer of C++. I did not have to declare any types for any variables, or add declarations for speed/safety; Shedskin automatically determined the types.

The finished program went down in runtime from 533.7s on my machine, to 24.42 seconds. If we adjust for machine speed (since the author's Python ran the python code in 474s), the comparative optimized result would be 21.7 seconds compared to author's hand-optimized lisp code at 30.2.

Note that this is probably an ideal case for Shedskin (which is a relatively new product: a 1-man job that's just a few years old compared to, I guess, hundreds if not thousands of man-years of research that have gone in LISP implementations).

Still, an interesting result.



Very impressive! I haven't used Shedskin before, but I hope it continues to grow and improve.


There may be hundreds of man years of research gone into LISP, but a lot of that research found its way into shedskin too. And the Self research is in there too. And his own research (Shedskin was originilly his Masters).


the main techniques that I use for type inference were invented by ole agesen and john plevyak, who worked on self and concurrent aggregates, respectively (iirc). it may be they were influenced by research on lisp, but I doubt it.


It's better to use the actual JDK (i.e. the one everyone uses) rather than the implementation you get from debian apt-get due to debian's licensing hangups.

SBCL:

  * (test)

  Evaluation took:
    45.268 seconds of real time
Sun JDK

  Runtime was: 51086 Milliseconds

 
That's less than 15%, a far, far cry from '4 times'


The result of the distance() function is unused and can be optimized away by the compiler. Runs in 0.04s with LuaJIT 2.0. No type declarations needed. :-)

    local radius = 6371

    local function distance(latA, lngA, latB, lngB)
      local latAr = math.rad(latA)
      local lngAr = math.rad(lngA)
      local latBr = math.rad(latB)
      local lngBr = math.rad(lngB)

      local deltaLat = latBr - latAr
      local deltaLng = lngBr - lngAr

      return radius * 2 * math.asin(math.sqrt(
              math.sin(deltaLat/2)^2 + math.cos(latAr)
              * math.cos(latBr) * (math.sin(deltaLng/2)^2)))
    end

    local start = os.clock()

    for latA=-90,90,2.5 do
      for lngA=-90,90,2.5 do
        for latB=-90,90,2.5 do
          for lngB=-90,90,2.5 do
            distance(latA, lngA, latB, lngB)
          end
        end
      end
    end

    local stop = os.clock()
    print(stop-start)


Ha.. of course, the real problem is that Java and Python are actually doing all of the work to compute their trigonometric functions, while C libraries that Lisp uses probably have them mapped into a table.

Java does actually keep a table for trig functions on small values -- they just haven't widened the table definition, which is filed as a bug at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5005861

"I just performed some simple benchmarks of my own, and noticed that summing e.g. sqrt, sin and tan for small values are just as fast as their C equivalents. This was a great relief to me.

However, when computing sin in the range of pi/2 ... pi and 10 ... 10+pi/6 Java became 3-5 times slower, while the C program was just as fast. This is explained above by lack of precision, but for example sin(10.0) yields bit-to-bit exactly the same result in Java and C! This comparison was performed on a rather old 1GHz Mobile Intel Pentium, so I would assume that the precision has been fixed in virtually all modern processors, and therefore these operations could be intrinsified for a much larger value range (possibly the entire double range)."


  Java and Python are actually doing all of the work to compute their 
  trigonometric functions
Not true in the case of CPython. It uses the underlying C implementation. See line 270 onwards.

http://www.google.com/codesearch/p?hl=en#2T6lfGELm_A/trunk/M...


Interesting.. well, I'd think the trig functions would be by far the most expensive part of that program. Is Python really that much slower for control structures? That's literally the only other computation there.


Well, every instruction in a Python program requires running through the interpreter loop. Then you have to decode, get the opcode arguments, then actually do the operation. For calling C functions from Python, you also have to extract the C types, call the C function, and wrap the result back up. Its not hugely slow, but yeah, it adds up, and its fairly easy to see why it compares poorly to compiled versions.


>> sqrt, sin and tan for small values are just as fast as their C equivalents <<

So always use small values?

http://shootout.alioth.debian.org/gp4/program.php?test=parti...


I'm diving into Lua and loving every second of it. Seeing LuaJIT work its magic is wonderful. Thanks for it.


The most common way to compute your distance from a deal is to assume the earth is a perfect sphere

I thought you would only really be finding deals at distances small enough to just assume the earth is flat. Or at least, aren't those the only ones people would actually be interested in? If it's beyond my physical reach, I don't care how far it is. I'll get it shipped.

Or am I missing something?


You're absolutely right - in my experience the pythagorean theorem has less than ~50 meters difference from the Haversine formula for distances less than 20 miles or so. In the actual code I use a simple heuristic to decide when to use which calculation.

Like I mentioned in the post, the distance function presented is a bit of a simplification I invented for pedagogical purposes (my goal was to examine languages, not geometry).


Fair enough. But then that raises a more important issue: you are showing Lisp is faster than more popular languages in a way that probably doesn't matter much. When I read the title, I was hoping this isn't some benchmarking or academic exercise, which does not have relevance for real applications. So I was delighted to read that this is something you use in your webapp, only to be disappointed again now.

I don't mean to discourage you or be cynical. I love the fact that you are writing interesting articles on the practicalities of using Lisp with actual data behind it. So, keep up the good work. I look forward to more posts.


The distance the swallow flies doesn't seem very interesting for many cases, where the terrain forces you to take a much longer route due to bridges, train track crossings, etc. Wouldn't you be better of using the Google maps API to return the length of an actual route to the target?


Can you release the source data used for the benchmark?

I'd like to compare it to C.


There is no source data.

I just iterate through each latitude from -90 to 90 and each longitude from -180 to 180 in increments of 2.5 (for both the start and end). Check out the source code for any of the 5 examples (included in the post).

If you'd like, post the code up somewhere and I'll be happy to run it on the same machine and post the results along with the rest.


I also wonder why your java benchmark is slow. Java seems to be around 2-3x slower than C on average, this would make your optimized CL to be around 1.5x faster than C.


Define "slower". Languages don't have a fixed "speed" relation between them. Performance varies between machine, language, problem, and implementation technique.


Is any of that so remarkable that we should presume catch23 doesn't know?

A charitable response would help catch23 understand what's different about this one problem from what we can sensibly presume as common experience - not Sun Java, JVM trig functions.


Even if you assume a local mercator projection (centered on your business or whatever) you still need to know that the earth is spherical. Otherwise, you're likely to assume that an 0.1 degree change in latitude is the same distance as an 0.1 degree change in longitude. To correct this erroneous assumption (for small distances) divide your longitude differences by the sine of your latitude.


Are people still calling Lisp slow in 2010? That was somewhat true when most programming was done in C. Plenty of languages slower than Lisp have risen to popularity since then.


Indeed, I thought it was always much faster:

http://shootout.alioth.debian.org/u32/benchmark.php?test=all...

One of the guys behind R is also looking at moving to a LISP base after considering python:

http://www.springerlink.com/content/v38u176xp7j562m3/

That said, I will stick with python as I am not a programmer by profession and can't deal with a plethora of languages when I have a good swiss army knife as it is.


There are worse ones to pick.


Yeah, I find it an odd criticism too - but you'd be surprised how many people still bring it up.

I think most people played with Scheme/Lisp in one class in college in the 90's and have been soured on it ever since.


What does it mean to call a language "slow", though? It's all relative, and it depends on the problem set.

I'm wary of analogies, because they can be tortured into "proving" anything you like. But what's faster, a car or a bike? In a big city during rush hour, a bike could very well be faster than driving. Clearly on the freeway, the car wins out.


One interesting source of data is www.spoj.pl, since they keep records of the runtime of submissions to their site.

For example, take a look at the solutions to PRIME 1; the task is to generate a list of prime numbers: http://www.spoj.pl/ranks/PRIME1/

The top 20 C solutions all take less than 0.05 seconds. The top 20 Java solutions all take less than 0.5 second. Python has several solutions that took 0.55 seconds. But for Lisp, only one user was able to get it to less than 1 second, and then only barely.

There's not enough data to draw any real conclusions, but it does make me skeptical of claims that Lisp is near in speed to Java or C (any Lisp gurus who want to prove me wrong, feel free to code up a fast solution and submit it)


Actually, there is a solution that takes 0.52 seconds: http://www.spoj.pl/ranks/PRIME1/lang=LISP%20sbcl


There are two Lisp implementations on www.spoj.pl. One is Clisp which is bytecode compiled and rather slow and the other is SBCL which compiles to machine code and is the implementation used by the OP.

It looks as if you only looked at the former as the later has an entry at 0.52 (as mentioned by another commenter.)

A note about Common Lisp implementations is that there are many of them, all with different performance specs ideal use cases. Some are considered very fast, such as SBCL and CCL, others, not so much. Most of the data I've seen benchmarking Lisp against other languages uses one of the open source implementations. I'd like to see how a commercial Lisp, such as Allegro or Lispworks, performed, especially since those are the ones that have actually been worked on since the 80s/


I like the Lisp approach to optimize the program by adding hints to the runtime about types and the ability to tune the optimizer's aggressiveness from the code. It really makes possible to write the correct algorithms first, then later add appropriate optimizations simply by changing the optimized and unoptimized functions.

That said I think these kinds of benchmarks (and article titles) are misleading, because one who has deep knowledge in one language does not necessarily write good code in other languages. Even the article notes that "they are all just literal line-by-line translations of each other".


Thanks, that's interesting. My own main optimization problem is always reducing the memory usage of large in memory data sets. I found that languages like Java and Python, which don't support structured value types and use references/pointers everywhere, are basically unusable for my my purposes.

Now, I wonder whether I could use Lisp for what I do. One basic necessity would be to have a way to define a structure like this

  struct s {
    int32_t x;
    int32_t y;  
  };
and then create an array of that type as struct s a[size], which uses exactly size * sizeof(s) bytes in memory. A further requirement would be to have a library of data structures (balanced trees, lists, hash tables) that support storing such structs directly without using pointers everywhere.

Do you have any idea whether that could work with SBCL?


I think you can do that - although I'm not entirely certain how much overhead a struct (defstruct) has. Array allocation has no overhead, to the best of my knowledge (if you specify the element-type, etc).

I usually don't code at that low level, but you might want to check out www.lrde.epita.fr/~didier/research/verna.06.imecs.pdf


In all seriousness, how does one start with setting up and learning how to program a CL web app?



Great stuff thanks!

$Weekend--;


What platform are you on?

Let me know if you're on Win32 and I will package for you my setup, with a double-clickable installer, and your choice of Emacs or Win32 friendly IDE :-)


Even though I do 90% of my dev work in *nix, I'd be very interested in the emacs version of this (if only to satiate my geek hunger).


Thanks for the offer, but I'm afraid I have to decline so that I learn how start from scratch.

I'm on an EC2 Ubuntu instance.


By tomorrow you'll switch to (decf weekend)


And after randomly mutating variables breaks a few of your programs, you'll switch to "(plan-weekend-time (-1 weekend))".


Practical Common Lisp is also a very good start - http://gigamonkeys.com/book/


Looks great! Thanks.


The python benchmark is contaminated. Lots of <= checks, increasing variables, etc. Those things need to be left out of the timing. How about using itertools.product instead?

  for latA, lngA, latB, lngB in itertools.product(*[xrange(-90,91), xrange(-180,181)]*2):
      distance(latA, lngA, latB, lngB)
Not sure it's faster, but certainly more idiomatic.

Also, the 2nd python example doesn't work; you need to drop all the 'math.' and use radians, sqrt, etc directly.


On my laptop - Ubuntu 9.04/Python 2.6.2

Python without Psyco : 522

Python with Psyco : 190


How about benchmarking the python code with NumPy?


My understanding is that you have to rewrite your code to take advantage of NumPy. Unfortunately, I've never used NumPy before, so I don't know how to do so.

However, if you'd like to modify my code to do so I'd be happy to rerun the test and publish the results.


[deleted]


Yes ... but I don't know how to.

If you care to do so, I'll be happy to include the results.


Isn't it a much better idea to pick a distance comparison algorithm that's not so staggeringly slow to begin with? Why not convert (i.e. pre-compute) the lat/lons to 3-vectors (origin center of the earth) and then compare distances by pythagoras, no fuss, muss or trig and you can even skip the square root. If you need to display an actual accurate distance you can do the (significantly smaller amount of) trig to get something presentable.


It's valid if you want to understand what'll happen when you start getting random (lat,long) pairs thrown at you that you can't precompute.


His problem is 'i have a bunch of existing points, find and compare distances to new point'. Doing this by lat,lon and trig is known as one of the slowest ways to do it.


Um... no. From the article:

"I thought I'd demonstrate this functionality, with a bit of real world code. Postabon recommends deals to users based on a variety of factors such as the age of the deal, other people's votes, your preferences, and (relevant for this example) your distance from the deal. Most of the other factors can be computed asyncronously and just cached, but your distance from deals is computed a lot, and can't really be pre-computed since I have no way of knowing your location a priori (well, that's not strictly true, and we do do some memoization, but it has a relatively low hit rate)."


Yes. He has a bunch of existing points (the deals) and and a new point (user location). Deals are the known points, which can be represented any way you like. lat lon, vectors, cylindrical coords, you name it. The thing not known in advance is the user location. And don't umm... me. It's rude and stupid, especially when you don't seem to know what you're talking about.


Yup, you're right. I had my concepts in a twist.


Any guesses as to why the Java implementation is relatively slow? JNI boundary crossings for floating point arithmetic, or lack of use of the math coprocessor?


I don't know, but it's much slower than I expected. It's the latest Java installable by apt on Debian (all these tests are run in Debian running under VMWare).

  $ java -version
  java version "1.6.0_0"
  OpenJDK  Runtime Environment (build 1.6.0_0-b11)
  OpenJDK 64-Bit Server VM (build 1.6.0_0-b11, mixed mode)


Dohhhhh don't use OpenJDK :)

Get the sun version, it's probably faster.

Although it may not be much faster, found this unfixed bug report: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5005861


In more depth : the sun JDK probably does a lot better job of JIT compilation, it's the difference between running interpreted bytecode and compiled native code.

On the Hadoop project, checksumming was a major CPU constraint, because you have to assume every file everywhere is possibly on faulty hardware, and checksum everything. They found that it was actually faster to implement CRC32 in pure Java and let the JIT compilation do it's magic than it was to make native calls to zlibc.

But that may not apply to floating point if the Java implementation is bad enough.


Java sin and cos might not be using the machine hardware.

Plain vanilla use of Math.sin and Math.cos = 9.15s Constrain to the range +45 to -45 degrees = 4.88s

As demonstrated 20 months ago http://shootout.alioth.debian.org/gp4/program.php?test=parti...


It's an issue with the trig functions in Java. See this Factor vs Java comparison: http://factor-language.blogspot.com/2009/08/performance-comp...


And of course, Haskell can be faster than all of these.


http://gist.github.com/289645

I coded it in haskell, and it is faster, if it's a correct translation. Without parallelizing, it clocks at 1.8 seconds, compiled with $ ghc --make. If anyone plans to compile this you'll need to $ cabal install geodetic, before $ ghc --make filename.hs

1.8 Seconds seems a little too fast though...


It is. You never request the actual calculation to be done on each point, you only request that the last value be calculated. It takes 1.8 seconds to get to the end of the 4-variable cartesian product.

This was my theory, and I tested it by importing Debug.Trace, and then changing the line in question to:

    let result = [traceShow [j,k,l,m] (distance j k l m) | j <- rangeLat, k <- rangeLng, l <- rangeLat, m <- rangeLng]
This will print "show [j,k,l,m]" when the actual distance is required. When I run the program, it prints what you'd expect:

    [[-90.0,-180.0,-90.0,-180.0]
    0.0,[-90.0,-180.0,-90.0,-177.5]
    2.2737639212328448e-32,[-90.0,-180.0,-90.0,-175.0]
    9.095050533892612e-32,[-90.0,-180.0,-90.0,-172.5]
    2.046381347867479e-31]
    [90.0,180.0,90.0,180.0]
    0.0
The stuff in brackets is the traceShow output, and it's mixed in with the printing of each result. Lazy evaluation is; it calculates only what is necessary. In this case, it's every (j,k,l,m) quad (so it knows which one is last), and then the 5 values you request.

FWIW, your program runs in 14.28 seconds on my (old) machine, and the strict version runs in 48.94. (This includes a foldl' over the list that ensures every value is evaluated. It may add a bit of overhead.)


great, thanks for the follow up.



I don't think that "Not always" is a valid answer to "Haskell can be faster"; or rather, in the sense that it is a valid answer, I don't think your link proves it.


Of course. There are some benchmarks where Perl is faster than Java 6.


Why haven't you linked to them?


I want to discuss the nature of benchmarks, not which line of Java to change to make one program run one second faster.


Another, probably more detailed article on optimizing Common Lisp code in SBCL:

http://t-b-o-g.blogspot.com/2009/12/brians-brain-on-common-l...

It highlights the importance of profiling code first.


1. Looping with a floating point increment is silly here because you can't be sure how many iterations it takes.

2. C translation takes 11 seconds (compare to 25.6 for SBCL)

http://gist.github.com/288936#file_silly distance loop


Changing the loop indexes to integers is sillier because it makes the program not only different from the original one but also slower.


Incorrect code is silly. It is not unlikely that the different languages are performing a different number of iterations in these tests, correcting that by being precise about the range to be iterated over is not silly. The integer indices here produce the behavior that seems to be intended by the `<=` comparison in the originals.

The speed difference is negligible since all but the inner multiplication is hoisted out, and each inner loop takes over 250 cycles.


The speed difference is hardly negligible especially when talking about benchmarks. It's actually about 15% on my OS X laptop, less but still measurable on a Linux server. Your change actually has a greater impact on execution time than off-by-ones in the iteration counts which don't really take place.


15% is a bit higher than I benchmark, but here is a version that is guaranteed to do the correct number of iterations, increments in floating point instead of performing the slow cast, and is faster than performing the comparison in floating point math (because it occupies a different execution unit). Note that it is also less accurate than the cast for most definitions of what this function might be attempting to measure.

http://gist.github.com/288980


Is java known to be (about) 4x faster than python?


4x is low:

http://shootout.alioth.debian.org/u32q/benchmark.php?test=al...

Roughly speaking, anything time-critical needs to be done in c.


Thanks! You guys rock! Also, would you therefore guess most of Google's search engine is written in C?


I really don't know, but I would guess Google's search engine is almost all C/C++. When you are operating at that scale, it makes sense to invest huge programmer time for very small speed increases.

As to "normal" people, I do a lot of numerical computing using python+C. I end up with probably 90% of the code in python, and 10% inner loops in C. That's typically enough that the C code is taking, say, 50% of the time, at which it isn't worth it (for me) to move more python into C, since I could never buy myself more than a factor of 2 speedup.


Or you could, you know, use C++: http://stevehanov.ca/blog/index.php?id=95


I wish that blog post was a joke, as well as some of the comments there.


A couple of notes:

1.) A simple square macro will be faster than the expt function, because expt does special calculations for dealing with decimal exponents. All you really need to do is multiply it by itself. I think something like this:

(defmacro square (expression) (let ((symb (gensym))) `(let ((,symb ,expression)) (* ,symb ,symb))))

would work fine.

2.) you can turn degree to radian into a macro like so:

(defmacro degree-to-radian (deg) `(the single-float (* ,deg ,(/ (coerce pi 'single-float) 180))))

And avoid a couple of function calls and some multiplication. This way you also don't need to duplicate 'the single-float'...

On my machine, running clozure cl, this made it about 1/8th faster than the original optimized version.

3.) I was kind of wondering, could you do this sort of computation ahead of time for varying longitudes and latitudes, then store them in a table and do a simple look up and interpolation, instead of doing a lot of trigonometry (is this technically trigonometry?) at run time?

I'm not clear on the math exactly (whether it would work in this situation), but that sort of thing is done using pre-computed tables with fairly complex polynomial equations all of the time. In the case of a web app, it might be better to do this sort of thing (ram is cheap anyway).


> All the benchmarks were run 5x each on the same Debian machine

I refer the honourable gentlman to Zed Shaw's "Programmers Need To Learn Statistics Or I Will Kill Them All"

http://www.zedshaw.com/essays/programmer_stats.html


I'm actually a mathematician by training ;-)

I ran all tests 5 times each, and averaged the results. Each of the results for each of the tests were within 2% of each other, and the machine wasn't used for anything else while the tests were running.

I know it's not perfect, but I don't see any obvious sources of error ... I did provide all the source, so you're welcome to verify my results.


It shows, you are comparing optomized LISP with unoptomized Java and C. Java has a long startup process so it is relativly faster on 20 second benchmarks than sub second ones. As to C, it does what you tell it do do, so the compiler needs a little more handholding. I suspect LISP's compiler is not acctually doing the computations because they are never used, granted I have not really looked into it.


I don't count the startup time for any of the languages (Java included). And the runtime for Java was over 2 minutes.


Wow, that is slow. Well java's pow function uses floats on the exponent so it would be a lot faster if you used a temporary variable, and then multiplied it by its self. Other than that I don't know, most of these math functions should translate to a single ASM function so I don't know why it's that slow.

PS: I would probably inline the C function, but that's not a huge deal. My point was each language has its own optimizations; the only way to compare them is to write reasonably optimized programs in each of them and compare that.

Edit: I think Java's sin functions are slower than the HW implementation but more accurate, which is irrelevant because you don't care but it's something to look into. I dislike Java for other reasons, but it's easy to make code slow which code that looks very similar runs a lot faster.


... the only way to compare them is to write reasonably optimized programs in each of them and compare that.

That's one valid method. Another is to compare idiomatic programs from each language. Both are valid and both measure different things.


1. There is no C function.

2. Inlining in an actual C implementation doesn't matter because there is no register pressure and each trig function generates a real function call. Function calls to known addresses are really fast (~2 cycles), even indirect calls are fast compared to transcendental functions (~ 6 cycles). For comparison, the C implementation takes ~260 cycles each distance computation. This can't be improved much without vectorizing the trig functions.


1. There is no C function.

What do you call public static float distance(float latA, float lngA, float latB, float lngB) {

As I said but that's not a huge deal. Yes it's not a major slowdown, but a tight inner loop with a function call is begging for an inline.


> import java.util.Calendar;

might be your first clue...

> but a tight inner loop with a function call is begging for an inline.

The loop body makes several other function calls, you would be very hard pressed to measure the difference. And if it was a C function in a larger project, you probably wouldn't want to inline it because that would slow your compile times and cause you to lose modularity (i.e. you'd have to rebuild all client code rather than just update the DSO when you optimize the distance function). This function does enough work that it doesn't deserve to be inlined.


Yep, sorry, I was not really paying attention and saw what I was expecting to see.


I refer the honourable gentleman to the answer I gave a few moments ago :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: