Too often this type of optimisation ignores actual real-world usage, but credit to the author for showing that it can seriously improve ActiveRecord performance (albeit a 3-4x speedup rather than 200x). I have some Date-intensive models, so I might consider using this.
I got something like a 10x performance improvement (for custom code parsing stock quotes from CSV) merely by rewriting Ruby's Date class in Ruby. The default Ruby Date implementation is awfully slow.
How many other people have half-started implementations of the same thing? That's why it's important to publish your code somewhere, even if it's "not good enough". Someone else can make it good enough.
Date.parse is really incredibly slow, because it tries all the format parsers until it finds one that works. If the format you're trying to parse is low in that list, it's going to be very slow.
If you know your format (iso8601, etc), you're better off just using the right parser instead of Date.parse.
Also the internal representation uses Rational, which can be pretty slow itself.
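To make the point concrete, here's a small sketch (assuming a Ruby 1.9-era stdlib, where Date.iso8601 is available): both calls return the same Date, but Date.iso8601 dispatches straight to the ISO 8601 parser instead of guessing through the heuristics.

```ruby
require 'date'

s = "2011-06-15"

# Date.parse tries format heuristics until one matches; Date.iso8601
# (or Date.strptime with an explicit format) skips the guessing.
guessed = Date.parse(s)
direct  = Date.iso8601(s)

puts guessed == direct  # same Date, much less work on the parse path
```

Date.strptime(s, "%Y-%m-%d") works the same way if your format isn't one of the named ones.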
I implemented just the subset of the Date API I needed for the application, so it didn't do conversions to other calendar formats (Julian/ordinal/commercial etc.).
I don't know why their version is so slow, I can only imagine that the conversion to Astronomical Julian Day (the Ruby Date internal representation) is expensive, inefficient or both.
I used year/month/day ints as internal representation, which was fast for my use cases (parse date string, format date string, compare two dates).
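A minimal sketch of that idea (SimpleDate is a hypothetical name, not the commenter's actual class): store year/month/day as plain Integers and implement only the three operations mentioned, with no Rational and no Julian Day conversion.

```ruby
# Dates as three Integers: enough for parse, format, and compare.
class SimpleDate
  include Comparable
  attr_reader :year, :month, :day

  def initialize(year, month, day)
    @year, @month, @day = year, month, day
  end

  # Parse "YYYY-MM-DD" directly into ints.
  def self.parse(str)
    y, m, d = str.split("-").map { |p| Integer(p, 10) }
    new(y, m, d)
  end

  def to_s
    format("%04d-%02d-%02d", year, month, day)
  end

  # Comparing (year, month, day) tuples lexicographically gives
  # correct date ordering in this representation.
  def <=>(other)
    [year, month, day] <=> [other.year, other.month, other.day]
  end
end
```

Nothing here validates calendars or handles other date systems, which is exactly the trade-off the commenter describes.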
Alternatively, if you use a high-level language like Haskell that can compile to machine code, you can rewrite something like regexp matching (like regex-tdfa[1]) and get better performance than what the stdlib gives you on some inputs.
Yeah, exactly. I rarely have to call out to C in my Haskell applications. When I do, it's only to get or send data to a C library that I don't want to rewrite.
Internally, things like Data.Vector work just like the C equivalent. The internal representation of the data is the same (i.e., Vector.Unboxed Double == a big block of memory with CDoubles in it), and the machine instructions that run on the data are the same.
High-level languages need not be slow. It's just that Perl, Python, Ruby, and PHP are.
Right, so if you cannot write solid C code, don't use a C-based Ruby runtime, because not being able to write a fast library when you need one cripples you as a programmer.
I don't know if it cripples you, but it certainly shrinks the set of problem domains you can tackle effectively. A lot of the time, though, you'll get the biggest speed improvement just by looking for an appropriate algorithm.
Sure, and when I have chosen a good algorithm, I want to implement it in a language/runtime that is suitable for algorithmic code, not in Ruby or Python or PHP. To be able to do that is something I expect of myself and everyone I work with.
It's obviously not a formal distinction. Algorithmic means "spends a lot of time running loops doing something interesting" as opposed to "copies data back and forth between the UI and the DB". Or CPU intensive versus IO intensive.
[Edit] And just to be completely clear and also reap all the downvotes that I can get from disgruntled web devs: Python and Ruby are indeed totally unsuitable for writing CPU intensive, complex algorithms. If you write Ruby or Python code, avoid loops! Try to be declarative so all the loops stay in the C code.
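To illustrate the "keep the loops in C" advice, here's a sketch: summing an array with an explicit Ruby while loop versus the built-in Enumerable method. In MRI the iteration machinery of inject runs inside the C implementation, while the while loop executes every step in the interpreter (the exact speed difference depends on the Ruby version).

```ruby
nums = (1..1_000_000).to_a

# Explicit loop: every iteration goes through the Ruby interpreter.
sum = 0
i = 0
while i < nums.length
  sum += nums[i]
  i += 1
end

# Declarative: the iteration loop lives in the C implementation
# of Enumerable#inject; only the addition touches Ruby semantics.
sum2 = nums.inject(:+)

puts sum == sum2
```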
In addition to the other comments, I will add that this is only viable in languages that are much slower than C. If you take Haskell, OCaml, Go, Java, C#, Common Lisp, Factor and so forth, the speed increase is much lower, since these languages are fast by themselves. Thus, the barrier of having to write and maintain a C implementation on the side is just as high, but there is no fruit tree on the other side of the barrier for you to harvest.
Languages in which CPU-intensive work is fairly expensive are the ones that benefit the most from a C rewrite. Ruby is notoriously slow in that regard, which makes it a prime candidate. And before people scold me: synthetic benchmarks, like the alioth shootout, are a good indicator of exactly the CPU-intensive power of a given language. Note, however, that CPU usage is not the only constraint on a modern program. Which is, as an aside, why Ruby and PHP are viable options for writing code in.
Ruby is written in C. However, many of the standard libraries are written in Ruby itself. The author of this gem has gone and converted one of the Ruby ones to C for performance reasons.
Most Ruby developers will never use C unless they really care about extracting every last drop of performance from something. In most production deployments I've seen, the only gem that is compiled from C is the mysql one, for example.
> Most Ruby developers will never use C unless they really care about extracting every last drop of performance from something
Except having lots of legacy Ruby libraries written in C ignores the real problem and prevents the Ruby VM from evolving.
For this reason alone Ruby will never have the capabilities of the JVM ... which has it all ... kernel threads (no GIL), async IO and userspace threads (through libraries like Kilim) ... making all kinds of parallelism / concurrency paradigms feasible (in some tests Kilim scales better than Erlang).
For most web apps this doesn't matter because the heavy processing is done by the DB, but have you ever seen a usable DB written in Ruby?
Here's a freakishly scalable one written in Java with no C libs dependencies ... http://neo4j.org/
I wish people would start supporting projects like Rubinius more, whose main bottleneck is its limited support for C extensions. For speed improvements it pays better long-term to invest in the VM rather than patching bottlenecks by dropping to C (which IMHO actively hurts the platform).
You're comparing a many-thousand-man-year VM (the HotSpot JVM) to one that has been largely written by one person in isolation (the Ruby 1.8 series).
Ruby 1.9 has 90% of the features you mention above and RBX will soon remove the GIL on top of it. Async IO is arguably further along in Ruby than in Java.
Ruby will never be as fast as Java, but micro-optimizations like the OP will not hold back the inevitable progress of Ruby VMs.
> You compare a many thousand man years VM (HotSpot JVM) to one that has been largely written by one person in isolation (Ruby 1.8 series).
LuaJIT was also written by "one person in isolation" and yet it beats Java in some benchmarks.
There are techniques first researched in the Smalltalk/Self implementations that have been known for at least 20 years (roughly the same time Ruby was born) that could've been used in Ruby 1.9.
> Ruby 1.9 has 90% of the features you mention above and RBX will soon remove the GIL on top of it.
You're talking as if there's a list of checkboxes that just needs to be checked.
It doesn't work that way.
One reason the GIL is here to stay is because of the many libraries written in C that depend on it.
Another reason would be performance degradation on single threaded programs ... removing the GIL on the whole needs major architectural changes ... just putting fine-grained locks on all mutable data structures won't cut it.
And yet another would be the garbage collector, which also needs to be optimized for true multi-threading, otherwise it becomes a bottleneck. So add this to your list ... Ruby also needs a top-notch generational garbage collector.
> Async IO is arguably further along in Ruby than in Java
It's odd to point to C extensions when talking about the need for a GIL.
The solution seems obvious: wrap them in a mutex by default and introduce an optional API call to remove the lock.
This way the important libraries will be fixed up gradually by the community, and after a while you won't have any C extensions left that use the mutex, and therefore the GIL will be gone.
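The "locked by default, opt out explicitly" scheme can be sketched at the Ruby level (ExtensionGuard and its API are hypothetical names for illustration, not a real VM interface): every call into an extension is serialized through one shared mutex unless the extension declares itself thread-safe.

```ruby
# One shared lock standing in for the GIL that C extensions depend on.
GUARD = Mutex.new

module ExtensionGuard
  # Wrap an extension call. By default it runs under the shared lock;
  # an extension audited for thread safety opts out and runs freely.
  def self.call(thread_safe: false)
    if thread_safe
      yield
    else
      GUARD.synchronize { yield }
    end
  end
end
```

As extensions are audited one by one, they flip thread_safe to true, and the shared lock gradually stops being taken at all.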
Useful libraries written in other languages (C in this case) can benefit the platform (Ruby). The most common case for this is for speed benefits, but there are other possible goals as well.
To paraphrase the JRuby team "We write Java so you don't have to".
I can't speak for PyPy, but Rubinius isn't a project of ruby-core... and MRI is written in all C. And MRI isn't the latest version of Ruby, YARV is. (MRI -> 1.8, YARV -> 1.9)
Finally, Rubinius is architected totally differently than YARV. Rubinius uses a JIT, YARV is just an interpreter.
Because you can prototype more easily, then implement what you're happy with in C.
It's funny, for some language communities (e.g. Java) being "self-hosted" is a big deal. I can't speak for Ruby, but in the Python world we simply don't consider it a major issue.
Sort of. I have no problem with core stuff being "C" because I am not using "C" to use that library, I am using the HLL to do what I would normally do. Python does this as well (probably Perl and Tcl too).