Too often this type of optimisation ignores actual real-world usage, but credit to the author for showing that it can seriously improve ActiveRecord performance (albeit a 3-4x speedup rather than 200x). I have some Date-intensive models, so I might consider using this.
I got something like a 10x performance improvement (for custom code parsing stock quotes from CSV) merely by rewriting Ruby's Date class in Ruby. The default Ruby Date implementation is awfully slow.
How many other people have half-started implementations of the same thing? That's why it's important to publish your code somewhere, even if it's "not good enough". Someone else can make it good enough.
Date.parse is really incredibly slow, because it tries all the format parsers until it finds one that works. If the format you're trying to parse is low in that list, it's going to be very slow.
If you know your format (iso8601, etc), you're better off just using the right parser instead of Date.parse.
Also the internal representation uses Rational, which can be pretty slow itself.
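To make the point concrete, here's a small sketch (assuming a Ruby 1.9-era stdlib, where Date.iso8601 is available): both calls return the same Date, but Date.iso8601 dispatches straight to the ISO 8601 parser instead of guessing through the heuristics.

```ruby
require 'date'

s = "2011-06-15"

# Date.parse tries format heuristics until one matches; Date.iso8601
# (or Date.strptime with an explicit format) skips the guessing.
guessed = Date.parse(s)
direct  = Date.iso8601(s)

puts guessed == direct  # same Date, much less work on the parse path
```

Date.strptime(s, "%Y-%m-%d") works the same way if your format isn't one of the named ones.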
I implemented just the subset of the Date API I needed for the application, so it didn't do conversions to other calendar formats (Julian/ordinal/commercial etc.).
I don't know why their version is so slow, I can only imagine that the conversion to Astronomical Julian Day (the Ruby Date internal representation) is expensive, inefficient or both.
I used year/month/day ints as internal representation, which was fast for my use cases (parse date string, format date string, compare two dates).
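A minimal sketch of that idea (SimpleDate is a hypothetical name, not the commenter's actual class): store year/month/day as plain Integers and implement only the three operations mentioned, with no Rational and no Julian Day conversion.

```ruby
# Dates as three Integers: enough for parse, format, and compare.
class SimpleDate
  include Comparable
  attr_reader :year, :month, :day

  def initialize(year, month, day)
    @year, @month, @day = year, month, day
  end

  # Parse "YYYY-MM-DD" directly into ints.
  def self.parse(str)
    y, m, d = str.split("-").map { |p| Integer(p, 10) }
    new(y, m, d)
  end

  def to_s
    format("%04d-%02d-%02d", year, month, day)
  end

  # Comparing (year, month, day) tuples lexicographically gives
  # correct date ordering in this representation.
  def <=>(other)
    [year, month, day] <=> [other.year, other.month, other.day]
  end
end
```

Nothing here validates calendars or handles other date systems, which is exactly the trade-off the commenter describes.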
Alternatively, if you use a high-level language like Haskell that can compile to machine code, you can rewrite something like regexp matching (like regex-tdfa[1]) and get better performance than what the stdlib gives you on some inputs.
Yeah, exactly. I rarely have to call out to C in my Haskell applications. When I do, it's only to get or send data to a C library that I don't want to rewrite.
Internally, things like Data.Vector work just like the C equivalent. The internal representation of the data is the same (i.e., Vector.Unboxed Double == a big block of memory with CDoubles in it), and the machine instructions that run on the data are the same.
High-level languages need not be slow. It's just that Perl, Python, Ruby, and PHP are.
Right, so if you cannot write solid C code, don't use a C-based Ruby runtime, because not being able to write a fast library when you need one cripples you as a programmer.
I don't know if it cripples you, but it certainly shrinks the set of problem domains you can tackle effectively. A lot of the time, though, you'll get the biggest speed improvement just by looking for an appropriate algorithm.
Sure, and when I have chosen a good algorithm, I want to implement it in a language/runtime that is suitable for algorithmic code, not in Ruby or Python or PHP. To be able to do that is something I expect of myself and everyone I work with.
It's obviously not a formal distinction. Algorithmic means "spends a lot of time running loops doing something interesting" as opposed to "copies data back and forth between the UI and the DB". Or CPU intensive versus IO intensive.
[Edit] And just to be completely clear and also reap all the downvotes that I can get from disgruntled web devs: Python and Ruby are indeed totally unsuitable for writing CPU intensive, complex algorithms. If you write Ruby or Python code, avoid loops! Try to be declarative so all the loops stay in the C code.
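To illustrate the "keep the loops in C" advice, here's a sketch: summing an array with an explicit Ruby while loop versus the built-in Enumerable method. In MRI the iteration machinery of inject runs inside the C implementation, while the while loop executes every step in the interpreter (the exact speed difference depends on the Ruby version).

```ruby
nums = (1..1_000_000).to_a

# Explicit loop: every iteration goes through the Ruby interpreter.
sum = 0
i = 0
while i < nums.length
  sum += nums[i]
  i += 1
end

# Declarative: the iteration loop lives in the C implementation
# of Enumerable#inject; only the addition touches Ruby semantics.
sum2 = nums.inject(:+)

puts sum == sum2
```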
In addition to the other comments, I will add that this is only viable in languages that are much slower than C. If you take Haskell, OCaml, Go, Java, C#, Common Lisp, Factor and so forth, the speed increase is much lower, since these languages are fast by themselves. Thus, the barrier of having to write and maintain a C implementation on the side is just as high, but there is no fruit tree on the other side of the barrier for you to harvest.
Languages in which CPU-intensive work is fairly expensive are the ones that benefit the most from a C rewrite. Ruby is notoriously slow in that regard, which makes it a prime candidate. And before people scold me: synthetic benchmarks, like the alioth shootout, are a good indicator of exactly the CPU-intensive power of a given language. Note, however, that CPU usage is not the only constraint on a modern program. Which is, as an aside, why Ruby and PHP are viable options for writing code in.
Ruby is written in C. However, many of the standard libraries are written in Ruby itself. The author of this gem has gone and converted one of the Ruby ones to C for performance reasons.
Most Ruby developers will never use C unless they really care about extracting every last drop of performance from something. In most production deployments I've seen, the only gem that is compiled from C is the mysql one, for example.
> Most Ruby developers will never use C unless they really care about extracting every last drop of performance from something
Except having lots of legacy Ruby libraries written in C ignores the real problem and prevents the Ruby VM from evolving.
For this reason alone Ruby will never have the capabilities of the JVM ... which has it all ... kernel threads (no GIL), async IO and userspace threads (through libraries like Kilim) ... making all kinds of parallelism / concurrency paradigms feasible (in some tests Kilim scales better than Erlang).
For most web apps this doesn't matter because the heavy processing is done by the DB, but have you ever seen a usable DB written in Ruby?
Here's a freakishly scalable one written in Java with no C libs dependencies ... http://neo4j.org/
I wish people would start supporting projects like Rubinius more, whose main bottleneck is its limited support for C extensions. For speed improvements it pays better long-term to invest in the VM rather than patching bottlenecks by dropping to C (which IMHO actively hurts the platform).
You're comparing a many-thousand-man-year VM (the HotSpot JVM) to one that has been largely written by one person in isolation (the Ruby 1.8 series).
Ruby 1.9 has 90% of the features you mention above and RBX will soon remove the GIL on top of it. Async IO is arguably further along in Ruby than in Java.
Ruby will never be as fast as Java, but micro-optimizations like the OP will not hold back the inevitable progress of Ruby VMs.
> You compare a many thousand man years VM (HotSpot JVM) to one that has been largely written by one person in isolation (Ruby 1.8 series).
LuaJIT was also written by "one person in isolation" and yet it beats Java in some benchmarks.
There are techniques first researched in the Smalltalk/Self implementations that have been known for at least 20 years (roughly the same time Ruby was born) that could've been used in Ruby 1.9.
> Ruby 1.9 has 90% of the features you mention above and RBX will soon remove the GIL on top of it.
You're talking as if there's a list of checkboxes that just needs to be checked.
It doesn't work that way.
One reason the GIL is here to stay is because of the many libraries written in C that depend on it.
Another reason would be performance degradation on single threaded programs ... removing the GIL on the whole needs major architectural changes ... just putting fine-grained locks on all mutable data structures won't cut it.
And yet another would be the garbage collector, which also needs to be optimized for true multi-threading, otherwise it becomes a bottleneck. So add this to your list ... Ruby also needs a top-notch generational garbage collector.
> Async IO is arguably further along in Ruby than in Java
It's odd to point to C extensions when talking about the need for a GIL.
The solution seems obvious: wrap them in a mutex by default and introduce an optional API call to remove the lock.
This way the important libraries will be fixed up gradually by the community, and after a while you won't have any C extensions left that use the mutex, and therefore the GIL will be gone.
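The "locked by default, opt out explicitly" scheme can be sketched at the Ruby level (ExtensionGuard and its API are hypothetical names for illustration, not a real VM interface): every call into an extension is serialized through one shared mutex unless the extension declares itself thread-safe.

```ruby
# One shared lock standing in for the GIL that C extensions depend on.
GUARD = Mutex.new

module ExtensionGuard
  # Wrap an extension call. By default it runs under the shared lock;
  # an extension audited for thread safety opts out and runs freely.
  def self.call(thread_safe: false)
    if thread_safe
      yield
    else
      GUARD.synchronize { yield }
    end
  end
end
```

As extensions are audited one by one, they flip thread_safe to true, and the shared lock gradually stops being taken at all.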
Useful libraries written in other languages (C in this case) can benefit the platform (Ruby). The most common case for this is for speed benefits, but there are other possible goals as well.
To paraphrase the JRuby team "We write Java so you don't have to".
I can't speak for PyPy, but Rubinius isn't a project of ruby-core... and MRI is written in all C. And MRI isn't the latest version of Ruby, YARV is. (MRI -> 1.8, YARV -> 1.9)
Finally, Rubinius is architected totally differently than YARV. Rubinius uses a JIT, YARV is just an interpreter.
Because you can prototype more easily, then implement what you're happy with in C.
It's funny, for some language communities (e.g. Java) being "self-hosted" is a big deal. I can't speak for Ruby, but in the Python world we simply don't consider it a major issue.
Sort of. I have no problem with core stuff being "C" because I am not using "C" to use that library, I am using the HLL to do what I would normally do. Python does this as well (probably Perl and Tcl too).