Switching from Common Lisp to Julia (tpapp.github.io)
140 points by fanf2 on Oct 15, 2017 | 72 comments


1. The standard doesn’t guarantee double float arrays will be efficient.

Most implementations do it (CCL SBCL CMUCL ECL LW ACL SCL ...), except for a few unpopular implementations. Seven do what you want, but a couple don’t. What gives? Julia has neither a standard nor more than one implementation.

2. No efficient parametric/ad hoc polymorphism.

You can get far with inlining. You can also use libraries specifically built with this in mind. But you’re right, it’s not a built-in feature. But scientific codes are generally not super polymorphic anyway. Look at all the FORTRAN.

But besides that, Lisp lets you build the language you want to use, and have it interoperate with the rest of the ecosystem. If you are surprised that building a language is work, then Lisp may not be the choice for you. If you’re willing to design the language you want for numerical computing, Lisp has the essential data types and primitives to allow you to do that, even portably.

If you want off-the-shelf features of Julia, then using Julia is more optimal than Lisp. If you want a language that will readily adapt without changing under your feet, without risking your programs not working 2, 10, or 25 years from now, then Lisp might be a good choice.


> Lisp lets you build the language you want to use, and have it interoperate with the rest of the ecosystem.

I suspect this strength of Lisp could have undermined the improvement of Lisp compilers in the long term. That is, the fact that programmers can customize the language to run efficiently for their applications puts less pressure on making the existing compiler better, compared to languages that don't give programmers such flexibility.

I've worked on performance-sensitive commercial Common Lisp applications. We employed heavy macrology so that optimal instructions were generated in the performance-critical regions. Effectively, it was reimplementing part of the compiler to deal with our domain-specific meta information such as parameterized types. It worked, but came with the cost of maintenance, as if we were maintaining another layer of the compiler. It was a burden. I'm sure similar efforts have been spent in many other places and eventually abandoned.

Meanwhile, languages that give less power to the programmer put a lot of effort into improving the compiler and/or making the language work more effectively with the improved compiler, and over a few decades they have ended up with quite sophisticated compilers.

I still mostly use Lisp-family languages at work, but sometimes wonder if the power could have an adverse effect in the long term.


This is an excellent reply and I think it goes to the crux of the problem you were facing.

Perhaps, then, Julia makes sense in those cases.

Or, it's time for an enhanced Common Lisp implementation targeted specifically for scientific computing. I agree that "extending the language" is one thing and "implementing things that the compiler should have given me" is another.


SBCL is already suitable for scientific computing. All that's needed are good engineers who can write good libraries.


'Scientific computing' is a wide area. One recent shot at it is CLASP, especially to reuse scientific computing tools from the C++ world:

https://github.com/drmeister/clasp


For DSLs (domain-specific languages) I use R for my data work. For general computing I used to be a Python and Lua person, but now I use Racket, because I can create my own DSL for what I am doing. https://beautifulracket.com/

I totally get the reason why he would go to Julia for doing data jobs. Data structures and ready-made libraries are very valuable.


Scientific programming is heavily polymorphic. You just need to look at the interface to most well used libraries like BLAS.

Fortran and C conventionally deal with this by adding the relevant type signature to the function name.

Scientific programming is also regularly polymorphic in two (often parametric) types. This heavily motivated the design choices in Julia, especially the multiple dispatch infrastructure that is descended from Dylan.


Most scientific codes are in the end working on large arrays of floats. There are exceptions, but many scientific codes don’t at all resemble, say, general purpose template libraries.

You can make them generic, but it’s not often you gain anything at all.


That's not true. At all. Almost all scientific libraries resemble template libraries but with bespoke, exposed name mangling and significant portions of the internal data structures specified in comments.

They do this, not because it's inherently a good thing, but because the advantages of using Fortran or C have outweighed the disadvantages. But where that's not true, for example in end user computing, Fortran has not been popular for decades.

What the Julia devs are attempting to do is to design a language that has the ergonomics of a Matlab or R to make scientific programmers productive whilst having the performance of a Fortran or C. They have a sophisticated type and dispatch system to help bring these goals together.

Now, their current focus appears to be more on the R/Matlab replacement than on becoming the language of choice for HPC, say. So the steady state performance is unlikely to be competitive at scale as yet.

Maybe they won't get there. Maybe the decisions they've made mean that it's effectively impossible to eke out those last few percent, so we'll still have a two-language model. I don't know, but I applaud them for trying.

More, I applaud the way they have gone about their development: focus on usage, actively learning from other languages, very little linguistic dogma. I hope they succeed.


You should check out the Celeste project [1]. Julia is a more than viable option for HPC.

[1] https://juliacomputing.com/case-studies/celeste.html


I was aware, but thank you for the link.

The fact that it can be used in an important use case is a necessary but not sufficient condition that it can replace the incumbents. Same goes for end user computing.

Now, it gives me great confidence that, without the apparent short term focus of the core devs in this field, there has been such progress as Celeste. This is a good sign that the fundamentals are right.

But I'm probably more encouraged that the community are focused on user experience and delivering an environment that scientists want to use. This is essential, IMO, and appears to be working well.


Short term focused work on Celeste, by at least one of the core devs, came at a very significant opportunity cost of not having an up-to-date, usable Julia debugger.


The Julia v0.5 debugger works. Why upgrade to v0.6 when the debugger is important? I'm sure the debugger will target v1.0. And maybe it was more fun (read: interesting) to work on Celeste than to support short-lived intermediary versions?


While I agree that things like arbitrary precision are niche, there's lots to gain in standard uses. Let me point out a few areas where that's already very clear:

1. Allowing arbitrary array types means your algorithm can already work on the GPU via GPUArrays.jl (https://github.com/JuliaGPU/GPUArrays.jl). This means that all differential equation, optimization, numerical linear algebra, etc. routines which internally utilize the functions which GPUArrays implements will automatically compile a special version when encountering a GPUArray that will do all computations on the GPU without any data transfer back and forth.

2. Matrix types allow the user to specify algorithms for linear solving. Numerical linear algebra is an entire field based around coming up with good algorithms for solving linear systems (\), and the whole point of the field is to specialize on the type of matrix you have. With Julia, you can set it up so that the user passes what type of matrix they have (Tridiagonal, sparse, PETSc for multi-node parallelism, or a special matrix type with \ overloaded to do multigrid) and the optimization, differential equation, etc. routine will use the fast linear solver routine specific to the problem. This makes it easy to expose very large performance gains and specialization opportunities to the user in ways that are usually automatic (i.e. sometimes the user doesn't even have to know!).

3. Complex numbers. Large fields of scientific computing (physicists and engineers) use complex numbers. Many libraries have pitiful support for complex numbers. Generic typing helps a lot here.

Those are 3 off the top of my head, and there are many more diffeq-specific examples that have come up as users have asked to have resizing control models with discrete variables mixed in with continuous variables, etc., and this can all be handled efficiently via the type system without having to restructure the core algorithms to handle these extra features.

The key thing here is that users can add new features specific to their problem into your algorithm as they need to, without having to modify your code, just by smart uses of the type system.


A lot of Julia code is not only generic in the element type, but also with respect to the array type: a linear solver written in Julia works on classical matrices, but also on banded real matrices or sparse arrays of BigFloats.


After reading shiro's comment below and the comments made by Tamas (the article author), I think I understand in more depth the essence of the problem, and there's a pointed criticism made by the author: due to performance considerations, they need the compiler to deeply understand parametric types and apply code optimizations accordingly.

Thus in this case Julia is a sane choice, unless some valiant, progressive Lispers want to fork one of the implementations and create a new Common Lisp implementation focused on high performance scientific computing...


>You can get far with inlining. You can also use libraries specifically built with this in mind.

This implementation of parametric types for Common Lisp looks nice:

https://github.com/cosmos72/cl-parametric-types

It allows you to do templating which additionally declares the (parametrized) types, so the compiler can optimize accordingly.

Worth a look.


The author mentions trouble with portability and reliance on standard guarantees, and then goes on to talk about Julia, a single-implementation language that is "rapidly evolving" with breaking changes and no standard to speak of?


You don't need to support multiple implementations when writing a library when there's only one

Whereas to ignore CL & stick to SBCL will be working to fracture the community-- perhaps necessary, but CL is already a niche community


>will be working to fracture the community--

The CL community knows that one uses the implementation one needs. Lispers use ECL when they need to embed with C code, ABCL to interact with the JVM, and CCL when they're on Macs. This is not fragmentation at all: they are all compatible with the CL standard.


If Julia gets more popular, it is very likely that other Julia implementations will get developed.


Not necessarily. R doesn't have other implementations, for example, and Julia still has some way to go before it catches up with R popularity-wise. Or consider Python - it does have several implementations, but the vast majority of the ecosystem is centered around one, and many libraries don't work on the others in practice (especially anything written in other languages using the extensibility API).


There are other R implementations, see https://radfordneal.wordpress.com/2013/07/24/deferred-evalua... for some pointers.


How many of these are used in production?


Well, technically it can be argued that R is itself an implementation (of the language S). It's just that the alternative implementation has become the de facto standard and replaced the original...


That's because the "single-implementation language that is "rapidly evolving" with breaking changes and no standard to speak of" still offers better guarantees than the respective CL ecosystem.


I still don’t know what these guarantees are. The Lisp ecosystem is extremely well understood and very stable.


Single implementation > standard clusterfucks. Look at Java, Perl, Python, Ruby, Php, etc., vs. CL, Scheme, C, C++...


I really don't understand how you're trying to compare those languages.

The first language you name in your "single implementation"-list is Java, which is standardized and has multiple implementations.

Maybe it's "quality" or "lack of bloat"? That can't be, because one of the largest clusterfucks in those departments, PHP, is on the left.

Maybe you're talking about popularity? The list of "standard clusterfucks" includes C, C++ and (though you don't mention it) JavaScript, which are some of the most popular programming languages in the world.

What are you trying to say?


Which standard clusterfuck? Common Lisp is an ANSI standard, and code usually compiles and runs on one implementation after another without any change.


And to add to that, most non-standardized features like threads have de facto standard implementations that libraries rely on.


Any language that actually matters in the market reaches a point where it has multiple implementations.


I don't remember Java going down this path. Also the Julia license is very permissive which undermines the commercial case for, and removes the community imperative for alternate implementations.


Java always had multiple implementations since the early days.

That was one of the ways how Sun made money with it, by certifying implementations for the trademark symbol.


Java actually has a lot of implementations. [0]

Some examples you might know:

HotSpot, the base for Java SE and OpenJDK

OpenJ9, now run by Eclipse.

DoppioJVM, a JVM for JavaScript.

[0] https://en.wikipedia.org/wiki/List_of_Java_virtual_machines#...


What path? Java has many compatible implementations and I've never heard of anyone having problems because of this.


CPython, PyPy, Jython, IronPython, MicroPython, Nuitka, ..


Yet most of those don’t actually play well with one another, with various major and minor incompatibilities.


Are there any languages where the different implementations "play well with each other"? Perhaps C? I remember all the hassles of working with different Fortran compilers (and C++ compilers) simply because of different function name mangling in the .o files.

Which major incompatibilities are you thinking of between CPython and PyPy, which are the two I use most often? Are they more severe than the incompatibilities between the different Common Lisp implementations, which the author describes?

(I regard C extensions as implementation-specific features outside of the Python language. The different Common Lisp implementations also have implementation-specific features which are not portable.)

MicroPython is deliberately not Python compatible (differences at http://docs.micropython.org/en/latest/pyboard/genrst/index.h... ) but the others have a goal of not having major incompatibilities.


>Are there any languages where the different implementations "play well with each other"?

Yes: Common Lisp, Java, C and C++, to give four examples. Javascript as well. All these languages have formal standards, which helps a lot.

>Which major incompatibilities are you thinking of between CPython and PyPy, which are the two I use most often? Are they more severe than the incompatibilities between the different Common Lisp, which the author describes?

PyPy attempts to implement the full Python 2.7 language but there are things missing. You argue that Python's CFFI features are "implementation-specific", let's just agree with this; but the Python 2.7 language also defines -as part of the spec- many modules, and PyPy does not implement all of those modules, so it's not implementing the full set of features defined by that spec.

Code written in Common Lisp should run (and often runs) correctly with no change in all mature Common Lisp implementations (and there are many of them), unless the code uses implementation-dependent features. Those implementations implement the full set of features defined by the ANSI Common Lisp standard.


Ahh, I see you have a different definition of "play well with each other" than I do. I was thinking of things like having two different C++ compilers generate .o files which can then be linked together. Since the ABI isn't part of the C++ specification, there have been many problems (see https://stackoverflow.com/questions/7492180/c-abi-issues-lis... ) in getting two different C++ compilers to work together. The platform-specific ABIs help, a lot, but don't solve all the problems.

Regarding missing modules, which ones? And don't C compilers have similar issues? https://en.wikipedia.org/wiki/C99#Implementations helpfully points out Clang "Supports all features except C99 floating-point pragmas", and Microsoft's compiler doesn't implement tgmath.h.

I thought also that most Python code runs correctly with no change in CPython and PyPy, where "correctly" includes not using implementation-dependent features like third-party C extensions or garbage collection behavior.


Java has many implementations: and it has a standard. Why is it on the left hand side of this equation, rather than being a strong counterexample to your argument?

And you really want to use PHP as an example of a good situation?


> The standard does not guarantee that this gives you an array of double-float: it may (if the implementation provides them), otherwise you get an array of element type T. This turned out to be a major difficulty for implementing portable scientific code in Common Lisp.

Wouldn't sticking to say SBCL (that does give you specialized arrays) be equivalent to using a one-implementation language that does the same?


For yourself, maybe. If you intend to share the code you write (or use code written by others), then implementation differences start to matter.


In the case of missing specialized float arrays, the worst that is going to happen is performance degradation. It's not like they are some non-standard extension.


Common Lisp has standard ways to define that code must run on particular implementations that support a certain feature. There is nothing wrong with developing libraries supported by just a few compilers.
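
A minimal sketch of what that can look like, using only standard facilities (UPGRADED-ARRAY-ELEMENT-TYPE, SUBTYPEP and the #+/#- reader conditionals). The helper names SPECIALIZED-DOUBLE-FLOAT-ARRAYS-P and MAKE-DOUBLE-VECTOR are made up for illustration, and the list of implementations in the feature expression is just an assumption about what a library might choose to support:

```lisp
;; Hypothetical helper: ask the implementation how it upgrades DOUBLE-FLOAT.
(defun specialized-double-float-arrays-p ()
  "True if the upgraded element type for DOUBLE-FLOAT is still a subtype
of DOUBLE-FLOAT, i.e. the array is specialized rather than general (T)."
  (values (subtypep (upgraded-array-element-type 'double-float)
                    'double-float)))

;; Hypothetical helper: the array is valid either way, only speed differs.
(defun make-double-vector (n)
  "Make a zero-filled vector of N double-floats, warning when the
implementation falls back to a general array."
  (unless (specialized-double-float-arrays-p)
    (warn "DOUBLE-FLOAT arrays are not specialized here; expect boxed elements."))
  (make-array n :element-type 'double-float :initial-element 0d0))

;; Read-time conditional: this WARN form is skipped entirely on SBCL and CCL
;; and is only read (and evaluated at load time) on other implementations.
#-(or sbcl ccl)
(warn "This sketch assumes SBCL or CCL; other implementations are untested.")
```

The code still runs everywhere; the check only makes the performance assumption explicit instead of leaving it silent.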


Slightly off-topic, but what's your preferred stack for developing data-intensive applications that involve a lot of preprocessing, heavy statistics and machine learning plus a web front-end?

Julia sounds quite appealing as a replacement for MATLAB or R with some Dylan-like semantics plus types and really efficient code generation on LLVM. I wish the Racket - Chez merger had led to something that targeted LLVM, to be able to do front-end and back-end stuff using Scheme.

A JVM-centric stack is one of my preferred alternatives. Clojure is great for data preprocessing and manipulation, plus ClojureScript for coding all front-end. Datomic, core.spec, core.logic, anglican, plumatic.plumbing just to name a few are a joy to use. Then there's Scala, which is also a great asset, and tons of fantastic Java / Scala libraries like Stanford NLP, Markov Logic Networks (Tuffy, RockIT...), Factorie, Deeplearning4j, etc. Sadly, Scala-Clojure interop is not very good.

Python is the other obvious option, with tons of good libraries, including a fantastic data analysis ecosystem built around NumPy, SciPy, Matplotlib and Pandas. Plus most deep learning libraries targeting Python first. I just feel the language doesn't scale that well, although things like Numba or Cython help.


Have you considered Lua? You can develop web apps with Lapis [1] and do your data analysis with Torch [2].

The tradeoffs compared to Python: LuaJIT is much faster, which should ease your scaling worries, but the ecosystem is not as developed.

[1] http://leafo.net/lapis [2] http://torch.ch/


It’s really not fair to compare CPython with LuaJIT.

If you compare LuaJIT with PyPy or Numba, the “much faster” argument will simply not be true. What’s different is that both of the above only cover a part of the language, but for specialised (e.g. numeric) application that’s often not a problem.


LuaJIT is often on par with C. How would that make it slower than PyPy?


> If you compare LuaJIT with PyPy or Numba, the “much faster” argument will simply not be true

Source? AFAIK it does hold true. Last time I benchmarked those, LuaJIT was significantly faster, with PyPy almost being on par with vanilla Lua, due to Python being a much more complex language and harder to optimise.


I think LuaJIT is generally faster, not only because Python is harder to optimize.


Just want to note a ton of the torch people have largely moved to pytorch. Lua with LuaJIT probably works for some people, but I don't see it really growing in marketshare over python..


This is true, but the reason is just people already knowing python / python being more popular. Lua is a much simpler language than python or julia though, I'd absolutely still recommend it for data scientists who are new to programming.


Right, but with a good enough FFI and wanting to actually deploy models in some way, having the infrastructure to deploy general purpose applications in the same language (even if it's slower than, say, Go or the JVM) is really appealing. Lua itself, while simpler, owns a very different part of the application space than Python does. There's a reason more people know it.

I would actually recommend against lua because of the lack of libraries for doing every day data science tasks.

I mean look at what facebook had to do to justify using lua, it had to invent the notebook for it.


Hi,

My team builds deeplearning4j. We're aware of the massive demand for python and built a bridge to our tensor library: https://github.com/deeplearning4j/jumpy

This library does direct pointer mapping between our JNI based tensor library and cython (no network!)

So you could off load some of your work to the JVM using pyjnius (which this library uses underneath)

It's not a full solution yet but it's definitely a start to something promising!

We also import python models. We only support keras right now but our new autodiff library (samediff which will also be usable from python!) will handle onnx and tensorflow.

For visualization we tend to use zeppelin which has worked well enough in practice. If you have any specific suggestions or use cases I'm more than glad to take input though. We would love to build a python friendly JVM backend.

Other notable work in this space is what wes is doing with arrow. We are looking at using their tensor interop (it's still kinda green field yet..but it holds promise!) to do zero copy ETL between python and java. That should help as well.


R plus C++ or Fortran. R comes with a whole bunch of stuff built in for preprocessing, stats, it has some web front ends, etc..., and it's trivial to hook up Fortran or C++ code.


Note: I generally applaud Julia, it was designed taking many good choices.

However, I really don't understand how the author complains that CL doesn't "have" certain features that in fact are available by choosing a suitable CL implementation. But no, he prefers switching to a language with no standard (yet) and only one implementation...


He mentioned that they have them; the problem is that they are just not kept up to date or standardized, since there is only one person developing each library.


>"The standard does not guarantee that this gives you an array of double-float: it may (if the implementation provides them), otherwise you get an array of element type T."

Well, many Common Lisp implementations, including most of the famous ones like SBCL, CCL, and ABCL, will give you exactly...

... an array of double-float!!

So where is the problem?

>"However, this gets worse: while you can tell a function that operates on arrays that these arrays have element type double-float, you cannot dispatch on this, as Common Lisp does not have parametric types."

Well, there are many options. First, let me reiterate that an array of a certain element type, stays of that element type. Example:

    CL-USER> (defparameter *a* (make-array 0 :element-type 'double-float ))
        *A*
    CL-USER> (type-of *a*)
        (SIMPLE-ARRAY DOUBLE-FLOAT (0))

I'm going to give the simplest, quick & dirty options:

If you're using arrays of different element types, option A is to write your function and use the ETYPECASE macro to dispatch on the element type, so that you select the code that is correct for each element type. Use DECLARE so the Lisp compiler knows the type of your array and its elements, and thus produces optimal machine code.

Option B is simple, just define classes or structs, one for each array type you intend to use; and then take advantage of CLOS dispatch so invoking the generic function dispatches to the correct code for each array.

These are two options which don't even need any macro solutions; there are many more options as well, i'm just proposing two.
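
A rough sketch of both options, assuming an implementation that specializes double-float arrays (SBCL, CCL and the like). The names SUM-DOUBLES, SUM-ELEMENTS, DOUBLE-VEC and TOTAL are made up for illustration:

```lisp
;; Typed worker: the declarations let the compiler emit unboxed float code.
(defun sum-doubles (array)
  (declare (type (simple-array double-float (*)) array))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length array) sum)
      (incf sum (aref array i)))))

;; Option A: dispatch on the array's element type with ETYPECASE.
(defun sum-elements (array)
  (etypecase array
    ((simple-array double-float (*)) (sum-doubles array)) ; fast path
    (vector (reduce #'+ array))))                         ; generic fallback

;; Option B: wrap each array type in a struct and let CLOS dispatch on it.
(defstruct double-vec
  (data (make-array 0 :element-type 'double-float)
        :type (simple-array double-float (*))))

(defgeneric total (container)
  (:documentation "Sum of the elements held by CONTAINER."))

(defmethod total ((v double-vec))
  (sum-doubles (double-vec-data v)))
```

Usage would be along the lines of (sum-elements (make-array 3 :element-type 'double-float :initial-contents '(1d0 2d0 3d0))) or (total (make-double-vec :data some-double-array)). Note that on an implementation that upgrades double-float arrays to element type T, the first ETYPECASE clause would match any simple vector, so the fast path is only safe where the arrays are really specialized.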

Option C, more elegant, could be perhaps using Fare's "LIL" (Lisp Interface Library)? https://common-lisp.net/~frideau/lil-ilc2012/lil-ilc2012.htm...

I really think Julia is a nicely designed language, and is a language I often tell people to take a look at. However, if the author already has a program in Common Lisp, and has good experience of Common Lisp (which, by the way, means the author might be a quite skilled programmer), why not dig deeper into the facilities that Common Lisp brings to solve those problems?

NOTE: This is an edited version of the comment i left on the page.


Keeping in mind that Julia appears to be very very close to common lisp, some minor differences that you appear to lose going over:

- 1) lisp 1 vs lisp 2

- 2) Matlab syntax (y tho?), infix, and sigils everywhere

- 3) CLOS and metaobject protocol and ability to adjust these as easily at run-time

You do appear to gain some things though in all honesty. I'm trying to program up a dataframe/analytics type program/library in Common Lisp as we speak, and I'll admit, getting that stuff right and efficient is hard work. Gives you immediate respect for anyone who has done the same in other languages.

Additionally:

- 1) Parametric types and dispatch on them

- 2) Potentially inlining and compiler optimisations and integration around their generic functions. My gut says SBCL would need some compiler magic and metaobject protocol type stuff to do the same, and it would no longer be standard Common Lisp. That's not necessarily a bad thing, there are some warts around in-builts/objects/generic functions in Common Lisp, but it would be a fracturing of the community to update/change them.


You’d only be able to inline if you know that your call-site is monomorphic. Also, there’s a Lisp library [0] for inlining generic functions that gives you huge efficiency gains for certain use-cases (especially monomorphic ones). I couldn’t imagine this being a _library_ in any other language.

[0] https://github.com/guicho271828/inlined-generic-function


Julia does this automatically, or you can force it with @inline


1) lisp 1 vs lisp 2

Eh. Emacs vs Vim.

2) Matlab syntax (y tho?), infix, and sigils everywhere

The target audience is scientific and mathematical users, not necessarily "programmers", and certainly not "software engineers". Matlab syntax, infix, and sigils read like math notation. The target use-case is mathematical and scientific programming. This is a benefit in my opinion.

Also, with respect to sigils, I can think of at least one popular sigil in Common Lisp: the #' reader macro for accessing functions! So much for Lisp-1 vs Lisp-2.

3) CLOS and metaobject protocol and ability to adjust these as easily at run-time

Fair, but do you really need CLOS with a delightful type system and multiple dispatch?


And to be totally fair to point 3, I doubt that 99% of Common Lisp code either uses or needs these qualities.

Structures, types, generics, etc, do cover a lot of what people actually want in practice...


> Structures, types, generics, etc, do cover a lot of what people actually want in practice...

For scientific computing, no. However, it is very useful in other areas such as web, agents, and GUI, where you have many methods to combine in many ways and OOP really fits in.


MATLAB syntax is pretty convenient for linear algebra. It makes things very readable.

Maybe Julia should be seen as an infix typed Lisp DSL for math. It'd make a good stack (see my other comment) with Clasp or any other Common Lisp that offers good bindings to LLVM.


CL-INFIX [0] gives you infix syntax in Lisp a la carte.

[0] https://github.com/rigetticomputing/cmu-infix/blob/master/RE...


There is already a lisp-based infix DSL for math: it is called maxima CAS.


Hi there,

I am also working on a data frame like library, using Tamas's experimental cl-dataframes as a starting point.

I'd be interested in learning a bit more about your effort - what's the motivation, why something like clml was not appropriate and so on. There might be an opportunity to collaborate?

Cheers


The problem I have with Lisp for scientific computing is that libraries matter. You really want to standardize not only what you are calling but also your notation because most of the people that you want to use the libraries are scientific domain experts and not "programmers": they may know like 20 functions and that's it (and that's fine!).

Lisps tend to love the "you can build anything in it! You can make all of your own syntax in it!" and I think that goes too far. Julia lets you do this with macros, but then there's a good convention to not overuse this. There has to be a balance between being too structured (which hurts innovation) and too dynamic (which makes it hard to learn someone's new library because everything is too different), and I think Julia strikes the right balance.


It is my biggest complaint with Common Lisp that, while the language supports efficient optimization based on type information, it is often a bit cumbersome to provide it. However, if your target is to write high performance numeric code, you should consider SBCL first of all. It is excellent at using type information and has a very good type inferencer, so the task is a bit easier, and it produces excellent code.
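
To give an idea of what providing that type information looks like, here is a small sketch of a dot product with standard declarations. The function name DOT is just an example, and how much a given compiler gains from the declarations is implementation-dependent:

```lisp
(defun dot (x y)
  "Dot product of two double-float vectors (uses the shorter length)."
  ;; Standard declarations: argument types plus an optimization policy.
  (declare (type (simple-array double-float (*)) x y)
           (optimize (speed 3) (safety 1)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (min (length x) (length y)) sum)
      (incf sum (* (aref x i) (aref y i))))))
```

With the argument types declared, a type-inferring compiler can work out that every intermediate value in the loop is a double-float, so only the arguments and the accumulator need annotations.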



