
Is Julia seriously targeting the R user base? Or, if it is honest with itself, is it going after the Matlab people first? My sense is that engineers (Matlab users) and to a certain extent scientists (where Python rules) will be drawn in, but the stats crowd has subtly different priorities, which seem to be alluded to here. Graphics are one such priority. The excellent ggplot2 gets all the glory, but base graphics are mega-fast, extremely robust, and deeply customizable, and there is the unsung hero of grid graphics, which provides an extraordinary set of abstractions for building things like ggplot2 and lattice. My point is that this depth of graphical capability speaks directly to one of the key requirements of statisticians: at the end of an analysis, communication is usually required. This is much less the case for engineers or financiers (big Matlab users), for example, where correct and fast answers are the endpoint. Where is Julia on graphics? Last time I checked it was still trying to interface to R and/or matplotlib.

The other thing that intrigues me is Julia treating scalar computations as "at least" as important as vectors. This has the whiff of for loops, an immediate eye-roller for R people accustomed to vectorization everywhere and, essentially, exclusively. I am not suggesting that Julia doesn't do vectors well, just that, like any set of priorities, it is not catering first to statisticians, whose requirements are often quite different from those of the scientists and engineers who use Matlab and Python.




In regard to graphics, there is really nice work being done by Daniel Jones. His Gadfly.jl package (https://github.com/dcjones/Gadfly.jl) implements the grammar of graphics and can already produce high-quality plots similar to ggplot2's.
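For a flavour of what that looks like, here is a minimal sketch of Gadfly usage (the data and labels are illustrative, not taken from the package docs):

    using Gadfly

    # Grammar-of-graphics style: data bound to aesthetics (x, y),
    # plus geometry and guide elements.
    plot(x = rand(100), y = rand(100),
         Geom.point,
         Guide.xlabel("x"), Guide.ylabel("y"))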

I will also disagree with your statement that Julia does not cater to statisticians. As datasets grow, performance becomes more and more important, and this essentially means that statisticians implementing new methods in R packages have to drop down to C++ via Rcpp anyway. This is exactly the problem Julia tackles. Of course, you could argue that practitioners do not care about the trouble the method developer went through to optimize the code via Rcpp. But the practitioner will simply use whichever language offers the statistical tools he needs, so Julia will draw him in if it can first attract the method developers.
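To make that concrete, here is a toy sketch (wmean is an illustrative name, not an existing API) of the kind of inner kernel an R package author would otherwise drop into C++ via Rcpp, written in plain Julia:

    # Plain-Julia inner loop; runs at native speed, no FFI required.
    function wmean(x::Vector{Float64}, w::Vector{Float64})
        s, sw = 0.0, 0.0
        @inbounds for i in eachindex(x, w)
            s += w[i] * x[i]
            sw += w[i]
        end
        return s / sw
    end

    wmean(randn(10^6), rand(10^6))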

There are also the people working in systems biology, who usually use Matlab for all of their simulations, modelling, data fitting, etc. Yet some of the data fed into the models, such as gene expression data, is much easier to analyze and normalize with the packages available in R/Bioconductor. Here Julia could also provide a unifying interface, so you would not need to constantly switch languages.


For my own use case (5-10GB-plus financial tick datasets), I find Numpy outperforms Julia 3:1. R users who need performance have already found other ways (like me). I think this Julia performance argument is hugely overrated. When I don't need Numpy, R gives me more than everything I will ever need, and I don't see where these developers are going to come from to reinvent the wheel for a language which does not target their user base. I just don't see what Julia brings to the table. Now, if it had gone the whole hog and given us full infrastructure for ingesting massive streaming data, it would have skated to where the puck is going to be (see Clojure/Storm). As it stands, the only main benefit I see for Julia is as an open-source (and better) replacement for Matlab.


Actually, I think your point rather showcases that Julia could be useful to the statistics community! For performance reasons you occasionally have to abandon R for Numpy, but for everything else you use R. So, in a sense, you still have the problem Julia tries to solve: you constantly have to jump between two different languages depending on the problem size.

Also, just as you chose Numpy, someone else might choose Julia, so this turns into a Numpy-versus-Julia comparison. And here I feel (this is not really an argument, just my gut feeling) that Julia might be better at attracting people with a non-CS background who want to implement statistical methods or analyze their biological datasets.


Yes and yes. Julia is better because, having seen the mistakes of three 20-year-old languages, it is creating an (incrementally) better Python/R/Matlab base language (not libraries). Had Julia dropped into my lap 5-10 years ago, I would have been a fool to use anything else. The fact is, though, it is very late to a party where everyone has already danced the night away, just as the music is about to change (to massively parallel, multi-node functional).


I think (this is gut feeling, not authority!) that Julia is very well placed for the massively parallel, multi-node functional future of which you speak. Lazy.jl and cluster implementations such as Spark.jl are evolving.


With true coroutine support, it is only a matter of time before someone writes a pytoolz-like package for Julia.
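As a rough sketch of what that might look like, built on Julia's Channel-backed coroutines (the names here are illustrative, not an existing package):

    # An infinite lazy sequence backed by a coroutine (Channel);
    # values are produced only as the consumer demands them.
    naturals() = Channel() do ch
        i = 1
        while true
            put!(ch, i)
            i += 1
        end
    end

    # Only the first five values are ever produced:
    collect(Iterators.take(naturals(), 5))  # => [1, 2, 3, 4, 5]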


"It's only a matter of time" in about any language. This isn't a valid argument.


Julia does vectors well, but scalar computations are very important. The problem with vector computations:

    z = x*(v+w)
This involves a memory allocation for v+w, and a second one for the product of x with that result. One way to avoid this in Numpy is careful use of the out argument:

    import numpy as np
    z = np.zeros(shape=x.shape, dtype=float)
    np.add(v, w, out=z)       # z = v + w, written in place
    np.multiply(x, z, out=z)  # z = x * z, written in place
An even faster way, which traverses the arrays only once and is more readable in my opinion, is a simple for loop:

    for i = 1:length(x)
        z[i] = x[i] * (v[i] + w[i])
    end
This also lets you more easily put if statements inside loops, etc. You can also do accumulative calculations without creating intermediate arrays at all. For example, one way to create a random walk (here in Python, with scipy) is:

    from numpy import cumsum
    from scipy.stats import bernoulli
    walk = cumsum(bernoulli(p).rvs(1024))  # p: success probability
    end_result = walk[-1]
Another way, this time in Julia, avoiding the intermediate array entirely:

    end_result = 0
    for i=1:1024
      if rand() < p
        end_result += 1
      end
    end


This is not a problem with vector computations but with the particular implementation. The vector expression in your example is by far the most readable: it is literally the math you had in mind.

De-vectorizing code is like embedding assembly: you only write it out because the compiler is inadequate. Good language design should favor lucid, concise syntax, and a good compiler implementation should make it unnecessary to circumvent that syntax for performance. In this case, the compiler should implement those vector expressions without allocating unnecessary memory.


I've sort of been assuming that Julia is capable of inlining the vectorized notation into a single traversal over the array, at least for simple types. Is that not true?

EDIT: Not yet, I guess. http://pastebin.com/Tw5PuCcJ
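For reference, a sketch of what fused vector syntax looks like once dot-broadcast fusion is available (an assumption here: a Julia version with the @. macro, which landed in 0.6, after this thread):

    # The @. macro rewrites the right-hand side into a single fused
    # loop, so no temporary array is allocated for v .+ w.
    v, w, x = rand(1000), rand(1000), rand(1000)
    z = @. x * (v + w)    # one traversal, one allocation (z itself)

    # With preallocated output there are no allocations at all:
    z2 = similar(x)
    @. z2 = x * (v + w)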


This problem is addressed by numexpr, which also handles multithreading.


Gadfly interfaces with d3/SVG, which, while currently at a primitive stage, will ultimately be extremely powerful.

While Julia 'prefers' for loops according to the docs, I tend to write things in vectorized form. I have a toy neural-net library I play around with (https://github.com/ityonemo/julia-ann/blob/master/NN.jl); you'll note that the nnlog function is one line, vectorized, and the meat of the evaluator is a dot product. I did once write a benchmark comparing the unvectorized, for-loop form of these matrix multiplications, and it was slower than the vectorized form (I didn't bother trying to figure out why).
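For illustration, a sketch in the spirit of that one-line vectorized function (the actual definition in NN.jl may differ):

    # Vectorized logistic; the dots broadcast elementwise.
    nnlog(x) = 1 ./ (1 .+ exp.(-x))

    # A layer evaluation whose meat is a matrix-vector product:
    evaluate(W, b, x) = nnlog(W * x .+ b)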



