
Is Julia seriously targeting the R user base? Or, if it is honest with itself, is it going after the Matlab people first? My sense is that engineers (Matlab users) and to a certain extent scientists (where Python rules) will be drawn in, but the stats crowd has subtly different priorities, which seem to be alluded to here. Graphics are one such priority. The excellent ggplot2 gets all the glory, but base graphics are mega-fast, extremely robust, and deeply customizable, and there is the unsung hero of grid graphics, which provides an extraordinary set of abstractions for building things like ggplot2 and lattice. My point is that this depth of graphical capability speaks directly to one of the key requirements of statisticians: at the end of an analysis, communication is usually required. This is much less the case for engineers or financiers (big Matlab users), for example, where correct and fast answers are the endpoint. Where is Julia on graphics? Last time I checked it was still trying to interface to R and/or matplotlib.

The other thing that intrigues me is Julia treating scalar computations as "at least" as important as vectors. This has the whiff of for loops, an immediate eye-roller for R people accustomed to vectorization everywhere and, essentially, exclusively. I am not suggesting that Julia doesn't do vectors well, just that, like any set of priorities, it is not catering first to statisticians, whose requirements are often quite different from those of the scientists and engineers who use Matlab and Python.




In regard to graphics, there is really nice work being done by Daniel Jones. His Gadfly.jl package (https://github.com/dcjones/Gadfly.jl) implements the grammar of graphics and can already produce high-quality plots similar to ggplot2's.
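For a flavour of what that looks like, here is a minimal sketch of Gadfly usage (the data and labels are illustrative, not taken from the package docs):

    using Gadfly

    # Grammar-of-graphics style: data bound to aesthetics (x, y),
    # plus geometry and guide elements.
    plot(x = rand(100), y = rand(100),
         Geom.point,
         Guide.xlabel("x"), Guide.ylabel("y"))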

I will also disagree with your statement that Julia does not cater to statisticians. As datasets grow, performance becomes more and more important, and this essentially means that statisticians implementing new methods in R packages have to drop down to C++ via Rcpp anyway. This is exactly the problem Julia tackles. Of course, you could argue that practitioners do not care about the trouble the method developer went through to optimize the code via Rcpp. But the practitioner will simply use whichever language offers the statistical tools he needs, so Julia will draw him in if it can first attract the method developers.
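To make that concrete, here is a toy sketch (wmean is an illustrative name, not an existing API) of the kind of inner kernel an R package author would otherwise drop into C++ via Rcpp, written in plain Julia:

    # Plain-Julia inner loop; runs at native speed, no FFI required.
    function wmean(x::Vector{Float64}, w::Vector{Float64})
        s, sw = 0.0, 0.0
        @inbounds for i in eachindex(x, w)
            s += w[i] * x[i]
            sw += w[i]
        end
        return s / sw
    end

    wmean(randn(10^6), rand(10^6))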

There are also the people working in systems biology, who usually use Matlab for all of their simulations, modelling, data fitting, etc. Yet some of the data fed into the models, such as gene expression data, is much easier to analyze and normalize with the packages available in R/Bioconductor. Here Julia could also provide a unifying interface, so you would not need to constantly switch languages.


For my own use case (5-10GB-plus financial tick datasets), I find Numpy outperforms Julia 3:1. R users who need performance have already found other ways (like me). I think this Julia performance argument is hugely overrated. When I don't need Numpy, R gives me more than everything I will ever need, and I don't see where these developers are going to come from to reinvent the wheel for a language which does not target their user base. I just don't see what Julia brings to the table. Now, if it had gone the whole hog and given us full infrastructure for ingesting massive streaming data, it would have skated to where the puck is going to be (see Clojure/Storm). As it stands, the only main benefit I see for Julia is as an open-source (and better) replacement for Matlab.


Actually, I think your point rather showcases that Julia could be useful to the statistics community! For performance reasons you occasionally have to abandon R for Numpy, but for everything else you use R. So, in a sense, you still have the problem Julia tries to solve: you constantly have to jump between two different languages depending on the problem size.

Also, just as you chose Numpy, someone else might choose Julia, so this turns into a Numpy-versus-Julia comparison. And here I feel (this is not really an argument, just my gut feeling) that Julia might be better at attracting people with a non-CS background who want to implement statistical methods or analyze their biological datasets.


Yes and yes. Julia is better because, having seen the mistakes of three 20-year-old languages, it is creating an (incrementally) better Python/R/Matlab base language (not libraries). Had Julia dropped into my lap 5-10 years ago, I would have been a fool to use anything else. The fact is, though, it is very late to a party where everyone has already danced the night away, just as the music is about to change (to massively parallel, multi-node functional).


I think (this is gut feeling, not authority!) that Julia is very well placed for the massively parallel, multi-node functional future of which you speak. Lazy.jl and cluster implementations such as Spark.jl are evolving.


With true coroutine support, it is only a matter of time before someone writes a pytoolz-like package for Julia.
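As a rough sketch of what that might look like, built on Julia's Channel-backed coroutines (the names here are illustrative, not an existing package):

    # An infinite lazy sequence backed by a coroutine (Channel);
    # values are produced only as the consumer demands them.
    naturals() = Channel() do ch
        i = 1
        while true
            put!(ch, i)
            i += 1
        end
    end

    # Only the first five values are ever produced:
    collect(Iterators.take(naturals(), 5))  # => [1, 2, 3, 4, 5]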


"It's only a matter of time" in about any language. This isn't a valid argument.


Julia does vectors well, but scalar computations are very important. The problem with vector computations:

    z = x*(v+w)
This involves a memory allocation for v+w, and a second one for the product of x with that result. One way to avoid this in Numpy is careful use of the out argument:

    import numpy as np
    z = np.zeros(shape=x.shape, dtype=float)
    np.add(v, w, out=z)       # z = v + w, written in place
    np.multiply(x, z, out=z)  # z = x * z, written in place
An even faster way, which traverses the arrays only once and is more readable in my opinion, is a simple for loop:

    for i = 1:length(x)
        z[i] = x[i] * (v[i] + w[i])
    end
This also lets you more easily put if statements inside loops, etc. You can also do accumulative calculations without creating intermediate arrays at all. For example, one way to create a random walk (here in Python, with scipy) is:

    from numpy import cumsum
    from scipy.stats import bernoulli
    walk = cumsum(bernoulli(p).rvs(1024))  # p: success probability
    end_result = walk[-1]
Another way, this time in Julia, avoiding the intermediate array entirely:

    end_result = 0
    for i=1:1024
      if rand() < p
        end_result += 1
      end
    end


This is not a problem with vector computations but with the particular implementation. The vector expression in your example is by far the most readable: it is literally the math you had in mind.

De-vectorizing code is like embedding assembly: you only write it out because the compiler is inadequate. Good language design should favor lucid, concise syntax, and a good compiler implementation should make it unnecessary to circumvent that syntax for performance. In this case, the compiler should implement those vector expressions without allocating unnecessary memory.


I've sort of been assuming that Julia is capable of inlining the vectorized notation into a single traversal over the array, at least for simple types. Is that not true?

EDIT: Not yet, I guess. http://pastebin.com/Tw5PuCcJ
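For reference, a sketch of what fused vector syntax looks like once dot-broadcast fusion is available (an assumption here: a Julia version with the @. macro, which landed in 0.6, after this thread):

    # The @. macro rewrites the right-hand side into a single fused
    # loop, so no temporary array is allocated for v .+ w.
    v, w, x = rand(1000), rand(1000), rand(1000)
    z = @. x * (v + w)    # one traversal, one allocation (z itself)

    # With preallocated output there are no allocations at all:
    z2 = similar(x)
    @. z2 = x * (v + w)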


This problem is addressed by numexpr, which also handles multithreading.


Gadfly interfaces with d3/SVG, which, while currently at a primitive stage, will ultimately be extremely powerful.

While Julia 'prefers' for loops according to the docs, I tend to write things in vectorized form. I have a toy neural-net library I play around with (https://github.com/ityonemo/julia-ann/blob/master/NN.jl); you'll note that the nnlog function is one line, vectorized, and the meat of the evaluator is a dot product. I did once write a benchmark comparing the unvectorized, for-loop form of these matrix multiplications, and it was slower than the vectorized form (I didn't bother trying to figure out why).
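For illustration, a sketch in the spirit of that one-line vectorized function (the actual definition in NN.jl may differ):

    # Vectorized logistic; the dots broadcast elementwise.
    nnlog(x) = 1 ./ (1 .+ exp.(-x))

    # A layer evaluation whose meat is a matrix-vector product:
    evaluate(W, b, x) = nnlog(W * x .+ b)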



