GoLearn – Machine Learning in Go (sjwhitworth.com)
109 points by alixaxel on April 27, 2014 | 23 comments



I have found that the most difficult thing about doing machine learning in Go is the lack of a really good matrix library. I always end up using Python because I end up wanting to do something like "find eigenvectors." That's easy in Python, but no one has yet wrapped a nice interface to the BLAS/LAPACK libraries to do this from Go. (That I know of! If you know a good library for this, let me know!)

Numpy and Scipy are so mature in this respect it is difficult to compete with them. I recently looked into implementing eigenvalue algorithms, with the idea that I would just write a native Go library for this kind of stuff. However, reading the source of JAMA[1] was humbling enough for me to realize this was not a good idea. (If you really want to be humbled, try reading the Fortran implementations in LAPACK.[2] I believe SRC/dgeesx.f is a good starting point.)

[1] http://math.nist.gov/javanumerics/jama/
[2] http://www.netlib.org/lapack/#_software


When Go was released back in 2009, I decided I wanted to use it for my machine learning experimentation software (I was in year 4 of a PhD program). There was no matrix library, so I created one: https://github.com/skelterjohn/go.matrix



Ooo. I had not seen mat64 before. That looks very interesting. With respect to the BLAS lib, I believe I need LAPACK on top of BLAS.

Thanks for the pointers!

EDIT: The lack of a README and of documentation beyond the API docs concerns me for mat64. Still, it's a pretty interesting project. Might be useful for non-critical stuff.


I am one of the developers, and at the moment I wouldn't use it for critical stuff. We would like it to be good (not just functional), and that takes time. We are not at the "1.0" stage yet, and we make backward-incompatible changes from time to time. Specifically, if the proposal for tables is accepted, the package will change a lot (and it will be awesome). The CBLAS package should work well, and I believe all of the goblas functions that are there have good tests. In my opinion, mat64 needs a lot of work before it is a "premier" package (a bunch is missing and a bunch is slow). That said, I use it in my work and many parts of it are good. We are interested in making it better, but it's entirely volunteer-run and it all takes time. It would be great to have people contributing code, documentation, and bug reports, so I encourage you to use it for non-critical stuff.


> Numpy and Scipy are so mature in this respect it is difficult to compete with them.

Tell me about it. What I'd give to have something equivalent for Objective-C (or certain other languages too, e.g. Julia). I'm looking at PyObjC as a stop-gap solution for now, but it sure adds complexity to a project.

Edit: Since SciPy is BSD-licensed and presumably mostly C behind the scenes, perhaps there's potential for a group to try to package it up for other languages? I have no idea how large an undertaking like that would be...


At least with Julia's PyCall, you don't need to sacrifice access to the Python stack. You can work with NumPy arrays without needing to copy data around.


Thanks! I didn't realise this. This looks really useful.


The author of NumPy is building the next-generation NumPy, Blaze (http://blaze.pydata.org/docs/index.html). There are also many Python projects, such as numexpr, blz, and numpy, that aim to boost scientific computing in Python. Given its strong community, I think Python might dominate data analysis in the near future.


iOS and OS X ship with Accelerate.framework, which includes implementations of BLAS and LAPACK: https://developer.apple.com/library/mac/documentation/Accele...


Thanks – sadly, for myself, the algorithms I need aren't part of Accelerate (I've most recently been using SciPy for its spatial algorithms). The benefit of SciPy is that there's just so much breadth, along with an easy way of moving data between different parts of SciPy, and good documentation too. There's just nothing else like it that I know of.


GNU Scientific Library? I've not benchmarked it against Numpy/Scipy, but there is quite a bit of overlap in functionality.


It looks great. Sadly, it doesn't have the specific tools I need, and in any case the license (GPL) is prohibitive if I ever wanted to publish something on the App Store (even if I open-sourced it myself), so it's not really an option. Not that that's GSL's fault, of course.


There's a BLAS/LAPACK interface in biogo (https://code.google.com/p/biogo/). The maintainers had plans to turn that part into a standalone library; I have no idea how far along that is.


Considering this is four months old and the only method implemented is still just KNN, I think it's disingenuous to (a) call this a library and (b) write in such general terms (admittedly, hindsight is 20/20). I don't mean to detract from the premise behind starting the project, but language like "I couldn't find any comprehensive ML library for Go, so I decided to write one" has a bit more hubris than is warranted.


Indeed. Also, the KNN implementation is rather lightweight: it only has Euclidean distance and a basic matrix data structure.

Even just KNN needs several distance metrics built in (Manhattan, Hamming, Mahalanobis, to name a few) and a good ball tree implementation for large datasets, so that search time per query drops from O(N) to roughly O(log N).
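A minimal sketch of what "several distance metrics built in" could look like: the metric is a plain function value, so the same brute-force KNN works with any of them. The names `Distance`, `Euclidean`, `Manhattan`, `Hamming`, and `Predict` are hypothetical, not GoLearn's API. Mahalanobis is left out because it also needs an inverse covariance matrix, and there is no ball tree here, so every query still scans all N training points.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Distance is any dissimilarity between two equal-length feature vectors.
type Distance func(a, b []float64) float64

// Euclidean is the L2 (straight-line) distance.
func Euclidean(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return math.Sqrt(s)
}

// Manhattan is the L1 distance: the sum of absolute coordinate differences.
func Manhattan(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += math.Abs(a[i] - b[i])
	}
	return s
}

// Hamming counts positions that differ; suited to categorical codes.
func Hamming(a, b []float64) float64 {
	n := 0.0
	for i := range a {
		if a[i] != b[i] {
			n++
		}
	}
	return n
}

// Predict returns the majority label among the k training points nearest to q
// under dist. It evaluates dist against every training point (brute force);
// a ball tree is what would avoid that full scan. Ties in the vote go to the
// label that reached the winning count first, scanning from nearest outward.
func Predict(train [][]float64, labels []string, q []float64, k int, dist Distance) string {
	order := make([]int, len(train))
	d := make([]float64, len(train))
	for i, x := range train {
		order[i] = i
		d[i] = dist(x, q)
	}
	sort.SliceStable(order, func(a, b int) bool { return d[order[a]] < d[order[b]] })
	if k > len(order) {
		k = len(order)
	}
	counts := map[string]int{}
	best, bestCount := "", 0
	for _, i := range order[:k] {
		counts[labels[i]]++
		if c := counts[labels[i]]; c > bestCount {
			best, bestCount = labels[i], c
		}
	}
	return best
}

func main() {
	train := [][]float64{{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}}
	labels := []string{"a", "a", "a", "b", "b", "b"}
	fmt.Println(Predict(train, labels, []float64{0.5, 0.5}, 3, Euclidean)) // a
	fmt.Println(Predict(train, labels, []float64{5.5, 5.5}, 3, Manhattan)) // b
}
```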


Cool project.

There are already a couple of machine learning libraries[1][2] written in Go, and some of them are actually more mature than GoLearn.

Also, just curious: I always thought Go is not really a good language for DM/ML stuff due to the lack of a good matrix library and of generics. If someone here has actually tried to write an ML library in Go, what's your genuine feeling about it?

[1] https://github.com/huichen/mlf
[2] https://github.com/xlvector/hector


I wrote a decision tree library (random forests, gradient boosting, etc) in Go while learning the language (https://github.com/ajtulloch/decisiontrees).

It's nice being able to trivially parallelise operations in Go, e.g. constructing the weak learners for a random forest, generating candidate splits, or recursing down left and right branches.

    // Recur down the left and right branches in parallel
    w := sync.WaitGroup{}
    recur := func(child **pb.TreeNode, e Examples) {
        w.Add(1)
        go func() {
            *child = c.generateTree(e, currentLevel+1)
            w.Done()
        }()
    }

    recur(&tree.Left, examples[bestSplit.index:])
    recur(&tree.Right, examples[:bestSplit.index])
    w.Wait()

As you said, generics and a matrix library would make the experience nicer. Just having

    sort :: Ord a => [a] -> [a]

would strip a decent amount of mildly error-prone boilerplate, and there are other cases (splits for cross-validation, etc.) where it would be nice to be able to abstract over the type of the slice.


I've got a decision tree/random forest implementation as well.[1] I originally hacked out Go code to analyze forests from other programs, but I ended up finishing it off and optimizing it to learn faster than other libraries I've tried for my use cases (wide data with lots of categorical and missing values).

The language and tooling (pprof, go fmt, go doc) are great and make it quick to write and optimize stuff, so it is well suited to my (largely experimental) purposes.

I also really like slices for writing efficient code, as they let you preallocate and reuse arrays without having to keep track of the end position.

Matrix libraries would be nice, but you can call C ones via cgo. I am hoping for efficient pure Go ones to be developed eventually, so you can use them on App Engine/NaCl/Exacycle or other untrusted-code environments.

[1] https://github.com/ryanbressler/CloudForest


Hi, author here. Thanks for posting it. As you can see, it's definitely incomplete as it stands, as I haven't been able to spend as much time on it as I would have liked. I'm hoping that will change. If anyone fancies working more formally on it with me, send me a mail at stephen dot whitworth at hailocab dot com.


This is cool. Just wondering, how would speed compare to Julia?


Poorly. Julia calls well-tested, very carefully programmed BLAS libraries for the numerical heavy lifting; this doesn't. It is likely to be less accurate, less stable, and slower for that reason alone.

The comparison doesn't really work anyway; a better one would be Julia vs. Go, or some ML toolkit in Julia vs. this.


This lib doesn't seem to be ready. A quick look at the code doesn't suggest any attention was given to it beyond throwing it on GitHub. I don't understand this submission; there is nothing to be used here.



