Machine Learning frameworks, libraries and software (github.com/josephmisiti)
163 points by misiti3780 on July 16, 2014 | 35 comments



The R section needs some serious filling out. For starters, you include the Julia wrapper for glmnet, which was originally implemented in R.

glmnet - lasso/ridge/elastic net glm models.

e1071 - SVM classifiers.

randomForest - random forest classifiers.

mixOmics - a good collection of component-based approaches (PCA, ICA, PLS, etc.; includes sparse variants of all of the above if feature selection is required).

caret - similar to Java's Weka.


Or simply a link to the ML CRAN Task View: http://cran.r-project.org/web/views/MachineLearning.html
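
For anyone wondering why glmnet is listed as "lasso/ridge/elastic net" in one package: all three are the same penalized objective with a different mixing weight. A sketch for the Gaussian (least squares) case, using the notation from the glmnet documentation as I remember it (N observations, overall penalty strength \lambda, mixing parameter \alpha):

```latex
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  \;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  \;+\; \alpha\,\lVert\beta\rVert_1\right]
```

Setting \alpha = 1 gives the lasso, \alpha = 0 gives ridge, and anything in between is the elastic net.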


I suggest adding Torch. http://torch.ch/ It is a scientific ML framework written in LuaJIT. Recently it was recommended by Yann LeCun, Director of AI Research at Facebook. http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_...


Why do people keep complaining about the content or giving suggestions here? I'm pretty sure the original list was created on GitHub exactly to encourage contributions (pull requests).


Some more for C++:

- TMVA (Toolkit for Multivariate Analysis): Widely used in physics, especially particle physics. Has every classifier you can think of plus the kitchen sink: neural nets, BDTs, support vector machines, Fisher discriminants, etc. You can use it for parameter estimation, classification, discrimination and other use cases. It is closely integrated with the ROOT framework, which has a few quirks and gives it a bit of a learning curve, but once you get into it, it's very easy to set up a multivariate analysis. Also has bindings for Python. - http://tmva.sourceforge.net/

- NeuroBayes: Heard some good things about it, but haven't tested it. Used in finance and particle physics. Commercial, but they have special licenses for research. I heard integration with TMVA is planned. - http://neurobayes.phi-t.de/index.php/public-information


There is a NeuroBayes plugin for TMVA. I've put it on GitHub with some additional patches: https://github.com/sroecker/tmva-neurobayes


A nice list! But this would be significantly more useful if it included project licensing information.

In my case, any library licensed under the GPL is automatically excluded from consideration, so this is a significant factor. I'd rather not spend any time on those.


How do you feel about LGPL?


LGPL is a borderline case. On one hand, it doesn't force itself onto all of your software. On the other hand, it contains the same patent claim landmine that the GPL does and that landmine is considered to be dangerous by many lawyers (GPLv2 Section 7, LGPLv2 Section 11).

So, in practical terms, it depends on who my current client/employer/investor is. Myself, I'd rather not use any LGPLd libraries.


I just wanted to point out that downvoting what you disagree with is… well, let's just say it's not the right way to use HN.

My comment was precise, informative, in reply to a question that was asked of me, based on a number of legal opinions and more years of experience than many people here write into the "age" field on forms.


http://www.shogun-toolbox.org/ has a lot of algorithms, is super fast and has bindings for many languages. Check it out!


added!


Weka should be there; it includes so many useful tools. I took a graduate machine learning course, used the Weka API for the course project, and it saved me a lot of time. Highly recommended.


added


A well-maintained library for C# lovers: http://accord-framework.net/


Incanter for Clojure is missing.

http://incanter.org/


JavaScript is missing a few NLP tools: POS - https://github.com/dariusk/pos-js and Node Natural - https://github.com/NaturalNode/natural


added


For JavaScript: Bayesian Bandits/Thompson Sampling - https://github.com/omphalos/bayesian-bandit.js
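
In case Thompson sampling is unfamiliar, here is a rough sketch of the idea in Python. This is not the API of the linked library; the names `BetaBandit`, `select_arm`, `update` and `simulate` are made up for illustration, and it assumes Bernoulli (success/failure) rewards with a uniform Beta(1, 1) prior on each arm.

```python
import random


class BetaBandit:
    """Hypothetical Bernoulli bandit with a Beta(1, 1) prior on each arm."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        # Thompson sampling: draw one sample from each arm's Beta posterior
        # and play the arm whose sampled success rate is highest.
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Record the observed outcome so the arm's posterior sharpens over time.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against simulated arms; return pull counts per arm."""
    bandit = BetaBandit(len(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate RNG for the simulated rewards
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = env.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(simulate([0.1, 0.5, 0.9]))
```

Arms that look bad still get sampled occasionally while their posterior is wide, which is where the exploration comes from; as evidence accumulates, most pulls should concentrate on the best arm.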


added


I have a pretty good decision tree/random forest library for Go:

https://github.com/ryanbressler/CloudForest


Please add Clojure and its libs (see http://www.clojure-toolbox.com).

And sort languages alphabetically, please.


fixed


cloJure, not cloSure.

Also, I thought you would add all Machine Learning libs, not just link to the Clojure Toolbox.


fixed


I'm surprised at the lack of tools for Java. Do Java developers really not do machine learning? Or is this resource missing some libraries?


It's definitely missing some libraries. That said, if you are doing any type of distributed machine learning, you almost have to use Java via Mahout.


The Scala libraries like Factorie work just fine from Java.


In what sense is this list "curated?" I'm seriously asking. Have you used all/most of these, and can recommend them?


Another huge ecosystem of Java tools is http://gate.ac.uk/


I wish they'd list the license next to each of the projects so I can avoid those that are work-unfriendly.


How about http://h2o.ai/ ?


Where does it go?


It goes to "The Open Source In-Memory Prediction Engine for Big Data Science"


typo: Julia -> General Purpose ... Kernal Density - Kernel density estimators for julia



