The R section needs some serious filling out. For starters, you include the Julia wrapper for glmnet, which was originally an R package.
glmnet - lasso/ridge/elastic net glm models.
e1071 - SVM classifiers.
randomForest - random forest classifiers.
mixOmics - a good collection of component-based approaches (PCA, ICA, PLS, etc.; includes sparse variants of all of the above if feature selection is required).
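For anyone wondering how the three glmnet model types relate: as far as I understand the glmnet parameterization, they are one penalty with a mixing parameter. Here is a rough sketch in plain Python. It is not glmnet's code, and the function name `elastic_net_penalty` is made up for illustration.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Penalty of the form lam * (alpha * L1 + (1 - alpha) / 2 * L2 squared).

    alpha = 1 gives the lasso penalty, alpha = 0 gives the ridge penalty,
    and anything in between is an elastic net mix.
    (Hypothetical helper, written only to illustrate the formula.)
    """
    l1 = sum(abs(b) for b in coefs)      # sum of absolute coefficients
    l2_sq = sum(b * b for b in coefs)    # sum of squared coefficients
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)


coefs = [1.0, -2.0, 0.0]
lasso = elastic_net_penalty(coefs, lam=1.0, alpha=1.0)  # pure L1
ridge = elastic_net_penalty(coefs, lam=1.0, alpha=0.0)  # pure L2
mixed = elastic_net_penalty(coefs, lam=1.0, alpha=0.5)  # half and half
```

The actual fitting (coordinate descent over a path of `lam` values) is what the package does for you; this only shows the term being added to the loss.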
Why do people keep complaining about the content or giving suggestions here? I'm pretty sure the original list was created on GitHub exactly to encourage contributions (pull requests).
- TMVA (Toolkit for Multivariate Analysis): Widely used in physics, especially particle physics. Has every classifier you can think of and the kitchen sink: neural nets, BDTs, support vector machines, Fisher discriminants, etc. You can use it for parameter estimation, classification, discrimination, and other use cases. It is closely integrated with the ROOT framework, which has a few quirks and a bit of a learning curve, but once you get into it, it's very easy to set up a multivariate analysis. It also has bindings for Python. - http://tmva.sourceforge.net/
- NeuroBayes: Heard some good things about it, but haven't tested it. Used in finance and particle physics. Commercial, but they have special licenses for research. I heard integration with TMVA is planned. - http://neurobayes.phi-t.de/index.php/public-information
A nice list! But this would be significantly more useful if it included project licensing information.
In my case, any library licensed under the GPL is automatically excluded from consideration, so this is a significant factor. I'd rather not spend any time on those.
LGPL is a borderline case. On one hand, it doesn't force itself onto all of your software. On the other hand, it contains the same patent claim landmine that the GPL does and that landmine is considered to be dangerous by many lawyers (GPLv2 Section 7, LGPLv2 Section 11).
So, in practical terms, it depends on who my current client/employer/investor is. Myself, I'd rather not use any LGPLd libraries.
I just wanted to point out that downvoting what you disagree with is… well, let's just say it's not the right way to use HN.
My comment was precise, informative, in reply to a question that was asked of me, based on a number of legal opinions and more years of experience than many people here write into the "age" field on forms.
Weka should be there; it includes so many useful tools. I used the Weka API for a project in a graduate machine learning course, and it saved me a lot of time. Highly recommended.
It's definitely missing some libraries. That said, if you are doing any kind of distributed machine learning, you almost have to use Java via Mahout.
caret - similar to Java's Weka.