Some Lesser Known Machine Learning Libraries (paralleldots.com)
119 points by gargisharma on April 7, 2017 | 51 comments


Pymanopt is a Python version of the Matlab code Manopt by Nicolas Boumal. It adds some features like automatic differentiation. I can't speak highly enough about both of these packages. A lot of people don't realize how useful manifold optimization is. The academic community has been aware of it for a while now, but it seems to have remained a bit of a secret.

It can be used to perform SVD, phase retrieval, global registration of point clouds, and low rank modeling.
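
To give a flavour of the API, here is a minimal sketch along the lines of the examples Pymanopt shipped around that time (treat the exact module and class names as approximate; newer releases reorganised things, e.g. solvers became optimizers):

    import autograd.numpy as anp
    from pymanopt import Problem
    from pymanopt.manifolds import Stiefel
    from pymanopt.solvers import SteepestDescent

    # Toy problem: dominant 2-dimensional invariant subspace of a symmetric
    # matrix A, by maximising trace(X' A X) over the Stiefel manifold
    # (5x2 matrices with orthonormal columns).
    A = anp.random.randn(5, 5)
    A = A + A.T

    def cost(X):  # Pymanopt differentiates this with autograd
        return -anp.trace(anp.dot(X.T, anp.dot(A, X)))

    problem = Problem(manifold=Stiefel(5, 2), cost=cost)
    X_opt = SteepestDescent().solve(problem)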

I'm (very slowly) writing a Julia port of the code, so if someone wants to take over, feel free.


Yes. We love PyManOpt here at ParallelDots (the startup which wrote the blog).


My first reaction to this was "there must be a reason that they are not known". Then I clicked the link and got "Error establishing a database connection" :D


We have fixed it now... We were not expecting so much traffic :)


The SKOPT link is also broken.


We have fixed the link and added some more libraries.


Another fantastic library for the list - GRT https://github.com/nickgillian/grt

Really great lightweight C++ gesture recognition with DTW, HMMs, and SVMs.


This looks like a good addition. Will suggest adding it.


What about mlpy, shark, mlpack, shogun, orange, elki, HLearn, etc.? There is a great deal the list doesn't cover (no offense to the list authors---just, there is a lot out there).


Also probabilistic programming stuff: PyMC, Stan, Dimple, Church. Not sure if these count as 'lesser known'; PyMC is mentioned often, but there's still much less hype around it nowadays than around neural networks.
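
To give a flavour of the style, a minimal PyMC3 sketch on toy data (standard with-block model API; keyword names like sd reflect the PyMC3 releases of the time):

    import numpy as np
    import pymc3 as pm

    # Hypothetical data: noisy measurements of an unknown mean
    data = np.random.normal(loc=1.0, scale=0.5, size=100)

    with pm.Model():
        mu = pm.Normal('mu', mu=0.0, sd=10.0)        # prior on the mean
        sigma = pm.HalfNormal('sigma', sd=1.0)       # prior on the noise scale
        pm.Normal('obs', mu=mu, sd=sigma, observed=data)
        trace = pm.sample(1000)                      # NUTS sampling by default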


Thanks for the suggestions. I don't think PyMC3 and Stan are less known, hence did not suggest adding them to the list. Dimple and Church look like good additions. Will get them appended.


Stan is pretty well known in the Bayesian statistics world, no?

There are at least 3 Bayesian books in R using Stan.


Maybe the libraries are less known because they're lesser libraries?

---

"Less known libraries" would be libraries that fewer people know. "Lesser known libraries" would be libraries of a lower quality.

[source](http://english.stackexchange.com/questions/24719/difference-...)


You could hyphenate to clarify: lesser-known libraries.

But come on, the English language is ambiguous. We can figure out the meaning from the context. I'm usually a nit when it comes to vocabulary and grammar, but this one...

Actually, maybe the title should explain more of the purpose: Machine learning libraries that should be more popular.


That's a grammatical mistake in the title. All the libraries mentioned have a lot of work behind them. [disclaimer: I helped compile the list]


I will try scikit-plot; I usually waste time plotting "by hand". I hope it works with Keras using the wrappers.
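
From what I've seen of the docs, usage looks roughly like this (a sketch; the metrics-module API shown here replaced an older factory-style one, so check the version you install):

    import matplotlib.pyplot as plt
    import scikitplot as skplt
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    # plot_confusion_matrix only needs arrays of true and predicted labels,
    # so predictions coming from a Keras model should work the same way.
    skplt.metrics.plot_confusion_matrix(y_te, clf.predict(X_te), normalize=True)
    plt.show()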


Snap (http://snap.stanford.edu/snappy/index.html) is also good; a lot of work has gone into it, and the best part is that it runs in parallel mode.
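
A minimal snap.py sketch of the quick-start style, from memory (function names approximate):

    import snap

    # Random Erdos-Renyi directed graph: 1000 nodes, 5000 edges
    G = snap.GenRndGnm(snap.PNGraph, 1000, 5000)

    # Iterate over nodes and print out-degrees
    for NI in G.Nodes():
        print(NI.GetId(), NI.GetOutDeg())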


If one is using ROOT (unlikely outside high-energy physics), TMVA is a good option.


I wonder if we'll ever be liberated from libraries written using Python-like syntax, or Python in general.


I like coding in Python, so hopefully not any time soon.


If I were going to try machine learning, I would code it up from scratch rather than rely on black boxes.


Please don't spread bad advice. There is a reason we teach Python and not ia64 Assembly to beginners.


> There is a reason we teach Python and not ia64 Assembly to beginners.

Fashion is a reason, but it's not a good reason.


Relevance and accessibility are others.


We kind of do both; most basic functions and ops have direct counterparts in high- and low-level languages. Actually, a good course starts with logic, binary, and some electronics on the side.


These algorithms usually are not black boxes with respect to their mathematical content. And the code is usually open source, so you are free to audit them for your arrogant self.

The term "black box" is used to refer to the fact that it's not always clear how to interpret a model like a deep neural network. It's a statistical issue, not a software issue.


As someone who actually did this, got into YC, raised a few million and built a team of 22 distributed across 6 timezones around said endeavor: Don't.

It's been 3.5 years since I did it and I don't regret it necessarily but the only reason I did was because I wanted something in an unserved category.


If I were going to try machine learning, I would code it myself from scratch rather than rely on black boxes.


You wouldn't even get to first base without re-inventing all those wheels; in the meantime, your first competitor leveraging available open source libraries and models would plow you into the ground.

If you're running a business, time to market matters. If you're a hobbyist, you can afford to re-invent a wheel just as a learning experience.

Don't ever mix the two, especially not if you're going to ask other people (friends, family, outsiders) to invest.


I agree with this quite a bit. I mentioned the fact that it took me building a business around it to justify the effort I put into it. Doing it "right" requires a lot of time and a team.


Why not take a well-tested open source machine learning library and read the source code? No black box, and no spending a lot of time re-inventing the wheel just to get some work done.


Not just the code. The code is rarely enlightening. To actually understand what's happening you'll need to at least skim the research behind it.


Here is something you should check out in that case: https://github.com/eriklindernoren/ML-From-Scratch . Be warned, though: the everything-from-scratch approach, despite being addictive, is harder than standing on the shoulders of giants and takes a lot of resilience. If you are even a bit of a procrastinator like me, it tends to stay a sweet fantasy. It was only when I was halfway through a project and had no other option that I went for implementing my first algorithm from scratch.
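
If you want a small taste of the from-scratch feeling without committing to a whole library, even something this tiny teaches a lot (plain numpy, made-up toy data):

    import numpy as np

    # Toy "from scratch" example: linear regression fit by batch gradient descent
    rng = np.random.RandomState(0)
    X = rng.randn(200, 3)
    true_w = np.array([1.5, -2.0, 0.5])
    y = X.dot(true_w) + 0.1 * rng.randn(200)

    w = np.zeros(3)
    lr = 0.1
    for _ in range(500):
        grad = 2.0 / len(X) * X.T.dot(X.dot(w) - y)   # gradient of the mean squared error
        w -= lr * grad

    print(w)   # should land close to true_w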


I feel that at this point this is, unfortunately, no longer a sensible thing to do. I totally get the impulse though. For instance, there is still nothing great on the JVM for deep learning with symbolic differentiation (deeplearning4j does not have this, correct me if this has changed).

On the other hand, I realize that between writing native interfaces, symbolic differentiation (e.g. writing a port of autograd), network optimisers, custom layers, parameter servers, multi-GPU scheduling and so forth, I'd spend years before getting to do what I wanted to implement in the first place.
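
For context, this is the kind of thing autograd gives you in Python, and what a JVM port would have to replicate (a minimal sketch with made-up toy data):

    import autograd.numpy as np
    from autograd import grad

    # An ordinary Python/numpy function...
    def loss(w, x, y):
        pred = np.tanh(np.dot(x, w))
        return np.mean((pred - y) ** 2)

    # ...and its gradient with respect to the first argument, for free
    grad_loss = grad(loss)

    w = np.zeros(3)
    x = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
    y = np.array([0.0, 1.0])
    print(grad_loss(w, x, y))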


We are adding this now to our tensor library: https://github.com/deeplearning4j/nd4j/pull/1750

I've also added numpy interop via our new python interface jumpy: https://github.com/deeplearning4j/jumpy

We are doing a lot more than autograd though: this is going to support dynamic computation graphs, give you direct access to a graph data structure, and will later be usable from nd4s (our Scala wrapper).

Rather than spending time going back and forth implementing all of those things you could just pitch in with our existing efforts (hint: you'd actually be getting something done rather than debating ;))

We have a parameter server for word2vec, various kinds of optimizers and the like: https://github.com/deeplearning4j/nd4j/tree/master/nd4j-para...

I'd also just like to note for anyone else reading this: Mulling over doing something helps no one.

If you see something that's open source that's close to what you want try engaging the authors to see what they have to say. Maybe they will guide you. We've done that recently for our lapack integration with cpu and gpu as well as various neural net implementations.

No offense but it kills me to see comments like this. I see tons of people complaining about features yet doing nothing to add them let alone engaging open source authors.

It's kind of funny: every time someone has actually done that, I've hired them. The developers that actually take action when engaging open source are amazing people. I have a feeling it's because they take the time to learn and get their feet wet even if it's intimidating.

Other neat community initiatives include flink: https://issues.apache.org/jira/browse/FLINK-5782

Nasa (Apache Tika): https://github.com/apache/tika/pull/165

A language for our ETL library DataVec (supports binary vectorization AND SQL-like transformations!): https://github.com/deeplearning4j/DataVec/issues/224

A scala lib like tensorflow built on top of nd4j: https://github.com/ThoughtWorksInc/DeepLearning.scala

Our spark ml integration: https://github.com/deeplearning4j/deeplearning4j/tree/master...

The community is very active. We have 4200 people in a gitter room alone: http://gitter.im/deeplearning4j/deeplearning4j


I have contributed to dl4j though ;)

I did not mean to criticise dl4j at all; I was simply pointing out an example of a feature I know I was missing at one point. I think we are actually agreeing. It does not always make sense to start something from scratch even though it's fun and a great learning experience. The ramp-up to something really useful in deep learning is simply very high. Further, few people can be an expert on the whole stack, and I have no problem admitting to myself that even if I spent 2 years writing something from scratch, many parts would simply not be as good as something I could copy from an existing open source library. That's why contributing to open source also makes more sense to me - you get to work on a part that you can be good at.

Also should point out that when I was having problems with custom loss functions a year ago you guys were extremely helpful on Gitter in discussing issues.


Hard to tell from an HN user name :D. That's great to hear! I get how hard it can be - what you get out of it is learning though. We have some seriously cool examples that are just weekend projects for folks right now, with JavaFX for example: https://github.com/deeplearning4j/dl4j-examples/pull/421

A lot of community contributions are in the examples now, if you haven't used dl4j in a while maybe take a look.


You need to have some demos that one can download and get to work easily. Before you say "we do - look at this link..." see sentence 1.


https://deeplearning4j.org/quickstart

I agree with you if my aim were to mainly promote new users here - I was more targeting someone who knew what dl4j was already and had maybe used it.

I usually don't comment unless someone mentions the library by name. 99.99999% of the people who comment on here are likely going to be more interested in Python, in which case I usually point them at Keras.

Thanks for the feedback though! I'm not sure what to do beyond "git clone and import into intellij".

If you'd like, feel free to file an issue on (I'm guessing?) the nd4j repo you were looking at. We always take feedback seriously if people take 5 seconds to post problems they've found. My head of training does our docs and videos and updates the site when he can.

Here are some of our youtube videos: https://www.youtube.com/channel/UCa-HKBJwkfzs4AgZtdUuBXQ


I've already sunk an hour or two into dl4j, and both times I tried it there were install issues. Maybe it's better now but I'm cutting my losses at this point.

Also, IntelliJ is used by maybe 50% of Java developers, but likely a much smaller percentage if you remove the users just using it to dev on Android.

Maybe if I stop being so lazy I'll package my ML lib up.

Anyhow if there are ppl out there looking for a good Java ML lib here are the ones I use:

* Neural Net general lib - Encog

* NLP - StanfordNLP


Hmm - I would recommend our gitter channel: https://gitter.im/deeplearning4j/deeplearning4j

I'm not sure what "install issues" you'd have. Maven is all you need to know.

We're not any different than any other Java library out there. There's nothing to "install". It's a library you use via maven/gradle/sbt just like anything else. Rather than commenting on Hacker News about this (where most of my devs won't see it), file an issue and say what you had trouble with.

Considering the baseline in deep learning is usually "install from source, here's your C compiler", it's not going to be nearly as bad.


Could you define "dl4j install issues"?

All you have to do is set up a maven/gradle/sbt project (which the majority of JVM-based projects of this decade do) and add the dl4j dependencies to it! Why so hard?


I guess it purely depends on what you hope to achieve. If you're going to spend a few months learning how ML works, sure, you'll benefit immensely. But if you're planning to apply it in some field/area, writing your own library and making sure it's better than the others is not going to be easy or fast...


Probably a better thing to do would be to download one of the many excellent open source libraries and explore how they work. You can then even contribute back. Many open source projects would really benefit from new users contributing to documentation for a start.


Could you name a few such open source projects?


scikit-learn is a production ready library that has some very well commented and easy to read source code.

https://github.com/eriklindernoren/ML-From-Scratch is an easy-to-understand implementation of some of the basic ML algorithms built from first principles, and it aims for readability over performance.
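
For contrast with the from-scratch repo, this is roughly how little code the scikit-learn side takes (standard estimator API; the dataset is just for illustration):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, y, cv=5).mean())   # 5-fold cross-validated accuracy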


I was asking about the ones which don't have very good documentation and are in need of contributors.


I find that the XGBoost documentation is pretty lacking.


to code a machine learning library from scratch, you must first invent the universe


If you wish to make an apple pie from scratch, you must first invent the universe. - Carl Sagan


How many roads must a man walk down? Forty-two - The mice.



