For newcomers, here's an email I sent to some colleagues who are just getting into ML. I'm wrapping up a project that uses Mahout and am getting into Spark & MLlib now. I've already posted this on reddit.
I've been following Apache Spark [0], a new-ish Apache project that started at UC Berkeley as a replacement for Hadoop MapReduce [1], for about a month now, and I finally got around to spending some time with it last night and very early this morning.
About a year ago, Spark gained a strong machine learning library, MLlib [2]. It's similar to Mahout [3] but promises much better performance (comparable to or better than MATLAB [4] and Vowpal Wabbit [5]).
MLlib is a lower-level library that offers developers a lot of control and power. However, Berkeley's AMPLab has also created a higher-level abstraction layer for end users called MLI [6]. MLI is still being actively developed, and although updates are in the works, they haven't been pushed to the public repository for a while [7].
Getting up to speed with Spark itself is really pain-free compared to tools like Mahout. There's a quick-start guide for Scala [8], a getting-started guide for Spark [9], and lots of other learning and community resources for Spark [10] [11].
Check out an introduction to MLlib on YouTube here: https://www.youtube.com/watch?v=IxDnF_X4M-8
[0] http://spark.apache.org/
[1] http://hadoop.apache.org/
[2] http://spark.apache.org/mllib/
[3] https://mahout.apache.org/
[4] http://www.mathworks.com/products/matlab/
[5] https://github.com/JohnLangford/vowpal_wabbit/wiki
[6] http://www.mlbase.org/
[7] http://apache-spark-user-list.1001560.n3.nabble.com/Status-o...
[8] http://www.artima.com/scalazine/articles/steps.html
[9] http://spark.apache.org/docs/latest/quick-start.html
[10] http://ampcamp.berkeley.edu/4/exercises/
[11] https://spark.apache.org/community.html