Smile – Statistical Machine Intelligence and Learning Engine

jordigh · on March 29, 2017

"Matlab ©"

Hehe, if you're going to be defending the Mathworks' trademarks, the proper symbol to use is ®, not ©. But who can keep copyright and trademark laws apart, right? It's all the same as long as it's someone else telling you that some intangible thing isn't yours. /s

https://www.mathworks.com/company/aboutus/policies_statement...

But seriously, folks, it's not your job to defend the Mathworks' trademarks. It's their job. You can be deferential and do their job for them, but it's not legally required. It seems like kow-towing to me, giving Matlab more respect than it deserves.

This actually is a pet peeve of mine, because I see so many non-Mathworks employees cargo-culting those ® and ™ symbols around without knowing why, because they see the Mathworks themselves using them so much. As a GNU Octave developer, it feels a bit odd to see how effective the Mathworks is at convincing their followers to display the proper respect and protocol towards their products.

zild3d · on March 29, 2017

While we're at it, the trademarked name is in all caps - MATLAB

malcolmgreaves · on March 29, 2017

Interesting project! As a Scala developer, I am always curious when I see a project that's mostly Java with a dash of Scala. Seems like one meta-pattern these days is to use Scala to create a DSL around a Java project. I'd just go full Scala myself :D but it's nice to see Scala co-mingling happily with Java in a large, important, useful OSS project.

sampo · on March 29, 2017

How does adding

    Array(1.0, 2.0, 3.0, 4.0)

and

    Array(4.0, 3.0, 2.0, 1.0)

result in

    Array(1.7302967433402214, 2.547722557505166, 3.3651483716701107, 4.182574185835056)

jordigh · on March 29, 2017

Floats are weird and unpredictable like that. You can save yourself an RNG call if you simply add two floats. ;-)

More seriously, the result here really is x + y/norm(y). I would be really interested in knowing what the bug is here. Probably just a C&P error that forgot to update the result, since y is unitized (in-place?) in a call further below.
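The x + y/norm(y) reading can be checked with a minimal sketch in plain Scala. It assumes no Smile dependency at all; `norm`, `add`, `plainSum` and `withUnitY` are hypothetical names made up for this illustration, not Smile's API.

```scala
// Plain Scala only; none of these helpers come from Smile.
val x = Array(1.0, 2.0, 3.0, 4.0)
val y = Array(4.0, 3.0, 2.0, 1.0)

// Euclidean (L2) norm of a vector.
def norm(v: Array[Double]): Double = math.sqrt(v.map(a => a * a).sum)

// Element-wise sum of two equal-length vectors.
def add(a: Array[Double], b: Array[Double]): Array[Double] =
  a.zip(b).map { case (p, q) => p + q }

// What x + y should give: every element is 5.0.
val plainSum = add(x, y)

// x plus the unit-length version of y, i.e. x + y/norm(y).
val n = norm(y)
val withUnitY = add(x, y.map(_ / n))

println(plainSum.mkString("Array(", ", ", ")"))
println(withUnitY.mkString("Array(", ", ", ")"))
```

Here norm(y) is sqrt(30), roughly 5.477, so the second line printed should agree with the website's numbers up to floating-point rounding, while the first is the expected all-fives result.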

haifeng · on March 29, 2017

Is there something wrong in your code? I don't see the effect.

    smile> val x = Array(1.0, 2.0, 3.0, 4.0)
    x: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

    smile> val y = Array(4.0, 3.0, 2.0, 1.0)
    y: Array[Double] = Array(4.0, 3.0, 2.0, 1.0)

    smile> x + y
    res2: smile.math.VectorAddVector = Array(5.0, 5.0, 5.0, 5.0)

sampo · on March 29, 2017

I just quoted the example in the webpage (section "Vector Operations"), I didn't run it myself.

haifeng · on March 29, 2017

Good eyes :) It should be a c&p error.

sampo · on March 29, 2017

Ok, they have fixed the website now.

QuercusMax · on March 29, 2017

That does seem very strange.

vkb · on March 29, 2017

Came across this yesterday. Can someone (maybe poster?) talk about when you would use this versus something like scikit-learn or any number of R libraries? Is the goal simply to have all machine learning in Java so it can be productionized easier?

haifeng · on March 29, 2017

The project homepage says "Data scientists and developers can speak the same language now!". So it is surely easier to productionize an ML project without rewriting the algorithms after the data scientists work out the model with R or Matlab.

madman2890 · on March 29, 2017

There are more Python developers than Scala developers. There are more Python data scientists than Scala data scientists. I like the project, though.

haifeng · on March 30, 2017

There are more Java developers than Python developers :)

vkb · on March 30, 2017

I don't know that that's necessarily true. The most recent Stack Overflow survey[1] shows a difference of 8%, which is not an overwhelming majority. Granted, that's not an unbiased sample, but I think the OP above is correct: more data scientists use Python than Java.

So anyone wanting to use this library would have to think about tradeoffs: Are the efficiencies lost in data scientists learning to use Java for modeling worth the efficiencies gained in putting a model in production? For some, the answer may be yes, for some no.

[1] https://stackoverflow.com/insights/survey/2017#technology-pr...

SoleilAbsolu · on March 29, 2017

Looks useful...BTW the shell prompt for "Hello World" example is misspelled: smlie> "Hello world"

rz2k · on March 29, 2017

How does this compare to the implementations of Scala in Jupyter notebooks?

I was assuming that it would be something more like Matlab's GUI environment or maybe RStudio.

It could be helpful to add an introductory paragraph, especially since "hello world" and "2+3" follow right after the heading "Linear Algebra".

ljw1001 · on March 30, 2017

Smile is an awesome library. If you use it in Java, Tablesaw is a data-frame-like data-munging framework that works well with it. https://github.com/lwhite1/tablesaw

cwyers · on March 29, 2017

It's not THAT R-like. Looking at the front page, their bar chart of performance of their machine learning algorithms is done in Excel. I like to think that no R user would post R benchmarks using an Excel bar chart.

huac · on March 29, 2017

running stuff from the console is R-like; clearly Java doesn't have a ggplot2

AdieuToLogic · on March 29, 2017

  ... clearly Java doesn't have a ggplot2

JFreeChart[0] is likely what many would reach for in the JVM ecosphere to perform ggplot2-type functionality, though Scala devs might want to use something like scala-chart[1] or similar.

0 - http://www.jfree.org/jfreechart/samples.html

1 - https://github.com/wookietreiber/scala-chart