
You can clean the data according to stated criteria, e.g., "We removed all results that were more than 6 standard deviations from the mean."

There are other criteria for dropping data that is "out there".
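
For illustration, here is a rough sketch of the standard-deviation rule in Python. The function name `drop_outliers` and the `max_sigma` parameter are my own hypothetical names, and I'm assuming deviations are measured from the sample mean using the population standard deviation; a real analysis might define these differently.

```python
import statistics

def drop_outliers(values, max_sigma=6.0):
    """Keep only values within max_sigma standard deviations of the mean.

    Assumption: mean and standard deviation are computed over the full
    input, outliers included, using the population standard deviation.
    """
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values are identical, so there is nothing to drop.
        return list(values)
    return [v for v in values if abs(v - mean) <= max_sigma * sigma]

# Hypothetical data: 99 ordinary measurements and one wild one.
data = [10.0] * 99 + [1000.0]
cleaned = drop_outliers(data)
```

Note that with a small sample no point can ever be 6 standard deviations out, since the outlier itself inflates the deviation, so the threshold should be reported alongside the sample size.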

As someone finishing my Master's, I will be releasing all my work, including test inputs and reproducibility results, as a Mercurial repository. Some of it depends on an external compiler, but the interesting ideas don't. I feel it is absolutely critical, and simply honest, to release source code and all data.

See the CRAPL: http://matt.might.net/articles/crapl/.



