
Usually the hardest part of a learning pipeline is data gathering and cleaning; once the data is in a suitable format (one from which it is easy to create a structured CSV file), training is probably the easiest part: just a few lines of Python code.


All parts of a learning pipeline are hard if you want to do it right. Gathering, weeding, and binning your data is meticulous and hard work, and while "a single run" is trivial, rerunning it over and over with new parameters or even a completely different model because the outcome made no sense whatsoever is not.

If updating a YAML file and hitting "run" makes that other "hardest part of learning" easier: hurray!


I agree. That's why some commonly used pre-processing methods were implemented in the stable release, and more are yet to come.


And, arguably, data cleaning is the most overlooked part.



From a purely UX perspective, there’s a huge difference between “no lines of code” and “a few lines of code”.



