No, it's emphatically not a great feature, and it's not clear to me the commenter was recommending that so much as making a nit. Please don't automate the process of choosing and running algorithms on a single sample of data; it's unsound experimental design that undermines your results. If you insist on doing it anyway, at minimum you will need to automate an initial assessment of the sample data to determine whether it has a suitable size and distribution, adjust the significance of results for the number of tests you're running, and partition the data into smaller subsamples.
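To make the significance adjustment concrete, here's the simplest version of what I mean, a Bonferroni correction (a rough sketch; the function name is just for illustration):

    # Bonferroni correction: if you compare k algorithms on the same
    # sample, each individual test must clear a stricter threshold to
    # keep the family-wise error rate at alpha.
    def bonferroni_threshold(alpha, num_tests):
        return alpha / num_tests

    # e.g. comparing 6 algorithms at a nominal alpha of 0.05:
    # each p-value must fall below 0.05 / 6 ≈ 0.0083 to count.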


Hi, thanks for your comment. I actually understood him to mean something like a hyperparameter search/tuning using cross-validation (at least that's what came to my mind).


Parameter tuning and algorithm selection! I just don't want to manually start 5 different runs of algorithms that I believe could work well on the data and then manually compare the results. And maybe I was too lazy to run the 6th algorithm, which now performs much better.

But to be sure, every test should be done with k-fold cross-validation. The decision whether to split the training set should not be left to the user; it must be mandatory.
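In scikit-learn terms, roughly this (just a sketch; the candidate list is an arbitrary example, not what the tool actually runs):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    candidates = {
        "logreg": LogisticRegression(max_iter=1000),
        "rf": RandomForestClassifier(),
        "svm": SVC(),
    }

    # every candidate gets the same mandatory k-fold CV;
    # no user-controlled train/test split
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(name, scores.mean())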


Cross-validation would be good! I think if you build this in, you could automatically run a few heuristics to see if the data can be partitioned, or maybe just prompt the user for another sample of the data with the same distribution.
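Something along these lines, maybe (a crude sketch for a classification task; the thresholds are arbitrary placeholders):

    import numpy as np

    def can_partition(y, k=5, min_per_class_per_fold=2):
        # require enough examples of every class so each of the
        # k folds can hold at least min_per_class_per_fold of them
        _, counts = np.unique(y, return_counts=True)
        return bool(np.all(counts >= k * min_per_class_per_fold))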



