Hacker News new | past | comments | ask | show | jobs | submit login

> you don’t know what dimensions are important in advance

But again this is a huge red flag. I've seen so many data science projects that started with "well, let's just get started with collecting everything and we will figure out what is important later on" and then spent so much time on infrastructure that no useful insights were ever produced.




There's a balance to that. You don't need anything fancier than a Hadoop cluster to store everything. Nowadays you can get that packaged and working out of the box from a number of vendors.

Of course get that data out into analytical datasets is a whole different matter.


Unfortunately there’s no way to know if useful insights won’t be produced unless you explore the full data set.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: