
As my professor said: All good statistics is done before you have looked at the data.


True, for clean data. You can’t clean data without looking at it (a lot), though.


If you have data that's so 'dirty' that you can't decide on the filtering rules in advance (or based only on historic data), then what you have is garbage, not data. Therefore, we could call the art of shaping this into meaningful stories garbage science.


Tell me in a comment you have never worked with business data in your life.

Business data is full of minor inconsistencies which are not obvious until you sit in front of it. Products are sold in different units. Reporting ranges and aggregates are slightly different. Subsidiaries use categories which are close but not exactly identical.

There is generally plenty of massaging to do before you can get the information you need.
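For instance, a minimal sketch of that massaging in Python. The unit names, conversion factors and category labels below are hypothetical, made up for illustration rather than taken from any real dataset:

```python
from dataclasses import dataclass

# Hypothetical conversion factors to one base unit (pieces).
UNIT_TO_PIECES = {"piece": 1, "box": 12, "pallet": 480}

# Hypothetical mapping from subsidiary-specific category labels
# to a single shared list of categories.
CATEGORY_MAP = {
    "Office Supplies": "office_supplies",
    "Office supplies & stationery": "office_supplies",
    "IT Hardware": "it_hardware",
    "Computer equipment": "it_hardware",
}


@dataclass
class Sale:
    product: str
    quantity: float
    unit: str
    category: str


def normalize(sale: Sale) -> Sale:
    """Convert the quantity to pieces and map the category to its shared label.

    Unknown units or categories raise KeyError, so gaps in the mapping
    tables show up instead of being silently dropped.
    """
    factor = UNIT_TO_PIECES[sale.unit.strip().lower()]
    category = CATEGORY_MAP[sale.category.strip()]
    return Sale(sale.product, sale.quantity * factor, "piece", category)


raw = [
    Sale("Stapler", 3, "box", "Office Supplies"),
    Sale("Stapler", 10, "Piece", "Office supplies & stationery"),
    Sale("Laptop", 2, "piece", "Computer equipment"),
]
clean = [normalize(s) for s in raw]
total_staplers = sum(s.quantity for s in clean if s.product == "Stapler")
print(total_staplers)  # 46
```

Failing loudly on an unmapped unit or category is deliberate: the mapping tables can only be written after looking at what the data actually contains, and every new subsidiary or product line tends to add a few more entries.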


But where do you get your hypothesis from?


Whatever orifice you desire.


Your hypothetical model of reality.

But that's a lot of work so businesses don't want to pay for it.
