
That confusion matrix makes Edward Tufte so very, very sad.

How on earth am I to tell from that visualization how often it mislabels financial emails as personal? Eyedropper the colors and hope the values in the shades of blue follow a linear scale?




As someone who regularly encounters these types of diagrams, I guess that I'm too close to the issue, because I'm not seeing the problem. To answer your question, financial e-mails are more likely to be mislabelled as personal than as professional, but not as often as it labels them correctly.

A quick cursory glance at the central diagonal tells me that Finance, Personal/Programming, Professional/EPFL, and Group work are the categories of e-mail most likely to be categorized incorrectly by the software. Looking at the columns tells me that the Academic and Personal categories are going to have the most misfiled messages in them.


> To answer your question, financial e-mails are more likely to be mislabelled as personal than as professional, but not as often as it labels them correctly.

Yes, but how likely is each mislabelling? There is no scale to indicate how the colors map onto probabilities.

As well as providing a scale, it would be helpful to make the heatmap an annotated heatmap, in which each square is labelled with the corresponding value (perhaps with values below some threshold omitted to reduce clutter).

Example: https://web.stanford.edu/~mwaskom/software/seaborn/_images/s...

(from the Seaborn docs: https://web.stanford.edu/~mwaskom/software/seaborn/generated...)
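
For what it's worth, here is a rough sketch of what I mean, using plain matplotlib rather than seaborn. The counts and the three labels are made up for illustration (I don't have the author's data), and the helper names `row_normalize`, `annotation_labels`, and `plot_annotated_heatmap` are my own, not from the post. Rows are normalized so each cell reads as "fraction of emails with this true label", a colorbar supplies the scale, and cells below a threshold are left blank.

```python
def row_normalize(counts):
    """Turn raw counts into per-true-label rates (each non-empty row sums to 1)."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def annotation_labels(rates, threshold=0.05):
    """Text for each cell; values below the threshold are omitted to reduce clutter."""
    return [[f"{r:.2f}" if r >= threshold else "" for r in row] for row in rates]


def plot_annotated_heatmap(counts, labels, threshold=0.05):
    """Draw the heatmap with a colorbar (the missing scale) and per-cell values."""
    import matplotlib.pyplot as plt  # imported lazily so the helpers work without it

    rates = row_normalize(counts)
    text = annotation_labels(rates, threshold)

    fig, ax = plt.subplots()
    image = ax.imshow(rates, cmap="Blues", vmin=0.0, vmax=1.0)
    fig.colorbar(image, ax=ax, label="fraction of true label")

    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("predicted label")
    ax.set_ylabel("true label")

    for i, row in enumerate(text):
        for j, cell in enumerate(row):
            if cell:
                # White text on dark cells, black on light ones.
                color = "white" if rates[i][j] > 0.5 else "black"
                ax.text(j, i, cell, ha="center", va="center", color=color)

    fig.tight_layout()
    return fig


# Hypothetical data: rows are true labels, columns are predicted labels.
LABELS = ["Financial", "Personal", "Professional"]
COUNTS = [
    [60, 30, 10],
    [5, 90, 5],
    [2, 8, 90],
]
```

Calling `plot_annotated_heatmap(COUNTS, LABELS).savefig("confusion.png")` should write the figure to disk, assuming matplotlib is installed.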

Edit: Consider another question - If an email has the true label 'Financial', is it more likely to be mislabeled or correctly labelled? I can guess, but without more knowledge of the scale I can't be certain.
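
With the underlying counts, that question is a one-liner. A minimal sketch, again with invented numbers and a helper name (`mislabel_rate`) of my own choosing:

```python
def mislabel_rate(counts, labels, true_label):
    """Fraction of emails with the given true label that were predicted as anything else."""
    index = labels.index(true_label)
    row = counts[index]
    total = sum(row)
    if total == 0:
        raise ValueError(f"no emails with true label {true_label!r}")
    return 1.0 - row[index] / total


# Hypothetical counts: rows are true labels, columns are predicted labels.
labels = ["Financial", "Personal", "Professional"]
counts = [
    [60, 30, 10],
    [5, 90, 5],
    [2, 8, 90],
]

financial_error = mislabel_rate(counts, labels, "Financial")
more_often_wrong = financial_error > 0.5
```

A rate above 0.5 would mean a 'Financial' email is more likely to be mislabelled than labelled correctly, which is exactly what shades of blue without a scale can't tell me.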


If this were a formal paper, and if the code hadn't been freely available, I'd have agreed with you.

As it is, the specifics of how it mislabels his e-mail are not all that interesting to me: I don't have access to his e-mail, so the specific numbers are pretty much irrelevant.


I agree that it would be more sane to have a scale, but as others commented, I included the matrix to give a rough picture of the results rather than something you can draw a lot of specific conclusions from. To be honest, for some reason the scale showed up incorrectly, right in the middle of the matrix, so I just decided to omit it and leave it intuitively clear that dark blue = a lot.



