
I've used this example when teaching students about models used for predictive power vs. models used to better understand mechanisms:

You have a model that predicts with great accuracy that rowdy teens will TP your house this Friday night, so you sit up late waiting to scare them off.

You have a model with less predictive power, but more discernible parameters. It tells you that the parameter for whether or not a house has its front lights turned on has a high impact on the likelihood of getting TP'd. You turn your front lights on and go to bed early.

Sometimes we want models that produce highly accurate predictions, sometimes we want models that provide mechanistic insights that allow for other types of action. They're different simplifications/abstractions of reality that have their time and place, and can lead you astray in their own ways.



Can't you run the model in reverse? Brute-force through various random parameters to the model to figure out which ones make a difference? Sure, it could have absurd dimensionality, but then it would be unlikely one could grasp it to begin with. After all, AlphaGo couldn't write a book for humans about how to play Go as well as it plays.
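
Something like this is what I mean by brute-forcing; a minimal sketch in Python/NumPy, with a made-up black box and synthetic data standing in for the real model, shuffling one input at a time and watching how much the predictions move:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for the black box: in practice you only get to call it, not read it.
    def predict(X):
        return 1 / (1 + np.exp(-(3 * X[:, 0] - 0.2 * X[:, 1] + X[:, 2] ** 2)))

    X = rng.normal(size=(5000, 3))   # synthetic inputs, purely illustrative
    baseline = predict(X)

    # Shuffle one input column at a time and measure the shift in predictions.
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        shift = np.mean(np.abs(predict(X_shuffled) - baseline))
        print(f"input {j}: mean prediction shift = {shift:.3f}")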


That's what model interpretability research is. You can train an interpretable model from the uninterpretable teacher, you can look at layer activations and how they correspond to certain features, or apply a hundred other domain-specific methods depending on your architecture. [0]
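
As a minimal sketch of the teacher/student idea (not the specific method from [2], just the general distillation pattern, with made-up data and off-the-shelf scikit-learn models): fit a small decision tree to the black box's own predictions, then read the tree off as an approximate explanation.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=5000) > 0).astype(int)

    # "Teacher": accurate but hard to read directly.
    teacher = RandomForestClassifier(n_estimators=200).fit(X, y)

    # "Student": a shallow tree trained to mimic the teacher's outputs.
    student = DecisionTreeClassifier(max_depth=3)
    student.fit(X, teacher.predict(X))

    print("fidelity to teacher:", (student.predict(X) == teacher.predict(X)).mean())
    print(export_text(student, feature_names=["x0", "x1", "x2", "x3"]))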

Sadly, insight is always lost. In a noisy world, even with the best regularization, some fitting on the noise, or on higher-order features that describe it, is inevitable when maximizing prediction accuracy, especially if the chosen architecture doesn't have the right tools to model the signal (like transformers adapting to a lack of registers [1]) and yet has plenty of parameters to spend.

What's worse, bad explanations are often much worse than none. If your loan is denied by a fully opaque black box, you may be offered recourse to get an actual human on the case. If they've trained an interpretable student [2], whether by intentional manipulation or by pure luck, it may have obscured the effect of some meta-feature likely corresponding to something like race, thus whitewashing the stochastically racist black box. [3]

[0] "Interpretability in ML: A Broad Overview" https://www.lesswrong.com/posts/57fTWCpsAyjeAimTp/interpreta... [1] "Thread: Circuits" https://distill.pub/2020/circuits/ [2] "Why Should I Trust You?": Explaining the Predictions of Any Classifier" https://arxiv.org/abs/1602.04938 [3] "Fairwashing: the risk of rationalization" https://proceedings.mlr.press/v97/aivodji19a


This reminds me of another thing I use when teaching: a perfect model of the entire world would be just as inscrutable as the world itself.

I think having multiple layers of abstraction can be really useful, and I've done it myself for some agent-based models with high levels of complexity. In some sense, these approaches can also be thought of as "in silico" experiments.

You have a model that is complex and relatively inscrutable, just like the real world, but unlike the real world, you can run lots of "experiments" quite cheaply!
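
A toy version of that workflow, with an invented contagion-style agent model in plain Python/NumPy (the model and numbers are made up for illustration); the point is just that sweeping a knob across many cheap replicate runs is an "experiment" you could never afford on the real system:

    import numpy as np

    rng = np.random.default_rng(0)

    def run_once(n_agents=500, contact_rate=0.02, steps=100):
        """One run of a toy agent-based contagion model; returns final infected share."""
        infected = np.zeros(n_agents, dtype=bool)
        infected[:5] = True
        for _ in range(steps):
            # Each agent's chance of catching it scales with how many are already infected.
            contacts = rng.random(n_agents) < contact_rate * infected.mean()
            infected |= contacts
        return infected.mean()

    # "In silico experiment": sweep one parameter across many cheap replicate runs.
    for contact_rate in (0.2, 1.0, 5.0):
        outcomes = [run_once(contact_rate=contact_rate) for _ in range(50)]
        print(f"contact_rate={contact_rate}: mean infected share = {np.mean(outcomes):.2f}")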


Great example, great way to explain it. Nice work.


Thank you!



