This is a great read. I've been thinking a lot about these basic concepts myself lately, especially the idea of central tendency.
Essentially, we can make up any method we want to summarize data with a single value that represents its central location, e.g. least squared distance, least absolute distance, etc.
I haven't thought much about the difference between "typical case" and "expected value", so that's a very useful distinction to make, especially when deciding which method to pick.
One good way to think about it is to ask which norm ("distance") between the data points and your central-tendency statistic you want to minimize, as the article alludes to.
For L2 (squared distance), you get the mean: for fixed x_1, ..., x_N, the sum over i = 1..N of (x_i - M)^2 is smallest at M = mean. (Differentiate with respect to M and set to zero: -2 * sum of (x_i - M) = 0 gives M = (1/N) * sum of x_i.)
For L1 (absolute distance), you get the median.
For L0 ("identical or not"), you get the mode.
You can come up with other norms too, but it's kind of neat that the 3 most common descriptions of central tendency pop out of this one unified approach.
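If you want to convince yourself numerically, here's a quick sketch in Python (the data set and candidate grid are made up purely for illustration): brute-force each loss over a range of candidate M values and compare against np.mean, np.median, and the mode.

```python
import numpy as np

# Made-up toy data, just for illustration
x = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

def minimizer(loss, candidates):
    """Brute-force: return the candidate M with the smallest total loss."""
    return candidates[np.argmin([loss(m) for m in candidates])]

grid = np.arange(0.0, 12.0, 0.01)  # candidate values for M

# L2: sum of squared distances -> minimized near the mean (3.6)
print(minimizer(lambda m: np.sum((x - m) ** 2), grid), x.mean())

# L1: sum of absolute distances -> minimized at the median (2.0)
print(minimizer(lambda m: np.sum(np.abs(x - m)), grid), np.median(x))

# L0: count of points not equal to M -> minimized at the mode (2.0);
# candidates are the data values themselves, since the mode is always
# an observed value (and this avoids float-equality issues with the grid)
print(minimizer(lambda m: np.sum(x != m), x))
```

The L0 case also shows why the mode is the odd one out: its loss is flat everywhere except at the data values, so grid search over arbitrary candidates tells you nothing and you have to search the data itself.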