This is a great read. I've been thinking a lot about these basic concepts myself lately, especially the idea of central tendency.
Essentially, we can make up any method we want to summarize data with a single value that represents its central location, e.g. least squared distance, least absolute distance, etc.
I haven't thought much about the difference between "typical case" and "expected value", so that's a very useful distinction to make, especially when deciding which method to pick.
One good way to think about it is to ask which norm ("distance") between the data points and your central-tendency statistic you want to minimize, as the article alludes to.
For L2 (squared distance), you get the mean: for fixed x_1, ..., x_N, the sum over i = 1..N of (x_i - M)^2 is smallest at M = mean. (Differentiate with respect to M and set to zero: -2 * sum of (x_i - M) = 0 gives M = (1/N) * sum of x_i.)
For L1 (absolute distance), you get the median.
For L0 ("identical or not"), you get the mode.
You can come up with other norms too, but it's kind of neat that the 3 most common descriptions of central tendency pop out of this one unified approach.
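If you want to convince yourself numerically, here's a quick sketch in Python (the data set and candidate grid are made up purely for illustration): brute-force each loss over a range of candidate M values and compare against np.mean, np.median, and the mode.

```python
import numpy as np

# Made-up toy data, just for illustration
x = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

def minimizer(loss, candidates):
    """Brute-force: return the candidate M with the smallest total loss."""
    return candidates[np.argmin([loss(m) for m in candidates])]

grid = np.arange(0.0, 12.0, 0.01)  # candidate values for M

# L2: sum of squared distances -> minimized near the mean (3.6)
print(minimizer(lambda m: np.sum((x - m) ** 2), grid), x.mean())

# L1: sum of absolute distances -> minimized at the median (2.0)
print(minimizer(lambda m: np.sum(np.abs(x - m)), grid), np.median(x))

# L0: count of points not equal to M -> minimized at the mode (2.0);
# candidates are the data values themselves, since the mode is always
# an observed value (and this avoids float-equality issues with the grid)
print(minimizer(lambda m: np.sum(x != m), x))
```

The L0 case also shows why the mode is the odd one out: its loss is flat everywhere except at the data values, so grid search over arbitrary candidates tells you nothing and you have to search the data itself.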