In short, measure theory is ordinary
freshman calculus redone in a way
that, in some contexts, is
significantly more powerful.
Here measure is essentially just
a synonym for area, ordinary area.
Measure theory addresses both
the
differentiation and integration
of freshman calculus,
but most of the focus is on
integration.
So, in freshman calculus, you are
given, say, a real valued function
of a real variable, a function f
where, for real numbers x, we have,
e.g.,
f(x) = x^2 + 1
Then, say, we graph f and
want the area under the curve for
x in the interval [0,10]. Okay,
you saw that more than 10,000 times
in freshman calculus.
Well, in this case, measure theory
will give the same answer for the
area under the curve. The difference
is how that area is calculated.
Here is the shortest description
of where measure theory is different:
As you recall, in freshman calculus
you found the area under the curve
by picking points on the X axis,
that is, partitioning the X axis
into little intervals, on each interval
building a rectangle that approximated
the area under the curve over that
little interval, inserting more
points into the partition so that
the length of the longest little
interval got as small as we
pleased, adding up
the areas of the rectangles, and
taking the limit. That was it. That
was the definition of what the
integral was. Of course, to integrate
x^2 + 1 you learned about, for any
constant C, the anti-derivative
(1/3)x^3 + x + C
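If you want to see that freshman
construction in code, here is a
minimal Python sketch, just an
illustration: rectangles on a
partition of the X axis, compared
with the antiderivative answer.

    # Riemann-sum sketch: rectangles over a partition of [0, 10]
    # for f(x) = x^2 + 1, compared with the antiderivative.
    def f(x):
        return x**2 + 1

    def riemann_sum(f, a, b, n):
        # n little intervals; rectangle height from the left endpoint
        dx = (b - a) / n
        return sum(f(a + i * dx) * dx for i in range(n))

    exact = (1 / 3) * 10**3 + 10   # (1/3)x^3 + x at 10, minus its value at 0
    for n in (10, 100, 10000):
        print(n, riemann_sum(f, 0, 10, n), exact)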
So, here's what measure theory does:
Yup, it also works with a partition
but, in this case, the partition is
on the Y axis instead of the X axis.
Then for each little interval on the
Y axis, we get a horizontal bar and
look at the parts that are under the
curve. As we add points to the
partition, we add up the ordinary
areas of the relevant parts of the
horizontal bars. The picture
is less nice than in freshman calculus
but, still, no biggie.
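Here is a minimal Python sketch of
that Y axis idea. One loud
assumption for illustration: the
measure (length) of {x : f(x) > y}
is estimated with a crude grid on
[0, 10], a stand-in for real
Lebesgue measure.

    # Y-axis partition sketch for f(x) = x^2 + 1 on [0, 10]:
    # area = sum over horizontal slabs of (slab thickness) times
    # (approximate length of the set where f exceeds the slab level).
    def f(x):
        return x**2 + 1

    a, b, nx = 0.0, 10.0, 2000
    dx = (b - a) / nx
    xs = [a + (i + 0.5) * dx for i in range(nx)]   # grid standing in for [a, b]

    y_max = max(f(x) for x in xs)
    n_slabs = 500
    dy = y_max / n_slabs

    area = 0.0
    for j in range(n_slabs):
        y = (j + 0.5) * dy                             # slab midlevel
        length = dx * sum(1 for x in xs if f(x) > y)   # approx. length of {x : f(x) > y}
        area += dy * length

    print(area)   # roughly 1000/3 + 10 = 343.33...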
Now, how would one do, say, numerical
integration? Sure: The same
as in freshman calculus, say,
the trapezoid rule with its little
trapezoids, or Simpson's rule with
its little parabolas. Nope, measure
theory is not suggesting that we
change that.
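For the record, a minimal Python
sketch of composite Simpson's rule
(n even); since Simpson's rule is
exact for cubics, it nails
x^2 + 1 exactly.

    # Composite Simpson's rule; assumes n is even.
    def simpson(f, a, b, n):
        h = (b - a) / n
        s = f(a) + f(b)
        for i in range(1, n):
            s += (4 if i % 2 else 2) * f(a + i * h)
        return s * h / 3

    print(simpson(lambda x: x**2 + 1, 0, 10, 10))   # 343.333..., exact up to rounding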
Here are four features of the
approach of measure theory:
(1) There are some goofy, pathological
functions that get the integration
theory (the Riemann integral) of
freshman calculus all confused.
E.g., consider the function 0 on
the rational numbers and 1 otherwise.
Then the lower Riemann sums are
stuck at 0 (every little interval
contains a rational) and the upper
sums are stuck at 1 (every little
interval contains an irrational),
so they never converge to a common
value. Bummer. Well, the measure
theory integral rarely gets
confused: here the rationals are
countable, hence have Lebesgue
measure 0, so the Lebesgue integral
of that function over [0,1] is
just 1 (a tiny covering sketch
follows this list of features).
The generality is mind blowing, so
much so that it's darned tricky
even to construct a case where the
measure theory integral fails.
(2) A big deal is what happens
when a sequence of functions
is used to approximate and converge
to another function. A leading
example is Fourier series. Well,
on taking limits during this
approximation, the Riemann integral
can get confused when the measure
theory integral (Lebesgue) does just fine.
H. Lebesgue was a student of
E. Borel in France and did
his work near 1900.
(3) Often we want to pass some
limits under the integral sign.
Again, Lebesgue does much better
here than Riemann; the dominated
convergence theorem is the standard
tool. Indeed,
the Lebesgue integral has
a super nice theorem on
differentiation under the
integral sign (from the
TOC of Durrett, that theorem
may be the last topic in that
book -- it was a really fun
exercise when I was studying
that stuff).
(4) Notice a biggie: With
Lebesgue, actually we used
next to nothing about
the X axis, that is, about
the domain of the function
we are integrating. In this
way, right away, presto, bingo,
we get a theory of integration
that works on domains
with much, much less in assumptions
and properties than the real
numbers or the usual finite
dimensional real Euclidean
vector space. In particular,
we are now GO for doing
probability theory --
the Lebesgue integral is
used to define both probability
of events and expectation of
random variables. It was
A. Kolmogorov in 1933 who saw how
to use measure theory and the
Lebesgue integral as a solid
foundation for probability theory,
and he wrote a landmark monograph
on it. Since then, essentially all
serious research in probability and
stochastic processes, much of
mathematical statistics, and nearly
all of stochastic optimal control
has rested solidly on the
Kolmogorov, i.e., measure theory,
foundations.
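As promised in feature (1), here is
a tiny Python sketch of the
covering argument behind "the
rationals have measure 0":
enumerate the rationals (they are
countable), cover the k-th one with
an interval of length eps/2^(k+1),
and the total length of the cover
stays below eps, no matter how many
rationals get covered.

    # Cover the k-th rational with an interval of length eps / 2^(k+1).
    # The total length of the cover stays below eps, so the rationals
    # fit inside arbitrarily small total length: Lebesgue measure 0.
    from fractions import Fraction

    eps = Fraction(1, 100)
    total = sum(eps / 2**(k + 1) for k in range(1000))   # first 1000 rationals
    print(total < eps, float(total))                     # True 0.01 (just under eps)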
So, to a mathematician not much
interested in probability,
probability theory is just a
special case of measure theory
where the total area (measure) is
just 1. That's not literally or
logically wrong, but it throws
out a fantastic baby with the
bathwater.
Some of the results in
probability are just astounding
and powerful -- both
beyond belief.
So, in measure theory, here
is what a measure (a definition
of area) is:
We start with a non empty set,
a space, say, X. From
X we have some subsets we
call measurable sets.
So, measurable set A is a
subset of X. In the special
case of probability,
the measurable sets are
the events, that is, e.g.,
all the trials where
our coin comes up heads.
We ask to have enough measurable
sets so that all of them
form a sigma algebra. Why?
Otherwise we don't have much.
A sigma algebra doesn't ask
for much. The sigma part is
supposed to suggest
finite or countably infinite
adding up, as in the usual use of
the capital Greek sigma for
summing.
Say our sigma algebra of
subsets of our measurable
space X is S (probability
theory usually uses script F).
Then we want the empty subset
of X to be an element of S.
For A in S we want X - A (the
relative complement) to be
an element of S. And for
B(i) in S for i = 1, 2, ...,
we want the union of
all the B(i) to be an element of
S. These conditions ensure that
we will have enough sets
in S to have a decently strong
theory.
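Here is a minimal Python sketch
that checks those three conditions
for a family S of subsets of a
small finite X; with X finite,
countable unions reduce to finite
unions, so pairwise unions suffice.
S1 and S2 are made-up examples.

    # Check the sigma-algebra axioms on a finite space.
    def is_sigma_algebra(X, S):
        X = frozenset(X)
        S = {frozenset(A) for A in S}
        if frozenset() not in S:                 # empty set is a member
            return False
        if any(X - A not in S for A in S):       # closed under complement
            return False
        for A in S:                              # closed under union
            for B in S:
                if A | B not in S:
                    return False
        return True

    X = {1, 2, 3, 4}
    S1 = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
    S2 = [set(), {1}, {2, 3, 4}, {2}, {1, 3, 4}, {1, 2, 3, 4}]
    print(is_sigma_algebra(X, S1))   # True
    print(is_sigma_algebra(X, S2))   # False: {1} | {2} = {1, 2} is missing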
In what we commonly
do in applied probability,
we wouldn't settle for less.
E.g., the empty set is the
event it never happens.
If H is the event of heads,
then T = X - H, the relative
complement, is the event tails.
If H(i) is the event that
flip i comes up heads, then
the union of all the H(i) is the
event that the coin comes up heads
at least once, i.e., doesn't
always come up tails.
In probability, those are all
simple situations, and
just from those we need
a sigma algebra of events.
And it turns out, that's enough,
and has been since 1933.
So, for a measure, say, m: to each
measurable set A there is a real
number m(A),
the measure (think area
or, in probability theory, the
probability) of A. Of course
in probability we call the
measure P instead of m
and write P(A) for the probability
of event A.
You can begin to see that we
are essentially forced into
how Kolmogorov applied measure
theory whether we like it or not.
Sorry 'bout that!
Well, for a measure
m, we want countable
additivity. So,
for disjoint measurable
sets B(i), i = 1, 2, ...,
we want
m(union B(i)) = sum m(B(i))
for some sloppy math notation
since I can't type TeX here!
Usually m(A) is real with
m(A) >= 0, and commonly we allow
m to take the value infinity;
e.g., Lebesgue measure gives the
whole real line measure infinity.
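As a toy illustration in Python
(made-up point masses; with X
finite, countable additivity
reduces to finite additivity):

    # A discrete measure on a finite X, defined by point masses.
    mass = {"a": 0.5, "b": 0.25, "c": 0.25}

    def m(A):
        return sum(mass[x] for x in A)

    B1, B2 = {"a"}, {"b", "c"}        # disjoint measurable sets
    assert B1.isdisjoint(B2)
    print(m(B1 | B2), m(B1) + m(B2))  # 1.0 1.0 -- additivity holds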
We can also extend the definition
to let m(A) be any real number
(a signed measure) or any complex
number (a complex measure).
Measure theory is the
total cat's meow for
Fourier theory!
To get a sigma algebra
of measurable sets we want,
commonly we start with
a topology, that is,
its collection of open
sets, and ask for the
unique smallest sigma algebra for
which each open set is also a
measurable set (the Borel sigma
algebra).
When we do this on the real
line and assign
the measure of intervals
their ordinary length
and extend that to
as many subsets of the
reals as we can, we get
Lebesgue measure for the
real line. We get a lot
of sets! It's a tricky exercise,
one that needs the axiom of choice,
even to construct a subset of the
reals that is not Lebesgue
measurable. Powerful theory!
Suppose we have spaces X
and Y, each with a sigma
algebra of subsets and
a function
f: X --> Y
Then f is measurable
if for each measurable
subset B of Y
f^(-1)(B) is also
a measurable subset
of X. In measure
theory, when we integrate
a function, we ask that
it be measurable.
In the usual cases, it's
even tough to construct
a function that is not
measurable. Darned
near any limit of
measurable functions is
also measurable -- super
nice theory.
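Here is a minimal Python sketch of
that preimage test on made-up
finite spaces: f is measurable for
the given sigma algebras, g is not.

    # Measurability via preimages, on finite measurable spaces.
    def preimage(f, X, B):
        return frozenset(x for x in X if f(x) in B)

    def is_measurable(f, X, SX, SY):
        SX = {frozenset(A) for A in SX}
        return all(preimage(f, X, B) in SX for B in SY)

    X = {1, 2, 3, 4}
    SX = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(X)]
    Y = {"lo", "hi"}
    SY = [frozenset(), frozenset({"lo"}), frozenset({"hi"}), frozenset(Y)]

    f = lambda x: "lo" if x <= 2 else "hi"   # preimages {1, 2} and {3, 4}: fine
    g = lambda x: "lo" if x <= 1 else "hi"   # preimage {1} is not in SX
    print(is_measurable(f, X, SX, SY))       # True
    print(is_measurable(g, X, SX, SY))       # False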
In probability theory,
a random variable
is just a measurable
function where its
domain is
a probability space,
that is, a sample space
Omega with a sigma
algebra of subsets
script F and
a probability measure P.
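One last Python sketch: two fair
coin flips as a toy probability
space (made-up names), with P and
the expectation E both computed by
summing over outcomes -- the
Lebesgue integral in miniature.

    # A finite probability space and a random variable on it.
    Omega = ["HH", "HT", "TH", "TT"]      # two fair coin flips
    P_point = {w: 0.25 for w in Omega}    # P on single outcomes

    def P(A):                             # probability of event A
        return sum(P_point[w] for w in A)

    def X(w):                             # number of heads: a random variable
        return w.count("H")

    def E(rv):                            # expectation = integral of rv dP
        return sum(rv(w) * P_point[w] for w in Omega)

    print(P({w for w in Omega if X(w) >= 1}))   # 0.75: at least one head
    print(E(X))                                 # 1.0: expected number of heads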
Of course, there's much
more, stacks, shelves,
racks of books
as difficult as you wish,
but the above is
a simple, intuitive view
from 10,000 feet up.
Or, measure theory is
a nicer theory of
area and area under a curve
that in all the simple cases
gives you just what
you have been used to!