Topology looks for the patterns inside big data

bite_my_shiny_m · on Nov 20, 2015

As soon as I see the phrases "topology" and "big data" in the same title, I know it's an Ayasdi plug.

boxy310 · on Nov 20, 2015

My work in statistical learning is always about multicollinearity, variable cardinality, and model selection. Understanding the topology of the data space is critical, and day-to-day concerns of data cleaning had made me lose sight of that. To that degree, this article was a fantastic reminder.

chestervonwinch · on Nov 20, 2015

I recently saw a talk on this, and I didn't understand the barcode charts then either. For example, I understand that the zeroth order Betti number gives the number of connected components. So, why does the barcode chart show multiple numbers for each radius value? Shouldn't it look more like a plot of some non-increasing function where, for each radius, there is a single zeroth order Betti number on the y-axis (since the number of connected components is non-increasing as a function of the radius)?

wmsiler · on Nov 20, 2015

The barcode is not usually showing the zeroth Betti numbers. It's usually showing the first Betti number, which intuitively counts the 1-dimensional holes in the space. For each radius, there could be lots of those (imagine several circles that all have a single point in common, then the first Betti number will be equal to the number of circles). At a given radius, there will be one bar over it for each 1-dimensional hole. If a bar is really long, i.e. the same hole exists for many radii, then we assume that it must represent actual structure in the data, rather than just noise.

You could do a barcode for the holes of any fixed dimension, but as you point out, the 0-th dimension case is relatively uninteresting, and as you get to higher dimensions, it's harder to visualize and interpret what is going on. So dimension 1 is most common.

chestervonwinch · on Nov 20, 2015

So, let me try to understand: Let's take the order 1 case. If I pick a value on the y-axis and hold it constant, this refers to a particular "1D hole". As I move left to right, increasing the radius, the graph is colored black if this particular hole is present for the given radius and not colored otherwise. Is it not misleading, then, to label the y-axis as the Betti number, since this is a single, global number?

> the 0-th dimension case is relatively uninteresting

It was explained to me that the zeroth order Betti numbers have applications for clustering.

wmsiler · on Nov 20, 2015

Your description is correct. And to call the y-axis the Betti number is a bit misleading. If you are looking at a barcode for 1-dimensional holes, then at a given radius, the number of bars over that radius is the first Betti number (at that radius). So the Betti number counts the number of holes, but the barcode graph is keeping track of each hole's "lifetime" as the radius changes.
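To make the bars-versus-Betti-number distinction concrete, here is a minimal Python sketch. It assumes a barcode is stored as a list of (birth, death) radius pairs, which is a hypothetical representation chosen for illustration, and the bar values are made up. The Betti number at a radius is then just the count of bars alive at that radius.

```python
from typing import List, Tuple

# Hypothetical representation: each bar is (birth radius, death radius).
Bar = Tuple[float, float]


def betti_at(bars: List[Bar], radius: float) -> int:
    """Count the bars that are alive at the given radius."""
    return sum(1 for birth, death in bars if birth <= radius < death)


# Made-up barcode for 1-dimensional holes: one long bar and two short ones.
bars = [(0.2, 1.5), (0.3, 0.4), (0.35, 0.5)]

print(betti_at(bars, 0.38))  # all three holes are present at this radius
print(betti_at(bars, 1.0))   # only the long-lived hole remains
```

The long bar is the one you would read as actual structure; the two short ones look like noise.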

> It was explained to me that the zeroth order Betti numbers have applications for clustering.

That is correct, so perhaps "uninteresting" was too strong :) The 0-th Betti number counts the number of connected components of the space. So if we are at radius, say, 1 and the 0-th Betti number is 3, then we know the data points can be put into 3 "dense" clusters. By dense, I mean that for every two data points A and B in the cluster, there is a sequence of data points that you could step on going from A to B where each step has distance at most 1. I don't know if that explanation made any sense.
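If code helps, here is a rough Python sketch of that idea. It assumes the points live in Euclidean space and that two points are linked when their distance is at most the radius, matching the "each step has distance at most 1" description above. The function name and the sample points are made up for illustration, and it uses a simple union-find rather than any particular TDA library.

```python
import math
from typing import List, Sequence


def count_components(points: List[Sequence[float]], radius: float) -> int:
    """0-th Betti number at this radius: clusters reachable by steps <= radius."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Follow parent links to the cluster representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j  # merge the two clusters
                    components -= 1
    return components


# Made-up data: a chain of three points, a pair, and one isolated point.
points = [(0, 0), (0.8, 0), (1.6, 0), (5, 5), (5.5, 5), (10, 0)]

for r in (0.1, 1.0, 100.0):
    print(r, count_components(points, r))
```

The count can only go down as the radius grows, which is the non-increasing behaviour mentioned earlier in the thread.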

umutisik · on Nov 20, 2015

Also very cool is the recognition of branching in the data by the computation of a persistent Borel-Moore homology. This is the method that was used in their cancer study.

aswanson · on Nov 20, 2015

Can anyone recommend a good introductory text for topology?

j2kun · on Nov 20, 2015

I have a few blog posts, although they are obviously more terse than textbooks in some respects. In order:

http://jeremykun.wordpress.com/2012/08/26/metric-spaces-a-pr...

http://jeremykun.wordpress.com/2012/11/04/topological-spaces...

http://jeremykun.wordpress.com/2012/11/11/constructing-topol...

http://jeremykun.com/2013/01/12/the-fundamental-group-a-prim...

icen · on Nov 20, 2015

Munkres is very good. If you're after algebraic topology, Hatcher is also good (and freely available online).

compactmani · on Nov 20, 2015

I second Hatcher. His AT book is a go-to for me. He also has introductory point-set topology notes on his site, which are appropriate for someone without a math background.

fao_ · on Nov 20, 2015

Recently I've been reading 'Introduction to Topology' by Mendelson. It's very good so far, although I'm skipping the exercises on the first reading (first reading to try and get the concepts, second reading to solidify them).

It's actually available for free from archive.org[0], which is a huge plus.

[0]: https://archive.org/details/IntroductionToTopology

dharmon · on Nov 20, 2015

I really like the Munkres book (just called "Topology").

efm · on Nov 26, 2015

For an overview of applied topology, Ghrist's Elementary Applied Topology is full of interesting ideas. Not rigorous, but inspiring. https://www.math.upenn.edu/~ghrist/notes.html

ColinWright · on Nov 20, 2015

A lot depends on where you're starting from, what you want to know, and why you want to know it.