Ok, simple enough. But then I go to Wikipedia and get hit with differential geometry, manifolds, multilinear algebra, etc...
I mean these are all topics I've been meaning to learn for a while, but each one seems like it would take a year to properly introduce myself to (for which I never seem to have the time).
Tensors are most definitely not just multidimensional arrays. (this is a pet peeve of mine)
Tensors have geometric meaning just like vectors (just like vectors are not just single-axis arrays). It would take too long to delve into the geometry (especially if it is abstract "geometry" like the geometrical space spanned by the songs in the Pandora database or the space spanned by the faces in face-recognition software). Instead I will try to give an example:
You all know what rotating a vector means. All the components change in a certain way, but the geometrical meaning of this object is preserved in some sense. This is not true for some arbitrary list of numbers. Well, a tensor is just a more complicated geometrical structure that still has very specific rules governing what its components can be and how they can change.
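A minimal NumPy sketch of that point: rotating a vector changes every component, but the geometric content (here, its length) is untouched.

```python
import numpy as np

# Rotate by 90 degrees in the plane.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 4.0])
w = R @ v  # components change: [3, 4] -> [-4, 3]

# ...but the geometric meaning (length) is preserved.
print(np.linalg.norm(v), np.linalg.norm(w))  # both 5.0
```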
And a counterexample: the Christoffel Symbol is a multi-dimensional array that is not a tensor.
P.S. It is true that it is a very common abuse of nomenclature to call all multidimensional arrays "tensors".
I'm not sure if I'm disagreeing with you, but I really dislike the "collection of numbers that transforms in this way" definition of vectors, tensors, etc. I get why it's useful in calculations (that's why physicists like it), but it seems really inelegant to me.
To me, a tensor is a function that maps a collection of m vectors to a collection of n vectors, such that every output is linear in each input. It's true that if you choose a basis, then you can write down an array of numbers which identifies the function, and that changing the basis causes those numbers to change in a certain way, but that's not what a tensor "is".
Of course, the beauty of mathematics is that there are many different ways of looking at the same object, so physicists can deal with their arrays of numbers, I can play with my multilinear functions, and everyone gets the same answer when we ask the same question.
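Both views can be held in one small example: a bilinear map is the coordinate-free object, and the array `A` is only its description in a chosen basis. A sketch, with an arbitrary made-up `A`:

```python
import numpy as np

# A bilinear map f(x, y) = x . (A y), identified *in this basis* by the array A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def f(x, y):
    return np.einsum('i,ij,j->', x, A, y)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(f(x, y))  # picks out A[0, 1] = 2.0

# Linearity in each input separately is what makes it a (0,2)-tensor:
assert np.isclose(f(2 * x, y), 2 * f(x, y))
assert np.isclose(f(x + y, y), f(x, y) + f(y, y))
```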
I really dislike the "collection of numbers that transforms in this way" definition of vectors, tensors, etc.
It is not totally without merit if you consider the tensor bundle as associated to the principal bundle of linear frames. However, that's hardly suitable as an introduction to the topic.
It's true though that vectors are basically 1D lists if we're speaking about elements from the vector space R^n. Is it true that tensors are multidimensional arrays if they come from the space R^{m1 x m2 x ... x mn} ?
I'm also confused: do you think about a tensor as a vector from a vector space? As an object that maps between vector spaces? Or something completely different?
I realize there's a LOT of detail here - just trying to flesh out my 500,000,000 mile view a little more.
A 1D list of numbers is not necessarily a "vector" in the Differential Geometry sense. The definition of a vector or tensor is based on how it transforms as the co-ordinate system transforms. See e.g. http://en.wikipedia.org/wiki/Covariance_and_contravariance_o...
One weird consequence of this is that if A and B are vectors, the cross product (AxB) is NOT a vector in the tensorial sense. [Aside: Cross products are really weird since they are defined in the familiar way in 3 and 7 dimensions only! (see http://math.stackexchange.com/questions/185991/is-the-vector...)]
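You can see this failure numerically. Under a parity flip P = -I, true vectors pick up a sign, but the cross product of two flipped vectors does not; that sign mismatch is exactly why it's a pseudovector:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
P = -np.eye(3)  # parity (reflection through the origin)

lhs = np.cross(P @ a, P @ b)  # cross product of the transformed vectors
rhs = P @ np.cross(a, b)      # transforming the cross product as if it were a vector

print(lhs)  # equals +cross(a, b)
print(rhs)  # equals -cross(a, b)
assert not np.allclose(lhs, rhs)
```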
The word "vector" just means an element of a vector space: a collection of "things" that you can add together and multiply by numbers. The vector space R^n is defined as the collection of all lists of n real numbers, so you are right in that sense.
But usually when people say that a vector "is" a list of numbers, they mean that if you choose a coordinate system, then the coordinates of a vector uniquely identify it. This is exactly analogous to locations on the earth being uniquely identified by their latitude and longitude: we first have to make a (somewhat arbitrary) choice of coordinates. You wouldn't say that a point on the earth "is" a pair of numbers, though. The case of R^n is confusing, because if you choose the usual coordinate system, then the coordinates of a vector are the same list of numbers as the vector itself.
I like to think of tensors as functions that map some number of vectors to some other number of vectors. But you can add together two tensors (with the same number of inputs and outputs) or multiply a tensor by a number, so you can also think of them as vectors in some other vector space.
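The latitude/longitude analogy can be made concrete: the same geometric arrow has different coordinates in different bases, and the coordinates only identify it once a basis is fixed. A sketch with an arbitrary basis `B`:

```python
import numpy as np

v = np.array([2.0, 3.0])      # coordinates in the standard basis

B = np.array([[1.0, 1.0],     # columns are the vectors of a different basis
              [0.0, 1.0]])
coords_in_B = np.linalg.solve(B, v)
print(coords_in_B)            # different numbers: [-1.  3.]

# ...but they reconstruct the identical vector:
assert np.allclose(B @ coords_in_B, v)
```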
It's not true, though. A list of numbers in R is not a vector until you imbue the position of each of those numbers with a geometric basis, or, alternatively, relate that list of numbers to the set of all lists of such numbers along with some algebraic operations which turn it into a vector space.
This is similar with tensors. It's even harder to talk about them, though, without the geometric/algebraic bits.
What I'm seeing is that physicists and geometers disagree with programmers about what "tensor" means -- not in that their views are inconsistent, but in that the programmers' view is too limited for the things that the physicists and geometers want to do with tensors.
They also disagree about what "vector" and "matrix" mean for the same reason, but they can't really put that horse back in the barn. But "tensors" remain unfamiliar enough that this disagreement can come up whenever anyone says "tensor".
Here's a possible truce:
- Programmers should default to talking about "arrays"
- Geometers and physicists should default to talking about "vectors", "matrices", and "tensors"
- We all generally recognize that some (vectors|matrices|tensors) are arrays, and some arrays are (vectors|matrices|tensors), and it's not the worst thing ever if someone uses one to mean the other
- We all generally recognize that you can sometimes do damn cool things by using an array as a tensor or a tensor as an array, so there isn't that much benefit in shouting "NO THEY'RE TOTALLY DIFFERENT".
Eh, I think as you stack more and more "structure" atop something it becomes less and less appropriate to confuse it with its representation. There is after all already a completely appropriate, cross-disciplinary term for what "programmers" here call "tensors": multidimensional arrays, or even nd-arrays.
No mathematician will bat an eye if you say that a tensor can be represented by an nd-array, but if you equate them, then you're clearly missing something.
I didn't mention the term "matrix". I think it has a much more complex story with respect to common use by "programmers" along with vector. Since linear algebra is such a commonly taught subject there are lots of people with practical working knowledge of matrices. If you don't have the geometric or abstract algebraic perspective here then matrix is almost certainly "2d-array" at first blush, but you'll also admit that there's a sort of non-trivial relationship between these "matrices" and size-conformant "vectors" (1d-arrays) and another one between two size-conformant matrices. This distinction is made from time to time [0][1].
Finally, of course, a matrix is a special case of a tensor. Tensors are a much broader topic, and it becomes more and more difficult to talk about them without their geometric underpinnings. This is what I was writing about above.
[0] Is a black-and-white image a matrix? I'd imagine a lot of people would have a bit of a tough time saying yes directly---and for good reason. But there's also a reasonable argument for pretending like it is, and even reasonable arguments for turning it into one!
[1] Numpy distinguishes between nd-arrays and matrices exactly right in that matrices are 2d-arrays only and are imbued with multiplication as composition of maps. In practice, many people I know who are Numpy users just think of matrix as a convenient way to overload *, though.
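The distinction footnote [1] describes is easy to demonstrate: `*` on an nd-array is elementwise, while on `np.matrix` it is composition of maps. (Note `np.matrix` has been deprecated for a while in NumPy, which itself says something about how the "convenient `*` overload" view played out.)

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])  # deprecated in modern NumPy

print(a * a)  # nd-array: elementwise -> [[ 1  4] [ 9 16]]
print(m * m)  # matrix: composition   -> [[ 7 10] [15 22]]
```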
poor choices of name happen all the time. we can't change them after they set in, so terms that made sense 5 or 10 years ago read as really dumb today. it even happens inside of software development alone - the use of "responsive" in web design, which always used to mean "responds quickly", not "adapts to resolution and aspect ratio".
std::vector was an exceptionally poor choice of name...
is it so hard to just call it a pseudotensor, like it is? it's confusing for people trying to research the subject when they come across all kinds of geometry and calculus stuff which is irrelevant to the usage here.
I think this is really a flimsy use of a technical term which does have a precise meaning in the context of mathematical physics to make the idea of multi-dimensional array seem "deeper" than it really is.
There are enough hard problems in data science - I don't think it needs to be burdened with terms like this.
Tensors can be somewhat intimidating, but it definitely would not take a year to understand tensors. Two references I found particularly helpful:
There's a new textbook on classical physics by Kip Thorne and Roger Blandford. It hasn't been published yet, but the lecture notes on which it's based are online here:
The first chapter presents a pretty simple introduction to tensors.
The other reference I've found helpful is Chapter 31 of Volume II of the Feynman Lectures on Physics. Feynman (as usual) gives a great introduction to the concept.
To echo krastanov, it is the case that all tensors can be represented as multi-dimensional arrays, but not all multi-dimensional arrays are tensors. At least in physics, a tensor is ultimately a thing, or at least some description of a physical thing. It therefore has some existence independent of whatever coordinate system you use and therefore, if you change your coordinate system, the values of the tensor have to change in certain ways. This means that your tensor cannot, in general, be some arbitrary multidimensional array.
There's a bit of confusion about nomenclature since in fields outside physics and math a tensor doesn't need to represent some physical thing so the special transformation properties of a tensor are not so important and it's often just used as a shorthand term for "multidimensional array."
There's a new textbook on classical physics by Kip Thorne and Roger Blandford. It hasn't been published yet, but the lecture notes on which it's based are online here
Heh, it's not exactly new, and I'll believe it's been published when I have it in my hands. That book had already been in development for years—and was due out any day now—back when I used it in Physics 136 (Blandford himself taught one of the terms). That was 1997.
The material is great, though, including the coverage of tensors, which served me well the next year in General Relativity (Thorne's last time teaching it). If you're looking for a solid intro to tensor algebra & analysis, I definitely recommend it.
If you're familiar with quadratic forms from multidimensional calculus you'll know that any bilinear relationship f(x, y) on two vectors x and y (from two vector spaces) can be represented by a matrix
f(x, y) = x' A y
for some matrix A. A tensor generalizes this idea to multilinear forms
f(x, y, z, w, q)
where if you play with the basic idea you'll probably realize that if such an operation were to be represented by some kind of "matrix" A then that matrix would have to be multidimensional. This kind of multidimensional array is a "tensor" although the theory gets a little more complex than just that.
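Playing with that idea in NumPy makes it tangible: a trilinear form needs a three-index "matrix", and `einsum` does the contraction. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A trilinear form f(x, y, z): its "matrix" needs three indices.
T = rng.standard_normal((2, 3, 4))

def f(x, y, z):
    return np.einsum('ijk,i,j,k->', T, x, y, z)

x, y, z = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)

# Multilinearity: linear in each slot separately.
assert np.isclose(f(5 * x, y, z), 5 * f(x, y, z))
assert np.isclose(f(x, y + y, z), 2 * f(x, y, z))
```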
Why isn't any old multidimensional array a tensor? Because we want to see them as arising not as just an arbitrary collection of numbers but instead as representatives of these kinds of multilinear mappings. In particular, we want to know that tensors are invariant under changes of bases. This is similar to how we like to think of matrices as representing "some" linear mapping, but there might be many changes of bases in the source and target vector spaces which change the actual numbers inside without changing the ultimate mapping itself.
Ultimately, to get the fullest understanding of this we would need to talk about vector dual spaces and note that expressions like f(x, y, z, w, q) could involve two "types" of vector spaces, covariant and contravariant, which have different kinds of transformation-under-basis-change properties. To truly be a tensor your multi-dimensional matrix must understand and respect all of these kinds of invariances.
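The "same mapping, different numbers" point is checkable numerically. Under a change of basis P, a (1,1)-tensor's components transform as A' = P^-1 A P and a vector's as v' = P^-1 v, and the action of the map comes out the same either way:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))  # a linear map, written in some basis
P = rng.standard_normal((3, 3))  # change-of-basis matrix (invertible in practice)
v = rng.standard_normal(3)

Pinv = np.linalg.inv(P)
A_new = Pinv @ A @ P             # the map's components in the new basis
v_new = Pinv @ v                 # the vector's components in the new basis

# Same underlying map: applying it in either basis gives the same answer.
assert np.allclose(A_new @ v_new, Pinv @ (A @ v))
```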
In this context, tensors are just multidimensional arrays. The easiest way to understand them is by analogy with matrices.
Matrices are 2d arrays, and matrix decomposition uses methods inspired by geometry, to analyze data in matrix form. Even though the matrix in question might have no intrinsic geometric meaning (e.g. a matrix of which individual likes which beer), the geometric methods still give useful result.
Similarly, tensor decompositions are inspired by geometry, but are also a simple way to get information from a 3D array. The most basic tensor decomposition approximates a tensor a_ijk by sum_l b_li * c_lj * d_lk.
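That decomposition (the CP/PARAFAC form) is one `einsum` away. A sketch building a low-rank tensor from its factor matrices, with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
r, I, J, K = 2, 3, 4, 5  # rank and tensor dimensions

# Factor matrices of a rank-2 tensor.
b = rng.standard_normal((r, I))
c = rng.standard_normal((r, J))
d = rng.standard_normal((r, K))

# a_ijk = sum_l b_li * c_lj * d_lk  -- the decomposition from the text.
a = np.einsum('li,lj,lk->ijk', b, c, d)
print(a.shape)  # (3, 4, 5)
```

A decomposition algorithm runs this in reverse: given `a`, find small factors `b`, `c`, `d` that approximately reproduce it.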
tl;dr don't read too much into the geometric side of tensors. Data science applications don't have a geometric interpretation, but the techniques can still work.
If you know the graphical birdtracks/Penrose diagram/Feynman diagram/String diagram notation, then tensors aren't hard at all.
Formally what you need to understand is the "tensor product": Given two vector spaces V, W over some field k (think n-tuples of real numbers), their tensor product is characterized by a universal property: Any bilinear map from the Cartesian product V x W to some other vector space U induces a unique linear map from the tensor product V o W to U. You can then proceed to show that the tensor product of vector spaces is associative (V o W) o U == V o (W o U) and symmetric V o W == W o V, and that the field k is its unit: k o V == V == V o k. Interestingly, if you are now given two k-linear maps f : V -> V' and g : W -> W', then you can get a map f o g : V o W -> V' o W'. And in the string diagram notation, you write that as
    V    W
    |    |
    f    g
    |    |
    V'   W'
Now an arbitrary tensor is just a linear map a : V1 o .. o Vn -> W1 o .. o Wm.
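In coordinates, the tensor product of linear maps is realized by the Kronecker product, and the defining property (f o g)(v o w) = f(v) o g(w) becomes a checkable identity. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.standard_normal((2, 2))  # f : V -> V'
g = rng.standard_normal((3, 3))  # g : W -> W'
v = rng.standard_normal(2)
w = rng.standard_normal(3)

# (f o g)(v o w) == f(v) o g(w): the Kronecker product is the
# tensor product of linear maps, written in coordinates.
lhs = np.kron(f, g) @ np.kron(v, w)
rhs = np.kron(f @ v, g @ w)
assert np.allclose(lhs, rhs)
```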
Given a k-vector space V, you can also consider the dual vector space V* of k-linear maps f : V -> k. In the diagram notation, you then need to introduce arrows and the evaluation and co-evaluation maps V o V* -> k and k -> V o V*, which can be visualized as cups and caps. In the standard notation the distinction between vector spaces and dual vector spaces manifests itself in covariant and contravariant indices of tensors.
Differential geometry only enters the picture if you want to study "vector space information" attached to points of some geometric space, for example the vector space of all possible directions at a point, also known as the tangent space. The simplest manifestation of that concept is vector bundles; since forming the tensor product is an example of a "smooth functor", you can also define a tensor product of vector bundles. What physicists call tensors are sections of such vector bundles (typically the tangent bundle/cotangent bundle or some vector bundle associated to a principal bundle of "symmetries"), that is, a "smooth" assignment of a vector in that tensor product at every point.
I see a lot of answers describing what tensors are, but none really describe why they're important. To understand this, let's go back to some first-year Calculus. If we have a function f, we can approximate f as
f(x+dx) = f(x) + f'(x)dx + O(dx^2)
This should look familiar: taking a = f(x) and b = f'(x), this is just the line a + b.dx! In other words, Calculus is just a way of transforming questions about (differentiable) functions into questions about lines.
So this is all well and good, but what if x is a vector? Or f(x) is a vector? Or both? Well, now we can approximate
f(x+dx) = f(x) + Df(x).dx + O(|dx|^2)
Here, Df(x) is nothing other than the matrix [(df_i/dx_j)(x)]. In other words, we still get a linear approximation. But now suppose we get greedy and want to take higher order derivatives: what is the derivative of Df(x)? It's a tensor!
There are a few ways to think about this (which gives rise to the different interpretations of a tensor):
1. It's just d^2f_i/(dx_j dx_k). This is a multi-dimensional array.
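For a scalar function this second-derivative array is the familiar Hessian; for vector-valued f it gains a third index. A finite-difference sketch (the step size `h` and test function are arbitrary choices):

```python
import numpy as np

# Finite-difference Hessian of a scalar function f : R^2 -> R.
# (For vector-valued f you'd get a 3-index array d^2f_i/(dx_j dx_k).)
def f(x):
    return x[0] ** 2 * x[1]

def hessian(f, x, h=1e-5):
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            ej, ek = np.eye(n)[j] * h, np.eye(n)[k] * h
            # Central-difference approximation of d^2f/(dx_j dx_k).
            H[j, k] = (f(x + ej + ek) - f(x + ej - ek)
                       - f(x - ej + ek) + f(x - ej - ek)) / (4 * h * h)
    return H

x = np.array([1.0, 2.0])
print(hessian(f, x))  # analytically [[2*x1, 2*x0], [2*x0, 0]] = [[4, 2], [2, 0]]
```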
In the podcast, I mentioned tensor as a multidimensional array to simplify the concept and introduce it to a wider audience. Our algorithms certainly treat tensor as a multilinear map and we design efficient decomposition methods. Most importantly, we do not form these tensors, meaning we do not instantiate them as multidimensional arrays. Instead, we implicitly manipulate the data to obtain decompositions of tensors that model higher order relationships in data.
it's because they aren't talking about what the majority mean when they use the word "tensor". try looking into "pseudotensor". that has been the established word for this for nearly a century. you may have better luck... although probably not, because the main use of those historically has been in combination with tensors and tensor fields in the geometric setting and physics.
Luckily, I was at mlconf last week where Dr.Anandkumar spoke - they called her the "tensor lady" :) She's using tensors in machine learning for a bunch of things -
Latent Variable Models: Training LVMs using local search methods like EM, gradient descent, variational Bayes etc. has a bunch of problems - they get stuck in local minima, and the algorithms are hard to parallelize with poor convergence. In these cases, tensors yield guaranteed learning using embarrassingly parallel algorithms, so faster convergence & can be run on Spark.
Also saw a demo on training 2-layer nets for GMM using tensors, and they learnt the weights rather fast. So using tensors in deep learning shows promise, though the techniques are in their infancy.
One of the challenges the professor mentioned was the availability of open source libraries to do tensor decomposition, which the above methods require.
This isn't quite right. Moment methods (that rely on tensor decompositions) have a few problems:
(i) they have convergence bounds, but in practice need more data than we have available
(ii) they don't do as well as EM usually, but using them to initialize parameters for EM sometimes does better than EM with random initialization schemes
(iii) it turns out variational methods can also be embarrassingly parallelized without losing much accuracy in practice
(iv) right now moment methods don't work for arbitrary graphical models
I believe while you are right in general, she is looking at a class of problems for which tensors handily triumph over other methods. You might be interested in these papers -
Does using tensors for storage yield some benefit over the 'panels' metaphor that Python Pandas affords? It seems like the real power of tensors is in the calculus that can be run over them, rather than just them by themselves.
If I want to store lists of numbers, arrays and (linked) lists are basically equivalent. It's when I want to calculate cross- and dot-products that vectors prove especially useful.
The discussions I've seen recently in data science about tensors seems to only be about the storage aspect. And, I suppose, if your libraries _don't_ provide such a storage mechanism, then yes, you'll benefit from adding that capability. But I keep waiting for more from the ML community about leveraging differential geometry full-on. I assume it's out there somewhere, in some shops, but I've not found a lot of discussion about it.
There's also one more book I was trying to find which basically recasts normal linear regression in the geometric projection terms that most people know but that are never emphasized when learning it. The nice part of that method is that then you can extend the geometric interpretation to manifolds and get, I believe, GLM very naturally.
This is a really great book! It covers a lot of physics and gives plenty of insight on the relation of geometry and physics. It even made me program this library: https://github.com/cschwan/hep-ga (shameless self-advertisement).
He keeps saying tensor when he means pseudotensor. It's an important distinction.
Tensors and tensor fields can be even more powerful by expressing non trivial relationships between components that remain unchanged under transformation into other spaces... Although whether that applies here I do not really know.
It is great to see all the discussion. We also have source code for topic modeling using tensor methods. Look forward to feedback and further code development http://t.co/VvWyTmZLps