Ok, simple enough. But then I go to Wikipedia and get hit with differential geometry, manifolds, multilinear algebra, etc...
I mean these are all topics I've been meaning to learn for a while, but each one seems like it would take a year to properly introduce myself to (for which I never seem to have the time).
Tensors are most definitely not just multidimensional arrays. (this is a pet peeve of mine)
Tensors have geometric meaning just like vectors (just like vectors are not just single-axis arrays). It would take too long to delve into the geometry (especially if it is abstract "geometry" like the geometrical space spanned by the songs in the Pandora database or the space spanned by the faces in face-recognition software). Instead I will try to give an example:
You all know what rotating a vector means. All the components change in a certain way, but the geometrical meaning of this object is preserved in some sense. This is not true for some arbitrary list of numbers. Well, a tensor is just a more complicated geometrical structure that still has very specific rules governing what its components can be and how they can change.
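A minimal NumPy sketch of that point: rotating a vector changes every component, but the geometric content (here, its length) is untouched.

```python
import numpy as np

# Rotate by 90 degrees in the plane.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 4.0])
w = R @ v  # components change: [3, 4] -> [-4, 3]

# ...but the geometric meaning (length) is preserved.
print(np.linalg.norm(v), np.linalg.norm(w))  # both 5.0
```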
And a counterexample: the Christoffel Symbol is a multi-dimensional array that is not a tensor.
P.S. It is true that it is a very common abuse of nomenclature to call all multidimensional arrays "tensors".
I'm not sure if I'm disagreeing with you, but I really dislike the "collection of numbers that transforms in this way" definition of vectors, tensors, etc. I get why it's useful in calculations (that's why physicists like it), but it seems really inelegant to me.
To me, a tensor is a function that maps a collection of m vectors to a collection of n vectors, such that every output is linear in each input. It's true that if you choose a basis, then you can write down an array of numbers which identifies the function, and that changing the basis causes those numbers to change in a certain way, but that's not what a tensor "is".
Of course, the beauty of mathematics is that there are many different ways of looking at the same object, so physicists can deal with their arrays of numbers, I can play with my multilinear functions, and everyone gets the same answer when we ask the same question.
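Both views can be held in one small example: a bilinear map is the coordinate-free object, and the array `A` is only its description in a chosen basis. A sketch, with an arbitrary made-up `A`:

```python
import numpy as np

# A bilinear map f(x, y) = x . (A y), identified *in this basis* by the array A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def f(x, y):
    return np.einsum('i,ij,j->', x, A, y)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(f(x, y))  # picks out A[0, 1] = 2.0

# Linearity in each input separately is what makes it a (0,2)-tensor:
assert np.isclose(f(2 * x, y), 2 * f(x, y))
assert np.isclose(f(x + y, y), f(x, y) + f(y, y))
```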
I really dislike the "collection of numbers that transforms in this way" definition of vectors, tensors, etc.
It is not totally without merit if you consider the tensor bundle as associated to the principal bundle of linear frames. However, that's hardly suitable as an introduction to the topic.
It's true though that vectors are basically 1D lists if we're speaking about elements from the vector space R^n. Is it true that tensors are multidimensional arrays if they come from the space R^{m1 x m2 x ... x mn} ?
I'm also confused: do you think about a tensor as a vector from a vector space? As an object that maps between vector spaces? Or something completely different?
I realize there's a LOT of detail here - just trying to flesh out my 500,000,000 mile view a little more.
A 1D list of numbers is not necessarily a "vector" in the Differential Geometry sense. The definition of a vector or tensor is based on how it transforms as the co-ordinate system transforms. See e.g. http://en.wikipedia.org/wiki/Covariance_and_contravariance_o...
One weird consequence of this is that if A and B are vectors, the cross product (AxB) is NOT a vector in the tensorial sense. [Aside: Cross products are really weird since they are defined in the familiar way in 3 and 7 dimensions only! (see http://math.stackexchange.com/questions/185991/is-the-vector...)]
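You can see this failure numerically. Under a parity flip P = -I, true vectors pick up a sign, but the cross product of two flipped vectors does not; that sign mismatch is exactly why it's a pseudovector:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
P = -np.eye(3)  # parity (reflection through the origin)

lhs = np.cross(P @ a, P @ b)  # cross product of the transformed vectors
rhs = P @ np.cross(a, b)      # transforming the cross product as if it were a vector

print(lhs)  # equals +cross(a, b)
print(rhs)  # equals -cross(a, b)
assert not np.allclose(lhs, rhs)
```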
The word "vector" just means an element of a vector space: a collection of "things" that you can add together and multiply by numbers. The vector space R^n is defined as the collection of all lists of n real numbers, so you are right in that sense.
But usually when people say that a vector "is" a list of numbers, they mean that if you choose a coordinate system, then the coordinates of a vector uniquely identify it. This is exactly analogous to locations on the earth being uniquely identified by their latitude and longitude: we first have to make a (somewhat arbitrary) choice of coordinates. You wouldn't say that a point on the earth "is" a pair of numbers, though. The case of R^n is confusing, because if you choose the usual coordinate system, then the coordinates of a vector are the same list of numbers as the vector itself.
I like to think of tensors as functions that map some number of vectors to some other number of vectors. But you can add together two tensors (with the same number of inputs and outputs) or multiply a tensor by a number, so you can also think of them as vectors in some other vector space.
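The latitude/longitude analogy can be made concrete: the same geometric arrow has different coordinates in different bases, and the coordinates only identify it once a basis is fixed. A sketch with an arbitrary basis `B`:

```python
import numpy as np

v = np.array([2.0, 3.0])      # coordinates in the standard basis

B = np.array([[1.0, 1.0],     # columns are the vectors of a different basis
              [0.0, 1.0]])
coords_in_B = np.linalg.solve(B, v)
print(coords_in_B)            # different numbers: [-1.  3.]

# ...but they reconstruct the identical vector:
assert np.allclose(B @ coords_in_B, v)
```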
It's not true, though. A list of numbers in R is not a vector until you imbue the position of each of those numbers with a geometric basis, or, alternatively, relate that list of numbers to the set of all lists of such numbers along with some algebraic operations which turn it into a vector space.
This is similar with tensors. It's even harder to talk about them, though, without the geometric/algebraic bits.
What I'm seeing is that physicists and geometers disagree with programmers about what "tensor" means -- not in that their views are inconsistent, but in that the programmers' view is too limited for the things that the physicists and geometers want to do with tensors.
They also disagree about what "vector" and "matrix" mean for the same reason, but they can't really put that horse back in the barn. But "tensors" remain unfamiliar enough that this disagreement can come up whenever anyone says "tensor".
Here's a possible truce:
- Programmers should default to talking about "arrays"
- Geometers and physicists should default to talking about "vectors", "matrices", and "tensors"
- We all generally recognize that some (vectors|matrices|tensors) are arrays, and some arrays are (vectors|matrices|tensors), and it's not the worst thing ever if someone uses one to mean the other
- We all generally recognize that you can sometimes do damn cool things by using an array as a tensor or a tensor as an array, so there isn't that much benefit in shouting "NO THEY'RE TOTALLY DIFFERENT".
Eh, I think as you stack more and more "structure" atop something it becomes less and less appropriate to confuse it with its representation. There is after all already a completely appropriate, cross-disciplinary term for what "programmers" here call "tensors": multidimensional arrays, or even nd-arrays.
No mathematician will bat an eye if you say that a tensor can be represented by an nd-array, but if you equate them, then you're clearly missing something.
I didn't mention the term "matrix". I think it has a much more complex story with respect to common use by "programmers" along with vector. Since linear algebra is such a commonly taught subject there are lots of people with practical working knowledge of matrices. If you don't have the geometric or abstract algebraic perspective here then matrix is almost certainly "2d-array" at first blush, but you'll also admit that there's a sort of non-trivial relationship between these "matrices" and size-conformant "vectors" (1d-arrays) and another one between two size-conformant matrices. This distinction is made from time to time [0][1].
Finally, of course, a matrix is a special case of a tensor. Tensors are a much broader topic, and it becomes more and more difficult to talk about them without their geometric underpinnings. This is what I was writing about above.
[0] Is a black-and-white image a matrix? I'd imagine a lot of people would have a bit of a tough time saying yes directly---and for good reason. But there's also a reasonable argument for pretending like it is, and even reasonable arguments for turning it into one!
[1] Numpy distinguishes between nd-arrays and matrices exactly right in that matrices are 2d-arrays only and are imbued with multiplication as composition of maps. In practice, many people I know who are Numpy users just think of matrix as a convenient way to overload *, though.
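The distinction footnote [1] describes is easy to demonstrate: `*` on an nd-array is elementwise, while on `np.matrix` it is composition of maps. (Note `np.matrix` has been deprecated for a while in NumPy, which itself says something about how the "convenient `*` overload" view played out.)

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])  # deprecated in modern NumPy

print(a * a)  # nd-array: elementwise -> [[ 1  4] [ 9 16]]
print(m * m)  # matrix: composition   -> [[ 7 10] [15 22]]
```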
poor choices of name happen all the time. we can't change them after they set in, so terms that made sense 5 or 10 years ago read as really dumb today. it even happens inside of software development alone - the use of "responsive" in web design, which always used to mean "responds quickly", not "adapts to resolution and aspect ratio".
std::vector was an exceptionally poor choice of name...
is it so hard to just call it a pseudotensor, like it is? it's confusing for people trying to research the subject when they come across all kinds of geometry and calculus stuff which is irrelevant to the usage here.
I think this is really a flimsy use of a technical term which does have a precise meaning in the context of mathematical physics to make the idea of multi-dimensional array seem "deeper" than it really is.
There are enough hard problems in data science - I don't think it needs to be burdened with terms like this.
Tensors can be somewhat intimidating, but it definitely would not take a year to understand tensors. Two references I found particularly helpful:
There's a new textbook on classical physics by Kip Thorne and Roger Blandford. It hasn't been published yet, but the lecture notes on which it's based are online here:
The first chapter presents a pretty simple introduction to tensors.
The other reference I've found helpful is Chapter 31 of Volume II of the Feynman Lectures on Physics. Feynman (as usual) gives a great introduction to the concept.
To echo krastanov, it is the case that all tensors can be represented as multi-dimensional arrays, but not all multi-dimensional arrays are tensors. At least in physics, a tensor is ultimately a thing, or at least some description of a physical thing. It therefore has some existence independent of whatever coordinate system you use and therefore, if you change your coordinate system, the values of the tensor have to change in certain ways. This means that your tensor cannot, in general, be some arbitrary multidimensional array.
There's a bit of confusion about nomenclature since in fields outside physics and math a tensor doesn't need to represent some physical thing so the special transformation properties of a tensor are not so important and it's often just used as a shorthand term for "multidimensional array."
There's a new textbook on classical physics by Kip Thorne and Roger Blandford. It hasn't been published yet, but the lecture notes on which it's based are online here
Heh, it's not exactly new, and I'll believe it's been published when I have it in my hands. That book had already been in development for years—and was due out any day now—back when I used it in Physics 136 (Blandford himself taught one of the terms). That was 1997.
The material is great, though, including the coverage of tensors, which served me well the next year in General Relativity (Thorne's last time teaching it). If you're looking for a solid intro to tensor algebra & analysis, I definitely recommend it.
If you're familiar with quadratic forms from multidimensional calculus you'll know that any bilinear relationship f(x, y) on two vectors x and y (from two vector spaces) can be represented by a matrix
f(x, y) = x' A y
for some matrix A. A tensor generalizes this idea to multilinear forms
f(x, y, z, w, q)
where if you play with the basic idea you'll probably realize that if such an operation were to be represented by some kind of "matrix" A then that matrix would have to be multidimensional. This kind of multidimensional array is a "tensor" although the theory gets a little more complex than just that.
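Playing with that idea in NumPy makes it tangible: a trilinear form needs a three-index "matrix", and `einsum` does the contraction. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A trilinear form f(x, y, z): its "matrix" needs three indices.
T = rng.standard_normal((2, 3, 4))

def f(x, y, z):
    return np.einsum('ijk,i,j,k->', T, x, y, z)

x, y, z = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)

# Multilinearity: linear in each slot separately.
assert np.isclose(f(5 * x, y, z), 5 * f(x, y, z))
assert np.isclose(f(x, y + y, z), 2 * f(x, y, z))
```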
Why isn't any old multidimensional array a tensor? Because we want to see them as arising not as just an arbitrary collection of numbers but instead as representatives of these kinds of multilinear mappings. In particular, we want to know that tensors are invariant under changes of bases. This is similar to how we like to think of matrices as representing "some" linear mapping, but there might be many changes of bases in the source and target vector spaces which change the actual numbers inside without changing the ultimate mapping itself.
Ultimately, to get the fullest understanding of this we would need to talk about vector dual spaces and note that expressions like f(x, y, z, w, q) could involve two "types" of vector spaces, covariant and contravariant, which have different kinds of transformation-under-basis-change properties. To truly be a tensor your multi-dimensional matrix must understand and respect all of these kinds of invariances.
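The "same mapping, different numbers" point is checkable numerically. Under a change of basis P, a (1,1)-tensor's components transform as A' = P^-1 A P and a vector's as v' = P^-1 v, and the action of the map comes out the same either way:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))  # a linear map, written in some basis
P = rng.standard_normal((3, 3))  # change-of-basis matrix (invertible in practice)
v = rng.standard_normal(3)

Pinv = np.linalg.inv(P)
A_new = Pinv @ A @ P             # the map's components in the new basis
v_new = Pinv @ v                 # the vector's components in the new basis

# Same underlying map: applying it in either basis gives the same answer.
assert np.allclose(A_new @ v_new, Pinv @ (A @ v))
```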
In this context, tensors are just multidimensional arrays. The easiest way to understand them is by analogy with matrices.
Matrices are 2d arrays, and matrix decomposition uses methods inspired by geometry, to analyze data in matrix form. Even though the matrix in question might have no intrinsic geometric meaning (e.g. a matrix of which individual likes which beer), the geometric methods still give useful result.
Similarly, tensor decompositions are inspired by geometry, but are also a simple way to get information from a 3D array. The most basic tensor decomposition approximates a tensor a_ijk by sum_l b_li * c_lj * d_lk.
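That decomposition (the CP/PARAFAC form) is one `einsum` away. A sketch building a low-rank tensor from its factor matrices, with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
r, I, J, K = 2, 3, 4, 5  # rank and tensor dimensions

# Factor matrices of a rank-2 tensor.
b = rng.standard_normal((r, I))
c = rng.standard_normal((r, J))
d = rng.standard_normal((r, K))

# a_ijk = sum_l b_li * c_lj * d_lk  -- the decomposition from the text.
a = np.einsum('li,lj,lk->ijk', b, c, d)
print(a.shape)  # (3, 4, 5)
```

A decomposition algorithm runs this in reverse: given `a`, find small factors `b`, `c`, `d` that approximately reproduce it.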
tl;dr don't read too much into the geometric side of tensors. Data science applications don't have a geometric interpretation, but the techniques can still work.
If you know the graphical birdtracks/Penrose diagram/Feynman diagram/String diagram notation, then tensors aren't hard at all.
Formally what you need to understand is the "tensor product": Given two vector spaces V, W over some field k (think n-tuples of real numbers), their tensor product is characterized by a universal property: Any bilinear map from the Cartesian product V x W to some other vector space U induces a unique linear map from the tensor product V o W to U. You can then proceed to show that the tensor product of vector spaces is associative (V o W) o U == V o (W o U) and symmetric V o W == W o V, and that the field k is its unit: k o V == V == V o k. Interestingly, if you are now given two k-linear maps f : V -> V' and g : W -> W', then you can get a map f o g : V o W -> V' o W'. And in the string diagram notation, you write that as
    V    W
    |    |
    f    g
    |    |
    V'   W'
Now an arbitrary tensor is just a linear map a : V1 o .. o Vn -> W1 o .. o Wm.
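In coordinates, the tensor product of linear maps is realized by the Kronecker product, and the defining property (f o g)(v o w) = f(v) o g(w) becomes a checkable identity. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.standard_normal((2, 2))  # f : V -> V'
g = rng.standard_normal((3, 3))  # g : W -> W'
v = rng.standard_normal(2)
w = rng.standard_normal(3)

# (f o g)(v o w) == f(v) o g(w): the Kronecker product is the
# tensor product of linear maps, written in coordinates.
lhs = np.kron(f, g) @ np.kron(v, w)
rhs = np.kron(f @ v, g @ w)
assert np.allclose(lhs, rhs)
```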
Given a k-vector space V, you can also consider the dual vector space V* of k-linear maps f : V -> k. In the diagram notation, you then need to introduce arrows and the evaluation and co-evaluation maps V o V* -> k and k -> V o V*, which can be visualized as cups and caps. In the standard notation the distinction between vector spaces and dual vector spaces manifests itself in covariant and contravariant indices of tensors.
Differential geometry only enters the picture if you want to study "vector space information" attached to points of some geometric space, for example the vector space of all possible directions at a point, also known as the tangent space. The simplest manifestation of that concept is vector bundles; since forming the tensor product is an example of a "smooth functor", you can also define a tensor product of vector bundles. What physicists call tensors are sections of such vector bundles (typically the tangent bundle/cotangent bundle or some vector bundle associated to a principal bundle of "symmetries"), that is, a "smooth" assignment of a vector in that tensor product at every point.
I see a lot of answers describing what tensors are, but none really describe why they're important. To understand this, let's go back to some first-year Calculus. If we have a function f, we can approximate f as
f(x+dx) = f(x) + f'(x)dx + O(dx^2)
This should look familiar: taking a = f(x) and b = f'(x), this is just the line a + b.dx! In other words, Calculus is just a way of transforming questions about (differentiable) functions into questions about lines.
So this is all well and good, but what if x is a vector? Or f(x) is a vector? Or both? Well, now we can approximate
f(x+dx) = f(x) + Df(x).dx + O(|dx|^2)
Here, Df(x) is nothing other than the matrix [(df_i/dx_j)(x)]. In other words, we still get a linear approximation. But now suppose we get greedy and want to take higher order derivatives: what is the derivative of Df(x)? It's a tensor!
There are a few ways to think about this (which gives rise to the different interpretations of a tensor):
1. It's just d^2f_i/(dx_j dx_k). This is a multi-dimensional array.
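For a scalar function this second-derivative array is the familiar Hessian; for vector-valued f it gains a third index. A finite-difference sketch (the step size `h` and test function are arbitrary choices):

```python
import numpy as np

# Finite-difference Hessian of a scalar function f : R^2 -> R.
# (For vector-valued f you'd get a 3-index array d^2f_i/(dx_j dx_k).)
def f(x):
    return x[0] ** 2 * x[1]

def hessian(f, x, h=1e-5):
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            ej, ek = np.eye(n)[j] * h, np.eye(n)[k] * h
            # Central-difference approximation of d^2f/(dx_j dx_k).
            H[j, k] = (f(x + ej + ek) - f(x + ej - ek)
                       - f(x - ej + ek) + f(x - ej - ek)) / (4 * h * h)
    return H

x = np.array([1.0, 2.0])
print(hessian(f, x))  # analytically [[2*x1, 2*x0], [2*x0, 0]] = [[4, 2], [2, 0]]
```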
In the podcast, I mentioned tensor as a multidimensional array to simplify the concept and introduce it to a wider audience. Our algorithms certainly treat tensor as a multilinear map and we design efficient decomposition methods. Most importantly, we do not form these tensors, meaning we do not instantiate them as multidimensional arrays. Instead, we implicitly manipulate the data to obtain decompositions of tensors that model higher order relationships in data.
it's because they aren't talking about what the majority mean when they use the word "tensor". try looking into "pseudotensor". that has been the established word for this for nearly a century. you may have better luck... although probably not, because the main use of those historically has been in combination with tensors and tensor fields in the geometric setting and physics.
Luckily, I was at mlconf last week where Dr.Anandkumar spoke - they called her the "tensor lady" :) She's using tensors in machine learning for a bunch of things -
Latent Variable Models: Training LVMs using local search methods like EM, gradient descent, variational Bayes etc. has a bunch of problems - they get stuck in local minima, and the algorithms are hard to parallelize with poor convergence. In these cases, tensors yield guaranteed learning using embarrassingly parallel algorithms, so faster convergence & can be run on Spark.
Also saw a demo on training 2-layer nets for GMM using tensors, and they learnt the weights rather fast. So using tensors in deep learning shows promise, though the techniques are in their infancy.
One of the challenges the professor mentioned was the availability of open source libraries to do tensor decomposition, which the above methods require.
This isn't quite right. Moment methods (that rely on tensor decompositions) have a few problems:
(i) they have convergence bounds, but in practice need more data than we have available
(ii) they don't do as well as EM usually, but using them to initialize parameters for EM sometimes does better than EM with random initialization schemes
(iii) it turns out variational methods can also be embarrassingly parallelized without losing much accuracy in practice
(iv) right now moment methods don't work for arbitrary graphical models
I believe while you are right in general, she is looking at a class of problems for which tensors handily triumph over other methods. You might be interested in these papers -
Does using tensors for storage yield some benefit over the 'panels' metaphor that Python Pandas affords? It seems like the real power of tensors is in the calculus that can be run over them, rather than just them by themselves.
If I want to store lists of numbers, arrays and (linked) lists are basically equivalent. It's when I want to calculate cross- and dot-products that vectors prove especially useful.
The discussions I've seen recently in data science about tensors seems to only be about the storage aspect. And, I suppose, if your libraries _don't_ provide such a storage mechanism, then yes, you'll benefit from adding that capability. But I keep waiting for more from the ML community about leveraging differential geometry full-on. I assume it's out there somewhere, in some shops, but I've not found a lot of discussion about it.
There's also one more book I was trying to find which basically recasts normal linear regression in the geometric projection terms that most people know but that are never emphasized when learning it. The nice part of that method is that then you can extend the geometric interpretation to manifolds and get, I believe, GLM very naturally.
This is a really great book! It covers a lot of physics and gives plenty of insight on the relation of geometry and physics. It even made me program this library: https://github.com/cschwan/hep-ga (shameless self-advertisement).
He keeps saying tensor when he means pseudotensor. It's an important distinction.
Tensors and tensor fields can be even more powerful by expressing non trivial relationships between components that remain unchanged under transformation into other spaces... Although whether that applies here I do not really know.
It is great to see all the discussion. We also have source code for topic modeling using tensor methods. Look forward to feedback and further code development http://t.co/VvWyTmZLps