If you have any linear algebra background, then the definition of a tensor is straightforward: given a vector space V over a field K (in physics, K = R or C), a tensor T is a multilinear (i.e. linear in each argument) function from vectors and dual vectors in V to numbers in K. That's it! A type (p, q) tensor T takes p vectors and q dual vectors as arguments (p+q is often called the rank of T, though that is ambiguous compared to specifying the type).
(If you're unfamiliar with the definition of dual vector, it's even simpler: it's just a linear function from V to K.)
Yes, very simple, except that when physicists say "tensor", they mean tensor fields, on smooth, curved manifolds, in at least four dimensions, often with a Lorentz metric. Things stop being simple quickly.
It does not matter on what set a tensor field is defined.
A tensor field is not a tensor, but the value of a tensor field at any point is a tensor, which satisfies the definition given above, exactly like the value of a vector field at any point is a vector.
The "fields" are just functions.
Some physics books do not give the easier-to-understand definition above; instead they give an equivalent but more obscure definition of a tensor, by specifying the transformation rules for its contravariant and covariant components under a change of reference system.
The word "tensor" with the current meaning has been used for the first time by Einstein and he has not given any explanation for this word choice. The theory of tensors that Einstein has learned had not used the word "tensor".
Before Einstein, the word "tensor" (coined by Hamilton) was used in physics with the meaning of "symmetric matrix", because the geometric (affine) transformation of a body that is determined by multiplication with a symmetric matrix extends (or compresses) the body along certain directions (the axes corresponding to a rotation that would diagonalize the symmetric matrix). The word "tensor" in the old sense was applied only to what is now called a "symmetric tensor of the second order" (which remains the most important kind of tensor that is neither a vector nor a scalar).
> Some physics books do not give the easier-to-understand definition above; instead they give an equivalent but more obscure definition of a tensor, by specifying the transformation rules for its contravariant and covariant components under a change of reference system.
The definition of a tensor as a multilinear map, while simple to understand, has no content that is useful for doing physics. To do any physics, or for that matter any geometry with tensors, you need to define the notions of covariance and contravariance.
Besides, starting with the latter notions allows you to define tensors more naturally. You start by trying to understand how geometric objects transform under coordinate transformations, and you slowly but surely end up with tensors.
No, the definition of a tensor as a multilinear map is the only definition that is useful for doing physics.
All the physical quantities that are defined to be tensors are quantities used to transform either vectors into other vectors or tensors of higher orders into other tensors of higher orders (for instance the transformation between the electric field vector and the electric polarization vector).
Therefore all such physical quantities are used to describe multilinear functions, either in linear anisotropic media, or in non-linear anisotropic media, but in the latter case they are applicable only to relations between small differences, where linear approximations may be used.
The multilinear function is the physical concept that is independent of the coordinate system. The concrete computations with a tensor a.k.a. multilinear function may need the computation of contravariant and/or covariant components in a particular coordinate system and the use of their transformation rules. On the other hand, the abstract formulation of the physical laws does not need such details, but only the high-level definitions using multi-linear functions, and it is independent of any choice for the coordinate system.
There is a unique multilinear function a.k.a. tensor, but it can be expressed by an infinity of different arrays of numbers, corresponding to various combinations of contravariant or covariant components, in various coordinate systems. Their transformation rules can be determined by the condition that they must represent the same function. In the books that do not explain this, the rules appear to be magic and they do not allow an understanding of why the rules are these and not others.
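For concreteness, here is a small NumPy sketch (the particular arrays are made up) of how the transformation rule for the component array of a bilinear map is forced by the requirement that it represent the same function:

```
import numpy as np

# Component array of a bilinear map T(v, w) = v^T M w in the "old" basis.
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Change of basis: the columns of A are the new basis vectors written in the
# old basis, so old components = A @ new components.
A = np.array([[1.0, 2.0],
              [1.0, -1.0]])

v_new = np.array([1.0, 2.0])     # components of v in the new basis
w_new = np.array([-3.0, 0.5])    # components of w in the new basis
v_old, w_old = A @ v_new, A @ w_new

# "Same function, different components" forces the rule M' = A^T M A.
M_new = A.T @ M @ A

# The number T(v, w) does not depend on which coordinate system we compute in.
assert np.isclose(v_old @ M @ w_old, v_new @ M_new @ w_new)
```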
I think the point above is that in physics "tensor" is usually overloaded, and practicing physicists, when they speak of tensors, are more often referring to tensor fields, and most often in a context with more geometric structure than is required by a tensor space built on a vector bundle. Typically they (physicists) are dealing with domains where the tensor space is in reference to the tangent bundle of a smooth manifold, with the prototypical example being the metric tensor (field) of spacetime in general relativity. Other prominent examples include tensor fields defined in reference to the tangent bundle of a group of gauge transformations, as in quantum electrodynamics, quantum chromodynamics, etc.
Obviously these things are not just useful to physics but indispensable, so I think the assertion that the only definition of tensor useful to physics is tensor = multilinear map is somewhat out of step. Perhaps it would be better to assert that the concept of multilinear map is essential to every useful definition of tensors in physics.
> On the other hand, the abstract formulation of the physical laws does not need such details, but only the high-level definitions using multi-linear functions, and it is independent of any choice for the coordinate system.
This is exactly the point. Abstract physical laws must be invariant under coordinate transformations. From a pedagogical point of view, perhaps this is less important when discussing anisotropic media, but it is critical when discussing general relativity. Hence the first reason why many physics textbook writers think it very important that covariance/contravariance of tensors be central to both their definition and their pedagogy as applied to physics: you have to convince the student that tensors are the right mathematical objects to describe reality because they preserve this invariance.
The second reason is just as important. Physics is nothing without validating abstract physical laws by experiment, and that validation cannot be done without computing predictions. Those in turn require a concrete coordinate system, which requires the covariance/contravariance of tensors. You can't just disregard these computations as unimportant or unnecessary, from either a pedagogical point of view or a deeper philosophical one.
Covariance and contravariance are mathematical notions, and have to do with whether each multiplicative constituent of the tensor is a vector in your given vector space (covariant) or a linear functional on this space (contravariant). There is no inherent physical meaning to either concept.
I think this is far too simplistic, for one because the values of this putative function depend on the chosen coordinate system.
So I completely agree with the comment you are replying to: when a physicist says "tensor" they really mean a "tensor field" and the definition of the latter is quite a bit more involved than just specifying a multilinear map at each point of a manifold.
The values of the "putative function" do not depend on the chosen coordinate system.
This is the essence of notions like scalar, vector, tensor, that they do not depend on the chosen coordinate system.
Only their numeric representations associated with a chosen coordinate system do depend on that system.
If you compute some arbitrary functions of the numeric components of a tensor in a certain coordinate system, in most cases the array of numbers that composes the result will not be a tensor, precisely because the result will really be different in any other coordinate system, while a tensor must be invariant.
All physical laws are formulated only using various kinds of tensors, including vectors and scalars, precisely because they must be invariant under the choice of coordinate system.
Hear, hear! Functions do not depend on your choice of coordinates, only the components of tensors do! I think this is why it’s important to keep covariance and contravariance in mind. While tensor (fields) do not depend on coordinates intrinsically, the way we represent them when doing calculations most certainly does, and this is usefully characterized by co/contravariance.
Plus, as if tensor fields on Lorentz manifolds weren't already complicated enough, physicists aren't happy until they can write down some differential equations. So not only are you doing calculus, you're doing it on curved manifolds, with complicated tensor objects, in the context of partial differential equations, which - in the case of general relativity - are non-linear. It's okay to admit that all of this is a bit hard. Hell, as the article points out, Einstein himself had trouble understanding them.
There is a way of defining a vector space without an explicit basis (just as a set with an addition and a scalar multiplication). Similarly, there is a way of defining a vector bundle without choosing explicit coordinates (as an abstract vector space, as defined above, which varies with the point in the space).
Just as a (p,q) tensor is a multilinear object related to a single vector space, a tensor field is a section of a tensor bundle associated to the vector bundle. (A section is just a function on the underlying space whose value at a point lies in the vector space above the point.)
Usually, the vector bundle relevant in physics is the tangent bundle of a 4-manifold.
This abstract way of defining tensors and tensor fields is manifestly invariant under coordinate changes, but it takes some machinery to set up. Whereas the 'numbers associated to each coordinate system which transform in a certain way' is more direct, but the rules can seem arbitrary at first sight. Also, maybe this approach can generalize to allow more transformation rules which might take some time to put into an abstract setting.
A standard example is a matrix A which transforms as PAP^-1 (where P is a linear coordinate change) vs. a matrix T which is a linear map between vector spaces.
The same issue appears in software, where you can expose a data structure as a tuple of numbers/string fields and then define functions on them, or you can expose it as an abstract data type where the user of the library can only apply certain functions to it and the implementation author can choose different representations (coordinate changes) in which to easily compute the functions.
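A toy Python sketch of that analogy (the class names are made up): the exposed operation returns the same answer whichever internal representation, i.e. "coordinate system", is used:

```
import math

# Two representations ("coordinate systems") of the same abstract point.
class CartesianPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def norm(self):                 # the exposed operation
        return math.hypot(self.x, self.y)

class PolarPoint:
    def __init__(self, r, theta):
        self.r, self.theta = r, theta
    def norm(self):                 # same operation, different internals
        return self.r

# The library user only calls norm(); the answer is representation-independent.
p1 = CartesianPoint(3.0, 4.0)
p2 = PolarPoint(5.0, math.atan2(4.0, 3.0))   # the same point, in polar form
assert math.isclose(p1.norm(), p2.norm())
```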
Very well, let’s just agree that in physics (r,s) tensors usually refer to sections of the tensor product of some fixed number of copies of the tangent bundle (r copies) and cotangent bundle (s copies) of a smooth manifold (almost always pseudo-Riemannian) and leave it there. Elementary!
That's incorrect if V is infinite-dimensional. A (0,1)-tensor is just supposed to be an element of V but with your definition you get an element of the bidual of V. Which is not isomorphic to V when dim V is infinite. And even when dim V is finite, you need to choose a basis of V to find an isomorphism with the bidual. From a math point of view, that's just no good.
No, the isomorphism between V and V** (for finite-dimensional V) is canonical. The canonical isomorphism T:V->V** is easy to construct: map a vector v in V to the element of V** which takes an element w from V* and applies it to v: T(v)(w) = w(v).
GP is giving you an element of V**. You want to turn it into a vector. To do that, please make the inverse isomorphism explicit without using a basis. I'll wait...
Why? All I need to prove is that I have a canonical linear map going in one direction, and that this map is a bijection. (Since I already constructed such a bijection, I could just answer "for any z in V**, take the unique element v in V such that for all w in V*, w(v) = z(w)".) Do you disagree that I provided such a map? You are correct that V** is not isomorphic to V when V is infinite-dimensional, but your statement that "when dim V is finite, you need to choose a basis of V to find an isomorphism with the bidual" is incorrect. This is elementary textbook stuff (e.g. first chapter of Wald's GR book), so I won't argue further.
I responded to a sibling comment with an explanation: https://news.ycombinator.com/item?id=41234913 But good on you for the condescension even though you didn't understand my point ;)
You said even when dim V is finite you need a basis to find an isomorphism with V**. But that’s not true. You’re right if you mean V*, but not the bidual.
Are you even trying to understand my point? Yes, in that direction, explicitly constructing an element of the bidual of V from an element of V is easy. To explicitly find an element of V from the element of the bidual, you need to choose a basis. Just try it, come on! Write down the inverse isomorphism. Let \alpha be an element of V**. Then v \in V such that f(v) = \alpha (where f is the isomorphism you wrote down) is given by...?
I will help you: if (e_i) is a basis of V and (e_i^*) is its dual basis, then v = \sum_i \alpha(e_i^*) e_i. Can you find such a formula without mentioning the word "basis"?
There's both directions of the isomorphism explicitly defined in a programming language. No choice of basis needed to define the maps, only to prove that the constructor for Bidual really gives you all linear functionals on the dual.
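A sketch of that kind of definition in Python (the concrete representation of vectors and dual vectors here is made up for illustration): both directions are basis-free, and a basis only enters when proving that the constructor really exhausts all of V**:

```
# V** exposed only through a constructor that wraps a vector.
class Bidual:
    def __init__(self, v):
        self._v = v                 # the wrapped vector
    def __call__(self, w):          # act on a dual vector w : V -> K
        return w(self._v)

def to_bidual(v):                   # V -> V**, no basis needed
    return Bidual(v)

def from_bidual(z):                 # V** -> V, also no basis needed
    return z._v

# Example with V = R^2 as tuples and a dual vector as a function:
v = (1.0, 2.0)
w = lambda u: 3.0 * u[0] - u[1]
z = to_bidual(v)
assert z(w) == w(v)
assert from_bidual(z) == v
```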
You are right about infinite dimensions, wrong about finite dimensions. V and V* are naturally isomorphic for finite dimensions.
In finite dimensions, V and V* are isomorphic, but not naturally so. The isomorphism requires additional information. You can specify a basis to get the isomorphism, but many bases will give the same isomorphism. The exact amount of information that you need is a metric. If you have a metric, then every orthonormal basis in that metric will give the same isomorphism.
You need to correct the 2nd sentence to say that V and V** are naturally isomorphic. V and V* are only unnaturally isomorphic. All of this holds only in finite dimensions, of course.
You have a typo in your first line, and I answered a sibling comment about that. Metrics are irrelevant to the discussion (and I presume you wanted to write "norm" instead of "metric").
As for why I said metric, see https://en.wikipedia.org/wiki/Metric_tensor. Which is technically a concept from differential geometry rather than linear algebra. But then again, tensors are literally the topic that started this. And it is only in differential geometry that I've ever cared about mapping from V to V*.
This comment is in a discussion about an article titled, Tensors, the geometric tool that solved Einstein's relativity problem. Therefore, "tensors are literally the topic that started this discussion."
Hopefully that's a hint that you should attempt to figure out what someone might be talking about before going to schoolyard insults.
Schoolyard insult? Uh? Can you quote the part of my comment that would be the "schoolyard insult"?
> This comment is in a discussion about an article titled, Tensors, the geometric tool that solved Einstein's relativity problem. Therefore, "tensors are literally the topic that started this discussion."
Again, are manifolds involved in any way in the definition of tensors and their properties? No? Then why are you even mentioning "metric tensors"? (Which aren't even tensors, but tensor fields...)
Can you provide some examples of important tensors in physics for which the underlying vector space is infinite dimensional? I’m most familiar with the setting of tensor fields on manifolds, in which case the vector bundle consists of finite dimensional vector spaces. Nevertheless, I suppose in the absence of a pseudo-Riemannian metric one lacks a natural isomorphism between vectors/dual vectors. Does this “bidual” distinction arise in that case as well?
The definition may be simple, but it's not very concrete, and I'd argue that makes it not straightforward. While examples of vector spaces can be very concrete (think R, R^2, R^30), I struggle to think of a concrete example of a multilinear function from vectors and dual vectors in V to numbers in K. On top of that, when working with tensors you don't usually use the definition as a multilinear function, at least as far as I remember.
A simple example of a multilinear function is the inner (a.k.a dot) product <a, b>: it takes a vector (b), and a dual vector (a^T), and returns a number. In tensor notation it's typically written δ_ij.
It's multilinear because it's linear in each of its arguments separately: <ca, b> = c<a,b> and <a, cb> = c<a,b>.
Another simple but less obvious example is a rotation (orthogonal) matrix. It takes a vector as an input, and returns a vector. But a vector itself can be thought of as a linear function that takes a dual vector and returns a number (via the inner product, above!). So, applying the rotation matrix to a vector is a sort of "currying" on the multilinear map, while the matrix alone can be considered a function that takes a vector and a dual vector, and returns a number.
In functional notation, you can consider your rotation matrix to be a function (V x V*) -> K, which can in turn be considered a function V -> (V* -> K), where V* is the dual space of V.
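For concreteness, a small NumPy illustration of that currying (the particular R, a, b are made up):

```
import numpy as np

# A 2x2 rotation matrix R, viewed as a bilinear map (V* x V) -> K.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def as_bilinear(a, b):      # takes a dual vector a (row) and a vector b (column)
    return a @ R @ b

def curried(b):             # V -> (V* -> K): "apply R to b", then wait for a dual vector
    Rb = R @ b
    return lambda a: a @ Rb

a = np.array([1.0, -2.0])   # components of a dual vector
b = np.array([0.5, 3.0])    # components of a vector
assert np.isclose(as_bilinear(a, b), curried(b)(a))
```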
I think you're describing the evaluation map T(v, w) = w(v), which has type (1,1), rather than the inner product, which has type (2,0). The inner product lets you "raise and lower indices" (i.e. convert between vectors and dual vectors), so you can basically pretend that it is the evaluation map.
In physics, the first and even now the most important application of multilinear functions, a.k.a. tensors, is in the properties of anisotropic solids.
A solid can be anisotropic, i.e. with properties that depend on the direction, either because it is crystalline or because there are certain external influences, like a force or an electric field or a magnetic field that are applied in a certain direction.
In (linear) anisotropic solids, a vector property that depends on another vector property is no longer collinear with the source, but it has another direction, so the output vector is a bilinear function of the input vector and of the crystal orientation, i.e. it is obtained by the multiplication with a matrix. This happens for various mechanical, optical, electric or magnetic properties.
When there are more complex effects, which connect properties from different domains, like piezoelectricity, which connects electric properties with mechanical properties, then the matrices that describe vector transformations, a.k.a. tensors of the second order, may depend on other such tensors of the second order, so the corresponding dependence is described by a tensor of the fourth order.
So tensors really appear in physics as multilinear functions, which compute the answers to questions like "if I apply a voltage on the electrodes deposited on a crystal in these positions, what will be the direction and magnitude of the displacements of certain parts of the crystal". While in isotropic media you can have relationships between vectors that are described by scalars and relationships between scalars that are also described by scalars, the corresponding relationships for anisotropic media become much more complicated and the simple scalars are replaced everywhere by tensors of various orders.
What in an isotropic medium is a simple proportionality becomes a multilinear function in an anisotropic medium.
The distinction between vectors and dual vectors appears only when the coordinate system does not use orthonormal axes, which makes all computations much more complicated.
The anisotropic solids have become extremely important in modern technology. All the high-performance semiconductor devices are made with anisotropic semiconductor crystals.
Here’s maybe a useful example. Consider a scalar potential function F on R^3 that describes some nonlinear spring law. At a point p=(x,y,z), the differential dF can be thought of as a (1,0) tensor measuring the spring force. It acts on a particle at p moving with velocity v to give the instantaneous work of the particle on the spring dF(p)(v). Now, suppose that we want to know how this quantity changes when we vary the x coordinate. The x coordinate is also a function of p, we can represent its differential as dx, which is a co-vector(field). The quantity that captures this change can be thought of as a (1,1) tensor field, which is related to the stiffness of the spring potential in the x direction at each point p. In the usual undergraduate setting, this tensor field is given as the hessian of F, call this H. The action of this tensor looks like the product u^T H(p) v, where in our case, u^T = dx(p) = [1 0 0]. A good giveaway for when a “co-vector” appears in a tensor calculation is whenever there is a “row vector” in a matrix operation (most people identify “column” vectors with proper vectors). It’s helpful in this case that “row” rhymes with “co-“.
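For concreteness, a rough NumPy version of that example (the particular potential F and the point/velocity values are made up):

```
import numpy as np

# Toy nonlinear spring potential F(p) = k/2 * |p|^2 + c/4 * |p|^4.
k, c = 2.0, 0.5

def grad_F(p):                    # components of the differential dF at p
    return (k + c * (p @ p)) * p

def hess_F(p):                    # the Hessian H(p), the (1,1) tensor above
    return (k + c * (p @ p)) * np.eye(3) + 2.0 * c * np.outer(p, p)

p = np.array([1.0, 0.5, -2.0])    # the point
v = np.array([0.2, 0.0, 1.0])     # particle velocity
u = np.array([1.0, 0.0, 0.0])     # components of dx at p (a "row vector")

work_rate = grad_F(p) @ v             # dF(p)(v)
stiffness_in_x = u @ hess_F(p) @ v    # u^T H(p) v
```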
I think part of this is “if you have a linear algebra background”. There are a few different explanations of tensors, and different explanations make sense for different people.
Yes, the “multi linear map” definition is accessible to an undergraduate who has taken linear algebra. However, the more common meaning of tensor in physics, like the metric tensor of spacetime, requires some more sophisticated background to understand (differential geometry, Lie Groups come to mind).
Not really to push back as I do agree that this is a bit trickier to get an intuition for than the OP suggests, but the most trivial concrete example of a (1, 1) tensor would just be the evaluation function (v, f) |-> f(v), which, given a metric, corresponds to the inner product.
I think the people who find this definition to be mysterious are really looking for (borrowing from Ravi Vakil[0]) "why is a tensor" rather than "what is a tensor". In that case, a better answer IMO is that it's the "most generic" way to multiply vectors that's compatible with the linear structure: "v times w" is defined to be the symbol "v⊗w". There is no meaning to that symbol.
But these things are vectors, so you could write e.g. v = a⋅x+b⋅y, and then you want e.g. (a⋅x+b⋅y)⊗w = ax⊗w + by⊗w, and so on.
So in some sense, the quotient space construction[1] gives a better "why". It says
* I want to multiply vectors in V and W. So let's just start by writing down that "v times w" is the symbol "v⊗w", and I want to have a vector space, so take the vector space generated by all of these symbols.
* But I also want that (v_1+v_2)⊗w = v_1⊗w + v_2⊗w
* And I also want that v⊗(w_1+w_2) = v⊗w_1 + v⊗w_2
* And I also want that (sv)⊗w = s(v⊗w) = v⊗(sw)
And that's it. However you want to concretely define tensors, they ought to be "a way to multiply vectors that follows those rules". Quotienting is a generic technique to say "start with this object, and add this additional rule while keeping all of the others".
Another way to say this is that the tensor algebra is the "free associative algebra": it's a way to multiply vectors where the only rules you have to reduce expressions are the ones you needed to have.
That abstract approach tends to be how mathematicians view the tensor product (there is also a categorical construction), but I don't find it very helpful for understanding what tensors do, or why they are useful in physics. With the "multilinear map" definition, taking the tensor product T of tensors U and V just means evaluating U and V respectively on the arguments of T and multiplying their outputs. Extend this definition by linearity and you have the tensor product of spaces of tensors.
This is actually a harmful definition: both (1,1) and (0,2) tensors can be written as a matrix, but they are very different. It's like calling a vector an array: vectors require a vector space, while arrays are just arrays. It doesn't help that std::vector is very common in CS, but 'pushing back' to a mathematical vector just doesn't make any sense.
Finite-dimensional tensors are interesting both in physics (e.g. mechanics, electromagnetism, general relativity) and mathematics (e.g. representation theory, differential geometry). Infinite dimensions are also used in physics (quantum theory) and in mathematics (operator algebras, representation theory again).
I guess? Mostly the cool applications in physics and differential geometry are about tensor fields, which are more complicated than bare tensors. You could argue that they're talking about finite dimensional tensors but tensor fields are kinda a different object (at least subjectively, to me).
A tensor field is a finite dimensional object varying over space. The space of all tensor fields is infinite dimensional. An operator in QM is infinite dimensional at a single point itself, and in QFT we have fields of such operators.
For those without a strong math background but more of a programmer's background:
You know the matrices you work with in 2D or 3D graphics environments, which you can apply to vectors or even other matrices to more easily transform (rotate, translate, scale)?
Well, tensors are the generalisation of this concept. If you've noticed that 2D games' transformation matrices seem similar (although much simpler) to 3D games' transformation matrices, you've probably wondered what it'd look like for a 4D spacetime or even more complex scenarios. Well, you've now started thinking about tensors.
To add a bit (kudos to the root-parent boil-down): programmers already have a good representation for this, and call it an n-dimensional array, that being a list of lists of lists ... (repeat n times) ... of lists of numbers. The only nuisance is that what programmers call the dimension, math people call the rank. It is the sizes of those nested lists that math people call dimensions. It's all set up for a comedy of errors. Also, in math the rank is split to make explicit how many of the rank-many arguments are vectors and how many are dual vectors. You'd say something like: this is a rank 7 tensor, 3 times covariant (3 vector arguments) and 4 times contravariant (4 dual vector arguments), summing to 7 total arguments. I'm assuming a fixed basis, so the root-parent's map determines an array of numbers.
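In NumPy terms, the terminology clash looks like this:

```
import numpy as np

# What a programmer calls the "dimension" of this array (ndim = 3) is what a
# mathematician calls the rank; the sizes (2, 2, 2) are the dimensions of the
# underlying index ranges.
T = np.zeros((2, 2, 2))   # a rank-3 array of numbers
print(T.ndim)             # 3         -> "rank" in the math sense
print(T.shape)            # (2, 2, 2) -> the sizes of the nested lists
```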
Slight disagree here -- matrices are enough for transformations in 2, 3, 4, and 100 dimensions. Tensors are not arrays with more rows and columns; they are higher dimensional objects -- more indices, not greater range of indices.
Is a tensor higher dimension, or is it the generalized form of the structure encompassing all of it (individual numbers (0 dimensions), vectors (1 dimension), matrixes (2 dimensions), and so on)? Kind of like how an n-sphere describes circles and spheres for n equal to 2 or 3.
Any given tensor has a type (p,q). Say d=p+q. Then d is the 'dimension' of the tensor in the sense that you would need nxnx...n=n^d numbers (think of an nxnx...n array) to describe the tensor where n is the dimension of the underlying vector space.
> or is it the generalized form of the structure encompassing all of it...Kind of like how an n-sphere
If you are asking whether there are examples of tensors parametrized by an integer d, you can cook up examples - like the (d,0) tensor whose input is a sequence of d vectors and just adds up all the components with respect to some basis in each slot and then adds this number across all the d slots.
But just like an n-sphere is a special example of a polynomial in n variables, the above tensor is a specific example - it is the tensor where all the components are 1 in the higher-dimensional array.
Sometimes, one considers the algebra of tensors across all dimensions like the symmetric algebra or exterior algebra simultaneously (where there is multiplication operation between tensors of different dimensions), but that might not be what you were asking about.
The problem is that I'm using dimensional in two different senses. A vector with N elements, for example, is a one-dimensional array of numbers operating on N dimensional spaces, and so on for matrices and tensors.
If you're doing graphics programming you're operating in three and four dimensional spaces mostly (four dimensional being projective spaces), because that's the space you're trying to render in two dimensions. But you'll rarely need anything higher than a matrix (a two-dimensional data structure) for operations on that space.
If you're doing physics you're operating in three, four, and infinite dimensional spaces, mostly. And you'll routinely use higher data structures -- even things like the piezoelectric effect can't really be described without rank 3 tensors (a three-dimensional data structure).
In statistics and machine learning, you're operating in very high dimensional spaces, and will find yourself using non-square tensors especially (in the other areas everything will be square). The data structures will generally be high dimensional as well, but usually just a function of model complexity; so maybe 4 or 5 dimensional data structures.
> (individual numbers (0 dimensions), vectors (1 dimension), matrixes (2 dimensions), and so on)
You're using the word "dimension" in two (distinct) ways. Instead, use the word "rank":
> (individual numbers (0 rank), vectors (1 rank), matrixes (2 rank), and so on)
Now, we can talk about a 4-dimensional rank-1 tensor, e.g., a 4-element vector.
Now, think about a 4x4 matrix: if we multiply the matrix by a 4-vector, we get a 4-vector out: in some ways, the multiplication has "eaten" one of the ranks of the matrix; but, the dimension of the resulting object is the same. If we had a 3x2 matrix, and we multiplied it by a 3-vector, then both the rank has changed (from 2 to 1) and the dimension has changed (from 3 to 2).
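A quick NumPy illustration of that bookkeeping (arrays of ones, just to show the shapes):

```
import numpy as np

A = np.ones((4, 4))       # rank 2, dimension 4
v = np.ones(4)            # rank 1, dimension 4
print((A @ v).shape)      # (4,)  -> one rank "eaten", dimension unchanged

B = np.ones((3, 2))       # rank 2, dimensions 3 and 2
w = np.ones(3)            # rank 1, dimension 3
print((w @ B).shape)      # (2,)  -> rank 2 -> 1, dimension 3 -> 2
```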
A tensor can have any rank.
More importantly, the indices of a tensor come in two "flavors": vector and one-form. The concepts are pretty darn general, but one way to get a feel for how they're related is that the transpose of a vector can be its dual. This gets into things like pre- and post-multiplication; or, whether we 'covary' or 'contravary' with respect to the tensor.
Frankly, tensor products are a beast to deal with mechanically, so the literature mostly deals with them as opaque objects. Modern tensor software libraries and high-performance computing have brought a sea change in how GR is worked with.
In short, tensors generalize matrices. While we can probably guess accurately what "4x4 matrix" means, "4x4 tensor" is missing some information to really nail down what it means.
Interestingly, that extra information helps us to differentiate between the same matrix being used in different "roles". For instance, if you have a 4x4 matrix A, you might think of it as a linear transformation: given x in V = R^4 and y = Ax, y is another vector in V. Alternatively, you might think of it as a bilinear form: given two vectors x, y in V, the value x^T A y is a real number.
In linear algebra, we like to represent both of those operations as a matrix. On the other hand, those are different tensors. The first would be a rank-(1,1) tensor, the second a rank-(2, 0) tensor.
Ultimately, we might write down both of those tensors with the same 4x4 array of 16 numbers that we use to represent 4x4 matrices, but in the sort of math where all these subtle differences start to really matter there are additional rules constraining how rank-(1, 1) tensors are distinct from rank-(2, 0) tensors.
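A NumPy sketch of that distinction (M and P are arbitrary made-up arrays): the same 4x4 array of numbers needs different transformation rules depending on which role it plays:

```
import numpy as np

# The same 4x4 array of numbers, read in two different roles.
M = np.arange(16.0).reshape(4, 4)

# Change of basis: old components = P @ new components.
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 4))
P_inv = np.linalg.inv(P)

x_new, y_new = rng.normal(size=4), rng.normal(size=4)
x_old, y_old = P @ x_new, P @ y_new

# Role 1: linear map y = Mx.  Its array must change as P^-1 M P so that it
# sends the new components of x to the new components of Mx.
M_as_map_new = P_inv @ M @ P
assert np.allclose(M_as_map_new @ x_new, P_inv @ (M @ x_old))

# Role 2: bilinear form x^T M y.  Its array must change as P^T M P so that
# the number it produces is unchanged.
M_as_form_new = P.T @ M @ P
assert np.allclose(x_new @ M_as_form_new @ y_new, x_old @ M @ y_old)
```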
Matrices are a special case of tensors, and the 4x4 matrix you use in 3D operations is absolutely a tensor. Tensors can be way more complex and do more, but that 4x4 matrix is a great starting point.
This is still not it. 4D would simply be a 4x4 matrix instead of 3x3.
Tensors are something which no one has been able to fully or adequately describe. I think you simply have to treat them as a set of operations and not try to map or force them onto existing concepts like linear algebra or matrices. They are similar, but otherwise something completely different.
> Talk to a computer scientist, and they might tell you that a tensor is an array of numbers that stores important data
The conflicting definitions of tensors have precedent in lower dimensions: vectors were already being used in computer science to mean something different than in mathematics / physics, long before the current tensormania.
It's not clear if that ambiguity will ever be a practical problem, though. For as long as such structures are containers of numerical data with no implied transformation properties, we are really talking about two different universes.
Things might get interesting though in the overlap between information technology and geometry [1] :-)
I've always thought the use of "Tensor" in the "TensorFlow" library is a misnomer. I'm not too familiar with ML/theory, is there a deeper geometric meaning to the multi-dimensional array of numbers we are multiplying or is "MatrixFlow" a more appropriate name?
Since the beginning of computer technology, "array" is the term that has been used for any multi-dimensional array, with "vectors" and "matrices" being special kinds of arrays. An exception was COBOL, which had a completely different terminology in comparison with the other programming languages of that time. Among the long list of differences between COBOL and the rest were e.g. "class" instead of "type" and "table" instead of "array". Some of the COBOL terminology has been inherited by languages like SQL or Simula 67 (hence the use of "class" in OOP languages).
A "tensor", as used in mathematics in physics is not any array, but it is a special kind of array, which is associated with a certain coordinate system and which is transformed by special rules whenever the coordinate system is changed.
The "tensor" in TensorFlow is a fancy name for what should be called just "array". When an array is bidimensional, "matrix" is an appropriate name for it.
I agree. Just like NumPy's Einsum. "Multi-Array Flow" doesn't sound sexy and associating your project with a renowned physicist's name gives your project that "we solve big science problems" vibe by association. Very pretentious, very predictable, and very cringe.
The joke I learned in a Physics course is "a vector is something that transforms like a vector," and "a tensor is something that transforms like a tensor." It's true, though.
The physicist's tensor is a matrix of functions of coordinates that transform in a prescribed way when the coordinates are transformed. It's a particular application of the chain rule from calculus.
I don't know why the word "tensor" is used in other contexts. Google says that the etymology of the word is:
> early 18th century: modern Latin, from Latin tendere ‘to stretch’.
So maybe the different senses of the word share the analogy of scaling matrices.
The mathematical definition is 99% equivalent to the physical one. I find that the physical one helps to motivate the mathematical one by illustrating the numerical difference between the basis-change transformation for (1,0)- and (0,1)-tensors. The mathematical one is then simpler and more conceptual once you've understood that motivation. The concept of a tensor really belongs to linear algebra, but occurs mostly in differential geometry.
There is still a "1% difference" in meaning though. This difference allows a physicist to say "the Christoffel symbols are not a tensor", while a mathematician would say this is a conflation of terms.
TensorFlow's terminology is based on the rule of thumb that a "vector" is really a 1D array (think column vector), a "matrix" is really a 2D array, and a "tensor" is then an nD array. That's it. This is offensive to physicists especially, but ¯\_(ツ)_/¯
The problem with the physicist's definition is that the larger the N, the less the geometrical interpretation makes sense. For 1, 2, and even 3-dimensional tensors there is some connection to geometry, but eventually it loses all meaning. The physicist has to give up and "admit" that an N-dimensional tensor really is just a collection of (N-1)-dimensional tensors.
The tensors in tensorflow are often higher dimensional. Is a 3d block of numbers (say 1920x1080x3) still a matrix? I would argue it's not. Are there transformation rules for matrices?
You're totally correct that the tensors in tensorflow do drop the geometric meaning, but there's precedent there from how CS vs math folk use vectors.
Matrices are strictly two-dimensional arrays (together with some other properties, but for a computer scientist that's it). Tensors are the generalization to higher dimensional arrays.
I could stop right here since it's a counterexample to x being a matrix (with a matrix product defined on it; P.S. try tf.matmul(x, x)--it will fail; there's no .transpose either). But that's only technically correct :)
So let's look at tensorflow some more:
The tensorflow tensors should transform like vectors would under change of coordinate system.
In order to see that, let's do a change of coordinate system. To summarize the stuff below: if L1 and W12 are indeed tensors, it should be true that (W12 A^-1)(A L1) = W12 L1.
Try it (in tensorflow) and see whether the new tensor obeys the tensor laws after the transformation. Interpret the changes to the nodes as covariant and the changes to the weights as contravariant:
```
import tensorflow as tf

# Initial outputs of one layer of nodes in your neural network
L1 = tf.constant([2.5, 4, 1.2], dtype=tf.float32)

# Our evil transformation matrix (coordinate system change)
A = tf.constant([[2, 0, 0], [0, 1, 0], [0, 0, 0.2]], dtype=tf.float32)

# Weights (no particular values; "random")
W12 = tf.constant(
    [[-1, 0.4, 1.5],
     [0.8, 0.5, 0.75],
     [0.2, -0.3, 1]], dtype=tf.float32
)

# Covariant tensor nature; varying with the nodes
L1_covariant = tf.matmul(A, tf.reshape(L1, [3, 1]))
A_inverse = tf.linalg.inv(A)

# Contravariant tensor nature; varying against the nodes
W12_contravariant = tf.matmul(W12, A_inverse)

# Now derive the inputs for the next layer using the transformed node outputs and weights
L2 = tf.matmul(W12_contravariant, L1_covariant)

# Compare to the direct way
L2s = tf.matmul(W12, tf.reshape(L1, [3, 1]))
#assert L2 == L2s
```
A tensor (like a vector) is actually a very low-level object from the standpoint of linear algebra. It's not hard at all to make something a tensor. Think of it like geometric "assembly language".
In comparison, a matrix is rank 2 (and not all matrices represent tensors). That's it. No rank 3, rank 4, rank 1 (!!). So how much does a matrix really help you?
If you mean that the operations in tensorflow (and numpy before it) aren't beautiful or natural, I agree. It still works, though. If you want to stick to ascii and have no indices on names, you can't do much better (otherwise, use Cadabra[1]--which is great). For example, it was really difficult to write the stuff above without using indices and it's really not beautiful this way :(
See also http://singhal.info/ieee2001.pdf for a primer on information science, including its references, for vector spaces with an inner product that are usually used in ML. The latter are definitely geometry.
[1] https://cadabra.science/ (also in mogan or texmacs) - Einstein field equations also work there and are beautiful
In TensorFlow the tf.matmul function or the @ operator perform matrix multiplication. Element-wise multiplication ends up being useful for a lot of paralellizable computation but should not be confused with matrix multiplication.
The idea of tensors as "a matrix of numbers" or the example of a cube with vectors on every face never clicked for me. It was this NASA paper (https://www.grc.nasa.gov/www/k-12/Numbers/Math/documents/Ten...) that finally brought me clarity. The main idea, as others already commented, is that a tensor of rank n is a function that can be applied to up to n vectors, reducing its rank by one for each vector it consumes.
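A quick NumPy version of that picture (the numbers are arbitrary):

```
import numpy as np

# A rank-3 tensor as a function that "consumes" vectors one at a time,
# dropping one rank per vector it eats.
T = np.arange(27.0).reshape(3, 3, 3)        # rank 3
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.5, 1.0, -1.0])

T_u = np.einsum('ijk,i->jk', T, u)          # feed one vector: rank 2 (a matrix)
T_uv = np.einsum('jk,j->k', T_u, v)         # feed another:    rank 1 (a vector)
T_uvw = T_uv @ np.array([1.0, 1.0, 1.0])    # feed a third:    rank 0 (a number)
```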
In your cube example you are using the word "vector" to refer to faces of the cube. Did you mean matrix?
My understanding is that the cube is a rank 3 tensor, the faces (or rather slices) of the cube are rank 2 tensors (aka matrices), and the edges (slices) of the matrices are rank 1 tensors (aka vectors).
There are two ways of using linear maps in the context of physics. One is as a thing that acts on the space. The other is as a thing that acts on the coordinates. So when we talk about transformations in tensor analysis, we're talking about coordinate transformations, not space transformations. Suppose I implement a double-ended queue using two pointers:
See that the state of the queue is technically three numbers, { memory, start, end } (pointers are just numbers after all). But this is coordinate dependent, as start and end are relative to the location of memory. Now suppose I have a procedure to reallocate the queue size:
```
#include <stdlib.h>

/* Queue type assumed for this sketch: three pointers into one allocation. */
typedef struct queue { int *memory, *start, *end; } *Queue;

void queue_realloc(Queue q, int new_size) {
    /* "Coordinates" of start and end, relative to the allocation base. */
    int start_offset = q->start - q->memory;
    int end_offset = q->end - q->memory;
    /* realloc preserves the contents; only the base address may move. */
    q->memory = realloc(q->memory, new_size * sizeof(int));
    /* Re-express the same queue relative to the new base address. */
    q->start = q->memory + start_offset;
    q->end = q->memory + end_offset;
}
```
Notice that when I do this, the values of start and end can be completely different! However, see that the length of the queue, given by (end - start), is invariant: it hasn't changed!
---
In the exact same way, a "tensor" is a collection of numbers that describes something physical with respect to a particular coordinate system (the pointers start and end with respect to the memory coordinate system). "tensor calculus" is a bunch of rules that tell you how the numbers change when one changes coordinate systems (ie, how the pointers start and end change when the pointer memory changes). Some quantities that are computed from tensors are "physical", like the length of the queue, as they are invariant under transformations. Tensor calculus gives a principled way to make sure that the final answers we calculate are "invariant" / "physical" / "real". The actual locations of start and end don't matter, as (end - start) will always be the length of the list!
---
Physicists (and people who write memory allocators) need such elaborate tracking to keep track of what is "real" and what is "coordinate dependent", since a lot of physics involves crazy coordinate systems, and having ways to know what things are real and what are artefacts of one's coordinate system is invaluable. For a real example, consider the case of singularities of the Schwarzschild solution to GR, where we initially thought there were two singularities, but it later turned out there was only one "real" singularity, and the other singularity was due to a poor choice of coordinate system:
> Although there was general consensus that the singularity at r = 0 was a 'genuine' physical singularity, the nature of the singularity at r = rs remained unclear. In 1921 Paul Painlevé and in 1922 Allvar Gullstrand independently produced a metric, a spherically symmetric solution of Einstein's equations, which we now know is a coordinate transformation of the Schwarzschild metric, Gullstrand–Painlevé coordinates, in which there was no singularity at r = rs. They, however, did not recognize that their solutions were just coordinate transforms.