Inside the Matrix: Visualizing Matrix Multiplication, Attention and Beyond (pytorch.org)
289 points by saeedesmaili on Sept 26, 2023 | 34 comments



Great to see this kind of visualization gaining prominence! Thinking about matrix algebra in higher dimensions makes everything much more intuitive. I started along this road when I was creating 3D visualizations for the Deep Learning Indaba workshop [1], but I've spent some time since then trying to pin down the algebraic aspects of array manipulation in this blog post series, which includes lots of visualizations [2]. I'm hoping it will serve as a more fundamental tutorial on what array programming is all about and how to think about it more abstractly -- there is a category-theoretic way of looking at it that I think is really nice, though I'm still working on writing that up.

[1] https://tali.link/projects/edu/indaba-2022/

[2] https://math.tali.link/classical-array-algebra/


Fantastic visualizations. If you're new to Linear Algebra, i.e., the algebra of linear transformations, represented by matrices, and how they act on vectors, and you want to gain an intuitive understanding of it, I recommend:

* "The Essence of Linear Algebra," by 3Blue1Brown: https://www.3blue1brown.com/topics/linear-algebra

* The popular introductory course taught by Gilbert Strang: https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010...


I've been re-watching the course taught by Gilbert Strang!

I cannot overstate how good of a teacher he is. Things click into place while watching him teach.


I NEVER imagined matmul as a 2D -> 3D transformation.

Reminds me of that scene in the movie, CONTACT, where the aging mathematician solves the alien primer by realizing the documents assemble into a 3D cube and the decryption happens when the squiggles on opposite sides of the cube are overlaid resulting in clear text.


Yeah, I thought exactly the same thing when I watched Contact again a few months ago!

There are all kinds of fascinating places where you can gain mental leverage by thinking in higher dimensions. For example, the definition of a monoidal category, which includes various equivalences (or for a strict monoidal category, equalities), can be seen as telling you about the existence of certain 3-dimensional "sheets", 2-dimensional slices of which are equivalent (or equal) ordinary functorial string diagrams[0]. This is just a higher-dimensional extension of the fact that chaining 1-dimensional slices of functorial string diagrams gives you particular paths in an ordinary commutative diagram. See Marsden [1] for more on that.

Unfortunately the computer tools for generating and manipulating these kinds of topological constructs are in their infancy, which is probably why they aren't used much by mathematicians.

[0]: https://twitter.com/nathanielvirgo/status/126201964172083200...

[1]: https://arxiv.org/abs/1401.7220v2


This is the most confusing visualization of linear algebra I've ever seen.

Sure it looks cool, but mostly because it looks like magic, which is the opposite of what you're supposed to do when illustrating mathematical concepts…

The typical textbook illustration using 2x3/3x2 matrices is much, much clearer than this 32x24 … 64x96 mess. The 3D idea is interesting, but why spawn such an insane number of elements in your matrices?!


I agree, and I think this reeks of the Monad Burrito Tutorial Fallacy[1]. Once you know what the manipulations are you can start to visualize doing them in these weird 3d ways, but the understanding came through the struggle to make a coherent picture and not the resulting coherent picture itself. The claim that "matrix multiplication is fundamentally a three-dimensional operation" is ultimately very confusing because it conflates the row & column dimensions of the matrix with the dimensions of the underlying vector space.

Colorized Math Equations[2] has the same problem where people see it and go "Colors! English language! This must be so much easier to grasp than math! I feel enlightened for having seen this!" But feeling enlightened is very different from being enlightened, and it just doesn't hold up. I've found people retain very little understanding if they aren't already familiar with the concept.

[1]: https://byorgey.wordpress.com/2009/01/12/abstraction-intuiti...

[2]: https://betterexplained.com/articles/colorized-math-equation...

EDIT: The "three-dimensional operation" perspective no doubt comes from writing matrices as rectangles, but this is far from the only representation of them. If the vector v = [a, b, c] is shorthand for v = a x_hat + b y_hat + c z_hat (explicitly a sum of basis vectors), then we can write a matrix with a similar set of basis vectors: m = [[a, b, c], [d, e, f], ...] = a x_hat x_hat + b x_hat y_hat + c x_hat z_hat + ... . There's nothing "rectangular" about this any more than a polynomial (as a sum of monomials) is "rectangular". The details then shake out of how (x_hat y_hat) multiplies with (y_hat z_hat). The rectangle is just a mnemonic.

DOUBLE EDIT: In the above sense, multiplying two matrices is more like a convolution -- the x_hat x_hat term of the first matrix multiplies every term of the second, we just know most of those terms will be zero (namely, the product with any term that doesn't start with an x_hat, e.g. y_hat z_hat).
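
A rough NumPy sketch of that basis-term view, in case it helps (my own illustration, not from the article):

    import numpy as np

    # Write a matrix as a sum of coefficient * outer(e_i, e_j) terms, where
    # e(i) plays the role of x_hat, y_hat, ... above.
    n = 3
    def e(i):
        v = np.zeros(n)
        v[i] = 1.0
        return v

    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))

    A_as_sum = sum(A[i, j] * np.outer(e(i), e(j))
                   for i in range(n) for j in range(n))
    assert np.allclose(A, A_as_sum)

    # Multiply the two sums term by term: (e_i e_j)(e_k e_l) = (e_j . e_k) e_i e_l,
    # which vanishes unless j == k -- the "most terms are zero" point above.
    product = sum(A[i, j] * B[k, l] * np.dot(e(j), e(k)) * np.outer(e(i), e(l))
                  for i in range(n) for j in range(n)
                  for k in range(n) for l in range(n))
    assert np.allclose(product, A @ B)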


Completely agree. To second one of the siblings: a really good set of visualizations that really helped me develop intuition for linear algebra is 3blue1brown's excellent series "The Essence of Linear Algebra". https://www.3blue1brown.com/topics/linear-algebra

The animations really helped me to understand what eigenvectors, eigenvalues, linear transformations, determinants, etc. are.


I agree, the visualization is pretty but kinda useless for explaining matmul.

It would have been more intuitive to show that every element in the output matrix corresponds to a dot product of a row vector and a column vector from the input matrices; the animation doesn't even highlight those corresponding vectors clearly.
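
For anyone who wants that in code, a minimal NumPy check (shapes picked arbitrarily):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((3, 5))
    C = A @ B

    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            # Each output element is the dot product of row i of A and column j of B.
            assert np.isclose(C[i, j], np.dot(A[i, :], B[:, j]))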


Came here to say the same thing. This adds absolutely nothing to any reasonable understanding of matrix multiplication. This is the most complex way I could imagine trying to explain what matrix multiplication "is".

If it's useful to someone working in a very complex environment where these visualizations are necessary to help tease out some subtle understanding, then that's great.

But really, this part is all you need to know about the article:

> This is the _intuitive_ meaning of matrix multiplication:

> - project two orthogonal matrices into the interior of a cube

> - multiply the pair of values at each intersection, forming a grid of products

> - sum along the third orthogonal dimension to produce a result matrix.

This 1. Isn't intuitive, and 2. Isn't the "meaning".
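
(For concreteness, the quoted recipe amounts to "broadcast the two inputs into a 3D grid of pairwise products, then sum over the shared dimension". A rough NumPy sketch, mine and not the article's code:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((3, 5))

    grid = A[:, :, None] * B[None, :, :]   # (4, 3, 5) grid of pairwise products
    C = grid.sum(axis=1)                   # sum along the shared dimension
    assert np.allclose(C, A @ B)

Mechanically correct, but it describes the mechanics, not the meaning.)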


You kidding me? This is the only reasonable explanation of matrix multiplication. I remember learning it in school, and memorizing how the input dimensions corresponded to output dimensions, without understanding why.

This gets to the why perfectly. We all understand how to navigate a 3D space intuitively. If the math doesn't tie into that, it may as well be wizard nonsense.


The only reasonable explanation of matrix multiplication is that

1. For a linear function f, its matrix A for some basis {b_i} is the list of outputs f(b_i). i.e. each column is the image of a basis vector. For an arbitrary vector x, the matrix-vector product Ax = f(x).

2. For two linear functions f,g with appropriate domains/codomains and matrices A,B, the result of "multiplication" BA is the matrix for the composed (also linear) function x -> g(f(x)). For an arbitrary vector x, the product (BA)x = B(Ax) = g(f(x)).

This tells you what a matrix even is and why you multiply rows and columns in the way you do (as opposed to e.g. pointwise). This also tells you why the dimensions are what they are: the codomain has some dimension (the height of the columns) and the domain has some dimension (how many columns there are). For multiplication, you need the codomain of f to match the domain of g for composition to make sense, so obviously the dimensions must line up.
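
Both points are easy to check numerically if that helps -- a small NumPy sketch with arbitrary matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))   # matrix of f: R^2 -> R^3
    B = rng.standard_normal((4, 3))   # matrix of g: R^3 -> R^4
    f = lambda x: A @ x
    g = lambda x: B @ x

    # 1. Column i of A is the image of the i-th basis vector under f.
    for i in range(2):
        e_i = np.eye(2)[:, i]
        assert np.allclose(A[:, i], f(e_i))

    # 2. BA is the matrix of the composition g after f.
    x = rng.standard_normal(2)
    assert np.allclose((B @ A) @ x, g(f(x)))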


> This gets to the why perfectly.

Respectfully disagree. A matrix has basically nothing to do with "living on the surface of a cuboid". It's like saying FOIL is the "why" of binomial multiplication -- the "why" is the distributive and associative properties of the things involved, FOIL is just a useful mnemonic that falls out.


If transformer neural networks used 2x3 matrices, then sure, we could use the 'typical textbook illustration' to visualize them. The point of this tool is not to explain matmul with toy examples, but to visualize real data from model weights.


While I appreciate the effort the author has clearly put in here, I'm not sure the visualization provides much in the way of practical intuition. These seem to essentially be mechanistic explanations of the rote bits of matrix math.

There are already rich geometric interpretations which provide useful intuition that generalizes, rather than just demonstrating mechanical details.


Care to share?


3blue1brown has a great geometric LA series on youtube.


Amazing visualizations of something that I don't understand at all. How does a bunch of matrices encode information? All the tutorials I have seen are usually like this: step 1: this is a neuron; step 2: let's do some random stuff in Python and see magic. Where do I look for fundamental explanations?


I suggest you take a look at the Micrograd implementation and watch the video tutorial about it from Karpathy. Also, the "Python Deep Learning" book is quite good.

In short you need to understand: vector, linear combination, cross product, partial derivative, chain rule and finding a global minimum. If you have the basics of linear algebra it's easy to grok this video.

https://youtu.be/VMj-3S1tku0?si=6r9XRXofN6SOPkpl
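
If it helps, the core of what that video builds up to is just the chain rule applied repeatedly plus gradient descent. A toy sketch of those two ideas (my own, not Karpathy's micrograd code), fitting y = w*x + b with hand-derived gradients:

    # Ground truth is y = 2x + 1; gradient descent should recover w ~ 2, b ~ 1.
    data = [(1.0, 3.0), (2.0, 5.0)]
    w, b, lr = 0.0, 0.0, 0.1

    for step in range(200):
        dw = db = 0.0
        for x, y in data:
            pred = w * x + b
            # loss = (pred - y)^2; the chain rule gives d(loss)/dw and d(loss)/db
            dw += 2 * (pred - y) * x
            db += 2 * (pred - y)
        w -= lr * dw   # step downhill toward the minimum of the loss
        b -= lr * db

    print(w, b)   # close to 2 and 1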


I second the Micrograd implementation vid. I had to pause several times to look up some concepts, but so far it's the best resource I've found on how gradients work in neural networks.


3Blue1Brown is often recommended: https://www.youtube.com/watch?v=aircAruvnKk


If you have points in a 2D space (a sheet of paper) and you want to separate them into two groups, you can draw a line between the two. The equation of a line is y = ax + b.

This is the equation of a neuron if you squint.

So if you chain a bunch of neurons, you are basically drawing a bunch of lines to test whether points belong or not.

With enough lines you can approximate any shape, like a circle.

What a neural network does is, given enough examples, find the lines that are needed to separate the points and give each one the appropriate label.
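
As a toy version of that (my own sketch, with a made-up line): a single "neuron" for the line y = 2x + 1, where a point's label is just which side of the line it falls on:

    # The label is the sign of a*x + b - y, i.e. which side of y = ax + b
    # the point (x, y) is on.
    a, b = 2.0, 1.0

    def neuron(x, y):
        return 1 if (a * x + b - y) > 0 else 0

    print(neuron(0.0, 0.0))   # below the line -> 1
    print(neuron(0.0, 3.0))   # above the line -> 0

Stacking layers of these (with a nonlinearity in between) is what lets the network carve out curved regions instead of a single straight cut.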


Have you seen the neural net series by Statquest? https://youtu.be/HGwBXDKFk9I?si=GxYKy1s996e6Q8-G

It builds a simple CNN and ends with a simple example of how multiple ReLU activation functions can approximate arbitrary curves.
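
The ReLU point is easy to reproduce in a few lines if anyone wants to try it (my own sketch, not code from the video): a weighted sum of shifted ReLUs is a piecewise-linear function, and with enough kinks it can trace out roughly any curve, e.g. sin(x):

    import numpy as np

    relu = lambda z: np.maximum(z, 0.0)

    xs = np.linspace(0, 2 * np.pi, 200)
    knots = np.linspace(0, 2 * np.pi, 20)

    # Fit the weights of sum_k w_k * relu(x - knot_k) + c by least squares.
    features = np.column_stack([relu(xs[:, None] - knots), np.ones_like(xs)])
    coef, *_ = np.linalg.lstsq(features, np.sin(xs), rcond=None)

    approx = features @ coef
    print(np.max(np.abs(approx - np.sin(xs))))   # small maximum error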


Yeah, but how do learned function approximators answer text questions?



If you're interested in developing an intuitive understanding of how information (items and relations) can be represented in vectors and matrices, I highly recommend this paper titled "Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors":

http://ww.robertdick.org/iesr/papers/kanerva09jan.pdf


In my opinion, the fundamental explanations you seek lie in Probability Theory, not matrix theory. When it comes to ML, matrices are just implementation details. I highly suggest this set of notes: https://chrispiech.github.io/probabilityForComputerScientist...


You might gain some basic intuition from this (partially complete) Numpy tutorial I wrote for the Deep Learning Indaba: https://arrayalgebra.info


My own favorite way to understand matrix multiplication: For any positive integer n, and any n x n matrix A of real and/or complex numbers, there are n x n matrices U and H so that

A = UH

Each of U and H consists of real and/or complex numbers, and both are all real if A consists of all real numbers.

Here U is unitary, which means that for any n x 1 vector (real and/or complex) x, Ux is the same as x except rotated and/or reflected, and the lengths satisfy

|x| = |Ux|

that is, U does not change lengths or distances and, thus, is a rigid motion (rotation, reflection).

For H, for x in a sphere S, the set of all Hx is just an ellipsoid.

So, A = UH where U is a rigid motion, rotation, reflection, and H converts a sphere to an ellipsoid. The H is said to be Hermitian (there was a mathematician Hermite). Right, an ellipsoid has mutually perpendicular axes, and those are the eigenvectors of H.

My favorite result in linear algebra, and with a fairly short and simple proof.


(For those looking for more info, this is called the polar decomposition. See https://csmbrannon.net/2013/02/14/illustration-of-polar-deco...)

I think the closely related SVD is more commonly used though, and Wikipedia has a nice animation https://en.wikipedia.org/wiki/Singular_value_decomposition
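
If anyone wants to play with it, the polar factors drop straight out of the SVD -- a NumPy sketch (scipy.linalg.polar also computes this directly):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    W, s, Vt = np.linalg.svd(A)
    U = W @ Vt                      # unitary/orthogonal: the rigid-motion part
    H = Vt.T @ np.diag(s) @ Vt      # Hermitian (symmetric PSD): sphere -> ellipsoid

    assert np.allclose(A, U @ H)
    x = rng.standard_normal(4)
    assert np.isclose(np.linalg.norm(x), np.linalg.norm(U @ x))   # |x| = |Ux|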


These visualizations are very cool, and will hopefully lead to more understanding of what's happening with neural networks internally. The next thing we need is an example/write up of someone using these visualizations to either 1. troubleshoot and improve a neural network or 2. interpret the meaning of the weights of neural network. What would be amazing is using these visualizations as a framework for building a new neural network interpretability tool that identifies common patterns of weights in neural networks that are discovered to work well. This could lead to more insight into when a neural network has converged "correctly".


Visualization of weight matrices can be especially helpful for architectures like DCN or FM/FFM. You can directly see feature importances, which is computationally infeasible in, for example, a fully connected network.


Can it only visualize matmul operations? It would be great if it could visualize other operations as well, such as the dot product.


Only a TOP G understands the matrix



