What are some applications of elementary linear algebra outside of math?

I’m TAing linear algebra next quarter, and it strikes me that I only know one example of an application I can present to my students. I’m looking for applications of elementary linear algebra outside of mathematics that I might talk about in discussion section.

In our class, we cover the basics (linear transformations; matrices; subspaces of $\Bbb R^n$; rank-nullity), orthogonal matrices and the dot product (incl. least squares!), diagonalization, quadratic forms, and singular-value decomposition.

Showing my ignorance, the only application of these I know is the one that was presented in the linear algebra class I took: representing dynamical systems as Markov processes, and diagonalizing the matrix involved to get a nice formula for the $n$th state of the system. But surely there are others.

What are some applications of the linear algebra covered in a first course that can motivate the subject for students?

Answers collected from the web:

I was a teaching assistant for linear algebra last semester, and I collected a few applications to present to my students. This is one of them:

Google’s PageRank algorithm

This algorithm is the “heart” of the search engine: it sorts documents of the world-wide web by their “importance” in decreasing order. For the sake of simplicity, let us look at a system consisting of only four websites. We draw an arrow from $i$ to $j$ if there is a link from website $i$ to website $j$.

The goal is to compute a vector $\underline{x} \in \mathbb{R}^4$, where each entry $x_i$ represents the website’s importance. A bigger value means the website is more important. There are three criteria contributing to the $x_i$:

  1. The more websites contain a link to $i$, the bigger $x_i$ gets.
  2. Links from more important websites have a more relevant weight than those of less important websites.
  3. Links from a website which contains many links to other websites (outlinks) have less weight.

Each website has exactly one “vote”. This vote is distributed uniformly among the website’s outlinks. This is known as web democracy. It leads to a system of linear equations for $\underline{x}$. In our case, for

$$P = \begin{pmatrix} 0&0&1&1/2\\ 1/3&0&0&0\\ 1/3& 1/2&0&1/2\\ 1/3&1/2&0&0 \end{pmatrix}$$

the system of linear equations reads $\underline{x} = P \underline{x}$. The matrix $P$ is column-stochastic (each column sums to $1$), so $1$ is an eigenvalue of $P^T$ and hence of $P$. One of the corresponding eigenvectors is

$$\underline{x} = \begin{pmatrix} 12\\4\\9\\6 \end{pmatrix},$$

hence $x_1 > x_3 > x_4 > x_2$. Let

$$G = \alpha P + (1-\alpha)S,$$

where $S$ is the matrix corresponding to purely random browsing without links, i.e. all entries are $\frac{1}{N}$ if there are $N$ websites. The matrix $G$ is called the Google matrix. The inventors of the PageRank algorithm, Sergey Brin and Larry Page, chose $\alpha = 0.85$. Note that $G$ is still a stochastic matrix. An eigenvector of $G$ for the eigenvalue $1$, i.e. a solution of $\underline{x} = G \underline{x}$, in our example is (rounded)

$$\underline{x} = \begin{pmatrix} 18\\7\\14\\10 \end{pmatrix},$$

leading to the same ranking.
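This example is easy to verify numerically. A minimal sketch in NumPy, using the matrix $P$ above (normalising the eigenvector so its entries sum to one is just for presentation):

```python
import numpy as np

# Link matrix of the four-site example above: column j distributes
# website j's single "vote" uniformly over its outlinks.
P = np.array([
    [0,   0,   1, 1/2],
    [1/3, 0,   0, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
])

# x = Px means x is an eigenvector of P for the eigenvalue 1.
vals, vecs = np.linalg.eig(P)
k = np.argmin(np.abs(vals - 1))      # index of the eigenvalue closest to 1
x = np.real(vecs[:, k])
x = x / x.sum()                      # normalise the entries to sum to 1

ranking = np.argsort(-x)             # site indices, most important first
```

The ranking reproduces $x_1 > x_3 > x_4 > x_2$ from the text.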

Another very useful application of Linear algebra is

Image Compression (Using the SVD)

Any real matrix $A$ can be written as

$$A = U \Sigma V^T = \sum_{i=1}^{\operatorname{rank}(A)} u_i \sigma_i v_i^T,$$

where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix. Every greyscale image can be represented as a matrix of the intensity values of its pixels, each entry a number between zero and one. Higher-resolution images require storing more numbers: a 720p greyscale photo ($1280 \times 720$) already has $921{,}600$ entries in its intensity matrix. Instead of storing all those entries, a truncated singular value decomposition yields a low-rank approximation that requires much less storage.

You can create a rank $J$ approximation of the original image by using the first $J$ singular values of its intensity matrix, i.e. only looking at

$$\sum_{i=1}^J u_i \sigma_i v_i^T .$$

This saves a large amount of disk space, but also causes the image to lose some of its visual clarity. Therefore, you must choose a number $J$ such that the loss of visual quality is minimal but there are significant memory savings. Example:

The image on the RHS is an approximation of the image on the LHS, obtained by keeping $\approx 10\%$ of the singular values; it takes up $\approx 18\%$ of the original image’s storage.
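A minimal sketch of the compression step in NumPy, using a synthetic low-rank “image” (a gradient with a bright square) in place of a real photo:

```python
import numpy as np

# A synthetic 64x64 "greyscale image": a gradient with a bright square.
img = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
img[16:48, 16:48] = 1.0

U, s, Vt = np.linalg.svd(img, full_matrices=False)

J = 5                                    # keep the 5 largest singular values
approx = (U[:, :J] * s[:J]) @ Vt[:J, :]  # rank-J approximation

# Storage: J*(m + n + 1) numbers instead of m*n.
m, n = img.shape
compressed_size = J * (m + n + 1)
```

For a real photograph the truncation is lossy; this toy image happens to have very low rank, so the rank-$5$ approximation is essentially exact.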

This is a simpler example, but maybe that’ll be good for undergraduate students:

Linear algebra is a central tool in 3-d graphics. If you want to use a computer to represent, say, a spaceship spinning around in space, then you take the initial vertices of the spaceship in $\mathbb{R}^3$ and hit them with some rotation matrix every $.02$ seconds or so. Then you have to render the 3-d image onto a 2-d screen, which also involves linear algebra (and probably some work with colours and lighting).

There are probably graphics packages that do a lot of that work for you these days (I actually don’t know that much programming), but the linear algebra is still a pretty good first-order approximation for what the computer is doing.
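The core of that spinning-vertex loop is only a few lines. A sketch in plain NumPy rather than any particular graphics package (the per-frame angle of $0.02$ rad is an arbitrary choice):

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

# Spin one vertex a little each "frame": 100 frames of 0.02 rad each.
vertex = np.array([1.0, 0.0, 0.0])
R = rotation_z(0.02)
for _ in range(100):
    vertex = R @ vertex
```

Because $R$ is orthogonal, the vertex keeps its distance from the origin no matter how many frames pass.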

We can also use Linear Algebra to solve

Ordinary Differential Equations

A first-order linear system of ODEs has the form

$$\underline{u}'(t) = A \underline{u}(t) + \underline{b}(t)$$

with $A \in \mathbb{C}^{n \times n}$ and $\underline{b}(t) \in \mathbb{C}^{n \times 1}$. If we have an initial condition

$$\underline{u}(t_0) = \underline{u_0}$$

this is an initial value problem. Assuming the entries of $\underline{b}(t)$ are continuous on $[t_0,T]$ for some $T > t_0$, Picard-Lindelöf provides a unique solution on that interval. If $A$ is diagonalisable, the solution of the homogeneous initial value problem is easy to compute.

Let

$$P^{-1} A P = \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n),$$

where the columns of $P = \begin{pmatrix} x_1 & \dots & x_n \end{pmatrix}$ are the corresponding eigenvectors. Defining $\tilde{\underline{u}}:= P^{-1} \underline{u}(t)$ and $\tilde{\underline{u_0}} = P^{-1} \underline{u_0}$, the IVP reads

$$\tilde{\underline{u}}'(t) = \Lambda \tilde{\underline{u}}(t), \; \tilde{\underline{u}}(t_0) = \tilde{\underline{u_0}} =: \begin{pmatrix} c_1 & \dots & c_n \end{pmatrix}^T.$$

These are simply $n$ ordinary, linear differential equations

$$\tilde{u_j}'(t) = \lambda_j \tilde{u_j}(t), \; \tilde{u_j}(t_0) = c_j$$

for $j = 1, \dots, n$ with solutions $\tilde{u_j}(t) = c_j e^{\lambda_j(t-t_0)}$. We eventually retrieve $\underline{u}(t) = P \tilde{\underline{u}}(t)$.

Example: We can write

$$x''(t) = -\omega^2 x(t), \; x(0) = x_0, \; x'(0) = v_0$$

as $\underline{u}'(t) = A \underline{u}(t), \; \underline{u}(0) = \underline{u_0}$, where $\underline{u}(t) = \begin{pmatrix} x(t)&x'(t) \end{pmatrix}^T$ and

$$A = \begin{pmatrix} 0&1\\ -\omega^2&0 \end{pmatrix} \text{ and } \underline{u_0} = \begin{pmatrix} x_0\\ v_0 \end{pmatrix}.$$

Computing eigenvalues and eigenvectors, we find

$$\underline{u}(t) = c_1 e^{i \omega t} \begin{pmatrix} 1\\ i \omega \end{pmatrix} + c_2 e^{-i \omega t} \begin{pmatrix} 1 \\ -i \omega \end{pmatrix}.$$

Using the initial condition, we find $x(t) = x_0 \cos(\omega t) + \frac{v_0}{\omega} \sin(\omega t)$.
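The diagonalisation recipe above is easy to carry out numerically. A sketch with arbitrary values for $\omega$, $x_0$, $v_0$, compared against the closed-form solution just derived:

```python
import numpy as np

# Toy values for the oscillator; omega, x0, v0 are arbitrary choices.
omega, x0, v0 = 2.0, 1.0, 0.5
A = np.array([[0.0, 1.0], [-omega**2, 0.0]])
u0 = np.array([x0, v0])

# Diagonalise A and propagate each eigenmode separately.
lam, P_mat = np.linalg.eig(A)        # columns of P_mat are eigenvectors

def u(t):
    c = np.linalg.solve(P_mat, u0)   # coordinates of u0 in the eigenbasis
    return np.real(P_mat @ (np.exp(lam * t) * c))

# Closed form from the text: x(t) = x0 cos(wt) + (v0/w) sin(wt).
def x_exact(t):
    return x0 * np.cos(omega * t) + (v0 / omega) * np.sin(omega * t)
```

The eigenvalues come out as $\pm i\omega$, and the real part of the eigenmode sum reproduces $x(t)$.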

Matrix exponential: I don’t know if your students already are familiar with the matrix exponential, but using it we find a solution of the homogeneous initial value problem to be given by

$$\underline{u}(t) = e^{(t-t_0)A} \underline{u_0}.$$

To solve the inhomogeneous differential equation, we can use variation of constants. Since every solution of the homogeneous system $\underline{u}'(t) = A \underline{u}(t)$ is of the form $\underline{u}(t) = e^{tA} \underline{c}$ for some constant vector $\underline{c}$, we set $\underline{u_p}(t) = e^{tA} \underline{c}(t)$ and find by plugging in

$$\underline{c}'(t) = e^{-tA} \underline{b}(t).$$

Thus,

$$\underline{u}(t) = e^{(t-t_0)A} \underline{u_0} + \int_{t_0}^t e^{(t-s)A} \underline{b}(s) \, \mathrm ds.$$
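A sketch of this variation-of-constants formula, using `scipy.linalg.expm` for the matrix exponential and a simple trapezoid rule for the integral (the oscillator matrix and the constant forcing $\underline{b}$ are toy choices):

```python
import numpy as np
from scipy.linalg import expm

# Toy data: the omega = 1 oscillator with a constant forcing term b.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
u0 = np.array([1.0, 0.0])
b = np.array([0.0, 0.5])

def u(t, t0=0.0, steps=2000):
    """Variation of constants; the integral is evaluated by a trapezoid rule."""
    hom = expm((t - t0) * A) @ u0
    s = np.linspace(t0, t, steps + 1)
    vals = np.stack([expm((t - si) * A) @ b for si in s])
    ds = s[1] - s[0]
    inhom = (vals[0] + vals[-1]) / 2 * ds + vals[1:-1].sum(axis=0) * ds
    return hom + inhom
```

For constant $\underline{b}$ and invertible $A$ the integral even has a closed form, $(e^{tA} - I)A^{-1}\underline{b}$, which the numerical quadrature matches.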

The restricted isometry property (RIP) of matrices is something not too hard for undergraduates to understand: it means that a (rectangular) matrix $A$ satisfies
$$ (1-\delta) \|x\| \le \|Ax\|\le (1+\delta)\|x\| \tag{1}$$
for all vectors $x$ with at most $s$ nonzero components. The constant $\delta$ should be small, and of course independent of $x$. The number $s$ is strictly less than the dimension of the domain of $A$ (its number of columns). This means that $A$ encodes sparse vectors with little distortion to their norm.

The fact that “fat” RIP matrices exist (with the number of their rows less than the number of columns) is not obvious, and there is no easy deterministic algorithm to construct them. But suitably random matrices are known to satisfy RIP with high probability. The use of such matrices is essential in the work of Candes and Tao from about 10 years ago, which formed the mathematical foundations of compressed sensing, a novel signal processing technique now applied in MRI and other areas where making a large number of measurements is expensive.
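One can at least probe $(1)$ numerically: draw a random Gaussian matrix and test the inequality on random sparse vectors. This is only a sanity check, not a certificate, since genuine RIP quantifies over *all* $s$-sparse $x$:

```python
import numpy as np

rng = np.random.default_rng(2)

m, n, s_sparse = 60, 200, 4
A = rng.normal(size=(m, n)) / np.sqrt(m)   # scaled Gaussian matrix

# Probe the inequality on random s-sparse vectors.
ratios = []
for _ in range(200):
    x = np.zeros(n)
    support = rng.choice(n, size=s_sparse, replace=False)
    x[support] = rng.normal(size=s_sparse)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))

delta_hat = max(abs(r - 1.0) for r in ratios)  # empirical lower bound on delta
```

The scaling by $1/\sqrt{m}$ makes $\|Ax\|/\|x\|$ concentrate around $1$, which is exactly the phenomenon RIP formalises.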

Powers of a graph’s adjacency matrix can be used to count the walks of length $n$ from one vertex to another. In particular:

Proposition. For any graph formed of vertices connected by edges, the number of possible walks of length $n$ from vertex $V_i$ to vertex $V_j$ is given by the $i,j^\text{th}$ entry of $A^n$, where $A$ is the graph’s adjacency matrix.

This proposition is proved by induction.
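A quick check of the proposition on the triangle graph (the two closed walks of length $3$ from a vertex back to itself are exactly the two orientations of the triangle):

```python
import numpy as np

# Adjacency matrix of the triangle graph on vertices 0, 1, 2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Entry (i, j) of A^n counts walks of length n from i to j.
walks_of_length_3 = np.linalg.matrix_power(A, 3)
```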

The nicest examples you could inspect are the simple triangle and square… and unit cube. For nice problems on the basics, you might check Exercise 1.2.15,16,17 in Hubbard$^2$.

You can also make a matrix that allows for directed edges. For some examples via exercise, see Exercise 1.2.18,19 in Hubbard$^2$.

This has applications to not only graph theory, but computers and multiprocessing.

Stoichiometry (not that our students would ever stoop to linear algebra for it) is a very elementary place it shows up. Quantum mechanics is an advanced place. Linear programming is ubiquitous in business applications, as is game theory in economics and political science, and a lot of game theory is based on linear algebra and Markov processes. Least squares and $A^\top A$ show up all over statistics and econometrics.

The stoichiometry issue raises a kind of question that also shows up in graph theory (which presumably is of interest in some applications): given an integer matrix, what is the integer vector with “smallest” entries in its kernel? When is there a vector with just $\pm 1$ entries? When is there a vector with all nonnegative entries? Etc.

I worked as a software engineer for 27 years for a large defense corporation. They used finite-element software tools to model spacecraft designs for stress tests, the amount of construction material required, simulated launch testing, etc. Finite-element methods are based on matrices that describe the connections and forces on the elements of a structure. This also applies to bridges and other civil engineering structures, and may apply to thin shell models, though I never worked with those.

Another matrix application was to perform text searches in document troves. This involves a lot of natural language approximation. We created a taxonomy (word list) of interesting words for a particular application. Then we used a normalization process to define a word basis as the core of a word, so that “report”, “reports”, “reporting”, and maybe “reporters” would all count for the word “report”. Then create a vector of a document, based on the count of each normalized taxonomy word in the document. Then create a vector from your requirement description text, or even a subset of interesting taxonomy words. Do a dot-product of the requirement text vector with the text vector of each stored document. The documents with the highest dot-product values are closely related to your text search criteria.
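A toy sketch of this pipeline, with a made-up three-word taxonomy and a deliberately crude normalisation rule (a stand-in for the real stemming step described above):

```python
from collections import Counter

taxonomy = ["report", "budget", "risk"]       # made-up taxonomy for illustration

def vectorise(text):
    # Crude normalisation: lower-case and strip a trailing "s"
    # (real systems use proper stemming, e.g. reporting -> report).
    words = [w.lower().rstrip("s") for w in text.split()]
    counts = Counter(words)
    return [counts[t] for t in taxonomy]

def score(query, document):
    # Dot product of the query vector with the document vector.
    q, d = vectorise(query), vectorise(document)
    return sum(qi * di for qi, di in zip(q, d))
```

Documents with the highest score against the query vector are the best matches, exactly as described above.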

In general, linear algebra can be used to approximate models of any system. Define elements of the system, make a vector of properties of the element. Then try to normalize your matrices, or apply equations and transformations to minimize overall cost, maximize overall strength, etc., relating to properties of the model.

One of the examples my students have absolutely loved in the past is the Hill cipher. It is a “real” application, although an outdated one, and the students do love playing with it. Using some sort of encoding, convert a message to a bunch of numbers, and choose a key matrix. The message can be organized into an array and multiplied by the key for encryption, and the encrypted message can be multiplied by the inverse of the key matrix to decrypt it. It is fun to give messages to students and have them encrypt/decrypt them. It is also very simple to ask them to break a scheme and figure out the key by giving them a couple of plaintext/ciphertext pairs. You can also demonstrate some man-in-the-middle attacks and how a malicious eavesdropper can change the contents of the message without having to know the key at all.

If you want to be a bit more advanced, you can also do modular arithmetic so that you can take the English language and add any three symbols like space, comma, and a period to have a total of 29 symbols and then work modulo 29. 29 is a prime number so it avoids a whole bunch of invertibility issues. Matrix arithmetic in a finite field like Z/29Z illustrates some concepts very clearly. A smaller modulus will make it easier to do this by hand if you want to try that.
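A sketch of the mod-29 Hill cipher with a $2 \times 2$ key (the key matrix here is an arbitrary invertible choice):

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,."   # 26 letters + space, comma, period
M = len(ALPHABET)                             # 29, a prime

KEY = np.array([[3, 3],
                [2, 5]])                      # arbitrary key, invertible mod 29

def inverse_key(K):
    det = int(round(np.linalg.det(K))) % M
    det_inv = pow(det, -1, M)                 # modular inverse; M is prime
    adj = np.array([[ K[1, 1], -K[0, 1]],
                    [-K[1, 0],  K[0, 0]]])    # adjugate of a 2x2 matrix
    return (det_inv * adj) % M

def encrypt(msg, K):
    nums = [ALPHABET.index(ch) for ch in msg]
    if len(nums) % 2:
        nums.append(ALPHABET.index(" "))      # pad to a full block
    out = []
    for i in range(0, len(nums), 2):
        out.extend((K @ np.array(nums[i:i + 2])) % M)
    return "".join(ALPHABET[int(n)] for n in out)

def decrypt(cipher, K):
    return encrypt(cipher, inverse_key(K))
```

Decryption is just encryption with the inverse key, which is exactly the point students see when they compute $K^{-1}$ by hand.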

Anything to do with scheduling and maximising linear systems: an airline scheduling planes and pilots to minimise the time airplanes are stationary and pilots are just sitting around waiting is an example. Linear optimisation saves millions if not billions of dollars each year by allowing companies to allocate resources optimally, and it’s basically an application of linear algebra.

A standard application in undergraduate physics is to find the principal axes of inertia of a solid.

For any solid object, we can construct an inertia matrix $I_{ij}$, which is a 3×3 symmetric matrix describing how the object’s mass is distributed in space, with respect to a specific Cartesian reference frame $\{x_1,x_2,x_3\}$.

Its diagonal elements $I_{ii}$, or moments of inertia are defined as
$$I_{ii} = \int (r^2- x_i^2) \rho(\vec{r})dV$$
and its off-diagonal elements, or products of inertia are
$$I_{ij} = -\int x_ix_j \rho(\vec{r}) dV$$

(here $\rho(\vec{r})$ is the density of the object at
point $\vec{r}$ in space).

The principal axes of inertia are the eigenvectors of this matrix (which form an orthogonal set, since $I$ is symmetric).

Without trying to explain the physics itself here, the special thing about the principal axes is that they are the only axes about which the body will freely rotate in the absence of any external forces or torques.
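Numerically, finding the principal axes is a single call to a symmetric eigensolver. A sketch with a made-up inertia matrix:

```python
import numpy as np

# A made-up inertia matrix (symmetric, as any inertia matrix must be).
I = np.array([[ 4.0, -1.0, 0.0],
              [-1.0,  3.0, 0.5],
              [ 0.0,  0.5, 2.0]])

# Principal moments and principal axes: eigenvalues/eigenvectors of I.
moments, axes = np.linalg.eigh(I)   # eigh is for symmetric/Hermitian matrices
```

The columns of `axes` are the principal axes; by the spectral theorem they form an orthonormal set and diagonalise $I$.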

(Finite dimensional) quantum mechanics, if this is considered “outside of maths”:

The basic postulates of quantum mechanics read:

  • The quantum system is described by a Hilbert space (make this a complex vector space in finite dimensions).
  • “observables” are Hermitian operators (read: symmetric/self-adjoint matrices in finite dimensions) and measurement outcomes are given by eigenvalues, hence you need to know that every Hermitian matrix is diagonalizable with real eigenvalues (spectral theorem).
  • composite systems are given by the tensor product (this might not be “elementary”, but still, it’s pure linear algebra).
  • time evolution is given by the Schrödinger equation.

The famous “bra-ket”-notation of Dirac used in virtually all textbooks on quantum mechanics is all about inner products and the duality between a vector space and its dual.

Of course, most of quantum mechanics needs infinite dimensional systems and thus functional analysis, but quantum computing and quantum information routinely consider finite dimensional systems (the Hilbert space for qubits is just $\mathbb{C}^2$ and the most important matrices are the Pauli-matrices and their dual).

In short: you can’t do quantum mechanics without knowing the spectral theorem, and in quantum information other norms, such as the Schatten $p$-norms derived from the singular values, are frequently used. In order to understand their merit, you need to know what the SVD is.
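A concrete finite-dimensional check of these claims on the Pauli matrices (all Hermitian, so the spectral theorem gives real eigenvalues, here $\pm 1$):

```python
import numpy as np

# The Pauli matrices: Hermitian, so the spectral theorem applies.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

paulis = {"x": sigma_x, "y": sigma_y, "z": sigma_z}
eigenvalues = {k: np.linalg.eigvalsh(m) for k, m in paulis.items()}

# Singular values (the ingredients of the Schatten p-norms) are all 1,
# since the Pauli matrices are unitary as well as Hermitian.
singular_values = {k: np.linalg.svd(m, compute_uv=False) for k, m in paulis.items()}
```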

I like the Google Page Rank and Adjacency Matrix points. Linear Algebra is a deep subject that is readily connected to computer science, graph theory, and combinatorics in unexpected ways. The traditional connection is with numerical analysis.

However, Linear Algebra is closely related to graph theory. There is a field known as algebraic graph theory which utilizes vector spaces, spectral theory, and group theory to study graphs. The idea of linear independence is closely related to the property of a graph being acyclic. And so finding a basis is like finding a spanning tree in a graph. This idea is formalized quite nicely with Matroid Theory.

A matroid $\mathcal{M}(G, I)$ is a construct with a ground set $G$ and a family of independent sets $I \subset 2^{G}$, where a subset $H \subset G$ belongs to $I$ iff $H$ is independent. If $G$ is a set of vectors, then $I$ contains all the linearly independent subsets. Similarly, we can let $G$ be the edge set of a graph and let $I$ contain all subsets of edges that don’t form a cycle. If you weight the elements of the ground set and sort them, you can construct a basis greedily. Observe that Kruskal’s algorithm is an instance of this greedy-basis approach, applied to graphs.

Matroids also come into play relating linear independence to collinearity on the Fano plane: we don’t want three vertices on the same line of the Fano plane. We can label the vertices of the Fano plane with the nonzero vectors of $\mathbb{F}_{2}^{3}$ in such a way that three vectors are linearly dependent iff their vertices lie on the same line of the Fano plane.

Vector spaces built from graphs are nice to explore as well. The cycle space and the cut space are the common ones; both are vector spaces over the field $\mathbb{F}_{2}$, with addition given by the symmetric difference operation. MacLane’s planarity criterion is based on the cycle space.

Spectral graph theory is another neat topic. Once you have the proof that powers of the adjacency matrix count walks on the graph, you can combine the trace of those powers with the diagonalization given by the spectral theorem: you can easily count edges and triangles using eigenvalues. The diameter of a connected graph is also at most one less than the number of distinct eigenvalues of its adjacency matrix. There are other neat spectral properties of graphs.

Optimization was mentioned above. Both linear and integer programming rely heavily on linear algebra. You can do a lot of economics here. You can also formulate the network-flow problem as an LP, and the min-cut problem is its dual.

Using Vandermonde matrices, one can show that for any $k$ and any $n$, there exist $k$ points in general position in $\Bbb R^n$. Indeed, given $k$ (assuming $k>n$), pick $k$ distinct nonzero real numbers, consider ${\bf v}_i=(r_i,r_i^2,\ldots,r_i^{n})$ for $i=1,\ldots,k$, and use Vandermonde’s determinant to prove the claim.
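This is easy to verify by computer for small $n$ and $k$: every $n \times n$ submatrix built from $n$ of the $k$ points has nonzero determinant, so any $n$ of them are linearly independent. A sketch:

```python
import numpy as np
from itertools import combinations

n, k = 3, 6
r = np.arange(1, k + 1, dtype=float)          # k distinct nonzero reals
V = np.array([[ri**j for j in range(1, n + 1)] for ri in r])

# Each n x n submatrix formed from n of the k rows has determinant
# r_{i_1}...r_{i_n} times a Vandermonde determinant, hence nonzero.
ok = all(abs(np.linalg.det(V[list(sub)])) > 1e-9
         for sub in combinations(range(k), n))
```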

Linear algebra is widely used in the optimization of particle accelerators. These machines (rings or linacs) are composed of a number of electromagnetic elements which are neither perfect nor perfectly aligned, so the beam is not steered exactly as intended. To correct for this you have a set of beam position monitors (BPMs), which tell you how much the beam is displaced, and a set of corrector magnets, which can adjust the beam trajectory. Depending on the size of the machine there can be up to a few hundred of each, and often the number of correctors is lower than the number of BPMs. They are placed all around the machine, but they are useless if you are not able to find a proper configuration of the correctors!

A popular and effective technique consists in measuring the response matrix $R$ which tells you how each BPM responds to each corrector with a simple product:
$$b = R\;c$$
where $b$ and $c$ are vectors containing the values of the BPMs and correctors. All we need to do then is determine the configuration of correctors that minimizes the excitation of the BPMs (this means that the beam is passing closer to their centres). Very often the procedure goes through an SVD, which greatly simplifies the computation of the minimum.

Once properly implemented in the control system, this and slightly more advanced techniques are far faster and more effective than manual, empirical optimization.
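A toy sketch of the SVD step with made-up dimensions and a random response matrix (a real $R$ is measured on the machine, not random):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up machine: 8 BPMs, 5 correctors.
R = rng.normal(size=(8, 5))
b = rng.normal(size=8)                  # current BPM readings

# Least-squares corrector setting minimising ||b + R c||, via the SVD
# (i.e. via the pseudoinverse): c = -R^+ b.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
c = -Vt.T @ ((U.T @ b) / s)

residual = b + R @ c                    # what the BPMs would read afterwards
```

Dividing by the singular values is where the SVD pays off in practice: small singular values can be discarded to avoid amplifying measurement noise.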

Affine geometry works well for smoothly fitting curves and surfaces to a set of desired data points. It builds on top of linear algebra and is essential to any engineering that combines absolute and relative quantities for both direction and displacement. Engineering the moving parts of a vehicle, for instance, combines absolute and relative motion, both rotational and non-rotational.

I use matrix operations when working with neural networks. Representing the weights and inputs of a neural network as matrices also gives me a way to parallelize computations across different processors or processor cores. Before using matrix operations, I didn’t know how to spread neural-network calculations across parallel hardware.

Computer science has a lot of applications!

  1. Manipulating images.
  2. Machine learning (everywhere).
    For example: multivariate linear regression, $\hat w = (X^TX)^{-1}X^{T}Y$, where $X$ is an $n \times m$ matrix and $Y$ is an $n \times 1$ vector.
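A sketch of that normal-equations formula on synthetic noiseless data (so the recovered weights match the true ones exactly):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic noiseless data: 50 samples, an intercept plus two features.
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_w = np.array([2.0, -1.0, 0.5])
Y = X @ true_w

# Normal equations: w = (X^T X)^{-1} X^T Y.
# (Solve the linear system rather than forming the inverse explicitly.)
w = np.linalg.solve(X.T @ X, X.T @ Y)
```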

In game development, basic linear algebra pops up all the time. This is some code I wrote this morning:

// Vector toward the mouse pointer, expressed in this object's local space.
var toMouse = transform.InverseTransformPoint(Input.mousePosition);
// Cosine of the angle between that vector and a cached edge direction,
// via the dot product: cos(alpha) = u.v / (|u||v|).
var cosOfAlpha = Vector3.Dot(toMouse, _toSameEdgeBind)/(toMouse.magnitude * _sameEdgeMagnitude);
var alpha = Mathf.Acos(cosOfAlpha);