What are some mathematically interesting computations involving matrices?

I am helping designing a course module that teaches basic python programming to applied math undergraduates. As a result, I’m looking for examples of mathematically interesting computations involving matrices.

Preferably these examples would be easy to implement in a computer program.

For instance, suppose

so that $F_n$ is the $n^{th}$ term in the Fibonacci sequence. If we set

1 & 1 \\ 1 & 0

we see that

1 & 1 \\ 1 & 0
\end{pmatrix} =
F_2 & F_1 \\ F_1 & F_0

and it can be shown that

A^n =
F_{n+1} & F_{n} \\ F_{n} & F_{n-1}

This example is “interesting” in that it provides a novel way to compute the Fibonacci sequence. It is also relatively easy to implement a simple program to verify the above.

Other examples like this will be much appreciated.

Solutions Collecting From Web of "What are some mathematically interesting computations involving matrices?"

If $(a,b,c)$ is a Pythagorean triple (i.e. positive integers such that $a^2+b^2=c^2$), then
1 & -2 & 2\\
2 & -1 & 2\\
2 & -2 & 3
is also a Pythagorean triple. In addition, if the initial triple is primitive (i.e. $a$, $b$ and $c$ share no common divisor), then so is the result of the multiplication.

The same is true if we replace $A$ by one of the following matrices:

1 & 2 & 2\\
2 & 1 & 2\\
2 & 2 & 3
\quad \text{or}\quad
-1 & 2 & 2\\
-2 & 1 & 2\\
-2 & 2 & 3

Taking $x=(3,4,5)$ as initial triple, we can use the matrices $A$, $B$ and $C$ to construct a tree with all primitive Pythagorean triples (without repetition) as follows:

\end{matrix}\right.\\ \\
\end{matrix}\right.\\ \\

Source: Wikipedia’s page Tree of primitive Pythagorean triples.

(Just my two cents.) While this has not much to do with numerical computations, IMHO, a very important example is the modelling of complex numbers by $2\times2$ matrices, i.e. the identification of $\mathbb C$ with a sub-algebra of $M_2(\mathbb R)$.

Students who are first exposed to complex numbers often ask “$-1$ has two square roots. Which one is $i$ and which one is $-i$?” In some popular models of $-i$, such as $(0,-1)$ on the Argand plane or $-x+\langle x^2+1\rangle$ in $\mathbb R[x]/(x^2+1)$, a student may get a false impression that there is a natural way to identify one square root of $-1$ with $i$ and the other one with $-i$. In other words, they may wrongly believe that the choice should be somehow related to the ordering of real numbers. In the matrix model, however, it is clear that one can perfectly identify $\pmatrix{0&-1\\ 1&0}$ with $i$ or $-i$. The choices are completely symmetric and arbitrary. Neither one is more natural than the other.

If $(a,b,c),(d,e,f)\in\mathbb{R}^3$, then$$(a,b,c)\times(d,e,f)=(b f-c e,c d-a f,a e-b d).$$This formula seems rather arbitrary, but there is another way of defining the cross-product which uses matrices. Define$$A(x,y,z)=\begin{pmatrix}0&-x&z\\x&0&-y\\-z&y&0\end{pmatrix}.$$Then$$A(a,b,c).A(d,e,f)-A(d,e,f).A(a,b,c)=A(b f-c e,c d-a f,a e-b d).$$Some properties of the cross-product are an immediate consequence of this, such as for instance:

  • $(a,b,c)\times(a,b,c)=(0,0,0)$;
  • more generally, $(a,b,c)\times(d,e,f)=-(d,e,f)\times(a,b,c)$.

Iterated Function Systems are fun to draw and are computed with matrices. They can also illustrate what matrix multiplication does.

enter image description here

As an example, this draws Barnsley’s fern:

import numpy as np
import matplotlib.pyplot as plt
from random import randint

fig = plt.figure()

mats = [np.array([[0.8,0.03],[-0.03,0.8]])]

offsets = np.array([[0,1],[0.4,0.2],[-0.4,0.2],[0,-1]])

vec = np.array([0,0])
xlist, ylist  = [], []
for count in range(100000):
    r = randint(0,100)
    i = 0
    if r > 70: i += 1
    if r > 80: i += 1
    if r > 90: i += 1
    vec = mats[i].dot(vec) + offsets[i]

plt.plot(xlist, ylist, color='w', marker='o', markeredgewidth=0.1, markersize=0.3, markeredgecolor='k')

The roots of any polynomial $$p(x) = \sum_{i=1}^{n} c_i x^i$$ are the eigenvalues of the companion matrix

0 & 0 & \cdots & 0 & -c_0/c_n \\
1 & 0 & \cdots & 0 & -c_1/c_n \\
0 & 1 & \cdots & 0 & -c_2/c_n \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -c_{n-1}/c_n

which you can have them compute via power iteration.

…at least in the case of real roots. Different methods may be needed for complex roots.

Finite Difference Method

IMO pure math problems are often not that attractive for undergrads and pupils. People can better relate to applied problems, even if abstracted a little, than to completely abstract math problems.

So, if you want to show them something more applied here is an example using differential equations. For the 1D case you most likely don’t need anything, for the 2D case you need the concept of partial differentiation.

The following equation e.g. models the behaviour (deformation $u$) of the membrane of a drum (2D), if you apply pressure $f$ on it. The same for 1D, which is easier to explain, since partial derivatives might not be known.

        | f
 |--------------|       ⇒          |--         --|
 0              1                     --      --

For 2D: Let $Ω=[0,1]²$. Solve:
$$Δu(x,y) = f(x,y) \text{ in } Ω, \qquad u(x,y)=0 \text{ on } ∂Ω.$$
For 1D: Let $I=[0,1]$. Solve:
$$-u”(x)=f(x) \qquad \text{ in } I, \qquad u(0)=u(1)=0.$$

Let’s first look at the 1D case, since it is easier to deduce the 2D case from it afterwards. They should be able to understand the idea of 1D FDM immediately.

The idea of “F”DM is to cover the interval $I$ with $n+2$ points, so called nodal points:

    0                                1

   x_0   x_1    x_2    x_3    x_4    x_{n+1}


So we now have $$0=x_0<x_1<…<x_{n+1}=1.$$

These points should be equidistant, meaning: $$x_{i+1}-x_i = h, \qquad i=0,…,n.$$

Now since the differential equation holds for all $x∈I$ it also holds for all $x_i$:

$$-u”(x_i)=f(x_i) \qquad ∀i=1,…,n; \qquad u(0)=u(1)=0.$$

The next step of F”D”M is to approximate the derivative with the difference quotient:

$$u”(x) = \frac{u(x-Δx)-2u(x)+u(x+Δx)}{Δx^2}$$
$$-u”(x_i) = \frac{-u(x_{i-1})+2u(x_i)-u(x_{i+1})}{h^2}, \qquad i=1,…,n$$

And we get $n$ equations:
$$ \frac{-u(x_{i-1})+2u(x_i)-u(x_{i+1})}{h^2} = f(x_i), \qquad i=1,…,n$$
and the boundary condition is:
u(x_0)&=0, \\ u(x_{n+1}) &= 0

Written as a matrix system it is:

1 &0 \\
-1 &2 & -1 & \\
& -1 &2 & -1 & \\
& & \ddots &\ddots & \ddots & \\
& & & -1 & 2 & -1 \\
& & & & & 0 & 1 \\
\begin{pmatrix}u_0 \\ u_1 \\ \vdots \\ \vdots \\ u_n \\ u_{n+1} \end{pmatrix}
\begin{pmatrix}f_0 \\ f_1 \\ \vdots \\ \vdots \\ f_n \\ f_{n+1} \end{pmatrix}$$

using the notation $u_i = u(x_i)$, and setting $f_0=f_{n+1}=0$.

The inner part of that matrix is symmetric and positive definite, so everything you could wish.

Edit: It might be that you need to multiply $f$ by $-1$ to get the deformation $u$ in the correct direction. Usually one looks at $-Δu=\pm f$, because of two reasons: First the matrix given by $-Δ$ is positive definite, and second the heat equation reads $∂_tu-Δu$.

Edit2: I noticed, that symmetric positive definite needs to be explained more. The equations of the boundary values don’t have to be solved. They can be transferred to the rhs. Therefore, you actually solve the following $n×n$-system:
2 & -1 & \\
-1 &2 & -1 & \\
& \ddots &\ddots & \ddots & \\
& & -1 & 2 & -1 \\
& & & -1 & 2
\begin{pmatrix} u_1 \\ u_2 \\ \vdots\\ u_{n-1} \\ u_n \end{pmatrix}
\begin{pmatrix} f_1 + h^2f_0 \\ f_2 \\ \vdots \\ f_{n-1}\\ f_n + h^2f_{n+1} \end{pmatrix}$$

And this matrix is symmetric, positive definite by weak row sum criterion.

In 2D you need a grid of points, e.g. $m×m$ points.
If you sort them lexicographically (row-wise)

        |           |           |
---  (i+m-1)------(i+m)------(i+m+1) ---
        |           |           |
        |           |           |
---   (i-1)--------(i)--------(i+1)  --- 
        |           |           |
        |           |           |
---  (i-m-1)------(i-m)------(i-m+1) ---
        |           |           |

the matrix is:
$$\frac{1}{h^2}[diag(4) + diag(-1,1) + diag(-1,-1) + diag(-1,m) + diag(-1,-m)]$$

simply because
\begin{align*}-Δu(x,y) &= -∂_{xx}u-∂_{yy}u \\ &=
\frac{-u(x_{i-1})+2u(x_i)-u(x_{i+1})}{h^2} + \frac{-u(x_{i-m})+2u(x_i)-u(x_{i+m})}{h^2}

If they understand everything about FDM, you can also introduce FEM to them. That can be done within 2 hours.

Using bilinear Finite Elements on the same lexicographically ordered grid, and the “tensorproduct-trapezoidal-rule”
$$Q_t(f):=\frac{|T|}{4}\sum_{i=1}^4f(a_i),\qquad a_i \text{ corner points of cell } T,$$ to evaluate the integrals $∫_{T}ϕ^iϕ^jd(x,y)$, will result in the same matrix as 2D FDM (except of the $h^{-2}$-factor).

Rotation matrices are a typical example of useful matrices in computer graphics

$${\bf R_\theta} = \begin{bmatrix}\cos(\theta)&\sin(\theta)\\-\sin(\theta)&\cos(\theta)\end{bmatrix}$$

They rotate a vector around origo by the angle $\theta$.

If you want to make it more complicated you can make them in 3 dimensions. For example to rotate an angle $\theta$ around the $x$-axis:

$${\bf R_{\theta,x}} = \begin{bmatrix}1&0&0\\0&\cos(\theta)&\sin(\theta)\\0&-\sin(\theta)&\cos(\theta)\end{bmatrix}$$

Even a bit more complicated are the affine transformations, in 2D you can make one like this:

$${\bf A_{r,\theta,x_0,y_0}} = \begin{bmatrix}r\cos(\theta)&r\sin(\theta)&x_0\\-r\sin(\theta)&r\cos(\theta)&y_0\\0&0&1\end{bmatrix}, {\bf v} = \begin{bmatrix}x\\y\\1\end{bmatrix}$$

which with matrix multiplication scales the current vector ($x$ and $y$ in $\bf v$) a factor $r$, rotates it with angle $\theta$ and then adds (translates by) $[x_0,y_0]^T$.

Not really sure if it applies, but matrices offer a cool way of doing geometric optics (in small angle approximation). A ray is represented by a row vector $\begin{pmatrix} y & \alpha \end{pmatrix}$, where $y$ is current height above the optical axis and $\alpha$ the angle which the ray makes with the axis.

This way, all optical instruments just turn into 2×2 matrices (because the small angle approximation drops the goniometric functions and makes everything linear). Then, if the ray (with vector $r$) meets some instruments (with matrices $M_1, M_2, \ldots, M_n$) on the way, it will finally emerge as a ray given by $rM_1M_2 \ldots M_n$.

As an example, let’s find the matrix of empty space of “length” $L$. The angle of the ray won’t change, while the height changes according to the formula $y’ = y + L \sin \alpha \approx y + L \alpha$. If we put that into a matrix, we get
$$\begin{pmatrix} 1 & 0 \\ L & 1 \end{pmatrix}.$$

A very important matrix is a matrix of spherical boundary between two parts of space with different refractive indices. After some picture-drawing and using Snell’s law, you get
$$\begin{pmatrix} 1 & R^{-1}(N^{-1} – 1) \\ 0 & N^{-1} \end{pmatrix},$$
where $R$ is the radius of the sphere and $N = n_2/n_1$ ($n_1$ is the refractive index “before” the boundary (as the ray goes), $n_2$ “after”).

From this, you can derive everything about the simple optical instruments. For instance an infinitely thin lens is just one spherical boundary just after another. If you need a thick lens, just put some empty space between the two boundaries.

This is the coolest approach to the geometric optics that I know of, mainly because there is 1:1 correspondence between the real instruments and matrices, and because it’s so modular (you can build anything out of few matrices). I’ll understand if you won’t like it (because you clearly asked for a mathematical calculation), but I guessed it could be worth sharing anyway.

Edit: A nice article for start (from which I learned about this — also with some images) is here: http://www.southampton.ac.uk/~evans/TS/Toolbox/optic2.pdf

They might be interested in

Discrete Fourier Transform (DFT) Matrix


The two point DFT: ${\displaystyle {\begin{bmatrix}1&1\\1&-1\end{bmatrix}}}$


Toeplitz Matrix: (Wolfram) and wikipedia


a & b & c & d & e \\
f & a & b & c & d \\
g & f & a & b & c \\
h & g & f & a & b \\
i & h & g & f & a

Here is a pretty amusing matrix computation with an accompanying graphical demonstration: take a suitably small $h$ and consider the iteration

$\mathbf x_{k+1}=\begin{pmatrix}1&-h\\h&1\end{pmatrix}\mathbf x_k$

with $\mathbf x_0=(1\quad 0)^\top$. If you draw the points generated by this iteration, you should see a circle arc starting to form.

This is because the iteration is actually a disguised version of Euler’s method applied to the differential equations


which of course has the known solution $(\cos t\quad \sin t)^\top$.

Yet another way to look at this is that the iteration matrix $\begin{pmatrix}1&-h\\h&1\end{pmatrix}$ also happens to be a truncation of the Maclaurin series for the rotation matrix $\begin{pmatrix}\cos h&-\sin h\\\sin h&\cos h\end{pmatrix}$.

A related algorithm is Minsky’s circle drawing algorithm from the HAKMEM (see item 149), where the iteration now proceeds as (temporarily switching to component form, and changing Minsky’s original notation slightly):

$$\begin{align*}x_{k+1}&=x_k-h y_k\\y_{k+1}&=y_k+hx_{k+1}\end{align*}$$

This can be recast into matrix format by substituting the first recurrence for $x_{k+1}$ in the second recurrence, yielding


and we once again recognize a truncated Maclaurin series for the rotation matrix.

If $A=[a_{ij}]$ with $a_{ij}$ being the number of (directed) arcs from vertex $i$ to vertex $j$ in a graph, then the entries of $A^n$ show the number of distinct (directed) paths of length $n$ from vertex $i$ to vertex $j$ also.

While the calculation grows for higher numbers of vertices, it is highly amenable to simple programming, especially where tools are built in for handling matrix multiplication or value extraction.

How about Polynomial Fit using a Vandermonde matrix?

Given two vectors of $n$ values, ${\bf X}=\pmatrix{x_1 & x_2 & \ldots & x_n }^\top$ and ${\bf Y}=\pmatrix{y_1 & y_2 & \ldots & y_n }^\top$

Construct the n×(m+1) Vandermondle matrix $\bf A$ and polynomial coefficients (m+1)×1 vector $\bf C$ representing a m order polynomial (m<=n)

$$ \begin{align} \mathbf{A} & = \left[ \matrix{
1 & x_1 & x_1^2 & \ldots & x_1^m \\
1 & x_2 & x_2^2 & \ldots & x_2^m \\
\vdots & & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^m \\ } \right] & \mathbf{C} & = \pmatrix{c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_m} \end{align} $$

Finding the polynomial fit means solving the system $\mathbf{A} \mathbf{C} = \mathbf{Y}$ using a least squares method. This is done with

$$\mathbf{C} = \left( \mathbf{A}^\top \mathbf{A} \right)^{-1} \mathbf{A}^\top \mathbf{Y}$$

Once you have the coefficients $c_0,c_1,\ldots c_m$ the polynomial fit is

$$ \boxed{ y_{fit} = c_0 + c_1 x + c_2 x^2 + \ldots + c_m x^m } $$

REF: https://stackoverflow.com/a/8005339/380384

Normally you can’t do vector addition using matrices but you can if you add a dimension and use certain block matrices.Consider an $n \times n$ block matrices of the form $\begin{pmatrix}I&v\\0&1\end{pmatrix}$ with $I$ the $n-1\times n-1$ identity matrix, $v$ an $n-1$ column vector $0$ and $n-1$ row vector and the $1$ simply a $1 \times 1$ block. If you multiply two matrices of this form with vectors $v_1$ and $v_2$ it will add the vectors and preserve the block structure. You can also look at what happens when the identity matrix is replaced with some matrix $A$ and multiply them together too.

The $4 \times 4$ case is particularly interesting for computer science because the OpenGL transform uses this form to encode translations of graphical objects and their local linear transformation relative to other objects you’re working with. This form also has the advantage of using the $1$ position to scale objects appropriately under a perspective projection, however this is outside the scope of linear algebra.

Here’s the polynomial derivative matrix (where $n$ is the degree of the polynomial):

$\newcommand{\bbx}[1]{\,\bbox[15px,border:1px groove navy]{\displaystyle{#1}}\,}\newcommand{\i}{\mathrm{i}}\newcommand{\text}[1]{\mathrm{#1}}\newcommand{\root}[2][]{^{#2}\sqrt[#1]} \newcommand{\derivative}[3]{\frac{\mathrm{d}^{#1} #2}{\mathrm{d} #3^{#1}}} \newcommand{\abs}[1]{\left\vert\,{#1}\,\right\vert}\newcommand{\x}[0]{\times}\newcommand{\summ}[3]{\sum^{#2}_{#1}#3}\newcommand{\s}[0]{\space}\newcommand{\i}[0]{\mathrm{i}}\newcommand{\kume}[1]{\mathbb{#1}}\newcommand{\bold}[1]{\mathbf{#1}}\newcommand{\italic}[1]{\mathit{#1}}\newcommand{\kumedigerBETA}[1]{\rm #1\!#1}$

$$\large \mathbf {Derivative\s matrix\s}_{(n+1)\times (n+1)}=\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 2 & 0 & \cdots & 0 \\
0 & 0 & 0 & 3 & \cdots & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0

Let me do a sample calculation:

Question: What’s $$\derivative{}{}{x}\left(2x^3+5x^2+7\right)$$

Find the derivative matrix and multiply it with coefficients of the polynomial.

$$\mathbf {Answer}=\begin{bmatrix}
0 & 1 & 0 & 0\\
0 & 0 & 2 & 0\\
0 & 0 & 0 & 3\\
0 & 0 & 0 & 0\\

And integral calculation (where $n$ is the degree of the polynomial, again)!

$$\large \mathbf {Integral\s matrix\s}_{(n+\bold{2})\times (n+\bold{2})}=\begin{bmatrix}
0 & 0 & 0 & 0 & \cdots & 0 \\
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & \frac{1}{2} & 0 & 0 & \cdots & 0 \\
0 & 0 & \frac{1}{3} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0

Question: What’s $$\int\left(2x^3+5x^2+7\right)\cdot\mathrm{d}x$$

$$\mathbf {Answer}=\begin{bmatrix}
0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0\\
0 & \frac{1}{2} & 0 & 0 & 0\\
0 & 0 & \frac{1}{3} & 0 & 0\\
0 & 0 & 0 & \frac{1}{4} & 0\\
\end{bmatrix}\Rightarrow \frac{1}{2}x^4+\frac{5}{3}x^3+7x+\mathit{constant}

Warning: Don’t forget $+constant$.

Inspired by: $3$Blue$1$Brown

Quantum computing

Quantum logic gates are often represented as matrices, and since Python has built-in complex numbers no additional effort is required. It should be within the reach of undergraduates to implement a simulation of Shor’s algorithm for a handful of qubits.

As a longer follow up to my comment you can investigate the generalisation
$B_m^n$ for $$B_m^1=\begin{pmatrix} m & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} M_2 & M_1 \\ M_1 & M_0 \end{pmatrix}$$

with $m=2$ “silver Fibonacci numbers” Pell Numbers

with $m=3$ “bronze Fibonacci numbers”

According to https://oeis.org/A006190 the “bronze Fibonacci numbers” can be in turn generated from a vector matrix of Pell Numbers multiplied by previous bronze numbers prefixed by a 1. {Credited to Gary W. Adamson}

$$B_3[5]=\begin{pmatrix} 29 & 12 & 5 & 2 & 1 \end{pmatrix} \times\begin{pmatrix} 1 \\ 1 \\ 3 \\ 10 \\ 33 \end{pmatrix}= 109$$

Is there an analogous relationship between the Fibonacci numbers and the Pell numbers? I can’t see one on quick inspection.

Edit: Sorry I got my arithmetic wrong the same analogous relation appears between Fibonacci and Pell numbers. $B_2[5]$ is the fifth Pell number.

$$B_2[5]=\begin{pmatrix} 5 & 3 & 2 & 1 & 1 \end{pmatrix} \times\begin{pmatrix} 1 \\ 1 \\ 2 \\ 5 \\ 12 \end{pmatrix}= 29$$

Two-dimensional affine transformation matrices can work great, because you can generate them with simple formulas, apply them to geometric shapes to obtain new shapes (or even hand them a library that can graph Bézier curves given control points) and see the results as a picture.

It is very hard to imagine dealing with control theory and dynamical systems without using matrices. One example is the state-space representation of a dynamical system. Suppose a linear system is described as:
$$\dot x=Ax+Bu\\y=Cx+Du$$
where $x\in\mathbb R^n,A\in\mathbb R^{n\times n}$. Aside from the fact that matrix representation of equations is really cool and gives us many information just by looking at it, system analysis and control synthesis becomes much easier. One can show that for a given input $u(t)$ and known initial conditions $x(0)$, the state of the system at any time can be obtained from:
where $\Phi(t)=e^{At}$. Now the coolest part (in my opinion) is, how to compute $e^{At}$?

We can show that if $X=P^{-1}AP$, then $X^n=P^{-1}A^nP$ for every $n\in\mathbb Z$. Therefore from taylor’s formula
So if we can find a mapping to get an $X$ from $A$, whose exponential is easy to compute, then $\Phi(t)$ will be at our hand. This is where the Jordan normal form enters the scene.

Since this got reopened again, here is a nice matrix product evaluation due to Bill Gosper.

Letting $\mathbf A_0=\mathbf I$, and evaluating the recurrence

$$\mathbf A_{k+1}=\mathbf A_k\begin{pmatrix}-\frac{k+1}{2(2k+3)}&\frac5{4(k+1)^2}\\0&1\end{pmatrix}$$

yields a sequence of matrices that converge to $\mathbf A_\infty=\begin{pmatrix}0&\zeta(3)\\0&1\end{pmatrix}$, where $\zeta(3)$ is Apéry’s constant.

This can be further generalized to the evaluation of odd zeta values, by considering instead the matrix


which when used in the iteration will yield $\zeta(3),\zeta(5),\dots,\zeta(2n+1)$ in the last column.

I cannot put my hands on my copy but I’m sure the title is Cubic Math. This is from early days of the Rubik’s cube and teaches how to solve it. The steps of twisting one side after another are of interest in that they correspond to matrix manipulations. The operations are done in extended sequences followed by the reverse of the last few parts to regain the starting point.

I believe faster methods have come along since.

PageRank (Google’s core search algorithm) is arguably the most financially valuable algorithm ever created, and is just a glorified eigenvector problem. Wikipedia has a short Matlab implementation you can reference. Eigenvector centrality is a simpler variant which your students may prefer.

Eigenmorality is a fun application.

How about anything that involves linear regression?

Regression can be seen as a line-fitting problem where one minimizes the error between the real value and the predicted value for each observation.

Consider an overdetermined system

$$\sum_{j=1}^{n} X_{ij}\beta_j = y_i,\ (i=1, 2, \dots, m),$$

of m linear equations in n unknown coefficients, $\beta_1$…$\beta_n$, with $m > n$. Not all of $X$ contains information on the data points. The first column is populated with ones, $X_{i1} = 1$, only the other columns contain actual data, and n = number of regressors + 1. This can be written in matrix form as

$$\mathbf {X} \boldsymbol {\beta} = \mathbf {y},$$


$$\mathbf {X}=\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{m1} & X_{m2} & \cdots & X_{mn}
\end{bmatrix} ,
\qquad \boldsymbol \beta = \begin{bmatrix}
\beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix} ,
\qquad \mathbf y = \begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_m
\end{bmatrix}. $$

Such a system usually has no solution, so the goal is instead to find the coefficients $\boldsymbol{\beta}$ which fit the equations best, in the sense of solving the minimization problem

$$\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\,S(\boldsymbol{\beta}), $$

where the objective function S is given by

$$S(\boldsymbol{\beta}) = \sum_{i=1}^m \bigl| y_i – \sum_{j=1}^n X_{ij}\beta_j\bigr|^2 = \bigl\|\mathbf y – \mathbf X \boldsymbol \beta \bigr\|^2.$$

This minimization problem has a unique solution, given by

$$(\mathbf X^{\rm T} \mathbf X )\hat{\boldsymbol{\beta}}= \mathbf X^{\rm T} \mathbf y.$$

Let me copy-paste a problem from an old worksheet of mine, about Cayley’s rational parametrization of orthogonal matrices. As a particular case, it gives a formula for Pythagorean triples (up to common factors), which is distict from the tree @Pedro has brought up.

The next exercise shows one way (due to Cayley) of finding a “random” orthogonal matrix with rational entries. See also
http://planetmath.org/cayleysparameterizationoforthogonalmatrices and Chapter 22 of Prasolov’s “Problems and theorems in linear algebra”.

Exercise. Let $S$ be a skew-symmetric $n\times n$ matrix (that is, $S^{T}=-S$).

(a) I claim that $S-I$ is invertible for “almost all” choices of $S$. More precisely, for a given $S$, the matrix $\lambda S-I$ is invertible for all but finitely many real numbers $\lambda$.

Proof: If $\lambda S-I$ is not invertible, then $\det\left( \lambda S-I\right) =0$, thus $\det\left( S-\dfrac{1}{\lambda}I\right) =\left( \dfrac{1}{\lambda}\right) ^{n}\det\left( \lambda S-I\right) =0$, and so $\dfrac{1}{\lambda}$ is an eigenvalue of $S$. But $S$ has only finitely many eigenvalues. Thus, $\lambda S-I$ is invertible for all but finitely many real numbers $\lambda$.

From now on, we assume that $S-I$ is invertible.

(b) Show that the matrices $\left( S-I\right) ^{-1}$ and $S+I$ commute (that is, we have $\left( S-I\right) ^{-1}\cdot\left( S+I\right) =\left( S+I\right) \cdot\left( S-I\right) ^{-1}$).

Proof: We have $\left( S+I\right) \cdot\left( S-I\right) =S^{2}-S+S-I=\left( S-I\right) \cdot\left( S+I\right) $. Multiplying this with $\left(S-I\right)^{-1}$ from the left and with $\left(S-I\right)^{-1}$ from the right, we obtain $\left( S-I\right) ^{-1}\cdot\left( S+I\right) =\left( S+I\right) \cdot\left( S-I\right) ^{-1}$.

(c) Show that the matrix $\left( S-I\right) ^{-1}\cdot\left( S+I\right) $ is orthogonal.

Proof: We have $\left( S+I\right) ^{T}=S^{T}+I^{T} = -S + I$ and $\left( S-I\right) ^{T}=S^{T}-I^{T} = -S – I$. Now,
& \underbrace{\left( \left( S-I\right) ^{-1}\cdot\left( S+I\right)
\right) ^{T}}_{=\left( S+I\right) ^{T}\cdot\left( \left( S-I\right)
^{-1}\right) ^{T}}\cdot\left( S-I\right) ^{-1}\cdot\left( S+I\right) \\
& =\left( S+I\right) ^{T}\cdot\underbrace{\left( \left( S-I\right)
^{-1}\right) ^{T}}_{\substack{=\left( \left( S-I\right) ^{T}\right)
^{-1}\\\text{(the transpose of an inverse is}\\\text{the inverse of the
transpose)}}}\cdot\left( S-I\right) ^{-1}\cdot\left( S+I\right) \\
& =\left( S+I\right) ^{T}\cdot\left( \left( S-I\right) ^{T}\right)
^{-1}\cdot\underbrace{\left( S-I\right) ^{-1}\cdot\left( S+I\right)
}_{\substack{=\left( S+I\right) \cdot\left( S-I\right) ^{-1}\\\text{(by
part (b))}}}\\
& =\underbrace{\left( S+I\right) ^{T}}_{= -S + I}
\cdot\left( \underbrace{\left( S-I\right) ^{T}}
_{=-S-I}\right) ^{-1}\cdot\left( S+I\right)
\cdot\left( S-I\right) ^{-1}\\
& =\left(-S+I\right)\cdot\underbrace{\left(
-S-I\right) ^{-1}\cdot\left( S+I\right)
}_{=-I}\cdot\left( S-I\right) ^{-1}\\
& =\left(-S+I\right)\cdot\left(-I\right)
\cdot\left( S-I\right) ^{-1}=-\underbrace{\left(-S+I\right)
\cdot\left( S-I\right) ^{-1}}_{=-I}
=- \left(-I\right) = I.

(d) Starting with different matrices $S$, we get a lot of different
orthogonal matrices using the $\left( S-I\right) ^{-1}\cdot\left(
S+I\right) $ formula ($S-I$ is “almost
always” invertible, so we do not get into troubles with
$\left( S-I\right) ^{-1}$ very often). Do we get them all?

No. In
fact, orthogonal $n\times n$ matrices can have both $1$ and $-1$ as
determinants (examples: $\det I=1$ and
$\det J=-1$, where $J$ is the diagonal matrix whose diagonal is
$-1, 1, 1, \ldots, 1$), whereas $\det\left(
\left( S-I\right) ^{-1}\cdot\left( S+I\right) \right) $ always is
$\left( -1\right) ^{n}$.

Proof (of the latter claim):
\det\left( S-I\right) & =\det\left( \left( S-I\right) ^{T}\right)
=\det\left( -S-I\right) \\
& \ \ \ \ \ \ \ \ \ \ \left( \text{since }\left( S-I\right) ^{T}
=-S-I\text{ as we found in (c)}\right)
& =\left( -1\right) ^{n}\det\left( S+I
\right) .
& \det\left( \left( S-I\right) ^{-1}\cdot\left( S+I\right) \right) \\
& =\left( \underbrace{\det\left( S-I\right) }_{=\left( -1\right)
^{n}\det\left( S+I\right) }\right)
^{-1}\cdot\det\left( S+I\right) \\
& =\left( \left( -1\right) ^{n}\det\left(
S+I\right) \right) ^{-1}\cdot\det\left(
S+I\right) =\left( \left( -1\right) ^{n}\right) ^{-1}=\left( -1\right)

(e) There is a way to tweak this method to get all orthogonal
matrices. Namely, if $Q$ is an orthogonal $n\times n$ matrix, then we can get
$2^n$ new orthogonal matrices from
$Q$ by multiplying some of its columns with $-1$. Applying this to matrices of
the form $\left( S-I\right) ^{-1}\cdot\left( S+I\right) $ gives us all
orthogonal matrices (many of them multiple times). (I am not asking for the
proof; this is difficult.)

(f) Compute $\left( S-I\right) ^{-1}\cdot\left( S+I\right) $ for
a $2\times2$ skew-symmetric matrix $S=\left(
0 & a\\
-a & 0
\right) $.

[Remark: Do you remember the formulas for Pythagorean triples? In
order to solve $x^{2}+y^{2}=z^{2}$ in coprime positive integers $x,y,z$, set
x=m^{2}-n^{2},\qquad y=2mn\qquad \text{and}
\qquad z=m^{2}+n^{2}.
The answer in part (f) of the above exercise is related to this, because a
$2\times2$ orthogonal matrix $\left(
p & u\\
q & v
\right) $ must have $p^{2}+q^{2}=1$, and if its entries are rationals, then
clearing the common denominator of $p$ and $q$, we obtain a Pythagorean

Markov chains

Let us consider $n$ states $1,2,\dotsc,n$ and a transition matrix $P=(p_{i,j})_{1\leq i,j\leq n}$. We request that $P$ be stochastic : it means that every entry $p_{i,j}$ is nonnegative and that the sum on each line is equal to $1$ : for every $i$, $\sum_{j=1}^n p_{i,j}=1$.

We define a sequence $(X_n)_{n\geq 0}$ of random variables the following way : let’s say we start in the state $1$, i.e. $X_0=1$ with probability 100%. This is represented by the row vector
Each step we transition from state $i$ to state $j$ with probability $p_{i,j}$. For instance, for $X_1$ we have $\mathbb{P}(X_1=1)=p_{1,1},\dotsc,\mathbb{P}(X_1=n)=p_{1,n}$. The distribution of $X_1$ can be written as
In fact, if the distribution of $X_k$ is
then the distribution of $X_{k+1}$ is
Hence for every $k$, we have
so the distribution at any time can be computed with matrix multiplications.

What’s interesting is that in some (most) cases, the distribution converges to a limit distribution $x^{(\infty)}=\lim_{k\to \infty} x^{(0)}P^k$.

Cramer’s Rule for Solving a System of Simultaneous Linear Equations

Cramer’s rule provides a method for solving a system of linear equations through the use of determinants.

Take the simplest case of two equations in two unknowns
Now form

a_1 & b_1 \\
a_2 & b_2
c_1 & b_1\\
c_2 & b_2
a_1 & c_1\\
a_2 & c_2
where $A$ is the coefficient matrix of the system, $A_x$ is a matrix formed from the original coefficient matrix by replacing the column of coefficients of $x$ in $A$ with the column vector of constants, with $A_y$ formed similarly.

Cramer’s rule states
$$x^* =\frac{|A_x|}{|A|} \qquad y^* =\frac{|A_y|}{|A|} \tag{3}$$
where $|A|$, $|A_x|$, $|A_y|$ are the determinants of their respective matrices, and $x^*$ and $y^*$ are the solutions to ($1$).

Writing these solutions fully in determinant form:
c_1 & b_1\\
c_2 & b_2
a_1 & b_1\\
a_2 & b_2
a_1 & c_1\\
a_2 & c_2
a_1 & b_1\\
a_2 & b_2

Cramer’s rule can be extended to three equations in three unknowns requiring the solution of order $3$ determinants, and so on.

Rather than give a new example, I’d like to present a way to expand on your own example. You mention that
1 & 1 \\ 1 & 0
\end{pmatrix} =
F_2 & F_1 \\ F_1 & F_0
A^n =
F_{n+1} & F_{n} \\ F_{n} & F_{n-1}
Though, as @Henrik mentions, the matrix multiplication simply consists of doing the “usual” addition of Fibonacci numbers.

However, when you see a matrix power, e.g. $A^n$, you think of diagonalisation to efficiently compute this matrix power.

Any decent mathematical software should be able to compute for you that
1 & 1 \\ 1 & 0
\end{pmatrix} = \begin{pmatrix}
-\frac{1}{\varphi} & \varphi \\ 1 & 1
-\frac{1}{\varphi} & 0 \\ 0 & \varphi
-\frac{1}{\sqrt{5}} & \frac{\varphi}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{1}{\varphi\sqrt{5}}
\end{pmatrix} = P\Lambda P^{-1},$$
where $\varphi = \frac{1+\sqrt{5}}{2}$ is the golden ratio. Therefore,
$$A^n = P\Lambda^n P^{-1} = \begin{pmatrix}
-\frac{1}{\varphi} & \varphi \\ 1 & 1
\left(-\varphi\right)^{-n} & 0 \\ 0 & \varphi^n
-\frac{1}{\sqrt{5}} & \frac{\varphi}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{1}{\varphi\sqrt{5}}
Working out the product of these three matrices is really easy, so we get:
F_{n+1} & F_{n} \\ F_{n} & F_{n-1}
\end{pmatrix} = A^n = \begin{pmatrix}
\frac{\varphi^{n+1} – \left(-\varphi\right)^{-(n+1)}}{\sqrt{5}} & \frac{\varphi^{n} – \left(-\varphi\right)^{-n}}{\sqrt{5}} \\ \frac{\varphi^{n} – \left(-\varphi\right)^{-n}}{\sqrt{5}} & \frac{\varphi^{n-1} – \left(-\varphi\right)^{-(n-1)}}{\sqrt{5}}
In other words, through the use of diagonalisation we have proven that
$$F_n = \frac{\varphi^{n} – \left(-\varphi\right)^{-n}}{\sqrt{5}}.$$