# How is $L_{2}$ Minkowski norm different from $L^{2}$ norm?

I am reading the book *Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition*.
The authors use the $L_{2}$ Minkowski norm (Euclidean) as the distance metric in the feature space for long-term ECG classification.

I myself am using just the $L^{2}$ seminorm.
I did not find a reason why they use the Minkowski norm.
There is little information here on what Minkowski space is.

The book *Riemann–Finsler Geometry* (2005) gives a definition of the Minkowski norm.

The book *The Geometry of Spacetime – An Introduction to Special and General Relativity* (2000) develops

> the Minkowski geometry of spacetime as the invariant theory of Lorentz transformations, making constant comparisons with the familiar Euclidean geometry of ordinary space as the invariant theory of rotations.

The Minkowski space has been used in the inverse problem for nonlinear hyperbolic equations.

What are the advantages of the Minkowski norm over the $L^{2}$ seminorm when considering ECG classification?

#### Answers

The authors of *Multidimensional Particle Swarm Optimization* use the ordinary Euclidean metric. They just give it a strange name, "$L^2$ Minkowski norm", unnecessarily in my opinion. But it may be common within the subject area.

The book *Riemann–Finsler Geometry* gives the definition of a related concept, but it is not the definition the authors of the first book use. Notice that this is a book from a rather different area of mathematics.

The book *The Geometry of Spacetime* is talking about yet another concept.

The Minkowski distance between two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ is the general $L^m$ distance, defined as

$$d(\mathbf{x}_i,\mathbf{x}_j)=\biggl( \sum_{k=1}^p |x_{ik}-x_{jk}|^m \biggr)^{1/m},$$

where $p$ is the number of features.
The Manhattan and Euclidean distances are simply the special cases $L^1$ and $L^2$, respectively.
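To make this concrete, here is a minimal sketch of the general $L^m$ distance in plain Python (the function and variable names are mine, not the book's):

```python
def minkowski_distance(x_i, x_j, m):
    """L^m (Minkowski) distance between two feature vectors of length p."""
    return sum(abs(a - b) ** m for a, b in zip(x_i, x_j)) ** (1.0 / m)

x_i = [1.0, 2.0, 3.0]
x_j = [4.0, 0.0, 3.0]

print(minkowski_distance(x_i, x_j, 1))  # m = 1, Manhattan: 5.0
print(minkowski_distance(x_i, x_j, 2))  # m = 2, Euclidean: sqrt(13) ~ 3.606
```

Note the absolute value: without it, odd values of $m$ would let positive and negative feature differences cancel, and the result would not be a distance.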

Quite basically, you can use almost any distance metric (or similarity measure) you want for the fitness function. For example, the dot product $\mathbf{x}_i^T\mathbf{x}_j$, polynomial kernels such as $K(\mathbf{x}_i,\mathbf{x}_j)=(1+\mathbf{x}_i^T\mathbf{x}_j)^d$, or radial basis kernels
$K(\mathbf{x}_i,\mathbf{x}_j)=\exp[-d(\mathbf{x}_i,\mathbf{x}_j)]$.
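These three examples can be sketched as follows; this is an illustrative implementation under my own assumptions (degree $d=2$ for the polynomial kernel, and the Euclidean distance as $d(\cdot,\cdot)$ inside the radial basis kernel):

```python
import math

def dot(x_i, x_j):
    # Inner product x_i^T x_j, used by both kernels below.
    return sum(a * b for a, b in zip(x_i, x_j))

def polynomial_kernel(x_i, x_j, d=2):
    # K(x_i, x_j) = (1 + x_i^T x_j)^d, where d is the polynomial degree.
    return (1 + dot(x_i, x_j)) ** d

def rbf_kernel(x_i, x_j):
    # K(x_i, x_j) = exp(-d(x_i, x_j)), here with d the Euclidean distance.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    return math.exp(-dist)

x_i, x_j = [1.0, 0.0], [1.0, 1.0]
print(dot(x_i, x_j))                # 1.0
print(polynomial_kernel(x_i, x_j))  # (1 + 1)^2 = 4.0
print(rbf_kernel(x_i, x_j))         # exp(-1) ~ 0.368
```

Note that kernels measure similarity (larger means closer), whereas the Minkowski distances measure dissimilarity, so a fitness function must be oriented accordingly.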

Minkowski is merely a general family of distances, and when working with metaheuristics such as PSO it is better to evaluate different distance metrics for their informativeness in class prediction (classification).
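One hedged sketch of such an evaluation, using a simple 1-nearest-neighbour classifier on toy data of my own invention (real ECG feature vectors would replace the toy points):

```python
def minkowski(m):
    # Returns an L^m distance function for the given order m.
    return lambda u, v: sum(abs(a - b) ** m for a, b in zip(u, v)) ** (1.0 / m)

def nn_predict(train, labels, x, dist):
    # 1-nearest-neighbour prediction under the given distance function.
    nearest = min(range(len(train)), key=lambda i: dist(train[i], x))
    return labels[nearest]

def accuracy(train, labels, test, test_labels, dist):
    # Fraction of test points whose nearest neighbour has the correct class.
    hits = sum(nn_predict(train, labels, x, dist) == y
               for x, y in zip(test, test_labels))
    return hits / len(test)

# Toy two-class data (illustrative only, not ECG features).
train = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
labels = ["A", "A", "B"]
test = [(0.5, 0.5), (2.8, 3.1)]
test_labels = ["A", "B"]

for m in (1, 2, 3):
    acc = accuracy(train, labels, test, test_labels, minkowski(m))
    print(f"L^{m} accuracy: {acc}")
```

On a real dataset the accuracies would generally differ across $m$, and that comparison is the evaluation being recommended.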