A chapter in the in-progress e-book on linear algebra. The table of contents so far:
- Chapter-1: The basics
- Chapter-2: Measure of a map (current)
Stay tuned for future chapters.
Linear algebra is the tool of many dimensions. No matter what you may be doing, as soon as you scale to n dimensions, linear algebra comes into the picture.
In the previous chapter, we described abstract linear maps. In this one, we roll up our sleeves and start to deal with matrices. Practical considerations like numerical stability, efficient algorithms, etc. will now start to be explored.
Note: all images in this article, unless otherwise stated, are by the author.
I) How to quantify a linear map
Determinants are among the most ancient concepts in linear algebra. The roots of the subject lie in solving systems of linear equations, and determinants would "determine" whether there even was a solution worth seeking. But in most of the cases where the system does have a solution, they provide additional useful information. In the modern framework of linear maps, determinants provide a single-number quantification of linear maps.
We discussed in the previous chapter the concept of vector spaces (basically n-dimensional collections of numbers, and more generally collections of field elements) and linear maps that operate on two of those vector spaces, taking objects in one to the other.
As an example of these kinds of maps, one vector space could be the surface of the planet you're sitting on and the other could be the surface of the desk you might be sitting at. Literal maps of the world are also maps in this sense since they "map" every point on the surface of the Earth to a point on a piece of paper or the surface of a desk, although they aren't linear maps since they don't preserve relative areas (Greenland looks much larger than it is, for example, in some of the projections).
Once we pick a basis for the vector space (a set of n "independent" vectors in the space; there could be infinitely many choices in general), all linear maps on that vector space get unique matrices assigned to them.
For the moment, let's restrict our attention to maps that take vectors from an 𝑛-dimensional space back to the same 𝑛-dimensional space (we'll generalize later). The matrices corresponding to these linear maps are 𝑛×𝑛 (see section III of chapter 1). It would be helpful to "quantify" such a linear map, to express its effect on the vector space ℝⁿ in a single number. The kind of map we're dealing with effectively takes vectors from ℝⁿ and "distorts" them into other vectors in the same space. Both the original vector 𝑣 and the vector 𝑢 that the map converted it into have some lengths (say |𝑣| and |𝑢|). We can think about how much the length of the vector is changed by the map, |𝑢|∕|𝑣|. Maybe that can quantify the impact of the map? How much it "stretches" vectors?
This approach has a fatal flaw. The ratio depends not just on the linear map, but also on the vector 𝑣 it acts on. It is therefore not strictly a property of the linear map itself.
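To see this concretely, here is a small NumPy sketch (the 2×2 matrix is just an arbitrary example map): the stretch factor |𝑢|∕|𝑣| changes depending on which vector we feed in.

```python
import numpy as np

# An arbitrary example map that stretches the x-axis and squashes the y-axis.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

for v in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])):
    u = A @ v
    print(np.linalg.norm(u) / np.linalg.norm(v))  # 2.0, 0.5, ~1.46: not a constant
```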
What if we take two vectors instead, 𝑣₁ and 𝑣₂, which are transformed by the linear map into the vectors 𝑢₁ and 𝑢₂? Just as the measure of the single vector 𝑣 was its length, the measure of two vectors is the area of the parallelogram contained between them.

Just as we considered the amount by which the length of 𝑣 changed, we can now talk in terms of the amount by which the area between 𝑣₁ and 𝑣₂ changes once they pass through the linear map and become 𝑢₁, 𝑢₂. And alas, this again depends not just on the linear map, but also on the vectors chosen.
Next, we can go to three vectors and consider the change in volume of the parallelepiped between them, and run into the same problem of the initial vectors having a say.

But now consider an n-dimensional region in the original vector space. This region will have some "n-dimensional measure". To understand this, a two dimensional measure is an area (measured in square kilometers, say). A three dimensional measure is the volume used for measuring water (in liters). A four dimensional measure has no counterpart in the physical world we're used to, but is just as mathematically sound: a measure of the amount of four dimensional space enclosed within a parallelepiped formed by four 4-d vectors, and so on.

The 𝑛 original vectors (𝑣₁, 𝑣₂, …, 𝑣ₙ) form a parallelepiped which is transformed by the linear map into 𝑛 new vectors, 𝑢₁, 𝑢₂, …, 𝑢ₙ, which form their own parallelepiped. We can then ask about the 𝑛-dimensional measure of the new region in relation to the original one. And this ratio, it turns out, is indeed a function only of the linear map. Regardless of what the original region looked like, where it was positioned and so on, the ratio of its measure after the linear map acted on it to its measure before will be the same: a function purely of the linear map. This ratio of 𝑛-dimensional measures (after to before) then is what we've been looking for: a property of the linear map alone that quantifies its effect in a single number.
This ratio by which the measure of any 𝑛-dimensional patch of space is changed by the linear map is a good way to quantify the effect it has on the space it acts on. It is called the determinant of the linear map (the reason for that name will become apparent in section V).
For now, we simply stated the fact that the amount by which a linear map from ℝⁿ to ℝⁿ "stretches" any patch of 𝑛-dimensional space depends only on the map, without offering a proof, since the goal here was motivation. We'll cover a proof later (section VI), once we arm ourselves with some weapons.
II) Calculating determinants
Now, how do we find this determinant given a linear map from the vector space ℝⁿ back to ℝⁿ? We can take any 𝑛 vectors, find the measure of the parallelepiped between them and the measure of the new parallelepiped once the linear map has acted on all of them. Finally, divide the latter by the former.
We need to make these steps more concrete. First, let's start playing around in this ℝⁿ vector space.
The ℝⁿ vector space is just a collection of 𝑛 real numbers. The simplest vector is just 𝑛 zeros, [0, 0, …, 0]. This is the zero vector. If we multiply a scalar with it, we just get the zero vector back. Not interesting. For the next simplest vector, we can replace the first 0 with a 1. This gives the vector 𝑒₁ = [1, 0, 0, …, 0]. Now, multiplying by a scalar 𝑐 gives us a different vector.
$$c \cdot [1, 0, 0, \dots, 0] = [c, 0, 0, \dots, 0]$$
We can "span" an infinite number of vectors with 𝑒₁ depending on the scalar 𝑐 we choose.
If 𝑒₁ is the vector with just the first element being 1 and the rest being 0, then what is 𝑒₂? The second element being 1 and the rest being 0 seems like a logical choice.
$$e_2 = [0, 1, 0, 0, \dots, 0]$$
Taking this to its logical conclusion, we get a set of n vectors:
$$e_1 = [1, 0, \dots, 0], \quad e_2 = [0, 1, \dots, 0], \quad \dots, \quad e_n = [0, 0, \dots, 1]$$
These vectors form a basis of the vector space that is ℝⁿ. What does this mean? Any vector 𝑣 in ℝⁿ can be expressed as a linear combination of these 𝑛 vectors. Which means that for some scalars 𝑐₁, 𝑐₂, …, 𝑐ₙ:
$$v = c_1 \cdot e_1 + c_2 \cdot e_2 + \dots + c_n \cdot e_n$$
All vectors 𝑣 are "spanned" by the set of vectors 𝑒₁, 𝑒₂, …, 𝑒ₙ.
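In the standard basis, the coefficients 𝑐ᵢ are simply the components of the vector itself. A tiny NumPy sketch (with an arbitrary vector in ℝ³) makes the point:

```python
import numpy as np

e1, e2, e3 = np.eye(3)          # the standard basis vectors of R^3
v = np.array([3.0, -1.0, 2.0])  # an arbitrary vector

# v is the linear combination of the basis vectors,
# with its own components as the coefficients c_1, c_2, c_3.
print(np.allclose(v, 3.0 * e1 + (-1.0) * e2 + 2.0 * e3))  # True
```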
This particular collection of vectors isn't the only basis. Any set of 𝑛 vectors works. The only caveat is that none of the 𝑛 vectors should be "spanned" by the rest. In other words, all the 𝑛 vectors should be linearly independent. If we choose 𝑛 random numbers from most continuous distributions and repeat the process 𝑛 times to create the 𝑛 vectors, we'll get a set of linearly independent vectors with 100% probability ("almost surely" in probability terms). It's just very, very unlikely that a random vector happens to be "spanned" by some other 𝑘 < 𝑛 random vectors.
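We can spot-check this claim numerically (a minimal sketch, assuming NumPy): draw 𝑛 random vectors from a standard normal distribution and verify that the matrix they form has full rank, i.e. that none of them is spanned by the others.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# n random vectors, each with n components drawn from a continuous
# (here standard normal) distribution, stacked as the rows of a matrix.
vectors = rng.standard_normal((n, n))

# Full rank (rank == n) means the rows are linearly independent.
print(np.linalg.matrix_rank(vectors))  # prints 5, almost surely
```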
Going back to our recipe at the start of this section to find the determinant of a linear map, we now have a basis to express our vectors in. Fixing the basis also means our linear map can be expressed as a matrix (see section III of chapter 1). Since this linear map is taking vectors from ℝⁿ back to ℝⁿ, the corresponding matrix is 𝑛 × 𝑛.
Next, we needed 𝑛 vectors to form our parallelepiped. Why not take the 𝑒₁, 𝑒₂, …, 𝑒ₙ standard basis we defined before? The measure of the patch of space contained between these vectors happens to be 1, by definition. The picture below for ℝ³ will hopefully make this clear.

If we collect these vectors from the standard basis into a matrix (as rows or columns), we get the identity matrix (1's on the main diagonal, 0's everywhere else):
$$I = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}$$
Since we said we could apply our linear transform to any n-dimensional patch of space, we might as well apply it to this "standard" patch.
And it's easy to show that multiplying any matrix with the identity matrix results in the same matrix. So, the resulting vectors after the linear map is applied are the columns of the matrix representing the linear map itself. So, the amount by which the linear map changed the volume of the "standard patch" is the same as the n-dimensional measure of the parallelepiped between the column vectors of the matrix representing the map itself.
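A quick NumPy illustration of that last point (the 3×3 matrix is an arbitrary stand-in for the linear map): applying the map to the standard basis vectors just returns the columns of its matrix.

```python
import numpy as np

# An arbitrary matrix standing in for a linear map from R^3 to R^3.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])

I = np.eye(3)  # its columns are the standard basis e_1, e_2, e_3

# Applying the map to the "standard patch" gives back A itself,
# and the image of e_1 is the first column of A.
print(np.allclose(A @ I, A))              # True
print(np.allclose(A @ I[:, 0], A[:, 0]))  # True
```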
To recap, we started by motivating the determinant as the ratio by which a linear map changes the measure of an n-dimensional patch of space. And now, we showed that this ratio is itself an n-dimensional measure: namely, the measure contained between the column vectors of any matrix representing the linear map.
III) Motivating the basic properties
We described in the previous section how the determinant of a linear map should simply be the measure contained between the vectors of any of its matrix representations. In this section, we use two dimensional space (where measures are areas) to motivate some fundamental properties a determinant must have.
The first property is multi-linearity. A determinant is a function that takes a bunch of vectors (collected in a matrix) and maps them to a single scalar. Since we're restricting ourselves to two-dimensional space, we'll consider two vectors, both two dimensional. Our determinant (since we've motivated it to be the area of the parallelogram between the vectors) can be expressed as:
$$\det = A(v_1, v_2)$$
How should this function behave if we add a vector to one of the two vectors? The multi-linearity property requires:
$$A(v_1+v_3, v_2) = A(v_1,v_2)+A(v_3,v_2) \tag{1}$$
This is apparent from the moving image below (note the new area getting added).

And this visualization can also be used to see (by scaling one of the vectors instead of adding another vector to it):
$$A(c \cdot v_1, v_2) = c \cdot A(v_1, v_2) \tag{2}$$
This second property has an important implication. What if we plug a negative 𝑐 into the equation?
The area 𝐴(𝑐·𝑣₁, 𝑣₂) should then have the opposite sign to 𝐴(𝑣₁, 𝑣₂).
Which means we need to introduce the notion of negative area and a negative determinant.
This makes a lot of sense if we're okay with the concept of negative lengths. If lengths (measures in 1-D space) can be positive or negative, then it stands to reason that areas (measures in 2-D space) should also be allowed to be negative. And so should measures in space of any dimensionality.
Together, equations (1) and (2) are the multi-linearity property.
Another important property that has to do with the sign of the determinant is the alternating property. It requires:
$$A(v_1, v_2) = -A(v_2, v_1)$$
Swapping the order of two vectors negates the sign of the determinant (or the measure between them). If you learned about the cross product of 3-D vectors, this property will feel very natural. To motivate it, let's think first of the one-dimensional distance between two position vectors, 𝑑(𝑣₁, 𝑣₂). It's clear that 𝑑(𝑣₁, 𝑣₂) = −𝑑(𝑣₂, 𝑣₁), since when we go from 𝑣₂ to 𝑣₁ we're traveling in the opposite direction to when we go from 𝑣₁ to 𝑣₂. Similarly, if the area spanned between vectors 𝑣₁ and 𝑣₂ is positive, then that between 𝑣₂ and 𝑣₁ must be negative. This property holds in 𝑛-dimensional space as well. If in 𝐴(𝑣₁, 𝑣₂, …, 𝑣ₙ) we swap two of the vectors, it causes the sign to flip.
The alternating property also implies that if one of the vectors is simply a scalar multiple of another, the determinant must be 0. This is because swapping the two (equal) vectors should negate the determinant:
$$\begin{align}
A(v_1, v_1) &= -A(v_1, v_1) \\
\Rightarrow 2 A(v_1, v_1) &= 0 \\
\Rightarrow A(v_1, v_1) &= 0
\end{align}$$
We also have, by multi-linearity (equation 2):
$$A(v_1, c \cdot v_1) = c \cdot A(v_1, v_1) = 0$$
This makes sense geometrically, since if two vectors are parallel to each other, the area between them is 0.
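All of these properties are easy to check numerically in 2-D, where the signed area between 𝑣₁ = (x₁, y₁) and 𝑣₂ = (x₂, y₂) is x₁y₂ − x₂y₁. A small sketch (the helper name signed_area is just for illustration):

```python
import numpy as np

def signed_area(v1, v2):
    """Signed area of the parallelogram between two 2-D vectors."""
    return v1[0] * v2[1] - v1[1] * v2[0]

v1 = np.array([2.0, 1.0])
v2 = np.array([0.5, 3.0])
v3 = np.array([-1.0, 2.0])
c = -2.5

# Multi-linearity: equations (1) and (2).
print(np.isclose(signed_area(v1 + v3, v2), signed_area(v1, v2) + signed_area(v3, v2)))  # True
print(np.isclose(signed_area(c * v1, v2), c * signed_area(v1, v2)))                     # True

# Alternating property, and its consequence for parallel vectors.
print(np.isclose(signed_area(v1, v2), -signed_area(v2, v1)))  # True
print(np.isclose(signed_area(v1, c * v1), 0.0))               # True: zero area
```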
The video [6] covers the geometric motivation of these properties with really good visualizations, and video [4] visualizes the alternating property quite well.
IV) Getting algebraic: Deriving the Leibniz formula
In this section, we move away from geometric intuition and approach the topic of determinants from an alternate route: that of cold, algebraic calculation.
See, the multi-linearity and alternating properties which we motivated in the last section with geometry are (remarkably) enough to give us a very specific algebraic formula for the determinant, called the Leibniz formula.
That formula helps us see properties of the determinant that would be really, really hard to observe from the geometric approach or from its other algebraic forms.
The Leibniz formula can then be reduced to the Laplace expansion, involving going along a row or column and calculating cofactors, which many people see in high school.
Let's derive the Leibniz formula. We want a function that takes the 𝑛 column vectors 𝑎₁, 𝑎₂, …, 𝑎ₙ of the matrix as input and converts them into a scalar, 𝑐.
$$c = f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n})$$
We can express each column vector in terms of the standard basis of the space.
$$\vec{a_i} = a_{i1} \cdot e_1 + a_{i2} \cdot e_2 + \dots + a_{in} \cdot e_n = \sum_{j=1}^{n} a_{ij} \cdot e_j$$
(here 𝑎ᵢⱼ denotes the 𝑗-th component of the 𝑖-th column)
Now, we can apply the property of multi-linearity. For now, to the first column, 𝑎₁.
$$f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}) = \sum_{j_1=1}^{n} a_{1 j_1} \cdot f(e_{j_1}, \vec{a_2}, \dots, \vec{a_n})$$
We can do the same for the second column. Let's take just the first term from the summation above and look at the resulting terms.
$$a_{11} \cdot f(e_1, \vec{a_2}, \dots, \vec{a_n}) = a_{11} a_{21} \cdot f(e_1, e_1, \vec{a_3}, \dots, \vec{a_n}) + a_{11} a_{22} \cdot f(e_1, e_2, \vec{a_3}, \dots, \vec{a_n}) + \dots$$
Notice that in the first term, we get the vector 𝑒₁ appearing twice. And by the alternating property, the function 𝑓 for that term becomes 0.
For two 𝑒₁'s to appear, the second indices of the two 𝑎's in the product must each be 1.
So, once we do this for all the columns, the terms that won't become zero by the alternating property will be the ones where the second indices of the 𝑎's have no repetition: all distinct numbers from 1 to 𝑛. In other words, we're looking for permutations of 1 to 𝑛 to appear in the second indices of the 𝑎's.
What about the first indices of the 𝑎's? These are simply the numbers 1 to 𝑛 in order, since we pull out the 𝑎₁ₓ's first, then the 𝑎₂ₓ's, and so on. In more compact algebraic notation,
$$f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}) = \sum_{j_1=1}^{n} \sum_{j_2=1}^{n} \dots \sum_{j_n=1}^{n} a_{1 j_1} a_{2 j_2} \dots a_{n j_n} \cdot f(e_{j_1}, e_{j_2}, \dots, e_{j_n})$$
In the expression on the right, the areas 𝑓(𝑒_{𝑗₁}, 𝑒_{𝑗₂}, …, 𝑒_{𝑗ₙ}) can only be +1, −1, or 0, since the 𝑒ⱼ's are all unit vectors orthogonal to each other. We already established that any term with a repeated 𝑒ⱼ will become 0, leaving us with just permutations (no repetition). Among those permutations, we will sometimes get +1 and sometimes −1.
The concept of permutations carries with it signs. The signs of the areas are equal to the signs of the permutations. If we denote by 𝑆ₙ the set of all permutations of [1, 2, …, 𝑛], then we get the Leibniz formula for the determinant:
$$\det([\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}]) = |A| = \sum\limits_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod\limits_{i=1}^n a_{i,\sigma(i)} \tag{3}$$
This formula is also described in detail in the mathexchange post [3]. And to make things concrete, here is some simple Python code that implements it (along with a test case).
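A minimal sketch of such an implementation, assuming the matrix is given as a list of rows and using itertools.permutations to enumerate 𝑆ₙ (the helper names are just for illustration):

```python
import math
from itertools import permutations

def perm_sign(sigma):
    """Sign of a permutation: +1 for an even number of inversions, -1 for an odd number."""
    inversions = sum(
        1
        for i in range(len(sigma))
        for j in range(i + 1, len(sigma))
        if sigma[i] > sigma[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(matrix):
    """Determinant via the Leibniz formula, equation (3). O(n! * n): exposition only."""
    n = len(matrix)
    return sum(
        perm_sign(sigma) * math.prod(matrix[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

# Test case: a 3x3 matrix whose determinant is easy to check by cofactor expansion.
A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(leibniz_det(A))  # 18
```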
One shouldn't actually use this formula to calculate the determinant of a matrix (unless it's just for fun or exposition). It works, but is comically inefficient given the sum over all permutations (of which there are 𝑛!, which grows super-exponentially).
However, many theoretical properties of the determinant become trivial to see with the Leibniz formula when they would be very hard to decipher or prove if we started from another of its forms. For example:
- Proposition-1: With this formula it becomes apparent that a matrix and its transpose have the same determinant: |𝐴| = |𝐴ᵀ|. It's a simple consequence of the symmetry of the formula.
- Proposition-2: A very similar derivation to the above can be used to show that for two matrices 𝐴 and 𝐵, |𝐴𝐵| = |𝐴| ⋅ |𝐵|. See this answer in the mathexchange post, [7]. This is a very handy property since matrix multiplication comes up all the time in various decompositions of matrices, and reasoning about the determinants of those decompositions can be a powerful tool. (Both this proposition and the previous one are spot-checked numerically right after this list.)
- Proposition-3: With the Leibniz formula, we can easily see that if the matrix is upper triangular or lower triangular (lower triangular means every element of the matrix above the diagonal is zero), the determinant is simply the product of the entries on the diagonal. This is because all permutations except one, the main diagonal term (𝑎₁₁ ⋅ 𝑎₂₂ ⋯ 𝑎ₙₙ), pick up some zero factor or other that makes their term in the summation 0.
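The first two propositions are easy to spot-check numerically (a small sketch with random matrices, using np.linalg.det as the reference):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Proposition-1: a matrix and its transpose have the same determinant.
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))  # True

# Proposition-2: the determinant of a product is the product of the determinants.
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
```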

The third fact actually leads to the most efficient algorithm for calculating a determinant, which most linear algebra libraries use. A matrix can be decomposed efficiently into lower and upper triangular factors (the LU decomposition, which we'll cover in the next chapter). After doing this decomposition, the third fact is used to multiply the diagonals of those lower and upper matrices to get their determinants. And finally, the second fact is used to multiply those two determinants and get the determinant of the original matrix.
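Here is a minimal sketch of that pipeline, assuming SciPy's scipy.linalg.lu is available for the decomposition (real libraries fold these steps together and track the permutation sign as they factorize):

```python
import numpy as np
from scipy.linalg import lu

def det_via_lu(A):
    """det(A) from A = P @ L @ U: triangular determinants are products of diagonals
    (proposition 3) and determinants multiply across factors (proposition 2)."""
    P, L, U = lu(A)
    det_L = np.prod(np.diag(L))  # L is unit lower triangular, so this is 1.0
    det_U = np.prod(np.diag(U))
    # P is a permutation matrix; its determinant is the sign of the permutation.
    perm = np.argmax(P, axis=0)
    inversions = sum(
        1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j]
    )
    det_P = -1.0 if inversions % 2 else 1.0
    return det_P * det_L * det_U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
print(det_via_lu(A), np.linalg.det(A))  # the two numbers agree (-16.0)
```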
A lot of people in high school or college, when first exposed to the determinant, learn about the Laplace expansion, which involves expanding along a row or column, finding co-factors for each element and summing. That can be derived from the Leibniz expansion above by collecting similar terms. See this answer to the mathexchange post, [2].
V) Historical motivation
The determinant was first discovered in the context of linear systems of equations. Say we have 𝑛 equations in 𝑛 variables (𝑥₁, 𝑥₂, …, 𝑥ₙ).
$$\begin{align}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n &= b_n
\end{align}$$
This system can be expressed in matrix form:
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$
And more compactly:
$$A \cdot x = b$$
An important question is whether or not the system above has a unique solution, 𝑥. And the determinant is a function that "determines" this. There is a unique solution if and only if the determinant of 𝐴 is non-zero.
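A quick NumPy illustration (the 2×2 systems are arbitrary examples): a non-zero determinant means np.linalg.solve finds the one solution, while a zero determinant means there is no unique solution.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

print(np.linalg.det(A))       # ~5.0: non-zero, so a unique solution exists
print(np.linalg.solve(A, b))  # [0.8 1.4], the unique x with A @ x = b

# Here the second row is a multiple of the first: the determinant is 0
# and np.linalg.solve would raise a LinAlgError.
A_singular = np.array([[2.0, 1.0],
                       [4.0, 2.0]])
print(np.linalg.det(A_singular))  # 0.0
```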
This historically-inspired approach motivates the determinant as a polynomial that arises when we try to solve a linear system of equations associated with the linear map. We will cover this in more depth in chapter 5.
For more on this, see the excellent answer in the mathexchange post, [8].
VI) Proof of the property we motivated with
We started this chapter by motivating the determinant as the amount by which an ℝⁿ → ℝⁿ linear map changes the measure of an n-dimensional patch of space. We also mentioned that this doesn't work for 1, 2, …, n − 1 dimensional measures. Below is a proof of this, where we use some of the properties we encountered in the rest of the sections.
Define 𝑉 and 𝑈 as 𝑛 × 𝑘 matrices, where
$$V = (v_1, v_2, \dots, v_k), \quad U = (u_1, u_2, \dots, u_k) = A V$$
By definition,
$$|v_1, v_2, \dots, v_k| = \sqrt{\det(V^t V)}$$ and
$$|u_1, u_2, \dots, u_k| = \sqrt{\det(U^t U)} = \sqrt{\det((AV)^t (AV))} = \sqrt{\det(V^t A^t A V)}$$
Only when 𝑛 = 𝑘 is 𝑉 a square matrix, in which case the determinant of the product splits into a product of determinants (proposition 2), so
$$|u_1, u_2, \dots, u_k| = \sqrt{\det(V^t A^t A V)}$$
$$= \sqrt{\det(V^t) \det(A^t) \det(A) \det(V)}$$
$$= \det(A) \sqrt{\det(V^t V)} = \det(A) \cdot |v_1, v_2, \dots, v_k|$$
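Numerically, we can see both halves of this claim (a sketch, with measure standing for √det(VᵀV) and a random matrix playing the role of the map 𝐴): for 𝑘 = 𝑛 the ratio of measures is always |det(𝐴)|, while for 𝑘 < 𝑛 the ratio depends on which vectors we pick.

```python
import numpy as np

def measure(V):
    """k-dimensional measure of the parallelepiped spanned by the columns of an n x k matrix."""
    return np.sqrt(np.linalg.det(V.T @ V))

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))  # an arbitrary linear map on R^3

# k = n: the ratio of measures is |det(A)|, whatever patch we start with.
for _ in range(3):
    V = rng.standard_normal((n, n))
    print(measure(A @ V) / measure(V), abs(np.linalg.det(A)))

# k < n: the ratio now depends on the chosen vectors.
for _ in range(3):
    V = rng.standard_normal((n, 2))
    print(measure(A @ V) / measure(V))
```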
References
[1] Mathexchange post: Determinant of a linear map doesn't depend on the basis: https://math.stackexchange.com/questions/962382/determinant-of-linear-transformation
[2] Mathexchange post: Determinant of a matrix via Laplace expansion (the high school formula): https://math.stackexchange.com/a/4225580/155881
[3] Mathexchange post: Understanding the Leibniz formula for determinants: https://math.stackexchange.com/questions/319321/understanding-the-leibniz-formula-for-determinants#:~:text=The%20formula%20says%20that%20det,permutation%20get%20a%20minus%20sign.&text=where%20the%20minus%20signs%20correspond%20to%20the%20odd%20permutations%20from%20above.
[4] YouTube video: 3B1B on determinants: https://www.youtube.com/watch?v=Ip3X9LOh2dk&t=295s
[5] Connecting the Leibniz formula with geometry: https://math.stackexchange.com/questions/593222/leibniz-formula-and-determinants
[6] YouTube video: the Leibniz formula as area: https://www.youtube.com/watch?v=9IswLDsEWFk
[7] Mathexchange post: the product of determinants is the determinant of the product: https://math.stackexchange.com/questions/60284/how-to-show-that-detab-deta-detb
[8] Historical context for motivating the determinant: https://math.stackexchange.com/a/4782557/155881