of the in-progress e-book on linear algebra, “A birds eye view of linear algebra”. This e-book will put a special emphasis on AI applications and how they leverage linear algebra.
Linear algebra is a fundamental discipline underlying anything one can do with math. From physics to machine learning to probability theory (ex: Markov chains), you name it. No matter what you’re doing, linear algebra is always lurking under the covers, ready to spring at you as soon as things go multi-dimensional. In my experience (and I’ve heard this from others), this was the source of a big shock between high school and university. In high school (India), I was exposed to some very basic linear algebra (mainly determinants and matrix multiplication). Then in university level engineering education, every subject suddenly seems to assume proficiency in concepts like eigenvalues, Jacobians, etc., as if you were supposed to be born with the knowledge.
This chapter is meant to provide a high level overview of the concepts in this discipline that are important to know, along with their most obvious applications.
The AI revolution
Almost any information can be embedded in a vector space: images, video, language, speech, biometric information and whatever else you can imagine. And all the applications of machine learning and artificial intelligence (like the recent chat-bots, text to image, etc.) work on top of these vector embeddings. Since linear algebra is the science of dealing with high dimensional vector spaces, it’s an indispensable building block.
A lot of the techniques involve taking some input vectors from one space and mapping them to other vectors from some other space.
But why the focus on “linear” when most interesting functions are non-linear? It’s because the problem of making our models high dimensional and that of making them non-linear (general enough to capture all kinds of complex relationships) turn out to be orthogonal to each other. Many neural network architectures work by using linear layers with simple one dimensional non-linearities in between them. And there is a theorem (the universal approximation theorem) that says this kind of architecture can, under mild conditions, approximate any continuous function as closely as we like.
Since the way we manipulate high dimensional vectors is primarily matrix multiplication, it isn’t a stretch to say it’s the bedrock of the modern AI revolution.
I) Vector spaces
As mentioned in the previous section, linear algebra inevitably crops up when things go multi-dimensional. We start off with a scalar, which is just a number of some kind. For this article, we’ll be considering real and complex numbers for these scalars. In general, a scalar can be any object where the basic operations of addition, subtraction, multiplication and division are defined (abstracted as a “field”). Now, we want a framework to describe collections of such numbers (add dimensions). These collections are called “vector spaces”. We’ll be considering the cases where the elements of the vector space are either real or complex numbers (the former being a special case of the latter). The resulting vector spaces are called “real vector spaces” and “complex vector spaces” respectively.
The ideas in linear algebra are applicable to these “vector spaces”. The most common example is your floor, table or the computer screen you’re reading this on. These are all two-dimensional vector spaces since every point on your table can be specified by two numbers (the x and y coordinates as shown below). This space is denoted by R² since two real numbers specify it.
We can generalize R² in different ways. First, we can add dimensions. The space we live in is three dimensional (R³). Or, we can curve it. The surface of a sphere like the Earth for example (denoted S²) is still two dimensional, but unlike R² (which is flat), it’s curved. So far, these spaces have all basically been arrays of numbers. But the idea of a vector space is more general. It’s a collection of objects where the following ideas should be well defined:
- Addition of any two of the objects.
- Multiplication of the objects by a scalar (a real number).
Not only that, but the objects should be “closed” under these operations. This means that if you apply these two operations to the objects of the vector space, you should get objects of the same type (you shouldn’t leave the vector space). For example, the set of integers isn’t a vector space because multiplication by a scalar (real number) can give us something that isn’t an integer (3*2.5 = 7.5 which isn’t an integer).
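The closure argument for the integers can be checked in a couple of lines. This is only an illustrative sketch in plain Python; the helper name `is_integer` and the sample values are my own choices and not part of any library.

```python
def is_integer(x):
    """Return True when x has no fractional part (hypothetical helper)."""
    return float(x).is_integer()

a, b = 3, 4

# Adding two integers keeps us inside the set of integers.
total = a + b
print(total, is_integer(total))    # 7 True

# Multiplying by a real scalar can take us outside the set.
scaled = 2.5 * a
print(scaled, is_integer(scaled))  # 7.5 False
```

Addition is closed here but scalar multiplication isn’t, and one failing operation is enough to disqualify the integers as a real vector space.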
One of the ways to express the objects of a vector space is with vectors. Vectors require an arbitrary “basis”. An example of a basis is the compass system with its directions: North, South, East and West. Any direction (like “SouthWest”) can be expressed in terms of these. These are “direction vectors” but we can also have “position vectors” where we need an origin and a coordinate system intersecting at that origin. The latitude and longitude system for referencing every place on the surface of the Earth is an example. The latitude and longitude pair are one way to identify your house. But there are infinitely many other ways. Another culture might draw the latitude and longitude lines at a slightly different angle to the standard one. And so, they’ll come up with different numbers for your house. But that doesn’t change the physical location of the house itself. The house exists as an object in the vector space and these different ways to express that location are called “bases”. Choosing one basis allows you to assign a pair of numbers to the house and choosing another one allows you to assign a different set of numbers that is equally valid.
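To make the “same house, different numbers” idea concrete, here is a small sketch in plain Python. The point `p`, the basis vectors `b1` and `b2`, and the helper `coordinates_in_basis` are all made up for illustration. The helper solves the 2x2 system c1.b1 + c2.b2 = p with Cramer’s rule, which is fine for a toy example but not what you’d use at scale.

```python
def coordinates_in_basis(p, b1, b2):
    """Coordinates (c1, c2) such that c1*b1 + c2*b2 == p, in 2-d.

    Hypothetical helper using Cramer's rule.
    """
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are parallel, so they are not a basis")
    c1 = (p[0] * b2[1] - b2[0] * p[1]) / det
    c2 = (b1[0] * p[1] - p[0] * b1[1]) / det
    return (c1, c2)

# The "house", written in the standard basis (1, 0), (0, 1).
p = (3, 1)

# Another culture's basis: the standard axes rotated by 45 degrees (and stretched).
b1 = (1, 1)
b2 = (-1, 1)

coordinates = coordinates_in_basis(p, b1, b2)
print(coordinates)  # (2.0, -1.0)

# Rebuilding the point from the new coordinates gives back the same location.
c1, c2 = coordinates
rebuilt = (c1 * b1[0] + c2 * b2[0], c1 * b1[1] + c2 * b2[1])
print(rebuilt)      # (3.0, 1.0)
```

The numbers change from (3, 1) to (2, -1), but the point they describe does not.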

Vector spaces can also be infinite dimensional. For instance, in miniature 12 of [2], the entire set of real numbers is thought of as an infinite dimensional vector space.
II) Linear maps
Now that we know what a vector space is, let’s take it to the next level and talk about two vector spaces. Since vector spaces are simply collections of objects, we can think of a mapping that takes an object from one of the spaces and maps it to an object from the other. An example of this is recent AI programs like Midjourney where you enter a text prompt and they return an image matching it. The text you enter is first converted to a vector. Then, that vector is converted to another vector in the image space via such a “mapping”.
Let V and W be vector spaces (either both real or both complex vector spaces). A function f: V -> W is said to be a “linear map” if for any two vectors u, v ∈ V and any scalar c (a real number or complex number depending on whether we’re working with real or complex vector spaces) the following two conditions are satisfied:
$$f(u+v) = f(u) + f(v) \tag{1}$$
$$f(c.v) = c.f(v) \tag{2}$$
Combining the above two properties, we can get the following result about a linear combination of n vectors.
$$f(c_1.u_1 + c_2.u_2 + \dots + c_n.u_n) = c_1.f(u_1) + c_2.f(u_2) + \dots + c_n.f(u_n)$$
And now we can see where the name “linear map” comes from. If we pass to the linear map, f, a linear combination of n vectors (LHS of the equation above), this is equivalent to applying the same linear combination to the outputs of f on the individual vectors (RHS). We can apply the linear map first and then the linear combination, or the linear combination first and then the linear map. The two are equivalent.
In high school, we learn about linear equations. In two dimensional space, such an equation is represented by f(x)=m.x+c. Here, m and c are the parameters of the equation. Note that this function isn’t a linear map unless c=0. It fails equation (1), since f(u+v) = m.(u+v)+c while f(u)+f(v) = m.(u+v)+2c, and it fails equation (2) as well, since f(k.x) = m.k.x+c while k.f(x) = m.k.x+k.c for a scalar k. If we set f(x)=m.x instead, then it is a linear map since it satisfies both equations.
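Conditions (1) and (2) are easy to test numerically for the two high school functions. The sketch below is plain Python; the helper names `is_additive` and `is_homogeneous`, the parameter values and the test points are arbitrary choices of mine. Passing at a few points doesn’t prove linearity, but a single failure is enough to rule it out.

```python
import math

def is_additive(f, u, v):
    """Check condition (1), f(u+v) == f(u) + f(v), at one pair of points."""
    return math.isclose(f(u + v), f(u) + f(v))

def is_homogeneous(f, k, v):
    """Check condition (2), f(k.v) == k.f(v), at one scalar and one point."""
    return math.isclose(f(k * v), k * f(v))

m, c = 2.0, 1.0  # arbitrary slope and intercept

def through_origin(x):
    return m * x

def with_intercept(x):
    return m * x + c

print(is_additive(through_origin, 1.5, -4.0))     # True
print(is_homogeneous(through_origin, 3.0, 2.0))   # True
print(is_additive(with_intercept, 1.5, -4.0))     # False
print(is_homogeneous(with_intercept, 3.0, 2.0))   # False
```

With a non-zero intercept, both conditions break at these test points, so f(x)=m.x+c is not a linear map.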

III) Matrices
In section I, we introduced the concept of a basis for a vector space. Given a basis for each of the two vector spaces, V and W, every linear map between them can be expressed as a matrix (for details, see [1]). A matrix is just a collection of vectors. These vectors can be arranged in columns, giving us a 2-d grid of numbers as shown below.

Matrices are the objects people first think of in the context of linear algebra. And for good reason. Most of the time spent practicing linear algebra is spent dealing with matrices. But it is important to remember that there are (in general) an infinite number of matrices that can represent a linear map, depending on the bases we choose. The linear map is hence a more general concept than the matrix one happens to be using to represent it.
How do matrices help us perform the linear map they represent (from one vector to the other)? By multiplying the matrix with the first vector. The result is the second vector and the mapping is complete (from first to second).
In detail, we take the dot product (sum product) of the first vector, v_1, with the first row of the matrix and this yields the first entry of the resulting vector, v_2. Then we take the dot product of v_1 with the second row of the matrix to get the second entry of v_2, and so on. This process is demonstrated below for a matrix with 2 rows and 3 columns. The first vector, v_1, is three dimensional and the second vector, v_2, is two dimensional.

Note that the underlying linear map behind a matrix with this dimensionality (2x3) will always take a three dimensional vector, v_1, and map it to a two dimensional vector, v_2.

In general, an (nxm) matrix will map an m dimensional vector to an n dimensional one.
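The row-by-row dot product recipe translates almost directly into code. Here is a minimal sketch in plain Python; the function names `dot` and `matvec` and the example numbers are mine. In practice you would reach for a numerical library, but spelling it out shows where the (n x m) shape rule comes from.

```python
def dot(a, b):
    """Dot product (sum product) of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def matvec(matrix, vector):
    """Multiply an (n x m) matrix, given as a list of rows, with an m-dimensional vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("every row must have as many entries as the vector")
    # One dot product per row gives one entry of the result.
    return [dot(row, vector) for row in matrix]

# A (2 x 3) matrix: 2 rows, 3 columns.
A = [
    [1, 0, 2],
    [0, 3, -1],
]
v1 = [4, 5, 6]       # three dimensional input

v2 = matvec(A, v1)   # two dimensional output
print(v2)            # [16, 9]
```

The input needs one entry per column (m = 3) and the output has one entry per row (n = 2), which is exactly the statement that an (n x m) matrix maps m dimensions to n dimensions.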
III-A) Properties of matrices
Let’s cover some properties of matrices that’ll allow us to identify properties of the linear maps they represent.
Rank
An important property of matrices and their corresponding linear maps is the rank. We can talk about this in terms of a collection of vectors, since that’s all a matrix is. Say we have a vector, v1=[1,0,0]. The first element of the vector is the coordinate along the x-axis, the second is that along the y-axis and the third one that along the z-axis. The unit vectors along these three axes form a basis (there are many) of the three dimensional space, R³, meaning that any vector in this space can be expressed as a linear combination of those three vectors.

We can multiply this vector by a scalar, s. This gives us s.[1,0,0] = [s,0,0]. As we vary the value of s, we can get any point along the x-axis. But that’s about it. Say we add another vector to our collection, v2=[3.5,0,0]. Now, what are the vectors we can make with linear combinations of those two vectors? We get to multiply the first one with any scalar, s_1, and the second with any scalar, s_2. This gives us:
$$s_1.[1,0,0] + s_2.[3.5,0,0] = [s_1+3.5 s_2, 0, 0] = [s',0,0]$$
Here, s' is just another scalar. So, we can still reach points only on the x-axis, even with linear combinations of both these vectors. The second vector didn’t “expand our reach” at all. The set of points we can reach with linear combinations of the two is exactly the same as the set we can reach with the first alone. So even though we have two vectors, the rank of this collection of vectors is 1 since the space they span is one dimensional. If on the other hand, the second vector were v2=[0,1,0] then you could get any point on the x-y plane with these two vectors. So, the space spanned would be two dimensional and the rank of this collection would be 2. If the second vector were v2=[2.1,1.5,0.8], we could still span a two dimensional space with v1 and v2 (though that space would be different from the x-y plane now; it would be some other 2-d plane). And the two vectors would still have a rank of 2. If the rank of a collection of vectors is the same as the number of vectors (meaning they can together span a space of dimensionality as high as the number of vectors), then they are called “linearly independent”.
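The three pairs of vectors discussed above can be checked with a small rank routine. What follows is a sketch, not a production implementation: `rank` is a hypothetical helper that runs Gaussian elimination on the vectors (treated as rows) and counts the pivots. It converts entries to exact fractions from Python’s standard `fractions` module so that rounding doesn’t blur the difference between zero and almost zero.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the space spanned by a list of equal-length vectors.

    Hypothetical helper: Gaussian elimination with exact fractions.
    """
    rows = [[Fraction(str(x)) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        # Clear this column in all the rows below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

v1 = [1, 0, 0]
print(rank([v1, [3.5, 0, 0]]))      # 1  (both on the x-axis)
print(rank([v1, [0, 1, 0]]))        # 2  (spans the x-y plane)
print(rank([v1, [2.1, 1.5, 0.8]]))  # 2  (spans some other plane)
```

Only the last two pairs are linearly independent, since their rank equals the number of vectors.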
If the vectors that make up the matrix can span an m dimensional space, then the rank of the matrix is m. But a matrix can be thought of as a collection of vectors in two ways. Since it’s a simple two dimensional grid of numbers, we can either consider all the columns as the collection of vectors or consider all the rows as the collection, as shown below. Here, we have a (3x4) matrix (three rows and four columns). It can be thought of either as a collection of 4 column vectors (each three dimensional) or 3 row vectors (each four dimensional).

Full row rank means all the row vectors are linearly independent. Full column rank means all the column vectors are linearly independent.
It turns out that the row rank and column rank of a matrix will always be the same (this holds for any matrix, not just square ones). This isn’t obvious at all and a proof is given in the Math StackExchange post, [3]. This means that we can talk just in terms of the rank and don’t have to bother specifying “row rank” or “column rank”.
The linear transformation corresponding to a (3 x 3) matrix that has a rank of 2 will map everything in the 3-d space to a lower, 2-d space much like the (2 x 3) matrix we encountered in the last section.

Notions closely related to the rank of square matrices are the determinant and invertibility.
Determinants
The determinant of a square matrix is its “measure” in a sense. Let me explain by going back to thinking of a matrix as a collection of vectors. Let’s start with just one vector. The way to “measure” it is obvious: its length. And since we’re dealing only with square matrices, the only way to have one vector is to have it be one dimensional, which is basically just a scalar. Things get interesting when we go from one dimension to two. Now, we’re in two dimensional space. So, the notion of “measure” is no longer length, but has graduated to area. And with two vectors in that two dimensional space, it’s the area of the parallelogram they form (up to a sign that records orientation). If the two vectors are parallel to each other (ex: both lie on the x-axis), in other words, if they are not linearly independent, then the area of the parallelogram between them becomes zero. The determinant of the matrix formed by them will be zero and the matrix will not have full rank (its rank will be less than 2).

Taking it one dimension higher, we get three dimensional space. And to construct a square matrix (3x3), we now need three vectors. And since the notion of “measure” in three dimensional space is volume, the determinant of a (3x3) matrix becomes the volume contained between the vectors that make it up (the parallelepiped they span).

And this can be extended to a space of any dimensionality.
Notice that we spoke about the area or the volume contained between the vectors. We didn’t specify if these were the vectors composing the rows of the square matrix or the ones composing its columns. And the somewhat surprising thing is that we don’t need to specify this because it doesn’t matter either way. Whether we take the vectors forming the rows and measure the volume between them or the vectors forming the columns, we get the same answer. This is proven in the Math StackExchange post [4].
There are a number of other properties of linear maps and corresponding matrices which are invaluable in understanding them and extracting value out of them. We’ll be delving into invertibility, eigenvalues, diagonalizability and the different transformations one can do in the coming articles (check back here for links).
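Both claims, that the determinant measures area or volume and that rows versus columns doesn’t matter, can be tried out on small matrices. The sketch below is plain Python with made-up example matrices; `det` is a hypothetical helper using cofactor expansion along the first row, which is fine for 2x2 and 3x3 but far too slow for large matrices.

```python
def det(matrix):
    """Determinant of a square matrix (list of rows) via cofactor expansion."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for col in range(n):
        # Minor: drop the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * det(minor)
    return total

def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]

# Two vectors in the plane: base 3 along the x-axis, height 2.
parallelogram = [[3, 0], [1, 2]]
print(det(parallelogram))   # 6, the area of the parallelogram

# Two parallel vectors (both on the x-axis) enclose no area.
flat = [[1, 0], [3.5, 0]]
print(det(flat) == 0)       # True

# Rows or columns: the enclosed volume is the same.
B = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]
print(det(B), det(transpose(B)))   # 25 25
```

Transposing B swaps the roles of its rows and columns, and the determinant comes out the same.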
References
[1] Linear map: https://en.wikipedia.org/wiki/Linear_map
[2] Matousek’s miniatures: https://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf
[3] Math StackExchange post proving row rank and column rank are the same: https://math.stackexchange.com/questions/332908/looking-for-an-intuitive-explanation-why-the-row-rank-is-equal-to-the-column-ran
[4] Math StackExchange post proving the determinants of a matrix and its transpose are the same: https://math.stackexchange.com/a/636198/155881