
Intro to Vectors and Matrices in Machine Learning

Programming is a great way to get insights about math concepts. You’ll see here tips and tricks to learn math, more specifically linear algebra, from a coding perspective. You’ll see the relationship between Numpy functions and linear algebra abstract concepts. At the end of this mini-tutorial, you’ll know what vectors and matrices are and why they are the core of machine learning and data science.

If you’re a bit into data science and machine learning, you probably hear the word vector all the time. Let’s clarify what vectors are.

You can distinguish geometric vectors, which are arrows pointing in space, from coordinate vectors, which are lists of values stored in arrays. The relationship between the two is that you can take the coordinates of the endpoint of the arrow to get values that depend on the coordinate system.

Mathematically, you can refer to vectors with lowercase bold italic letters, as $\boldsymbol{v}$ for instance. Let’s take the following vector $\boldsymbol{v}$:

$$\boldsymbol{v} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

With Numpy, vectors are coordinate vectors: one-dimensional arrays of numerical values. You can create vectors with the function `np.array()`:
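A minimal sketch (the variable name `v` and the component values follow the vector $\boldsymbol{v}$ above):

```python
import numpy as np

# Create a one-dimensional array (a coordinate vector) with two values
v = np.array([1, -1])
v
# array([ 1, -1])
```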

The variable `v` contains a Numpy one-dimensional array, that is, a vector, containing two values. From a geometric point of view, you can consider each of these values as coordinates. Since there are only two values, you can represent the vector in a Cartesian plane.

Let’s use Matplotlib to represent this geometric vector. You can use the function `plt.quiver()` to draw arrows. The first four parameters are, respectively: the starting point on the $x$-axis, the starting point on the $y$-axis, the ending point on the $x$-axis, and the ending point on the $y$-axis.

You can also draw axes with `plt.axhline()` and `plt.axvline()` (the parameter `zorder` allows you to set the axes behind the other elements).
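Here is one possible sketch (the axis limits and the gray color are assumptions, not from the original):

```python
import matplotlib.pyplot as plt

# Draw the axes first, behind the other elements (zorder=0)
plt.axhline(0, c='#A9A9A9', zorder=0)
plt.axvline(0, c='#A9A9A9', zorder=0)

# Arrow from the origin (0, 0) to the endpoint (1, -1); the 'xy'
# settings make the components correspond to data coordinates
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1)

plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.show()
```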

You can see the arrow corresponding to the vector $\boldsymbol{v}$. This vector has two components (the values in the array): $1$, which we represented on the $x$-axis, and $-1$, represented on the $y$-axis.

It is convenient to take a vector with only two components as an example in order to represent it geometrically. However, the concepts you’ll learn apply to vectors with more components as well.

You can also represent vectors as the ending point of the arrow only. For instance, let’s simulate some data:
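A minimal sketch, assuming normally distributed two-dimensional samples (the sample size and the seed are arbitrary choices):

```python
# Simulate 10 two-dimensional data samples
np.random.seed(123)
data = np.random.normal(0, 1, size=(10, 2))
```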

You can represent each data sample as a geometric vector:
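One way to do it is to reuse `plt.quiver()` with one arrow per sample:

```python
# One arrow from the origin for each data sample
n = data.shape[0]
plt.axhline(0, c='#A9A9A9', zorder=0)
plt.axvline(0, c='#A9A9A9', zorder=0)
plt.quiver(np.zeros(n), np.zeros(n), data[:, 0], data[:, 1],
           angles='xy', scale_units='xy', scale=1)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.show()
```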

However, it is easier to represent data samples as points corresponding to the endpoints of the arrows:
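For instance, with a scatter plot:

```python
# Each data sample as a point (the endpoint of its arrow)
plt.scatter(data[:, 0], data[:, 1])
plt.show()
```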

In data science, you can use vectors to store the values corresponding to different features. This allows you to leverage linear algebra tools and concepts on your data.

You saw how to create a vector using the function `np.array()`. Note also that many Numpy functions return arrays. For instance, look at the following chunk of code:
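A sketch of what this chunk could look like (the variable name `u` and the choice of two values are assumptions, made so the result can be plotted below):

```python
# Draw two random values from a normal distribution
# with mean 0 and standard deviation 1
u = np.random.normal(0, 1, 2)
u
```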

The function `np.random.normal()` is used to draw random values from a normal distribution. You can see that it returns a Numpy array containing the random values.

Let’s consider this array as a geometric vector and plot it:
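Using the same plotting recipe as before:

```python
# Plot the random array u as an arrow from the origin
plt.axhline(0, c='#A9A9A9', zorder=0)
plt.axvline(0, c='#A9A9A9', zorder=0)
plt.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1)
plt.xlim(-3, 3)
plt.ylim(-3, 3)
plt.show()
```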

Let’s now create a vector with more components to illustrate the basics of indexing in Numpy:
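A sketch using `np.arange()` so that each value equals its index, which makes the indexing examples easy to follow (the exact values are an assumption):

```python
# A vector with ten components
b = np.arange(10)
b
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```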

You can get only part of the vector using indexing. You can use single values or lists of values as indexes. For instance:
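```python
b[0]        # a single index: 0
b[[0, 2]]   # a list of indexes: array([0, 2])
```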

You can also use a colon to get elements from one index to another: `start:end`. For example:
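```python
# Elements from index 2 up to (but not including) index 5
b[2:5]
# array([2, 3, 4])
```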

If you omit `start` or `end`, Numpy uses, respectively, the first element or the last element. For instance, `b[:3]` will return the first three elements.
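```python
b[:3]    # the first three elements: array([0, 1, 2])
b[5:]    # from index 5 to the end: array([5, 6, 7, 8, 9])
```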

You can also index from the end using a negative sign. For instance, `b[-1]` corresponds to the last value, `b[-2]` to the one before it, and so on:
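```python
b[-1]    # the last value: 9
b[-2]    # the one before: 8
```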

You can also look at the shape of an array with the attribute `shape`:
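```python
b.shape
# (10,)
```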

You can see that there are 10 components in the vector $\boldsymbol{b}$. Looking at the shape of a vector tells you how many components it contains.

Say that you have multiple vectors corresponding to different observations from your dataset. You have one vector per observation, with a length corresponding to the number of features. Similarly, you can have one vector per feature, containing the value of that feature for each observation (you’ll see that transposition allows you to go from one view to the other).

Matrices are two-dimensional arrays: they have rows and columns. You can denote a matrix with an uppercase bold italic letter, as $\boldsymbol{A}$. For instance, you can have:

$$\boldsymbol{A} = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \\ a_{3,1} & a_{3,2} \end{bmatrix}$$

The matrix $\boldsymbol{A}$ contains three rows and two columns. You can think of it as two column vectors or as three row vectors.

Let’s take an example and create a matrix containing random values:
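For instance (assuming a standard normal distribution for the values):

```python
# A matrix with 5 rows and 3 columns of random values
C = np.random.normal(0, 1, size=(5, 3))
```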

The matrix $\boldsymbol{C}$ has 5 rows and 3 columns. You can look at its shape, again using the `shape` attribute:
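```python
C.shape
# (5, 3)
```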

Unlike with vectors, the shape of matrices is described by two numbers (instead of one): the first tells you the number of rows and the second the number of columns.

Like with vectors, you can get subsets of matrices using indexing. Since there are rows and columns, you need to use two indexes. For instance, remembering that Python uses zero-based indexing, to get the element in the second row and the third column of the preceding matrix $\boldsymbol{C}$, you do:
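```python
# Row at index 1 (second row), column at index 2 (third column)
C[1, 2]
```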

If you want to get column 0, you need to take all rows (using `:`) for this column:
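```python
# All rows (:) of the column at index 0
C[:, 0]
```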

If you want the last row, you can do the same (all columns using `:`) and use the index `-1`:
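```python
# All columns (:) of the row at index -1 (the last row)
C[-1, :]
```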

The norm of a vector, denoted with double vertical bars as in $\lVert \boldsymbol{v} \rVert$, is a value (a scalar) associated with the vector that satisfies the following rules:

- It is non-negative, and it is zero only for the zero vector.
- Multiplying the vector by a scalar $\alpha$ multiplies the norm by $|\alpha|$: $\lVert \alpha \boldsymbol{v} \rVert = |\alpha| \, \lVert \boldsymbol{v} \rVert$.
- It satisfies the triangle inequality: $\lVert \boldsymbol{u} + \boldsymbol{v} \rVert \leq \lVert \boldsymbol{u} \rVert + \lVert \boldsymbol{v} \rVert$.

The physical concept of length satisfies these rules, so the length of a vector is a kind of norm. It also means that there can be multiple kinds of norms.

Vector norms are used in machine learning, for instance in cost functions: the difference between the estimated value and the true value for each data sample is stored in a vector, and the norm of this vector tells you how good the estimation is. Another example is regularization, where you add the norm of a vector containing the parameters of your model to the cost function. This norm tells you how large the parameters are, allowing the algorithm to avoid values that are too large (and thus limit overfitting).
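As an illustration of the cost-function use case (the values here are made up; `np.linalg.norm()` computes the $L^2$ norm by default):

```python
# Hypothetical true and estimated values for three data samples
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])

# The norm of the error vector summarizes how good the estimation is
np.linalg.norm(y_true - y_pred)
```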

You’ll find more details about the mathematical definitions of norms ($L^1$ and $L^2$) and their use in machine learning in my upcoming session at ODSC, “Introduction to Linear Algebra for Data Science and Machine Learning With Python.” You’ll also learn to consider matrices as linear transformations and linear combinations, and how to understand least squares approximation using the matrix form of systems of linear equations. I hope you enjoyed this article on vectors and matrices in machine learning.
