Mathematics Behind AI & Machine Learning

Core concepts for ML engineers.

Jan 9 · 8 min read

Photo from movie A Beautiful Mind (2001)

Let’s face reality: for most of us, mathematics is far from enjoyable. To learn it, we often lack time and, most importantly, motivation. Why do we need all these symbols and figures? What’s the point? It turns out there is a lot of point, especially if you have anything to do with machine learning.

The point here is not just to acquire knowledge, but to be able to use it. For example, you might want to understand how a model works by looking inside the black box, instead of picking hyperparameters at random. Approaching this task without mathematical knowledge, you are like a baby with a hammer.

So, to deal with all of this, you have to walk the hard path of a researcher and ask many questions: why? where? what for?

But if you have the courage to endure all the turmoil, in the end you will have a pleasant feeling: the feeling of a winner who has every chance of becoming a skilled machine learning specialist. Intrigued? Here’s an intuitive and beginner-friendly guide to help you do this.

Let’s talk science!

Essential Math for Machine Learning

Mathematics can be daunting and requires digging into the theory. But for machine learning, math is not about crunching numbers; it is about what is happening, why it’s happening, and how we can play around with different things to obtain the results we want. Before going any further into complex terms, I suggest relying on intuition rather than memorizing formulas.

Here is a brief overview of our learning journey:

Statistics

Probability

Multivariate Calculus

Linear Algebra

For each of these components, I have shared the main concepts and useful material for you to study.

Statistics


It is difficult to overstate the importance of statistics for machine learning at any level. All classic machine learning is based on statistical learning, and so are standard A/B tests.

Statistics is a collection of tools that you can use to get answers to important questions about data. We need it to help transform observations into information and to answer questions about samples of observations.

Concepts to know: descriptive statistics, distributions, hypothesis testing, and regression. Apart from these, I suggest paying special attention to Bayesian probability theory: conditional probability, prior probability, posterior probability, and maximum likelihood estimation. That’s all.
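
To make prior and posterior probability a bit more concrete, here is a minimal sketch of Bayes' theorem in Python. The function name `posterior` and the numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are made up for illustration only.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) for a yes/no hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of observing the evidence, P(E)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence


# Hypothetical medical test: rare condition, fairly accurate test
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # about 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small. This is exactly the kind of intuition that Bayesian thinking gives you.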

Books & Courses:

All of Statistics: A Concise Course in Statistical Inference — Larry Wasserman. The book covers the fundamentals of probability theory and statistics. It is divided into three parts: Probability, Statistical Inference, and Statistical Models and Methods. The book has the feel of a reference or an encyclopedia: there are a lot of chapters, but each one is reasonably standalone.

An Introduction to Statistical Learning — Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. This is another excellent book (with a free PDF version); the examples use the R language.

Statistics Fundamentals Succinctly — Katie Kormanik. The first sections provide basic definitions with illustrations and comments; the last ones explain the significance of t- and z-tests. The material is presented in accessible language, with the minimum necessary mathematical apparatus. This guide is an excellent introduction to statistics from a practical point of view.

Data Analysis & Statistics Courses — Beginners course on Statistics. Covers all elementary concepts. These courses cover topics such as machine learning, business analytics, probability, randomization, quantitative methods and much more.

Probability


After going through some of the basic concepts of statistics, you will probably realize that probability and statistics are two closely related mathematical subjects. So what’s the difference? Probability deals with predicting the likelihood of future events, while statistics involves analyzing the frequency of past events.

Probability is undeniably a pillar of machine learning. Why so? Machine learning is about developing predictive models from uncertain data, and uncertainty means working with imperfect or incomplete information. There are three main sources of uncertainty in machine learning: noisy data, incomplete coverage of the problem domain, and imperfect models. And yes, we can manage uncertainty using the tools of probability.

Concepts to know: joint, marginal, and conditional probability; distributions, maximum likelihood, entropy, density estimation, Bayesian probability, classification.
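
Two of these concepts, maximum likelihood and entropy, fit in a few lines of plain Python. This is only a sketch: the helper names `bernoulli_mle` and `entropy` and the coin-flip data are invented for the example.

```python
import math


def bernoulli_mle(samples):
    """Maximum likelihood estimate of P(X = 1) for 0/1 data.

    For a Bernoulli variable the estimate is simply the sample mean.
    """
    return sum(samples) / len(samples)


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0)
    return -sum(p * math.log2(p) for p in probs if p > 0)


flips = [1, 0, 1, 1, 0, 1, 1, 1]   # made-up coin flips, 1 = heads
p_hat = bernoulli_mle(flips)       # 6 heads out of 8 flips
print("estimated P(heads):", p_hat)
print("entropy of a fair coin:", entropy([0.5, 0.5]))
print("entropy of the estimated coin:", entropy([p_hat, 1 - p_hat]))
```

A fair coin has the highest possible entropy for two outcomes (1 bit); the more biased the coin, the less uncertain each flip is and the lower the entropy.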

Books & Courses:

Probability for Machine Learning — Discover How To Harness Uncertainty With Python — Jason Brownlee. As the author promises, this book will help you cut through the equations, Greek letters, and confusion, and discover the topics in probability that you actually need to know. I couldn’t agree more: this is one of the best books for discovering probability.

Probability for Machine Learning — Nice article with tips to solve questions based on probability.

Stanford’s Probability and Statistics — one of the best courses for gaining insight into probability and statistics and building a strong foundation for machine learning.

Multivariate Calculus


Our next stop is multivariate calculus. Although at first glance it seems to be needed only at university, without calculus you can’t understand backpropagation or get real value out of a deep learning course. All this stuff is more important than you think.

In a few words, calculus is a set of tools for analyzing the relationship between functions and their inputs. In Multivariate Calculus, we can take a function with multiple inputs and determine the influence of each of them separately.
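
Here is a small sketch of what "the influence of each input separately" means in practice: a partial derivative per input, estimated numerically with central differences. The function `f` and the helper `numerical_gradient` are hypothetical examples, not part of any library.

```python
def numerical_gradient(f, point, h=1e-6):
    """Estimate the gradient of f at `point` using central differences.

    Each component is a partial derivative: we nudge one input at a time
    and keep all the others fixed.
    """
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def f(v):
    """Example function of two inputs: f(x, y) = x^2 + 3xy."""
    x, y = v
    return x ** 2 + 3 * x * y


# Analytically: df/dx = 2x + 3y and df/dy = 3x, so at (1, 2) we expect (8, 3)
g = numerical_gradient(f, [1.0, 2.0])
print(g)
```

Deep learning frameworks compute the same kind of gradient exactly and far more efficiently (this is what backpropagation does), but the idea is the same.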

Concepts to know: differential and integral calculus, partial derivatives, vector-valued functions, directional derivatives and the gradient, the Hessian, Jacobian, and Laplacian, and Lagrange multipliers (the Lagrangian).

Courses:

Calculus — edX — a course from the Massachusetts Institute of Technology, consisting of 3 parts:

Calculus 1A: Differentiation — a course on finding a derivative, its geometric interpretation and physical meaning.

Calculus 1B: Integration — a course on finding the integral, its connection with the derivative and application in engineering design, scientific analysis, probability theory, and statistics.

Calculus 1C: Coordinate Systems & Infinite Series — a course on working with curves and coordinate systems, approximating functions with polynomials, and infinite series. All this is necessary to build mathematical models of the real world.

Multivariable calculus from Khan Academy — it is only good for scratching the surface. It is great when supplemented with other resources.

Linear Algebra

Last but not least: linear algebra, the daily bread and butter of machine learning. This is where you should pay most of your attention. Why? Without it, you can’t develop machine learning methods, simulate the behavior of various objects, or optimize clustering and dimensionality reduction. In other words, you can’t live without linear algebra.

After all, many machine learning models are, at their core, linear functions that you feed data into in order to get predictions. If your task is to determine the relationship between two phenomena, you use linear regression. When data needs to be divided into classes, logistic regression helps you. Principal Component Analysis, Support Vector Machines, regularization, gradient descent: all these tools help ML engineers in their daily work.
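
As an illustration, linear regression and gradient descent can be combined in a few lines. This is a minimal sketch in plain Python, assuming a single input feature and mean squared error as the loss; the name `fit_line`, the learning rate, the step count, and the toy data are all choices made for this example.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

In real projects you would use a library and vectorized matrix operations, which is where linear algebra comes in. A learning rate that is too large for your data makes the updates diverge, so treat the values above as specific to this toy example.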

Concepts to know: vectors and vector spaces; linear operators; the correspondence between operators and matrices; matrix decompositions (LU and SVD at least); eigenvectors and eigenvalues; orthogonal and unitary operators; symmetric and Hermitian operators; quadratic forms and their reduction to principal axes.
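
To get a feel for eigenvectors and eigenvalues, here is a sketch of power iteration, one of the simplest ways to find the dominant eigenpair of a matrix. The helper names `mat_vec` and `power_iteration` and the matrix `A` are made up for the example, and the sketch assumes a symmetric matrix and a start vector that is not orthogonal to the dominant eigenvector.

```python
import math


def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def power_iteration(m, steps=100):
    """Approximate the dominant eigenvalue and eigenvector of m."""
    # Start from the first basis vector (assumed not orthogonal to the answer)
    v = [1.0] + [0.0] * (len(m) - 1)
    for _ in range(steps):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]   # keep the vector at unit length
    # Rayleigh quotient v . (m v) gives the eigenvalue for a unit vector v
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


A = [[2.0, 1.0],
     [1.0, 2.0]]   # symmetric matrix with eigenvalues 3 and 1
lam, vec = power_iteration(A)
print(lam, vec)
```

Repeatedly applying the matrix stretches the vector most along the dominant eigenvector, so after enough steps only that direction remains. The same idea, applied to a covariance matrix, is closely related to finding the first principal component in PCA.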

Books & Courses:

Linear Algebra for Machine Learning by AppliedAICourse — this course is one of the best ways to pick up linear algebra for machine learning. It helps you go from a real-world business problem to a first-cut, working, and deployable AI solution.

Linear Algebra by MIT OpenCourseWare — this course covers matrix theory and linear algebra, emphasizing topics useful in other disciplines such as physics, economics and social sciences, natural sciences, and engineering.

Matrix Computations, Gene Golub and Charles Van Loan — Great resource for understanding the numerical problems associated with implementations of matrix algorithms. You can use it regularly for writing source code for solving problems that require the concepts discussed in the book.

My tips to learn math faster

Recognize and accept the fact that it is impossible to become a good mathematician at the flip of a switch. If you meet a person who solves mathematical problems much better than you, do not blame yourself for a lack of ability or knowledge.

Exercise daily. Exercise whenever and wherever possible. Remove from your habits aimless viewing of social networks, TV, video games, etc. Turn on your willpower!

Do not exercise for too long. Take breaks. Otherwise, you risk burnout and demotivation. It is sometimes useful to change activities for a day or two to rest, but not too often.

Be sure to have two notebooks: one for theory, the other for practice. Number each sheet. Make a table of contents on the back sheet of the theory notebook (topic — page). This will come in handy for you in the future.

If you get the wrong answer to a problem, solve it again. No need to come up with excuses and put off a second attempt. In such situations, it is important not only to find the right answer but also to understand why you solved the problem incorrectly the last time.

Final Words

Can I do machine learning without knowing the math? You can: use ready-made libraries, find the necessary algorithms for each task on the Internet, and apply them to data. But a true professional understands the principles behind these tools and therefore works with them more effectively.

In general, to build a sufficient mathematical background, you don’t have to get stuck on every concept for long. It is much more efficient to study what the current task requires than to study everything in a row.

I will go further: human memory is arranged so that everything that is not used gets forgotten. Learning what “may be needed” is a waste of time; even those who end up needing it will have to learn it again. Remember that.

………………………

If you do anything cool with this information, leave a response in the comments below or reach out at any time on my Instagram and Medium blog.