The mathematical models behind Machine Learning

The solutions we arrive at can only be as smart as the foundations on which we build them.

Since the beginning of time, the sciences have accompanied human beings, who, driven by the logic and reason that characterize them, have made significant advances in this broad and complex field. Without looking far, we can see human creativity and ingenuity in something as apparently simple as building a round object and setting it rolling, an invention that combines sciences such as physics and mathematics.

As humanity has progressed, the sciences have developed to the point of becoming the evolutionary engine of society, and all the technological advances that are now part of everyday life are a clear reflection of this.

In recent years, with the rise of ARTIFICIAL INTELLIGENCE, we have seen how machines have become capable of learning from and analyzing information; the development of these capabilities is known in computer science as Machine Learning (ML). The goal of this article is therefore to explain how, at this point in human history, mathematics continues to play a leading role (even if we are scarcely aware of it) in the implementation of the processes behind Machine Learning.

What are the mathematics that make up machine learning?

It is often thought that building machine learning models is just a matter of using the ones that come ready-made in libraries, with predetermined programming languages, where the only action required is to feed in the necessary information.

This is entirely incorrect: first, you must know which information is relevant for a model to work correctly, because, as a well-known saying in this field goes, "garbage in, garbage out"; second, and more importantly, not knowing the fundamentals of these tools usually leads to alternatives that, in the end, will not solve any problem.

For this reason, I will now develop the fundamentals to which I referred above:

Linear algebra

Vectors, linear equations, matrices: all of these concepts are key within Machine Learning and must be understood. From the very beginning we find solutions where cyclic operations such as a For or While loop are replaced by operations between matrices, in order to perform these calculations efficiently. Moving further along, we find representations of objects in vector spaces and complex topics such as principal component analysis (PCA), where, through matrix operations and vector projections, we can reduce the number of features within an analysis.
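To see the difference in practice, here is a minimal sketch in Python (assuming NumPy, which the text above does not mention) where a single matrix-vector product replaces the explicit For loops:

```python
import numpy as np

# A 3x3 matrix of weights and an input vector (toy values).
W = np.array([[0.2, 0.8, 0.1],
              [0.5, 0.3, 0.9],
              [0.7, 0.6, 0.4]])
x = np.array([1.0, 2.0, 3.0])

# Loop version: one multiply-accumulate at a time.
y_loop = np.zeros(3)
for i in range(3):
    for j in range(3):
        y_loop[i] += W[i, j] * x[j]

# Linear-algebra version: the whole computation as one matrix-vector product.
y_vec = W @ x

assert np.allclose(y_loop, y_vec)
print(y_vec)
```

On real workloads the matrix form is not only shorter but runs in optimized native code instead of interpreted loops, which is exactly the efficiency gain described above.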

How does it apply to the real world?

To clarify the above, let's take the field of personality psychology as an example. We all have different personalities, and these have different nuances; someone can be very similar to another person in some respects but completely different in others. Is there any way to quantify this?

For about 100 years, attempts have been made to answer this question, with results that, although they do not cover the full scope of the question, have managed to partially represent the personality of a human being.

Now, it should be clarified that such interpretations have always been made through qualitative criteria. But is it possible to represent someone's personality quantitatively? To do so, we will take as our main reference the lexical hypothesis, whose central idea is that any personality trait of an individual corresponds to a word in the language; for example, a person is brave, sensitive or shy.

At the beginning, psychologists dedicated to the study of the subject found around 4,500 words describing different human traits; later, professionals in the same field took on the task of grouping these words until the number was reduced to 500. But what happens if we experimentally take these 500 words and apply PCA to them?

What we obtain is the reduction of these 500 characteristics to 5 traits with which we can classify an individual's personality (extraversion, conscientiousness, neuroticism, agreeableness and openness to experience); these traits are known in psychology as the Big 5 model. A sample of why linear algebra will never go out of fashion.
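As an illustration only, since the original survey data is not reproduced here, the following sketch builds synthetic ratings driven by five hidden factors and shows PCA (scikit-learn's implementation, an assumption of this sketch) recovering a five-dimensional representation from 500 columns:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for real survey data: 1000 respondents whose
# ratings on 500 trait words are driven by 5 hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 5))           # hidden "traits"
loadings = rng.normal(size=(5, 500))           # how each word reflects them
ratings = factors @ loadings + rng.normal(scale=0.5, size=(1000, 500))

# PCA projects the 500 correlated ratings onto their principal axes;
# with this construction, 5 components capture almost all the structure.
pca = PCA(n_components=5)
scores = pca.fit_transform(ratings)

print(scores.shape)                    # (1000, 5): five scores per person
print(pca.explained_variance_ratio_)  # variance captured per component
```

The real Big 5 analysis is of course more careful than this toy construction, but the mechanics, projecting many correlated measurements onto a few principal directions, are the same.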

Differential, integral and multivariate calculus

Implicit in the other mathematical topics needed to understand machine learning, it is quite useful to know about integrals and partial derivatives when reviewing optimization functions, or to compute Hessian matrices, with which we can determine the convexity of a function. This property is very important because it helps us choose or discard a function, as well as find its minimum points, which translate into the best answer this procedure can give us. Closely related is the hyperparameter: a variable of the model that, as it is adjusted, makes the model we are generating better or worse.
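For a concrete, hedged example, the following sketch (assuming SymPy, with an illustrative quadratic function chosen for this sketch) computes a Hessian, confirms convexity through its eigenvalues, and finds the minimum point:

```python
import sympy as sp

x, y = sp.symbols('x y')

# A simple quadratic loss; its Hessian happens to be constant.
f = 3*x**2 + 2*x*y + 3*y**2

H = sp.hessian(f, (x, y))
print(H)              # Matrix([[6, 2], [2, 6]])

# All eigenvalues positive everywhere => f is convex,
# so its single stationary point is a global minimum.
print(H.eigenvals())  # {8: 1, 4: 1}

# Stationary point: solve grad f = 0 (both partial derivatives vanish).
grad = [sp.diff(f, v) for v in (x, y)]
print(sp.solve(grad, (x, y)))  # {x: 0, y: 0}
```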

It must be taken into account that when we talk about training models, we are talking about analyzing gigantic amounts of information, which involves time and high processing costs. So answering a question like "what is the magic number we must assign to our hyperparameter to make our model work perfectly?" through instinct and experience alone is not enough. We must know the theory behind the model we are applying in order to be certain that what we are doing makes sense, and there is nothing better than calculus to support this type of decision.
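Here is a minimal sketch of that idea, assuming scikit-learn and a toy dataset, where a regularization hyperparameter is chosen by cross-validated search rather than by guessing a magic number:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy dataset standing in for real training data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Search the regularization strength C instead of guessing it:
# each candidate value is scored by 5-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Knowing what C actually does mathematically (it scales the penalty on large weights) is what tells us which range of values is even worth searching.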

Statistics and probability

Most of the models generated through machine learning are, or involve, statistical and probabilistic elements, so knowledge of probability theory, combinatorics, set theory and Bayes' theorem, among the most relevant topics, serves as an ally when facing the various problems that arise.

An example of the above is seen in decision trees, a very popular technique within ML whose roots lie in conditional probability. In the same way we find support vector machines, which in colloquial terms can be defined as classification algorithms whose answers are based on the probability that a point belongs to one class or another within a vector space.
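A small illustrative sketch, assuming scikit-learn and a synthetic dataset, shows both techniques side by side, including the class-membership probabilities the SVM can expose:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Toy two-class problem.
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Decision tree: splits are chosen from conditional class frequencies.
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)

# Support vector machine: a margin-based classifier; with
# probability=True it also exposes class-membership probabilities.
svm = SVC(probability=True).fit(X_tr, y_tr)

print("tree accuracy:", tree.score(X_te, y_te))
print("svm accuracy:", svm.score(X_te, y_te))
print("svm P(class) for first test point:", svm.predict_proba(X_te[:1]))
```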

Have you ever wondered how weather forecasting or demographic predictions work? The answer is time series modeling, which is nothing more than the application of various statistical methods, such as trend estimation, seasonality calculation and correlation analysis, among the most prominent, to a set of data points measured at different time steps and ordered chronologically. Thus we can undoubtedly refer to statistics and probability as the core of ML.
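As a hedged illustration of those methods, the following sketch (assuming pandas and statsmodels, with a synthetic monthly series built for this example) decomposes a series into exactly those pieces: trend, seasonality and residual:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: upward trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
values = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)
series = pd.Series(values,
                   index=pd.date_range("2010-01", periods=120, freq="MS"))

# Split the series into trend, seasonal and residual components.
parts = seasonal_decompose(series, model="additive", period=12)
print(parts.trend.dropna().head())
print(parts.seasonal.head(12))  # the repeating yearly pattern
```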

Optimization

The mathematical concept of optimization is based on selection criteria and a set of elements; the challenge is to find the element that best fits the criteria, always taking into account the efficiency of the process and the use of resources. It is not the same to spend 3 days and the capacity of 10 servers to find an accurate result as it is to find a slightly less accurate one in 10 minutes with a single server.

A clear example of this topic is gradient descent, an algorithm that applies optimization techniques to find a local minimum of a function. Explained simply: suppose you are walking on a mountain and suddenly fog covers everything. At that moment, your only goal is to go down the mountain until you find the plain, so, cautiously and step by step, you descend until at some point you reach flat land. In this scenario, the flat land is your local minimum, the surface of the mountain is your function, and the gradient is the slope under your feet, the signal that tells you which way is down.
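The mountain analogy translates into very little code. Here is a minimal sketch in plain Python, with an illustrative function and learning rate chosen only for this example:

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
def grad(x):
    return 2 * (x - 3)   # derivative of (x - 3)^2: the local slope

x = 10.0                 # starting point "on the mountain"
learning_rate = 0.1      # size of each cautious step downhill

for step in range(100):
    x -= learning_rate * grad(x)   # move against the slope

print(round(x, 4))       # ~3.0: the "flat land" at the bottom
```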

How could I start learning about it?

The main problem faced by most people who want to learn about this interesting field of AI is the large number of fronts it covers, which translates into learning new concepts and new tools: topics that seem tangled at first glance but become easier to understand as their foundations are learned.

So, from a more personal point of view, and as a recommendation, it is best to start with the root of everything, statistics and probability, to get acquainted with the types of problems we will face. From there we can continue with a review of linear algebra, which I find fun but still challenging. Next on our roadmap would be everything related to calculus, since having clear basic concepts of derivatives, integrals and dimensions is paramount. We can finish with optimization, which, as we saw earlier, has linear algebra and calculus inside it.

Conclusion

Knowing the mathematical foundations of machine learning helps us on a daily basis to solve key issues in our work, such as selecting the algorithm that best fits our problem, choosing the hyperparameters that best tune our model, identifying problems in models such as bias or overfitting, and finding an optimal function to solve our problem, among the most important tasks.

This is why this topic is indispensable for someone who is interested in learning about ML. We must remember that, broadly speaking, we are trying to build technology capable of emulating human tasks and behaviors, and if we don’t know how we are building these technologies, chances are that the solutions we arrive at will be only as smart as the foundations on which we build them.
