The main goal of statistics is to understand and explain relationships between variables. The problems it addresses range from supervised to unsupervised, and from explaining causal relationships to identifying clusters. Statistical methods are also often used for prediction, forecasting, and estimation. There is a large overlap between machine learning and statistics, but statistical methods tend to have the advantage of being simpler and retaining interpretability. By this I mean that we usually know the reason behind a particular decision that a statistical model makes, whereas we often cannot say the same for a machine learning model. Nevertheless, many machine learning techniques are based on statistics, and they increasingly rely on statistics to evaluate their models.

Statistics assists us in discovering causal mechanisms and making inferences, and it quantifies the certainty or uncertainty of these relationships. The importance of this quantification becomes evident if we consider that there are two types of reasoning: deductive and inductive. Deductive reasoning is the process of reaching a conclusion about a specific situation, individual, or instance from one or more general premises. Let me give you an example. Let's say we know that anyone who is allergic to peanuts will have an allergic reaction when they eat peanuts (general premise), and we know that person A is allergic to peanuts. We can then conclude that person A will have an allergic reaction when they eat peanuts.

On the other hand, inductive reasoning is the process of deriving a general conclusion from specific instances. For example, we see a person having an allergic reaction after eating peanuts, and we conclude that the person is allergic to peanuts. While a deductive conclusion is certain as long as its premises are true, because it follows from a general law, an inductive conclusion is not necessarily certain. In our case, the person might have experienced an allergic reaction because they touched something else they are allergic to, for instance the packaging of the peanuts, rather than because they ate peanuts. As we can see, there is always some uncertainty associated with an inductive conclusion, and statistics helps us quantify this uncertainty.
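To make the idea of quantifying inductive uncertainty a little more concrete, here is a minimal sketch in Python. It assumes a purely hypothetical scenario in which we observe the same person on several occasions and count how often eating peanuts is followed by a reaction. The counts, the function name `wilson_interval`, and the choice of a Wilson score interval at roughly 95% confidence are my own assumptions for illustration, not something prescribed by the methods in this section.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials                 # observed proportion
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical observations: the numbers below are made up for illustration.
one_occasion = wilson_interval(successes=1, trials=1)      # a single observed reaction
many_occasions = wilson_interval(successes=18, trials=20)  # reactions on 18 of 20 occasions

print("After 1 observation:   ", one_occasion)
print("After 20 observations: ", many_occasions)
```

With a single observation the interval is very wide, which mirrors the point above: one instance supports the inductive conclusion only weakly. As more observations accumulate, the interval narrows. Note that this only quantifies sampling uncertainty; it does not rule out alternative explanations such as the packaging.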

The methods that I will present in this section will be applied in a business context. They range from simple techniques, like linear regression and descriptive statistics, to more advanced ones, like instrumental variable regression, error-correction models, and survival analysis. Furthermore, I will cover methods that do not directly estimate a quantity or do not require labeled data, such as factor analysis, principal component analysis, and clustering.