Machine learning, often also marketed as AI, is about making decisions, or rather, about letting machines make decisions to some degree. It involves training models to predict values. Unlike statistics, whose goal is to understand causal relationships, machine learning does not pursue this goal; it focuses on the quality of its predictions. Still, statistics can be useful for understanding possible underlying relationships and, in turn, for building better machine learning models.
To compensate for this disadvantage compared to statistics, machine learning has introduced many additional methods and procedures (such as new imputation methods, k-fold cross-validation and optimization algorithms) that help the user achieve the goal of accurate prediction. However, some methods, such as neural networks, which are getting a lot of attention at the moment, are hardly interpretable. It is difficult to understand why such a model makes certain decisions, and even more difficult to understand why it fails. Some models might be “prejudiced” and discriminate against certain groups of the population without this being visible in the structure of the model.
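k-fold cross-validation, one of the procedures mentioned above, estimates how well a model predicts data it has not seen: the data is split into k folds, each fold serves once as the test set while the model is trained on the remaining folds, and the k test scores are averaged. The following is a minimal pure-Python sketch, not the author's implementation; the toy data, the straight-line least-squares model and the function names (`k_fold_indices`, `fit_line`, `cross_validate`) are illustrative assumptions.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of (almost) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds

def fit_line(xs, ys):
    """Hypothetical toy model: ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cross_validate(xs, ys, k=5):
    """Train on k-1 folds, test on the held-out fold; return one MSE per fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        test_set = set(test_idx)
        train = [i for i in range(len(xs)) if i not in test_set]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in test_idx]
        scores.append(mean(errors))
    return scores

# Assumed toy data: a noisy straight line y = 2x + 1.
rng = random.Random(42)
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
scores = cross_validate(xs, ys, k=5)
print("fold MSEs:", [round(s, 3) for s in scores])
print("mean CV MSE:", round(mean(scores), 3))
```

Because every data point is used for testing exactly once, the averaged score is usually a more reliable estimate of performance on new data than a single train/test split, at the cost of training the model k times.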
Machine learning in a business context faces further challenges. To use machine learning effectively, you usually need a large amount of labeled data, which you will rarely find for your specific purpose. Consider an example: if you want to predict whether an investment will succeed, you need a large amount of data about past investments and whether they succeeded or failed. Even if you find this information and develop a model, it might fail completely to make a precise prediction when it encounters a completely new case or a case that was underrepresented in the training data. Furthermore, training and developing models can take a long time. This means that if the environment is constantly changing, algorithms that can be developed and retrained quickly might be more suitable than others.
At this point, you might be asking yourself the following questions:
- How much data do I need to make a precise prediction?
- What is possible with the data I have?
- How long will the development take and what are the associated costs?
Besides covering supervised, unsupervised and semi-supervised approaches, this blog will specifically address these problems and questions to help you apply machine learning algorithms in any business context. I will not only show you the individual methods and algorithms you need, but also present important strategic considerations and show how you can use learning curves to estimate project costs and feasibility.
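A learning curve plots a model's validation error against the number of training samples, which gives a first answer to the question of how much data you need. The sketch below is a minimal pure-Python illustration under stated assumptions, not the method used later in this blog: the toy task (a noisy sine curve), the k-nearest-neighbour regressor and the function names (`make_data`, `knn_predict`, `learning_curve`) are all hypothetical choices made for the example.

```python
import math
import random
from statistics import mean

def make_data(n, rng):
    """Hypothetical toy task: y = sin(x) plus Gaussian noise, with x in [0, 6]."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k=5):
    """k-nearest-neighbour regression: average y of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)

def learning_curve(train_x, train_y, val_x, val_y, sizes, k=5):
    """For each n in sizes, train on the first n samples and record the validation MSE."""
    curve = []
    for n in sizes:
        sub_x, sub_y = train_x[:n], train_y[:n]
        errors = [(knn_predict(sub_x, sub_y, x, k) - y) ** 2
                  for x, y in zip(val_x, val_y)]
        curve.append((n, mean(errors)))
    return curve

rng = random.Random(0)
train_x, train_y = make_data(400, rng)
val_x, val_y = make_data(200, rng)
for n, err in learning_curve(train_x, train_y, val_x, val_y, [10, 25, 50, 100, 200, 400]):
    print(f"{n:4d} samples -> validation MSE {err:.4f}")
```

Reading the printed curve is the point of the exercise: if the validation error is still falling steeply at the largest training size, collecting more data is likely to pay off; if it has flattened out, additional data will probably add cost without improving predictions much.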