Why Recipes for Machine Learning Solutions Don’t Work

People who ask me how to solve a given project with machine learning often expect some kind of recipe, as if they were baking a cake. I understand the expectation and often catch myself trying to give something as close to a recipe as possible, but lately I have come to realize that the answer is more a process than a recipe.

So what’s wrong with the recipe idea? A recipe is a sequence of steps that someone who has never made the dish before, but is sufficiently skilled, can follow with relative confidence to get the same result as the person who wrote it.

You need some skills. You need some understanding of how to prepare the ingredients, how to regulate the heat, and so on, but more or less, you can reliably cook up something edible.

Cooking up an ML model

The problem with data science is that the ingredients may be structurally similar but differ quite significantly in the details. For example, if you’re building a recommender system, you know that there are algorithms like collaborative filtering that in general work well given user-item interactions, but there are many aspects of the data that might lead to completely different results. For example:

  • How much does your set of items change? If you’re selling fashion, the items might be highly seasonal. If you have a constant stream of new content, you might have very little interaction data for each item.
  • What does customer behavior look like? Are people coming back often? Do you see them just once or twice per year?
  • How clean is your data? Are there many bots?
  • How many logged-in customers do you have? Can you even track customers coming back?
  • What kind of customer engagement do you want to achieve? Is it more transactional, or do you want people to come back every day?

These are just some aspects. Even if you’ve built recommender systems successfully in the past, you might not be able to follow the exact same steps and get a great result.

You can already see hints of this in the cooking example. Your ingredients might be of quite different quality, so you may need to cook them for a longer or shorter time. You need to know what to look for and adapt.

In data science projects, this variability is an integral part of what the work is about.

So, I’m not a cook?

So, what am I doing as a consultant? Am I just hoping that I’ll be able to help my client somehow? How do experts justify their salary?

As it turns out, knowledge of past approaches that work well is only half of what distinguishes a novice from an expert. The other half is knowing how to approach a problem in a way that gives you a higher chance of finding a solution.

For me, a central part of this approach is the following: I have learned over and over again that the key is to formalize the problem so that you can evaluate the quality of a solution using your data. You do this by defining a procedure (I really mean code) that takes a candidate solution to your problem, runs it against your data set, and measures in a problem-specific way how well it works.

This can be as simple as computing squared error on a hold-out set or replaying predictions on historical data, and it can become arbitrarily complex once interactions are involved.
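To make this concrete, here is a minimal sketch of such a procedure in Python. Everything about it, from the function name to the choice of squared error, is an illustrative assumption rather than a fixed API; the point is only that the question “does this work?” becomes a function you can run.

    import numpy as np

    def evaluate(predict, X_holdout, y_holdout):
        # `predict` is any callable mapping features to predictions;
        # the score is mean squared error on data the solution never saw.
        predictions = predict(X_holdout)
        return float(np.mean((predictions - y_holdout) ** 2))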

Defining the problem this way is painful, because it is very explicit and you need to make decisions about what matters to you. There is very little room for vagueness.

Recommender systems add a complication: you cannot go back in time to show customers different recommendations. Still, you will want to use your data as a proxy for what your customers might be interested in, for example by looking at the actions following a recommendation event and checking what they clicked on.
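One simple proxy along these lines is hit rate on logged events: would the item the customer actually clicked have appeared in our top-k recommendations? The data format and the function names below are assumptions made for the sake of the sketch:

    def hit_rate(recommend, logged_events, k=10):
        # `logged_events`: iterable of (user_id, clicked_item_id) pairs
        # taken from historical logs around recommendation events.
        # `recommend(user_id, k)`: returns a list of k item ids.
        hits = total = 0
        for user_id, clicked_item in logged_events:
            total += 1
            if clicked_item in recommend(user_id, k):
                hits += 1
        return hits / total if total else 0.0

Note that replay metrics like this are biased toward whatever the old system chose to show, which is exactly the kind of flaw you will discover and fix in later iterations.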

Being truly data-driven

The beauty of this is that you no longer need an expert to tell you whether your solution works. I think this is fundamentally different from many other technical fields, where you often have to bring in an expert to pass exactly that judgment.

If you have formalized your problem in a way that can be run, you don’t need to rely on anyone’s opinion (not even mine), because you have a tool in your hands that lets you objectively measure whether a solution works. If you figure out a really simple hack that performs well, you can take that solution and run.
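As a hypothetical end-to-end example (toy data, illustrative names), comparing a trivial baseline against a simple hack then comes down to running both through the same harness and letting the numbers decide:

    import numpy as np

    # Toy regression data, split into train and hold-out sets.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

    def evaluate(predict):
        # Mean squared error on the hold-out set; lower is better.
        return float(np.mean((predict(X_test) - y_test) ** 2))

    baseline = lambda X_: np.full(len(X_), y_train.mean())  # predict the mean
    w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]    # the "simple hack"
    hack = lambda X_: X_ @ w

    print(evaluate(baseline), evaluate(hack))  # the numbers decide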

Experts can be good at suggesting what to try, or how to use certain methods properly, but at the end of the day, their guesses are judged by the same numbers as everyone else’s.

Of course, you will probably eventually figure out that there is something wrong with the way you formalized your problem. Maybe there is some way to game your evaluation metric. Or it doesn’t reflect what your customers want. But in this case, you fix that problem and iterate again.

I really don’t like the term “democratization,” but I think this is a good example of it: everybody has the same chance of finding a better solution.

Learning how to organize work from academia, for once

By the way, driving innovation by formalizing a problem into a benchmark has a long tradition in the machine learning community (and in science as a whole). To give just one example, the PASCAL VOC Challenge compiled data sets for object detection in images for many years; once a problem seemed solved, the next year’s edition made it a bit harder. Such data sets have been one of the driving forces behind the rise of deep learning models.

More recently, Google DeepMind’s AlphaFold has made breakthroughs on the problem of predicting a protein’s structure from its amino acid sequence. That whole field has for many years been shaped by the CASP challenge, which withholds newly determined protein structures to provide true out-of-sample test data.

The beauty of such challenges is that they open the competition to everyone who has access to the data, so many people can work on the problem in parallel. They essentially decouple the problem from the approaches and the people trying to solve it. I think that is an important enabler of paradigm changes, like the shift from more or less biologically inspired computer vision models, which mimicked what we know about the visual processing system down to individual filter types, to deep learning models, which were in a way also biologically inspired but overall much more flexible. I don’t think you would have gotten there if the discussion had stayed focused on the details of the current approach, yet that is often what technical decision-making in companies looks like.

Of course, not everything can easily be made measurable, but the power of data is that it can encode problems that are quite complex, something that is hard to do by writing a metric down as a formula. But I’m getting sidetracked.

You probably won’t take this approach so far as to open up a problem to the whole company, but even if you “only” have a single team working on a problem, this is the best starting advice I can give if you want to become more principled about solving problems with data science.

2 thoughts on “Why Recipes for Machine Learning Solutions Don’t Work”

  1. Super insightful, Mikio, thanks for sharing.

    I especially relate to the parts on recommender systems and data-driven decisions.
    The problem we sometimes face is that there are too many ideas, and then variations of ideas, to test. Do you have some suggestions on how to organise that?

    1. Hey Ankur, many thanks!

      What’s your experience with trying to prioritize by something like “expected impact vs. effort”? Generally, I’d try simple things, or things that can be done quickly, to establish a baseline, and then gradually move on to more complex approaches. It could also depend on the overall roadmap etc. So again, no simple answers 😉
