In a few days, on August 1st, I will have completed my fourth year at Zalando. It is my first job out of university, and I was fortunate enough to have their trust to make the switch from supervising a bunch of Ph.D. students to managing teams and leading people. Later, I switched to an individual contributor role and explored a new kind of leadership while continually expanding my scope. As is natural in such situations, not everything was a success, and some things were pretty painful, but overall there was a lot to be learned, and there still is.
In an earlier post I sketched the different levels at which you work on AI. This time I want to take a more personal look at how I eventually made the transition. I hope this can provide some perspective for you on how to get to the next stage or, equally likely, figure out that you’re fine where you are.
Studying ML
Back when I was a student myself, and for much of that time, the main task was to learn the tools of the trade: probability theory, linear algebra, statistics, and all kinds of machine learning algorithms. I learned how they worked, sometimes how to derive the method from first principles, how to implement them efficiently, and so on. I started and abandoned a couple of machine learning libraries, first in MATLAB, later in JRuby and Scala.
If you are pursuing a scientific career, this is of course only the first step, and eventually the details become part of a bigger journey, which is to develop new models that solve previously unsolved problems. But before you can work on the problems, you have to understand what everyone is working on, what the important problems are, and so on. And then you have to manage yourself: set yourself on the right path, work through some problems, and then decide how to continue.
Looking back, having to do both the detail work and the strategic planning is one of the most important things to learn when doing a Ph.D. If you are lucky, you have a supervisor to teach you how to do it, but eventually you need to know how to make a plan, try it out, and back out of it if you get stuck.
For me personally, doing research also always had a tooling aspect. Being a computer scientist by training, I was always asking myself how to improve the tools and create my own toolbox so that I could focus on what is important. Some people get really good at memorizing the dozen or so lines of code needed to implement nested cross-validation. I spent time automating that (even in languages like MATLAB that are not really well suited to higher-level abstractions).
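For readers who have not seen it, nested cross-validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolbox described above: the helper names (`k_folds`, `cross_validate`, `nested_cv`) and the toy one-parameter model are made up for this example, and it uses nothing beyond the standard library.

```python
import random
from statistics import mean

def k_folds(n, k, seed=0):
    """Split the indices 0..n-1 into k shuffled, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def split(data, held_out):
    """Return (train, test), where test holds the items at the held-out indices."""
    held = set(held_out)
    train = [d for i, d in enumerate(data) if i not in held]
    test = [data[i] for i in held_out]
    return train, test

def cross_validate(data, fit, score, param, k, seed=0):
    """Mean held-out score of fit(train, param) over k folds."""
    scores = []
    for held_out in k_folds(len(data), k, seed):
        train, test = split(data, held_out)
        scores.append(score(fit(train, param), test))
    return mean(scores)

def nested_cv(data, fit, score, params, outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates performance, inner loop selects the parameter."""
    outer_scores = []
    for held_out in k_folds(len(data), outer_k, seed):
        train, test = split(data, held_out)
        # Model selection only ever sees the outer training portion.
        best = max(params, key=lambda p: cross_validate(
            train, fit, score, p, inner_k, seed + 1))
        outer_scores.append(score(fit(train, best), test))
    return mean(outer_scores)

# Toy problem (hypothetical): fit y = w * x with a shrinkage penalty lam.
def fit(train, lam):
    return sum(x * y for x, y in train) / (sum(x * x for x, _ in train) + lam)

def score(w, test):
    # Negative mean squared error, so that higher is better.
    return -mean((y - w * x) ** 2 for x, y in test)

data = [(float(x), 2.0 * x) for x in range(1, 21)]  # noise-free toy data
estimate = nested_cv(data, fit, score, params=[0.0, 1.0, 10.0])
```

The point of the nesting is that the parameter is chosen using the outer training folds only, so the outer score is not biased by the selection step. Wrapping this once in a function is exactly the kind of abstraction that saves retyping the loops for every experiment.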
I not only had basic frameworks for encoding learning algorithms and evaluation, I also had code to automatically output result tables in LaTeX, keep track of experiments, and so on. My goal was always to be able to code at the level at which I was thinking. If I wanted to tweak a feature to see whether it worked, ideally I wanted to focus on just that, rerun, and not do any additional work (like, say, copying huge pieces of code and working on the copies because I don’t have good abstractions for the code).
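As an illustration of the kind of small helper this refers to, here is a minimal sketch that turns a dictionary of results into a LaTeX table and marks the best value per metric in bold. The function name `latex_table`, the method names, and the numbers are placeholders invented for this example; they are not results from any real experiment.

```python
def latex_table(results, metrics):
    """Render {method: {metric: value}} as a LaTeX tabular.

    The highest value of each metric is set in bold (assumes higher is better).
    """
    best = {m: max(row[m] for row in results.values()) for m in metrics}
    lines = [
        "\\begin{tabular}{l" + "r" * len(metrics) + "}",
        "\\hline",
        " & ".join(["Method"] + list(metrics)) + " \\\\",
        "\\hline",
    ]
    for method, row in results.items():
        cells = [method]
        for m in metrics:
            text = f"{row[m]:.3f}"
            cells.append(f"\\textbf{{{text}}}" if row[m] == best[m] else text)
        lines.append(" & ".join(cells) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)

# Placeholder values, purely for illustration.
results = {
    "baseline": {"accuracy": 0.5, "f1": 0.25},
    "variant": {"accuracy": 0.75, "f1": 0.125},
}
table = latex_table(results, ["accuracy", "f1"])
print(table)
```

With something like this in place, rerunning an experiment regenerates the table for the paper directly, with no manual copying of numbers.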
There is another level to research, namely projects and grants. Once you know how to solve a bigger problem and how to work towards it, you need to think about the multiyear projects that provide the outer frame for that work. To get the money, you have to write grant proposals that outline the problem, a rough idea of how you want to solve it, how many resources you need, and so on. This proved not to be too different from what I now see again in approaches like “working backwards.”
I got pretty good at working on these proposals, although I never managed to get one funded that was based wholly on an idea of my own.
Making The Step To Industry
I am simplifying a lot here, but eventually, in 2015, I made the switch and left my permanent position in academia to join Zalando. For the few years before that, I had had a side project around real-time data analysis with approximate algorithms, and writing that thing and doing pilot projects with potential customers was a lot of fun. Writing a paper and waiting for months on a review that felt mostly like gatekeeping didn’t cut it anymore.
I was hired as Delivery Lead for the recommendation and search team at Zalando, which was 9 people in the beginning. A year and a half later, it had grown to more than 40 people.
I was keen on learning how to lead people and how to manage teams. From the very beginning I was involved in hiring as well. I wasn’t aware of it at the time, but I would end up interviewing 50-100 people per year.
Given my interest in tooling, I was also very interested in learning what it takes to bring machine learning to production. One of the last courses I gave at the university was about Big Data. Back then, machine learning researchers mostly tried to ignore that technology. Java and the JVM were largely unknown to researchers who had been using MATLAB and R and, more recently, Python. The prevailing approach to large-scale learning was to find ever more clever ways to quickly solve the optimization problems behind the popular machine learning algorithms, rather than to scale up computation with raw computing power and distributed computing.
Luckily, in the teams I could quickly learn what it took to train models on months of click data and then serve them in ways that would handle tens of thousands of requests per second.
Later, the search teams would rebuild the search service from the ground up and I started to see the first patterns which would eventually make it into a talk I gave at O’Reilly’s AI conference last year.
Beyond the purely technical, I also learned a lot about how to set up teams, especially mixed teams of data scientists and software engineers, and how important it is to have a good mix of people in the team, not just technically, but also socially.
I also spent considerable time figuring out how to structure a whole department of teams, so that they can have a clear sense of purpose, ownership, and independence. Those who have been part of the resulting reorganization would probably not agree that this solved all problems, but I still think that the core idea was right.
Back On The Technical Track
About two years after I started, Zalando was investing more in building out a principal track, and I felt like it was time to shift gears again. As much as I enjoyed learning how to manage people and supporting them in doing their jobs, I also felt that I was not making enough use of all the deep technical knowledge I had acquired in the years before.
I stayed in the same area initially, and having established good working relationships helped a lot. But now I also had to learn how to guide people to make good use of ML, and to resist the temptation to do everything myself.
Instead, I started to look into what was needed to create the right conditions for everyone to create ML-based products. Product managers, engineers, data scientists: they all needed to learn how to adapt what they knew to this new challenge. Maybe I had expected that I would dive deep into hardcore ML again, but for now my path led me to understand how to enable teams and departments to do ML.
Again, starting on the team level, I saw that each role needs a slightly different understanding of what is required. Product managers need to understand the potential of ML and how to interact with data scientists; software engineers and managers need to understand that ML-based projects run differently from normal software development projects; and so on.
Over time, an opportunity came up to change the role again so that it would have broader scope, leading to more topics to look into.
One of the big projects I did over the last year was to play a key role in defining an ML platform. Looking at our teams’ pain points and also at how other companies are doing it (for example, Twitter, Uber, Airbnb, and Facebook have talked about their approaches), we came up with a platform vision that is still in the process of being delivered. This is a space I personally find extremely interesting. If you look at the blog posts of the companies above, you can definitely see some common patterns, but there also seem to be elements that are closely tied to differences in company culture and approach to doing AI.
I am starting to see connections to general company strategy and the role ML has to play there. I feel like I am only at the beginning of understanding this, but I am pretty sure that getting this right can have a tremendous impact on how effective you are as a company at creating ML- and AI-based products.
The Deep Learning Revolution
As I said, this is sort of my personal story over the past years. One interesting detail is that most of the deep learning revolution took place while I was at Zalando. I could tell just from the interviews I did. While in the beginning it was natural to ask about support vector machines, these days candidates mostly know only basic classical algorithms like random forests, while most of their expertise is in deep learning methods.
Teams that had more “classical” ML pipelines with explicit feature extraction are now gradually replacing them with deep learning models. The reduction in pipeline complexity is often quoted as one of the big advantages (besides often better accuracy as well). Where before you had to maintain pipelines and retrain and re-optimize everything if you changed one part, you can now retrain one deep learning model and you are done.
At the same time, it is also clear that the demand for raw computing power, especially on GPUs, has increased a lot, to the point that for some very advanced applications like GANs, training the models costs more than most companies can probably afford.
There is a shift to pretrained models that are then only slightly adjusted to a concrete application.
I am not sure whether this is the direction we’re going in. I am personally hoping for more effective learning algorithms that can keep the expressive power of deep learning while reducing training complexity.
From Statistics To Company Strategy… And Beyond?
Looking back, I have covered a lot of ground (like most practitioners in the field, and in other fields), from the first time I went through the derivation of the closed-form solution for least squares regression to thinking about data strategies and ML platforms. Still, so far it has felt like I was following a natural path forward.
I am frankly not quite sure what the next steps are. There are probably a lot of recent developments to revisit. I could also continue to ever more abstract levels. On the other hand, maybe it is also time to take everything I have learned and apply it to build something interesting and hopefully useful.
As I said, there are probably more pretrained models available right now than in years past. I have a hunch that it is easier now than ever to prototype complex models. And I hope there are better uses for this technology than the ones we already see in some places.
Thanks for reading. Let me know how you have experienced the past few years. I’d also be happy to learn about your personal journey with ML and AI.