
Johannes Otterbach — Unlocking ML for Traditional Companies

Johannes talks about quantum computing, the state of the ML tools ecosystem today, and the challenges of developing and deploying models for customers.


About this episode

Johannes Otterbach is VP of Machine Learning Research at Merantix Momentum, a machine learning consulting studio that helps their clients build AI solutions.
Johannes and Lukas talk about Johannes' background in physics and applications of ML to quantum computing, why Merantix is investing in building a cloud-agnostic tech stack, and the unique challenges of developing and deploying models for different customers. They also discuss some of Johannes' articles on the impact of NLP models and the future of AI regulation.

Timestamps

0:00 Intro
44:30 Outro

Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!

Intro

Johannes:
If you take those big models, you'll run into a problem. You already need compute power. You need infrastructure. You need MLOps. You need a whole department to actually make use of those models. Not many people have that, especially the companies it's most useful for.
Lukas:
Today, I'm talking to Johannes Otterbach. He was originally a quantum physicist and then went into machine learning and he's currently VP of Machine Learning at Merantix Momentum. Merantix is a really interesting company that develops all sorts of machine learning applications for customers, small and large, and then really deploys them into these real-world systems and hands them off. We really get into real-world applications of ML in factories and other places, the tooling required to make machine learning work, and also how to do smooth handoffs so that customers are actually successful in that transition. This is a really interesting conversation and I hope you enjoy it.

Quantum computing and ML applications

Lukas:
My first question for you is looking at your resume, you're one in a long line of people that kind of moved from physics into machine learning. I'd love to hear what that journey was like, you know, studying quantum physics. And I think you worked on a little bit of quantum engineering or quantum computing. And then now you do machine learning. How did that happen?
Johannes:
That's a great question. I think initially I was super excited about physics because I saw physics as a way to understand the world. I'm really excited about understanding how things work, taking things apart to put them back together. That's what always drew me to physics rather than engineering. I was on track to just do a career in physics, and then AlexNet came out and the ImageNet challenge happened. Like, "Holy crap. There is something really cool happening." It's always funny to tell people I did my PhD before ImageNet was a thing, because that makes me really old. But it was kind of an exciting time. And so when I heard that, I was like, "Well, I want to reconsider my career as a physicist anyway" at that point, and looked into what this AlexNet was about and the ImageNet challenge. That opened up this whole field of data science and big data that was just starting off at that time. It's a very natural transition for a physicist because we are good at statistics, we are good at modeling, we like math. And then I fell in love with big data and data science. Since then, I've been continuously driving at understanding the language of data. ML is just an expression of that language, and that's why I fell in love with it. And now I'm here.
Lukas:
You did do some work in quantum computing, is that right? Do you think that quantum computing has anything to apply to ML? Or do you think ML has anything to apply to quantum computing? How do you think about that?
Johannes:
I think it's actually mutually beneficial, and I see there will be a convergence of those two fields in the near future. There are four different quadrants that we can talk about. We have classical and quantum, in terms of algorithms and in terms of data. You have quantum data and classical data, and you have quantum algorithms and classical algorithms. You can actually start to think in those four quadrants. Right now we see that a lot of effort is being put into applying quantum algorithms to classical data. That, I think, is potentially the wrong way to think about it. We should think about quantum algorithms for quantum data and maybe classical algorithms for classical data. The cross-fields are a little bit more complicated to solve. I think cross-fertilization is going to happen.
Lukas:
What is quantum data?
Johannes:
Quantum data is essentially data that comes out of quantum states. I don't know how deep you are into quantum computing, but typically in quantum computing we don't talk about definite outcomes. Instead, we describe systems by wave functions, which are — naively speaking — the square root of probabilities. Quote unquote. Don't take this too seriously. What you get is essentially quantum data, which has a phase and an amplitude. If you start measuring this, you get a lot of complex numbers, you get various different types of phenomena. And that data typically takes an exponential amount of time to read out into classical states. When you have a quantum state and you want to completely express that quantum state as classical data, you get an exponential overhead in storage.
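For concreteness, here is a minimal numpy sketch (an illustration, not tied to any particular quantum SDK): an n-qubit state is described by 2^n complex amplitudes, which is why fully writing a quantum state out as classical data blows up exponentially.

```python
import numpy as np

# Illustration: an n-qubit state is described by 2**n complex amplitudes, so
# writing it out as classical data grows exponentially with the number of qubits.
for n_qubits in (10, 20, 30):
    n_amplitudes = 2 ** n_qubits
    size_gib = n_amplitudes * 16 / 2**30          # complex128 = 16 bytes per amplitude
    print(f"{n_qubits} qubits -> {n_amplitudes:,} amplitudes (~{size_gib:.3f} GiB)")

# A tiny concrete state vector: the 2-qubit Bell state (|00> + |11>) / sqrt(2).
bell = np.zeros(4, dtype=np.complex128)
bell[[0, 3]] = 1 / np.sqrt(2)
probabilities = np.abs(bell) ** 2                 # measurement probabilities from amplitudes
print(probabilities)                              # [0.5 0.  0.  0.5]
```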
Lukas:
What's a situation in the real world where I would have quantum data? I can imagine how these quantum computers produce that, but when would I be collecting quantum data?
Johannes:
When you actually deal with quantum systems. If you want to start to understand molecules, for example. The very deep interactions behind molecular properties are governed by quantum rules. If you want to simulate molecules, you would rather do it in quantum space than in classical space. That's really the way to go. That's why today's early-stage quantum computers are more like simulators of other quantum systems. You use these computers to simulate quantum systems that you want to study in a very, very controlled fashion. And then you deal with quantum data at that point.
Lukas:
Are we actually able to simulate quantum systems in a useful way? Because, you know, I have experience with classical mechanics systems and the simulations seem to break down very quickly. I can only imagine that the quantum simulations are much harder and probably harder to make accurate.
Johannes:
We are getting really good results. A lot of experimental quantum physics is essentially doing that. We have toy models that we use in order to validate our mathematical theories. A good example is a field that I worked in back in the past, which is quantum optics, where we have a lot of laser fields and single atoms. We start to put them together in a certain fashion in these laser fields so that we can simulate materials that we really have a hard time understanding. Like, for example, high-temperature superconductivity. We have certain types of mathematical models — statistical models — for how these things can come about. And then in order to study the effects of these models, we use a very clean system that we have a high level of control over, try to simulate those mathematical models, and see if those models then give rise to the phenomena that we see, for example, in these materials that have high-temperature superconductivity. So we use a much simpler system to simulate a much more complex system in order to probe our understanding of the physical laws, in this case.
Lukas:
Are there applications of ML to that? I feel like we've talked on this show to some chemists in different fields, and they've been sort of using ML maybe to approximate these kinds of interactions. Is that an interesting field to you?
Johannes:
I think that's an interesting field to me. But actually, I think I'm much more excited about a completely different avenue of applying ML to quantum systems. If you think about building a quantum computer, you have a lot of different qubits. These are the atomic units. You have bits in a classical computer; you have qubits in a quantum computer. To use and address these qubits, we have to very, very meticulously control them in order to really make them do what we want. You cannot just flip a switch from zero to one, but you have to control everything between zero and one. It's very much an analog computer, to a certain extent. And in order to control these kinds of systems, I think this is where ML comes into play, because you can use, for example, reinforcement learning techniques to do optimal control of these quantum gates in order to facilitate those two-qubit or three-qubit interactions and get a high-fidelity quantum computer. I think that might be one of the early applications of ML to quantum systems and quantum computers. My firm belief is that we probably need machine learning techniques — modern machine learning techniques — in order to scale quantum computers to sizes that are actually useful.
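As a toy illustration of what gate calibration looks like as an optimization problem (a deliberately simple sketch using a plain parameter search rather than the reinforcement learning mentioned above, with an assumed miscalibration value):

```python
import numpy as np

# Toy sketch: tune a single control parameter theta so that the realized
# rotation matches a target Pauli-X gate, despite an assumed control error.
X = np.array([[0, 1], [1, 0]], dtype=complex)     # target gate

def realized_gate(theta, miscalibration=0.07):
    """Rotation about x by theta, distorted by a fixed (unknown) control error."""
    angle = theta * (1 + miscalibration)
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * X

def fidelity(theta):
    """Simple gate-fidelity proxy: |Tr(U_target^dagger U(theta))| / 2."""
    return abs(np.trace(X.conj().T @ realized_gate(theta))) / 2

# Plain derivative-free search over the control amplitude; a real device with
# many coupled controls is where RL-style optimal control becomes interesting.
thetas = np.linspace(0, 2 * np.pi, 2001)
best = thetas[np.argmax([fidelity(t) for t in thetas])]
print(f"best theta = {best:.4f} rad, fidelity = {fidelity(best):.6f}")
```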
Lukas:
Interesting. I feel like I've met a number of people in machine learning that kind of feel like they're refugees from quantum computing. Like they felt like it didn't really have a path to real-world applications and kind of moved into machine learning. When I saw your resume, I wondered if you were one of those people, but it sounds like you're pretty optimistic about the future of quantum computing.
Johannes:
Yeah. I think the question is on which timescale, right? Quantum computing is still very nascent, and I feel that it will go through the same kinds of winters that machine learning went through a while ago. When this will happen, I don't know, but we will see these kinds of winters. I — in my lifetime — want to see some more impact on a shorter-term time scale. And I think that machine learning is the right path for that. I don't think that I've shut the door, though. At some point I want to do a bit of quantum computing again, but maybe take my ML knowledge to quantum systems in order to facilitate some better approaches there. But right now, quantum computing is very much at the hardware level, and I'm a software guy.

Merantix, Ventures, and ML consulting

Lukas:
Cool. Well, tell me about your work at Merantix. Maybe we could start with what Merantix is and what you work on there.
Johannes:
Yeah, sure. Merantix is a super cool construct, actually. We have two separate units: there's Merantix Momentum, and there's the Merantix venture studio, which is the overarching company. The studio is a venture studio that focuses on deep tech in Berlin. The idea is that we have pre-vetted industry cases, and we look for what we call entrepreneurs-in-residence who want to work on certain critical domains that we deem necessary in order to bring AI into broad adoption outside of just B2C businesses. The venture studio looks at those different use cases, seeds an entrepreneur-in-residence, gives them six months to a year to vet the use case, and then they build up their venture. Merantix Momentum is one of the special ventures because we are actually not an independent venture; we are a 100% subsidiary of the studio. We focus on use cases that are not big enough to build a venture around by themselves, but where companies still need help in certain domains. We try to focus on use cases of clients that have actual problems and see how we can apply ML techniques, ML deployment techniques, and MLOps to help those customers. A classic example is visual quality control for manufacturers. They have no IT stack, they have no IT system, but they have very hard visual quality control problems. So building a vision classifier based on a convolutional network just suggests itself. We build that for them, make sure that it's actually scalable, and then also help them put it into production close to the sensors. You can't build a standalone venture around it, but Merantix Momentum can actually do it. That's what we're here for. And so within that ecosystem-
Lukas:
-why do you think you can't build a venture around that? I mean, it seems like that'd be pretty useful to a lot of people.
Johannes:
I think the question is how quickly you gain significant market cap, right? I think eventually you can build a venture around this, but the adoption is not big enough yet to build your own venture around it. In a way, Merantix Momentum is the venture that can actually do that. Because, in that sense, we are a professional services department where we go in and say, "Hey, you have a problem. You want to have a one-off machine learning model. We can help you get there." That's what we're doing. So that's kind of the venture around that. But you wouldn't build a venture to just go out and do visual quality control for company X, Y, or Z.
Lukas:
How does it work? I mean, I would think that doing this kind of thing for customers would be very hard to scope, right? Because I feel like one of the challenges of machine learning is you don't really know in advance how well a particular application is going to work. And then downstream from that, it'd probably be hard for customers to estimate how much different levels of model quality would really impact their business. How do you make sure that a company is going to be happy at the end of one of these engagements, or do you just view it as sort of an experiment?
Johannes:
That's a really great question. I think that we are getting some traction on that. The key here is to work with customers early to understand their needs. We have very intense engagements before we start our work to make sure: "Is the use case actually solvable? How big is the challenge? What kind of data challenges do we meet? Which kind of approaches would we actually take?" We really take the customer on a journey before we say, "Now we start engaging." The way we approach this is a staged approach where we have individual workshops, which we call the AI Hub. It's a pre-study to the actual implementation engagement, so that the customer understands what can be achieved with which data and with what kind of effort. And then we start the implementation work. When the implementation work comes, of course, it's professional services. There's always a little bit of uncertainty and risk, but we've already mitigated the risk significantly. Often it comes out that some problems are not solvable, and then we go to a different type of model, which I'm actually working on.
Lukas:
What type of model is that? You work on unsolvable problems? Is that what I just heard you say?
Johannes:
Not unsolvable problems, but problems that you cannot just do in a client engagement, right? There's a different funding strategy — that also exists in the US to a certain extent, but much more so in Germany, in Europe — which is publicly funded research projects. The German state, or the federal government, is interested in solving certain types of problems that are industry-spanning, but they're too hard for a single company to work on alone because you have to bring many, many different domain experts together. So they fund consortium research, which is typically 4 to 10 partners, where you have application partners that bring their challenge problems and datasets with them. Then you have academic partners that bring in state-of-the-art academic research. And then you also have professional services companies like us, who really understand deployment models, deep tech industry applications, and "How do we make machine learning models robust?". And you engage in translational transfer research to apply the academic results to industry problems. Once you solve that, you have enough data to bring it to a client engagement in a B2B relationship.
Lukas:
Can you talk about some of the things you're working on, specifically?
Johannes:
Specifically? Yeah, we have a bunch of research projects going on with big automotive manufacturers in Germany. We're just about to finish a project on self-driving cars, autonomous vehicles. A very classic use case for Germany, I would say. Here, the idea really is that car manufacturers do not really understand all the details involved in building, for example, a segmentation model or an optical flow application. But they are very, very good at understanding functional safety requirements. So it's really about bringing those two domains together: they say, "We need self-driving cars, autonomous vehicles, but we don't know how to build the segmentation models. We need the domain expertise," and we say, "We know how to build those segmentation models, but we don't actually know what the safety-critical features are." How do we bring those together? That was a research project that we worked on.
Lukas:
Oh, that's cool. So you're doing segmentation on vision, basically, from vehicles?
Johannes:
Computer vision is one of them. We were investigating synthetic datasets, where you essentially have a rendered dataset in order to pre-train those models. Optical flow, bounding box detection, person detection. These are some classic models. We also have other research projects that go much more into optimization problems, where you need to understand what manufacturing pipelines actually look like. A cool example — I unfortunately cannot name the company — but imagine you have a critical element for building a car seat. There are metal bars. And these metal bars, funnily enough, go through like 50 different manufacturing steps. Sounds crazy, but it's actually true. Those 50 manufacturing steps are distributed over 10 different factories belonging to 5 different just-in-time partners.
Lukas:
Wow. Can you give me some examples of what these steps might be? It's hard to picture 50 steps in a metal bar.
Johannes:
The raw metal is formed into a rough rod. Then the first processing step brings it to the right rod. Then you chrome the rod. Then you start the first bending iteration. Then you re-chrome and refinish. Then the second bending, the next step, and so on until it's in the right shape. There are a lot of these steps. Yeah, I didn't know about that either. It's pretty crazy. What happens now is that in your manufacturing process, a mistake happens at step number 10. You don't notice that mistake until step number 15, when your metal bar is a little bit outside of specifications. Typically what happens is that you take this whole batch, put it in the scrap metal, and start from scratch. The challenge now is, "Can you do something at step number 20, maybe, to bring that rod back into specification?" So that at process steps 30, 40, 50, it's back within specification. You can imagine this is a very high-dimensional optimization problem with a very sparse reward signal. A classic optimization problem. That's the kind of research project that we're working on. And now the question is, what kinds of techniques from the field of ML can we use and transfer to those kinds of problems? And what kind of data do we actually need for that?
Lukas:
So what would be the choice here? What would you do differently at, say, step 20 that might make it useful in the end?
Johannes:
We have to find out what the levers are, right? There are different parts of the process: maybe you don't heat it up as much, or you over-bend it a little bit in one direction and re-bend it in the other direction. Maybe you do a refinishing at some point. These are all the levers that we have. We have to explore, "What is the actual problem?" And here you start to see that the devil's in the details. What are actually the defects that matter? It's a causal inference problem. It's a Bayesian learning problem. We don't know yet because we just started this project. I wish I knew the answer, but then I would have already published something around that.
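To make the "sparse reward, only a final measurement" framing concrete, here is a toy Python sketch with an entirely made-up forward model. The real problem has many levers and would call for methods like Bayesian optimization or reinforcement learning rather than the random search below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration only: a rod goes through a chain of process steps, an
# unnoticed defect is introduced at step 10, and we search for a corrective
# adjustment at step 20 that brings the final measurement back within spec.
N_STEPS, DEFECT_STEP, CONTROL_STEP = 50, 10, 20
SPEC_TARGET, SPEC_TOL = 100.0, 0.5            # hypothetical spec: 100 +/- 0.5 mm

def run_process(adjustment):
    """Simulate the chain of steps; only the final measurement is observed."""
    value = 90.0
    for step in range(N_STEPS):
        value += 0.2                           # nominal change at each step
        if step == DEFECT_STEP:
            value += 1.7                       # unnoticed defect
        if step == CONTROL_STEP:
            value += adjustment                # the lever we are allowed to pull
    return value

def reward(adjustment):
    """Sparse reward: 1 if the finished rod is in spec, 0 otherwise."""
    return float(abs(run_process(adjustment) - SPEC_TARGET) <= SPEC_TOL)

# Naive random search over the single control lever.
candidates = rng.uniform(-3.0, 3.0, size=200)
feasible = [a for a in candidates if reward(a) == 1.0]
print(f"{len(feasible)} of {len(candidates)} candidate adjustments bring the rod back in spec")
if feasible:
    print(f"example adjustment: {feasible[0]:+.2f}")
```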
Lukas:
Wow, so you're just working on a totally wide range of machine learning applications in the real world.
Johannes:
That's right.

Building a cloud-agnostic tech stack

Lukas:
You must be building a really interesting set of tools to make this possible. Can you talk about the stuff that you're building that works across all of these different applications?
Johannes:
Yeah, that's a super question, because I think that's one of the things we do extremely well, and we have a lot of fun doing it. Maybe let's step back a little bit, because one of the challenges we have — being in Europe — is that lots of companies here have very, very little trust in cloud deployments. You have to start with the customer and ask, "What happens here?" And one of the things that people are super afraid of is vendor lock-in. So we have to build a tool stack that really is cloud-agnostic. We can deploy on-prem, we can do it on GCP, AWS, Azure, you name it, whatever it is. That's the first prerequisite: we need to understand how to build a stack that's completely agnostic of the underlying cloud. In order to do that, we of course build on Terraform and Kubernetes. We make extensive use of those systems to automate a lot of deployment tasks. So, infrastructure as code. Now, once you start to go into all of these files, you get lost in them fairly quickly, because these configuration files become very, very complicated. So we started to build tools to automate how we write deployment files. We have an internal tool — which we, funnily enough, also call dev tool — that is essentially nothing more than very specifically pre-programmed template files for spinning up complete deployments automatically. So we are completely independent of the actual underlying cloud, because we can just spin up the templates of a full deployment cluster. On top of that, we can then start using all the other tools that we need in the clusters that we deploy. We typically rely heavily on Docker. We build a Docker image that we can then deploy in a pod that we manage using Kubernetes or Terraform. For the deployments, we use Seldon. We use Flyte pipelines to automate complete learning pipelines, and CI/CD in that loop is done with Flyte as well. Right now we still have Cloud Build in there, but we're already thinking about how to get that out of the loop. So we're trying to be really, really cloud-agnostic and build our stack and ecosystem on these modern ML tools.
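As a rough illustration of the templating idea (a hypothetical sketch, not the actual internal dev tool), a small Python script can render the same Kubernetes manifest for different clouds or an on-prem cluster by filling in a Jinja2 template; the registry and resource names below are made up.

```python
# Hypothetical sketch: render the same Kubernetes Deployment manifest for
# different targets from one Jinja2 template, so only cloud-specific values change.
from jinja2 import Template  # pip install jinja2

MANIFEST = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ name }}
spec:
  replicas: {{ replicas }}
  selector:
    matchLabels: {app: {{ name }}}
  template:
    metadata:
      labels: {app: {{ name }}}
    spec:
      containers:
        - name: {{ name }}
          image: {{ registry }}/{{ name }}:{{ tag }}   # registry differs per target
""")

# Per-target overrides; everything else stays identical across clouds / on-prem.
targets = {
    "gcp": {"registry": "eu.gcr.io/example-project"},
    "onprem": {"registry": "registry.internal.example.com"},
}

for target, overrides in targets.items():
    manifest = MANIFEST.render(name="defect-classifier", replicas=2, tag="1.4.0", **overrides)
    print(f"--- {target} ---\n{manifest}")
```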
Lukas:
Does this stack that...this stack, I guess you're deploying into a customer's production environment. Does this include training or is it just for running a model for the customer?
Johannes:
It really depends on what a customer actually wants. Right now we're targeting MLOps Level 2, I think that's what Google calls it. We are not quite there yet, so right now we still have a split: there's a manually triggered retraining, which we do internally using our stack in the cloud or on their on-premise system, and then there's a separate manual step to actually deploy it into production. We're doing both of them. We can do the deployment step and the retraining step using all of our infrastructure, and the target really doesn't matter, because we built it cloud-agnostic. We can, for example, do a retraining on our internal cloud, which for us is mostly GCP right now. But if the customer wants to have the model in their production stack, we train it on our cloud and then move it to their production stack on-prem.
Lukas:
What have you learned building these tools? I mean, it sounds like you're making the stuff, you're deploying it. There's many, you know, people trying to build these things. What have been the kind of lessons, actually, when these things get deployed into customers' systems?
Johannes:
That it's really, really hard to do.
Lukas:
Why is it hard? Cause it's...conceptually it's simple. What actually really makes it hard?
Johannes:
It's actually not that hard if customers are okay with using cloud deployments. I think what makes it hard is if they're using on-prem in their own stack, because then suddenly the tools are not yet at the point where you can just abstract away every kind of sysadmin work. You always have this touch point between "How is the hardware actually managed?" and "How can you deploy it?" As soon as you have a Kubernetes cluster installed on-premise, you're probably fine again. But until you get there, you cannot abstract that system away. And then you also get these realities of the business, where you sometimes have to deal with IoT devices. Deploying stuff onto IoT really is not there yet. I think the tools are falling short on that end, but I think it's just a matter of time until we have more tools that are ready for IoT deployments.
Lukas:
How do you think about monitoring the system in production? I'd imagine these things could be somewhat mission-critical, but I noticed you didn't really mention production monitoring. How do you think about that?
Johannes:
I think it's very important and we do it. We are not necessarily deploying extremely mission-critical systems right now, so that's what we haven't done yet. I think we're getting there soon. But right now, it's mostly just measuring uptime and making sure that the stack doesn't fold under load. So it's just standard production monitoring: Grafana, load testing, throughput measurements, and these kinds of things. Not necessarily decision-making and audit trails in that regard. So it's more like standard site reliability monitoring that can be automated fairly easily using Grafana or any other monitoring tool that you like.

The open source tooling ecosystem

Lukas:
Got it, got it. I thought you might want to talk about some of the tools that you've developed, like Squirrel, and Parrot, and Chameleon. Can you describe what these are?
Johannes:
Yeah, that's really cool. My personal favorite right now is Squirrel, just because we're about to launch it and release it out into the world, which is super fascinating. The goal here is that if you take a look at the ecosystem, we are very, very good at building ML models for training on single GPUs. But as soon as anybody tries to deal with multiple GPUs for the first time, you get into big problems. Many frameworks have come along that help you distribute a model, but nobody has really thought about, "How do you distribute the data?" There are not many frameworks out there. There are a few things we have looked at that are trying to solve that, and the ecosystem is getting bigger, but we've now decided we want to get to a place where we can really make data loading on distributed systems as easy as possible. It doesn't need to be only for deep learning; it can be for a lot of different things. And on top of that, also build in potential access control levels, right? Like, you want to pull one sample from this bucket, the next one from that bucket, the third one from this bucket, and make sure that you mix and match this very well. That's what Squirrel is really about: to make data access and data storage and data writing super, super simple. As simple as it can be, by just abstracting away the file system. It can be on a cloud, it can be local, it can just be pulled from the internet. And it should be easy to integrate in any kind of framework. That's really what we're doing here.
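To sketch the "pull from this bucket, then from that bucket" idea in plain Python (an illustration of the concept only, not Squirrel's actual API; the store names and samples are made up):

```python
import itertools
import random
from typing import Iterator, Sequence

# Generic sketch: each "store" can list shards and yield samples from a shard;
# the loader interleaves samples from several stores, which is also where
# per-store access control hooks could live.
class ListStore:
    def __init__(self, name: str, shards: Sequence[Sequence[dict]]):
        self.name, self.shards = name, shards

    def iter_shard(self, idx: int) -> Iterator[dict]:
        yield from self.shards[idx]

def interleaved_samples(stores: Sequence[ListStore], seed: int = 0) -> Iterator[dict]:
    """Round-robin over stores, shuffling the shard order within each store."""
    rng = random.Random(seed)
    iterators = []
    for store in stores:
        order = list(range(len(store.shards)))
        rng.shuffle(order)
        iterators.append(itertools.chain.from_iterable(store.iter_shard(i) for i in order))
    # keep pulling one sample from each store until all are exhausted
    for sample in itertools.chain.from_iterable(itertools.zip_longest(*iterators)):
        if sample is not None:
            yield sample

if __name__ == "__main__":
    cloud = ListStore("gs://example-bucket", [[{"src": "cloud", "i": i}] for i in range(4)])
    local = ListStore("/data/local", [[{"src": "local", "i": i}] for i in range(2)])
    for sample in interleaved_samples([cloud, local]):
        print(sample)
```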
Lukas:
And your plan is to make this open source?
Johannes:
The idea is to make this open source. Exactly.
Lukas:
Cool, cool. I guess, do you have a preference of other open source tooling? Do you guys kind of standardize on your ML framework and things like that? What's your set of tools that you would typically like to use?
Johannes:
I mean, we are of course also standardizing as much as we can. You can imagine, having many, many customers, you want to have standardized tools. Our standard framework is PyTorch. That's what we're using internally for training these models. We also use PyTorch Lightning a lot as an easy framework. And we're using Hydra — that's developed by Facebook — as an interface and an entry point into those systems.
Lukas:
Why did you pick PyTorch Lightning? What did you like about that?
Johannes:
I think the idea here is that it really abstracts away much of what ML training frameworks have to do. You write a data loader. You have an optimizer. You have a training loop and you have a logger. And when you look at typical GitHub repositories, everybody writes "for batch in dataloader, do all of these kinds of things". It's very repetitive code. Just abstract this away, do some software engineering so it's robust, and then you can go with that, right? It's especially important if you're doing production models, or you just have to retrain and you need to be stable on that. Software maintenance is, I think, one of the things that is not really valued in the academic ML community. Which comes as a surprise to me, because a field that comes out of engineering should value good code quality a little bit more, I feel. So we have to do it ourselves. So, use tools that make maintenance and debugging of machine learning models easier. Frameworks are the way to go for that, because you don't want to build it yourself if the community can help you maintain the systems.
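For readers who haven't used PyTorch Lightning, here is a minimal sketch of the pattern described above: the repetitive "for batch in dataloader" loop, optimizer stepping, and logging live in the Trainer, and you only fill in the model-specific pieces. The model and the random stand-in data are placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# Minimal sketch: the boilerplate training loop lives in the Trainer;
# we only define the model-specific parts.
class TinyClassifier(pl.LightningModule):
    def __init__(self, n_features: int = 16, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_classes))
        self.loss_fn = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self.net(x), y)
        self.log("train_loss", loss)          # logging is handled by the framework
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Placeholder data standing in for a real dataset.
x = torch.randn(512, 16)
y = torch.randint(0, 2, (512,))
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

trainer = pl.Trainer(max_epochs=2, accelerator="auto", logger=False)
trainer.fit(TinyClassifier(), train_dataloaders=loader)
```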
Lukas:
Do you also use PyTorch to run the models in production? I know some people will kind of change the format or do something to the model before it's deployed. Do you just load up the model as serialized from PyTorch or do you do anything special there?
Johannes:
No, we typically deserialize it from PyTorch directly, because right now our mode is to ship Docker images around the world. I think eventually we probably — for certain applications — need to go to a more standardized format, like ONNX or something like that. That will potentially change the game. But right now we are still shipping the binary in Docker.
Lukas:
Where do you see gaps in the tooling right now? As someone that likes to make and sell ML tools. What parts of the stack feel mature and what parts feel broken?
Johannes:
What feels broken to me is that you have to plug many systems into many systems. That feels a little bit sad, because it sometimes makes it really hard to stay abreast of the cutting edge. I don't think that there's anything lacking in the community right now. I feel more like the problem is that too many people are building too many tools instead of coming together, taking one tool, and bringing it to the next level. What happens then is that people try to be different from others instead of making one tool that solves a lot of problems. A counterexample where this worked really well is the data science world, right? You just need two or three libraries in the data science world, which are scikit-learn, numpy, and pandas. And you're set. If you're going into [the] MLOps domain, I don't know how many tools [are] out there. You probably know better than me. I just sometimes wonder why.
Lukas:
Yeah, that's fair. I mean, I definitely think there's always a moment where there's an explosion of ideas and tools and then things start to standardize for sure. And I think we're still at that explosion stage.
Johannes:
I think so.
Lukas:
That's what makes it interesting to be in this world right now.
Johannes:
I agree. I think there are a lot of abstractions we haven't figured out. Like, for example, deployment to IoT. But something I'm super curious about, and that I haven't seen much development on until recently, is, "How do you deploy models in heterogeneous environments? How do you train in heterogeneous environments?" I think there is still a lot of ML tooling that needs to get better. Not everybody has a huge data center of homogeneous hardware. So how do we deploy models or train models on heterogeneous hardware?

Handing off models to customers

Lukas:
I guess another question I have is, how do you hand off these models to a customer? You give them a Docker, but if they want to keep iterating on a model — once they've taken it from you — are they able to do that? How do you think about that? Because it does sort of feel like machine learning projects are never really complete, if you know what I mean.
Johannes:
Yeah, no, I understand what you're saying. It depends on the customer. I don't think there's one rule that fits all. Some customers just come back and say, "Hey, we need retraining, or we need a refresh. Can you do that for us?" Because they don't have an IT department. Some people want to jumpstart their IT department. They say, "Okay, we know machine learning is the future. We don't have an IT department yet, but maybe we engage with you and you help us jumpstart the engine," right? And then they continue on that path. It's always a conversation, of course, because it's also tricky for us to say, "Hey, we're offering our expertise, we put in a lot of sweat, tears, and blood, and then you take it to the next level." That's always a bit sad as well. So it's just always a tricky conversation. But we're happy to help people. And I ultimately think that everyone benefits if the community just grows.

The impact of NLP models on the real world

Lukas:
I guess another question I wanted to ask you about is, you've written a few thought pieces on AI. I don't know if you have a favorite, but I think one interesting one was your writing on the impact of NLP models on the real world. If you could summarize for people who haven't read it? My perspective is that, in a way, the NLP field seems to be doing a whole bunch of very amazing things. And I know people argue about, "Is this real intelligence or not?" or like, you know, "How much does it really matter?" But I guess from my perspective, as a technologist and enthusiast, I kind of can't believe how good text generation has got, in some sense. And yet I think the impact to me is smaller than I would've imagined from how impressive the demos look. I don't know how you feel about that.
Johannes:
No, I see your point, and I think that's exactly the reason why I like working where I am. Because it's right in the middle of driving the adoption of modern AI techniques. I think the reason why you feel the impact is not as big as it could or should have been is that it's really, really hard to bring technology like that to people who are not technologists like us. That's really the challenge here. You have to bridge that gap. There is this early adopter gap that needs to be bridged, and we are not there yet. I'm also with you. I don't really want to get into this philosophical debate. Is it intelligent? Is it conscious? Whatever it is, it's useful technology. Let's bring it to the people and have them have a better life with it, right? Let's solve some problems with that. That's maybe the philosophical side. The practical side is, if you take those big models, you'll run into a problem. You already need compute power. You need infrastructure. You need MLOps. You need a whole department to actually make use of those models. Not many people have that, especially the companies it's most useful for. Take, for example, news outlets or media outlets. They are completely focused on a very different problem. They don't have technologists that can just take a GPT-2 or even a GPT-3-sized model, put it into production, and then figure out the use cases, right? That's just not how the economics of these companies work. Bringing it to those people is just really hard. That, I think, is the reason why we don't see that impact yet. It's going to come, but it's still going to take a few years.
Lukas:
What do you think are the next things that we're going to notice — just as consumers — from the impact of these more powerful NLP models?
Johannes:
I do think that a lot of what will come is improvements in search. I think the signals that we get from similarity clustering are significant, and we just need to figure out how to adopt that in the real world. If you just run GPT-3-sized models, the search is slow, so we need to make some improvements there. But I do think we'll see a re-ranking on that front. I also think a lot of automation will happen around automated text generation, and that's a positive thing. I don't know how much time you spend on emails. I certainly do a lot, and you probably do too. It would be nice to just automate some of that stuff away. I also talked to several customers in Germany that have this funky problem in the logistics space. Logistics is a very old-school domain where you get very free-form order forms. There are armadas of people that do nothing other than take those free-form, free-floating emails and turn them into structured text by manually copy-pasting into structured fields. Sounds easy. It's not. It's a very, very hard NLP task. Once we bring these big models into that realm, I think there will be a lot of automation for the better. I do think there's a lot of potential. I'm very excited about the future of those models.
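As a minimal sketch of the embedding-and-similarity idea behind improved search (an illustration only; the documents, query, and model choice are assumptions, not anything from the episode):

```python
# Embed documents and a query with a small pretrained sentence-embedding model,
# then rank documents by cosine similarity to the query.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

documents = [
    "Invoice for 200 chrome-plated rods, delivery next Tuesday.",
    "Meeting notes from the quarterly planning session.",
    "Order request: 50 car seat frames, specification attached.",
]
query = "incoming purchase orders"

model = SentenceTransformer("all-MiniLM-L6-v2")        # small, widely used model
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ query_vec                          # cosine similarity (vectors are normalized)
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```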

Thoughts on AI and regulation

Lukas:
Cool. You also wrote an article on AI and regulation I wanted to touch on. I'm curious your perspective on regulation. I mean, obviously it's coming, but I'd be interested to know what you think about it. Like what good regulation would look like.
Johannes:
If I only knew, right? That's a good discussion. I think, being in Europe, one of the things I needed to learn is, "How can you use regulation in order to build the value systems of a society into your AI deployments?" And that can be a good thing. I think the regulation needs to address the reality of AI being an experimental technology, and we need to deal with these uncertainties, but also make sure that we are not opening the door to extreme abuses, and give people and consumers the right to protest. How exactly to build that regulation? I don't know. What I appreciate about the regulatory frameworks that we have in the EU is that we are more willing to iterate on regulations, which is good. We make a draft, we see how it works in practice. Some things work, some things don't work. We try to adjust. A classic example: GDPR and the cookie banners. I don't know how many cookies you have to click away. It's really annoying, and people got it. And now we're trying to figure out how to build the regulation so that we don't have to do this anymore. But it takes time, and I think it's a process. As a technologist, you're actually building software for humans, right? You don't build technology for its own sake. You build in order to make something better, to do something better. To make somebody's life better.
Lukas:
I guess, specifically, what's a regulation that you would like to see happen?
Johannes:
What I would like to see happen is to allow ML models to have a sandbox environment where you can say, "I can do tests on real-world scenarios, where the model can collect data in the real world, within a given risk frame." And then you can get risk certifications that go up over time. Where it says, "Okay, I did my first test that was an exposure of — I don't know — a million dollars in risk." Just an arbitrary number, don't take it as a fixed price. A certifier says, "Okay, that's great. Now we can go to the next iteration phase." And then you build up this risk profile, where a certifier is willing to back you up on insurance for a given risk factor. Because only then can you actually use these experimental technologies out in the real world. Right now, our hands are often tied, right? By data privacy issues, by copyright issues, by security concerns. The regulatory uncertainties around that — especially for a startup that builds ML — are really, really high. I would like to see protected environments where you are allowed to test things within a certain box. I think that would be a good regulation, because consumers can slowly gather trust and can see what the technology can do in the real world. You start to see curiosity, and you have it under control to a certain extent, because if the company does something wrong, it's going to get penalized, and that's bad for the company. That would be a good regulation I would like to see, in this form or another.
Lukas:
I saw you also wrote on ML and environmental impact, and that's something I care about a lot and have looked at. What are your thoughts there? Do you feel like people should be finding ways to train models with less compute? How do you reconcile the fact that you're also doing model training in your day-to-day job?
Johannes:
It's a complicated question. On the one hand, big ML models are really powerful and important. On the other hand, you need to make sure that you're not burning up the planet with them, right? My stance on this is, "Let's reduce those models as much as we can." Fine-tuning, zero-shot learning. Once you shrink them, and having really invested that money, let's make sure that this cost — the carbon footprint and the monetary cost — amortizes. That's what we're currently seeing, right? There's a lot of interest in training these big models and pre-training them, because they fine-tune very well. I just feel like there are too many people who want to build them from scratch instead of figuring out what we can do with the existing ones. I hope to see that change a little bit. That's my take on it. It's not just "shun it", but also "let's be conscious about it".
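A minimal sketch of the "reuse a pretrained model instead of training from scratch" point (the task, class count, and data below are placeholders, not from the episode): load a pretrained backbone, freeze it, and fine-tune only a small head.

```python
import torch
from torch import nn
from torchvision import models

NUM_CLASSES = 3  # e.g. "ok", "scratch", "dent" in a hypothetical quality-control setting

# Pretrained backbone, frozen; only the new head is trained.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # new, trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative step on random stand-in data; a real loop would iterate a DataLoader.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
print(f"fine-tuning step, loss = {loss.item():.3f}")
```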

Statistical physics and optimization problems

Lukas:
Makes sense. We always end with two questions, and the second-to-last question that we always end with is, what's a topic in machine learning that you think is understudied? What's something that — if you had more time — you would love to look into more deeply?
Johannes:
If I had more time, I would probably put on my physicist hat again and try to understand a lot of the optimization problems within machine learning. There's a whole field that is just ripe for discovery, which is the combination of loss landscapes and optimization problems in deep learning models and their connection to statistical physics. I think there are really, really valuable lessons there. It can help statistical physicists understand certain things better, but statistical physics can probably also help the ML community understand much better what's actually happening under the hood. I would love to contribute to this much more, but it's very far away from my everyday work.
Lukas:
You know, I've seen papers on this topic and I always find them impenetrable, because I think I don't have the background in physics that people are assuming. Can you describe a little bit of what this says to someone like me, who maybe knows some of the math and is interested, but doesn't quite follow? Is there an interesting result that you could point to from this analogy?
Johannes:
Physicists typically think in terms of what we call a phase diagram. The classic phase diagram is the different states of water. You have vapor, water, and ice. Similar effects happen in all kinds of other physical materials. One of the funny things you can see is these kinds of phase transitions, where you go from one phase to another phase, like from liquid to vapor. These kinds of transitions also happen in the optimization landscapes of machine learning problems. For example, when you tune the number of parameters in the model, you go from the model not being able to optimize at all to the model suddenly optimizing perfectly. People describe this as a spin glass to jamming transition. It's a very technical term, but it essentially means going from an almost quasi-frozen state to something that is just very, very viscous. These are very different physical properties, and you can see them in machine learning models. These are early indications that you can use the kinds of methods and tools that we developed in statistical physics to understand the dynamics that happen in machine learning models. Ultimately, I think this will also help us train these models much better at a much cheaper cost.
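A toy experiment that hints at the kind of transition described above (this is only an illustration of the under- to over-parameterized threshold, not a spin-glass analysis): train MLPs of increasing width to memorize a fixed set of random labels and watch the final training loss collapse once the model has enough parameters.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))                      # random labels: a pure memorization task

def final_train_loss(width: int, steps: int = 2000) -> float:
    """Train a two-layer MLP of the given hidden width and return its final training loss."""
    model = nn.Sequential(nn.Linear(20, width), nn.ReLU(), nn.Linear(width, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for width in (1, 2, 4, 8, 16, 64, 256):
    print(f"hidden width {width:>3}: final training loss {final_train_loss(width):.4f}")
```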

The challenges of getting high-quality data

Lukas:
Cool. Well, on a much more practical note, when you think about all the models that you've trained and put into production, what's the hardest piece of that, at this moment? What is the biggest challenge from a customer wanting a model to do a particular thing to that thing deployed and working inside of their infrastructure?
Johannes:
I think actually getting high-quality data is really hard. That's where the customer comes in, and you need to pick them up at that point and tell them it's not just "data in and model out"; you need high-quality data. We did a project on semantic segmentation of very fine, detailed defects on huge metal surfaces. These are tiny scratches. You have maybe 5 or 6 pixels on a 10000 x 1000 pixel image. And you need to find a loss function for that. These images are recorded from various different angles and labeled by different people. So on some images there's a scratch, and on some images there is not. Same piece of metal, but on one you see the scratch and on another you don't. Helping people understand how to label data, how to bring the data to a quality where the model can actually pick something up, that's really the complicated part. I think that's an understudied problem.
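One common way to handle this kind of extreme foreground/background imbalance in segmentation is a soft Dice loss combined with pixel-wise BCE; the sketch below is a generic illustration, not the loss actually used on that project.

```python
import torch
from torch import nn

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """logits, target: (batch, 1, H, W); target holds {0, 1} scratch masks."""
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(1, 2, 3))
    denominator = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * intersection + eps) / (denominator + eps)
    return 1 - dice.mean()

bce = nn.BCEWithLogitsLoss()

def combined_loss(logits, target, dice_weight: float = 0.5):
    # Dice handles the tiny-foreground imbalance; BCE keeps per-pixel gradients informative.
    return dice_weight * soft_dice_loss(logits, target) + (1 - dice_weight) * bce(logits, target)

# Tiny example: a 64x64 crop where only 6 pixels are "scratch".
target = torch.zeros(1, 1, 64, 64)
target[0, 0, 30, 20:26] = 1.0
logits = torch.randn(1, 1, 64, 64, requires_grad=True)
loss = combined_loss(logits, target)
loss.backward()
print(f"combined loss on a random prediction: {loss.item():.3f}")
```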
Lukas:
How did you actually get the data labeled in this case? I do have some experience with data labeling.
Johannes:
Essentially having an armada of people use the labeling tool, teaching them what to label for, and having a big feedback loop.
Lukas:
Did you build a custom tool for this? To find the scratches.
Johannes:
Yeah, we used open source software — I don't know actually which piece we used — and then just adjusted it for that use-case in order to make this quick and fast.

Outro

Lukas:
Awesome. Well, thank you so much. This was really fun and so many different insights. I love it. Thank you.
Johannes:
Yeah. Thank you.
Lukas:
If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So check it out.
