AMA with Piero Molino

Piero answers questions about Ludwig, a code-free deep learning toolbox. Made by Angelica Pan using Weights & Biases


Welcome Piero! Super glad to have you here to talk about your work and your field. According to Jeremy Howard, fast.ai's mission to "make neural nets uncool again" will only be complete once neural nets are fully "no code". How do you see their mission and approach aligning, or not, with yours?
I don't think coolness or uncoolness is the right gradient to follow. There have been many technologies in the history of computer science that went from being handcrafted only by experts to being used by many (networking, databases, the web, and many others), and I think it will be the same for machine learning.

One of the GPT-3 demos that got folks most excited was a no-code web app maker. Do you think that natural language interfaces for no-code approaches to ML/AI/IA hold promise? If so, what do you think the timeline is for a viable product in that direction: 2, 5, or 10 years?
OpenAI is selling access to GPT-3 and there are customers paying for it, so it is already a product. Only time will tell how successful it will be as a product, as the cost of running it is likely non-negligible, but there is more to the success of a product than its technical merits.

Q: Charles
In the new release, you've added Transformers, which have taken NLP by storm in the last few years. The paper An Image is Worth 16 x 16 Words, under review at ICLR, suggests they might do the same to computer vision in the next few. Do you think that the future of DL is fewer and fewer architectures making use of simpler and simpler computations, as suggested by The Bitter Lesson? Is that phenomenon part of your long-term planning for automated deep learning?
Q (followup): Sayak Paul
Since Transformers are freer from many inductive priors (think about architectural design choices), I definitely see this way of approaching DL architectures becoming more and more relevant. Credit to Yannic for introducing the notion of inductive-prior-free architectures in his video on 16x16 words.
A: Piero
1) I don't think that paper does anything special; transformers and attention have been used in computer vision for about 3 years. 2) I'm not sure there is going to be any convergence. Practical machine learning is always about finding the right amount of bias for learning something, so the issue is not "let's find a single architecture that can learn everything", because we already know that the MLP is that architecture. The issue is "given a certain computational budget, what is the best model I can learn?", and right now at least, transformers are not always the best answer for all data regimes. 3) Answering Sayak: I believe 2 already answers your point too, but let me add that without biases models can't learn anything useful, so we need to find the right ones (or, in the NAS sense, we may want to create a mechanism so that the right ones can be discovered).

Siddhi Vianyak Tripath:
Hey Piero, why did you go with CLI as the main piece of the toolkit? Is it designed for any specific workflow where there are clear advantages of using CLI instead of a high-level programmatic API?
That is not accurate. Ludwig has a programmatic API, and in v0.3 we did work to make it the centerpiece of the codebase. Before, each command script had its own implementation, in some cases with quite some code duplication, while now in v0.3 they all use the programmatic API under the hood. In most cases that means their implementation was standardized and went back to being 2-3 lines of code.
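The pattern described in this answer, a command-line script that only parses arguments and then delegates to a programmatic API, can be sketched as follows. This is a toy illustration and not Ludwig's actual code: `ToyModel`, `train_cli`, and the `--output_feature` and `--rows` flags are hypothetical names invented for the example.

```python
import argparse


class ToyModel:
    """Hypothetical stand-in for a programmatic model API (not Ludwig's real class)."""

    def __init__(self, config):
        self.config = config
        self.trained = False

    def train(self, dataset):
        # A real implementation would preprocess the data and fit a model here.
        self.trained = True
        return {"rows_seen": len(dataset)}


def train_cli(argv):
    """Command-line entry point: parse arguments, then delegate to the API."""
    parser = argparse.ArgumentParser(prog="toy-train")
    parser.add_argument("--output_feature", required=True)
    parser.add_argument("--rows", type=int, default=3)
    args = parser.parse_args(argv)

    # The command body itself stays at two or three lines.
    model = ToyModel(config={"output_feature": args.output_feature})
    dataset = [{"id": i} for i in range(args.rows)]
    return model.train(dataset)


stats = train_cli(["--output_feature", "label", "--rows", "5"])
print(stats)
```

Because every command goes through the same class, a fix or a new option in the API is picked up by the command line without duplicating code.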

Amritesh Khare:
Hey Piero, what is the target user base for Ludwig? To me, it seems like a toolkit that lets developers automate the task of choosing the best architecture for a given task, based on the input/output features. The abstraction provided lets you focus on the task/results without worrying about going through all the research to choose the right model/encoder/combiner.
The audience is bored data scientists / ML people who don't want to rewrite the same code for every new project they work on, so the audience is me. I built the tool to make my own life easier, but I discovered that many other people were as bored as I was of writing data preprocessing, postprocessing, a training loop, visualizations, glue code, and so on.

Eric Schles:
What do you see as the future of auto-ml? What is the biggest channel in auto-ml today? Should auto-ml always be used to set a baseline or do you see a future where feature engineering is part of the auto-ml pipeline in a robust way? Are there any future plans to extend passed TensorFlow in the Ludwig API? Are there any plans to incorporate a larger suite of models passed neural networks?
1) Putting ML in the hands of non-programmers, for real (which most automl solutions today don't do yet). 2) Not sure what you mean by channel. If you mean application, I would say those applications where squeezing out a little bit of performance has huge value (advertising?), but this may change quickly in the future (see the first point). 3) I imagine a future where feature engineering is not needed. 4) Not sure what you mean by "passed", but yes, the plan is to add more architectures to choose from. For that, I need help from the community, as there are new models every day and keeping up with all of them is difficult. Ludwig provides a really simple way to add new architectures, even more so with v0.3.

Krisha Mehta:
Hey Piero. What was your motivation behind developing Ludwig? Do you think we will be slowly moving to complete auto-ml space in the future?
See my answer to Amritesh, but in short, I wanted to avoid doing repetitive tasks, so I automated them so that I can focus on the fun parts of the job (writing models, running experiments, analyzing results). No, I don't think we will be moving towards entirely automl but I believe there will be a shift and fewer people will write their own models, like today for databases, not many people write their own indexing algorithms.

Yash Kotadia:
Hi Piero, why did you go with FastAPI for model serving? Did you also consider other alternatives?
I did some tests and it was the fastest I could find (plus it has additional very cool features with types). If you think there are better alternatives, let me know, as there's a lot I don't know on this topic.

Boris Dayma:
Hi Piero, what is the main focus for the future development of Ludwig?
There are a lot of things we can do: we have a backlog of feature requests and a lot of stuff we want to do. Other than adding more architectures and pre-trained models, we want to improve the preprocessing component so that it can work with non-local data, we want to make it seamless to do operations in parallel (for instance, we are considering a Ray integration), and we are also considering adding additional backends.

Hello, Piero! How are you doing? What makes you really excited about the power of Ludwig now, and what makes you really pumped about the potential, and the power of code-free deep learning in the future (however long that is)? Thanks!
Hi Ivan! I think there are at least 3 things I really like about Ludwig. One is the speed at which you can go from 0 to a workable solution. This in turn enables a much faster feedback cycle, and that's the key to unlocking real improvement in applications imho. The second aspect is the fact that ideally, more people can use DL models thanks to it, and this again can unlock unexpected usages and new developments that are usually pretty cool. Finally, I think it provides a solid structure people can add to and expand upon, both in terms of contribution and in terms of building on top: "now that we don't have to write code for building a model for task X, what can we code on top of it?" I forgot one more thing I'm excited about: multimodal and multitask models, which become very easy to obtain with Ludwig. Most other tools constrain you with their notion of the task; Ludwig tries to avoid that.

Hi Piero! Thank you for this #AMA! With the recent advancements in the field of auto-ml, what should we consider as a good baseline? A common notion is to build a non-deep learning-based solution to get a baseline and then use deep learning to push the numbers. Where do you see auto-ml in this paradigm?
That's usually a good idea, but part of the reason is that you shouldn't need to spend a lot of time building the baseline (although baselines deserve love). With automl, even what was considered a complex model is very fast to obtain, so maybe this can change. Also, what is considered a simple model and a good baseline is always shifting.

Ayush Thakur:
Hi Piero, thank you for your time. What's the spectrum of areas to which auto-ml can be applied? My head can easily imagine auto-ml applied to standard tasks with datasets built in a standard way. What kind of flexibility does auto-ml offer now?
It depends if you talk about automl in general or if you talk about Ludwig. In most automl tools you are constrained by the tasks that the designer decided for you. In Ludwig there's no notion of the task, tasks are implicitly defined by your choice of inputs and outputs. I think it's one of the main interesting things about Ludwig.
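As a rough illustration of tasks being defined only by the choice of inputs and outputs, the sketch below writes three configurations as plain Python dictionaries. The `input_features`/`output_features` layout and the type names follow Ludwig's configuration format as far as we know it; the column names and the `describe` helper are hypothetical and exist only for this example.

```python
# Column names ("review", "sentiment", ...) are hypothetical; the
# "input_features"/"output_features" layout mirrors Ludwig's configuration format.
text_classifier = {
    "input_features": [{"name": "review", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

image_captioner = {
    "input_features": [{"name": "image_path", "type": "image"}],
    "output_features": [{"name": "caption", "type": "text"}],
}

multimodal_multitask = {
    "input_features": [
        {"name": "image_path", "type": "image"},
        {"name": "description", "type": "text"},
    ],
    "output_features": [
        {"name": "product_class", "type": "category"},
        {"name": "in_stock", "type": "binary"},
    ],
}


def describe(config):
    """Hypothetical helper: summarize a config as 'inputs -> outputs'."""
    inputs = " + ".join(f["type"] for f in config["input_features"])
    outputs = " + ".join(f["type"] for f in config["output_features"])
    return f"{inputs} -> {outputs}"


for cfg in (text_classifier, image_captioner, multimodal_multitask):
    print(describe(cfg))
```

None of the dictionaries names a task such as "classification" or "captioning"; the task follows from the feature types, and adding a second input or output turns the same structure into a multimodal or multitask model.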

Hi Piero. Given PyTorch's growing popularity, are there any plans to implement a PyTorch backend for Ludwig?
We are considering it. But there are also some downsides to it: 1) from the Ludwig user perspective, it doesn't really make a difference; 2) the situation is ever-changing, JAX is gaining popularity too, and other frameworks may as well in the future, so I prefer spending time adding interesting features rather than always rewriting the same stuff on top of different libraries; 3) the core development team of Ludwig is very small (currently me + 2 other people who helped with this release, Travis and Jim), so we have to be careful about what we prioritize, and adding a backend may end up meaning double the work. That said, we are actively discussing the possibility, and opinions from the community are more than welcome.

Kyle Goyette:
Hello Piero, are there any plans to integrate reinforcement learning algorithms into Ludwig?
Not at the moment. In practice, though, you could already do some forms of reinforcement learning with Ludwig by writing some minimal code that manages the outer loop of interacting with an environment and collecting data; then you can use Ludwig for training an imitation learning model. I can see how the specific losses and model architectures that people use in RL could be added to Ludwig. It's just not at the top of our backlog right now (but we would welcome collaborations and contributions).
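The "outer loop" mentioned in this answer could look roughly like the sketch below: run episodes in an environment, log one (observation, action) row per step, and write the rows to a table that a supervised tool can train on. Everything here is a hypothetical toy (`CorridorEnv`, `expert_policy`, `collect_demonstrations`); it assumes a scripted expert provides the actions to imitate and does not use any Ludwig API.

```python
import csv
import io


class CorridorEnv:
    """Hypothetical toy environment: walk along a line until reaching the goal."""

    def __init__(self, start, goal):
        self.position = start
        self.goal = goal

    def step(self, action):
        self.position += 1 if action == "right" else -1
        return self.position, self.position == self.goal


def expert_policy(position, goal):
    """Scripted demonstrator that always moves toward the goal."""
    return "right" if position < goal else "left"


def collect_demonstrations(starts, goal, max_steps=20):
    """Outer loop: run episodes and log one (observation, action) row per step."""
    rows = []
    for start in starts:
        env = CorridorEnv(start, goal)
        position, done = start, start == goal
        steps = 0
        while not done and steps < max_steps:
            action = expert_policy(position, goal)
            rows.append({"position": position, "goal": goal, "action": action})
            position, done = env.step(action)
            steps += 1
    return rows


rows = collect_demonstrations(starts=[0, 7, 3], goal=4)

# Serialize to CSV, a tabular format that a supervised training tool can consume.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["position", "goal", "action"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With `position` and `goal` treated as input columns and `action` as the output column, imitating the expert becomes an ordinary supervised learning problem on this table.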

Veronica Jung Yeon Kim:
Hi Piero! What kind of insights can we draw from an auto-ml process being applied on a dataset? And what's the interpretability of models generated by auto-ml? Suppose I am working with an X-ray dataset and use auto-ml to train a model for it, will it be able to explain why the model it picked was picked, and why the other models were a bad fit? And to extend this question further, is auto-ml applicable to specialized data such as X-ray or satellite images? Thanks a lot!
Automl doesn't do anything magical, it just figures out which model is better among a set of possibilities (hopefully doing something smarter than trying them all). So the interpretability of the model obtained through automl is the same interpretability you would get by training that same model yourself. And the same is true for X-ray data or any other kind of data. That said, if you have a task that requires interpretability, you can either constrain the automl process to use only models that you know how to interpret, or you can use tools on top of the models that try to give you interpretations, like LIME or SHAP.
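To make "figuring out which model is better among a set of possibilities" concrete, here is a minimal sketch of random search, one simple alternative to trying every candidate. The search space, the `validation_loss` formula, and the function names are all invented for illustration; real automl tools typically train actual models and often use smarter strategies such as Bayesian optimization.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "encoder": ["rnn", "cnn", "transformer"],
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_size": [64, 128, 256],
}


def validation_loss(candidate):
    """Toy stand-in for 'train this model and measure its validation loss'."""
    penalty = {"rnn": 0.30, "cnn": 0.20, "transformer": 0.25}[candidate["encoder"]]
    lr_term = abs(candidate["learning_rate"] - 0.01)
    size_term = 0.1 * 64 / candidate["hidden_size"]
    return penalty + lr_term + size_term


def all_candidates(space):
    """Enumerate every combination in the search space (the exhaustive grid)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_search(space, budget, seed=0):
    """Evaluate only `budget` randomly sampled candidates and keep the best."""
    rng = random.Random(seed)
    sampled = rng.sample(list(all_candidates(space)), budget)
    return min(sampled, key=validation_loss)


grid_size = len(list(all_candidates(search_space)))
best = random_search(search_space, budget=8)
print(f"evaluated 8 of {grid_size} candidates; best: {best}")
```

Whatever the search returns is just one of the candidate configurations, so it can be inspected or explained with the same tools as a model chosen by hand.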

Hi Piero, How much of an engineering challenge was porting the backend from TF1 to TF2? How much more work would it take to add a new backend such as PyTorch?
It was not challenging in the sense of being difficult, but it was a lot of work, more than I originally expected. At the same time, it allowed for reworking the internals in a more modular and more testable way (tf1, with all the global state and non-inspectable graph execution, was problematic in this sense). Now adding a PyTorch backend would be way simpler: since most of the architectures are implemented as an object with an init and a call function, it would be straightforward to write a forward function instead and change calls from tf.something to torch.something. Plus obviously some other adaptation, but it would be much, much less work than porting from tf1 to tf2.