Updated: Feb 19
Robin.ly Exclusive Interview at NeurIPS 2019
Dr. Peter Richtarik is a professor at the King Abdullah University of Science and Technology primarily working in the area of big data optimization and machine learning.
He is known for his work on randomized coordinate descent algorithms and stochastic gradient descent. He is also a co-inventor of federated learning, an approach to distributed machine learning developed with Google to improve communication efficiency as well as data privacy. He has 12 papers accepted at NeurIPS 2019.
In the interview with Robin.ly, he shared deep insights on the importance of data privacy in the AI era and the current limitations. He also reviewed the development of machine learning and deep learning in the past decade and discussed some of the key trends and challenges in the coming 10 years.
Robin.ly is a content platform dedicated to helping engineers and researchers develop leadership, entrepreneurship, and AI insights to scale their impacts in the new tech era.
NeurIPS 2019 Paper Highlights: Randomized Second-order Methods
Margaret Laffan: I'm here with Peter Richtarik, a professor at the King Abdullah University of Science and Technology. Peter, welcome, and thank you for joining us this morning.
Peter Richtarik: My pleasure. Thanks.
Margaret Laffan: Your research group has 12 papers that have been accepted by NeurIPS this year. Can you describe one of those papers and share some of the highlights that impressed you most?
Peter Richtarik: It's difficult to pick just one of these papers. There's some work on reinforcement learning which, from the viewpoint of optimization, is zeroth-order optimization: optimization without gradients, where you only get reward function evaluations and try to optimize from those. There's some work on gradient-type methods. But maybe what I'd like to talk about is randomized second-order methods.
Machine learning models are normally trained with what are called "stochastic first-order methods". There's a very good reason for this: they're scalable, they're very fast, and they scale to big data and big models very well. But they have one huge difficulty, one big issue: they depend on the so-called "conditioning" of the problem. If the data is bad, the methods are very slow; if the data is good, the methods are very good.
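The effect of conditioning he describes can be seen on even a toy quadratic. The sketch below (illustrative only; the matrices and step count are made up) runs plain gradient descent on f(x) = ½ xᵀAx for a well-conditioned and an ill-conditioned A:

```python
import numpy as np

def gradient_descent(A, x0, steps):
    # Optimal constant step size for a quadratic: 2 / (mu + L),
    # where mu and L are the extreme eigenvalues of A.
    eigs = np.linalg.eigvalsh(A)
    step = 2.0 / (eigs[0] + eigs[-1])
    x = x0.copy()
    for _ in range(steps):
        x -= step * (A @ x)   # gradient of 0.5 * x^T A x is A x
    return np.linalg.norm(x)  # distance to the minimizer x* = 0

x0 = np.array([1.0, 1.0])
well = np.diag([1.0, 2.0])    # condition number 2
ill = np.diag([1.0, 1000.0])  # condition number 1000

print(gradient_descent(well, x0, 50))  # essentially zero: fast
print(gradient_descent(ill, x0, 50))   # still far away: slow
```

After 50 steps the well-conditioned problem is solved to machine precision, while the ill-conditioned one has barely moved: the rate degrades with the condition number.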
So in order to make these first-order methods as fast as possible, we try all kinds of tricks to reduce the effect of this bad conditioning. We may use line search, we may use something like importance sampling, we may use mini-batching, we may use adaptivity, which collects some curvature information.
There are all kinds of tricks we try to employ. And there's one trick that everybody knows would work: Newton-type approaches. But it only works in theory; in practice, nobody was able to make this work really well. We have one breakthrough paper, I think, which I hope will in the future be the basis of something that works very well in practice. At the moment it is theoretical: we developed the first randomized Newton method with certain properties that previous attempts didn't have. I don't know how technical I can or should be, but one of the properties is that we can look at a single data point at a time, which is the norm when training machine learning models, or at a small number of data points, a so-called "mini-batch", compute only the first- and second-order information coming from the model and those data points, and still get a rate for training which is independent of any condition number.
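To see why Newton-type approaches are the trick "everybody knows would work", here is a minimal sketch (not the paper's algorithm, just the classical full Newton step on the same toy quadratic): one step reaches the minimizer regardless of conditioning, which is the property randomized Newton methods aim to retain while touching only a mini-batch of data at a time.

```python
import numpy as np

def newton_step(A, x):
    grad = A @ x                         # gradient of 0.5 * x^T A x
    return x - np.linalg.solve(A, grad)  # Hessian of the quadratic is A

ill = np.diag([1.0, 1000.0])  # condition number 1000
x = np.array([1.0, 1.0])
print(np.linalg.norm(newton_step(ill, x)))  # minimizer reached in one step
```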
Math and Computer Science
Margaret Laffan: As a researcher with a strong math background, what is most interesting to you about the intersection of math and computer science?
Peter Richtarik: In some sense, there are parts of computer science that are a subset of mathematics, the so-called "theoretical computer science", but computer science is a wider field. It includes ideas coming from engineering and other fields as well.
For me, what is really interesting is the search for truth. I like mathematical truth. When you can prove something, there is a theorem which describes reality, you know it is always true, and it can be the foundation for other discoveries later on. And there are such truths to be found in computer science. When we build algorithms, for instance for training machine learning models, there are some simple settings in which we know these things work, and we know how well they work. These are actual rules, actual laws of nature, so to speak, and we know exactly what happens. But unfortunately, these simple settings are not exactly how we train models in practice. We have to go beyond that, and then a lot of engineering needs to come into play. So this is one of those things.
Another thing is when practitioners come up with ideas which seem to work, but there is no proof that they work. For instance, someone tries something and is surprised that it works 10 times better than what came before. Is it just a coincidence? Am I lucky? Maybe it works on this dataset, but I try another one and it doesn't. We see this over and over again, all the time. Then theoreticians can come in and try to figure out: is it just coincidence, or is there something to be discovered? Can we, again, push the boundaries of what we know, so that our foundations are a little more elevated than before? I like this kind of convergence of theory and practice.
Stochastic Optimization & Data Privacy
Margaret Laffan: You talked about stochastic optimization, and I know you're doing a lot of research here. Can you talk about the interplay between data and machine learning? Can you tell us some more around what this entails and potential real-world use case where it can be applied?
Peter Richtarik: In fact, behind almost all of the training we do right now in supervised machine learning, I would say 99% of all the buzz that's out there, there is an element of stochastic optimization. Stochastic optimization is the procedure with which we train machine learning models. And now the question is, why do we inject randomness into the process? It seems counterintuitive: randomness is random, so maybe you get a random result. But that's not the case. The randomness is injected because we want to learn faster. It turns out one can show this empirically and also theoretically. You inject the randomness in some very funny, interesting ways. There are many ways you can do this, and this is an active area of research: where should the randomness be injected, and in what way? And then everything is better, faster.
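The most common way of injecting randomness is stochastic gradient descent (SGD): each step uses the gradient at one randomly sampled data point instead of the full gradient. A minimal sketch on a synthetic least-squares problem (the data, step size, and iteration count are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true               # noiseless targets, so w_true is recoverable

w = np.zeros(d)
step = 0.01
for t in range(20000):
    i = rng.integers(n)                  # the injected randomness
    grad_i = (X[i] @ w - y[i]) * X[i]    # gradient at a single sample
    w -= step * grad_i

print(np.linalg.norm(w - w_true))  # small: SGD recovers w_true
```

Each step costs one data point rather than all n, which is why these methods scale to big data.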
Margaret Laffan: When you think about a real-world use case, where do you see this being potentially applied?
Peter Richtarik: This is not being potentially applied, this is being applied as we speak in deployed products. For instance, in federated learning and distributed learning, when you want to send messages from mobile phones to some server, you would like to maintain privacy and not reveal private information. So one of the things you can do is add noise to the message you want to send. By adding noise, you obfuscate the data point, whatever you want to transmit, and in this way you achieve a certain level of protection.
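The noise-adding idea can be sketched in a few lines. This is only an illustration of the mechanism, not a deployed protocol: the noise scale `sigma` below is an arbitrary value, not a calibrated differential-privacy parameter.

```python
import numpy as np

rng = np.random.default_rng(42)

def privatize(update, sigma=0.1):
    # Add Gaussian noise before transmission so the server never
    # sees the raw local update.
    return update + rng.normal(0.0, sigma, size=update.shape)

local_update = np.array([0.5, -1.2, 0.3])  # hypothetical phone update
sent = privatize(local_update)
print(sent)  # an obfuscated version of the raw update
```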
But at the same time, there's a very interesting phenomenon going on which we found out in my group - and this is some other work that we've done - that this procedure, in fact, even improves the training procedure. It seems very counterintuitive. So what we try to achieve is privacy. But we realized that this mechanism also improves the training. So this is very, very interesting, and it can be captured theoretically. And we can see this in practice.
Limitations of Current Data Privacy Application
Margaret Laffan: And you're starting to cover one of the questions that we wanted to ask about, of course, because you're the co-inventor of Federated Learning, which is a Google platform to improve communication efficiency, as well as data privacy. I know you started to talk a bit more around that because data privacy is so important to all of us, especially in AI. What are the current limitations in managing data privacy?
Peter Richtarik: From my point of view, we are at the beginning of this whole idea of trying to merge three different fields: privacy, cryptography, and machine learning. There are lots of things that need to be done. There are things we don't properly understand, but we already see that when we put systems together, they work to a certain extent and are practically useful. That is why they are deployed, and why companies are starting up with products to sell. So we see a huge potential there.
But from the research perspective, there are lots and lots of things we don't understand. One of them, for instance, is exactly this precise interplay between machine learning, the training phase, and privacy protection mechanisms.
We don't even know whether we're using the right privacy protection mechanisms, and we're using them in an isolated way. For instance, there is secure multi-party computation. There are compression mechanisms such as sparsification and quantization, and encryption mechanisms such as homomorphic encryption, etc. We don't have a product which combines all of them. And we don't know whether there are other mechanisms we haven't yet explored. And once we do explore them, how do we combine them with the training methods? And once we use the training methods, what network architecture should we use? There are many, many questions of this type. And I'm very excited that we can be asking these questions, testing them in real life, and seeing how that works.
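The sparsification and quantization mechanisms mentioned can be sketched very simply. These are generic textbook-style versions, not any particular paper's operators: random-k sparsification keeps a random subset of coordinates (rescaled so the result is unbiased), and sign quantization transmits only signs scaled by the l1 norm.

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_k_sparsify(v, k):
    # Keep k random coordinates; rescale by d/k so E[out] = v.
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def sign_quantize(v):
    # Transmit one sign per coordinate plus a single scalar scale.
    return np.sign(v) * np.linalg.norm(v, 1) / v.size

v = np.array([0.2, -1.0, 0.5, 0.1])  # a hypothetical update vector
print(rand_k_sparsify(v, 2))  # only 2 of 4 coordinates survive
print(sign_quantize(v))       # every entry has magnitude 0.45
```

Both shrink the message a phone has to send to the server, at the cost of extra variance in training.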
Machine Learning In The Past 10 Years
Margaret Laffan: Thinking about the trends and challenges in machine learning over the past 10 years, what do you see as having been the main learnings?
Peter Richtarik: A lot happened over the last 10 years. When I started doing something with machine learning, we didn't have the boom; the deep learning winter was still very deep, and it was freezing. Nobody was really talking about it at that point in time. So a lot happened over the last 10 years, and it is very difficult to recapture that in a few sentences.
One thing that happened is deep learning, which everybody talks about and which is one of the most exciting things that happened to machine learning in the last 10 years, even less than that. Of course, deep learning has been around for a very long time, but this time I believe it is going to stick. But what we need to do is combine it with other technologies. Deep learning on its own is super powerful, but it will not solve all the problems by itself. For instance, we need to use rule-based methods combined with deep learning. This is very apparent in applications in natural language processing, autonomous vehicles, and so on. If we just rely on that single tool and nothing else, we will not succeed.
We need to be questioning all the foundations, including whether the way we train at the moment, through the empirical risk minimization paradigm, is the right thing to do. For instance, people in finance have known for a very long time that you have to take risk into consideration. There is the very famous Markowitz model for building portfolios, which balances the expected reward against risk. You want to have a lot of reward, but only if the risk is mitigated and bounded. But this is not exactly what we are doing in machine learning right now. We are in a pre-Markowitz world where we don't take risk into consideration. And this is one of the big things we need to be doing: we need to transform everything by capturing the risk as well, because we don't want cars which on average perform very well but every now and then hit a building. We need to avoid that risk.
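The Markowitz-style contrast can be made concrete with a toy example (the reward sequences below are invented): two strategies have the same average reward, but a risk-aware objective, mean minus a multiple of variance, strongly prefers the steadier one.

```python
import numpy as np

steady = np.array([1.0, 1.5, 0.5, 1.0, 1.0])  # mean 1.0, small variance
risky = np.array([3.0, 3.0, 3.0, 3.0, -7.0])  # mean 1.0, huge variance

def mean_only(r):
    # The "pre-Markowitz" view: average reward alone.
    return r.mean()

def mean_variance(r, lam=0.5):
    # Markowitz-style objective: reward penalized by risk.
    return r.mean() - lam * r.var()

print(mean_only(steady), mean_only(risky))          # identical: both 1.0
print(mean_variance(steady), mean_variance(risky))  # steady wins clearly
```

On average the "risky" strategy looks just as good, which is exactly the failure mode of a car that drives well on average but occasionally hits a building.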
Future Trends in AI
Margaret Laffan: When you think about the next 10 years, what do you see as being some of the things that we should look out for? Which are going to be game-changers in the next decade?
Peter Richtarik: What's happening at the moment is that a lot of work done in research labs, in industry, and in academia is becoming a mature technology, which is finding its way into products. Right now, I see a turning point: this is going to be transforming industries over the next 10 years. So we are moving from a research-focused field to an engineering-focused, product-focused field. Research will remain super important, but we have already figured out so much that the things we know right now can be immediately useful. And we haven't even scratched the surface of the implications we could really be getting out of this. There are lots of things we can do with machine learning that no company in the world is doing right now. These companies will need to be built to do these things.
I think we'll see a lot of this, but at the same time, we have challenges all over the place. For instance, with data. One question about data is, do we have enough of it? All the machine learning models are very hungry for data. Sometimes we have enough data but it is of questionable quality, and a lot of effort needs to be put into cleaning it. Do we pay people to do that? If we have lots of data, this is simply impossible: hundreds of hours of video are uploaded to YouTube every minute, and we cannot pay people to watch all of it to check the content. So we need automated data cleaning procedures. In some areas, we don't have enough data. So we have to ask: can we learn from less data? Can we do zero-shot learning? Or can we automatically generate data through simulations or other procedures, so that where we don't have enough data, we simply generate it and these machine learning models can be trained?
Then, with all of this, we have privacy issues. Where is the data coming from? It is usually coming from human activity, personal activity. We do something on our phones, we do something on our computers, we click online, we have health records, and we do not want this data to become public. There have been lots of issues with data leakages and breaches, and it is very clear that the public is not very happy about these things. GDPR (General Data Protection Regulation) came into effect last year in May, and China followed later on. So it is increasingly important that we protect the data these machine learning models are thriving on. This is creating new challenges, and it is one area I am particularly interested in. Over the next 10 years, I think we will be building this.
AI Adoption in Saudi Arabia
Margaret Laffan: Peter, one more question for you. You're based in Saudi Arabia, how is AI adoption in Saudi Arabia?
Peter Richtarik: There's lots of stuff happening around AI in Saudi Arabia. For instance, I'm at the King Abdullah University of Science and Technology. We have been talking about an AI initiative for a number of years, and we have finally launched it. The entire university is going to be transformed, in a certain sense, through AI. We have autonomous vehicles on campus as we speak. And AI is impacting science, for instance. This is something that maybe not many people talk about: AI is not just interesting for industry, AI is becoming a tool for scientific discovery. And this is something which people at universities in areas such as biology, chemistry, and engineering are really excited about. This is one aspect.
Another aspect is that we have a national AI strategy in Saudi Arabia, in a similar way that there are national AI strategies in many different countries. Those countries increasingly realize that this is not just a wave that is going to go away. Some of it, of course, is hype, but a lot of it is here to stay. We need to think ahead and build these strategies, and that is what we do at our university in Saudi Arabia as well.
Margaret Laffan: Peter, thank you so much for your time today. It was a pleasure talking to you here at NeurIPS, and we hope you have a wonderful rest of your day.
Peter Richtarik: Thanks for having me.