Comprehensive CV System To Help Cars Understand The Environment. Forrest Iandola, Ex-CEO @ DeepScale
Updated: Oct 31, 2019
This episode features Forrest Iandola, former CEO and Co-founder of DeepScale, a Silicon Valley based startup aiming to bring advanced efficient perception capabilities to edge devices. He shared in-depth insights on the development of deep learning networks and how DeepScale's unique computer vision technology can help vehicles understand the environment.
Forrest completed a Ph.D. in electrical engineering and computer science at UC Berkeley, where his research focused on deep neural networks. His best known work includes deep learning infrastructure such as FireCaffe and deep models such as SqueezeNet and SqueezeDet. His advances in deep learning led to the founding of DeepScale in 2015.
Forrest Iandola joined Tesla as a senior staff machine learning scientist on September 30, 2019. This interview was conducted when Forrest was in the CEO position at DeepScale.
Highlight 1: Difference between SqueezeNAS, SqueezeNet & SqueezeDet
Highlight 2: How was DeepScale created?
Robin.ly is a content platform dedicated to helping engineers and researchers develop leadership, entrepreneurship, and AI insights to scale their impacts in the new tech era.
1. What Does DeepScale Do?
Margaret Laffan: Forrest, welcome. Thank you for joining us today.
Thanks for having me.
Margaret Laffan: Let’s talk about DeepScale. You founded DeepScale when you were still getting your PhD at UC Berkeley. Tell us more about what DeepScale does.
At DeepScale, on the technical side, we focus on making the smallest possible neural networks that still produce good results, so designing neural networks that fit on tiny, tiny devices that can fit in your pocket, or in the case of our core product, could fit in a low-cost car. Our main customer-facing effort is building computer vision systems to help cars understand their environment.
Margaret Laffan: Can you tell us a bit more about the type of processors that you’re putting this into, or the applications this is going into?
Sure. In terms of the processing platforms, we can, of course, run a lot of this stuff on fairly low-end NVIDIA GPUs, replacing the setup in autonomous vehicle prototypes, which often have a trunk full of servers with lots of GPUs in them. We can replicate a lot of the capability of those setups, in terms of deep neural network inference for computer vision, on something as small as a single server or less.
We can also take what used to run on a single server and scale it all the way down to something even smaller than the smallest devices that NVIDIA makes. For example, Renesas is one of the top automotive chip companies. They make these really, really small, kind of tensor processing unit style devices that are lower-cost and have a lower footprint than the smallest NVIDIA devices you can buy.
2. Ph.D. Research Influence On DeepScale
Margaret Laffan: How did your previous research at Berkeley influence DeepScale, especially SqueezeNet and SqueezeDet?
I guess my story during my time at Berkeley is, I came in very interested in two things, parallel computing and computer vision. And those are kind of separate research fields. But I was, for a number of reasons, very into both of them, and I had some great mentors in those areas as an undergraduate. I met Kurt Keutzer, whom I ultimately decided to have as my Ph.D. advisor. He actually works at the intersection of parallel computing, how to make things run fast, as well as machine learning and computer vision.
And up to a certain point, you can kind of change the underlying implementation, add more parallelism, make it more kind of optimal in terms of making a computer vision model or neural network run fast. Beyond a certain point, what you need to do is to start changing the design of the neural network itself. And you can start changing the design to target having less memory usage, having less computing required or even more nuanced things.
As we embarked on that research direction, we initially tried doing what is popular today, which is neural architecture search, trying to automate this. We picked a search space that is sort of similar to AlexNet, which was a very popular model four or five years ago, and we didn’t find anything that interesting. But we realized that we needed a better starting point for neural architecture search, a better search space to search in.
Before I even got around to doing neural architecture search on that search space, one of the models I found during a warm-up exercise was what we now call SqueezeNet. So we basically changed all the dimensions of the neural net, and we had all these things that the computer could fiddle with in terms of dimensions, but some of our initial guesses were so good that we got lots of interest from industry in the research. The people who were most frequently excited about it were in the automotive industry. They came to us saying “We’ve got these really high-quality neural networks, but we cannot fit them on affordable devices.” So they wanted our help with that.
Margaret Laffan: Back in 2016, when you were doing this, edge computing wasn’t as popular as it is right now. So what inspired you to develop those tiny DNNs to work on the accelerators/processors? And I know you started to help us understand that a bit more as well.
I think a lot of things that require a lot of computing, like neural networks, often start in an environment where you can use a lot of computing, like, say, a server farm. And that’s kind of where the first applications start to take off, things like mining social media data, things like organizing the web’s information. Those are the initial applications that I think were part of the reason why companies like Google and Facebook went so far in pushing neural networks so hard and spending so much money on it.
But when you look at how an individual interacts with their devices, a lot of the things that they’re doing don’t actually need to talk to the cloud. If you’re talking to your phone so it can recognize your speech, or you’re taking pictures of things, there is no real reason it has to get sent up to the cloud unless the neural networks or other algorithms just can’t run efficiently on your phone. So I think if all the people out there are using neural nets every day on their smartphones and relying on the cloud to do it, that gets very expensive, both in terms of plugging up our data pipes and in terms of cloud time.
So our vision for a long time has been: what are the things that have started to work on the server side for which we could get the same quality of results on an embedded device or on a smartphone, for example?
4. What’s Next?
Margaret Laffan: As you see this developing and deep learning moving into the future, what do you see as being the next thing that will happen?
So what comes after this in deep learning - I think each deep learning application is in a slightly different state. So I would say three of the biggest application areas for deep learning today are computer vision or more broadly, understanding your environment, your surroundings based on images, based on lidar scans, based on these sorts of things, but making sense of your surroundings. So that’s computer vision kind of stuff. Then there’s speech recognition and more generally, audio analysis, so what am I saying? That kind of thing. Then there’s actually natural language processing and more generally, text understanding, so looking at the text and making sense of it.
They are each in very different situations. Computer vision had a breakthrough moment, when it really started migrating to deep neural networks for a lot of its problems, around the 2012, 2013 time frame, when AlexNet won the ImageNet competition. Within a year of that happening in 2012, there were a huge number of people working on computer vision who had started trying to solve whatever problem they were solving with deep neural nets, and often getting very good results.
Since then, there were three or four years where accuracy improved a lot, and it is still improving somewhat, but it’s tapered off a bit. Now the name of the game, even at very mainstream conferences like Computer Vision and Pattern Recognition (CVPR), has become: how do we make this all run efficiently in an affordable environment? So it’s like “Let’s get things working really well, then let’s shrink them down.”
I would say speech recognition is something where so much of the research is happening behind closed doors at places like Google and Baidu, for example, because labeling data for speech is very, very expensive. And open research on speech recognition is not quite a ghost town, but it might not be the wrong idea if you think of it as a ghost town. I mean, it has moved so far behind closed doors and into industry that it’s hard to tell how that’s going.
I would say natural language processing is where computer vision was in 2013. Natural language processing did jump over to some form of deep neural network, recurrent neural networks and LSTMs (long short-term memory), a few years ago, and that improved the results a lot. But now a lot of core problems in natural language processing have had this big boost in their accuracy in the last 12 months with the rise of BERT, the algorithm from Google.
Margaret Laffan: I know we’re going to talk about that later.
5. Difference between SqueezeNAS, SqueezeNet & SqueezeDet
Margaret Laffan: Your most recent work, SqueezeNAS, is also helping DNNs run on edge hardware. How does it differ from your other papers, SqueezeNet and SqueezeDet? What’s the difference between the three, and how have they evolved?
It’s been sort of a progression. SqueezeNet was a neural net that we designed manually. And the target application was image classification, which is the task you often target first when designing a neural net, and then you retarget it for other tasks later on.
SqueezeDet was designed for object detection. Rather than just saying what the overall content of an image is, it asks: What are the individual objects? What are their positions? What are their categories? SqueezeDet did that in a highly efficient manner.
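To make the distinction between the two tasks concrete, here is a minimal sketch of what each one returns. The class names, fields, and example values are hypothetical and chosen for illustration only; they are not taken from the SqueezeNet or SqueezeDet code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClassificationResult:
    """Image classification: one label describing the whole image."""
    label: str
    score: float


@dataclass
class Detection:
    """Object detection: one entry per object, with a category and a position."""
    category: str
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def categories_present(detections: List[Detection], min_score: float = 0.5) -> List[str]:
    """Return the sorted, de-duplicated categories detected with enough confidence."""
    return sorted({d.category for d in detections if d.score >= min_score})


# Made-up outputs for a single street image (illustrative values only).
whole_image = ClassificationResult(label="street scene", score=0.91)
objects = [
    Detection("car", 0.88, (34.0, 120.0, 210.0, 260.0)),
    Detection("pedestrian", 0.73, (300.0, 110.0, 340.0, 230.0)),
    Detection("car", 0.32, (400.0, 130.0, 470.0, 200.0)),  # below the score threshold
]

print(whole_image.label)
print(categories_present(objects))
```

A classifier answers a single question about the image, while a detector returns a variable-length list, which is part of why the two tasks tend to need different network designs.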
More recently what we’ve seen is, first of all, there are more computer vision problems out there than ever because deep neural nets are working so well. And second, there are more deep neural net oriented processors out there than ever before. So many big companies as well as startups are developing deep neural net processors. And what we’re seeing is that, depending on the task you’re solving (object detection, classification, segmentation, or something more exotic) and the processor you’re running on, your neural net design may need to look totally different if you want to get the absolute maximum speed and the maximum accuracy. And so having humans do that work is becoming more and more intractable.
In the last couple of years, neural architecture search, which some reporters call, in a kind of oversimplified way, “AI to create AI”, has really begun to outperform what human experts are doing in terms of neural net design to get the best speed and accuracy.
To answer the last part of your question on how SqueezeNAS is different: SqueezeNAS is one of the latest in a line of work in the research community on neural architecture search. The early work, from Google for example, on reinforcement learning-based neural architecture search required a lot of time to search for the right model, like thousands of days’ worth of GPU or TPU time. And SqueezeNAS requires more like 10 GPU days to beat humans at some challenging tasks.
Margaret Laffan: More affordable as well.
Precisely. So someone with just a server with a few GPUs in their closet could actually make some progress in a weekend.
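The arithmetic behind the "weekend" remark can be sketched as follows. This is a back-of-the-envelope illustration only: it assumes the search parallelizes perfectly across GPUs, takes 4 GPUs as a hypothetical value for "a few", and uses 1,000 GPU days as a stand-in for the low end of "thousands of days".

```python
def wall_clock_days(gpu_days: float, num_gpus: int) -> float:
    """Wall-clock days for a search, assuming perfect scaling across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return gpu_days / num_gpus


ASSUMED_GPUS = 4  # hypothetical size of "a server with a few GPUs"

# Roughly 10 GPU days, the figure quoted for SqueezeNAS in the interview.
print(wall_clock_days(10, ASSUMED_GPUS))    # 2.5

# 1,000 GPU days, a stand-in for the low end of "thousands of days".
print(wall_clock_days(1000, ASSUMED_GPUS))  # 250.0
```

Under these assumptions the cheaper search fits in about two and a half days, while the earlier style of search would keep the same machine busy for most of a year.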
6. Starting Up DeepScale
Margaret Laffan: You talked earlier about Professor Keutzer, and he, of course, co-founded DeepScale with you. We know he’s a very accomplished professor at Berkeley and has been an investor in and advisor to many startups. What was that process like for the two of you coming together and creating DeepScale?
This was actually the second or third company that we thought about starting. So when I came out of my undergraduate degree at the University of Illinois and went to Berkeley and joined Kurt’s lab, Kurt and I had not just our research interests in common but also our entrepreneurial interests. Kurt had been an early employee at Synopsys, which ultimately went public, and he was the CTO there. He went on to invest in and advise a number of startups. And he was sort of waiting for the right grad student to come along who wanted to start a company that he was interested in.
We looked at a lot of different things. So early on we actually looked at starting a company together on digital advertising. So basically figuring out what are the products in YouTube videos, and then putting the right ads on them. We made some progress there, but ultimately it didn’t turn out to be what we wanted to do.
But anyway, during that process, I actually almost dropped out of grad school at least once. And what Kurt convinced me to do is, “Hey Forrest, why don’t you work on research that might actually be useful to the kind of thing that we might build.” And we were thinking about what we could do with AI at the edge, and then he said, “Ok, why don’t we do some open research in this area, show people what we’re able to do, give a little bit of it away and see where it goes?”
Margaret Laffan: How do you feel having gone through that for the last few years? Are you glad you stuck with this?
Forrest Iandola: It’s pretty amazing. I feel like I’ve gotten a year’s worth of life experience every couple of months of this.
7. What’s Changed About Autonomous Vehicles?
Margaret Laffan: I’d like to talk to you about autonomous vehicles. The attention on AVs has been increasing at a rapid pace these last few years, and of course the technology has been progressing at a rapid pace as well. As someone who has observed first hand the growth of this space, what would you say has changed the most?
I think in the last two or three years, people have built more mature systems that also have maybe more specific constraints on what they can do. So autonomous driving has gone through several sort of Gartner hype cycles going back to the late 80s. I think the most recent iteration of that started in 2010-ish with Google really starting to invest in autonomous driving, and maybe peaked around 2015, 2016. When you try to scale that to millions of users, you start to find more and more edge cases that maybe you didn’t originally design for.
And so I think the observation now is that it’s not that hard to make a demo that works once in a while just to show off, but to make this work day in and day out, we have to be very specific about what problem we’re solving, and where it’s going to work and where it’s not going to work at first. I think that has led people who are doing fully autonomous driving to be much more precise about geofencing, that is, limiting the operating domain where the vehicle will work for the time being. And people who are working on more consumer-grade vehicles with driver assistance, collision avoidance, and lane centering are similarly being more precise about what they will bring out for that technology.
Margaret Laffan: And of course everything needs a lot of training data as well.
Data is very expensive.
8. What’s DeepScale Looking to Solve?
Margaret Laffan: What problems in autonomous driving is DeepScale looking to solve?
The core problem that we’re working on is helping the car understand its environment. Some people call this computer vision, some people call this perception: where the lanes are, where the signs or the other vehicles are. This is the data that we provide to the motion planning system and control system, which are typically designed by our customer, an automaker or a tier-one supplier.
9. Opinion on the Evolution of NLP Architecture
Margaret Laffan: We also want to talk about NLP, because I know you’ve completed some research in this space and we’ve talked about it earlier in the interview as well. The NLP research community has seen many exciting gains, which you referred to as well, with Google BERT and XLNet. We’ve also interviewed Zhilin Yang, the author of XLNet. What do you think of the evolution of NLP architectures such as XLNet?
Good question. I guess the brief history of this is, after various approaches with recurrent networks and LSTMs, BERT was sort of the splashiest in a line of work from Google and others that focused on something different: instead of using recurrent models, they use attention models. Those have been around for a while; in fact, Neural Turing Machines around 2014 might be a good starting point. There may be hundreds of different NLP tasks that matter, but the BERT paper showed that on at least 10 different very popular NLP problems, ones that are very hotly contested for best results, attention networks work very well.
Now I think there are two things going on. One is we’re seeing a lot of exploration on the design of the attention network. XLNet has a slightly different design of an attention network. I think there will be many more like it. I’m tinkering with some of these myself in my spare time. And then I think there will also be a very stratified range of attention networks available to you, from very, very low resource ones to very, very expensive ones. Right now we only have very expensive ones for the most part. And the other thing that I see happening is a proliferation: everybody who has a favorite natural language processing problem is going to see what attention networks can do for them in terms of improving their results or changing their approach.
10. DeepScale in the Next 3-5 Years
Margaret Laffan: Forrest, where do you see yourself and DeepScale in the next three to five years?
I think over the next few years - maybe more on the order of 10 years - I’d like to be able to point to an overwhelming majority of vehicles being safer on the road because of our technology and because of what we are doing. In the next three to five years, I’d love to be able to point to cars, at least in select car dealerships, that have our technology and are able to drive well and do sophisticated things that previously were reserved for very expensive vehicles.
11. Suggestions for Engineers / Researchers
Margaret Laffan: Any suggestions or advice for engineers and researchers who want to be entrepreneurs?
Just start. Tim Ferriss said to take the money you might spend studying entrepreneurship as a degree and go try to start a company; you will probably learn more practical wisdom about how to do this.
Margaret Laffan: What is the hardest thing that you’ve learned in your journey?
I think business is a lot like computing. It’s sort of a nerdy comment, but people who are deep in computer science will tell you that there’s this thing called Amdahl’s Law, which says that the speed of the program you run is limited by whichever thing is slowest, which is sort of obvious. And, as you parallelize -- or from a management perspective, delegate -- some aspects of it, well great, you sped up that part, but there are still bottlenecks, and they then become the big part. So I think you are limited by what you’re worst at.
And I think in my journey at DeepScale, there have been some days where I could say the thing we are limited by is recruiting; the thing we are limited by is finding the right customers; the thing we’re limited by is being able to run neural networks fast enough. And I think part of being a leader is figuring out what the next bottleneck will be and how I can get ahead of it.
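For readers who want the Amdahl’s Law point above in concrete terms: in its usual form, if a fraction p of the work can be spread across n workers, the overall speedup is 1 / ((1 - p) + p / n), so the part that cannot be parallelized (or delegated) caps the gain. The sketch below is illustrative; the function names and the 90% figure are assumptions chosen for the example, not numbers from the interview.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when `parallel_fraction` of the work is split over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload in which 90% of the work can be parallelized.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
    print("limit:", round(speedup_limit(0.9), 2))
```

With 90% of the work parallelizable, adding workers never gets the speedup past about 10x, because the remaining 10% becomes the bottleneck. That is the same effect described above for a company: once one constraint is relieved, the next slowest thing dominates.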
Margaret Laffan: Forrest, thank you so much for your advice. And also thank you very much for that perspective as well. I think that helps a lot of people out there who are in the process of building or on their journey in different ways. Thank you for joining us today.