Updated: Jul 11, 2019
This episode features Dr. Zhou Yu, Assistant Professor in the Computer Science Department at UC Davis. Her research focuses on natural language processing, multimodal sensing and analysis, machine learning, and human-computer interaction. She led a student team that won the 2018 Alexa Prize for creating the best chatbot and advancing conversational AI. She was also featured on the Forbes 30 Under 30 in Science list. Dr. Yu received her PhD from the Language Technologies Institute at Carnegie Mellon University in 2017.
Dr. Yu shared her experience in the 2018 Alexa Prize competition and her views on the current landscape of conversational AI.
Highlight 1: How would BERT impact Natural Language Processing Research?
Highlight 2: The challenges in conversational AI
Robin.ly is a content platform dedicated to helping engineers and researchers develop leadership, entrepreneurship, and AI insights to scale their impacts in the new tech era.
Wenli: In this episode, we have the pleasure of having Dr. Yu join us to talk about her interest in NLP and her journey in this field.
Wenli: Thank you for joining us here. I have seen some of your previous interviews, and you said that when you were applying to the doctoral program at CMU, the one thing that made you stand out was your double major in computer science and linguistics. Linguistics is truly a major that lets you understand one of the most intriguing aspects of human knowledge and behavior. So I wanted to know how linguistics contributes to your current work?
Yeah. So my area of research is natural language processing. Some people may call it computational linguistics. It’s really about combining machine learning, statistical methods, and linguistic knowledge to solve problems that touch on linguistic phenomena. We are mostly asking how we can inject linguistic knowledge, the motivations behind linguistic theories, as observations into statistical models to build better systems that solve practical problems, such as natural language understanding (NLU), generation, and dialogue planning, which I do a lot of as well.
Wenli: Dialogue planning? Can you tell us a little more about that with linguistics background?
Yes. So if you look at people talking to each other, we want to understand the structure of the dialogue. A system can be trained with such structure to understand conversational flow better, so it can come up with better replies for different individuals. To make the system that intelligent, it has to understand the semantics behind what people are talking about. You also want to understand the syntax behind the user’s utterance. For example, is the user asking me a question or giving me an opinion? So this information is very tightly connected to all these linguistic theories.
Wenli: Recently, your paper, “Unsupervised Dialog Structure Learning”, was accepted by NAACL. What was the technical breakthrough that you presented in this paper?
This paper is mostly about using unsupervised learning methods, with some changes that we made to a version of a recurrent neural network. We wanted to learn dialogue structure from human-human conversations automatically, without any supervision, so that we can use such structure to help build better dialogue systems.
So current dialogue systems usually have two types of pipelines. One is what we call a rule-based dialogue system, where the entire dialogue manager is driven by a big flow diagram, so you move from one dialogue state to another according to the rules. The other is called an end-to-end trainable or statistical dialogue model, which means that your dialogue manager is a statistical policy with transition probabilities between states. So what we propose is this: if you don't have any labels for the dialogues, for example you can’t annotate dialogue acts and things like that, you just run our algorithm and it gives you a flowchart, just like the rule-based system. And in the flowchart, we also give you the probabilities of how these states transition to each other. So if you want to build a rule-based system in industry, you can use our learned structure as a first draft for your experts to work on, which saves you a lot of time. It also helps you understand your data better. If you want to build a statistical dialogue model, you can use our learned structure with the transition probabilities to design a better reward function for your RL-based methods.
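Editor's illustration: the "flowchart with transition probabilities" idea can be sketched in a few lines of Python. This is only a minimal sketch, not the method from the paper. It assumes every utterance has already been assigned a discrete state, which is exactly the hard, unsupervised step the paper addresses and which is not shown here. The state names and the function `transition_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(state_sequences):
    """Estimate P(next_state | state) by counting adjacent state pairs."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs


# Hypothetical state sequences, one list per dialogue (assumed to be given).
dialogues = [
    ["greet", "ask_cuisine", "ask_area", "offer", "bye"],
    ["greet", "ask_area", "ask_cuisine", "offer", "bye"],
    ["greet", "ask_cuisine", "offer", "bye"],
]

flowchart = transition_probabilities(dialogues)
for state, edges in flowchart.items():
    for nxt, p in sorted(edges.items()):
        print(f"{state} -> {nxt}: {p:.2f}")
```

Each printed edge is one arrow of the flowchart with its estimated probability. An expert could prune or relabel these edges for a rule-based system, or the same table could feed into reward design for a learned policy.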
Wenli: I know that your most recently accepted paper must be your favorite paper so far, but what progress have you made compared to your previous papers on this topic?
It's part of a line of my recent research. What I have been trying to promote a lot is learning with less supervision. We want to build dialogue systems with less supervision, because in practice nobody can give you a thousand labeled dialogues to train a good system. Most companies come and say: Oh, I want to build a customer service system. Here are the 20 dialogues we have. They’re usually written by some of the customer service people off the top of their heads. So we're really looking at how we can reduce the resources they need to build a good dialogue system to start their customer service. We have a line of research that tries to solve these problems from different angles. The paper we just mentioned is one aspect: how do we use unsupervised learning methods to learn structure and build better dialogue systems? We have some follow-up work on how to use meta-learning and other transfer learning methods to adapt models from rich-resource domains to low-resource domains in dialogue. Some people call it one-shot learning or few-shot learning.
We also have additional papers that explore different ways of incorporating supervision to reduce the number of dialogues needed to build better systems. So this line of research is different from most of my previous research. If you know my previous research, I mostly worked on multimodal dialogue systems: how to combine different streams of information - vision, acoustics, language - to give a better user experience in general. I think this learning-with-less-supervision research is also very important and very practical, in that it directly helps the industry become more willing to adopt ML-based dialogue system pipelines.
Wenli: Just like you mentioned, there is a need for NLP in the industry right now, especially unsupervised learning for chatbots. Some research has shown that chatbots will save businesses up to eight billion dollars in 2020. And we know that your field of research is receiving more and more attention from the industry at this point. Like you said, the companies have their needs and your research is meeting them. So I'm curious about how much influence the industry has on your research?
They definitely influence my research in a way - I talk to people at companies all the time. I give talks at different companies like Facebook or Google to understand what they are working on and what they really need in terms of algorithms, models, or data to build better models that could reach millions of people. They are also giving us tremendous help in terms of funding and resources to support my research, to help me and my students pursue more consistent directions that we think are very promising.
We definitely think there are differences - there should be differences - between industrial research and academic research. In the long run, we hope academic research can be more fundamental and more forward-looking, so that companies can take advantage of it and develop their own methods on top of it for their more specific applications. We would also contribute more in terms of creating standardized platforms and generating better data and evaluation metrics, which would serve a greater purpose.
Wenli: Yeah. So those big companies like Amazon or Facebook have their own research labs and their own research scientists. Do you think that you have more freedom in the academic world compared to working for one company, where the research has to be applicable to the market?
Yes, there are definitely differences here. In industry, you also see different degrees of how closely the research is connected to the products. But I definitely see academia as the place that can give you the most freedom in terms of what direction you want to go in your research.
Wenli: Speaking of teaching, I know you have been an assistant professor for almost two years now. It's a huge jump from being in graduate school to becoming someone who teaches and advises students. So is there any difference in perspective that you learned from this big jump? And do you think any of your previous professors have influenced you, in teaching style and things like that?
Definitely. It's a huge difference comparing my previous life as a graduate student to my current life as a professor. What I really learnt from my previous advisors, and from other senior faculty members in general, is that your job as a professor is creating a good environment that is supportive of collaboration and innovation, so that all the students benefit. A good environment attracts good students, and at the same time, the students in this environment help each other, learn from each other, and collaboratively generate better work. So my job is this kind of facilitation work. I need to make sure the funding is there, and that the funding is not tied to specific things in a way that leaves students no freedom to change their subjects. I also make sure our work is accepted by the bigger community, so that other people benefit from our work as well.
Wenli: So basically you were an individual contributor in a team, and now you’re leading the crew and definitely taking on more responsibilities.
Yeah. The responsibility is bigger, because mostly it’s like you're not by yourself, you have to be responsible for other people as well.
Wenli: Going back to what we were just talking about. We were talking about Google Brain, and you mentioned BERT (Bidirectional Encoder Representations from Transformers). Some people in the industry are saying BERT is changing all of NLP. What are your thoughts on that?
First of all, I definitely acknowledge that BERT is a good innovation. It really helps a lot of downstream tasks. But it doesn't mean that other people don't have any work left to do. Representation is very important and fundamental, but it's just a representation. Basically, everybody now has a better baseline system, so the innovation should be in the individual tasks and everything else. In general, we have a lot to do on top of what BERT has already done. For a specific individual task or a specific type of model, we want to make sure that the innovation we introduce on top of BERT still improves the results. That's the best outcome.
Wenli: A really exciting thing that happened recently is that you and your team won the Amazon Alexa Prize. Basically, it’s a competition that rewards the team whose chatbot can maintain the longest conversation. Your team had an average of 10 minutes of conversation, and you won first place with 14 students from UC Davis. What was that like?
The Alexa Prize was definitely a really important and big project in my group last year. We were mostly focusing on improving the user experience with the social bot. It was really interdisciplinary work, combining human-computer interaction, natural language processing, and knowledge-based data mining to create an engaging bot. It was an incredible journey. I have to point out that Amazon gave us a great platform where we could collect real data, which is very, very valuable in dialogue system research. Our system was able to reach millions of users. We collected so many data points over the year that we could do much better work using the data they provided us with.
We also think it's a great responsibility. Our bot is online all the time now; you can get access to it by saying “let's chat”, which opens up the bot. We want to make sure our bot is not doing damage to any of the users interacting with it. We want to make sure that if kids are interacting with it, they are learning something, and are not getting injected with biases from social media and the web. So it’s also our team’s responsibility to make sure our system is actually doing good for the general public.
It's definitely an incredible journey. We received a lot of recognition from the field, but there's a long way to go as well. We talked about the average being only 10 minutes; I really do want it to be as long as possible. So we're going to enter the competition again this year, trying to push that boundary a bit further. In the previous year, we mostly focused on developing better language understanding tools for this specific type of conversation, which is open domain and where we are processing utterances with ASR (automatic speech recognition) errors. There's no punctuation, there's no capitalization, and the text is very irregular, with a lot of ellipses and self-corrections. So in our lab, we developed a toolkit to do sentence segmentation, ASR error correction, dialogue act prediction, and syntactic dependency parsing. It’s a toolkit for language understanding in open-domain chatbots. We're going to release the toolkit really soon, and once we have all the annotated data, we're going to release that as well. We aim to publish everything next month, before the next Alexa Prize starts.
Wenli: I know that some of the teams in the Alexa Prize competition have been competing for more than a year, so they definitely collected more data than your team. You and your team started from scratch. I'm curious about this journey, because you didn't start with any data.
Definitely. When we started the Alexa Prize last year, we were sort of in an underdog position. We were a new team. I did have some experience as a student from the year before, but I wasn't able to follow through the entire process. My students were also new, all first-year students, and they didn't know how to build a dialogue system at that moment. We definitely had other disadvantages too: we didn't have any accumulated data, and we didn't have an existing dialogue system pipeline that we could use out of the box. But we did have some knowledge from my previous PhD work, some dialogue structure that we could reuse for our pipelines. So I think all the students did a great job in making this possible. Basically, we started very low in terms of rating, and then we kept jumping up, doing better and better and catching up in the end.
Wenli: How was the semi-final? I know you started very low and then at the time of semi-final, you weren’t the top one or two.
Yeah, that’s also an interesting thing. For the semi-final, they accumulate the scores over two months. We didn't start so well, which is why in the cumulative scores we were number three, so we were selected as a wild-card team. But if you look at the last one or two weeks, we were number one all the time.
Wenli: So you were climbing really fast.
Yeah, so that's one of the things we figured out. We also had some system issues that made the latency of the system longer, which hurt our ratings. Once we fixed that problem, we did much better. So there are definitely hiccups during system development that you learn from over time, which are very valuable lessons as well.
Wenli: That’s so intense, because it was during a competition - it’s like a marathon - and the competition lasts for eight months. And you’re fixing your bugs and improving the algorithms along the way. I know you were leading a team of 14 students and every one of them was considered an equal contributor. I’m trying to figure out what kind of leader you are. Are you doing it all, or are you setting up the rules and letting them handle it?
I'm definitely very involved in the competition. Our team lead and I help structure the progress of the team and make sure everybody is responsible for an individual module. We also make sure that we meet every week, all the students together, so that they can work together, because it's a big system and collaboration is very important. I think my major job is helping students understand how to collaborate with each other.
Wenli: Another thing that I'm really curious about is, why is the robot named “Gunrock”?
Oh yeah. This is an interesting question, because when you think about “Gunrock”, you think: Oh, is it a guy's name? Well, Alexa is a female voice. “Gunrock” is actually not a human being, it’s a horse’s name. People probably know UC Davis as famous for its agricultural studies, its veterinary school, and animal science. Its mascot is a blue horse named “Gunrock”, so our bot is named after the mascot. The symbol of our team is also the blue horse.
Wenli: What was the secret sauce that helped Gunrock win the championship?
One thing, as we mentioned before, is that we developed our new NLU toolkit specifically for this competition and for open-domain dialogue systems. We also did a little bit of work on modifying the generation process. For example, we built an automatic algorithm to figure out when to insert disfluencies, and also speechcons, which are prosodic cues like “Wow!” and “Haha!”, these kinds of more emotional cues inside dialogue systems. And we do find people like them much more in social bots. They think the system is more expressive with these kinds of add-ons, and they think the system is more natural with disfluencies as well.
Wenli: Yeah, because humans do make mistakes, and we use disfluencies quite a bit. I remember you saying that people want to chat with the robot more when it pauses like a human would.
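Editor's illustration: a rough sketch of what inserting disfluencies and interjection-style cues into a generated reply could look like. The interview describes an automatic algorithm that decides when to insert them; the sketch below swaps in plain random sampling with fixed probabilities instead, so it is not the team's method. The filler list, the probabilities, and the function name `add_expressive_cues` are all assumptions.

```python
import random

FILLERS = ["um,", "uh,", "well,"]      # hypothetical disfluencies
INTERJECTIONS = ["Wow!", "Haha!"]      # cues of the kind named in the interview


def add_expressive_cues(response, rng, filler_prob=0.3, interjection_prob=0.3):
    """Randomly add one filler inside the reply and one interjection in front.

    Random sampling stands in for the learned decision of *when* to insert.
    """
    words = response.split()
    if not words:
        return response
    if len(words) > 1 and rng.random() < filler_prob:
        position = rng.randrange(1, len(words))  # never before the first word
        words.insert(position, rng.choice(FILLERS))
    text = " ".join(words)
    if rng.random() < interjection_prob:
        text = f"{rng.choice(INTERJECTIONS)} {text}"
    return text


rng = random.Random(0)  # seeded so runs are repeatable
for _ in range(3):
    print(add_expressive_cues("I love that movie too", rng))
```

Passing the random generator in explicitly keeps the behavior reproducible, which makes it easier to compare user ratings with and without the added cues.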
Yeah. These pauses or hiccups are very important to appear natural in front of people as well.
Wenli: One thing I'm really curious about in NLP research is what kind of metrics you use to define a good conversation?
It really depends on what type of system you are building or what tasks you are focusing on. If it's a very clear collaborative task like booking a restaurant or booking movie tickets, then a lot of people would use objective metrics such as task completion rate, meaning how frequently you complete the task, and average dialogue length, meaning how efficiently you complete the dialogue. Other metrics are more subjective and general, like how people rate you in terms of engagement or positivity, or how willing they are to talk to your bot again. When it comes to a more complex task like negotiation or persuasion, it is a bit harder.
We recently had a new project where we collected a new data set called “Persuasion”. We are persuading people to donate to charity, so you can define objective metrics in terms of how likely you are to persuade people to donate and how much they are willing to donate. Then it goes a bit further into social conversation, like the Alexa Prize. There you don't really have a task, you're just entertaining each other, so a lot of people would use metrics such as the length of the conversation. For example, the longer you can engage people in a conversation, the better your system is. We also collect user feedback, for example how likely you are to talk to the system again, how engaging the system is, or these types of subjective reviews.
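Editor's illustration: the objective and subjective metrics mentioned above are simple to compute once dialogues are logged. This is a minimal sketch; the `DialogueLog` record, its field names, and the 1-5 rating scale are assumptions for illustration and do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass
class DialogueLog:
    num_turns: int         # dialogue length in turns
    task_completed: bool   # did the user reach their goal?
    user_rating: float     # hypothetical 1-5 subjective rating


def summarize(logs):
    """Aggregate per-dialogue logs into corpus-level metrics."""
    n = len(logs)
    if n == 0:
        raise ValueError("need at least one dialogue")
    return {
        "task_completion_rate": sum(d.task_completed for d in logs) / n,
        "avg_dialogue_length": sum(d.num_turns for d in logs) / n,
        "avg_user_rating": sum(d.user_rating for d in logs) / n,
    }


logs = [
    DialogueLog(num_turns=8, task_completed=True, user_rating=4.0),
    DialogueLog(num_turns=12, task_completed=False, user_rating=2.0),
    DialogueLog(num_turns=6, task_completed=True, user_rating=5.0),
    DialogueLog(num_turns=10, task_completed=True, user_rating=3.0),
]
report = summarize(logs)
print(report)
```

Note that the direction of "better" depends on the system: a shorter average length is good for a booking task, while a longer one is the goal for a social bot.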
Wenli: Compared to other fields in technology like computer vision, language and human conversation can't be duplicated; you can’t predict humans. So what are the bottlenecks that you're facing?
When you talk about conversations, there is definitely a sequential thing going on. And because language is the surface form, utterances can look similar while their semantic meanings are very different. Especially in dialogue, if the user says something different, the conversation goes down a totally different path. That makes it impossible for a data set, however big, to cover every possibility, because the space keeps expanding: changing one word gives a different dialogue, changing two words gives another. That makes everything harder, especially for these kinds of interactive systems. That's why a lot of people propose using a simulator to simulate how users will behave.
And then another problem is how do you build a good user simulator that is standardized, so people know they are testing on the same thing? It's basically something like Atari games for game RL (reinforcement learning) environments, or MuJoCo for robotics. That's a big problem in the dialogue field.
Our recent work, which we are preparing for EMNLP (the Empirical Methods in Natural Language Processing conference), is on how to level the playing field. We're going to publish different user simulators using different models that we trained. So when people come into the dialogue field, they can just use these simulators instead of building their own. And if you use our simulators, we will have leaderboards, so everybody understands who they are comparing against.
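Editor's illustration: a standardized user simulator means every dialogue system is tested against the same simulated user through the same interface. The sketch below is a hand-written toy, not one of the trained simulators mentioned above. The class `RuleBasedUserSimulator`, the act names, and the `run_dialogue` loop are all hypothetical.

```python
import random


class RuleBasedUserSimulator:
    """Toy simulated user with a fixed goal, e.g. booking a restaurant."""

    def __init__(self, goal, rng, confusion_prob=0.0):
        self.goal = dict(goal)
        self.rng = rng
        self.confusion_prob = confusion_prob  # chance of an unhelpful answer

    def respond(self, system_act):
        kind, payload = system_act
        if kind == "request":
            if self.rng.random() < self.confusion_prob:
                return ("unclear", None)
            return ("inform", (payload, self.goal[payload]))
        if kind == "offer":
            return ("accept", None) if payload == self.goal else ("reject", None)
        return ("unclear", None)


def run_dialogue(simulator, slots, max_turns=20):
    """Minimal system policy: request missing slots, then make an offer.

    Returns (success, number_of_turns) so results are comparable across systems.
    """
    known = {}
    for turn in range(1, max_turns + 1):
        missing = [s for s in slots if s not in known]
        if missing:
            act, payload = simulator.respond(("request", missing[0]))
            if act == "inform":
                known[payload[0]] = payload[1]
        else:
            act, _ = simulator.respond(("offer", dict(known)))
            return act == "accept", turn
    return False, max_turns


goal = {"cuisine": "thai", "area": "downtown"}
user = RuleBasedUserSimulator(goal, random.Random(0))
print(run_dialogue(user, ["cuisine", "area"]))
```

Because the simulator and its seed are fixed, two different dialogue policies can be scored on identical conditions, which is the property a shared leaderboard depends on.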
Wenli: What was your goal entering into this field? Did you have a specific goal that you wanted to accomplish?
One of the things I always really want to do is to build a system that is truly intelligent, that really requires a lot of effort in understanding common sense and understanding conversational situations.
Wenli: How do you define truly intelligent?
Being able to do anything you ask of it.
Wenli: How far do you think we are from that goal?
Very far away, which is good.
Wenli: So a lot of problems to solve?
Yes. There are a lot of new things we can develop. You can think about all this science fiction. There is a famous movie called Her, with a voice assistant voiced by Scarlett Johansson that is able to help the user with specific tasks, like scheduling meetings and reading letters, but is also able to be a friend, talk about things, understand emotions, and reciprocate emotions. So I definitely think a system that can complete most tasks, and release humans from tedious work like customer service, would be great. But being able to truly understand human beings and provide companionship is another important goal.
Wenli: Companionship, wow. Thank you so much for coming to share with us!
Thank you for having me!