Fangjin Yang, Co-Founder & CEO @ Imply: How To Evolve An Open Source Project Into A Business

Updated: May 14, 2019

This episode features Fangjin Yang, co-founder and CEO of Imply, a data analytics startup providing high-performance solutions for large-scale event streams. Fangjin is also a co-author of Apache Druid, an open-source data store designed to quickly ingest massive quantities of event data. Previously, he was an engineering lead at Cisco Systems and Metamarkets (acquired by Snapchat).

Fangjin shared his career path and his entrepreneurial experience of evolving an open source project into a business.

Highlight 1: The Evolution of Data Analytics

Highlight 2: How does Druid differentiate from other real-time streaming tools?

Highlight 3: Challenges & mistakes in the entrepreneurial journey

Highlight 4: Advice on starting your own business


Full Transcript

Alex: Welcome to Entrepreneurship Talk. Just a brief introduction: we are an online content platform aiming to create a better understanding of leadership, entrepreneurship and AI insights for researchers and engineers. Today, we are glad to have the Founder and CEO of Imply, Fangjin Yang. Welcome, Fangjin!

Fangjin: Thank you for having me here!

Alex: Just a brief introduction of Imply. Imply is a data analytics company founded in 2015. They focus on high-performance analytics on large-scale event streams. Fangjin is also the co-author of Apache Druid, an open source data store designed to quickly ingest massive quantities of event data. Before that, he was an engineering lead at Metamarkets, which was acquired by Snapchat in 2017. Fangjin received his master's degree in computer engineering from the University of Waterloo. Can you give us a brief introduction of what you guys do and who your clients are?

Fangjin: Sure, absolutely. Imply is what is known as an open core company. Open core companies typically have an open source project at the heart of their product, and then there's an end-to-end enterprise solution that they sell to customers. For people that may not be familiar with open source, this is the idea of free software. In Imply’s case, we're really built around an open source technology called Apache Druid, which is a next generation database designed to ingest and analyze very large volumes of digitally generated data. And what we do at Imply is build a complete solution around this core Druid project that's designed for large companies and enterprises to be able to use.

Alex: Since you're involved in lots of key technology revolutions in this big data market, can we walk through some of the key milestones in the past maybe 5 to 10 years?

Fangjin: The way I see the evolution of data analytics is, the first generation of technologies was data warehouses. So if you look at what companies were using many decades ago, data warehouses by Oracle, Teradata, HP and many other vendors were very, very popular. I think over the course of the last 10 years, that started to change a little bit because people started realizing that data is getting more complex and data is now growing in volume. And combined with the rise of both the Hadoop ecosystem, which is another popular open source project, as well as public cloud vendors, analytics has shifted to where I think now there's a central storage location and different types of engines to query that central storage location. So this is what's known as a data lake architecture. It's separating where data is stored from the different systems that can get value out of the data. This is how Amazon runs its analytics services; this is how Google runs all its analytics services; and this is what people do in the Hadoop ecosystem. I think where analytics is now shifting is towards more real-time, dynamic workflows. A lot of businesses today are investing in the digital aspects of their business, putting more and more of their business online. And what that means is, instead of working with static files, you now have continuous streams of information. And I believe analytics is migrating to be able to ingest and also analyze continuous flows of information. And this is the world that Imply belongs to. We're trying to build a new type of database for this world.

Alex: As you know, since 2015 or 2016, AI is getting hotter and hotter. How should data react to the rise of AI and machine learning? Could you give us some examples?

Fangjin: Yeah, so it's interesting. AI and machine learning, the concepts have been around for a very long time. They have been around for decades. I think in investment circles they have been getting more attention, and they started getting more attention generally as a result. I think it's an open question whether companies are actually going to be successful with their AI solutions. But nonetheless, I do think AI and machine learning have value for the whole data ecosystem. The way I see AI and ML broken down is there's the algorithms piece, and then there's the compute piece. There's a lot of data that you have to work with, and then there are systems that have to do the raw compute in order to produce answers from this data, and that raw compute feeds data into more intelligent algorithms that can help make decisions, help make recommendations, and so on and so forth. So the way I see it, what we do at Imply, along with a lot of other systems, is build that foundational compute layer that higher-level AI technologies can leverage for a lot of their applications, like number crunching.

Alex: Do you have some AI clients? How do you work with those companies? There are a bunch of AI engineers inside working with you guys, right? How do you work with them?

Fangjin: Yeah, one of our clients, which is a major financial bank, is combining our technology with their homegrown AI system. The idea is, they're trying to better identify identical customers across different product lines, and then potentially understand what other products they might be interested in. In that case, they are using our technology for more low-level compute: for very fast number crunching, for getting data in immediately, and being able to crunch numbers on that data. And then they build their own high-level abstractions and algorithms on top of that low-level data.

Alex: Is it mostly like structured data or unstructured data?

Fangjin: It's actually a mixture, I would say, probably the best way to categorize this is semi-structured. Because unstructured data could be like a video file, could be like an audio file, it could be very difficult to process and deal with. So most of the time, what we deal with is more semi-structured data, where there's a more fluid schema, but there's like a notion of keys and values.

Alex: You mentioned event streams. Can you briefly explain what an event stream is?

Fangjin: So "event stream" is an industry term used to describe the output of a digital business. I think there are three main categories of event streams. It can be user-generated data. So if I'm a user, and I'm using a web page, or some sort of mobile product or digital product, my actions and how I engage with that product produce data that's useful. But event streams can also encompass the metrics generated by applications. So it could be KPIs of the applications, their performance, etc. And it can also be the low-level infrastructure data, like server metrics and server logs, basically the output as people are interacting with different types of products. So what event streams are is discrete events, a continuous flow of events that's describing some sort of action occurring.
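
Editor's illustration: the three categories described above, together with the earlier remark about semi-structured data having "a notion of keys and values", can be pictured with a small Python sketch. This is only an assumed shape for illustration; the `Event` class, its field names, and the sample values are hypothetical and are not taken from Druid or Imply.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Event:
    """One discrete event: a fixed envelope plus free-form key/value attributes."""
    timestamp: str                      # ISO 8601 time at which the action occurred
    category: str                       # "user", "application", or "infrastructure"
    attributes: Dict[str, Any] = field(default_factory=dict)  # fluid schema


def by_category(events: List[Event], category: str) -> Iterator[Event]:
    """Yield only the events that belong to the given category."""
    return (e for e in events if e.category == category)


# Hypothetical sample stream: each event carries different keys,
# which is what makes the data semi-structured rather than tabular.
events = [
    Event("2019-05-14T10:00:00Z", "user", {"page": "/home", "action": "click"}),
    Event("2019-05-14T10:00:01Z", "application", {"kpi": "latency_ms", "value": 42}),
    Event("2019-05-14T10:00:02Z", "infrastructure", {"host": "web-1", "cpu": 0.73}),
]

user_events = list(by_category(events, "user"))
print(len(user_events), user_events[0].attributes["action"])
```

The envelope (timestamp and category) stays fixed while the attribute keys vary from event to event, which is one simple way to model a fluid schema.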

Alex: You worked for Metamarkets for four years. How did you join the Druid project, and why did you decide to start your own business?

Fangjin: When I first moved out to Silicon Valley in 2009, I joined a very large corporation, which was Cisco. I was doing some research and development work for them. But I realized very quickly that the big company really wasn't for me. There was a lot of structure, there was a lot of support, which was nice, but I was really fascinated by startups. So I started looking for the smallest company I could find that was doing interesting work. I was one of the first five employees of this tiny company called Metamarkets. I joined because during the interview process, I met another very strong engineer, who was the VP of Engineering at that time. He brought me in, and he was like: Hey, I just started this new project called Druid. I think we could do some cool stuff here. So I joined that project right away. There were really just the two of us. We started building it out initially for some use cases that we saw at Metamarkets as a startup. But then we started realizing that these use cases have applications in a lot of different places. And when we decided to open source the technology, a lot of major companies actually came to us and said: Hey, we actually have similar use cases to what you guys are doing. Over time, more and more companies started coming to us. And then at a certain point, I realized, oh, there's actually market potential here, there's a market gap. This technology is solving a need that isn't being met right now. I thought this was the opportunity to start a company, and I always wanted to start a company because I was very fascinated by startups.

Alex: Imply focuses on real-time analytics. As we know, there are many other real-time data streaming tools, such as Flink, Storm, Kafka. How does Druid differentiate from those projects?

Fangjin: Definitely. So this is kind of a technical answer. But really, when you look at data architectures, it's very rare to see a single technology solve everything that's required for a particular use case. What's much more common is people tend to build data stacks: they combine different technologies together, and each of those technologies is specialized to be good at certain things. So for example, in a more modern streaming analytics stack, there are three main components. There's one piece which is responsible for getting data from somewhere and transferring it to somewhere else. This is what's known as a message bus, and this is what Apache Kafka does. It’s very, very good at delivering data from one place to another. Another very critical component of analytics as a whole is cleaning data, transforming data and processing data, because it's very, very difficult to use raw data as is. You oftentimes have to transform it in many different ways. This is known in the industry as extract, transform, load, and the short name for it is ETL. The piece that does ETL is what's known as a stream processor. Apache Flink, Apache Storm, and Spark Streaming are almost entirely designed to do ETL for streaming data. And now the last piece is, how do you store this data for the long term, and how do you get answers from it? This is typically what databases do. This is what Druid does. Druid is not competitive with Kafka, Flink, or Storm; it's very complementary. So the stack could be: you input data into Kafka, you transform it in something like Flink, and then you deliver that transformed data into Druid for further queries. And that's an end-to-end stack.
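
Editor's illustration: the three roles described above (message bus, stream processor, queryable store) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Kafka, Flink, or Druid are implemented, and it uses none of their APIs. Every name here (`publish`, `transform`, `EventStore`, `run_pipeline`) is hypothetical, and each stage is an in-memory stand-in.

```python
from collections import defaultdict, deque

# Stage 1: message bus (the role Kafka plays). A deque stands in for a topic.
bus = deque()


def publish(raw_event):
    """Deliver a raw event onto the bus."""
    bus.append(raw_event)


# Stage 2: stream processor (the role Flink or Storm plays): clean and reshape.
def transform(raw_event):
    """Return a cleaned event, or None if the record is unusable."""
    if "user" not in raw_event or "action" not in raw_event:
        return None
    return {
        "user": raw_event["user"].strip().lower(),
        "action": raw_event["action"].strip().lower(),
    }


# Stage 3: queryable store (the role Druid plays): keep data, answer questions.
class EventStore:
    def __init__(self):
        self.counts = defaultdict(int)

    def ingest(self, event):
        """Store a cleaned event by updating a per-action counter."""
        self.counts[event["action"]] += 1

    def count(self, action):
        """Answer a simple query: how many events had this action?"""
        return self.counts.get(action, 0)


def run_pipeline(store):
    """Drain the bus, transform each event, and ingest the usable ones."""
    while bus:
        cleaned = transform(bus.popleft())
        if cleaned is not None:
            store.ingest(cleaned)


publish({"user": " Alice ", "action": "Click"})
publish({"user": "bob", "action": "view"})
publish({"action": "click"})                    # dropped: no user field
publish({"user": "carol", "action": "CLICK "})

store = EventStore()
run_pipeline(store)
print(store.count("click"))
```

The point of the sketch is the separation of concerns: delivery, transformation, and querying are independent pieces, so each could be swapped for a specialized system without changing the other two.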

Alex: You guys have this Imply Cloud, which is a managed service for AWS. I know AWS has many restrictions. So how do you handle this?

Fangjin: Imply is packaged software. What that means is, you can actually install it on premise; you can install it in any Linux-based environment. But one of our core products is Imply Cloud, which is a managed service for AWS. What we mean by managed service is we help a client deploy our software into their AWS account. So the client actually fully owns their data, they can control the hardware that the data runs on, and what we help with is making operations and deployment much easier. In that case, we haven't seen many restrictions from AWS; this model has actually worked pretty well. It’s a value add for the client, because they get the benefit of not having to manage the software. There's a lot of automation behind the scenes, so it's easy to operate. And then they also get the benefit that they own the data. We're not looking at their data. We're not owning their data; they own that. It's a model that other companies, such as Databricks, which is the company behind Apache Spark, and Qubole and others, have adopted to great effect.

Alex: You previously mentioned that you started this company in 2015. I remember you raised your seed round from Khosla Ventures, and then last year you raised an A round from Andreessen Horowitz. Both of them are great investors. So how did you find them? How did you convince them? The seed round from Khosla Ventures was $2 million. It was not easy. And the A round, for $30 million, was even harder.

Fangjin: Metamarkets itself, the startup I was at before it was bought by Snapchat, was funded by Khosla Ventures. At that time, when we decided we were ready to start our own company, we actually talked to our employer first. We had a conversation about how this could be done, and they introduced us to Khosla Ventures. We had talked to some other investors as well, but it was actually a very quick process for the seed. At that point, Druid had started to get pretty widely adopted. There were a lot of major companies using it and talking about it, and there was enough buzz around it that we could potentially found a company there. After we founded the company, the first two years were the hardest, as they always are, when you have to figure everything out. But as we got better at building our product, as the product became more mature, and as we got better at selling it, we were actually growing pretty fast in sales. We were thinking about whether or not doing an A made sense. And it was coincidental that at that time Andreessen Horowitz actually reached out to us and asked to grab a coffee. We started talking, one thing led to another, and the next thing we knew, we were talking to a lot more people, and that's what became the A.

Alex: Imply has three co-founders. Besides you, are any of the others from Metamarkets?

Fangjin: Both my co-founders are from that company as well. We all worked together for a very long time. That's why I trust them a lot, and we know we work very well together. It's also very beneficial that they're amazing world-class engineers. One of my co-founders is now completely leading Druid, leading the architectural direction of it. And the other one was one of the creators of D3.js, which is one of the most popular visualization libraries in the world. He's published several books, top courses, and he’s a world-class engineer. So I've always enjoyed working with them. And when I started a company, I considered them better than anyone else I'd want to work with.

Alex: You started your career as an engineer, and later you saw this opportunity, founded this company, and became the founder and CEO. So how did you transform from an engineer into an entrepreneur, or maybe a business person?

Fangjin: I think, from a very young age, I was interested in startups. Even when I was in university, I was trying to learn more about how startups work. How do they get started? What’s venture capital? A decade ago, this information was not around. There was no YC, there were none of these incubators. So you were kind of figuring out everything on your own. I started my career as an engineer, but in the back of my mind, I was always like: How do I learn more about startups? So I started at Cisco, and I realized I was not going to learn everything I wanted to know at such a major organization. That's what prompted me to find the smallest company I could find. And once I found that company, I realized starting a company is extremely difficult. Not only do you have to be a good engineer, you also have to have a very good head for business. So I spent a lot of the early years of my career focusing on trying to be a good engineer, trying to build a good system that would add value to different companies, and getting more companies to use it. And then once I started the company, I had to transition very quickly and learn about what's known as go-to-market strategy: how do you take a technology and product, actually position it in the market, and then begin to sell it?

And from that, part of it is spending time with people, trying to find mentors that can help you, and part of it is a lot of my own research. I probably read 50 to 60 different books on the concept of go-to-market strategy, everything from sales to marketing and everything in between. And as you go through the journey, it's a very sink-or-swim environment. You either figure it out or your company dies, so you hope you figure it out. But as you go through the journey, you realize: Oh, there are skills that I’m good at and that can create value for go-to-market strategy, and there are skills I’m less good at. For the skills I’m less good at, I need to find other people that can supplement my skills. So from that, you start building out your executive team. So I think honestly, the way to learn it is just to do it, just like anything else. You can read a million books, but at the end of the day, you just do it.

Alex: I know, this is a tough journey from technology to product to market, and you mentioned the go-to-market strategy. So what is your typical way? And what is your insight about a technology company? What would be the path?

Fangjin: I'll say this. Broadly, there are two types of companies: business-to-consumer and business-to-business companies. Imply is the latter; we sell to other businesses. But for either way of selling, there are very complex strategies for how the company actually makes money. When I first started my company, I didn't realize a lot of this. I was more naive. I thought that if you have very strong technology, people will figure it out, and then they'll start to adopt it, which is not the case. Because no matter what you are doing, especially in the business space, there is competition. And then it becomes: how do you convince companies to adopt your technology over the competition? Some of the competitors have relationships that have lasted for decades with the customers you're trying to get into. They have teams that are thousands of times larger than yours. You're a tiny thing with three people and a dog in a tiny office, basically trying to figure out: how can I convince one of the world's largest companies to give us money? And then you realize: Okay, the technology is almost secondary to how you talk about the technology in terms of the problems it solves and the value that it creates. And then from that, you start learning about specific use cases, you start learning about specific problems, and you start building more and more go-to-market strategy. It's not an easy journey by any means. But as I mentioned, you either figure it out or your company dies. So you obviously spend a lot of time and effort trying to figure it out.

Alex: Did you start from some reference clients first and build some really successful stories?

Fangjin: We were fortunate, because we had the open source project, which was the free software, and there were many people that used the free software. For us, that was a starting point to say: Well, folks like Netflix are using this, folks like Yahoo are using this, folks like these major technology companies are using this. And these are some of the use cases that they have publicly talked about.

We were also fortunate that the technology was very, very popular in China. There are books written about it, and things like that. So we could always talk about some of the Chinese companies like Alibaba and Tencent and how they're using it, and that helped us get started.

Alex: Between the US and China, in which country do you have more customers? And what are their typical applications?

Fangjin: Today, we have customers in a couple of different countries, primarily in the US, but we have an expanding presence now in Europe and the Middle East, and we have a couple of clients in China and Asia Pacific as well. But we’re primarily US based. We have a little bit more of a horizontal approach to our technology, which spans a few different use cases we've seen. We've seen the technology become very popular for digital advertising and digital marketing data. So as people are viewing ads and clicking on ads, the data generated from that is often being analyzed with our systems. We see a lot of adoption for user experience as well, like understanding how users engage and interact with products. This helps companies improve their products faster, A/B test their products, and do rapid iteration on their development. We’ve started seeing use cases with more network security and also network flow data as well. So it’s kind of like the bottom of the stack, with more infrastructure-level data. We're still learning about these use cases. There have been more uses coming out for supply chain and manufacturing analytics, which we're pretty excited about, because there's some really interesting work that we can do in that area.

Alex: So in China, do you actually host your service on some China based cloud platforms?

Fangjin: Today, we're not doing that. Most of the time when we work with a Chinese company, it tends to be a very large Chinese enterprise that we have a closer engagement and closer relationship with. In the future, there might be opportunities to partner with one of the cloud providers. I think all the major cloud providers are using Druid today. I think there's potential for collaboration, but I think we're at too early a stage as a company to really explore broad international expansion.

Alex: In your three years of experience as an entrepreneur, what has been your biggest challenge, or maybe even mistake?

Fangjin: So many challenges, and so many different mistakes, I don't know if I can point to the biggest one. I would say that some of the stuff I struggled with early on is not what I struggle with today. When I first started, I think I was more sensitive to rejection than I am today. Rejection comes in many different forms, right? It's investors telling you your company's not going to succeed; it's potential clients telling you they don't see the value in your product; it's candidates telling you they don't want to work at this company, because it's probably going nowhere. So it comes in a lot of different forms. In the beginning, every time you hear a rejection, you feel bad. But after you see enough rejections, it's like: Oh, whatever. It’s just Tuesday. You just stop caring about it. So in the beginning, I think I took rejection a little bit harder than I do today. Today, it doesn't matter. Every day you get rejected, you move on. I would say that the thing I wish I'd done better when I first started is that I wish we had focused even more. We knew that focus is very important in a startup. But because you can get distracted so easily in a startup, there were a lot of random little things that we did here and there.

Alex: Because you had limited resources.

Fangjin: Yeah. Looking back, all of those were probably a waste of time. And we should have been even crazier in our focus.

Alex: But it's hard, it is a thinking process, right? You do lots of practice, so you can figure out the path.

Fangjin: Yeah, you do. But you cannot do too many; you also need to trust your beliefs and your conclusions about the market. I could go on and on about mistakes. But overall, the journey is not easy. And I think every startup makes tons of mistakes.

Alex: I think a lot of our audience might be interested in starting their own business. What would be your advice?

Fangjin: I would say, honestly, I think that startups are very, very difficult. You hear about it all the time, but it's one of those things where until you do it, you don't realize how difficult it is. And I think before people start, they really should understand their motivations, why they're starting this company. I hear a couple of different motivations, and some are bad, and some I think are better. So for example, when I talk to people, they say: I want to start a company. I ask them: Why do you want to start? And they're like: Oh, I want to be rich. Or I ask them: Why do you want to start a company? They say: Oh, I want to be my own boss. But oftentimes, that's not the case for a startup. Because when you do a startup, cash is very, very important. You're not making a lot of money. You might potentially in the future, but not during the most difficult days of a startup, and there are some really, really hard days. And I do not believe that if your motivation is money, you will survive those days; you will probably give up. I know many, many people who have decided it's not worth it, it’s too hard. It's because their motivation is not strong enough.

Or people say: I want to start a company, because I'll be my own boss. That's not the case, because you have to report to everybody now. If one of your employees is quitting, you're dropping everything, trying to make them stay. If some people are not happy, you’re trying to make them happier. You have your board to report to. You report to everybody. You're the one with the most bosses. So I also think that's not a strong enough motivation. I think the motivation that gets you through the most difficult days has to be something deeper. Oftentimes, it's a true belief that you're creating something that has not existed and needs to exist. I think founders usually have a deeper reason why they keep going, because there are a lot of other things you can do to make more money, or to have more freedom, without the pressure and the stress of having to sit in the founder's seat.

Persistence and endurance. I think the greatest startups have endured; they survived. They just kept going. So I would say the biggest takeaway is that the most important attribute for a founder is endurance.

Alex: Where do you see yourself and Imply in the next three to five years?

Fangjin: Founders are optimistic, so I tend to be optimistic. Imply has been tripling or more than tripling every single year in terms of revenue, so I think we're on a very strong growth path. We obviously want to continue to grow the company, building it to be as great as possible. We'll see. I think a startup is a roller coaster. You never know what's gonna happen, but right now I’m very optimistic about the future.

Alex: Okay. Good luck on that journey. This is Entrepreneurship Talk. Thank you, Fangjin, for sharing these great insights and stories. Thank you!

Fangjin: Thanks a lot.