We recently caught up with Dave Sullivan, Founder and CEO of Blackcloud BSG - the company behind Ersatz - and host of the San Francisco Neural Network Aficionados group. We were keen to learn more about his background, recent developments in Neural Networks/Deep Learning and how Machine Learning as a Service (MLaaS) is evolving...
Hi Dave, firstly thank you for the interview. Let's start with your background...
Q - What is your 30 second bio?
A - Sure, so I was born in '85 in the SF Bay Area. I got a chance to play with computers when I was 10 and by 12 had started learning to program - first with BASIC, then Visual Basic, then Java, C/C++. Although I stuck with it over the years, I always viewed programming and technology in general as a hobby rather than a potential career. Because work is supposed to suck, right?
I graduated in '09, right in the middle of the Great Recession, with a degree in history. I moved back to the Bay Area and started going to tech meetups where, for the first time, I realized that there was a huge industry in tech and that entrepreneurship was actually a valid adult career choice.
So in November 2010, I started Blackcloud BSG as a software consultancy, basically software development for hire. Deep learning is something that I've been working with for the past 3 years - it started as a hobby, then became somewhat of an obsession, and now it's a product. But I definitely took the unconventional route to get there...
Q - How did you get interested in working with data?
A - Well, I was researching poker bots. Not super seriously, it was just one of those random Wikipedia searches, but it introduced me to this whole world of machine learning, intelligent agents, etc. I kept reading, and even though I had tons to learn before I could do anything with machine learning, it was enough to spark my interest and motivate me to really study and develop these skills. And once you're working with it, once you start solving practical problems, you start to appreciate what the term "data science" really means. It's about being able to visualize in your mind how data all interacts, starting to think of data as its own entity with characteristics - it's about developing that intuition.
Q - What was the first data set you remember working with? What did you do with it?
A - Email data. I started learning about NLP with NLTK, a Python NLP library. I wanted to cluster emails according to similarity, mostly by looking at their content. I started doing cool things like using it to surface emails that required follow-up, etc. That actually led to me learning about deep learning through some of the NLP research that was coming out back in ~2008 with word embeddings and all that.
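For readers curious what "clustering emails by content similarity" can look like at its simplest, here is a minimal sketch. It is not Dave's code and does not use NLTK; it assumes a plain bag-of-words representation, cosine similarity, and a hypothetical greedy single-pass clustering rule with an arbitrary `threshold` of 0.3.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(emails, threshold=0.3):
    """Greedy single pass: each email joins the first cluster whose seed
    email is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, [email indices])
    for i, text in enumerate(emails):
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


emails = [
    "Can you follow up on the invoice?",
    "Please follow up on the overdue invoice",
    "Lunch on Friday?",
]
print(cluster(emails))  # [[0, 1], [2]]
```

The two invoice emails share five of their seven words and land together; the lunch email shares only "on" and starts its own cluster. Real systems would add stop-word removal, TF-IDF weighting, or learned embeddings.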
Q - Was there a specific "aha" moment when you realized the power of data?
A - Honestly, not really. It's been more of a slow creep. I think people realize that data is going to be a big deal, that data is all around us, etc. etc. But even though everyone is saying it, I think most people don't quite understand how important it's going to be. You've got to think through all the ramifications, and the more I do, the more convinced I become that "data" and what we do with it is going to be as transformative to our society during the next 20 years as the Internet has been in the past 20. But it's taken me a while to come to that conclusion.
Dave, very interesting background and context - thank you for sharing! Next, let's talk more about Neural Networks...
Q - What have been some of the main advances that have fueled the "deep learning" renaissance in recent years?
A - Well, it started in 2006 when people started using "unsupervised pre-training" as a way to sort of "seed" a neural network with a better solution. Traditionally, one of the problems with neural networks had been that they were very difficult to train - particularly with lots of layers and/or lots of neurons. This was a way around that and it helped renew interest in a field that was all but dead (neural networks I mean).
So more research started coming out, there was a lot of looking into this unsupervised pre-training idea, trying to figure out why it was working. In ~2008, 2009 people started using GPU implementations of neural networks and got massive performance boosts, as high as 40x in many cases. Suddenly that makes neural networks a lot more attractive - a 40x speedup makes something that used to take 40 days take 1 day - that's huge. So with that came much bigger models.
Since then, there's been a lot of really interesting new research. Google, Facebook, et al. have been getting into it, companies like mine are trying to build products around it - deep learning has come a long way in a short time, and a lot of problems from the past have been revisited and solved. For instance, recurrent neural nets used to be impractical, but now they hold the state of the art in audio recognition and they are a really powerful tool for time series analysis. All of this is very recent though, just a few years old.
So now you have this situation where there's a ton of money getting pumped into this area, and thus a ton of people working on these problems, many more than there used to be. The models are pretty well defined at this point, in the sense that much of the research is ready to be applied to industry. Meanwhile, the pace of new breakthroughs seems to be increasing (with "dropout" and word compositionality being two recent major developments).
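Since dropout comes up here as a major development, a minimal sketch of the idea may help. This is the common "inverted dropout" formulation, written in plain Python for illustration only (not any particular library's or Ersatz's implementation): during training each unit is zeroed with probability `p` and survivors are scaled by 1/(1 - p), so the expected activation stays the same and nothing needs rescaling at test time.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout over a list of activations.

    Training: zero each unit with probability p, scale survivors by 1/(1-p).
    Inference: return the activations unchanged. Assumes 0 <= p < 1.
    """
    if not training or p == 0.0:
        return list(activations)
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so a unit survives with probability 1 - p
    return [a * scale if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, 0.5, rng)
print(sum(dropped) / len(dropped))  # close to 1.0: the expected activation is preserved
```

Because a different random subset of units is silenced on every training pass, the network can't rely on any single unit, which is what makes dropout an effective regularizer.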
Now, it could turn out that there's some other deal killer or gotcha with neural nets and they turn out to not be as useful as everyone thought (this would be the third time...) But personally, I think there's something there, and I think it's the area where we're going to see the biggest machine learning breakthroughs in the near term. But at the same time, just because deep learning is gaining momentum doesn't mean that everything that came before it should be written off. Different tools should be used for different jobs...
Q - What are the main types of problems now being addressed in the Neural Network space?
A - The really big wins for deep learning have been in vision and audio problems. Those are the types of algorithms that are already in use by several really big companies. So you're seeing gains in just about any industry that could benefit from being able to interpret images and sounds better. But that whole process is really just getting started.
The next big area for deep learning is going to be natural language processing. Our demo (wordcloud.ersatz1.com) is a good example of some of the work that's being done there. Google in particular has been leading some really interesting research. For instance, it turns out that they can train models that can learn to organize data and words in such a way that different words become linearly combinable - like king + chair might be very close to the vector for throne. Everyone's sort of scratching their heads on why that happens/works, but the answer to that could be pretty interesting. If you solve vision, audio, and text, you've got a pretty robust set of "inputs" with which you can then start building more complex agents. You can start thinking about this idea of "higher level reasoning" and what that even means exactly. But even before all that, our sorta-smart devices are all going to get upgrades thanks to deep learning and software is going to be making a lot more decisions that humans used to make.
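The "king + chair is close to throne" idea can be made concrete with vector arithmetic plus a nearest-neighbor search. The embeddings below are tiny, hand-made, and entirely hypothetical (real word vectors are learned and have hundreds of dimensions); the sketch only shows the mechanics: add the vectors, then return the closest remaining word by cosine similarity, excluding the input words as is common in these demos.

```python
import math

# Hypothetical 3-d embeddings; the axes loosely mean (royalty, furniture, person).
EMBEDDINGS = {
    "king":   [0.9, 0.0, 0.8],
    "queen":  [0.9, 0.0, 0.9],
    "man":    [0.1, 0.0, 0.9],
    "chair":  [0.0, 0.9, 0.0],
    "table":  [0.0, 0.8, 0.0],
    "throne": [0.8, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_to_sum(words, embeddings):
    """Sum the vectors for `words`, then return the most similar other word."""
    query = [sum(dims) for dims in zip(*(embeddings[w] for w in words))]
    candidates = (w for w in embeddings if w not in words)
    return max(candidates, key=lambda w: cosine(query, embeddings[w]))


print(nearest_to_sum(["king", "chair"], EMBEDDINGS))  # throne
```

With learned embeddings, nobody hand-picks the axes; the surprising result Dave mentions is that training produces spaces where this kind of addition works at all.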
Q - Who are the big thought leaders?
A - Haha, other than me? j/k... But there are basically 3 big guys in deep learning: Hinton (at Google), LeCun (at Facebook), and Bengio (I don't think anyone's snagged him yet?). But each of those guys has a lot of students, and that's really where the new ideas are going to come from. The big thought leaders - no one has really heard of them yet, but they're definitely out there: they're the people publishing at NIPS, plus a whole bunch of others who are self-taught and tinkering with this stuff in their spare time in some as-yet unknown corner of the world.
Q - What excites you most about working with Neural Networks?
A - Well, in the near term, I think the most fundamental win from neural networks is this idea of automating the feature engineering process. I saw some cool research at NIPS this year that basically used these concepts to build a system like Pandora automatically (in a day, perhaps). But in order to do the same thing, Pandora spent years and probably a lot of money building a database of features - this was all feature engineering. You cut down on that part of the pipeline, and huge value is created.
In the longer term, I'm excited to be working with neural networks and machine learning more generally because, like I say, I really do think the impact on the world is going to be as important as the Internet has been. I mean, theoretically we could end up in a place where the idea of "work" as we know it just becomes relatively unnecessary - perhaps even economically inefficient. That poses all kinds of really fundamental questions for society, just like the internet has already started doing in a major way. And it's really cool to think about taking part in that conversation and really exciting to think about having an opportunity to shape it.
Q - What industries could benefit most from deploying Neural Network algorithms and techniques?
A - Any industry where the accuracy of predictions can make a significant financial impact on the business. For a company like Netflix, improving the accuracy of movie recommendations by 10% over what they were doing before might not be a huge deal. But for a company involved in any kind of algorithmic trading (be it options, commodities, or comic books), an extra 10% increase in the quality of certain decisions in their pipeline can make a really big difference to their bottom line. Oil exploration is another one that fits this. But those are the obvious ones - these kinds of techniques can also be applied to robotics (self-driving cars, housekeeping robots, car-driving robots), game design (Minecraft is algorithmically generated, so imagine something like that but way more original/complex/varied every time you play - and tailored to your unique gamer tastes), blogging (there will definitely be companies that crawl the internet and generate pretty readable articles with linkbait headlines with minimal human involvement), our phones (Siri will get better), and a bunch more. Business X "with machine learning" will probably be a semi-valid business strategy soon enough. But really, every industry is going to benefit from better tools and an expanding pool of people who know how to use those tools.
Really compelling and inspiring stuff - thanks for all the insights! Now let's talk more about Ersatz...
Q - How did you come to found Ersatz?
A - Well, the 30 second bio gives the broad strokes, but I really built Ersatz because I was frustrated by the existing tools. I mean, you can download Pylearn2 (and we use Pylearn2 for certain pieces of Ersatz, actually) and get started with it. But there's a lot of ground to cover just getting something up and running. Then you also have to worry about the hardware component (GPU done right gives 40x speedups, which makes neural nets practical, so you want that). Then once you're training models, you want to learn things about them, but you're not really sure how. And that's to say nothing of the subtle kinds of bugs that can creep into this type of software - it's hard enough troubleshooting data issues, and having to debug algorithmic issues too doesn't help. This is all the kind of stuff we're trying to make easy with Ersatz: making it so you can become a neural network practitioner instead of having to learn how to build them.
This process has played out with other product categories already - people used to build their own operating systems, databases, etc. Some people still do, I guess, but a lot more choose to buy software. And I think neural nets are a good example of this: many companies could benefit from them, and even more products could. So people will have a database storing their data and a neural net back-end making it smarter. And we want Ersatz to be that neural net back-end.
Q - What specific problem does Ersatz solve? How would you describe it to someone not familiar with it?
A - Sure, so the hard part about machine learning is learning to think about your problem in terms of machine learning. Knowing how to frame your questions to get the answers you're looking for... So, assuming you're at least familiar with machine learning basics... Ersatz can do a few things: dimensionality reduction, data visualization, supervised learning, unsupervised learning, sample generation. We use neural networks to do all the heavy lifting on these tasks, and that's the part that we hide away from the user. Basically, if you provide the data and you understand how to ask the right questions, Ersatz takes care of all the learning algorithms, setting of parameters, the GPU hardware, etc. etc.
Q - Makes sense. Could you tell us a little more about the technology - how does it work, what models do you typically use - and why?
A - Sure, so basically, you've got 2 basic units: a worker and a job server. When you upload your data, it gets uploaded to S3. When you create a model, it creates a job for a worker. An available worker sees the job and pulls the data down from S3. These workers are GPU servers, and that's where the actual neural network stuff happens. As it trains, the worker reports information back to the job servers, which update statistics and dashboards in Ersatz. The stack is pretty much entirely Python, with a good bit of C/C++ in there too. And of course, quite a bit of JS on the frontend - we use D3.js for our charts. Pretty standard fare really; we try to be relatively conservative about the technology we use without going overboard.
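The worker/job-server split described above is a classic job-queue pattern. The sketch below is a hypothetical, in-process illustration only: `JobServer`, `worker`, and the dictionary standing in for S3 are invented names, threads stand in for GPU servers, and the "training" is a placeholder that halves a number each epoch. It shows the flow (create job, worker pulls job and data, worker reports progress back), not how Ersatz is actually built.

```python
import queue
import threading


class JobServer:
    """Hypothetical job server: hands out jobs and records worker progress reports."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.stats = {}  # job_id -> list of (epoch, loss) reports
        self.lock = threading.Lock()

    def create_job(self, job_id, data_key):
        self.jobs.put((job_id, data_key))

    def report(self, job_id, epoch, loss):
        with self.lock:  # several workers may report at once
            self.stats.setdefault(job_id, []).append((epoch, loss))


def worker(server, storage, epochs=3):
    """Pull jobs until the queue stays empty, 'train', and report each epoch."""
    while True:
        try:
            job_id, data_key = server.jobs.get(timeout=0.1)
        except queue.Empty:
            return  # no more work
        data = storage[data_key]  # stands in for downloading the upload from S3
        loss = float(sum(data))   # placeholder for a real training loss
        for epoch in range(1, epochs + 1):
            loss /= 2             # pretend each epoch halves the loss
            server.report(job_id, epoch, loss)
        server.jobs.task_done()


storage = {"uploads/a.csv": [4.0, 4.0], "uploads/b.csv": [1.0]}
server = JobServer()
server.create_job("job-a", "uploads/a.csv")
server.create_job("job-b", "uploads/b.csv")
threads = [threading.Thread(target=worker, args=(server, storage)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.stats["job-a"])  # [(1, 4.0), (2, 2.0), (3, 1.0)]
```

The appeal of this design is that workers are interchangeable: adding GPU capacity means starting more workers against the same queue, and the job server stays the single place that dashboards read from.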
In terms of models, well, we've got a few that we support, depending on the type of problem/data you have. We have standard deep nets (with different types of non-linearities, dropout, all the bells and whistles), autoencoders (for dimensionality reduction or feature learning), convolutional nets (for image problems), and recurrent nets (for time series problems).
Great, that makes it very straightforward to understand - thanks! Now, a couple of more operational questions...
Q - What has it been like boot-strapping the company throughout?
A - Really tough! And humbling, I think. You're forced to learn to work with limited resources, which turns out to be a good skill to have. But it's also very frustrating, and oftentimes it can feel like bootstrapping is slowing you down. Sometimes you actually can accelerate growth by throwing more money at a problem. The problem is, if you throw it at the wrong stuff, you just start losing money faster, and you lose momentum going down the wrong path. I think companies that raise money too early really do themselves a disservice. Once you do that, a clock starts ticking. In the beginning, being bootstrapped gives you more time and flexibility to look at various options and try different ideas, and it also gets you in the mindset of needing to conserve resources. But I also don't believe in the "bootstrapped 4life!" mantra - I think that's just masochistic.
Q - You mention on your website that you manage an entirely remote group of developers (in 14 different countries!) - how do you make that work?
A - The actual number varies depending on what client projects we're working on (right now it's 11 people in 8 countries, for instance). But in order to make it work, there are a few things I've learned... First, I think you either need to be remote or have an office - mixing the two isn't great. I think companies that have tried strapping on a remote team and had it not work out basically have this problem: everyone in the office communicates fine, they neglect communication with the remote team, and when things break as a result of that communication breakdown, it's the fault of "remote work". So assuming you want to do remote... You've got to instill a culture of DIY - everyone can make, and is responsible for, their own decisions. Technically, you're usually required to be online on Skype for 3 hours a day, 8am to 11am PST. This is relatively loosely enforced, depending on what's going on, so you really have to be self-sufficient sometimes.
Using Skype as your office is really cool because meetings can occur while you're not there and you can just read back what happened while you were out. Meetings happen asynchronously, which simply doesn't happen in an office. People put thought into their communications. There are also fewer interruptions - you can just sit and meditate on certain issues without being interrupted. We're pretty good about using our various project management systems - that part is really important. It is nice to be able to hire anywhere in the world. It requires a different management style, but it's a viable organizational model and it allows my team to get a lot done. But I won't lie, it introduces its own problems - it's hardly some holy grail that magically solves all your operational issues. And it's also still very new - even just 10 years ago, it would have been very difficult to start this company the way that I have. But people live their lives on the internet now, geography matters less and less, and everyone anywhere in the world is just a click away.
Very interesting - we look forward to hearing more about your and Ersatz's successes going forward! Finally, it is advice time!...
Q - What does the future of Neural Networks / Machine Learning look like?
A - That is the gazillion dollar question isn't it?
Q - Any words of wisdom for Machine Learning students or practitioners starting out?
A - Don't be intimidated about getting into it! The basics aren't that complicated - with enough banging your head against the wall, anyone can get it. This is a field that is wide open - there is no "theory of relativity" for AI yet, but there probably will be, and I think it's actually pretty likely that we'll see it in our lifetimes. It's a really unique time in history right now, and this is a revolution that pretty much anyone in the world with an internet connection can take part in. While many aspects of the worldwide economy are really messed up and will continue to be, I don't think there's ever been a time when economic mobility has been more decentralized. No matter where you are or who you are, you can take part if you're smart enough. So yeah, my advice: jump in, before it gets crowded!
Dave - Thank you so much for your time! Really enjoyed learning more about the work going on in neural networks and what you are building at Ersatz.
Ersatz can be found online at http://www.ersatz1.com and Dave is on twitter @_DaveSullivan.
Readers, thanks for joining us!
P.S. If you enjoyed this interview and want to learn more about
- what it takes to become a data scientist
- what skills do I need
- what type of work is currently being done in the field