Alumni Q&A: Dano Gillam on Charting Your Path as a Data Scientist
Based on a July 2020 interview
- ACME Class of 2016
- eBay, Data Scientist
- Tech, Consumer Internet
- If you’re interested in data science, never stop 1) learning (ACME, podcasts) and 2) doing data science (Kaggle, personal projects, jobs).
- Practice convincing others that data science can solve interesting problems. You can create your own opportunities throughout your career regardless of the company. All you need is a lot of data to work with.
- Your first job involves an element of luck, depending on your connections and how well you interview. After your first job you’ll have real experience and new skills, and can move toward the career you want.
Tim Riser: Can you give students a high-level summary of your background and how you came to this point?
Dano Gillam: I started off at BYU as a bioinformatics major, then switched over to ACME in school. I worked as a web developer and then as a data miner. Before I graduated, I was hired by a startup called Bind to start their data science team from scratch—which was a lot of faith to put in someone who was coming out of college! They probably should not have done that. But I gained a lot of valuable experience. I worked for them remotely for two years. Then I came over to Utah to work with Jeff [Humphreys] at Owlet for a little bit less than a year, and now I work at eBay.
Tim Riser: What drew you to your current work?
Dano Gillam: ACME is where I was introduced to data science as a career, and it was where I decided I wanted to be a data scientist. I decided very early on that what I wanted to get out of ACME was machine learning. I enjoyed all the programming parts of ACME (all the linear algebra and all the algorithm design), and I took statistics as my concentration. Since my wife was graduating a semester later than me, I had a whole extra semester, which I used to take machine learning classes from the CS Department. In school, I would pretty much just Google: “What does it take to be a data scientist?” There are a million video tutorials claiming “Become a data scientist in a month,” which trivialize it. My first job involved a lot of data engineering, so my work focused on SQL and data manipulation. I knew enough about machine learning to convince my higher-ups that we needed to use it more. I would explain to them the benefits of a state-of-the-art algorithm and how it could solve many of our problems. They all said “Great, let’s do that,” and I replied, “Okay. I’m going to go read up more about the algorithm and actually learn how to do it well.” ACME had prepared me: I had only done it in school, never on real problems, but it gave me confidence I could do everything. I thought I was going to stay at my first job forever. That was my mindset going in. But every time I switched companies, my work became more and more focused on doing exactly what I wanted to do. Fun fact: I got into eBay with only a 30-minute interview.
Tim Riser: While preparing to join eBay this year, what was it that got you most excited about eBay?
Dano Gillam: The data. If a company has well-structured data, and a lot of it, it’s very easy to decide your own project. I came into eBay and looked at everything they had. I had conversations with people, and I thought, “Wouldn’t it be cool if we could do X?” I then fought for that and convinced everyone that it was a worthwhile venture. If a company has a good data set, then you will always be happy as a data scientist. You will always be able to come up with new problems and convince people that the problem you’re solving is the best one. My first company, Bind, had a huge health insurance data set right from the get-go. With enough data, you can do anything. You can say, “Oh, I want to learn more about doing neural networks. Well, I’m going to solve this problem using one.”
Tim Riser: What do people most commonly misunderstand about the work that you do?
Dano Gillam: At Bind and also at eBay, when I came in, the other data scientists were very into R and had a particular sort of understanding about data science. The R perspective of data science is very focused on frequentist statistics. The workflow is normally something like this: a manager comes to you with a question, and you analyze the data to answer the question. How I differentiated myself was this: when someone came to me with a question, I would create a tool that would answer the question now and for all eternity, and I would implement that tool on their systems. If the company wanted to know the probability of a person getting a surgery, one option is just coming back with a number and handing it over. Instead, I would come back with an algorithm with a whole workflow built in Python that would answer that question, which we could then implement. That was something that R couldn’t do. That implementation is actually putting an algorithm into your system, not just coming with a visual and answering the question one time. That really made me stand out at all my jobs, because they were used to that sort of statistician mindset versus the data scientist mindset, which is to create repeatable tools that answer the question.
Tim Riser: I remember when we were in school, you said, “I’m just going to get a job and learn for four or five years.” Is that how it played out?
Dano Gillam: Every time you get a new job, you end up learning the incumbent technology in that job. That technology is now on your resume and you understand it. In the four years since I graduated from ACME, I’ve increased in salary a hundred and fifty percent of what I started off making. So I really wouldn’t put much weight on your first salary out of school. Your first job is kind of luck, depending on your connections and how well you present yourself. Only after your first job do you actually have experience and can be more selective. Eventually you will get the career you want.
Tim Riser: What is the theory or model of how you broke into data science?
Dano Gillam: First, I decided to be a data scientist. Then I ran with the idea. I started the Data Science Club at BYU, and that was fun. I did a whole bunch of Kaggle competitions, which are surprisingly true to real problems. A casual competition is great for when you want to learn an algorithm. That was my start. Then, like I said, in my web development job, I convinced a manager in the office that doing machine learning would be cool, and he hired me out of my web development job to be a data miner. Though primitive, it was my first time implementing machine learning in an actual job. I learned a bit of SQL, a couple of web programming languages, and a lot of NLP. It was a lot of fun.
Tim Riser: A lot of ACME students are interested in getting into machine learning and data science, and feel those fields are very adjacent to what they’re learning in ACME. What advice would you give them on breaking in?
Dano Gillam: Don’t be afraid to take non-ACME courses in machine learning. The CS department does offer them and they are in Python. It was great coming from ACME into those CS Python machine learning courses. I was already a step ahead because I was so comfortable in Python and linear algebra. I would also recommend a podcast by OCDevel where he talks about machine learning. Of course, you can’t really learn machine learning from a podcast, but boy did I get good at talking about it. He talks about the different algorithms: what they’re good for, what they’re bad for. He has a whole podcast about the different languages: R, Python, SQL, and where they fit in the ecosystem and why Python is the best. I guess a random piece of advice: I think everyone should take a class or get R somewhere on their resume. I’ve yet to get a job that didn’t list R on the application. Hiring managers who know little about data science think of R and Python as interchangeable. I’ve never actually had to use it though. Like I said, I’ve been good at explaining the benefits of Python over R. But sometimes the people or text scrapers that scan through your resume will look for R, and not seeing it may turn them off. My R experience came from a single stats class in college, but it was enough that I could read other people’s R code and work together with them.
Tim Riser: Thinking broadly, what skills have been useful to you in your career so far?
Dano Gillam: SQL. Every single problem starts with a database before you even get to Python, and I got good at using the Python package called SQLAlchemy and generally making it so Python can integrate with SQL. Having Python query the database instead of querying it using another application helps tremendously with automation. SQL is always the first step at any company I’ve ever been in, and no one gives you CSVs. That’s usually not how it works. Also, scikit-learn has a lot of good stuff other than the algorithms. All the algorithms in scikit-learn are last-gen algorithms, but they’re great for learning how the algorithms work. But scikit-learn has something called pipelines, which let you train not only your model but also your preprocessor. This lets you take raw data and run it through these scikit-learn pipelines, performing the cleaning and piping it directly into the model. That’s very necessary any time you actually integrate machine learning, because although your training data might have been cleaned, you still need a good way of automatically cleaning new data as it comes in. I mention this because it wasn’t mentioned in ACME. Of course, packages like pandas and numpy are used daily. Understanding numpy helps you understand how pandas works internally. I used a lot of technologies, but I feel like the summary is Python and SQL and really good integrations with Python. For any particular algorithm, you don’t use scikit-learn, you use whatever is state of the art at the time. I use TensorFlow for neural networks. I switch between XGBoost and CatBoost when doing random forests. I feel like anytime I need to use an algorithm, I go to Google and find the best implementation at the time, because it’s always changing, instead of trying to find a generic package that has all of them.
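For readers who have not seen the pipeline idea Dano mentions, here is a minimal sketch of it, not his actual code. It assumes scikit-learn and numpy are installed; the toy data, the step names, and the choice of a median imputer, a standard scaler, and logistic regression are all illustrative assumptions. The point is only that the cleaning steps are fitted together with the model, so new raw rows get the same cleaning automatically.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data: two numeric features, some values missing.
X_train = np.array([
    [25.0, 1.0],
    [32.0, np.nan],
    [47.0, 3.0],
    [51.0, 4.0],
    [np.nan, 0.0],
    [62.0, 5.0],
])
y_train = np.array([0, 0, 1, 1, 0, 1])

# The preprocessing steps and the model are trained as one object.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize each feature
    ("classify", LogisticRegression()),            # illustrative model choice
])
model.fit(X_train, y_train)

# New raw data, still containing missing values, goes straight in:
# the fitted imputer and scaler are applied before the classifier.
X_new = np.array([[np.nan, 2.0], [58.0, np.nan]])
predictions = model.predict(X_new)
print(predictions)
```

Because the imputer and scaler learn their statistics only from the training data, the same fitted object can be saved and reused on incoming data without a separate cleaning script.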
Tim Riser: You started out your career building a data science function from the ground up. Can you sketch the arc of that? What was there when you got there? What did you add and where did you leave them?
Dano Gillam: When I was there, they had just acquired a health claims data set and had set it up in a database. One of the first steps was making the database faster, by doing SQL indexing and learning how that particular flavor of SQL worked. They had just purchased the dataset, put it there, and had never even looked at it. There was a lot of exploratory analysis to do. I started off by downloading samples, getting an idea of how the variables looked, how different features were correlated, and other data exploration to understand the dataset. At this time I was working with actuaries and with the leads of the company to decide how to use the data. Because I understood the data, I could come in and say, “Here are the questions that we can answer right now with the data that we have. And here’s the value to the company if we do that.” That sort of paved the way for the data science team’s trajectory. People thought, “Dano’s had a lot of success with this” and started hiring some other data scientists. Many data scientists we hired would come in using R and I would convert them over to Python, and it was worth it to me. I would say, “Go take a month to learn Python upfront, and transfer all your skills from R.” Eventually we hired a data science manager. He had previously managed a team of a hundred data scientists and he came over to Bind. He was a guy who could convert business needs into data science problems, and he helped us as a team justify and prioritize what projects the company needed, which is very important at a startup. And that’s kind of how I grew it. I was a proof of concept for the company. I got their data and I showed them what we could do with it and had good results and showed the company that data science was something worth investing in. The best part about working at a startup was that I wasn’t just slightly involved. I wasn’t working on small pieces of the company. I was working on the critical problems that needed to be solved by the company.
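The indexing and sampling steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the setup used at Bind: the interview does not name the SQL flavor, so the sketch uses SQLite from Python's standard library, and the `claims` table, its columns, the index name, and the `query_plan` helper are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for a claims database (SQLite, in memory).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims ("
    "claim_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO claims (member_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's description of how it would execute a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT amount FROM claims WHERE member_id = ?"

# Without an index, the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# Index the column used in the WHERE clause, then check the plan again.
conn.execute("CREATE INDEX idx_claims_member ON claims (member_id)")
plan_after = query_plan(conn, lookup, (42,))

# Pull a small random sample to start exploring the data.
sample = conn.execute(
    "SELECT member_id, amount FROM claims ORDER BY RANDOM() LIMIT 5"
).fetchall()

print(plan_before)
print(plan_after)
print(sample)
```

Other databases expose the same idea through their own `EXPLAIN` output, and the exact wording of the plan varies by engine and version.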
Tim Riser: Your experience now is obviously different from your first two startup companies, since eBay was a start-up 20 years ago. Tell me about that transition.
Dano Gillam: I still feel like it was a little bit of a start-up because I didn’t come into a big data science team. There’s one other data scientist and he works on completely different problems than I do. It’s just another job where I can find my own problems and convince people that they’re worth working on.
Tim Riser: Did you develop that ability to find problems during ACME? Where did that come from?
Dano Gillam: I don’t know when I developed that. Hmm, I guess what I did develop in ACME was the habit of talking about problems and getting other people excited about them, because that’s really step one. You go to a manager and the manager says here’s the status quo—this is how things have been done in the past. Data science is new, and what we can do is new, and what Python can do is completely different from what people have been doing with SQL, R, SAS, and other stats languages. People with our skill set can do things completely differently. So step one is always selling that—explaining to managers that while this is what you’re doing, it could be so much more. I think that’s a very important thing that I got from ACME, just because in ACME we’re always talking through problems together because ACME was so group focused. We were always talking about the possibilities, like: “Whoa, we just learned this algorithm. You could probably do this or do that with this algorithm.”
Tim Riser: Are there any specific skills that you wish you’d spent more time on in ACME?
Dano Gillam: I was busy in ACME. I had no more time to spend on anything else. I was in the mindset of learning where even if I had extra time after class, like I said, I would go do Kaggle competitions and I would go learn different packages in Python to just be better at Python. I was very focused on machine learning and Python and programming, and less on math theory to be honest. I was pretty bad at theory in comparison, and on those theory tests I did pretty poorly, but I was really good at the programming part. That’s partly true for most majors. A major will teach you some stuff that won’t directly apply to what you do next. So you focus on the parts of your major that you enjoy the most. I really liked the algorithm design and that turned out really well for me because I didn’t go into academia but instead I went into industry. All of my math background from ACME was useful for finding the state-of-the-art, reading through algorithms, finding the new ways that neural networks are working, the new ways that random forests are working, understanding all the math behind them so that I could decide which one to use. If I didn’t go through ACME, I wouldn’t have that foundational understanding of what the algorithms were doing, and that makes it impossible to figure out what’s going wrong when something goes wrong. Good job, ACME, good job.
Tim Riser: You’ve been out in the real world, worked across three companies, interacted with colleagues from other backgrounds. How would you say ACME has prepared you for your work?
Dano Gillam: Perfectly. I work with people much older than me, but ACME prepared me. The space has changed so much, and experience is quickly overcome by learning the right thing at the right time, and ACME did a great job at that. It makes me think that ACME should re-evaluate what tools it’s using constantly because five years from now, maybe Julia’s the new tool of choice. I don’t know, but it’s definitely important not to ever get stuck on a particular tool, since the tools are always changing.
Tim Riser: If you ever start using something instead of Python tell Dr. Jarvis immediately.
Dano Gillam: Yeah, I don’t think it’s going to happen. But I’m sure that’s what they said about other languages. In Python that probably won’t be the case because most packages in Python don’t use just Python: TensorFlow in Python drops down to C and numpy immediately drops down to C, so Python does a good job of using the best of each language. Anything that it needs to be faster at, it uses another language to do. So, I hope it survives.
Tim Riser: Thanks for sharing, Dano!
Dano Gillam: If I have a God-given talent, it is getting other people excited about this stuff. I do it with people at work all the time. “You may not care, but let me tell you about how exciting data science is and everything that could be done.”