Alumni Q&A: Drew Pearson on Machine Learning Pipelines and Careers
Based on a July 2020 interview
- ACME Class of 2017
- Ancestry, Associate Data Scientist
- Tech, Consumer Internet
- Picking a topic for a personal data project that you’re interested in makes it more likely you will try harder, try more things, and stand out more as a result. If you don’t have any current interests, just start anywhere, and pay attention to what is interesting.
- Learn how to communicate complex topics in simple ways so anyone can understand. Start by trying to explain the last model that you built to your parents, siblings, cousins, or someone with no math background.
- Ask people who you trust and who are in the field you want to get into to look at your resume, give you feedback, and tell you what things you should change and what keywords you need to add.
Tim Riser: Can you tell us a little about your background and how you came to where you are today?
Drew Pearson: I’ll start the story at BYU. I was first interested in ACME because of sports analytics. I was always interested in sports and always enjoyed math, so I wanted to get into sports analytics. I didn’t declare a major until I declared ACME in my fourth year at BYU. I’m one of those people with five or six years at BYU. So I was drawn to ACME, decided to declare ACME, and loved it, though I quickly learned that I didn’t want to get into sports analytics! But I definitely did want to stay in the data space. I got interested in machine learning and using models to build things. That’s what guided my career search after ACME. I graduated three years ago in 2017 and was able to get a job at Ancestry, which I’ll go into in more depth later. I got a job there as an associate data scientist and worked there for three years. I actually just stopped working there last week, as I’m about to start a dual degree at the University of Virginia for a master’s in data science and an MBA.
Tim Riser: What drew you to your current work?
Drew Pearson: I was drawn there for several reasons, but I have a different point to lead with: I probably didn’t do as much research as I should have done. That would be my first headline: do more research into the companies and into your job before you accept the offer! But there were a couple of things that were interesting to me: basic things like the location, the salary, and the title. On a deeper level, I really liked the manager I would work for (Tyler Folkman). I got along well with him and I saw myself excited to work for him. The specific work itself drew me in as well. I naturally didn’t have a perfect understanding of what my work would be, having never done the job, but I knew it was going to be more of a machine learning role than an analyst role. As you in ACME know, there’s kind of a difference between a data analyst and a data scientist in some ways. I knew I didn’t really want to build dashboards to present metrics. I wanted to actually build models, and that’s what this role offered.
Tim Riser: What was it that excited you about this hiring manager?
Drew Pearson: He was friendly, personable, and I felt like he was really interested in me. I felt like he cared about my growth and not just in a “Here’s what you need to do” kind of way. I got that sense from talking to him a couple of times in the interview process. During the first screening phone call I built a pretty good rapport with him, but after my on-site interviews, I was able to meet with him for another thirty or forty-five minutes and learn more about his philosophy. I asked how he manages a team and got to see that he’s not the type of person who’s going to micromanage or meddle; instead, he empowers his team. Having a knowledgeable manager who could help me with any questions was important to me. Hearing that he wanted me to grow, and that he knew this job was important not just for the company but also for my career growth, got me really excited.
Tim Riser: What does career growth look like for a data scientist?
Drew Pearson: Data science is always evolving, so career growth is about continuing to learn and applying what you learn in different situations. As you grow, you’re able to do more interesting things with your toolbox. At first your data science toolbox is kind of small. Maybe you’re pretty good at doing a simple classification problem, so maybe you can do random forests really well. As you grow and learn more tools, learn more skills and learn how to apply them, you can be faced with any sort of problem: natural language processing, computer vision, whatever it is—and you’ll be able to throw some interesting models at it, understand how the models are working, and solve these novel, interesting problems to provide even more value for the company. That’s how I see the growth in the initial phases of a data science career.
Tim Riser: Looking back on yourself as a student, what things did you misunderstand or not get quite right about your current role as a data scientist?
Drew Pearson: Again, I don’t think I had the best understanding of my current role before joining. I think I lucked out. I could have ended up in a much worse situation, but this role has been great, really great. I loved my time at Ancestry and am so grateful for the chance to work there. Even though I did not do the research, I came to realize how great a place it is to work. I do plan to do the research moving forward. From here on out, whenever I’m thinking of accepting a role, my plan is to do all I can to get the offer, and then once I have the offer in hand, do some more research, like talking to other people on the team and talking to the hiring manager, to really understand what the company and role are like. When I was going into my role at Ancestry, I didn’t really know what to expect. I was pleasantly surprised with how much I was going to be in Python every day, and how much I was going to be able to try different models and actually build some things that get put into the product. I also didn’t really understand how engineering teams work and how products get made. For example, I didn’t know that the engineering team is segmented by product, and I didn’t know how data science fit in with the engineering teams. I neglected to ask about the whole organizational structure of teams. I had no idea I should do that. I’d only been working on small projects in my own notebook, or maybe collaborating with a couple of friends on one notebook. I had never worked on a massive product before. Those were things that I got to learn over time.
Tim Riser: What would be your quick summary of a data scientist’s guide to navigating tech company organizational structures?
Drew Pearson: There isn’t one common format; every team is different. Every organization deals with its data science team differently, and that’s where the research comes in. You should ask questions like, “How does your team or your organization or your company deal with data science?” With Ancestry, it was following what I think has been termed “prod pods”, or production pods. The way it works for a lot of other teams is that your data science team is by itself on a kind of island, and data scientists get plugged into different production teams. As an example, at Ancestry we have Hints, where we hint out different historical records, but then we also have Search, where you can search historical records. You could think of those as two different products. There is a data scientist working on each of those products, and they’re in those prod pods, but luckily all the data scientists are actually their own team. And so they get to collaborate with each other as well.
Tim Riser: How did you get the offer in the first place? How did you break into your field? How would you sum up the theory of how Drew Pearson broke into industry?
Drew Pearson: The hard thing is that coming out of undergrad as a data scientist is rare. I started as an associate data scientist. The first thing to note is that at a company of Ancestry’s size and caliber, people with master’s degrees or Ph.D.’s generally get the title of data scientist. Luckily, there was one other ACME person there, and that was another reason I was excited to work at Ancestry. Maria Fabiano Moraley was the other ACME student straight out of undergraduate, and the two of us were the only people on the entire data science team who didn’t have a graduate degree. And so how did we do it? I won’t speak for Maria. But for me, there were a couple of things I did. One, it was just a flat-out numbers game. I applied to 50 companies my senior year and I got rejected over and over. The more things you apply to, the higher chance you have of getting into one, right? And so I really did apply to a ton of jobs. Many of them I probably wasn’t qualified for, but why not throw your name in the hat? What did that do for me? Well, it really helped me get experience interviewing. And that would be my second tidbit: practice interviewing, practice selling your story, and practice communicating effectively. The third thing I did was the resume. The resume is so important to break into the field. At Ancestry, we’ve had different times where we’ve had internships and we can get upwards of a thousand applicants for a data science internship. If you think about this number-wise, if there are a thousand applicants for one recruiter to look at, and they spend just a minute per applicant, that’s over 10 hours of their time. A recruiter usually has 5-15 positions to work, so that’s 50 hours or more that they’re having to spend looking at resumes. The real secret is recruiters don’t spend a minute per resume. They can do it a lot faster and immediately flush out the top resumes. You need to make sure your resume is really on point.
The way I did that is I asked people who I trust that were in the field I wanted to get into to look at my resume, give me feedback, and tell me what things I need to change and what keywords I need to add. So that was another one I spent a lot of time on: meeting with different people well into their careers and getting their feedback on my resume.
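A quick way to sanity-check the recruiter arithmetic above is to run the numbers. The sketch below uses the figures quoted in the answer (a thousand applicants, one minute per resume, 5 to 15 open positions) and assumes, purely for illustration, that every open position draws the same thousand applicants; the interview does not say that. The function name `screening_hours` is made up for this example.

```python
# Back-of-the-envelope recruiter workload, using the figures quoted in the interview.
APPLICANTS_PER_POSTING = 1000   # "upwards of a thousand applicants"
MINUTES_PER_RESUME = 1          # the hypothetical careful read
MIN_POSITIONS, MAX_POSITIONS = 5, 15  # "5-15 positions to work"


def screening_hours(applicants, minutes_each, positions=1):
    """Hours needed to read every resume at a fixed pace.

    Assumes each position receives the same number of applicants,
    which is an illustrative simplification.
    """
    return applicants * minutes_each * positions / 60


one_posting = screening_hours(APPLICANTS_PER_POSTING, MINUTES_PER_RESUME)
low = screening_hours(APPLICANTS_PER_POSTING, MINUTES_PER_RESUME, MIN_POSITIONS)
high = screening_hours(APPLICANTS_PER_POSTING, MINUTES_PER_RESUME, MAX_POSITIONS)

print(f"One posting: {one_posting:.1f} hours")
print(f"{MIN_POSITIONS} to {MAX_POSITIONS} postings: {low:.1f} to {high:.1f} hours")
```

Under these assumptions a single posting already takes about 16.7 hours at one minute per resume, so the rounded figures in the answer are, if anything, on the low side. That is why a first pass over a resume tends to be much faster than a minute.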
Tim Riser: Who are these people you met with to get feedback on your resume?
Drew Pearson: I did an internship before I started ACME in customer relations. There was some coding, but nothing really to do with ACME—no math involved. I was purely customer relations. I really liked my manager there and he taught me one thing that will stay with me for the rest of my career. He said that whenever you’re in a position, you should talk to your manager once a month or so and get their feedback on your resume. He encouraged me to make sure my current work was being updated on my resume and to ask for suggestions of some strong bullets that would represent the work I did. At the end of that summer, we did just that. I did an internship at Google through the staffing company Adecco, which was more analytical, including more SQL and data science. Again, at the end of the summer, I made sure to meet with my manager about my resume. The bullet points for this Google internship were strong because of his feedback. Lastly, I just have a couple mentors I’ve met through the years via LinkedIn or my church whom I trust. I’ll occasionally send them an email with my resume asking them to look it over.
Tim Riser: What advice would you give juniors in the ACME program on how to spend their junior year, what kind of internships to get, and that sort of thing?
Drew Pearson: I want to acknowledge that ACME keeps you busy. If you try to do everything that I say and everything that your professors say, you probably won’t have time to do it all. That’s recognized. My first advice is do the best you can in ACME and focus on your studies, because it’s a great curriculum and the material you learn will be helpful. Now, on top of that, I think the next most important thing is to gain some experience, ideally through an internship. If you can’t get an internship, find anything where you can get some real-world experience, even if it means you’re just volunteering at a company and not getting paid. The important thing is getting some sort of data set to work with so you can build something or do something that you can put on your resume. ACME projects are great, and putting those on your resume is helpful, but having some real-world experience will separate you from the other thousand students that are applying. They have all done a simple classification model using the classic data sets that are out there in academia. You definitely want to get real-world experience in one way or another.
Tim Riser: What kinds of data projects should students consider if they’re trying to target data science jobs?
Drew Pearson: Great question. I don’t think there’s a wrong answer to that. Pursuing what you’re interested in is the obvious, basic answer, but there is some very important reasoning behind that answer. If you’re pursuing a data science project or a project involving data that you’re interested in, you’re probably going to go above and beyond. You’re probably going to learn a little more and try some new things, whether it’s trying a new model type or trying a new approach that will help you stand out a little more. So that’d be my first answer: what are you interested in? What’s something that you’d really love to tackle? My second answer would be just start anywhere! Start somewhere! Dive into it! And as you dive into one problem, you’ll think of another problem that you want to try and dive into.
Tim Riser: If you were to do it again, would you do ACME?
Drew Pearson: I definitely would. It is a grind, though. There’s no way to sugarcoat it. It was hard. There were a lot of times where I didn’t want to do it. I actually almost quit in the first week. Let me rephrase that. I did quit, and dropped all my classes on like the third or fourth day. I remember I was in lab getting things set up and I just walked out. I went to Dr. Jarvis and said “I’m done.” I dropped all my classes and then spent the night thinking about it. I was going to switch to be an econ major, but then the next day I was like, “Nope. I’m going to do this.” I added all the classes again and ended up doing it. And yes, I would definitely do it again, but it’s hard. There’s no doubt about it. It’s a grind but I’m glad I did it.
Tim Riser: Speaking of that, is there any other life advice you’d give to people in the middle of ACME?
Drew Pearson: I’d also say just recognize that it isn’t the end-all be-all. There are definitely some days where you feel like it’s the end of the world. You have too much. You can’t get it all done. But it’s all going to be okay in the long run, whether you don’t finish a homework assignment, or you bomb a test, or you know you are not getting everything done that you need to. It’ll end up being okay and everything will work out. No one soaks up all the knowledge that ACME tries to throw at you. So just soak up as much as you can, and that’s a great starting place. You have already soaked up so much more than 99% of the world in this field. And so just know that you’re already well ahead. And you’ll just soak up more next year, or next time, so don’t beat yourself up too much.
Tim Riser: Reflecting on your experience so far, what skills, technical or otherwise, have you found to be most useful in your career?
Drew Pearson: Most data scientists at Ancestry use Python, and a few use R. I think we had one data scientist who used R, and the rest use Python, the libraries within Python, and deep learning tools. I’m not as skilled in deep learning as I would like to be, and that’s one of the reasons I’m going back to school. Data scientists are not just people who can code things up, but people who can think critically about problems. That’s one area where things change from academia to industry: you’re no longer working in a vacuum. You need to have a good understanding of the model pipeline. You need to think critically about your data sets: your train, dev, and test sets. You need to think critically about the random sampling of your data set and about feature engineering. You have to learn how to think critically. Getting a job in data science requires knowing the specifics of the models. Lots of interviews will ask you, “What’s the difference between XGBoost and random forest?” Questions like those are really important to getting your foot in the door and being able to ace that interview. Another important thing is the soft skills ACME talks about a lot. They really are important. One element is being a good team member. ACME does a great job with small cohorts, where you learn how to really be a good study teammate and provide value to your friends. Maybe you don’t know the answers, but you can still keep the mood light, or keep everyone happy, or bring the pizza or something like that. The last one would be explanation and communication. In ACME we are talking with other people who are very bright and are studying the same material. In the real world, you’re going to be talking to people who are very bright but in different areas, so you can’t always explain things in the most technical terms. Learn how to communicate effectively and communicate complex topics in simple ways for people to understand.
Start by trying to explain the last model that you built to your parents or your younger siblings or your cousins or some people who have no math background. Explain your model to them and see if they understand. That’s good practice.
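The earlier point about thinking critically about train, dev, and test sets and about random sampling can be made concrete with a small sketch. This is a minimal illustration using only Python's standard library, not a description of how Ancestry splits its data; the function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are assumptions chosen for the example.

```python
import random


def split_dataset(rows, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows reproducibly, then cut them into train, dev, and test lists."""
    if not 0 <= dev_frac + test_frac < 1:
        raise ValueError("dev_frac + test_frac must be in [0, 1)")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle = random but repeatable

    n_test = round(len(shuffled) * test_frac)
    n_dev = round(len(shuffled) * dev_frac)

    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


rows = list(range(100))  # stand-in for real records
train, dev, test = split_dataset(rows)
print(len(train), len(dev), len(test))  # 80 10 10
```

Shuffling before cutting keeps any ordering in the raw rows from concentrating in one split, and the fixed seed makes the split repeatable between runs. Real projects often need more care than this, for example keeping all rows that belong to one user in the same split.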
Tim Riser: Can you talk a little bit about what a production machine learning pipeline looks like?
Drew Pearson: Great question. ACME does an amazing job at capturing what the real world looks like as a data scientist, but I think it can do even better. There are concrete, non-mathematical, but important things you don’t necessarily cover in ACME, including pipelines or scrum teams, which are often how engineering teams are organized. It might be interesting for ACME to hear more from data scientists on these real-world topics. At Ancestry, we have data scientists who just work on modeling and don’t have to work on the pipeline. Other companies are different, where the data scientists own the pipeline beginning to end. As a data scientist, I got to kind of work in a vacuum. I had my data set. I had to help get the data set together, but I also had an engineer help me get the data that I wanted. I used that data set to train models, played around with it in a sandbox environment, and built the model that I liked. I then passed that model off to another engineer, who took that bin file or whatever it was and put it into the actual product. There are customers interacting with the product every day, so I luckily didn’t have to deal too much with the pipeline. There is a lot that goes into it, especially when models have to use real-time data. In that case, we need to figure out how to get this real-time data and translate it into the features. We need to run it through the model that we want it to run, spit out the output, and then take that output and do something with it. That whole pipeline can be massive. If you don’t have that pipeline, then your model is sitting idly in a sandbox.
Tim Riser: If someone’s hearing the word “pipeline” in the machine learning context for the first time, how would you describe it?
Drew Pearson: I would describe it as just kind of a series of steps to go from one place to another. That’s very theoretical and high level, but when I think about a pipeline, I think of production. When I say production, I mean in some way customer-facing, whether it’s the people on the Ancestry site or other teams using whatever you build. What you need is a series of steps to go from point A, which is perhaps just raw data, to point B, C, D, E, F, and ultimately your endpoint, which is whatever output someone else is going to consume and use.
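The “series of steps” description above maps naturally onto code as a list of small functions applied in order. The sketch below is a toy illustration only: the event format, the feature scaling, the fixed weights standing in for a trained model, and the decision threshold are all hypothetical and are not taken from Ancestry's systems.

```python
from functools import reduce


def parse_event(raw: str) -> dict:
    """Point A to B: turn a raw 'key=value,...' string into a dict of ints."""
    return {k: int(v) for k, v in (pair.split("=") for pair in raw.split(","))}


def build_features(event: dict) -> list:
    """B to C: translate the parsed event into a feature vector (made-up scaling)."""
    return [event["records_viewed"] / 10, event["tree_size"] / 100]


def score(features: list) -> float:
    """C to D: stand-in for a trained model, here a fixed weighted sum."""
    weights = [0.5, 0.25]  # hypothetical weights, not learned from data
    return sum(w * x for w, x in zip(weights, features))


def decide(model_score: float) -> str:
    """D to endpoint: turn the score into an output someone else can consume."""
    return "recommend" if model_score >= 0.5 else "skip"


STEPS = [parse_event, build_features, score, decide]


def run_pipeline(raw, steps=STEPS):
    """Feed the raw input through every step, each consuming the previous output."""
    return reduce(lambda value, step: step(value), steps, raw)


print(run_pipeline("records_viewed=7,tree_size=120"))  # recommend
```

Keeping each step as its own function means one step can be swapped out, for instance replacing the stand-in `score` with a real model, without touching the rest of the chain.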
Tim Riser: Since Ancestry is a consumer product, are many of the machine learning models you build focused on modeling consumer behavior?
Drew Pearson: At Ancestry, there are two sectors of data science. This is not including the whole DNA science team that deals with the DNA product. There is a business data science team and a product data science team. The business data science team does a lot of stuff with our customers and figuring out who should be our customers. Who should we target our marketing advertisements to, and who is likely to cancel their account? Who do we need to try and send a referral code to? Things like that. The product data science team focuses more on how we can improve our product. An example would be the product that I work on, the Hints system, where if you start filling out your family tree and you give us some information about your great-great-grandmother, we want to then say, “Oh, we know your great-great-grandmother. Here’s her birth record. Here’s her marriage record. Here’s her death record” and send those hints out. That process uses a lot of machine learning, and we want to make sure that process is as good as possible. We continuously improve our system models and algorithms to send more relevant and more interesting records to that customer.
Tim Riser: Thanks, Drew!