
We recently caught up with Rachel Wagner-Kaier, a Director of Data Science based in Seattle, who has a new book coming out called "Teaching Computers to Read".
Hi Rachel, thank you for the interview! Let's start with a bit about who you are...
DSW: Please tell us about yourself.
Rachel: Absolutely. I am currently a Director of Data Science based in Seattle.

I’ve been here for about eight years and love living close to the mountains (like most Seattleites, I like to spend my time outdoors). Before ending up here, I got into the data science field while in graduate school, where I was researching old star clusters in nearby galaxies. This was before there were majors and degrees in data science, and I started taking data science classes on the side. It appealed to me because it mirrored the parts of my research I enjoyed: problem solving, coding, statistics & math, and turning data into knowledge. Now at work, among other things, I lead technical teams day-to-day, building AI solutions that solve pain points for different parts of the business across various industries. Most of my projects over the years have some element of natural language processing (NLP), and thankfully it’s remained a dynamic, relevant space!
DSW: There are many, many technical books out there – why did you decide to write another book on NLP and AI topics?
Rachel: It does feel like there are already a million books out there about AI. When I started this journey, though, my colleague and I were a bit struck by a dichotomy. There were either very high-level AI books targeted towards business leaders that really didn’t incorporate any technical details, or books that dove into the nitty-gritty technical details but without a lot of practical, business-driven advice. We wanted to create something that bridged the two: practical, technical advice about building useful AI solutions for business problems. I believe both early-career data scientists and engineers, as well as those leading or collaborating with technical teams, will benefit from our advice and examples, as well as from the code companion.
DSW: Right, tell us more about the code companion that will be released with the book.
Rachel: Yes, what good is a bunch of advice if you can’t practice it, right? The code companion reinforces the key concepts I cover in the book, and it will be available on GitHub on the book launch date (Nov 5). It walks through the end-to-end build of a classic NLP problem, with open-ended exercises throughout that show how challenging real-life problems and data are. The code companion guides readers through thinking critically about the design, build, evaluation, and production steps. Anyone who can survive the hands-on portion can take that experience directly into the workplace.
DSW: You mention several times in “Teaching Computers to Read” that designing and building useful AI is about picking the right approach for the specific problem, not just using the newest tools. What does that look like in practice?
Rachel: We are seeing the hype cycle out in full force with every LLM release and every new agent tool out there. It’s easy to get starry-eyed about LLMs and agents and vibe code quick little POCs that do really cool stuff, but it’s much harder to build something that has robust performance across a real corpus of data, that doesn’t fail in production, and that continues to provide ongoing value to a day-to-day business team. Practically, this means taking marketing with a grain of salt (there are a lot of promises) and testing the tools out for yourself. Use a champion-challenger approach: stress-test the “latest and greatest” AI promise on a real problem, with a statistically valid sample, and measure the new tech or tool’s performance against a tried-and-true method. The “science” part of data science (that is, building hypotheses and testing via valid scientific experimentation) is the best way to avoid falling into the hype cycle trap. Blindly assuming everything is a nail because you have an (admittedly awesome) hammer is a mistake, and it leads straight into the hype cycle’s trough of disillusionment.
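To make the champion-challenger idea concrete, here is a minimal sketch in Python. The labels and predictions are toy placeholders (not code from the book or its companion); in practice both sets of predictions would come from a statistically valid evaluation sample drawn from your real corpus, and the exact McNemar test checks whether the accuracy difference is more than noise.

```python
# A minimal champion-challenger sketch. Assumptions: the labels and predictions
# below are toy placeholders standing in for a real, statistically valid sample.
import numpy as np
from scipy.stats import binomtest


def champion_challenger(y_true, champion_pred, challenger_pred):
    """Compare two models on the same labeled sample using an exact McNemar test."""
    y_true = np.asarray(y_true)
    champ_ok = np.asarray(champion_pred) == y_true
    chall_ok = np.asarray(challenger_pred) == y_true

    # Discordant pairs: cases where exactly one of the two models is correct.
    only_champ = int(np.sum(champ_ok & ~chall_ok))
    only_chall = int(np.sum(chall_ok & ~champ_ok))

    # Under the null hypothesis, the discordant cases split 50/50 between models.
    n_discordant = only_champ + only_chall
    p_value = (binomtest(min(only_champ, only_chall), n_discordant, 0.5).pvalue
               if n_discordant > 0 else 1.0)

    return {
        "champion_accuracy": float(champ_ok.mean()),
        "challenger_accuracy": float(chall_ok.mean()),
        "only_champion_correct": only_champ,
        "only_challenger_correct": only_chall,
        "p_value": float(p_value),
    }


# Toy usage: the "latest and greatest" challenger only wins if the difference
# is both large enough to matter and statistically defensible.
labels     = ["a", "b", "a", "b", "a", "a", "b", "b"]
champion   = ["a", "b", "a", "a", "a", "b", "b", "b"]
challenger = ["a", "b", "a", "b", "a", "a", "b", "a"]
print(champion_challenger(labels, champion, challenger))
```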
DSW: In the build process, what are some of the most common mistakes teams make when preparing or labeling data to develop an NLP or AI solution?
Rachel: Random sampling sometimes feels like the bane of my existence. Until I understand the distribution of our dataset, I can’t just grab a random sample and assume it’s representative. Stratified sampling tends to be easier with structured data, but with natural language it can be tough, especially since we often start without labels and have to create the labels ourselves. How are we supposed to figure out which strata to sample on, and how can we check that a sample drawn along those strata captures the breadth of variation in the corpus? That’s the goal, of course: to get the smallest, quickest sample possible that still represents the broad data distribution, so we can build a more generalizable solution. I think the most common mistake is taking a random sample and assuming it’s good enough. This is something I really harp on in the book AND the code companion.
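One way to approximate stratified sampling when the corpus has no labels yet is to create pseudo-strata from the documents themselves. The sketch below is purely illustrative (not from the book's code companion) and assumes TF-IDF features and k-means clusters as cheap stand-ins for whatever representation and strata actually fit your data:

```python
# Illustrative stratified sampling for an unlabeled corpus. Assumptions: TF-IDF
# plus k-means is a stand-in for real embeddings or domain-driven strata.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def stratified_text_sample(corpus, sample_size, n_strata=3, seed=42):
    """Cluster the corpus into pseudo-strata, then sample proportionally from each."""
    rng = np.random.default_rng(seed)

    X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(corpus)
    strata = KMeans(n_clusters=n_strata, n_init=10, random_state=seed).fit_predict(X)

    chosen = []
    for s in range(n_strata):
        idx = np.flatnonzero(strata == s)
        # Proportional allocation, with at least one document per stratum.
        k = max(1, round(sample_size * len(idx) / len(corpus)))
        chosen.extend(rng.choice(idx, size=min(k, len(idx)), replace=False).tolist())
    return sorted(chosen)


# Toy corpus mixing billing, login, and product-review style documents.
docs = [
    "invoice overdue please remit payment",
    "billing statement shows a duplicate charge",
    "password reset link is not working",
    "cannot log in to my account",
    "great product would buy again",
    "refund request for a damaged item",
] * 10
print(stratified_text_sample(docs, sample_size=12))
```

The sanity check still matters: compare the sampled documents against the full corpus (cluster proportions, vocabulary coverage) before trusting that the labeling sample represents the breadth of variation.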
DSW: One of the topics you also return to throughout the book is cross-team collaboration. How can technical teams bridge the gap between actual model metrics and business expectations when assessing AI performance?
Rachel: What we do as data scientists can be challenging to communicate. Most business teams I work with can manage accuracy and maybe recall and precision, but more complex metrics (like similarity scores, BLEU, mean reciprocal rank, etc.) can be difficult. On top of that, in most solutions I’ve built, we rarely use a single model in isolation. Communicating this to less technical stakeholders means bringing examples of model predictions as we develop, and being transparent about the cases where the solution does great and the cases where it might struggle. It also means measuring the solution the way it will be used, which typically means measuring the overall performance of a bunch of stacked or interdependent models and approaches. Even better, if I can connect that directly to their business KPIs and show how those are affected by the performance of the AI solution, that’s when the dots start to connect and they can see how they will benefit. It’s really critical to bring them along the way: show them how the process works, explain why we make the decisions or take the approaches we do, solicit continuous input throughout the build, and be a helpful source of education for them. That sounds like basic advice, but it too often gets skipped, and it really helps align expectations for how the AI solution will perform once released.
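As a simplified illustration of "measuring the solution the way it will be used," here is a small simulation with made-up success rates (a hypothetical two-stage pipeline, not numbers from any real project): two stages that each look fine in isolation compound into a noticeably lower end-to-end figure, and the end-to-end figure is the one the business KPI actually feels.

```python
# Simulated illustration of stacked-model evaluation. Assumptions: a two-stage
# pipeline (route a document, then extract a field) with made-up, independent
# per-stage success rates -- real pipelines need real labeled end-to-end data.
import numpy as np

rng = np.random.default_rng(0)
n_docs = 10_000

route_correct = rng.random(n_docs) < 0.92    # stage 1: ~92% routing accuracy
extract_correct = rng.random(n_docs) < 0.90  # stage 2: ~90% extraction accuracy

# A document only delivers value to the business team if both stages succeed.
end_to_end = route_correct & extract_correct

print(f"Routing accuracy:    {route_correct.mean():.1%}")
print(f"Extraction accuracy: {extract_correct.mean():.1%}")
print(f"End-to-end accuracy: {end_to_end.mean():.1%}")  # roughly 0.92 * 0.90, about 83%
```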
DSW: How do MLOps pipelines need to adapt to support the unique requirements of NLP and LLM-based systems?
Rachel: This is a good example of something that is often discussed in the abstract or in purely technical terms, and for which I believe really practical advice can be hard to find. I go through a couple of examples in the book to bring this to life, and talk about the end-to-end lifecycle of an NLP or AI solution. The main difference is that data distribution and data drift are much more complicated with unstructured data than with structured data. Human-in-the-loop review also tends to be more time-intensive with unstructured data, so it’s important to be intentional about how subject matter experts’ time is used for verification. Beyond that, though, there are important questions about what amount of model performance drift is “meaningful” from a business perspective, and about being able to trigger NLPOps activities based on the needs of the stakeholders rather than purely numerical indicators. That requires close collaboration between the technical team and the subject matter experts.
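As one concrete (and simplified) way to picture this, the sketch below checks whether new production text has drifted away from a reference corpus. The representation (TF-IDF), the test (two-sample Kolmogorov-Smirnov), and the thresholds are all illustrative assumptions; the point is that the trigger combines a statistical signal with a tolerance agreed with the business, rather than firing on any numeric change.

```python
# Illustrative text-drift check. Assumptions: TF-IDF as the document
# representation and a two-sample KS test on centroid similarity as the signal;
# the thresholds stand in for limits agreed with business stakeholders.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def text_drift_check(reference_docs, production_docs, max_p=0.01, min_shift=0.10):
    vec = TfidfVectorizer(max_features=5000).fit(reference_docs)
    ref = vec.transform(reference_docs)
    prod = vec.transform(production_docs)

    # How similar is each document to the reference corpus "centroid"?
    centroid = np.asarray(ref.mean(axis=0))
    ref_sim = cosine_similarity(ref, centroid).ravel()
    prod_sim = cosine_similarity(prod, centroid).ravel()

    stat, p_value = ks_2samp(ref_sim, prod_sim)
    mean_shift = float(ref_sim.mean() - prod_sim.mean())

    # Flag drift only when the shift is both statistically and practically meaningful.
    return {
        "ks_statistic": float(stat),
        "p_value": float(p_value),
        "mean_similarity_drop": mean_shift,
        "drift_detected": bool(p_value < max_p and mean_shift > min_shift),
    }


# Toy example: production traffic shifts to a topic the reference never saw.
reference = ["invoice overdue please remit payment",
             "billing question about my statement"] * 50
production = ["chatbot gave me the wrong warranty information",
              "agent escalation for warranty claim"] * 50
print(text_drift_check(reference, production))
```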
DSW: How can teams build trust with stakeholders who may be skeptical about the reliability of AI-driven systems?
Rachel: In many scenarios, I find myself, as a technical leader, being primarily an educator. I spend a good deal of time explaining what we are doing, what we are building, why we are doing it that way, and what the outcomes are, and aiming for an honest representation of those results. This, plus getting consistent feedback and sharing results as we go, contributes to the trust I see from stakeholders. The other, more obvious pieces are, of course, doing things ethically and fairly, building solutions with the appropriate technical approach, validating performance thoroughly with independent data, having strong testing plans, getting peer review, and so forth. But, really, the people element is key, particularly the practice of change management, which I have a section on in the book. It’s a much broader subject, but it’s something not enough technical teams think about, and having good change management heavily impacts the long-term success of an NLP or AI solution.
DSW: Why do so many organizations struggle to turn AI investments into real business results?
Rachel: In my experience, a lot of it comes back to setting realistic expectations, both internally and externally. The hype right now doesn’t help organizations think practically about what’s really feasible. A lot of organizations, especially outside of the tech industry, are not really set up to build or manage AI, and it will take them time to adjust. But probably the hardest part is that proofs of concept are short, easy, and relatively cheap. POCs can quickly show clear, obvious, quantifiable business value. But taking that POC and putting it into production? That can be two to ten times the amount of work! Engineering and scaling an AI solution is not easy. Many teams have a hard time wrapping their heads around that, and I see a lot of teams get stuck with a great POC that they don’t have the bandwidth, skill, or budget to scale, despite the proven value.
DSW: What is one unexpected topic in “Teaching Computers to Read” that might interest readers?
Rachel: Despite being hopelessly monolingual, I’ve had the opportunity to build AI solutions with native speakers across some challenging languages, especially ones with non-Latin scripts. I find it fascinating how differently languages function and how that impacts building NLP and AI solutions (shout-out to the linguists out there for the work they do!). Of course, the biggest challenge across most languages, aside from the top 10-15 or so, is lack of data. English continues to be so dominant in the AI space that it’s largely assumed we are talking about English. There’s a great “rule” called the Bender Rule that encourages data scientists to be very clear about which languages the work was done in, tested on, applies to, or is referring to. “Teaching Computers to Read” largely talks about English, but there’s a fun chapter (in my opinion, anyway) about working across languages.
DSW: What are the key takeaways you hope readers get out of your book?
Rachel: I hope that readers will come away with three things. One, practical advice for thinking critically through the design, build, deployment, and monitoring of NLP and AI solutions. Two, hands-on practice through the code companion to help the book’s key best practices really sink in. And three, hopefully some newfound excitement about NLP and AI solutions! It will be fun to see where the field goes from here.
DSW: Where can people find out more about you?
Rachel: For more details about both the book and me, I recommend checking out the website. From there, people can find links and background on the book and the code companion, sign up for the mailing list, and check out where I’ll be speaking next. For anyone in Seattle, there’s an upcoming book launch party on November 8. I’m also excited to be in London October 27-30, and I will be at the AI Summit in New York City on December 10 and 11. I also post regularly on LinkedIn about these and related NLP and AI topics, and I am always happy to connect!
Rachel, thank you ever so much for your time! We really enjoyed learning more about your background, your move into data science, your new book, "Teaching Computers to Read," and what readers can learn from you and your work. Good luck with all your ongoing projects!