GBK Employee Spotlight: Katherine Wilson

GBK Employee Spotlight is a series designed to surface the stories of the amazing individuals across our team and what makes them tick.

Today for our employee spotlight series, we chatted with Katherine Wilson, who recently joined as a Summer Associate in Analytics and Marketing Science at GBK Collective. Currently a PhD student in Quantitative Methods at the University of Pennsylvania, Kat works on applied statistical methods in the areas of structural equation modeling and causal inference. Let’s learn more about her.

Q: Tell us a bit about yourself, your areas of study and what led to your role at GBK.

I am currently a PhD student in Quantitative Methods at the University of Pennsylvania. A methodology program like this one is heavily focused on the application of advanced statistical models, covering everything from hierarchical models to causal inference. My work focuses first on causal inference, specifically building simulations to help researchers understand how to improve generalizations and detect treatment heterogeneity in non-randomized experiments, and second on the application of psychometric models, such as the improvement of structural equation modeling and factor analysis as applied to large-scale survey data. I am also working on an MA in Statistics through the Wharton School, and these statistics courses give me an opportunity to dive deeper into the theoretical side of statistics, from probability theory to Bayesian methods.

Prior to graduate school, I actually spent four years on the other side of the classroom, as a public school teacher. I came to teaching after graduating from Notre Dame, a school with a heavy focus on service and giving back to the community. After graduating, I moved to New York and taught special education at a public school in the South Bronx. I then taught with the KIPP charter school network in Austin, Texas. Being in Austin, I of course learned to code, and later worked for a K-12 computer science bootcamp, where I taught the basics of programming languages like JavaScript and Python alongside software engineers and data scientists. This is when I fell in love with the creativity and power inherent in computer science, and when I began to learn more about applications of data science. The experience led me to apply to a PhD program in Quantitative Methods.

I was first introduced to GBK Collective through Professor Eric Bradlow, GBK Co-Founder and Vice Dean of Analytics at Wharton. I worked as a Technical Lead for several Wharton Analytics Fellows projects (such as a psychometrics prediction model for a Major League Baseball team). I really enjoyed the “consulting” aspect of these projects, and I loved the opportunity to work with graduate students outside of my discipline as well as with leaders at the client organizations.

Q: What will your role at GBK focus on, and what are you most looking forward to?

Three things are really exciting to me about this new role with GBK. The first is the applied nature of consulting projects. I feel very lucky, over the last two years, to have taken courses in advanced topics like causal inference and machine learning. In these types of classes, we often learn technical skills, such as the theory behind a random forest model, and then apply the method to a project of our choice, such as pulling data from Kaggle on Covid infection rates. GBK will be different, in that our clients already have specific business problems they need to address and our job is to apply the right data and methods to solve them. I am really looking forward to tailoring the method to the research question, rather than the other way around.

Second, as a “methodologist,” I am also interested in studying the application of methods and thinking about how to improve them for different situations. For instance, we are often interested in the “worst case scenarios”: what can go wrong in the data and how models will react. Real projects raise many of the questions that we try to simulate in methodological work (e.g., lopsided sample sizes, restricted variance, missing data). Understanding how these issues play out in the data for real clients, how the model is affected by them, and what tradeoffs to consider when adjusting the model is something I am really looking forward to in this role.

The third and probably biggest thing I’m excited about is the team aspect of GBK. A PhD program of any kind can be very insular. I have a strict “deep work” routine, and use self-directed study to learn new topics efficiently. What I am really looking for now, however, is an opportunity to collaborate with others, and share both in the head-scratching that always underlies a difficult data science problem and in the joys of finally figuring out an optimal solution to a complex model. I think the best ideas come from this fusion between individual deep work and team collaboration, and I am really looking forward to collaborating with the GBK team and clients in this manner.

Q: Any recent projects where you have applied Structural Equation Modeling or causal analysis to solve a problem?

Causal inference has really taken off in applied education research over the last twenty years. With so much administrative data, and a better understanding of the barriers to randomizing interventions, the field was ready for a new statistical paradigm. From the class-size study of Angrist and Lavy to Rubin’s work applying his causal model to performance assessments, there have been countless great examples of causal inference in education, and these have really inspired me.

A few months ago, I built a causal analysis to understand the influence of special-education policy on attendance rates for students in New York City. Going back to my teaching days, a debate that really interested me was the benefit of a self-contained class option for students with special needs. Using open data from the New York City Department of Education, I matched schools across the five boroughs on a variety of covariates (socio-economic variables, population, economic need index, etc.) to control for factors that might affect attendance rates. Matching helps to isolate the effect of the treatment we want to evaluate (whether or not a school offers a self-contained class for students with special needs). A popular area of interest in these causal models is how to detect treatment heterogeneity. In this project, we used a large observational (non-randomized) data set and then stratified the schools by their covariates to detect treatment heterogeneity. We found that lower-income schools actually see more benefit from inclusion than higher-income schools.

This kind of causal analysis is used all the time in market research. Marketers are often interested in whether a certain campaign caused an increase in sales, and the methods used to control for confounders are the same as in these large policy examples. Subclassification, the approach we used in this project, splits the observed individuals into groups based on their values of the confounding variables, and it can be used to understand treatment heterogeneity across different groups of customers, such as the effect of an intervention on first-time versus repeat customers.
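To make the subclassification idea concrete, here is a minimal sketch in Python using pandas and NumPy. The data are simulated, and the column names (treated, outcome, income_index) and the use of a single confounder are purely illustrative; a real project would stratify on many covariates (or a propensity score) and check balance within each stratum.

```python
import numpy as np
import pandas as pd

# Simulated observational data: one row per unit (e.g., a school or a customer),
# a binary treatment flag, an outcome, and a single confounder for illustration.
rng = np.random.default_rng(42)
n = 2000
income_index = rng.normal(0, 1, n)                  # confounder
p_treat = 1 / (1 + np.exp(-income_index))           # treatment probability depends on the confounder
treated = rng.binomial(1, p_treat)
outcome = 2.0 * treated - 1.5 * income_index + rng.normal(0, 1, n)

df = pd.DataFrame({"treated": treated, "outcome": outcome, "income_index": income_index})

# Subclassification: split units into strata (here, quintiles of the confounder),
# estimate the treated-vs-control difference within each stratum, then average
# the stratum-specific effects weighted by stratum size.
df["stratum"] = pd.qcut(df["income_index"], q=5, labels=False)

stratum_effects = df.groupby("stratum").apply(
    lambda g: g.loc[g["treated"] == 1, "outcome"].mean()
            - g.loc[g["treated"] == 0, "outcome"].mean()
)
weights = df.groupby("stratum").size() / len(df)

print(stratum_effects)                    # per-stratum effects: a first look at heterogeneity
print((stratum_effects * weights).sum())  # weighted average treatment effect estimate
```

The stratum-level differences give a first look at treatment heterogeneity (for example, by customer type or school income level), and their weighted average is one estimate of the overall effect.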

Q: What emerging trends in analytics and marketing science are you most interested in?

I am really interested in how big data will be used in marketing science. My bootcamp days in Austin instilled in me some of the “scrappy” values that developers and data scientists pride themselves on, specifically the idea of combining multiple languages, such as JavaScript, HTML, and Python, to get what you need. Last fall, I audited a class at Penn called “Big Data Analytics,” which really opened my mind to big data exploration. I think that accessing large databases, whether through APIs or direct scraping, is a great opportunity for marketing science. With any project, I’m always thinking, “How can we source and apply better data?” For marketing, this means thinking about different sources of data, in addition to sales or transactional data, and about data quilting, or bringing these different data sources together to tell the story you want to tell.
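As a small illustration of what “data quilting” can look like in practice, here is a hypothetical pandas sketch that joins internal sales data with an external signal keyed on a shared identifier. The file contents, column names, and region_id key are made up for illustration.

```python
import pandas as pd

# Hypothetical sources: internal sales data plus an external signal pulled from
# an API or a scrape, both keyed on a shared region identifier.
sales = pd.DataFrame({
    "region_id": [101, 102, 103],
    "monthly_sales": [12500, 9800, 14100],
})
external = pd.DataFrame({
    "region_id": [101, 102, 104],
    "search_interest": [0.82, 0.45, 0.67],  # e.g., a scraped or API-sourced measure
})

# "Quilting" the sources together: a left join keeps every sales record and
# leaves the external signal missing (NaN) where the sources don't overlap.
combined = sales.merge(external, on="region_id", how="left")
print(combined)
```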

As for the analytics side, causal inference is still what really gets me excited. Like most applied statistics students, I was first introduced to multi-level modeling approaches that can be used for randomized studies. The first time I took a causal inference class, I realized the power of observational data and the nuances in the methods used to analyze it effectively. These approaches are different from those used in A/B testing, where we have randomization, but they share some theoretical similarities. Rigorous approaches to causal analysis have been developed extensively in policy circles, as in the school attendance project I described above, and I have seen these approaches now being used in marketing as well. I’m eager to see where this field goes next.

Q: What are your hobbies outside of work? Or what’s a fun fact about you that many people may not know?

Like many kids, I was “forced” to play an instrument growing up. But unlike many kids, my mom let us choose which one we wanted to play, so naturally, I chose the drums. I ended up playing drums all the way through high school: full kit in the jazz band, drumline in the marching band, and percussion in the symphony band. Years later, while teaching in Austin, I rediscovered my love for music. I picked up the drums again and formed an indie surf rock band with a couple of musician friends. We recorded two EPs and played some classic Austin venues like Carousel Lounge and Hole in the Wall. When I left Austin to move to Philly for graduate school, my bandmates gave me an old Fender Squier Stratocaster. I was hooked. A few months later I bought an Ibanez acoustic, and I’ve been plugging away at it since.

Learning guitar has been such a rewarding and fun hobby. I think I like it for a lot of the reasons I like statistics and data science: there is so much to learn, and it’s challenging. From music theory to fingerpicking patterns, the domain of “guitar” is so large, and I learn something new every day. Like quant methods, the more I stick with it, and the more mistakes I make along the way, the more I keep coming back, and the better I get. Also like quant, getting good at guitar is nothing more than focused, concentrated, detail-oriented work. One song can have multiple double stops, chord changes, or pull-offs to get right.

The style I’m most interested in is classical fingerpicking and folk music. Doc Watson’s “Deep River Blues” is a song I’ve been studying for a few months now. I also love Kurt Vile, a fellow Philly local, and his collaborations with the late John Prine. Music is, to me, just like data science: a weird mix of technical standards and creative risk-taking. And like anything else that is worthwhile, it is incredibly difficult and frustrating. I’ve certainly got the calluses to prove it!
