Data Science


Exploring Drivers of US County GDP

This project used machine learning to understand what drives GDP per capita at the county level across the U.S. The models pointed to key factors such as income, migration, and business activity that help explain regional economic differences. These insights can help local leaders and policymakers make more informed decisions about where to focus economic development efforts.

County GDP 2018

Report

Credit Score Classification

In this project, we looked at how to predict whether someone has good or bad credit, with factors such as checking account status, loan purpose, and loan duration driving predictions. Models were compared on metrics such as their ROC curve, sensitivity, and positive predictive value. To further improve results, we tuned the probability cutoffs to balance the two kinds of error, because in the real world, approving a risky customer can be a lot more costly than turning away someone who would have paid.
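To illustrate the cutoff tuning idea, here is a minimal Python sketch, not the code from the report. The cost values (5 for approving a risky customer, 1 for turning away a good one), the function names, and the toy labels and probabilities are all made up for illustration.

```python
def total_cost(y_true, p_bad, cutoff, cost_missed_bad=5.0, cost_rejected_good=1.0):
    """Total misclassification cost at a given cutoff.

    y_true: 1 = bad credit, 0 = good credit.
    p_bad:  predicted probability of bad credit.
    The two cost values are hypothetical, not taken from the report.
    """
    cost = 0.0
    for y, p in zip(y_true, p_bad):
        rejected = p >= cutoff  # predicted "bad", so the applicant is turned away
        if y == 1 and not rejected:
            cost += cost_missed_bad      # approved a risky customer
        elif y == 0 and rejected:
            cost += cost_rejected_good   # turned away someone who would have paid
    return cost


def best_cutoff(y_true, p_bad, cutoffs, **costs):
    """Return the candidate cutoff with the lowest total cost (first one on ties)."""
    return min(cutoffs, key=lambda c: total_cost(y_true, p_bad, c, **costs))


# Toy data, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_bad = [0.9, 0.6, 0.35, 0.4, 0.3, 0.2, 0.1, 0.05]
cutoffs = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

chosen = best_cutoff(y_true, p_bad, cutoffs)
print("chosen cutoff:", chosen)
print("cost at chosen cutoff:", total_cost(y_true, p_bad, chosen))
print("cost at default 0.5:", total_cost(y_true, p_bad, 0.5))
```

On this toy data, the cost-weighted search settles on a cutoff below 0.5, since missing a risky applicant is penalized more heavily than rejecting a good one.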

VarImp

Report

Shiny App: 2018 NBA Player Shot Map

Want to check out how your favorite player scores?

Analyzing Instacart Orders using Linux / AWS

Here we dug into Instacart’s open-source dataset to better understand how people shop for groceries online. Using over 3 million orders, we explored questions like which products are most popular, which items customers are most likely to reorder and order together, and how the timing between purchases affects repeat behavior. Working in a Jupyter notebook on a Linux/AWS setup gave us a great opportunity to analyze this data using the command line.

Sales by Day

Jupyter Notebook

Time Series Forecasting: US Auto Sales, 1995-2020

We set out to forecast U.S. auto sales using time series models that account for trends, seasonality, and key economic indicators like inflation and gas prices. We found that auto sales generally decline over time, but with distinct patterns in different periods, so we broke the timeline into three segments for better accuracy. After testing several models, we saw that, while a trend and cyclical regression model performed best on the training data, it didn’t generalize well. In the end, an ARIMA model combined with economic predictors gave us the most accurate forecasts on the holdout data, making it the best choice for predicting future auto sales.
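The gap between training fit and holdout accuracy can be shown with a small Python sketch. It is not the ARIMA or regression code from the report: the two stand-in models (a straight-line trend and a "repeat the last value" forecast), the function names, and the numbers are all invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_linear_trend(y):
    """Ordinary least squares fit of y against t = 0, 1, 2, ...; returns (intercept, slope)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - slope * t_mean, slope


# Made-up series: a steady decline in training, then a flat holdout period.
train = [100, 97, 94, 91, 88, 85]
holdout = [85, 86, 84, 85]

intercept, slope = fit_linear_trend(train)
n_train = len(train)

# In-sample fit: the trend line vs. a one-step "previous value" forecast.
train_scores = {
    "trend": rmse(train, [intercept + slope * t for t in range(n_train)]),
    "naive": rmse(train[1:], train[:-1]),
}

# Holdout forecasts: extend the trend line vs. repeat the last training value.
holdout_scores = {
    "trend": rmse(holdout, [intercept + slope * (n_train + h) for h in range(len(holdout))]),
    "naive": rmse(holdout, [train[-1]] * len(holdout)),
}

print("train RMSE:", train_scores)
print("holdout RMSE:", holdout_scores)
```

Here the trend line fits the training window best but extrapolates poorly once the pattern changes, which is why the final model choice was made on holdout error rather than training error.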

AutoSales

Report

Tableau Dashboard: Trending YouTube Views