In this project, we looked at how to predict whether an applicant has good or bad credit, with factors such as checking account status, loan purpose, and duration driving the predictions. We compared models using their ROC curves, sensitivity, and positive predictive value. To improve results further, we tuned the probability cutoff used to classify applicants, looking for the best balance between these metrics, because in the real world approving a risky customer can cost a lot more than turning away someone who would have repaid.
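The cutoff search can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes the model outputs a probability of good credit and that an applicant is approved when that probability meets the cutoff. The costs (5 for approving a bad-credit applicant, 1 for rejecting a good one), the helper names `expected_cost` and `best_cutoff`, and the toy data are all hypothetical.

```python
def expected_cost(probs, labels, cutoff, cost_fp=5.0, cost_fn=1.0):
    """Total misclassification cost at one cutoff.

    probs:  predicted probability that each applicant has good credit.
    labels: 1 = good credit, 0 = bad credit.
    An applicant is approved when prob >= cutoff.
    cost_fp / cost_fn are hypothetical costs, not values from the project.
    """
    cost = 0.0
    for p, y in zip(probs, labels):
        approved = p >= cutoff
        if approved and y == 0:        # approved a bad-credit applicant
            cost += cost_fp
        elif not approved and y == 1:  # turned away a good-credit applicant
            cost += cost_fn
    return cost


def best_cutoff(probs, labels, cost_fp=5.0, cost_fn=1.0, steps=101):
    """Grid-search cutoffs in [0, 1]; on ties, the lowest cutoff wins."""
    cutoffs = [i / (steps - 1) for i in range(steps)]
    return min(cutoffs,
               key=lambda c: expected_cost(probs, labels, c, cost_fp, cost_fn))


# Hypothetical toy scores and true outcomes.
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30]
labels = [1, 1, 1, 0, 1, 0, 0]

symmetric = best_cutoff(probs, labels, cost_fp=1.0, cost_fn=1.0)
asymmetric = best_cutoff(probs, labels, cost_fp=5.0, cost_fn=1.0)
print(symmetric, asymmetric)
```

On this toy data, equal costs select a cutoff of about 0.41, while making bad approvals five times as expensive pushes it up to about 0.71: the model must be more confident before approving anyone.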
Here we dug into Instacart’s open-source dataset to better understand how people shop for groceries online. Using more than 3 million orders, we explored which products are most popular, which items customers are most likely to reorder or buy together, and how the time between purchases affects repeat behavior. Working in a Jupyter notebook on a Linux/AWS setup also gave us a good opportunity to analyze the data from the command line.
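Two of these questions, reorder rates and products bought together, can be sketched in plain Python. This is a minimal illustration, not the analysis we ran. It assumes rows shaped as `(order_id, product_id, reordered)`, loosely modeled on the order-product tables in Instacart's release, with `reordered` as 0 or 1. The helper names and toy rows are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def reorder_rates(rows):
    """Fraction of each product's order lines that were reorders."""
    totals = Counter()
    reorders = Counter()
    for _, product, reordered in rows:
        totals[product] += 1
        reorders[product] += reordered
    return {p: reorders[p] / totals[p] for p in totals}


def top_pairs(rows, n=3):
    """Most common product pairs appearing in the same order."""
    baskets = defaultdict(set)
    for order_id, product, _ in rows:
        baskets[order_id].add(product)
    pair_counts = Counter()
    for items in baskets.values():
        # Sort so ("a", "b") and ("b", "a") are counted as the same pair.
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(n)


# Hypothetical toy rows: (order_id, product_id, reordered)
rows = [
    (1, "banana", 1), (1, "milk", 0), (1, "bread", 0),
    (2, "banana", 1), (2, "milk", 1),
    (3, "banana", 0), (3, "avocado", 0),
]

print(reorder_rates(rows))
print(top_pairs(rows, n=1))
```

Counting pairs within each order is the simplest form of co-purchase analysis; over millions of orders, the same grouping is usually done with a dataframe or database group-by rather than Python loops.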