Summer 2025 Internship
This project was designed to address a timing gap the investment team faced: major wildfire events were reaching news sources hours after ignition, leaving the team unable to react in real time. To solve this, I built a system that detects new wildfires instantly, pulls live environmental data, and evaluates each event using a machine-learning model trained on historically market-moving fires.
The system begins with real-time ignition detection using NASA FIRMS, which provides immediate latitude and longitude coordinates for new fire events. For each detection, the pipeline automatically collects weather and environmental inputs from NOAA, NASA, Meteosource, and other government APIs, including humidity, wind speed and direction, precipitation, temperature, and related variables.
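The detection step can be sketched in a few lines of Python. This is a hedged illustration, not the production pipeline: it assumes the FIRMS feed is polled as CSV, and the column names below mirror the public FIRMS format (the sample payload is fabricated for demonstration).

```python
import csv
import io

def parse_firms_csv(text: str) -> list[dict]:
    """Parse a FIRMS-style CSV payload into fire-detection records."""
    reader = csv.DictReader(io.StringIO(text))
    detections = []
    for row in reader:
        detections.append({
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "acq_date": row["acq_date"],
            "confidence": row.get("confidence", ""),
        })
    return detections

# Illustrative payload in the FIRMS column layout (not real detections).
sample = (
    "latitude,longitude,acq_date,confidence\n"
    "34.0522,-118.2437,2025-07-04,h\n"
)
print(parse_firms_csv(sample)[0]["lat"])  # → 34.0522
```

Each parsed latitude/longitude pair then seeds the weather lookups against the NOAA, NASA, and Meteosource APIs.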
To build the training dataset, I queried these same APIs for historical weather data at the ignition coordinates of tens of thousands of past fires, creating a large dataset that links ignition conditions to ultimate fire outcomes. From within that set, I isolated roughly fifteen historically significant, market-moving wildfire events and extracted the early-stage weather patterns associated with them.
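A minimal sketch of how one training row might be assembled from ignition-time weather and the fire's eventual outcome. The field names and the binary label rule are illustrative assumptions, not the project's actual schema.

```python
def build_training_row(weather: dict, became_major: bool) -> dict:
    """Pair ignition-time conditions with a label for the fire's outcome."""
    return {
        "humidity": weather["humidity"],
        "wind_speed": weather["wind_speed"],
        "wind_dir": weather["wind_dir"],
        "precip": weather["precip"],
        "temp": weather["temp"],
        # 1 = historically significant / market-moving, 0 = otherwise
        "label": 1 if became_major else 0,
    }

row = build_training_row(
    {"humidity": 12.0, "wind_speed": 38.0, "wind_dir": 270.0,
     "precip": 0.0, "temp": 41.5},
    became_major=True,
)
print(row["label"])  # → 1
```

Repeating this over tens of thousands of historical ignitions yields the table the classifier is trained on.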
Using this data, I trained an XGBoost classification model to identify when the initial conditions of a new fire statistically resemble those that preceded large historical events. For validation, I excluded several major fires from the training set and backtested the model's ability to detect them using only ignition-time data; the model correctly flagged these withheld events.
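The leave-events-out validation described above can be sketched as a split that withholds named major fires from training and reserves them as a backtest set. The fire IDs here are illustrative placeholders; the model itself would then be fit on the training rows with something like `xgboost.XGBClassifier().fit(...)`.

```python
def split_holdout(rows: list[dict], holdout_ids: set[str]):
    """Withhold specific fire events from training for backtesting."""
    train = [r for r in rows if r["fire_id"] not in holdout_ids]
    test = [r for r in rows if r["fire_id"] in holdout_ids]
    return train, test

# Illustrative rows; real data would carry the full weather feature set.
rows = [
    {"fire_id": "major_fire_a", "label": 1},
    {"fire_id": "small_fire_1", "label": 0},
    {"fire_id": "major_fire_b", "label": 1},
]
train, test = split_holdout(rows, {"major_fire_a", "major_fire_b"})
print(len(train), len(test))  # → 1 2
```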
In production, whenever a new fire appears in the NASA feed, the system pulls real-time weather data for its coordinates, vectorizes the inputs, runs them through the XGBoost model, and, if the risk score crosses a set threshold, sends an alert to the investment team. This allows the team to identify potentially significant wildfire events hours before they reach mainstream news or market data providers, improving reaction time and situational awareness for insurance-focused trading strategies.
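The scoring-and-alert decision reduces to two small steps: order the live weather inputs into a fixed feature vector, then compare the model's score to the threshold. The feature order and the 0.8 cutoff below are illustrative assumptions, not the deployed values.

```python
# Must match the column order used at training time (assumed order).
FEATURE_ORDER = ["humidity", "wind_speed", "wind_dir", "precip", "temp"]
ALERT_THRESHOLD = 0.8  # illustrative cutoff, not the production setting

def vectorize(weather: dict) -> list[float]:
    """Arrange live weather inputs into the model's expected feature order."""
    return [float(weather[k]) for k in FEATURE_ORDER]

def should_alert(risk_score: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """Fire an alert when the model's risk score clears the threshold."""
    return risk_score >= threshold

x = vectorize({"humidity": 9, "wind_speed": 45, "wind_dir": 180,
               "precip": 0, "temp": 38})
print(len(x), should_alert(0.91))  # → 5 True
```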
This project involved building a systematic long/short strategy for global insurance stocks using combined ratios as the primary signal. The combined ratio is incurred losses plus expenses divided by earned premium, so values below 100% indicate an underwriting profit. I pulled historical combined ratio data from Bloomberg for the full international insurance universe and ran extensive statistical tests to understand how different factor constructions influenced performance.
The core process involved testing multiple ways of grouping and scoring the universe. I evaluated tertile, quartile, and other n-way splits, then applied weighting schemes ranging from z-score-style gradients to simple top-versus-bottom spreads. I also created variations that excluded specific subsectors - such as auto, home, or certain geographic regions - to isolate which segments contributed most to the factor's predictive power.
Each version of the factor was tested across different rebalancing frequencies, including daily, weekly, monthly, quarterly, and annual schedules, to evaluate turnover effects and signal durability. All grouping, weighting, subsector, and rebalance combinations were run as a large parameter matrix in Python to identify the configurations that produced the strongest risk-adjusted results.
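The parameter matrix amounts to a Cartesian product over the dimensions above. A minimal sketch, where the option values come from the text but `run_backtest` would be the actual backtester (not shown here):

```python
import itertools

# Grid dimensions from the factor study; each combination is one backtest.
GROUPINGS = ["tertile", "quartile"]
WEIGHTINGS = ["zscore_gradient", "top_minus_bottom"]
EXCLUSIONS = [None, "auto", "home"]  # subsector carve-outs (illustrative)
REBALANCE = ["daily", "weekly", "monthly", "quarterly", "annual"]

grid = list(itertools.product(GROUPINGS, WEIGHTINGS, EXCLUSIONS, REBALANCE))
print(len(grid))  # → 60
```

Each tuple in `grid` would be passed to the backtester, and the resulting Sharpe ratios ranked to find the strongest configurations.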
The best-performing variants consistently showed that underwriting quality is a reliable driver of returns in insurance equities. The final long/short strategy built from these configurations achieved a 1.54 Sharpe ratio, with controlled drawdowns and stable behavior across market regimes.
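For concreteness, the Sharpe ratio used to rank configurations takes the conventional annualized form: mean excess return over its standard deviation, scaled by the square root of periods per year. The return series below is illustrative, not strategy data.

```python
import math
import statistics

def annualized_sharpe(daily_returns, rf_daily=0.0, periods_per_year=252):
    """Annualized Sharpe: mean excess return / sample stdev * sqrt(periods)."""
    excess = [r - rf_daily for r in daily_returns]
    return (statistics.mean(excess) / statistics.stdev(excess)
            * math.sqrt(periods_per_year))

# Illustrative daily return series, not the strategy's actual returns.
print(round(annualized_sharpe([0.01, -0.005, 0.007, 0.002, -0.001]), 2))  # ≈ 6.85
```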
Under Construction
I'm building a recruiting CRM for club lacrosse programs that combines my interests in data systems and UX design. The goal is to give coaches a structured, reliable way to track every interaction in the recruiting process and to replace the fragmented mix of spreadsheets, texts, and informal notes that most programs rely on.
The platform is built around two synchronized data models: a player-specific timeline and a college-specific interaction log. When a player or coach submits an update - such as a call, visit, email, or prospect day - the system timestamps it, attaches the appropriate metadata, and writes it into both the player bucket and the college bucket. This creates a bidirectional record of all activity, making it easy to retrieve past conversations or reconstruct a full recruiting history.
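The dual-write update model can be sketched as a single event appended to both buckets. The field names, in-memory dicts, and the player/college names below are illustrative placeholders, not the actual backend.

```python
from datetime import datetime, timezone

player_log: dict[str, list] = {}   # player-specific timelines
college_log: dict[str, list] = {}  # college-specific interaction logs

def log_interaction(player: str, college: str, kind: str, note: str = "") -> dict:
    """Timestamp an update and write it into both the player and college buckets."""
    event = {
        "player": player,
        "college": college,
        "kind": kind,  # e.g. "call", "visit", "email", "prospect_day"
        "note": note,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    # One event object, two indexes: both timelines stay in sync.
    player_log.setdefault(player, []).append(event)
    college_log.setdefault(college, []).append(event)
    return event

log_interaction("Jane Doe", "Example University", "call")
print(len(player_log["Jane Doe"]), len(college_log["Example University"]))  # → 1 1
```

Because both buckets reference the same event object, retrieving a player's history or a college's aggregate activity is a single dictionary lookup.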
A central roster grid acts as an interactive interface layer. Players appear as rows, and each player's active colleges populate as columns. Selecting a row opens the full player timeline with all logged interactions; selecting a column opens the college view with aggregated data across the entire program. On the input side, players and coaches use lightweight mobile forms to log updates, which immediately sync to the backend and update the grid, timelines, and relationship objects.
The system is designed to handle high information volume - across dozens of players and hundreds of college conversations - while maintaining clarity, structure, and fast recall. It also supports future integration with the IMLCA database, allowing clubs to import verified roster and academic information directly into the system.
The result is a purpose-built recruiting infrastructure that streamlines communication, preserves institutional knowledge, and gives coaches a complete, real-time view of each player's recruiting progress.