Résumé
Languages & Tools
Python, SQL, Git, Jira, Azure DevOps, PyCharm, CLI
Skills
Natural Language Processing (NLP), Network Analysis/Graph Theory, Machine Learning, Statistics, API (Develop & Consume), Classification, Regression, Data Visualization, Unsupervised Learning, Web Scraping/Crawling, Code Review, Object Oriented Programming, Automation, Unit Testing
Third-Party Packages I Commonly Use
pandas, scikit-learn, networkx, pydantic, numpy, scrapy, matplotlib, seaborn, pre-commit, black, isort, flake8, mypy, requests, spacy, typer, pytest, beautifulsoup4, streamlit, tqdm, fastapi
Career
Senior Data Scientist @ H&R Block
04/2022—present
- Developed a proof of concept to convert tax instructions into a directed acyclic graph using named entity recognition and a key file.
- Using topological sorting of a directed acyclic graph, predicted hundreds of tax lines in parallel within each topological generation.
- Through the use of graph algorithms, developed methods to quickly identify which lines in a tax form impacted other lines. This was used to determine where to focus attention when reviewing tax returns for issues that could have a large impact on individual clients.
- Published the “ReturnAssistant” subpackage, which can process 60,000 prior-year returns in ~200ms.
- Crawled the IRS website, extracted instructions for all forms and schedules, and preprocessed them into training data.
- Developed and trained a relation extraction pipeline to automate the construction of a directed acyclic graph using a custom tokenizer, a fine-tuned named entity recognition component, and a custom-trained relation extraction component.
- Hand-picked for the generative artificial intelligence research and development team that was tasked with researching how emerging technology in the natural language processing space could be leveraged by the company.
- Developed a Python package to help navigate and validate the company tax engine using graph algorithms.
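The graph techniques named in the bullets above (topological generations and downstream-impact lookup) can be illustrated with a minimal sketch. This is not the company implementation: the line names and the `DEPENDS_ON` map are hypothetical, and it uses only the standard-library `graphlib` module (Python 3.9+) rather than any production tooling.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each tax line -> the lines it is computed from.
DEPENDS_ON = {
    "line_3": {"line_1", "line_2"},
    "line_5": {"line_3", "line_4"},
    "line_6": {"line_5"},
}


def topological_generations(depends_on):
    """Group nodes into generations; nodes in one generation do not depend
    on each other, so they could be processed in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    generations = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are done
        generations.append(ready)
        sorter.done(*ready)
    return generations


def impacted_lines(depends_on, changed):
    """Return every line that directly or indirectly depends on `changed`."""
    # Invert the map so we can walk from a line to the lines that use it.
    dependents = {}
    for node, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(node)

    seen = set()
    stack = [changed]
    while stack:
        for child in dependents.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


if __name__ == "__main__":
    print(topological_generations(DEPENDS_ON))
    # [['line_1', 'line_2', 'line_4'], ['line_3'], ['line_5'], ['line_6']]
    print(sorted(impacted_lines(DEPENDS_ON, "line_4")))
    # ['line_5', 'line_6']
```

In this toy graph, changing `line_4` affects only `line_5` and `line_6`, which is the kind of lookup that can help narrow where review attention goes.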
Data Scientist Principal @ The University of Kansas Health System
10/2021—04/2022
- Automated PRM ticket creation, saving 100 minutes per month.
- Introduced Gaussian Processes (Kriging) as a data generation technique to estimate values where observations were not recorded in a time series.
- Created and maintained a private Python package to streamline my team’s work.
Business Intelligence Statistician II @ The University of Kansas Health System
06/2019—10/2021
- Deployed a classification model via API to label cost accounting codes with 95% accuracy, shortening manual labeling time from 3 hours to seconds and widening the user base.
- Automated weekly JIRA tickets, time checks, and comment reminders using Python and the JIRA API, saving 100 minutes per week.
- Using a topic model, trained a named entity recognition pipeline on charge code descriptions to identify profitable subsets.
- Cut average costs per primary complete mastectomy from $9,000 to $5,500 using evidence provided via bootstrapping and difference testing.
- Deconstructed and rebuilt the COVID-19 CHIME model using internal data, providing an auto-updating dashboard to senior leadership to help with capacity and supply planning.
Statistician @ Cerner Corporation
05/2017—05/2019
- Designed and automated standard analyses, updating PowerPoint presentations directly.
- Simulated and optimized complex rule-based algorithms for patient-provider matching.
- Led quarterly Python classes to teach and train associates.
- Implemented quality-improvement benchmarks for health systems using a combination of the Achievable Benchmarks of Care and mathematically derived metrics.
- Developed a Python module to generate and execute SQL, allowing the validation of data sets and simulation of changes.
Education
B.S. Mathematics—Kansas State University
08/2013—08/2015
World Affairs—University of Oxford
06/2013—07/2013
Mathematics—Saint Charles Community College
08/2011—06/2013
Achievements
- Mapping Tax Structures Via Natural Language Processing Generated Directed Acyclic Graphs (Patent Application #18/414,771)—HRB Innovations
- Bronze Pandas Tag Badge—Stack Overflow
- Bronze Python Tag Badge—Stack Overflow
- 66 Days of Data Shoutout—66 Days of Data
- A Generalization of the Goresky-Klapper Conjecture, Part I—Society for Industrial and Applied Mathematics (SIAM)
- Consulting Project Eagle Award—Cerner Corporation
- DataCon 2017 Data Competition Winner—Cerner Corporation
Open Source Contributions
- [BUG] – Arguments enable and disable not working as expected in spacy.load—spaCy
- [ENH] – Option to exclude model_extra from repr—Pydantic
- BUG – Requiring name argument in StackAPI makes “/users/{id}/network-activity” endpoint inaccessible—StackAPI
- Update index handling in PandasAdapter—Scikit-learn
- Most recent scikit-learn results in several failed unit tests—mlxtend
- Integrate scikit-learn’s set_output method into TransactionEncoder—mlxtend
- ENH – Replaced for-loops in :function: rescale_layout with numpy vectorized methods.—Networkx
- Typo fix in Language.replace_listeners docs—spaCy
- DOC Update MLPRegressor docs—Scikit-learn
- add feature_names_in_ attribute to FeatureUnion—Scikit-learn
- synset pos parameter—spacy-wordnet
Presentations
- Detecting Kidney Stone CPT Communities using the Louvain Method—Health Analytics Summit @ Health Catalyst (09/2021)
- Exploring Null Space—DataCon @ Cerner Corporation (09/2018)
- Automating Excel Report Generation with Python—DataCon @ Cerner Corporation (09/2016)