profile for Ian Thompson at Stack Overflow, Q&A for professional and enthusiast programmers

Languages & Tools

Python, SQL, Git, Jira, Azure DevOps, PyCharm, CLI

Skills

Natural Language Processing (NLP), Network Analysis/Graph Theory, Machine Learning, Statistics, API (Develop & Consume), Classification, Regression, Data Visualization, Unsupervised Learning, Web Scraping/Crawling, Code Review, Object Oriented Programming, Automation, Unit Testing

Third-Party Packages I Commonly Use

pandas, scikit-learn, networkx, pydantic, numpy, scrapy, matplotlib, seaborn, pre-commit, black, isort, flake8, mypy, requests, spacy, typer, pytest, beautifulsoup4, streamlit, tqdm, fastapi

Career

Senior Data Scientist @ H&R Block

04/2022—present

  • Developed a proof of concept to convert tax instructions into a directed acyclic graph using named entity recognition and a key file.
  • Using topological sorting within a directed acyclic graph, predicted hundreds of tax lines in parallel within the same generation.
  • Through the use of graph algorithms, developed methods to quickly identify which lines in a tax form impacted other lines. This was used to determine where to focus attention when reviewing tax returns for issues that could have a large impact on individual clients.
  • Published the subpackage, “ReturnAssistant,” with the ability to process 60,000 prior year returns in ~200ms.
  • Crawled the IRS website, extracted instructions for all forms and schedules, and preprocessed into training data.
  • Developed and trained a relation extraction pipeline to automate the construction of a directed acyclic graph using a custom tokenizer, a fine-tuned named entity recognition component, and a custom-trained relation extraction component.
  • Hand-picked for the generative artificial intelligence research and development team that was tasked with researching how emerging technology in the natural language processing space could be leveraged by the company.
  • Developed a Python package to help navigate and validate the company tax engine using graph algorithms.
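As an illustration of the graph techniques named in the bullets above, here is a minimal sketch in plain Python. It assumes tax lines are nodes and an edge means "this line feeds that line"; the line names and the `EDGES` list are hypothetical and not taken from any real form or from the actual package. Lines in the same generation have no dependencies on one another, so they could be computed in parallel, and a breadth-first walk finds every line affected by a change to one line.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: (source line, line that depends on it).
EDGES = [
    ("wages", "total_income"),
    ("interest", "total_income"),
    ("total_income", "agi"),
    ("adjustments", "agi"),
    ("agi", "taxable_income"),
    ("deduction", "taxable_income"),
]


def build_graph(edges):
    """Return (successors, in-degree) maps for a directed graph."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        indegree.setdefault(src, 0)  # make sure root nodes are recorded
    return successors, dict(indegree)


def topological_generations(edges):
    """Group nodes into layers using Kahn's algorithm.

    Every node in a layer depends only on nodes in earlier layers.
    """
    successors, indegree = build_graph(edges)
    current = sorted(node for node, deg in indegree.items() if deg == 0)
    generations, seen = [], 0
    while current:
        generations.append(current)
        seen += len(current)
        upcoming = []
        for node in current:
            for child in successors[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    upcoming.append(child)
        current = sorted(upcoming)
    if seen != len(indegree):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return generations


def downstream(edges, start):
    """Return every node reachable from `start` (the lines it impacts)."""
    successors, _ = build_graph(edges)
    impacted, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in successors[node]:
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    for depth, layer in enumerate(topological_generations(EDGES)):
        print(depth, layer)
    print(sorted(downstream(EDGES, "interest")))
```

Libraries such as networkx offer equivalent routines; the standard-library version is used here only to keep the sketch self-contained.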

Data Scientist Principal @ The University of Kansas Health System

10/2021—04/2022

  • Automated PRM ticket creation, saving 100 minutes per month.
  • Introduced Gaussian Processes (Kriging) as a data generation technique to estimate values where observations were not recorded in a time series.
  • Created and maintained a private Python package to streamline my team’s work.

Business Intelligence Statistician II @ The University of Kansas Health System

06/2019—10/2021

  • Deployed a classification model via API to label cost accounting codes with 95% accuracy, shortening manual labeling time from 3 hours to seconds and widening the user base.
  • Automated weekly Jira tickets, time checks, and comment reminders using Python and the Jira API, saving 100 minutes per week.
  • Using a topic model, trained a named entity recognition pipeline on charge code descriptions to identify profitable subsets.
  • Cut average costs per primary complete mastectomy from $9,000 to $5,500 using evidence provided via bootstrapping and difference testing.
  • Deconstructed and rebuilt the COVID-19 CHIME model using internal data, providing an auto-updating dashboard to senior leadership to help with capacity and supply planning.
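The bootstrapping and difference testing mentioned in the cost bullet above can be sketched roughly as follows. This is a generic percentile bootstrap for a difference in means, not the analysis actually performed; the two samples are made-up toy numbers (in thousands of dollars) and the function name `bootstrap_diff_ci` is hypothetical.

```python
import random
from statistics import mean

# Hypothetical toy samples, NOT real cost data (units: thousands of dollars).
BEFORE = [9.1, 8.7, 9.4, 8.9, 9.2, 9.0, 8.8, 9.3]
AFTER = [5.6, 5.4, 5.7, 5.3, 5.5, 5.8, 5.2, 5.5]


def bootstrap_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    low, high = bootstrap_diff_ci(BEFORE, AFTER)
    print(f"95% interval for the difference in means: ({low:.2f}, {high:.2f})")
```

If the interval excludes zero, the difference between the two groups is unlikely to be resampling noise alone, which is the kind of evidence a difference test is meant to provide.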

Statistician @ Cerner Corporation

05/2017—05/2019

  • Designed and automated standard analyses, updating PowerPoint presentations directly.
  • Simulated and optimized complex rule-based algorithms for patient-provider matching.
  • Led quarterly Python classes to teach and train associates.
  • Implemented quality-improvement benchmarks for health systems using a combination of the Achievable Benchmarks of Care and mathematically derived metrics.
  • Developed a Python module to generate and execute SQL, allowing the validation of data sets and simulation of changes.

Education

B.S. Mathematics—Kansas State University

08/2013—08/2015

World Affairs—University of Oxford

06/2013—07/2013

Mathematics—Saint Charles Community College

08/2011—06/2013

Achievements

Open Source Contributions

Presentations

  • Detecting Kidney Stone CPT Communities using the Louvain Method—Health Analytics Summit @ Health Catalyst (09/2021)
  • Exploring Null Space—DataCon @ Cerner Corporation (09/2018)
  • Automating Excel Report Generation with Python—DataCon @ Cerner Corporation (09/2016)