Findings From a Machine Learning Approach to Site Characterization – Continuing Education for Science and Engineering

Background: Azimuth1, with support from the National Science Foundation, investigated whether findings from thousands of previous contaminated-site investigations reveal commonalities within site subgroups that are meaningful for planning site characterization.

Approach: The data are compiled from EPA and state agency records, then augmented to capture 3D contaminant extent and soil and groundwater characteristics. Sites are categorized by contaminant type, soil conditions, groundwater conditions, climate, and site age, and a machine-learning algorithm then models these categories to measure the degree of similarity among sites. The algorithm automatically produces a statistical prior probability model, which can be used to plan on-site data collection or as an additional line of evidence for site characterization.
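
As a rough illustration of the idea, the sketch below groups sites by shared categories and turns the outcomes of the similar sites into an empirical prior. The abstract does not specify the algorithm, so everything here is an assumption: the simple-matching similarity score, the 0.6 threshold, the 100 m bins, the field names, and the site records themselves, which are invented for the example.

```python
from collections import Counter

# Hypothetical site records. Field names and values are invented for
# illustration and do not come from the EPA or state data sets.
SITES = [
    {"contaminant": "chlorinated solvent", "soil": "sand", "groundwater": "shallow", "plume_m": 320},
    {"contaminant": "chlorinated solvent", "soil": "sand", "groundwater": "deep", "plume_m": 410},
    {"contaminant": "chlorinated solvent", "soil": "clay", "groundwater": "shallow", "plume_m": 90},
    {"contaminant": "petroleum", "soil": "sand", "groundwater": "shallow", "plume_m": 150},
    {"contaminant": "petroleum", "soil": "clay", "groundwater": "deep", "plume_m": 60},
]

CATEGORY_KEYS = ("contaminant", "soil", "groundwater")


def similarity(a, b):
    """Fraction of category fields on which two sites agree (simple matching)."""
    matches = sum(a[key] == b[key] for key in CATEGORY_KEYS)
    return matches / len(CATEGORY_KEYS)


def empirical_prior(target, sites, min_similarity=0.6, bin_width=100):
    """Binned prior over plume length, built from sufficiently similar sites.

    Returns a dict mapping the lower edge of each bin (in metres) to the
    share of similar sites whose plume length falls in that bin.
    """
    similar = [s for s in sites if similarity(target, s) >= min_similarity]
    if not similar:
        return {}
    counts = Counter((s["plume_m"] // bin_width) * bin_width for s in similar)
    total = sum(counts.values())
    return {edge: count / total for edge, count in sorted(counts.items())}


# A new, uncharacterized site described only by its categories.
target = {"contaminant": "chlorinated solvent", "soil": "sand", "groundwater": "shallow"}
prior = empirical_prior(target, SITES)
print(prior)
```

In a planning workflow, a prior of this kind could indicate how far from the source area sampling is likely to be informative before any on-site data exist. A production model would presumably use richer features and a learned similarity measure rather than exact category matches.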

Results: Our analysis shows that this machine-learning approach reduces uncertainty and bias, making it easier to begin understanding a complex site. This presentation will show the time and cost savings from better-placed sampling locations, and will demonstrate the internal-consistency measures and back testing used to validate the method and estimate the accuracy to be expected.
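
One common form of back testing is a leave-one-out check: hold out each known site in turn, predict a range from the remaining sites, and count how often the held-out value falls inside it. The sketch below shows that idea only. It is not the validation procedure used in this work, and the plume lengths, the 10th to 90th percentile interval, and the function name are all assumptions made for the example.

```python
import statistics

# Hypothetical plume lengths (metres) for one subgroup of similar sites.
# These values are invented for illustration.
PLUME_LENGTHS_M = [60, 90, 150, 180, 200, 250, 320, 410]


def leave_one_out_coverage(values, n_quantiles=10):
    """Hold out each value in turn and test it against an interval
    (lowest to highest quantile cut point) computed from the rest.

    Returns the fraction of held-out values that fall inside their
    interval, plus the per-site hit/miss list.
    """
    hits = []
    for i, held_out in enumerate(values):
        rest = values[:i] + values[i + 1:]
        cuts = statistics.quantiles(rest, n=n_quantiles, method="inclusive")
        low, high = cuts[0], cuts[-1]  # 10th and 90th percentiles when n=10
        hits.append(low <= held_out <= high)
    return sum(hits) / len(hits), hits


coverage, hits = leave_one_out_coverage(PLUME_LENGTHS_M)
print(f"Held-out sites inside the predicted interval: {coverage:.0%}")
```

If the observed coverage is far below the nominal level of the interval, the prior is overconfident for that subgroup; if it is far above, the interval is wider than it needs to be.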

Primary Author / Conference Presenter:
Jason Dalton
Azimuth1
McLean, Virginia, USA

Co-Author:
Anna Harrington, Azimuth1, McLean, Virginia