Machine learning methods were combined with differential absorption spectroscopy measurements in the vacuum-ultraviolet (VUV) region (5.167 – 9.920 eV) to develop predictive capabilities for inferring molecular structure from the spectra. Several types of species were analyzed and, for modeling purposes, each was assigned a single classification: (1) alkane, (2) conjugation with oxygen (e.g. diacetyl, ethyl vinyl ether), (3) non-conjugated alkene (e.g. 1-butene, 1,4-cyclohexadiene), (4) oxygen-containing (e.g. 1-butanol, tetrahydrofuran), or (5) cyclic (e.g. cyclopentane, cyclohexanone). The cyclic classification excluded cyclic ethers. Several modeling methods were employed in the analysis of 102 absorption spectra, 24 of which were measured for the first time.
The primary objective was to identify suitable methods that enable accurate predictions of molecular structure classifications with minimized statistical uncertainties. The results indicate that no single, unifying method reliably predicts molecular structure contributions to VUV absorption spectra; instead, the method must be matched to the type of molecular structure detail (e.g. conjugation) and to the absorption region of interest. The absorption region was selected using a binning approach, wherein regions of ~0.5 eV were analyzed rather than the entire ~4.8 eV range. Photon energy binning enabled region-specific analysis of prediction accuracy, precision, and recall. The binning approach shows that optimal determination of molecular structure using machine learning methods depends on the absorption region selected rather than on use of the entire spectrum. The present work provides a separate machine learning model for each molecular classification, which enables the identification of multi-functional species relevant to atmospheric chemistry and combustion chemistry, where isomer-resolved speciation is critical to understanding complex reaction networks.
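The photon energy binning described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`make_bins`, `bin_spectrum`, `precision_recall`), the use of the mean absorbance as the per-bin feature, and the equal-width splitting of the range are all assumptions made for illustration. Only the energy limits (5.167 – 9.920 eV) and the approximate bin width (~0.5 eV) are taken from the text.

```python
def make_bins(e_min, e_max, width):
    """Split [e_min, e_max] into equal-width bins of roughly `width` eV.

    The number of bins is rounded so that the bins tile the range exactly
    (an assumption; the actual bin edges used in the work are not stated).
    """
    n_bins = max(1, round((e_max - e_min) / width))
    step = (e_max - e_min) / n_bins
    # Pin the final edge to e_max to avoid floating-point drift.
    edges = [e_min + i * step for i in range(n_bins)] + [e_max]
    return list(zip(edges[:-1], edges[1:]))


def bin_spectrum(energies, absorbances, bins):
    """Return one feature per bin: the mean absorbance of points in that bin.

    Bins are half-open [lo, hi), except the last, which includes its upper
    edge. Empty bins yield 0.0 (a hypothetical convention).
    """
    features = []
    last = len(bins) - 1
    for i, (lo, hi) in enumerate(bins):
        vals = [
            a for e, a in zip(energies, absorbances)
            if lo <= e < hi or (i == last and e == hi)
        ]
        features.append(sum(vals) / len(vals) if vals else 0.0)
    return features


def precision_recall(y_true, y_pred):
    """Precision and recall for one binary classification (e.g. cyclic or not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Energy limits and approximate bin width from the text.
bins = make_bins(5.167, 9.920, 0.5)
```

Under these assumptions, a classifier trained on the feature from a single bin can be scored with `precision_recall` separately for each bin and each molecular classification, which is one way to obtain region-specific performance metrics of the kind discussed above.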