Machine learning is combined with vacuum-ultraviolet (VUV) absorption spectroscopy measurements to enable predictions of molecular structure. Because carbonyl functional groups are present in elusive combustion intermediates such as ketohydroperoxides, the present work employs VUV absorption spectroscopy to measure spectra of saturated aldehydes and their corresponding alkane analogues from 5.167 – 9.500 eV. These spectra expand a training library that is imbalanced, with aldehyde spectra underrepresented. Machine learning models are developed to predict the presence of an aldehyde group using several pre-processing techniques: feature selection, over-sampling (Synthetic Minority Oversampling Technique, SMOTE), and under-sampling (Tomek Links). These techniques were combined with a bagged tree ensemble evaluated by leave-one-out cross-validation. The ensemble scores were assessed with receiver operating characteristic (ROC) curves, the number of true positives at zero false positives (TPFP=0), and the partial area under the ROC curve to quantify predictive ability. The model trained on the entire photon energy range of the spectrum improved significantly when SMOTE was coupled with Tomek Links. The photon energy range was also truncated to various windows to examine the influence of binning. The most accurate ranges were 6.5 – 7.0, 6.5 – 7.8, and 6.5 – 8.5 eV. All three energy ranges showed a similar trend, in which oversampling marginally increased the number of true positives. Four species (cyclobutane carboxaldehyde, cyclopropane carboxaldehyde, acetaldehyde, and formaldehyde) were not predicted accurately by any of the models because their spectra differ drastically from those of the other aldehydes.
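The two ranking metrics named above can be computed directly from the held-out ensemble scores. The sketch below is a minimal NumPy illustration under stated assumptions, not the evaluation code used in this work: TPFP=0 is taken to be the number of aldehydes scored strictly above the highest-scoring non-aldehyde, and the partial AUC is taken to be the unnormalized ROC area from a false-positive rate of 0 up to a cut-off `max_fpr`, whose default of 0.1 is a hypothetical choice. The function names and the example scores are likewise hypothetical.

```python
import numpy as np


def tp_at_zero_fp(scores, labels):
    """Number of positives scored strictly above the best-scoring negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    return int(np.sum(scores[labels] > scores[~labels].max()))


def roc_points(scores, labels):
    """ROC (FPR, TPR) points, sweeping the threshold from high to low scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:  # distinct scores, descending
        predicted = scores >= t
        tpr.append(np.sum(predicted & labels) / n_pos)
        fpr.append(np.sum(predicted & ~labels) / n_neg)
    return np.array(fpr), np.array(tpr)


def partial_auc(scores, labels, max_fpr=0.1):
    """Unnormalized ROC area for FPR in [0, max_fpr], via the trapezoid rule."""
    fpr, tpr = roc_points(scores, labels)
    area = 0.0
    for i in range(1, len(fpr)):
        x0, x1, y0, y1 = fpr[i - 1], fpr[i], tpr[i - 1], tpr[i]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment exactly at the cut-off
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # illustrative held-out scores
    labels = [1, 1, 0, 1, 0, 0]              # 1 = aldehyde, 0 = other
    print(tp_at_zero_fp(scores, labels))     # 2
    print(partial_auc(scores, labels, max_fpr=0.1))
```

With scikit-learn available, `sklearn.metrics.roc_auc_score(y_true, y_score, max_fpr=...)` also reports a partial AUC, but it applies the McClish standardization, so its values are not directly comparable to the raw area returned by this sketch.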
The largest improvements came from restricting the energy range, which indicates that spectral motifs are identified more reliably within an energy window specific to the class, and from oversampling to address the imbalance in the dataset. The resulting models enable accurate predictions of aldehyde contributions to VUV absorption spectra and define the first step of a broader piecewise approach in which machine learning predicts the molecular structure of multi-functional species.
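The two ingredients credited above, restricting the energy window and rebalancing the classes, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the energy grid, the synthetic placeholder spectra, the neighbour count `k`, and the helper names `truncate_window`, `smote`, `tomek_keep_mask`, and `smote_tomek` are all hypothetical. SMOTE here oversamples the minority (aldehyde) class to parity, and the Tomek-link step drops only the majority-class member of each link.

```python
import numpy as np


def truncate_window(energies, spectra, e_min, e_max):
    """Keep only the photon-energy columns inside [e_min, e_max] (eV)."""
    mask = (energies >= e_min) & (energies <= e_max)
    return energies[mask], spectra[:, mask]


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform position along each segment
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def tomek_keep_mask(X, y, minority_label=1):
    """Boolean mask that drops the majority-class member of every Tomek link
    (a pair of opposite-class samples that are each other's nearest neighbour)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    mutual = nn[nn] == np.arange(len(X))
    in_link = mutual & (y[nn] != y)
    return ~(in_link & (y != minority_label))


def smote_tomek(X, y, minority_label=1, k=5, seed=None):
    """Oversample the minority class to parity with SMOTE, then remove Tomek links."""
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label) - len(X_min))
    if n_new > 0:
        X = np.vstack([X, smote(X_min, n_new, k, seed)])
        y = np.concatenate([y, np.full(n_new, minority_label)])
    keep = tomek_keep_mask(X, y, minority_label)
    return X[keep], y[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    energies = np.linspace(5.167, 9.5, 200)    # hypothetical energy grid (eV)
    spectra = rng.random((40, energies.size))  # synthetic placeholder spectra
    labels = np.array([1] * 8 + [0] * 32)      # 1 = aldehyde, 0 = other
    _, windowed = truncate_window(energies, spectra, 6.5, 7.8)
    X_res, y_res = smote_tomek(windowed, labels, minority_label=1, k=5, seed=0)
    print(X_res.shape, np.bincount(y_res))
```

Inside leave-one-out cross-validation, the resampling would be applied to each training fold only, so that no synthetic spectrum is derived from the held-out spectrum; the resampled arrays would then be passed to the bagged tree ensemble. The `imbalanced-learn` package offers a library equivalent, `imblearn.combine.SMOTETomek`, whose default Tomek step removes both members of each link rather than only the majority-class member.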