Abstract: Inorganic-organic hybrid materials have been studied for decades, and hydrothermal and solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. The development of new compounds still relies primarily on exploratory syntheses because the formation of these materials is not fully understood. Simulation- and data-driven approaches provide an alternative to experimental trial and error.
In this work, machine-learning algorithms trained on reaction data are used to predict reaction outcomes for the crystallization of templated vanadium selenites. Physicochemical property descriptors were generated with cheminformatics techniques, and the resulting data were used to train a machine-learning model to predict reaction “success.” Two types of data bias are explored, because such biases propagate through any machine-learning model trained on the data. First, archived “dark” reactions, comprising both failed and successful attempts at hydrothermal syntheses, were used to demonstrate that excluding failed experiments leads to overconfident models. Second, anthropogenic biases in both reagent choice and reaction conditions were identified in a class of exploratory chemical reactions, the hydrothermal synthesis of amine-templated metal oxides, using a combination of data mining and experimentation. Additionally, strategies to increase model interpretability are discussed.