The Data and Learning Hub for Science (DLHub) project seeks to make it simple for researchers to publish their machine learning (ML) models, and for other scientists to discover and run models that may be applicable to their own problems. For example, a researcher working with data from an electron microscopy center could discover an existing model and apply it to their data, to assess image quality or to detect loop defects, with only a few lines of code.
Recently, there has been tremendous growth in the application of ML and artificial intelligence (AI) techniques to problems in the physical sciences. Despite this progress, significant roadblocks still prevent researchers from easily applying these models to new problems. Very few of the ML models that have been developed, or the codes related to them, are accessible. Even when codes are shared, they can be difficult to install and run on new data, or may require retraining before use. In industry, putting machine learning into practice has benefited from infrastructure that provides simple ways to perform common machine learning tasks. Similar capabilities are required for models in science.
To meet these needs, we created DLHub. DLHub seeks to overcome roadblocks in the ML life cycle by providing capabilities that allow researchers to describe, publish, discover, and run ML models, along with their associated data transformation and analysis codes, with minimal overhead. DLHub connects models with data and with model serving capabilities, so that producers of models can make them easily available and consumers can discover the latest AI/ML developments and quickly apply them to their research projects.
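The describe, publish, discover, and run steps can be illustrated with a small, purely in-memory sketch. This is not the DLHub API: the names `ModelRecord`, `ToyModelHub`, `publish`, `search`, and `run`, the model name `example/mean_brightness`, and the "image quality" function itself are all hypothetical, chosen only to show how the four capabilities relate to one another.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ModelRecord:
    """Hypothetical description of a model: metadata plus a callable."""
    name: str
    description: str
    tags: List[str]
    predict: Callable[[Any], Any]


class ToyModelHub:
    """Illustrative in-memory registry; a stand-in, not the real service."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def publish(self, record: ModelRecord) -> None:
        # Publishing stores the description together with the model.
        if record.name in self._records:
            raise ValueError(f"model already published: {record.name}")
        self._records[record.name] = record

    def search(self, keyword: str) -> List[str]:
        # Discovery is a case-insensitive match on description and tags.
        kw = keyword.lower()
        return sorted(
            name
            for name, rec in self._records.items()
            if kw in rec.description.lower()
            or kw in (tag.lower() for tag in rec.tags)
        )

    def run(self, name: str, inputs: Any) -> Any:
        # Running looks the model up by name and applies it to the inputs.
        return self._records[name].predict(inputs)


def mean_brightness(image: List[List[float]]) -> float:
    """Toy stand-in for an image quality model: mean pixel value."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


# Producer side: describe and publish a model.
hub = ToyModelHub()
hub.publish(
    ModelRecord(
        name="example/mean_brightness",
        description="Toy image quality score: mean pixel brightness",
        tags=["microscopy", "image quality"],
        predict=mean_brightness,
    )
)

# Consumer side: discover a model by keyword, then run it on new data.
matches = hub.search("image quality")
score = hub.run(matches[0], [[0.0, 1.0], [1.0, 1.0]])
print(matches, score)
```

In this sketch the consumer needs only the last three lines to find and apply a model, which is the kind of low-overhead workflow the text describes; a real deployment would add persistent storage, authentication, and remote model serving.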