The DOE program supports the development of individual research programs of outstanding scientists early in their careers. The program award was given to Balaprakash for his work on domain-specific machine learning methods that explicitly take into account the characteristics of the scientific data. Under the program, Balaprakash will receive $2.5 million over five years to advance his research.
Balaprakash and his group will develop scalable machine learning methods that can effectively learn in complex DOE scientific domains in which only limited training data exists. The new methods will be integrated with physical and first-principle models to make the models more dependable.
To some, talking about limited data might seem strange when a common theme today is the overwhelming amount of data being produced. Balaprakash explained that machine learning models need labeled, or identified, data to learn to make predictions about unlabeled data. But because of the high costs of monitoring, experimentation, and simulation, obtaining a large amount of labeled data is impractical in many DOE scientific domains.
“We need to find an innovative way to efficiently handle scientific domains with limited training data,” said Balaprakash.
Another challenge facing Balaprakash and his group is that in many DOE applications, the data comes from unstructured grids.
“Numerous machine learning techniques have been developed for regular grids,” Balaprakash said. “But applying traditional methods can lead to loss of learning efficiency and poor prediction accuracy. In contrast, our new methods will be able to handle data that differs in structure, size, heterogeneity, and complexity.”
Using the Early Career funds, Balaprakash plans to automate the design and development of his new approach so that it can exploit the capabilities of current leadership-class supercomputers such as Theta and future exascale supercomputers such as Aurora. The scalability of the approach will then be tested on applications that span the DOE Office of Science mission areas.
Balaprakash’s new approach will fill a major gap facing DOE scientists: how to train large-scale machine learning systems for unstructured and irregular datasets. “My goal is to enable new scientific discoveries with the new model,” said Balaprakash.
The Early Career award that Balaprakash received was funded by the Advanced Scientific Computing Research program within the DOE Office of Science.
For more information, see the DOE website https://www.energy.gov/articles/department-energy-selects-84-scientists-receive-early-career-research-program-funding and the Argonne website http://www.anl.gov/articles/three-argonne-scientists-receive-doe-early-career-awards.