For scientists, data is the lifeblood of research. Collecting, organizing and sharing data both within and across fields drives pivotal discoveries that make us better off and more secure.
Making data open and available, however, solves only part of the problem: scientists with very different training must still be able to draw useful conclusions from the same dataset. To promote and guide the curation and exchange of data, researchers have developed a set of principles intended to make data more findable, accessible, interoperable and reusable, or FAIR, for both people and machines.
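As a rough illustration of what those four principles mean in practice, the sketch below checks a dataset's metadata record against one hypothetical requirement per principle. All field names, values and requirements here are illustrative assumptions, not taken from the study or from any formal FAIR specification.

```python
# Illustrative sketch: one hypothetical requirement per FAIR principle.
# Field names and values are invented for this example.

def check_fair(record):
    """Report which FAIR principles a metadata record satisfies."""
    return {
        # Findable: a globally unique, persistent identifier (e.g., a DOI)
        "findable": bool(record.get("identifier")),
        # Accessible: retrievable over a standard open protocol such as HTTPS
        "accessible": record.get("access_url", "").startswith("https://"),
        # Interoperable: metadata described with a shared, formal vocabulary
        "interoperable": bool(record.get("vocabulary")),
        # Reusable: a clear license plus provenance information
        "reusable": bool(record.get("license")) and bool(record.get("provenance")),
    }

# A hypothetical metadata record for a simulated physics dataset
record = {
    "identifier": "doi:10.0000/example-dataset",
    "access_url": "https://example.org/data/collisions.h5",
    "vocabulary": "schema.org/Dataset",
    "license": "CC-BY-4.0",
    "provenance": "simulated particle collision events",
}

print(check_fair(record))  # all four principles satisfied for this record
```

The point of the sketch is that each principle can be phrased as a concrete, machine-checkable property of the metadata, which is what allows software (not just people) to find and reuse the data.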
Although the FAIR principles were first published in 2016, researchers are still figuring out how they apply to particular datasets. In a new study, researchers from the U.S. Department of Energy’s (DOE) Argonne National Laboratory, Massachusetts Institute of Technology, University of California San Diego, University of Minnesota, and University of Illinois at Urbana-Champaign have laid out a set of new practices for curating high energy physics datasets that make them more FAIR.
“The FAIR principles were created to serve as goals for data producers and publishers to improve data management and stewardship practices,” said Argonne computational scientist Eliu Huerta, an author of the study. “The community expects that adhering to these principles will enhance the capabilities of machines to automate the finding and use of data, thereby streamlining the reuse of data for humans.”
The research, published in Nature Scientific Data, demonstrates how to FAIRify an open simulation dataset drawn from particle physics experiments at the CERN Large Hadron Collider. To highlight the interplay between artificial intelligence (AI) research and scientific visualization, this study also provided software tools to visualize and explore this FAIR dataset.
In addition to building FAIR datasets, Huerta and his colleagues also sought to understand the FAIRness of AI models. “To have a FAIR AI model, we believe you need to have a FAIR dataset to train it on,” said Yifan Chen, the first author of the paper and a graduate student at Illinois and Argonne’s Data Science and Learning division. “Applying the FAIR principles to AI models will automate and streamline the design and use of those models for scientific discovery.”
“Our goal is to shed new light into the interplay of AI models and experimental data and help create a rigorous framework for the development of AI tools to address the biggest challenges in science,” Huerta added.
Ultimately, Huerta said, the goal of FAIRness is to create an agreed-upon set of best practices and methodologies, which will maximize the impact of AI and pave the way for the development of next-generation AI tools.
“We’re looking at the entire discovery cycle, from data production and curation, to the design and deployment of smart, modern computing environments and scientific data infrastructures, to the combination of these to create AI frameworks that greatly advance our understanding of scientific phenomena,” he said.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.