Feature Story | Argonne National Laboratory

New code mines microscopy images in scientific articles

Pioneering software automatically searches for and summarizes content of microscopy images from the scientific literature

Pioneering software EXSCLAIM! unlocks new possibilities for the development and testing of deep learning algorithms.

Deep learning is a form of artificial intelligence transforming society by teaching computers to process information using artificial neural networks that mimic the human brain. It is now used in facial recognition, self-driving cars and even in the playing of complex games like Go. In general, the success of deep learning has depended on using large datasets of labeled images for training purposes.

A potential gold mine of labeled images resides within the scientific literature, with over a million articles published each year. Most have many figures woven into the text. To date, these figures have not been amenable to deep learning models. This is, in part, due to their complex layouts. Each figure typically contains multiple embedded images, graphs and illustrations. Also lacking has been an adequate means to search the literature for images matching specific content.

Addressing this challenge, researchers at the U.S. Department of Energy’s (DOE) Argonne National Laboratory and Northwestern University have created the EXSCLAIM! software tool. The name stands for extraction, separation and caption-based natural language annotation of images.

“Images generated by electron microscopes down to the billionths of a meter are one of the most important kinds of figures in materials science literature,” said Maria Chan, scientist in Argonne’s Center for Nanoscale Materials, a DOE Office of Science user facility. “These images are essential to the understanding and development of new materials in many different fields. Our goal with EXSCLAIM! is to unlock the untapped potential of these imaging data.”

What sets EXSCLAIM! apart is its unique focus on a query-to-dataset approach similar to how a prompt is used with generative AI tools such as ChatGPT and DALL-E. It is thus capable of extracting individual images with very specific content from figures, as it both classifies the image content and recognizes the degree of magnification. It can then create descriptive labels for each image. This innovative software tool is expected to become a valuable asset for scientists researching new materials at the nanoscale.
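
The query-to-dataset idea can be pictured with a small toy example. This is only an illustrative sketch under stated assumptions, not the EXSCLAIM! code or its API: the `SubImage` record, its fields and the `build_dataset` function are hypothetical names, and the sample entries are made up. It assumes each figure has already been split into panels, and that each panel carries a content class, its own caption text and a scale value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubImage:
    """One panel cut out of a compound figure (hypothetical record)."""
    figure_id: str             # figure the panel came from
    panel: str                 # panel label inside the figure, e.g. "a"
    kind: str                  # coarse content class, e.g. "microscopy"
    caption: str               # caption text assigned to this panel
    scale_nm: Optional[float]  # scale bar length in nanometers, if known


def build_dataset(panels: List[SubImage], query: str,
                  kind: str = "microscopy") -> List[SubImage]:
    """Keep panels of the wanted kind whose caption contains every query word."""
    words = query.lower().split()
    return [
        p for p in panels
        if p.kind == kind and all(w in p.caption.lower() for w in words)
    ]


# Made-up sample panels, for illustration only.
panels = [
    SubImage("fig1", "a", "microscopy", "TEM image of gold nanoparticles", 50.0),
    SubImage("fig1", "b", "graph", "Size distribution of gold nanoparticles", None),
    SubImage("fig2", "a", "microscopy", "SEM image of silicon nanowires", 200.0),
]

dataset = build_dataset(panels, "gold nanoparticles")
for p in dataset:
    print(p.figure_id, p.panel, p.caption, p.scale_nm)
```

In this toy version, the query "gold nanoparticles" keeps only the microscopy panel whose caption matches and drops the graph from the same figure. The real tool works on figures and captions taken from published articles rather than on a hand-written list.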

“While existing methods often struggle with the compound layout problem, EXSCLAIM! employs a new approach to overcome this,” said lead author Eric Schwenker, a former Argonne graduate student. “Our software is effective at identifying sharp image boundaries, and it excels in capturing irregular image arrangements.”

Example output from EXSCLAIM! (Image by Argonne National Laboratory.)

EXSCLAIM! has already demonstrated its effectiveness by constructing a self-labeled electron microscopy dataset of more than 280,000 nanostructure images. While initially developed around materials microscopy images, EXSCLAIM! is adaptable to any scientific field that produces high volumes of papers with images. The software thus promises to revolutionize the use of published scientific images across various disciplines.

“Researchers now have a powerful image-mining tool to advance their understanding of complex visual information,” Chan said.

This research was supported by the DOE Office of Basic Energy Sciences, Laboratory Directed Research and Development funding from Argonne and a DOE Early Career Award. The team used high performance computing resources at Argonne’s Laboratory Computing Resource Center, Argonne’s Joint Laboratory for System Evaluation and the National Energy Research Scientific Computing Center, a DOE Office of Science user facility at DOE’s Lawrence Berkeley National Laboratory.

This research first appeared in Patterns. In addition to Chan and Schwenker, authors include Weixin Jiang, Trevor Spreadbury, Nicola Ferrier and Oliver Cossairt.

About Argonne’s Center for Nanoscale Materials
The Center for Nanoscale Materials is one of the five DOE Nanoscale Science Research Centers, premier national user facilities for interdisciplinary research at the nanoscale supported by the DOE Office of Science. Together the NSRCs comprise a suite of complementary facilities that provide researchers with state-of-the-art capabilities to fabricate, process, characterize and model nanoscale materials, and constitute the largest infrastructure investment of the National Nanotechnology Initiative. The NSRCs are located at DOE’s Argonne, Brookhaven, Lawrence Berkeley, Oak Ridge, Sandia and Los Alamos National Laboratories. For more information about the DOE NSRCs, please visit https://science.osti.gov/User-Facilities/User-Facilities-at-a-Glance.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology by conducting leading-edge basic and applied research in virtually every scientific discipline. Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.