AI Tools Extract Buried Experimental Data from Scientific Papers to Accelerate Materials Discovery

By Advos

TL;DR

NIMS researchers developed LLM tools that speed up the construction of materials databases, which could help scientists discover new functional materials faster than with manual data collection.

The Starrydata project uses LLMs to extract structured data from scientific papers, automating the conversion of complex information into organized databases for materials property analysis.

By digitizing and sharing experimental data globally, this research accelerates materials development for sustainable technologies, potentially improving energy efficiency and environmental solutions worldwide.

Researchers are using LLMs like ChatGPT to turn untapped experimental data in scientific papers, currently open-access ones, into searchable databases that support large-scale analysis in materials science.

Materials scientists at Japan's National Institute for Materials Science (NIMS) have developed artificial intelligence tools that automatically extract experimental data from scientific papers, potentially accelerating the discovery and development of new functional materials used in technologies ranging from smartphones to automobiles. The research, published in the journal Science and Technology of Advanced Materials: Methods, addresses a critical bottleneck in materials science where valuable experimental data remains buried in millions of published papers.

Dr. Yukari Katsura, senior researcher at NIMS and leader of the project, explained that much of this data goes unused because it sits in unstructured form inside PDF documents. "Graphs in the millions of papers published to date contain valuable experimental data collected by past researchers, and much of it remains untapped," Katsura noted. Her team's work builds on the Starrydata project, launched in 2015, which previously relied on manual data collection from papers.

The new tools leverage large language models (LLMs) like ChatGPT to automate the extraction process. "We found that by specifying a data structure and giving instructions to an LLM, we can accurately and comprehensively extract information about figures, tables, and samples from the text of paper PDFs across a wide range of fields," Katsura said. The research details are available in the published paper at https://doi.org/10.1080/27660400.2025.2590811.
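As a rough illustration of what "specifying a data structure and giving instructions to an LLM" can look like, the Python sketch below assembles an extraction prompt from a field list and a passage of paper text. The field names, the prompt wording, and the function name build_extraction_prompt are assumptions made for this example; the article does not describe the actual Starrydata schema or prompts.

```python
import json

# Hypothetical field list for illustration only; the real Starrydata
# schema is not given in the article.
SAMPLE_SCHEMA = {
    "sample_name": "string",
    "composition": "string",
    "synthesis_method": "string",
}


def build_extraction_prompt(schema: dict, paper_text: str) -> str:
    """Combine a target data structure and paper text into one instruction."""
    return (
        "Extract sample information from the text below.\n"
        "Return only JSON: a list of objects with exactly these fields:\n"
        f"{json.dumps(schema, indent=2)}\n"
        "Use null for any field the text does not state.\n\n"
        f"TEXT:\n{paper_text}"
    )


# Invented example sentence, not taken from any real paper.
prompt = build_extraction_prompt(
    SAMPLE_SCHEMA,
    "Bi2Te3 pellets were prepared by spark plasma sintering.",
)
```

The resulting string would then be sent to whichever LLM is in use; that call is left out here because the article gives no details about it.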

The first tool, Starrydata Auto-Suggestion for Sample Information, is already integrated into the Starrydata2 web system. When users paste text from a paper's abstract or experimental methods section, the system sends it to an OpenAI GPT model via API and automatically displays candidate entries for data fields designed for specific materials domains. The second tool, Starrydata Auto-Summary GPT, deconstructs entire open-access paper PDFs and summarizes all descriptions of figures, tables, and samples as structured data in JSON format, which can be viewed as easy-to-read tables in web browsers.
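To make the idea of JSON output viewed as a table concrete, here is a minimal Python sketch that parses a JSON list of sample records and prints it as a plain-text table. The JSON content, the field names, and the helper functions parse_samples and render_table are all made up for illustration; they are not the output format or code of Starrydata Auto-Summary GPT.

```python
import json

# Hypothetical fields and made-up records; not real data from any paper.
FIELDS = ["sample_name", "composition", "synthesis_method"]

llm_output = """
[
  {"sample_name": "S1", "composition": "Bi2Te3",
   "synthesis_method": "spark plasma sintering"},
  {"sample_name": "S2", "composition": "PbTe", "synthesis_method": null}
]
"""


def parse_samples(raw, fields):
    """Parse a JSON list and keep only the expected fields (missing -> None)."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of sample records")
    return [{f: rec.get(f) for f in fields} for rec in records]


def render_table(records, fields):
    """Format records as an aligned plain-text table with a header row."""
    rows = [list(fields)]
    for rec in records:
        rows.append(["" if rec[f] is None else str(rec[f]) for f in fields])
    widths = [max(len(row[i]) for row in rows) for i in range(len(fields))]
    lines = [
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


samples = parse_samples(llm_output, FIELDS)
table = render_table(samples, FIELDS)
print(table)
```

Keeping only the expected fields and mapping absent values to empty cells is one simple way to keep the table regular when a model omits or adds keys.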

Katsura emphasized the transformative potential of converting paper-based information into structured digital data. "A paper is a logical structure assembled to convey the author's claims, but by deconstructing it and returning it to the form of experimental data, other researchers can also use it for their own research," she explained. The team is currently focusing on open-access papers due to publisher restrictions on AI use with PDFs, and they use semi-automated tools for extracting data from graph images where LLMs face difficulties.

The implications for materials development are significant. Large-scale databases of experimental data could give researchers new ideas through comprehensive data analysis and enable property predictions using machine learning. So far, Starrydata has built databases for specific fields such as thermoelectric materials and magnets, and because it is an open dataset, it is beginning to be used by leading researchers worldwide. The journal where the research was published, Science and Technology of Advanced Materials: Methods, focuses on methods for accelerating materials development and can be found at https://www.tandfonline.com/STAM-M.

This development matters because functional materials underpin modern technologies, and their development has traditionally relied heavily on researcher intuition and trial-and-error approaches. The ability to systematically extract and analyze experimental data from existing research could dramatically reduce development timelines and costs while enabling more data-driven discovery processes. The NIMS team aims to establish paper data collection as a recognized form of research within the scientific community, potentially transforming how materials research is conducted and shared globally.

Curated from NewMediaWire
