Automated Synthesis Procedure Generation in Heterogeneous Catalysis via Fine-Tuned Language Models

Diaz Aquino, Raul Bernardo

Files

Diaz_Aquino_RB_T_2025.pdf (10.96 MB)

Supporting documents (198.33 KB)

Supporting documents (198.35 KB)

Date

2025-05-22

Authors

Diaz Aquino, Raul Bernardo

Publisher

Virginia Tech

Abstract

The exploration of catalytic materials and their synthesis routes traditionally demands extensive iterative experimentation and significant time investment. To overcome these constraints, we have developed an advanced extraction workflow integrating language models and multimodal processing techniques. Initially, textual data from over 9,000 scientific articles were analyzed to identify and extract detailed catalyst attributes such as chemical composition, structural motifs, morphology, crystal structure, size, shape, and support materials. Additionally, images and their associated captions were systematically captured from these publications, enriching the dataset through advanced vision-language processing methods. Subsequently, this structured information was refined through rigorous classification, synthesis query generation, and feasibility validation, resulting in a curated dataset comprising 1,632 high-quality catalyst synthesis procedures. Leveraging this dataset, we fine-tuned a large language model using parameter-efficient adaptation, significantly enhancing its capability to accurately predict detailed catalyst synthesis methods. Performance evaluation of our fine-tuned model revealed stable and effective convergence, demonstrating substantial improvements over baseline models with a ROUGE-1 score of 0.522, a ROUGE-L score of 0.290, and a BERTScore of 0.863. These results underscore the effectiveness of integrating multimodal data and validation methods, offering a powerful pathway to accelerate catalyst discovery, thereby reducing research timelines and resource demands.
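For readers unfamiliar with the overlap metrics quoted in the abstract, the following is a minimal sketch of how ROUGE-1 and ROUGE-L F1 scores can be computed between a generated procedure and a reference procedure. It assumes lowercase whitespace tokenization and no stemming; the function names are hypothetical, and the thesis's actual scoring setup is not described on this page, so this should not be read as the author's implementation.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def f1(overlap, n_candidate, n_reference):
    # Harmonic mean of precision and recall; 0.0 when nothing overlaps.
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate, reference):
    # Unigram overlap with clipped counts (multiset intersection).
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return f1(overlap, len(cand), len(ref))


def lcs_length(a, b):
    # Longest common subsequence length via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # Same F1 form as ROUGE-1, but the overlap is the LCS length,
    # so matching words only count when they appear in the same order.
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

Because ROUGE-L rewards only in-order matches, it is never higher than ROUGE-1 under this formulation. That is consistent with the gap between the reported 0.522 and 0.290: a generated procedure can share much of a reference's vocabulary while ordering or phrasing the steps differently.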

Keywords

catalysis

Persistent link

https://hdl.handle.net/10919/134196

Collections

Masters Theses
