Automated Synthesis Procedure Generation in Heterogeneous Catalysis via Fine-Tuned Language Models

dc.contributor.author: Diaz Aquino, Raul Bernardo (en)
dc.contributor.committeechair: Xin, Hongliang (en)
dc.contributor.committeemember: Bai, Xianming (en)
dc.contributor.committeemember: Achenie, Luke E. (en)
dc.contributor.committeemember: Deshmukh, Sanket A. (en)
dc.contributor.department: Chemical Engineering (en)
dc.date.accessioned: 2025-05-23T08:01:13Z (en)
dc.date.available: 2025-05-23T08:01:13Z (en)
dc.date.issued: 2025-05-22 (en)
dc.description.abstract: The exploration of catalytic materials and their synthesis routes traditionally demands extensive iterative experimentation and significant time investment. To overcome these constraints, we developed an extraction workflow that integrates language models and multimodal processing techniques. First, textual data from over 9,000 scientific articles were analyzed to identify and extract detailed catalyst attributes such as chemical composition, structural motifs, morphology, crystal structure, size, shape, and support materials. Images and their associated captions were also systematically captured from these publications, enriching the dataset through vision-language processing methods. This structured information was then refined through classification, synthesis query generation, and feasibility validation, resulting in a curated dataset of 1,632 high-quality catalyst synthesis procedures. Using this dataset, we fine-tuned a large language model with parameter-efficient adaptation, enhancing its ability to predict detailed catalyst synthesis methods. Evaluation of the fine-tuned model showed stable convergence and substantial improvements over baseline models, with a ROUGE-1 score of 0.522, a ROUGE-L score of 0.290, and a BERTScore of 0.863. These results underscore the effectiveness of integrating multimodal data and validation methods, offering a pathway to accelerate catalyst discovery and reduce research timelines and resource demands. (en)
dc.description.abstractgeneral: Digital technologies have significantly transformed the way we discover new catalysts and materials, starting a new era of scientific exploration. In our study, we introduce a dual-model framework that leverages the strengths of different Large Language Models (LLMs) to accelerate research. We use an open-source LLM, which is well suited to analyzing textual data, together with a Vision Language Model designed to process scientific images and their captions simultaneously. Through a multi-step process, these models classify, extract, and structure data on catalyst synthesis from over 9,000 peer-reviewed publications. The curated dataset is then used to fine-tune another language model through supervised training, enabling it to predict synthesis pathways for catalysts and thus accelerate discovery. Working together, these models extract synthesis mechanisms robustly and offer new insights into catalytic processes. By integrating both textual and visual data, our approach not only accelerates the extraction of key scientific information but also broadens the analysis of heterogeneous catalysis. This strategy aims to enhance catalyst synthesis efficiency, stimulate creative research directions, and contribute to advances in materials science and chemical engineering. Overall, our findings show how advanced digital tools can enhance catalyst research. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:43592 (en)
dc.identifier.uri: https://hdl.handle.net/10919/134196 (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: catalysis (en)
dc.title: Automated Synthesis Procedure Generation in Heterogeneous Catalysis via Fine-Tuned Language Models (en)
dc.type: Thesis (en)
thesis.degree.discipline: Chemical Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)
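
Note on the reported metrics: the abstract cites ROUGE-1 and ROUGE-L scores. As a rough illustration of what these overlap metrics measure, the following is a minimal Python sketch and not the evaluation code used in the thesis. It assumes lowercase whitespace tokenization and F1 scoring; standard ROUGE tooling may also apply stemming and other normalization, so its values can differ. All function names and the example sentences are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; real ROUGE tools may also stem.
    return text.lower().split()


def f1(overlap, n_candidate, n_reference):
    # Harmonic mean of precision (overlap / candidate length)
    # and recall (overlap / reference length).
    if overlap == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(candidate, reference):
    # ROUGE-1: clipped unigram overlap, ignoring word order.
    c, r = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(c) & Counter(r)).values())
    return f1(overlap, len(c), len(r))


def lcs_length(a, b):
    # Length of the longest common subsequence via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    # ROUGE-L: overlap measured by the longest in-order common subsequence.
    c, r = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    # Made-up sentences for illustration only; not taken from the thesis dataset.
    reference = "dissolve the precursor in water then calcine at 500 C"
    candidate = "dissolve the precursor in ethanol then dry and calcine"
    print("ROUGE-1 F1:", round(rouge1_f1(candidate, reference), 3))
    print("ROUGE-L F1:", round(rouge_l_f1(candidate, reference), 3))
```

Because ROUGE-L rewards only words that appear in the same order, it is typically lower than ROUGE-1 for the same pair of texts, which matches the direction of the gap between the two scores reported in the abstract.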

Files

Original bundle
Name:
Diaz_Aquino_RB_T_2025.pdf
Size:
10.96 MB
Format:
Adobe Portable Document Format
Name:
Diaz_Aquino_RB_T_2025_support_1.pdf
Size:
198.33 KB
Format:
Adobe Portable Document Format
Description:
Supporting documents
Name:
Diaz_Aquino_RB_T_2025_support_3.pdf
Size:
198.35 KB
Format:
Adobe Portable Document Format
Description:
Supporting documents