Steps Toward Open-ended Reasoning and Discovery with Language Models

Shojaee, Seyedeh Parshin

Files

Shojaee_S_D_2026.pdf (24.45 MB)

Date

2026-06-16

Authors

Shojaee, Seyedeh Parshin

Publisher

Virginia Tech

Abstract

Scientific discovery, the process of distilling nature's complexity into compact, transferable knowledge, has historically relied on human creativity, expertise, and intuition. Recent advances in large language models (LLMs), trained on vast amounts of scientific literature, raise a fundamental question: can these systems move beyond recovering existing knowledge to participate meaningfully in discovery? This thesis investigates this question across four research directions, progressively developing the capabilities necessary for open-ended discovery.

First, we show that effective discovery systems require both broad scientific knowledge and systematic search. We introduce LLM-SR, a framework for scientific model discovery that combines LLM knowledge with evolutionary search, in which LLMs guide the mutation and crossover of candidate hypotheses. Our results show that LLM-SR substantially outperforms state-of-the-art baselines.

The second study examines limitations in current evaluations of LLM-driven discovery. We show that many benchmarks overestimate discovery capabilities because their tasks are contaminated by training data, so models can succeed by recalling known solutions. To address this, we introduce LLM-SRBench, a multi-domain benchmark whose tasks include synthetic, novel components designed to test models beyond memorization in scientific model discovery. Results on LLM-SRBench show a significant performance drop across existing methods, highlighting the importance of rigorous evaluation protocols for discovery.

The third study investigates the role of adaptation. While humans continuously learn and adjust when facing unfamiliar environments, most existing LLM-based systems rely primarily on their pretrained knowledge during search. Motivated by recent advances in test-time training and reinforcement learning, we introduce DecAEvolve, a framework that enables models to adapt dynamically during evolutionary search using feedback obtained from the environment. We show that DecAEvolve substantially improves performance in out-of-distribution settings, establishing adaptation as a core requirement for discovery.

Finally, the fourth study examines the role of exploration and diversity. We find that current LLM-based discovery systems often converge to narrow regions of the hypothesis space, limiting creativity and preventing them from finding stronger solutions in open-ended tasks. To address this, we introduce EvoDiverse, a framework that promotes diversity during evolutionary search. Across multiple scientific discovery tasks, EvoDiverse enables broader exploration and uncovers more promising regions of the search space, highlighting the importance of systematic exploration in open-ended discovery.

Taken together, this thesis suggests that LLMs can become effective engines of discovery when equipped with principled search, rigorous evaluation, continuous adaptation, and diversity-preserving exploration: four properties that we believe together define the path toward open-ended reasoning and discovery with language models.
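
To make the evolutionary-search idea behind LLM-SR concrete, here is a minimal, illustrative sketch of an evolutionary loop for equation discovery. It is not the thesis's implementation. The LLM is replaced by a hypothetical random stub, `propose_offspring`, which performs crossover and mutation over a fixed, assumed vocabulary of basis terms (`BASIS`). Each hypothesis is scored by fitting linear coefficients with least squares and computing the mean squared error. All names here are hypothetical, and the adaptation (DecAEvolve) and diversity (EvoDiverse) components are not modeled.

```python
import math
import random

# Hypothetical vocabulary of basis terms. In LLM-SR an LLM proposes hypotheses
# using scientific knowledge; here we just recombine terms from a fixed list.
BASIS = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "cos(x)": math.cos,
    "exp(-x)": lambda x: math.exp(-x),
}


def fit_and_score(terms, xs, ys):
    """Fit linear coefficients for `terms` by least squares; return (mse, coefs)."""
    cols = [[BASIS[t](x) for x in xs] for t in terms]
    k = len(terms)
    # Normal equations A c = b, with A = X^T X and b = X^T y.
    a = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    b = [sum(u * y for u, y in zip(cols[i], ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))
        if abs(a[piv][p]) < 1e-12:
            return float("inf"), None  # singular system: reject this hypothesis
        a[p], a[piv] = a[piv], a[p]
        b[p], b[piv] = b[piv], b[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for c in range(p, k):
                a[r][c] -= f * a[p][c]
            b[r] -= f * b[p]
    # Back substitution.
    coefs = [0.0] * k
    for p in reversed(range(k)):
        coefs[p] = (b[p] - sum(a[p][c] * coefs[c] for c in range(p + 1, k))) / a[p][p]
    preds = [sum(w * col[i] for w, col in zip(coefs, cols)) for i in range(len(xs))]
    mse = sum((pred - y) ** 2 for pred, y in zip(preds, ys)) / len(xs)
    return mse, coefs


def propose_offspring(parent_a, parent_b, rng):
    """Hypothetical stand-in for an LLM proposal: crossover, then one mutation."""
    pool = sorted(set(parent_a) | set(parent_b))
    child = set(rng.sample(pool, rng.randint(1, len(pool))))  # crossover
    term = rng.choice(sorted(BASIS))  # mutation: toggle one basis term
    if term in child and len(child) > 1:
        child.remove(term)
    else:
        child.add(term)
    return tuple(sorted(child))


def evolve(xs, ys, generations=60, pop_size=12, seed=0):
    """Elitist evolutionary search over sets of basis terms."""
    rng = random.Random(seed)
    names = sorted(BASIS)
    population = [tuple(sorted(rng.sample(names, 2))) for _ in range(pop_size)]
    cache = {}

    def score(cand):
        if cand not in cache:
            cache[cand] = fit_and_score(cand, xs, ys)[0]
        return cache[cand]

    for _ in range(generations):
        population.sort(key=score)
        elites = population[: pop_size // 2]  # keep the best half
        children = [
            propose_offspring(*rng.sample(elites, 2), rng)
            for _ in range(pop_size - len(elites))
        ]
        population = elites + children
    best = min(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    xs = [0.1 * i for i in range(1, 31)]
    ys = [2 * x * x + 0.5 * math.sin(x) for x in xs]  # synthetic ground truth
    print(evolve(xs, ys))
```

Elitism keeps the best hypotheses found so far, so the search cannot lose a good fit once it has found one; the LLM's role in the real framework is to make proposals far more informed than this random stub.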

Keywords

large language models, reasoning, open-endedness, scientific discovery, evolutionary search, adaptation, exploration, test-time training

Persistent link

https://hdl.handle.net/10919/143435

Collections

Doctoral Dissertations