Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes
dc.contributor.author | Topcu, Taylan G. | en |
dc.contributor.author | Husain, Mohammed | en |
dc.contributor.author | Ofsa, Max | en |
dc.contributor.author | Wach, Paul | en |
dc.date.accessioned | 2025-03-26T11:59:45Z | en |
dc.date.available | 2025-03-26T11:59:45Z | en |
dc.date.issued | 2025-02-21 | en |
dc.description.abstract | Multi-purpose large language models (LLMs), a subset of generative artificial intelligence (AI), have recently made significant progress. While expectations for LLMs to assist with systems engineering (SE) tasks are high, the interdisciplinary and complex nature of systems, along with the need to synthesize deep domain knowledge and operational context, raise questions regarding the efficacy of LLMs in generating SE artifacts, particularly given that they are trained on data that is broadly available on the internet. To that end, we present results from an empirical exploration in which a human expert-generated SE artifact was taken as a benchmark, parsed, and fed into various LLMs through prompt engineering to generate segments of typical SE artifacts. This procedure was applied without any fine-tuning or calibration to document baseline LLM performance. We then adopted a two-fold mixed-methods approach to compare the AI-generated artifacts against the benchmark. First, we quantitatively compare the artifacts using natural language processing algorithms and find that, when prompted carefully, state-of-the-art algorithms cannot differentiate AI-generated artifacts from the human-expert benchmark. Second, we conduct a qualitative deep dive to investigate how the artifacts differ in quality. We document that while the two sets of artifacts appear very similar, the AI-generated artifacts exhibit serious failure modes that can be difficult to detect. We characterize these as: premature requirements definition, unsubstantiated numerical estimates, and a propensity to overspecify. We contend that this study tells a cautionary tale about why the SE community must be more cautious in adopting AI-suggested feedback, at least when it is generated by multi-purpose LLMs. | en |
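The abstract reports that natural language processing algorithms could not differentiate AI-generated artifact segments from the human-expert benchmark. As a minimal illustrative sketch only (not the authors' actual pipeline, and using hypothetical placeholder text for the artifact segments), the snippet below shows one common surface-level comparison of this kind: TF-IDF vectorization followed by cosine similarity.

```python
# Illustrative sketch, NOT the study's method: a surface-level lexical
# comparison between a human-written SE artifact segment and an
# LLM-generated one. The two text segments are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

human_segment = "The system shall maintain attitude knowledge within 0.1 degrees during nominal operations."
llm_segment = "The system shall provide attitude knowledge accuracy of 0.1 degrees in nominal operating modes."

# Vectorize both segments with TF-IDF and compute their cosine similarity.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([human_segment, llm_segment])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# A high score only indicates lexical resemblance; it cannot surface the
# qualitative failure modes the abstract describes (premature requirements
# definition, unsubstantiated numerical estimates, overspecification).
print(f"Cosine similarity (TF-IDF): {score:.3f}")
```

A similarity score close to 1.0 here would mirror the abstract's finding that such algorithms see the artifacts as near-indistinguishable, which is precisely why the qualitative deep dive was needed.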
dc.description.version | Accepted version | en |
dc.format.extent | 22 page(s) | en |
dc.format.mimetype | application/pdf | en |
dc.identifier.doi | https://doi.org/10.1002/sys.21810 | en |
dc.identifier.eissn | 1520-6858 | en |
dc.identifier.issn | 1098-1241 | en |
dc.identifier.orcid | Topcu, Taylan [0000-0002-0110-312X] | en |
dc.identifier.uri | https://hdl.handle.net/10919/125082 | en |
dc.language.iso | en | en |
dc.publisher | Wiley | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | generative artificial intelligence (AI) | en |
dc.subject | human-AI collaboration | en |
dc.subject | large language models (LLMs) | en |
dc.subject | problem formulation | en |
dc.subject | systems engineering | en |
dc.title | Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes | en |
dc.title.serial | Systems Engineering | en |
dc.type | Article - Refereed | en |
dc.type.dcmitype | Text | en |
dc.type.other | Article | en |
dc.type.other | Early Access | en |
dc.type.other | Journal | en |
pubs.organisational-group | Virginia Tech | en |
pubs.organisational-group | Virginia Tech/Engineering | en |
pubs.organisational-group | Virginia Tech/Engineering/Industrial and Systems Engineering | en |
pubs.organisational-group | Virginia Tech/Library | en |
pubs.organisational-group | Virginia Tech/All T&R Faculty | en |
pubs.organisational-group | Virginia Tech/Engineering/COE T&R Faculty | en |
pubs.organisational-group | Virginia Tech/Graduate students | en |
pubs.organisational-group | Virginia Tech/Graduate students/Doctoral students | en |