Statistical Methods for Performance Evaluation of Machine Learning and Artificial Intelligence Models

Song, Xinyi

dc.contributor.author	Song, Xinyi	en
dc.contributor.committeechair	Hong, Yili	en
dc.contributor.committeemember	Freeman, Laura June	en
dc.contributor.committeemember	Deng, Xinwei	en
dc.contributor.committeemember	Xing, Xin	en
dc.contributor.department	Statistics	en
dc.date.accessioned	2025-06-04T08:04:30Z	en
dc.date.available	2025-06-04T08:04:30Z	en
dc.date.issued	2025-06-03	en
dc.description.abstractgeneral	This dissertation explores strategies to improve the reliability and effectiveness of artificial intelligence (AI) and machine learning (ML) in practical data analysis tasks. As AI technologies become increasingly capable and widely adopted in real-world applications, from detecting defect severity in solar panels to automatically generating analytical code, they often encounter challenges in complex scenarios, such as imbalanced datasets with rare outcomes. This research focuses on developing and refining tools and methodologies that enhance decision-making and enable more accurate evaluation of AI models, particularly under such challenging conditions. The first project focuses on using electroluminescence (EL) images to detect defects in solar panels. It compares several machine learning and deep learning models to see how well they identify the severity of defects. The results provide useful guidance for choosing the right prediction methods and evaluation tools in solar panel research. Building upon insights from the first project, the second project tackles a significant challenge: although machine learning and deep learning models generally perform well, they struggle to accurately detect less frequent defect classes, such as "mildly defective" and "moderately defective" solar panels. To overcome this issue, we introduce customized loss functions along with mini-batch stratified sampling, aiming to improve prediction accuracy for these rare defect classes. The proposed methods are evaluated using both a simulated dataset derived from Fashion MNIST, which mirrors the class proportions of the EL image dataset, and real EL image datasets, utilizing VGG-19 and ResNet-50 architectures. To ensure reliability, the analysis is repeated 50 times on the simulated dataset and 30 times on the EL image dataset. The third project examines how well AI tools, specifically large language models (LLMs) such as ChatGPT and Llama, can generate SAS code for automated statistical analysis. 
Although the code often appears correct, these tools sometimes fall short in handling more complex tasks or producing code that runs properly. This project evaluates the quality of the AI-generated code based on human expert assessment, focusing on code quality, correctness, executability, and output. To support this evaluation, the last part of this dissertation introduces a new open-source dataset called StatLLM. This dataset provides examples of statistical tasks, code written by AI, and expert ratings of the results. StatLLM helps researchers and developers understand where AI tools perform well and where they need improvement when it comes to writing statistical code. In summary, this dissertation advances our ability to evaluate and improve AI tools in data science. It helps ensure these technologies are not only powerful but also trustworthy and practical in solving real-world problems.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:44183	en
dc.identifier.uri	https://hdl.handle.net/10919/135034	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	AI-reliability	en
dc.subject	Image analysis	en
dc.subject	Imbalanced data	en
dc.subject	Evaluation of LLMs	en
dc.subject	Neural networks	en
dc.subject	NLP metrics	en
dc.title	Statistical Methods for Performance Evaluation of Machine Learning and Artificial Intelligence Models	en
dc.type	Dissertation	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Song_X_D_2025.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations