Statistical Methods for Performance Evaluation of Machine Learning and Artificial Intelligence Models

DC Field | Value | Language
dc.contributor.author | Song, Xinyi | en
dc.contributor.committeechair | Hong, Yili | en
dc.contributor.committeemember | Freeman, Laura June | en
dc.contributor.committeemember | Deng, Xinwei | en
dc.contributor.committeemember | Xing, Xin | en
dc.contributor.department | Statistics | en
dc.date.accessioned | 2025-06-04T08:04:30Z | en
dc.date.available | 2025-06-04T08:04:30Z | en
dc.date.issued | 2025-06-03 | en
dc.description.abstractgeneral | This dissertation explores strategies to improve the reliability and effectiveness of artificial intelligence (AI) and machine learning (ML) in practical data analysis tasks. AI technologies are becoming more capable and more widely adopted in real-world applications, from detecting defect severity in solar panels to automatically generating analytical code. Even so, they often struggle in complex scenarios such as imbalanced datasets with rare outcomes. This research develops and refines tools and methodologies that support decision-making and enable more accurate evaluation of AI models, particularly under such challenging conditions. The first project uses electroluminescence (EL) images to detect defects in solar panels. It compares several machine learning and deep learning models on how well they identify the severity of defects. The results provide practical guidance for choosing prediction methods and evaluation tools in solar panel research. Building on insights from the first project, the second project tackles a significant challenge. Although machine learning and deep learning models generally perform well, they struggle to detect less frequent defect classes, such as "mildly defective" and "moderately defective" solar panels. To address this issue, we introduce customized loss functions together with mini-batch stratified sampling, aiming to improve prediction accuracy for these rare defect classes. The proposed methods are evaluated with VGG-19 and ResNet-50 architectures on two kinds of data. The first is a simulated dataset derived from Fashion MNIST that mirrors the class proportions of the EL image dataset. The second consists of real EL image datasets. To ensure reliability, the analysis is repeated 50 times on the simulated dataset and 30 times on the EL image dataset. The third project examines how well AI tools, specifically large language models (LLMs) such as ChatGPT and Llama, can generate SAS code for automated statistical analysis.
Although the code often appears correct, these tools sometimes fall short on more complex tasks or produce code that does not run properly. This project evaluates the quality of AI-generated code through human expert assessment, focusing on code quality, correctness, executability, and output. To support this evaluation, the last part of this dissertation introduces a new open-source dataset called StatLLM, which provides examples of statistical tasks, AI-generated code, and expert ratings of the results. StatLLM helps researchers and developers understand where AI tools perform well and where they need improvement when writing statistical code. In summary, this dissertation advances our ability to evaluate and improve AI tools in data science. It helps ensure that these technologies are not only powerful but also trustworthy and practical for solving real-world problems. | en
dc.description.degree | Doctor of Philosophy | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:44183 | en
dc.identifier.uri | https://hdl.handle.net/10919/135034 | en
dc.language.iso | en | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | AI-reliability | en
dc.subject | Image analysis | en
dc.subject | Imbalanced data | en
dc.subject | Evaluation of LLMs | en
dc.subject | Neural networks | en
dc.subject | NLP metrics | en
dc.title | Statistical Methods for Performance Evaluation of Machine Learning and Artificial Intelligence Models | en
dc.type | Dissertation | en
thesis.degree.discipline | Statistics | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | doctoral | en
thesis.degree.name | Doctor of Philosophy | en
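The abstract names mini-batch stratified sampling as one remedy for rare defect classes. The Python sketch below is a generic illustration only. It is not the dissertation's implementation, which this record does not describe. It groups sample indices by class, shuffles each group, and deals the class-sorted indices round-robin into mini-batches so that every batch roughly preserves the overall class proportions. The helper name stratified_minibatches and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_minibatches(labels, batch_size, seed=0):
    """Hypothetical helper: split sample indices into class-stratified mini-batches.

    Every index is used exactly once. Batch sizes differ by at most one, and
    each class's count differs by at most one across batches.
    """
    rng = random.Random(seed)

    # Group sample indices by class label, then shuffle within each class.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)

    n_batches = max(1, len(labels) // batch_size)
    batches = [[] for _ in range(n_batches)]

    # Deal indices round-robin with ONE running counter across all classes,
    # so leftover samples of one class do not pile into the first batches.
    pos = 0
    for cls in sorted(by_class):
        for idx in by_class[cls]:
            batches[pos % n_batches].append(idx)
            pos += 1

    # Shuffle within each batch so class order inside a batch is random.
    for batch in batches:
        rng.shuffle(batch)
    return batches


if __name__ == "__main__":
    # Illustrative imbalanced labels: 90 majority, 6 and 4 rare-class samples.
    labels = [0] * 90 + [1] * 6 + [2] * 4
    batches = stratified_minibatches(labels, batch_size=20)
    print([len(b) for b in batches])  # [20, 20, 20, 20, 20]
```

With 100 labels split 90/6/4 and batch_size=20, this yields five batches of 20, each holding 18 majority-class samples. Every batch also contains at least one sample of the second class. A plain random split gives no such guarantee for rare classes, which is the usual motivation for stratifying mini-batches.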

Files

Original bundle
Name: Song_X_D_2025.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format