Can an LLM find its way around a Spreadsheet?

Lee, Cho Ting

Files

Lee_C_T_2024.pdf (3.1 MB)

Date

2024-06-05

Authors

Lee, Cho Ting

Publisher

Virginia Tech

Abstract

Spreadsheets are routinely used in business and scientific contexts, and one of the most vexing challenges data analysts face is cleaning data prior to analysis and evaluation. The ad hoc and arbitrary nature of data cleaning problems, such as typos, inconsistent formatting, missing values, and a lack of standardization, often creates the need for highly specialized pipelines. We ask whether an LLM can find its way around a spreadsheet and how to support end-users in taking their free-form data processing requests to fruition. Just as retrieval-augmented generation (RAG) retrieves context to answer users' queries, we show how elements can be retrieved from a code library to compose data processing pipelines. Through comprehensive experiments, we demonstrate the quality of our system and how it continuously augments its vocabulary by saving new code and pipelines back to the code library for future retrieval.

Keywords

LLMs, data cleaning, end-user programming

Persistent link

https://hdl.handle.net/10919/119304

Collections

Masters Theses
