Tipirneni, Sai Sindhura
2025-05-31
2025-05-31
2025-05-30
vt_gsexam:43868
https://hdl.handle.net/10919/134958

This research aims to enhance the adaptability and effectiveness of Transformers in structured data domains beyond their traditional use in natural language processing (NLP). We revisit key elements of the Transformer framework - including input representations, attention formulations, auxiliary tasks, prediction layers, and loss functions - and adapt them to better suit the structure and semantics of specific data domains. Focusing on four structured domains - (i) sparse and irregularly sampled multivariate time-series, (ii) general-purpose programming languages, (iii) short text clustering, and (iv) natural language interfaces to relational databases - this dissertation proposes novel domain-specific Transformer-based models. For the first domain, we present STraTS, a self-supervised Transformer that represents data as observation triplets and adds forecasting as an auxiliary task to improve mortality prediction on multivariate clinical time-series. An interpretable version of this model is also proposed to enhance its utility for critical applications such as healthcare. In the programming domain, we build StructCoder, an encoder-decoder Transformer designed to capture source code structures effectively and to concurrently handle auxiliary tasks that make predictions on target code structures. For short text clustering, we develop CACTUS, a Transformer for context-aware supervised clustering. This model incorporates efficient inter-entity interactions through sparse attention, employs a loss function tailored for supervised clustering, and integrates a novel self-supervised clustering task to enhance performance on the primary clustering task. Finally, we present RAFT-S3, a framework for reasoning-aware finetuning of small language models (SLMs) on the text-to-SQL task. RAFT-S3 uses large language models (LLMs) to collect synthetic text-to-SQL data with diverse schemas, along with intermediate reasoning traces that are incorporated into a two-stage finetuning process. We conduct extensive experiments comparing the proposed methods to competitive baselines in each domain, perform ablation studies, and discuss qualitative results. This research contributes to an improved understanding of Transformer architectures and opens opportunities for further applications across a spectrum of structured data domains.

ETD
en
In Copyright

Keywords: transformer, attention, supervised learning, self-supervision, clinical time-series, code generation, text clustering, LLMs, synthetic data

Adapting Transformers for Structured Data Domains
Dissertation