LLM-Assisted Detecting and Redacting Confidential Information for Government Information Disclosure

Hasegawa, Masaki
2025-05-31
2025-05-31
2025-05-30
vt_gsexam:43183
https://hdl.handle.net/10919/134960

Generative AI, especially large language models (LLMs), has advanced rapidly, and its real-world applications are growing steadily. However, adoption of generative AI in the public sector has lagged behind the private sector. This paper focuses on the "Governmental Information Disclosure Process," which is vital to the administrative systems of democratic countries. Many developed nations require government agencies to disclose information to citizens, excluding confidential data such as personal information. Agencies must confirm whether confidential information is present and redact or mask it before release, yet this process is still manual, leaving significant room for improvement. Moreover, because the information to be masked is defined in natural language, such as legal text, interpreting each document's context to determine what qualifies as confidential is resource-intensive. LLMs, which can infer context and draw on general knowledge, could efficiently identify the parts of documents that require masking. This paper first reviews the existing literature on detecting sensitive or confidential information with LLMs, clarifying the use cases and the categories of information identified in both the private and public sectors. Then, as a case study, we create sample documents modeled after Japanese administrative texts and compare the detection and masking results produced by testers with administrative experience, who follow the legal requirements, with those generated by an LLM. This study contributes by proposing an end-to-end approach in which LLMs directly generate masked text with dynamically determined granularity.
This resolves the fundamental trade-off in previous methods by allowing the model to decide appropriate masking units (characters, words, or phrases) based on contextual requirements rather than predetermined structural units.

ETD
en
In Copyright
Large Language Model
Public Sector
Government Process Optimization
Thesis