Summarizing Legal Depositions

Chakravarty, Saurabh

dc.contributor.author	Chakravarty, Saurabh	en
dc.contributor.committeechair	Fox, Edward A.	en
dc.contributor.committeemember	Ashley, Kevin D.	en
dc.contributor.committeemember	Reddy, Chandan K.	en
dc.contributor.committeemember	Hsiao, Michael S.	en
dc.contributor.committeemember	Karpatne, Anuj	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2022-07-13T06:00:07Z	en
dc.date.available	2022-07-13T06:00:07Z	en
dc.date.issued	2021-01-18	en
dc.description.abstract	Documents like legal depositions are used by lawyers and paralegals to ascertain the facts pertaining to a case. These documents capture the conversation between a lawyer and a deponent, which is in the form of questions and answers. Applying current automatic summarization methods to these documents results in low-quality summaries. Though extensive research has been performed in the area of summarization, not all methods succeed in all domains. Accordingly, this research focuses on developing methods to generate high-quality summaries of depositions. As part of our work related to legal deposition summarization, we propose a solution in the form of a pipeline of components, each addressing a sub-problem; we argue that a pipeline-based framework can be tuned to summarize documents from any domain. First, we developed methods to parse the depositions, accounting for different document formats. We were able to successfully parse both a proprietary and a public dataset with our methods. Second, we developed methods to anonymize the personal information present in the deposition documents; we achieved 95% accuracy on anonymization, as measured by an evaluation based on random sampling. Third, we developed an ontology to define dialog acts for the questions and answers present in legal depositions. Fourth, we developed classifiers based on this ontology and achieved F1-scores of 0.84 and 0.87 on the public and proprietary datasets, respectively. Fifth, we developed methods to transform a question-answer pair to a canonical/simple form. In particular, based on the dialog acts for the question and answer combination, we developed transformation methods using both traditional NLP and deep learning techniques. We were able to achieve good scores on the ROUGE and semantic similarity metrics for most of the dialog act combinations. 
Sixth, we developed methods based on deep learning, heuristics, and machine translation to correct the transformed declarative sentences. The sentence correction improved the readability of the transformed sentences. Seventh, we developed a methodology to break a deposition into its topical aspects. An ontology for aspects was defined for legal depositions, and classifiers were developed that achieved an F1-score of 0.89. Eighth, we developed methods to segment the deposition into parts that have the same thematic context. The segments helped in augmenting candidate summary sentences with surrounding context, which leads to a more readable summary. Ninth, we developed a pipeline to integrate all of the methods, to generate summaries from the depositions. We were able to outperform the baseline and state-of-the-art summarization methods in a majority of the cases based on the F1, Recall, and ROUGE-2 scores. The performance gains were statistically significant for all of the scores. The summaries generated by our system can be arranged based on the same thematic context or aspect and hence should be much easier to read and follow, compared to those of the baseline methods. As part of our future work, we will improve upon these methods. We will refine our methods to identify the important parts using additional documents related to a deposition. In addition, we will work to improve the compression ratio of the generated summaries by reducing the number of unimportant sentences. We will expand the training dataset to learn and tune the coverage of the aspects for various deponent types using empirical methods. Our system has demonstrated effectiveness in transforming a QA pair into a declarative sentence. Having such a capability could enable us to generate a narrative summary from the depositions, a first for legal depositions. 
We will also expand our dataset for evaluation to ensure that our methods are indeed generalizable, and that they work well when experts subjectively evaluate the quality of the deposition summaries.	en
dc.description.abstractgeneral	Documents in the legal domain are of various types. One set of documents includes trial and deposition transcripts. These documents capture the proceedings of a trial or a deposition by note-taking, often over many hours. They contain conversation sentences that are spoken during the trial or deposition and involve multiple actors. One of the greatest challenges with these documents is that they are generally long. This is a source of pain for attorneys and paralegals who work with the information contained in the documents. Text summarization techniques have been successfully used to compress a document and capture the salient parts from it. They have also been able to reduce redundancy in summary sentences while focusing on coherence and proper sentence formation. Summarizing trial and deposition transcripts would be immensely useful for law professionals, reducing the time to identify and disseminate salient information in case-related documents, as well as reducing costs and trial preparation time. Processing the deposition documents using traditional text processing techniques is a challenge because of their form. Having the deposition conversations transformed into a suitable declarative form where they can be easily comprehended can pave the way for the usage of extractive and abstractive summarization methods. As part of our work, we identified the different discourse structures present in the deposition in the form of dialog acts. We developed methods based on those dialog acts to transform the deposition into a declarative form. We were able to achieve an accuracy of 87% on the dialog act classification. We also were able to transform the conversational question-answer (QA) pairs into declarative forms for 10 of the top-11 dialog act combinations. Our transformation methods performed better than the baselines in 8 out of the 10 QA pair types. 
We also developed methods to classify the deposition QA pairs according to their topical aspects. We generated summaries using aspects by defining the relative coverage for each aspect that should be present in a summary. Another set of methods developed can segment the depositions into parts that have the same thematic context. These segments aid in augmenting the candidate summary sentences, to create a summary where information is surrounded by associated context. This makes the summary more readable and informative; we were able to significantly outperform the state-of-the-art methods, based on our evaluations.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:29029	en
dc.identifier.uri	http://hdl.handle.net/10919/111223	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Natural Language Processing	en
dc.subject	Deep Learning	en
dc.subject	Legal Deposition	en
dc.subject	Summarization	en
dc.title	Summarizing Legal Depositions	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Chakravarty_S_D_2021.pdf
Size: 3.94 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations