Summarizing Legal Depositions
dc.contributor.author | Chakravarty, Saurabh | en |
dc.contributor.committeechair | Fox, Edward A. | en |
dc.contributor.committeemember | Ashley, Kevin D. | en |
dc.contributor.committeemember | Reddy, Chandan K. | en |
dc.contributor.committeemember | Hsiao, Michael S. | en |
dc.contributor.committeemember | Karpatne, Anuj | en |
dc.contributor.department | Computer Science | en |
dc.date.accessioned | 2022-07-13T06:00:07Z | en |
dc.date.available | 2022-07-13T06:00:07Z | en |
dc.date.issued | 2021-01-18 | en |
dc.description.abstract | Documents like legal depositions are used by lawyers and paralegals to ascertain the facts pertaining to a case. These documents capture the conversation between a lawyer and a deponent, which is in the form of questions and answers. Applying current automatic summarization methods to these documents results in low-quality summaries. Though extensive research has been performed in the area of summarization, not all methods succeed in all domains. Accordingly, this research focuses on developing methods to generate high-quality summaries of depositions. As part of our work related to legal deposition summarization, we propose a solution in the form of a pipeline of components, each addressing a sub-problem; we argue that a pipeline based framework can be tuned to summarize documents from any domain. First, we developed methods to parse the depositions, accounting for different document formats. We were able to successfully parse both a proprietary and a public dataset with our methods. We next developed methods to anonymize the personal information present in the deposition documents; we achieve 95% accuracy on the anonymization using a random sampling based evaluation. Third, we developed an ontology to define dialog acts for the questions and answers present in legal depositions. Fourth, we developed classifiers based on this ontology and achieved F1-scores of 0.84 and 0.87 on the public and proprietary datasets, respectively. Fifth, we developed methods to transform a question-answer pair to a canonical/simple form. In particular, based on the dialog acts for the question and answer combination, we developed transformation methods using each of traditional NLP, and deep learning, techniques. We were able to achieve good scores on the ROUGE and semantic similarity metrics for most of the dialog act combinations. Sixth, we developed methods based on deep learning, heuristics, and machine translation to correct the transformed declarative sentences. The sentence correction improved the readability of the transformed sentences. Seventh, we developed a methodology to break a deposition into its topical aspects. An ontology for aspects was defined for legal depositions, and classifiers were developed that achieved an F1-score of 0.89. Eighth, we developed methods to segment the deposition into parts that have the same thematic context. The segments helped in augmenting candidate summary sentences with surrounding context, that leads to a more readable summary. Ninth, we developed a pipeline to integrate all of the methods, to generate summaries from the depositions. We were able to outperform the baseline and state of the art summarization methods in a majority of the cases based on the F1, Recall, and ROUGE-2 scores. The performance gains were statistically significant for all of the scores. The summaries generated by our system can be arranged based on the same thematic context or aspect and hence should be much easier to read and follow, compared to the baseline methods. As part of our future work, we will improve upon these methods. We will refine our methods to identify the important parts using additional documents related to a deposition. In addition, we will work to improve the compression ratio of the generated summaries by reducing the number of unimportant sentences. We will expand the training dataset to learn and tune the coverage of the aspects for various deponent types using empirical methods. Our system has demonstrated effectiveness in transforming a QA pair into a declarative sentence. Having such a capability could enable us to generate a narrative summary from the depositions, a first for legal depositions. We will also expand our dataset for evaluation to ensure that our methods are indeed generalizable, and that they work well when experts subjectively evaluate the quality of the deposition summaries. | en |
dc.description.abstractgeneral | Documents in the legal domain are of various types. One set of documents includes trial and deposition transcripts. These documents capture the proceedings of a trial or a deposition by note-taking, often over many hours. They contain conversation sentences that are spoken during the trial or deposition and involve multiple actors. One of the greatest challenges with these documents is that generally, they are long. This is a source of pain for attorneys and paralegals who work with the information contained in the documents. Text summarization techniques have been successfully used to compress a document and capture the salient parts from it. They have also been able to reduce redundancy in summary sentences while focusing on coherence and proper sentence formation. Summarizing trial and deposition transcripts would be immensely useful for law professionals, reducing the time to identify and disseminate salient information in case related documents, as well as reducing costs and trial preparation time. Processing the deposition documents using traditional text processing techniques is a challenge because of their form. Having the deposition conversations transformed into a suitable declarative form where they can be easily comprehended can pave the way for the usage of extractive and abstractive summarization methods. As part of our work, we identified the different discourse structures present in the deposition in the form of dialog acts. We developed methods based on those dialog acts to transform the deposition into a declarative form. We were able to achieve an accuracy of 87% on the dialog act classification. We also were able to transform the conversational question-answer (QA) pairs into declarative forms for 10 of the top-11 dialog act combinations. Our transformation methods performed better in 8 out of the 10 QA pair types, when compared to the baselines. We also developed methods to classify the deposition QA pairs according to their topical aspects. We generated summaries using aspects by defining the relative coverage for each aspect that should be present in a summary. Another set of methods developed can segment the depositions into parts that have the same thematic context. These segments aid augmenting the candidate summary sentences, to create a summary where information is surrounded by associated context. This makes the summary more readable and informative; we were able to significantly outperform the state of the art methods, based on our evaluations. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:29029 | en |
dc.identifier.uri | http://hdl.handle.net/10919/111223 | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Natural Language Processing | en |
dc.subject | Deep Learning | en |
dc.subject | Legal Deposition | en |
dc.subject | Summarization | en |
dc.title | Summarizing Legal Depositions | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Science and Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |
Files
Original bundle
1 - 1 of 1