Semantic Decomposition By Covering

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

This thesis describes the implementation of a covering algorithm for semantic decomposition of sentences of technical patents. This research complements the ASPIN project that has a long term goal of providing an automated system for digital system synthesis from patents.

In order to develop a prototype of the system explained in a patent, a natural language processor (sentence-interpreter) is required. These systems typically attempt to interpret a sentence by syntactic analysis (parsing) followed by semantic analysis. Quite often, the technical narrative contains grammatical errors, incomplete sentences, anaphoric references and typological errors that can cause the grammatical parse to fail. In such situations, an alternate method that uses a repository of pre-compiled, simple sentences (called frames) to analyze the sentences of the patent can be a useful back up. By semantically decomposing the sentences of patents to a set of frames whose meanings are fully understood, the meaning of the patent sentences can be interpreted.

This thesis deals with the semantic decomposition of sentences using a branch and bound covering algorithm. The algorithm is implemented in C++. A number of experiments were conducted to evaluate the performance of this algorithm. The covering algorithm uses a standard branch and bound algorithm to semantically decompose sentences. The algorithm is fast, flexible and can provide good (100 % coverage for some sentences) coverage results. The system covered 67.68 % of the sentence tokens using 3459 frames in the repository. 54.25% of the frames identified by the system in covers for sentences, were found to be semantically correct. The experiments suggest that the performance of the system can be improved by increasing the number of frames in the repository.

branch and bound algorithm, Linguistic interpretation, Covering, Sentence interpretation