Browsing by Author "Chen, Qi-Fan"
- Building the CODER Lexicon: The Collins English Dictionary and Its Adverb Definitions
  Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1986-10-01)
  The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine-readable Collins Dictionary of the English Language. After giving background, motivation, and a survey of related work, the Collins lexicon is discussed, followed by a description of the conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of the defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
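The "defining formulae" mentioned in this abstract are recurring phrasings in dictionary definitions that signal a word's type. A minimal sketch of the idea in Python, where the patterns are illustrative assumptions rather than the grammar actually presented in the report:

```python
import re

# Illustrative defining formulae (hypothetical patterns, not the report's
# actual grammar): each maps a phrasing commonly found in adverb
# definitions to a tentative adverb type.
DEFINING_FORMULAE = [
    (re.compile(r"\bin an? \w+ (manner|way)\b"), "manner"),
    (re.compile(r"\bto a \w+ (degree|extent)\b"), "degree"),
    (re.compile(r"\bat or (to|toward)\b"), "location/direction"),
]

def classify_adverb(definition: str) -> str:
    """Return a tentative adverb type based on its defining formula."""
    for pattern, adverb_type in DEFINING_FORMULAE:
        if pattern.search(definition):
            return adverb_type
    return "unknown"

print(classify_adverb("in a quick manner"))         # manner
print(classify_adverb("to a considerable extent"))  # degree
```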
- A Frame-Based Language in Information Retrieval
  Weaver, Marybeth T.; France, Robert K.; Chen, Qi-Fan; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1988)
  With the advent of the information society, many researchers are turning to artificial intelligence techniques to provide effective retrieval over large bodies of textual information. Yet any AI system requires a formalism for encoding its knowledge about the objects of its domain, the world, and the intelligence that it is designed to manifest. In the CODER system, whose mission is to provide an environment for experiments in applying AI to information retrieval, that formalism is provided by a single well-defined factual representation language. Designed as a flexible tool for retrieval research, the CODER factual representation language is a hybrid AI language involving a system of strong types for attribute values, a frame system, and a system of Prolog-like relational structures. Inheritance is enforced throughout, and the semantics of type subsumption and object matching are formally defined. A collection of type and object managers called the knowledge administration complex implements this common language for storing knowledge and communicating it within the system. Of the three types of knowledge structures in the language, the frame facility has proven most useful in the retrieval domain. The factual representation language is implemented in Prolog as a set of predicates accessible to all system modules. Each level of knowledge representation (elementary primitives, frames, and relations) has a type manager; the frame and relation levels also have object managers. Storage of complete knowledge objects (statements in the factual representation language) is supported by a system of external knowledge bases. The paper discusses the frame construct itself, the implementation of the knowledge administration complex and external knowledge bases, and the use of the construct in retrieval research. It closes with a discussion of the utility of the language in experiments.
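To make the frame construct concrete, here is a minimal sketch of a frame system with slot inheritance and type subsumption. The class and method names are hypothetical; CODER's actual facility was implemented as Prolog predicates.

```python
class Frame:
    """A toy frame: named slots plus single inheritance via a parent.
    Names are hypothetical; CODER's frame system was Prolog-based."""
    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot, walking up the is-a chain (inheritance)."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def isa(self, other):
        """Type subsumption: does this frame inherit from `other`?"""
        frame = self
        while frame is not None:
            if frame is other:
                return True
            frame = frame.parent
        return False

document = Frame("document", language="English")
report = Frame("report", parent=document, pages=30)
print(report.get("language"))   # "English", inherited from document
print(report.isa(document))     # True
```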
- Integrated Access to a Large Medical Literature Database
  Fox, Edward A.; Koushik, Prabhakar M.; Chen, Qi-Fan; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1991)
  Project INCARD (INtegrated CARdiology Database) has adapted the CODER (COmposite Document Expert/effective/extended Retrieval) system and LEND (Large External Network object-oriented Database) to provide integrated access to a large collection of bibliographic citations, a full-text document in cardiology, and a large thesaurus of medical terms. CODER is a distributed expert-based information system that incorporates techniques from artificial intelligence, information retrieval, and human-computer interaction to support effective access to information and knowledge bases. LEND is an object-oriented database that incorporates techniques from information retrieval and database systems to support complex objects, hypertext/hypermedia, and semantic network operations efficiently over very large sets of data. LEND stores the Collins English Dictionary (CED) lexicon, the MeSH thesaurus, MEDLARS bibliographic records on cardiology, and the syllabus for the topic Abnormal Human Biology (Cardiology Section) taught at Columbia University. Together, CODER/LEND allows efficient and flexible access to all of this information while supporting rapid "intelligent" searching and hypertext-style browsing by both novice and expert users. This report gives statistics on the collections, illustrations of the system's use, and details on the overall architecture and design of Project INCARD.
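As a rough illustration of the hypertext-style thesaurus browsing described above, the following toy sketch walks narrower-term links in a MeSH-like hierarchy. The terms and structure are illustrative assumptions, not the actual MeSH records the system stores.

```python
# A toy slice of a MeSH-like thesaurus as narrower-term links; the
# structure shown is illustrative, not the system's actual data.
NARROWER = {
    "Cardiovascular Diseases": ["Heart Diseases"],
    "Heart Diseases": ["Arrhythmias", "Myocardial Ischemia"],
}

def browse(term, depth=0):
    """Hypertext-style browsing: walk narrower-term links from a term."""
    print("  " * depth + term)
    for narrower in NARROWER.get(term, []):
        browse(narrower, depth + 1)

browse("Cardiovascular Diseases")
```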
- LEND and Faster Algorithms for Constructing Minimal Perfect Hash Functions
  Fox, Edward A.; Chen, Qi-Fan; Heath, Lenwood S. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1992)
  The Large External object-oriented Network Database (LEND) system has been developed to provide efficient access to large collections of primitive or multimedia objects, semantic networks, thesauri, hypertexts, and information retrieval collections. An overview of LEND is given, emphasizing aspects that yield efficient operation. In particular, a new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key. The various stages of processing are detailed, along with analytical and empirical results, including timing for a set of over 3.8 million keys that was processed on a NeXTstation in about 6 hours.
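A specification of about 2 bits per key implies the function's lookup table can be stored with four entries per byte. A small sketch of such packing, offered as an assumption about representation rather than the report's actual data structure:

```python
def pack2(values):
    """Pack a sequence of 2-bit values (0..3) into a bytearray, four
    values per byte: one way a hash-function specification of about
    2 bits per key could be stored compactly (illustrative only)."""
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        packed[i // 4] |= (v & 0b11) << (2 * (i % 4))
    return packed

def unpack2(packed, i):
    """Read back the i-th 2-bit value."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11

table = [3, 0, 2, 1, 1, 2]
packed = pack2(table)
assert all(unpack2(packed, i) == v for i, v in enumerate(table))
print(f"{len(table)} values stored in {len(packed)} bytes")
```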
- A More Cost Effective Algorithm for Finding Perfect Hash Functions
  Fox, Edward A.; Chen, Qi-Fan; Heath, Lenwood S.; Datta, Sanjeev (Department of Computer Science, Virginia Polytechnic Institute & State University, 1988-09-01)
  As the use of knowledge-based systems increases, there will be a growing need for efficient artificial intelligence systems and methods to access large lexicons. In the Composite Document Expert/extended/effective Retrieval (CODER) system, we have investigated the construction of perfect hash functions in order to provide rapid access to data items on CD-ROMs and to terms in a lexicon built from machine-readable dictionaries. We have considered algorithms reported earlier in the literature, made numerous enhancements to them, developed new algorithms, and here report on some of our results. This paper covers an O(n^3) algorithm that has been applied to building hash functions for a collection of 69,806 words on a CD-ROM. Most recently we have developed a much better algorithm and have succeeded in finding a perfect hash function for a set of 5,000 words taken from the Collins English Dictionary.
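To illustrate the underlying goal, here is a toy brute-force search over a seeded hash family for a collision-free (perfect, though not minimal) assignment. The hash family and search strategy are illustrative assumptions, far weaker than the O(n^3) algorithm the paper covers:

```python
def h(key: str, seed: int, m: int) -> int:
    """Simple seeded string hash into m slots (illustrative family)."""
    value = seed
    for ch in key:
        value = (value * 31 + ord(ch)) % 1_000_000_007
    return value % m

def find_perfect_seed(keys, m, max_seed=1_000_000):
    """Search the family for a seed giving no collisions on `keys`.
    A toy illustration of the goal only; the paper's algorithm
    searches a structured space far more cleverly."""
    for seed in range(max_seed):
        if len({h(k, seed, m) for k in keys}) == len(keys):
            return seed
    raise RuntimeError("no perfect seed found in range")

keys = ["adverb", "noun", "verb", "adjective", "pronoun"]
seed = find_perfect_seed(keys, m=8)   # m > n: perfect but not minimal
print(seed, [h(k, seed, 8) for k in keys])
```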
- A New View on What Limits TCP/IP Throughput in Local Area Networks
  Abrams, Marc; Chen, Qi-Fan (Department of Computer Science, Virginia Polytechnic Institute & State University, 1991)
  This paper presents experimental results on what limits local area network throughput at the application program level for two popular transport protocols, TCP and UDP, using two application program interfaces, Berkeley sockets and the System V transport layer interface. The sensitivity of application-level performance to host computer speed and to background load on the host is also studied. Two sets of measurements are discussed. The first contains macroscopic measurements of throughput on 68020-, 68030-, 68040-, 80386-, SPARC-, and MIPS R2000- and R3000-based computers over a single Ethernet subnet. The second presents a detailed timing analysis, using a hardware monitor, of a TCP/IP implementation for PC-architecture computers. Previous studies implicate memory copying, checksumming, and the operating system interface as the major overheads in TCP/IP, rather than the time required to execute the protocol itself. This study indicates that these factors are secondary when the sender and receiver are closely matched in speed; rather, the primary bottleneck is the TCP flow control mechanism. TCP flow control comes closer to optimal as the degree of speed mismatch increases. We draw conclusions on why window mechanisms should be augmented by rate-based flow control in the new generation of high-data-rate networks.
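The window-limit argument follows from a standard bound: at most one window of data can be in flight per round-trip time, so throughput can never exceed window/RTT. A worked calculation with illustrative numbers, not measurements from the paper:

```python
def window_limited_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput imposed by window flow control:
    at most one window of data can be in flight per round-trip time."""
    return window_bytes / rtt_s

# Illustrative numbers (not measurements from the paper): a 4 KB window
# with a 5 ms effective round-trip time caps throughput at about
# 6.6 Mb/s, below the 10 Mb/s capacity of classic Ethernet.
bps = 8 * window_limited_throughput(4 * 1024, 0.005)
print(f"window-limited bound: {bps / 1e6:.1f} Mb/s")  # ~6.6 Mb/s
```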
- An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions
  Fox, Edward A.; Heath, Lenwood S.; Chen, Qi-Fan (Department of Computer Science, Virginia Polytechnic Institute & State University, 1989)
  We describe the first practical algorithm for finding minimal perfect hash functions that can be used to access very large databases. This method extends earlier work wherein an O(n^3) algorithm was devised, building upon prior work by Sager that described an O(n^4) algorithm. Our new O(n log n) expected time algorithm makes use of three key insights: applying randomness wherever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. While ultimately applicable to a wide variety of data and file access needs, this algorithm has already proven useful in aiding our work in improving the performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198-word list on that CD-ROM, and several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests with a French word list of 420,878 entries and a library catalog key set with over 1.2 million keys have shown that our methods work with very large databases.
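The first two insights can be sketched directly: keys are mapped to vertex pairs by two seeded hash functions, forming a graph of word dependencies, and the subsequent search is ordered by vertex degree. A simplified illustration in which the hash functions and parameters are assumptions, not the paper's exact construction, and the final assignment step is omitted:

```python
from collections import defaultdict

def h(key, seed, m):
    """Seeded string hash into m vertices (illustrative, not the paper's)."""
    value = seed
    for ch in key:
        value = (value * 33 + ord(ch)) % 4_294_967_291
    return value % m

def dependency_graph(keys, m, seeds=(1, 2)):
    """Mapping step: each key touches vertices h1(key) and h2(key), so
    keys sharing a vertex depend on each other."""
    adj = defaultdict(list)
    for k in keys:
        adj[h(k, seeds[0], m)].append(k)
        adj[h(k, seeds[1], m)].append(k)
    return adj

def ordering(adj):
    """Ordering step: visit vertices by decreasing degree, so the most
    constrained groups of keys get their hash values assigned first."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)

keys = ["lexicon", "frame", "hash", "graph", "disk"]
adj = dependency_graph(keys, m=12)
print(ordering(adj)[:3])  # the three highest-degree vertices
```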
- Order Preserving Minimal Perfect Hash Functions and Information Retrieval
  Fox, Edward A.; Chen, Qi-Fan; Daoud, Amjad M. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1991)
  Rapid access to information is essential for a wide variety of retrieval systems and applications. Hashing has long been used when the fastest possible direct search is desired, but is generally not appropriate when sequential or range searches are also required. This paper describes a hashing method, developed for collections that are relatively static, that supports both direct and sequential access. The algorithms described give hash functions that are optimal in terms of time and hash table space utilization, and that preserve any a priori ordering desired. Furthermore, the resulting order preserving minimal perfect hash functions (OPMPHFs) can be found using time and space that are linear in the number of keys involved, and so are close to optimal.
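The abstract does not spell out the construction, so the sketch below uses the well-known acyclic-random-graph technique for order-preserving minimal perfect hashing, which is in the same family as the authors' methods but not necessarily their exact algorithm: key i becomes an edge (h1, h2) of a random graph, and values g are chosen so that g[h1] + g[h2] equals i modulo n.

```python
import random
from collections import defaultdict

def build_opmphf(keys, c=3):
    """Order-preserving minimal perfect hash via the acyclic random-graph
    technique (a sketch in the family of the authors' methods, not their
    exact algorithm). Key i is an edge (h1, h2); we want
    g[h1] + g[h2] == i (mod n), solvable whenever the graph is acyclic."""
    n, m = len(keys), c * len(keys)
    while True:
        s1, s2 = random.getrandbits(32), random.getrandbits(32)
        edges = [(_h(k, s1, m), _h(k, s2, m)) for k in keys]
        if all(u != v for u, v in edges) and _acyclic(edges, m):
            break  # retry with new seeds until the graph is acyclic
    g = [0] * m
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    visited = [False] * m
    for root in range(m):
        if visited[root] or not adj[root]:
            continue
        visited[root] = True
        stack = [root]
        while stack:
            u = stack.pop()
            for v, i in adj[u]:
                if not visited[v]:
                    g[v] = (i - g[u]) % n  # force g[u] + g[v] == i (mod n)
                    visited[v] = True
                    stack.append(v)
    return lambda k: (g[_h(k, s1, m)] + g[_h(k, s2, m)]) % n

def _h(key, seed, m):
    value = seed
    for ch in key:
        value = (value * 31 + ord(ch)) & 0xFFFFFFFF
    return value % m

def _acyclic(edges, m):
    """Union-find check that the edge set forms a forest (no cycles)."""
    parent = list(range(m))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

keys = ["mesh", "coder", "lend", "goal", "incard"]
f = build_opmphf(keys)
print([f(k) for k in keys])  # [0, 1, 2, 3, 4]: order preserved
```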
- Practical Minimal Perfect Hashing Functions for Large Databases
  Fox, Edward A.; Heath, Lenwood S.; Chen, Qi-Fan; Daoud, Amjad M. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1990)
  We describe the first practical algorithms for finding minimal perfect hash functions that have been used to access very large databases (i.e., having over 1 million keys). This method extends earlier work wherein an O(n^3) algorithm was devised, building upon prior work by Sager that described an O(n^4) algorithm. Our first linear expected time algorithm makes use of three key insights: applying randomness wherever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. Our second algorithm builds functions that are slightly more complex, but does not build a word dependency graph and so approaches the theoretical lower bound on function specification size. While ultimately applicable to a wide variety of data and file access needs, these algorithms have already proven useful in aiding our work in improving the performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198-word list on that CD-ROM. Several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithms have also been developed. Tests, including those with a French word list of 420,878 entries and a library catalog key set with over 3.8 million keys, have shown that our methods work with very large databases.
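For graph-based constructions in this family, a classical random-graph result (used in the related Czech-Havas-Majewski construction) says a graph with cn vertices and n edges is acyclic with asymptotic probability sqrt((c-2)/c) for c > 2, which quantifies the trade-off between table size and expected build attempts. A small calculation, offered as an illustration of that family rather than of this paper's exact algorithms:

```python
import math

def expected_retries(c: float) -> float:
    """For a random graph with c*n vertices and n edges (c > 2), the
    asymptotic probability of acyclicity is sqrt((c-2)/c), a classical
    result from the related CHM construction (not this paper); the
    expected number of mapping attempts is its reciprocal."""
    return 1.0 / math.sqrt((c - 2) / c)

for c in (2.1, 2.5, 3.0):
    print(f"c = {c}: about {expected_retries(c):.1f} mapping attempts")
```

The output shows why a small increase in table size (larger c) sharply reduces the expected number of retries during construction.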
- A Query Language for Information Graphs
  Betrabet, Sangita C.; Fox, Edward A.; Chen, Qi-Fan (Department of Computer Science, Virginia Polytechnic Institute & State University, 1993)
  In this paper we propose a database model and query language for information retrieval systems. The information graph model and Graph Object Access Language (GOAL) allow integrated handling of data, information, and knowledge along with a variety of specialized objects (e.g., for geographic or multimedia information systems). There is flexible support for hyperbases, thesauri, lexicons, and both relational and object-oriented types of DBMS applications. We give the first published account of our new model and language, illustrate their use, and compare them with related work.
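The abstract does not give GOAL's syntax, so the following is only a generic sketch of querying an information graph by pattern-matching labeled edges; the triple representation and wildcard convention are assumptions, not GOAL itself.

```python
# A toy information graph as (subject, relation, object) triples; the
# representation and query form are illustrative assumptions, since the
# abstract does not give GOAL's actual syntax.
GRAPH = [
    ("CODER", "uses", "LEND"),
    ("LEND", "stores", "MeSH"),
    ("LEND", "stores", "lexicon"),
    ("MeSH", "is-a", "thesaurus"),
]

def match(graph, subject=None, relation=None, obj=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return [
        (s, r, o)
        for s, r, o in graph
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]

print(match(GRAPH, subject="LEND", relation="stores"))
# [('LEND', 'stores', 'MeSH'), ('LEND', 'stores', 'lexicon')]
```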