An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions

Fox, Edward A.; Heath, Lenwood S.; Chen, Qi-Fan

An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions

Files

TR-89-10.pdf (2.01 MB)

Downloads: 392

TR Number

TR-89-10

Date

1989

Authors

Fox, Edward A.

Heath, Lenwood S.

Chen, Qi-Fan

Publisher

Department of Computer Science, Virginia Polytechnic Institute & State University

Abstract

We describe the first practical algorithm for finding minimal perfect hash functions that can be used to access very large databases. This method extends earlier work wherein an O(n3) algorithm was devised, building upon prior work by Sager that described an O(n4) algorithm. Our new O(n log n) expected time algorithm makes use of three key insights: applying randomness whereever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. While ultimately applicable to a wide variety of data and file access needs, this algorithm has already proven useful in aiding our work in improving performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198 word list on that CD-ROM, and several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests with a French word list of 420,878 entries and a library catalog key set with over 1.2 million keys have shown that our methods work with very large databases.

Persistent link

http://hdl.handle.net/10919/19510

Collections

Computer Science Technical Reports

Full item page