Developing an improved focused crawler for the IDEAL project

Date
2014-05-09
Author
Bonnefond, Ward
Menzel, Chris
Morris, Zack
Patel, Suhas
Ritchie, Tyler
Tedesco, Marcus
Zheng, Franklin
Abstract
The IDEAL (Integrated Digital Event Archive and Library) project currently has a general-purpose
web crawler that finds articles relevant to a set of user-provided URLs. The resulting articles
are returned based on a frequency analysis of user-provided keywords. The goal of our project
is to extend the web crawler to return articles related to user-specified events and other
relevant information. By analyzing an article to identify key event components, such as the
date, location, and type of natural disaster, we can construct a tree representation of each
webpage. We then compute the tree edit distance between that tree and the event tree
constructed from the user's original input. With this information, we can predict webpage
relevance with higher certainty than keyword frequency analysis provides.
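The tree comparison step can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each tree is a (label, children) tuple, unit costs for inserting, deleting, and relabeling a node, and made-up event values; the abstract does not specify the project's actual tree layout, cost model, or scoring. The distance uses the classic recursive forest decomposition with memoization rather than an optimized algorithm such as Zhang-Shasha, and the `relevance` normalization is an illustrative assumption, not the project's formula.

```python
from functools import lru_cache

# Hypothetical representation: a tree is (label, children), where children
# is a tuple of trees; a forest is a tuple of trees.
def node(label, *children):
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost ordered tree edit distance between forests f and g."""
    if not f:
        return size(g)  # insert every remaining node of g
    if not g:
        return size(f)  # delete every remaining node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]  # rightmost roots
    delete_v = forest_dist(f[:-1] + v_kids, g) + 1  # v's children move up
    insert_w = forest_dist(f, g[:-1] + w_kids) + 1  # w's children move up
    match_vw = (forest_dist(v_kids, w_kids)
                + forest_dist(f[:-1], g[:-1])
                + (0 if v_label == w_label else 1))  # relabel cost
    return min(delete_v, insert_w, match_vw)

def tree_edit_distance(a, b):
    return forest_dist((a,), (b,))

def relevance(page, event):
    """Hypothetical score in [0, 1]; size(a) + size(b) bounds the distance."""
    total = size((page,)) + size((event,))
    return 1 - tree_edit_distance(page, event) / total

# Hypothetical event tree built from user input, plus two candidate pages.
event = node("event",
             node("date", node("2005-08")),
             node("location", node("Louisiana")),
             node("type", node("hurricane")))
page_other_place = node("event",
                        node("date", node("2005-08")),
                        node("location", node("Texas")),
                        node("type", node("hurricane")))
page_no_date = node("event",
                    node("location", node("Louisiana")),
                    node("type", node("hurricane")))

if __name__ == "__main__":
    print(tree_edit_distance(event, event))             # 0
    print(tree_edit_distance(page_other_place, event))  # 1 (one relabel)
    print(tree_edit_distance(page_no_date, event))      # 2 (insert date subtree)
    print(round(relevance(page_other_place, event), 3)) # 0.929
```

Memoizing on whole forests keeps the sketch short but is only practical for small trees like these; larger page trees would call for Zhang-Shasha or a similar dynamic program over node indices.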