Developing an improved focused crawler for the IDEAL project
The IDEAL (Integrated Digital Event Archive and Library) project currently has a general-purpose
web crawler that finds articles relevant to a set of URLs the user provides. The
resulting articles are returned based on frequency analysis of user-provided keywords. The goal of
our project is to extend the web crawler to return articles related to user-provided events and
other relevant information. By analyzing an article to identify key event components, such as
the date, location, and type of natural disaster, we can construct a tree representation of each
webpage. Next, we compute the tree edit distance between that tree and the event tree
constructed from the user’s original input. With this information, we can predict webpage
relevance with higher certainty than keyword frequency analysis provides.
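
The comparison step can be illustrated with a minimal sketch. The abstract does not say which tree edit distance algorithm or tree layout the project uses, so everything here is an assumption: trees are `(label, children)` tuples, every insert, delete, or relabel costs 1, and the distance is computed with the textbook recursive formulation for ordered trees. The helper names (`node`, `forest_distance`, `tree_edit_distance`) and the sample event trees are hypothetical.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree as a hashable (label, children) tuple. Hypothetical layout."""
    return (label, tuple(children))


def forest_size(forest):
    """Count every node in a forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return forest_size(f2)  # insert everything that remains in f2
    if not f2:
        return forest_size(f1)  # delete everything that remains in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the root of the rightmost tree in f1; its children move up.
    delete = 1 + forest_distance(f1[:-1] + kids1, f2)
    # Insert the root of the rightmost tree in f2.
    insert = 1 + forest_distance(f1, f2[:-1] + kids2)
    # Match the two rightmost roots, relabelling if the labels differ.
    match = (
        (label1 != label2)
        + forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


# Assumed event tree built from the user's input: type, location, date.
query = node("event", node("hurricane"), node("Florida"), node("2015"))
# Assumed event trees extracted from two crawled webpages.
page_a = node("event", node("hurricane"), node("Florida"), node("2014"))
page_b = node("event", node("earthquake"), node("Nepal"))

print(tree_edit_distance(query, page_a))  # 1: only the date differs
print(tree_edit_distance(query, page_b))  # 3: two relabels and one deletion
```

Under these assumptions a smaller distance means a closer match to the user's event, so `page_a` would rank above `page_b`. How the distance maps onto a relevance prediction (for example a threshold or a normalized score) is not stated in the abstract and is left open here.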