Browsing by Author "Ganesh, Prasad Krishnamurthi"
- Crawling. Fox, Edward A.; Khandeparker, Ashwin S. (2012-11-28). This module covers the basic concepts of Web crawling, crawling policies and techniques, and how they can be applied to Digital Libraries.
- Focused Crawling. Farag, Mohamed Magdy Gharib; Khan, Mohammed Saquib Akmal; Mishra, Gaurav; Ganesh, Prasad Krishnamurthi; Collins, Wil; Dickerson, Will (Virginia Tech, 2012-12-11). Finding information on the WWW is a difficult and challenging task because of the extremely large volume of content on the WWW. Search engines can be used to facilitate this task, but it is still difficult to cover all the webpages on the WWW and to provide good results for all types of users and in all contexts. The focused crawling concept has been developed to overcome these difficulties. There are several approaches for developing a focused crawler. Classification-based approaches use classifiers in relevance estimation. Semantic-based approaches use ontologies for domain or topic representation and in relevance estimation. Link analysis approaches use text and link structure information in relevance estimation. The main differences between these approaches are: which crawling policy is used, how the topic of interest is represented, and how the relevance of webpages visited during crawling is estimated. In this report we present a modular architecture for focused crawling. We separated the design of the main components of focused crawling into modules to facilitate the exchange and integration of different modules. We present a classification-based focused crawler prototype based on our modular architecture. We also describe how it can help with a particular event-oriented crawl. Note: Mr. Collins and Mr. Dickerson, in CS4624 in the spring of 2013, extended the prior work by the other co-authors from CS5604, from the fall of 2012.
- Focused Crawling. Farag, Mohamed Magdy Gharib; Khan, Mohammed Saquib Akmal; Mishra, Gaurav; Ganesh, Prasad Krishnamurthi (2012-12-11). Finding information on the WWW is a difficult and challenging task because of the extremely large volume of the WWW. Search engines can be used to facilitate this task, but it is still difficult to cover all the webpages on the WWW and to provide good results for all types of users and in all contexts. The focused crawling concept has been developed to overcome these difficulties. There are several approaches for developing a focused crawler. Classification-based approaches use classifiers in relevance estimation. Semantic-based approaches use ontologies for domain or topic representation and in relevance estimation. Link analysis approaches use text and link structure information in relevance estimation. The main differences between these approaches are: which crawling policy is used, how the topic of interest is represented, and how the relevance of webpages visited during crawling is estimated. In this report we present a modular architecture for focused crawling. We separated the design of the main components of focused crawling into modules to facilitate the exchange and integration of different modules. We present a classification-based focused crawler prototype based on our modular architecture.
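The focused-crawling abstracts above describe a loop in which a relevance estimate decides which visited pages are kept and which outgoing links are followed first. The following is a minimal Python sketch of such a best-first loop, not the reports' prototype. It assumes a toy in-memory web graph in place of real HTTP fetching and uses a cosine-similarity scorer with a fixed threshold as a stand-in for a trained classifier. All names (`WEB`, `make_topic_scorer`, `focused_crawl`) and the 0.2 threshold are hypothetical.

```python
import heapq
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": url -> (page text, outgoing links).
# Stands in for a real fetcher so the sketch runs without network access.
WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("hurricane relief news and storm updates", ["a", "b"]),
    "a": ("hurricane storm damage report", ["c"]),
    "b": ("celebrity gossip and fashion", ["d"]),
    "c": ("storm recovery and relief efforts", []),
    "d": ("fashion week photos", []),
}


def tokenize(text: str) -> Counter:
    """Bag-of-words term counts (lowercase alphabetic tokens only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def make_topic_scorer(topic_text: str) -> Callable[[str], float]:
    """Assumed stand-in for a relevance classifier: similarity to a topic description."""
    topic = tokenize(topic_text)
    return lambda text: cosine(topic, tokenize(text))


def focused_crawl(seeds: List[str],
                  fetch: Callable[[str], Tuple[str, List[str]]],
                  score: Callable[[str], float],
                  threshold: float = 0.2,
                  max_pages: int = 10) -> List[Tuple[str, float]]:
    """Best-first focused crawl; returns (url, score) for pages judged relevant."""
    # heapq is a min-heap, so priorities are stored negated (highest score first).
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant: List[Tuple[str, float]] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        s = score(text)
        if s < threshold:
            continue  # judged irrelevant: its outlinks are not expanded
        relevant.append((url, round(s, 3)))
        for link in links:
            if link not in seen:
                seen.add(link)
                # An outlink inherits its parent page's relevance as its priority.
                heapq.heappush(frontier, (-s, link))
    return relevant


scorer = make_topic_scorer("hurricane storm relief")
result = focused_crawl(["seed"], fetch=lambda u: WEB[u], score=scorer)
print(result)  # [('seed', 0.707), ('a', 0.577), ('c', 0.516)]
```

Here `fetch` and `score` are passed in as plain functions, so a real HTTP fetcher or an actual trained classifier could be swapped in without touching the frontier logic. This loosely mirrors the idea of exchangeable modules, though the reports' own module boundaries may differ. In the toy run, page `b` scores below the threshold, so its link to `d` is never followed.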