Crawling on the World Wide Web

dc.contributor.author: Wang, Li [en]
dc.contributor.author: Fox, Edward A. [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2013-06-19T14:36:27Z [en]
dc.date.available: 2013-06-19T14:36:27Z [en]
dc.date.issued: 2002 [en]
dc.description.abstract: As the World Wide Web grows rapidly, people need a web search engine to search through it. The crawler is an important module of a web search engine, and its quality directly affects the search quality of the engine. Given some seed URLs, the crawler should retrieve the web pages at those URLs, parse the HTML files, add the new URLs it finds to its buffer, and return to the first phase of this cycle. While parsing the HTML files for new URLs, the crawler can also extract other information from them. This paper describes the design, implementation, and some considerations of a new crawler programmed as a learning exercise and for possible use in experimental studies. [en]
dc.format.mimetype: application/pdf [en]
dc.identifier: http://eprints.cs.vt.edu/archive/00000572/ [en]
dc.identifier.sourceurl: http://eprints.cs.vt.edu/archive/00000572/01/LiWangReportAccept.pdf [en]
dc.identifier.trnumber: TR-02-10 [en]
dc.identifier.uri: http://hdl.handle.net/10919/20052 [en]
dc.language.iso: en [en]
dc.publisher: Department of Computer Science, Virginia Polytechnic Institute & State University [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Information retrieval [en]
dc.title: Crawling on the World Wide Web [en]
dc.type: Technical report [en]
dc.type.dcmitype: Text [en]
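
The crawl cycle described in the abstract (retrieve the pages for the seed URLs, parse the HTML, add newly found URLs to a buffer, repeat) can be illustrated with a minimal sketch. This is not the crawler from the report, whose implementation details are not given in this record. The names `LinkExtractor`, `crawl`, `fetch`, and `max_pages` are hypothetical, and the buffer is assumed here to be a simple first-in, first-out queue, which yields a breadth-first crawl.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Crawl breadth-first from the seed URLs.

    fetch(url) is supplied by the caller (an assumption of this sketch) and
    returns the page's HTML as a string, or None if retrieval failed.
    Returns a dict mapping each retrieved URL to its HTML.
    """
    frontier = deque(seeds)  # the URL buffer: URLs waiting to be retrieved
    seen = set(seeds)        # every URL ever buffered, to avoid revisits
    pages = {}

    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)            # phase 1: retrieve the page
        if html is None:
            continue
        pages[url] = html

        parser = LinkExtractor()     # phase 2: parse the HTML
        parser.feed(html)

        for href in parser.links:    # phase 3: buffer the new URLs
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            is_web_url = absolute.startswith(("http://", "https://"))
            if is_web_url and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)

    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any network library; a real crawler would supply a function that downloads pages over HTTP and would normally also honor robots.txt and rate limits.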

Files

Original bundle
Name: LiWangReportAccept.pdf
Size: 261.99 KB
Format: Adobe Portable Document Format