Efficient Web Archive Searching

Field | Value | Language
dc.contributor.author | Cheng, Ming | en
dc.contributor.author | Wu, Yijing | en
dc.contributor.author | Zhou, Xiaolin | en
dc.contributor.author | Li, Jinyang | en
dc.contributor.author | Zhang, Lin | en
dc.date.accessioned | 2020-05-13T18:02:41Z | en
dc.date.available | 2020-05-13T18:02:41Z | en
dc.date.issued | 2020-05 | en
dc.description.abstract | The field of efficient web archive searching is at a turning point. In the early years of web archive searching, organizations used only the URL as a key to search through the dataset, which was inefficient but acceptable. In recent years, as the volume of data in web archives has kept growing, these ordinary search methods have gradually been replaced by more efficient ones. This project addresses the theoretical and methodological implications of choosing suitable hashing algorithms and running them locally, with the ultimate goal of improving the overall time complexity of web archive searching. The project also presents the design and implementation of several hashing algorithms that convert URLs into a shortened, sortable format, and demonstrates the resulting improvement in search efficiency with benchmark results. | en
dc.description.notes | EfficientWebArchiveSearchingReport.pdf: The project report in PDF format. EfficientWebArchiveSearchingPresentation.pptx: The project presentation in Microsoft PowerPoint format. EfficientWebArchiveSearchingPresentation.pdf: The project presentation in PDF format. EfficientWebArchiveSearchingReportLaTex.zip: The LaTeX source of the project report. EfficientWebArchiveSearchingSourceCode.zip: The project source code. | en
dc.identifier.uri | http://hdl.handle.net/10919/98241 | en
dc.language.iso | en_US | en
dc.publisher | Virginia Tech | en
dc.rights | Creative Commons Attribution-ShareAlike 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en
dc.subject | Internet Archive | en
dc.subject | Short URL | en
dc.subject | Web Archive | en
dc.subject | searching efficiency | en
dc.subject | WARC records | en
dc.subject | Database | en
dc.subject | Digital Library | en
dc.title | Efficient Web Archive Searching | en
dc.type | Presentation | en
dc.type | Report | en
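The abstract describes hashing URLs into a shortened, sortable form so that a lookup no longer has to compare raw URLs across the whole archive. This record does not say which hashing algorithms the project used, so what follows is only a minimal Python sketch under stated assumptions: the URL normalization rules, the choice of SHA-256 truncated to 16 hex characters, and the names `normalize_url`, `url_key`, and `UrlIndex` are all hypothetical and are not taken from the project's source code. It illustrates the general idea of keeping a sorted list of fixed-length keys and locating matches with binary search (O(log n) per lookup rather than a linear scan).

```python
import bisect
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme and host, map an empty path to '/', drop the fragment."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def url_key(url: str, hex_chars: int = 16) -> str:
    """Hypothetical key scheme: SHA-256 of the normalized URL, truncated to a fixed length."""
    digest = hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()
    return digest[:hex_chars]


class UrlIndex:
    """Sorted index of (key, normalized URL, record offset) entries, searched with bisect."""

    def __init__(self, records):
        # records: iterable of (url, offset) pairs; offsets stand in for
        # byte positions of WARC records (an assumption for illustration).
        self._entries = sorted(
            (url_key(url), normalize_url(url), offset) for url, offset in records
        )
        self._keys = [key for key, _, _ in self._entries]

    def lookup(self, url: str) -> list:
        """Return the offsets of all records whose normalized URL matches `url`."""
        norm = normalize_url(url)
        key = url_key(url)
        i = bisect.bisect_left(self._keys, key)  # first position holding this key
        hits = []
        # Walk the run of equal keys; compare full URLs to rule out hash collisions.
        while i < len(self._keys) and self._keys[i] == key:
            _, stored_url, offset = self._entries[i]
            if stored_url == norm:
                hits.append(offset)
            i += 1
        return hits


index = UrlIndex([
    ("http://Example.com/a", 0),
    ("http://example.com/b", 512),
    ("http://example.com/a", 2048),
])
print(index.lookup("http://example.com/a"))  # [0, 2048]
```

Because a truncated hash can collide, the sketch stores the normalized URL beside each key and filters on it; a longer key lowers the collision rate at the cost of a larger index.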

Files

Original bundle
Name | Size | Format
EfficientWebArchiveSearchingReport.pdf | 4.98 MB | Adobe Portable Document Format
EfficientWebArchiveSearchingPresentation.pptx | 13.21 MB | Microsoft PowerPoint XML
EfficientWebArchiveSearchingPresentation.pdf | 1.15 MB | Adobe Portable Document Format
EfficientWebArchiveSearchingReportLaTex.zip | 4.5 MB |
EfficientWebArchiveSearchingSourceCode.zip | 48.91 MB |
License bundle
Name | Size | Format
license.txt | 1.5 KB | Item-specific license agreed upon to submission