Efficient Web Archive Searching
dc.contributor.author | Cheng, Ming | en |
dc.contributor.author | Wu, Yijing | en |
dc.contributor.author | Zhou, Xiaolin | en |
dc.contributor.author | Li, Jinyang | en |
dc.contributor.author | Zhang, Lin | en |
dc.date.accessioned | 2020-05-13T18:02:41Z | en |
dc.date.available | 2020-05-13T18:02:41Z | en |
dc.date.issued | 2020-05 | en |
dc.description.abstract | The field of efficient web archive searching is at a turning point. In the early years of web archive searching, organizations used only the URL as a key to search through the dataset, which was inefficient but acceptable. In recent years, as the volume of data in web archives has grown, ordinary search methods have gradually been replaced by more efficient ones. This project addresses the theoretical and methodological implications of choosing suitable hashing algorithms and running them locally, with the ultimate goal of improving the time complexity of web archive searching. The project also presents the design and implementation of several hashing algorithms that convert URLs into a sortable, shortened format, and demonstrates the resulting improvement in search efficiency with benchmark results. | en |
dc.description.notes | EfficientWebArchiveSearchingReport.pdf: The project report in PDF format. EfficientWebArchiveSearchingPresentation.pptx: The project presentation in Microsoft PowerPoint format. EfficientWebArchiveSearchingPresentation.pdf: The project presentation in PDF format. EfficientWebArchiveSearchingReportLaTex.zip: The LaTeX source of the project report. EfficientWebArchiveSearchingSourceCode.zip: The project source code. | en |
dc.identifier.uri | http://hdl.handle.net/10919/98241 | en |
dc.language.iso | en_US | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution-ShareAlike 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en |
dc.subject | Internet Archive | en |
dc.subject | Short URL | en |
dc.subject | Web Archive | en |
dc.subject | searching efficiency | en |
dc.subject | WARC records | en |
dc.subject | Database | en |
dc.subject | Digital Library | en |
dc.title | Efficient Web Archive Searching | en |
dc.type | Presentation | en |
dc.type | Report | en |
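The abstract describes converting URLs into a sortable, shortened format so that lookups no longer depend on scanning raw URL strings. The sketch below illustrates that general idea under stated assumptions; it is not the project's actual set of hashing algorithms. It assumes a simple normalization (lowercased scheme and host, empty path mapped to "/", fragment dropped), uses SHA-256 truncated to 16 hex characters as the hash, and introduces the hypothetical names `normalize_url`, `url_key`, and `UrlIndex`. Because every key has the same width, keys sort lexicographically, and a sorted key list supports O(log n) lookups with Python's `bisect` module.

```python
import bisect
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme and host, default the
    empty path to "/", and drop the fragment (never sent to the server)."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def url_key(url: str, length: int = 16) -> str:
    """Hypothetical helper: map a URL to a fixed-width, sortable hex key
    by truncating the SHA-256 digest of its normalized form."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()[:length]


class UrlIndex:
    """Hypothetical sorted index from URL keys to record offsets
    (for example, byte offsets of records inside a WARC file)."""

    def __init__(self, records):
        # records: iterable of (url, offset) pairs.
        # Store the normalized URL next to each key so that collisions
        # between truncated hashes can be detected at lookup time.
        self._entries = sorted((url_key(u), normalize_url(u), off) for u, off in records)
        self._keys = [key for key, _, _ in self._entries]

    def lookup(self, url: str) -> list:
        """Return every offset stored for `url` via binary search."""
        norm = normalize_url(url)
        key = url_key(url)
        i = bisect.bisect_left(self._keys, key)
        offsets = []
        # Walk the run of equal keys; keep only exact URL matches.
        while i < len(self._keys) and self._keys[i] == key:
            _, stored_url, offset = self._entries[i]
            if stored_url == norm:
                offsets.append(offset)
            i += 1
        return offsets
```

For example, `UrlIndex([("http://example.com/a", 0), ("http://example.com/b", 1024), ("HTTP://Example.com/a", 4096)]).lookup("http://EXAMPLE.com/a")` returns `[0, 4096]`, since both stored spellings normalize to the same URL. Truncating the hash shortens keys at the cost of possible collisions, which is why the sketch keeps the normalized URL alongside each key.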
Files
Original bundle
- Name: EfficientWebArchiveSearchingReport.pdf
- Size: 4.98 MB
- Format: Adobe Portable Document Format

- Name: EfficientWebArchiveSearchingPresentation.pptx
- Size: 13.21 MB
- Format: Microsoft Powerpoint XML

- Name: EfficientWebArchiveSearchingPresentation.pdf
- Size: 1.15 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed upon to submission