Semantic Vector Search using an HNSW Index for Twitter Data

dc.contributor.author: Raines, Nicholas [en]
dc.contributor.author: Samarth, Mehta [en]
dc.contributor.author: Lax, Kyada [en]
dc.contributor.author: Justin, Vita [en]
dc.contributor.author: Jonah, Bishop [en]
dc.date.accessioned: 2023-05-09T14:24:10Z [en]
dc.date.available: 2023-05-09T14:24:10Z [en]
dc.date.issued: 2023-05-08 [en]
dc.description.abstract: Semantic Vector Search is a promising alternative to keyword search for information retrieval systems. It extracts semantic meaning from many different types of documents and does not rely on the underlying tokens of text-only documents to perform searches. One outstanding issue with vector search is that the K Nearest Neighbors (KNN) algorithm has O(N) time complexity, which does not scale well to enterprise applications. Hierarchical Navigable Small World (HNSW) retrieval systems attempt to solve this by implementing Approximate Nearest Neighbors (ANN) search in O(log(N)) time. This project implements both KNN and HNSW backends for an example use case with Twitter data. We found that the retrieval accuracy of the Vector Search systems was superior to that of a traditional keyword search system, and that the retrieval time of an HNSW backend not only greatly improves upon KNN but is even comparable with keyword search. [en]
dc.identifier.uri: http://hdl.handle.net/10919/114987 [en]
dc.language.iso: en [en]
dc.rights: Attribution-NonCommercial 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [en]
dc.subject: KNN [en]
dc.subject: HNSW [en]
dc.subject: Vector Search [en]
dc.subject: Twitter [en]
dc.title: Semantic Vector Search using an HNSW Index for Twitter Data [en]
dc.type: Master's project [en]
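
The O(N) cost that the abstract attributes to KNN can be illustrated with a minimal brute-force sketch. This is not the project's implementation: the function names, the cosine-similarity metric, and the three-dimensional toy vectors are all assumptions made for illustration. The point is only that an exact search must score every stored vector for each query, which is the linear scan an HNSW index is designed to avoid.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query, corpus, k):
    """Exact KNN: scores every vector in the corpus once, so work grows linearly with its size."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items())
    # nlargest keeps only the k best (score, doc_id) pairs, best first.
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical embeddings; a real system would obtain these from an embedding model.
corpus = {
    "tweet_a": [0.9, 0.1, 0.0],
    "tweet_b": [0.0, 1.0, 0.0],
    "tweet_c": [0.8, 0.2, 0.1],
    "tweet_d": [0.0, 0.1, 0.9],
}
top = knn_search([1.0, 0.0, 0.0], corpus, k=2)
print(top)
```

With these toy vectors, the two documents pointing most nearly in the query's direction are returned first. An approximate index trades this exhaustive comparison for a graph traversal that visits only a small fraction of the stored vectors.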

Files

Original bundle
Now showing 1 - 3 of 3
Name: Final Paper-1.pdf
Size: 210.6 KB
Format: Adobe Portable Document Format

Name: FinalPresentationCapstone.mp4
Size: 117.13 MB
Format: MP4 Container format for video files

Name: gitlab_link.txt
Size: 49 B
Format: Plain Text

License bundle
Now showing 1 - 1 of 1
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission
Description: