Semantic Vector Search using an HNSW Index for Twitter Data

Date

2023-05-08

Abstract

Semantic Vector Search is a promising alternative to keyword search for information retrieval systems. It extracts semantic meaning from many different types of documents and does not rely on the underlying tokens of text-only documents to perform searches. One outstanding issue with vector search is that exact K Nearest Neighbors (KNN) search takes O(N) time per query, which does not scale well to enterprise applications. Hierarchical Navigable Small World (HNSW) retrieval systems attempt to solve this by performing Approximate Nearest Neighbors (ANN) search in roughly O(log(N)) time. This project implements both KNN and HNSW backends for an example use case with Twitter data. We found that the retrieval accuracy of the vector search systems was superior to that of a traditional keyword search system, and that the retrieval time of the HNSW backend not only greatly improved upon KNN but was even comparable to that of keyword search.
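
To make the O(N) cost of exact KNN concrete, the following is a minimal sketch of a brute-force search over embedding vectors. It is not the project's implementation: the function names (`cosine_similarity`, `knn_search`), the choice of cosine similarity as the metric, and the toy three-dimensional vectors are all assumptions made for illustration. An HNSW backend would replace the full scan inside `knn_search` with a graph traversal that visits only a small subset of the stored vectors.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def knn_search(query, vectors, k):
    """Exact KNN: score every stored vector, then keep the k best.

    The generator below touches all N vectors once per query,
    which is the O(N) cost described in the abstract.
    """
    scored = (
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in vectors.items()
    )
    # nlargest returns the k highest-scoring pairs, best first.
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy 3-dimensional "embeddings" (hypothetical values, not real tweet data).
tweet_vectors = {
    "tweet_a": [0.9, 0.1, 0.0],
    "tweet_b": [0.8, 0.2, 0.1],
    "tweet_c": [0.0, 0.1, 0.9],
    "tweet_d": [0.1, 0.9, 0.1],
}

top2 = knn_search([1.0, 0.0, 0.0], tweet_vectors, k=2)
print(top2)
```

In a real system the vectors would come from a text embedding model and have hundreds of dimensions, so the per-query scan over every stored tweet is what becomes prohibitive as the collection grows.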

Keywords

KNN, HNSW, Vector Search, Twitter
