Jul 31, 2024

Enhancing Search Capabilities with K-NN Vector Search in OpenSearch

From The DigitalOcean Blog (digitalocean.com)

Many applications depend on the ability to deliver precise and relevant search results. Although the full-text search capabilities of traditional relational databases are sufficient in some situations, these databases can fall short in extracting semantic meaning from text or searching through less-structured data. In this blog post, we’ll explore how you can address these limitations using DigitalOcean-managed OpenSearch and a collection of techniques called K-Nearest Neighbor vector search (K-NN). K-NN makes OpenSearch a powerful and flexible solution for various search and analytics applications.

Unlike traditional search methods that rely on keyword matching, K-NN vector search involves representing each record in a dataset as a vector that encapsulates the attributes of the record. Machine learning models are often used to embed data into a vector representation. When a query is made, the search engine computes the distance between the query vector and the data vectors and returns the nearest neighbors based on a predefined distance metric, such as Euclidean distance or cosine similarity.
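To make the idea concrete, here is a minimal brute-force sketch of nearest-neighbor search in plain Python. It is only an illustration of the concept, not how OpenSearch computes neighbors internally; the function names and the tiny two-dimensional dataset are made up for this example.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k):
    """Return the ids of the k vectors closest to the query (Euclidean)."""
    ranked = sorted(vectors, key=lambda doc_id: euclidean(query, vectors[doc_id]))
    return ranked[:k]


# Hypothetical records, each already embedded as a 2-dimensional vector.
docs = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
}

print(nearest_neighbors([0.9, 0.9], docs, k=2))
```

Comparing the query against every stored vector like this scales linearly with the size of the dataset, which is why search engines typically rely on approximate index structures for large collections.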

Introduction to OpenSearch

OpenSearch is a highly scalable open-source search and analytics engine. It builds upon the strengths of Elasticsearch, providing robust features for full-text search, log analytics, and more. With the introduction of vector search capabilities, OpenSearch extends its utility to more advanced use cases such as natural language processing, recommendation systems, and image retrieval.

Installing OpenSearch

To get started, you need to install OpenSearch. Here’s a basic command to pull and run the latest version of the OpenSearch Docker image:

docker pull opensearchproject/opensearch:latest

docker run -d --name opensearch -p 9200:9200 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your-strong-password>" opensearchproject/opensearch:latest

Note: You need to set an initial admin password when you run the OpenSearch Docker container. It must be a strong password: at least 8 characters long, with at least one uppercase letter, one lowercase letter, one digit, and one special character.

Alternatively, DigitalOcean supports Managed OpenSearch, which makes configuring and managing OpenSearch clusters a breeze.

After installing OpenSearch, the next step is to enable the K-NN plugin. On self-managed clusters, this involves modifying the cluster’s configuration file. On DigitalOcean Managed OpenSearch, the K-NN plugin is enabled by default and no additional configuration is required.

To use K-NN vector search, you must first create an index with vector fields. You can do so by navigating to the OpenSearch development console at https://${CLUSTER_HOST}/app/dev_tools#/console and submitting the following request. Alternatively, you can send these commands as HTTP requests to https://${CLUSTER_HOST}:9200.

PUT /my_vector_index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128
      }
    }
  }
}

With this request you’ve created an index, my_vector_index, which you can use to store and query data using 128-dimension embeddings. The index.knn setting turns on K-NN search for the index, and knn_vector is the field type that holds the embeddings. You can now begin adding documents along with their vector representations to the index with the following request.

PUT /my_vector_index/_doc/1
{
  "my_vector": [0.1, 0.2, ... , 0.128],
  "description": "Sample document"
}
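When there are many documents to add, it can help to build the request bodies programmatically. The sketch below is one possible approach, not part of the original walkthrough: it checks that each vector has the 128 values the mapping expects, then serializes the documents into the newline-delimited format accepted by the OpenSearch _bulk endpoint. The helper names and the placeholder vectors are hypothetical; real vectors would come from your embedding model.

```python
import json

DIMENSION = 128  # must match the "dimension" declared in the index mapping


def make_document(vector, description, dimension=DIMENSION):
    """Build one document body, rejecting vectors of the wrong length."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    return {"my_vector": [float(x) for x in vector], "description": description}


def to_bulk_payload(index, documents):
    """Serialize {doc_id: document} as newline-delimited JSON for POST /_bulk."""
    lines = []
    for doc_id, document in documents.items():
        # Each document is preceded by an action line naming the index and id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Placeholder embeddings, used here only so the example runs.
documents = {
    "1": make_document([0.1] * 128, "Sample document"),
    "2": make_document([0.2] * 128, "Another sample document"),
}
payload = to_bulk_payload("my_vector_index", documents)
```

The resulting payload can be sent as the body of POST /_bulk with the Content-Type header set to application/x-ndjson.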

Finally, to perform a K-NN search over these documents, you can use the following query.

POST /my_vector_index/_search
{
  "size": 5,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, ... , 0.128],
        "k": 5
      }
    }
  }
}
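The console requests above can also be sent as plain HTTP requests. The following sketch uses only the Python standard library to build the search request and to pull the matches out of a response. It assumes a cluster reachable on port 9200 and HTTP basic authentication as the admin user; the host, credentials, helper names, and the all-zero query vector are placeholders rather than values from a real deployment.

```python
import base64
import json
import urllib.request


def build_request(host, method, path, body,
                  user="admin", password="<your-strong-password>"):
    """Build (but do not send) the HTTP equivalent of a console command."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{host}:9200{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def knn_query(field, vector, k):
    """Search body asking for the k nearest neighbors of `vector` in `field`."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


def top_hits(response):
    """Extract (id, score, description) tuples from a parsed _search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("description"))
        for hit in response["hits"]["hits"]
    ]


# Placeholder query vector; a real one would come from the embedding model.
query_vector = [0.0] * 128
request = build_request("localhost", "POST", "/my_vector_index/_search",
                        knn_query("my_vector", query_vector, 5))

# To run the search against a live cluster:
# with urllib.request.urlopen(request) as resp:
#     print(top_hits(json.load(resp)))
```

A local Docker cluster typically serves a self-signed certificate, so the commented-out urlopen call would likely need an SSL context that trusts that certificate before it succeeds.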

Use Cases and Applications

Let’s cover a few end-to-end applications that could make use of OpenSearch’s K-NN capabilities.

Challenges and Considerations using K-NN with OpenSearch

1. Vector Dimensionality: High-dimensional vectors can lead to increased computational complexity. It’s important to balance vector dimensions with performance requirements. Luckily, OpenSearch has multiple K-NN methods with their own performance characteristics. While each method aims to return vectors with the minimal distance to an incoming vector, some can be tuned to prioritize memory use, response time, or accuracy.

2. Data Normalization: Ensuring that data is normalized and consistent is crucial for the accuracy of K-NN search results.

3. Performance Tuning: Optimizing OpenSearch settings and hardware resources is essential for handling large-scale vector searches efficiently. See this article for more details on performance tuning.
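Two of the considerations above lend themselves to a quick calculation. The sketch below shows one common form of normalization, scaling each vector to unit length, and a rough memory estimate for an HNSW index. The estimate uses the rule of thumb of about 1.1 * (4 * dimension + 8 * M) bytes per vector from the OpenSearch K-NN documentation; treat both the formula and the default M of 16 as assumptions to verify against the documentation for your OpenSearch version.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so only its direction affects distance."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def hnsw_memory_bytes(num_vectors, dimension, m=16):
    """Rough HNSW memory estimate (assumed rule of thumb, not a guarantee).

    4 bytes per float dimension plus 8 bytes per graph link (M links),
    with about 10% overhead.
    """
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


unit = l2_normalize([3.0, 4.0])
estimate = hnsw_memory_bytes(num_vectors=1_000_000, dimension=128)
print(unit, f"{estimate / 1e9:.2f} GB")
```

Under these assumptions, one million 128-dimension vectors need roughly 0.7 GB, and doubling the dimension to 256 raises that to about 1.27 GB. Once vectors are unit length, ranking by Euclidean distance and ranking by cosine similarity give the same order.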

Conclusion

K-NN vector search opens up new possibilities for delivering highly relevant search results across various domains. By leveraging OpenSearch’s powerful capabilities, developers can implement advanced search functionalities with relative ease. Whether it’s for recommendation systems, image retrieval, or NLP applications, K-NN vector search with OpenSearch is a valuable tool in the search technology landscape.
