Many applications depend on the ability to deliver precise and relevant search results. Although the full-text search capabilities of traditional relational databases are sufficient in some situations, these databases can fall short in extracting semantic meaning from text or searching through less-structured data. In this blog post, we’ll explore how you can address these limitations using DigitalOcean-managed OpenSearch and a collection of techniques called K-Nearest Neighbor vector search (K-NN). K-NN makes OpenSearch a powerful and flexible solution for various search and analytics applications.
Understanding K-NN Vector Search
What is K-NN Vector Search?
Unlike traditional search methods that rely on keyword matching, K-NN vector search involves representing each record in a dataset as a vector that encapsulates the attributes of the record. Machine learning models are often used to embed data into a vector representation. When a query is made, the search engine computes the distance between the query vector and the data vectors and returns the nearest neighbors based on a predefined distance metric, such as Euclidean distance or cosine similarity.
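To make the idea concrete, here is a minimal sketch of an exact (brute-force) nearest-neighbor search in plain Python. The document ids and 3-dimensional vectors are made up for illustration; real embeddings usually have hundreds of dimensions and come from a machine learning model.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(query, vectors, k):
    """Return the ids of the k vectors closest to `query` by Euclidean distance."""
    ranked = sorted(vectors, key=lambda doc_id: euclidean_distance(query, vectors[doc_id]))
    return ranked[:k]

# Hypothetical 3-dimensional embeddings keyed by document id.
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
nearest = exact_knn([1.0, 0.05, 0.0], docs, k=2)
print(nearest)
```

This version compares the query against every stored vector, which is fine for a handful of documents but becomes expensive as the dataset grows.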
Why Use OpenSearch for K-NN Vector Search?
Introduction to OpenSearch
OpenSearch is a highly scalable open-source search and analytics engine. It builds upon the strengths of Elasticsearch, providing robust features for full-text search, log analytics, and more. With the introduction of vector search capabilities, OpenSearch extends its utility to more advanced use cases such as natural language processing, recommendation systems, and image retrieval.
Benefits of Using OpenSearch for Vector Search
- Scalability: OpenSearch can handle large volumes of data and queries efficiently. Using approximate nearest neighbor algorithms, OpenSearch can return relevant results much faster than an exact search that compares the query against every vector, and some methods can also be tuned for a lower memory footprint.
- Flexibility: It supports various types of data and search functionalities, making it suitable for diverse applications.
- Community and Support: Being open-source, it benefits from a vibrant community and regular updates.
Setting Up OpenSearch for K-NN Vector Search
Installing OpenSearch
To get started, you need to install OpenSearch. Here’s a basic command to pull and run the latest version of the OpenSearch Docker image:
docker pull opensearchproject/opensearch:latest
docker run -d --name opensearch -p 9200:9200 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your-strong-password>" opensearchproject/opensearch:latest
Note: You must set an initial admin password when you run the OpenSearch Docker container. The password must be strong: at least 8 characters long, with at least one uppercase letter, one lowercase letter, one digit, and one special character.
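As a rough sketch, the character rules from the note can be checked locally before starting the container. The function name is hypothetical, and it only covers the rules listed in the note; OpenSearch may still reject a password it considers too weak.

```python
import string

def meets_listed_password_rules(password):
    """Check only the length and character-class rules listed in the note."""
    return (
        len(password) >= 8
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )

print(meets_listed_password_rules("password"))
print(meets_listed_password_rules("Tr1cky-Horse!42"))
```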
Alternatively, DigitalOcean supports Managed OpenSearch, which makes configuring and managing OpenSearch clusters a breeze.
Configuring OpenSearch for Vector Search
After installing OpenSearch, the next step is to enable the K-NN plugin. On self-managed clusters, this involves modifying the cluster’s configuration file. On DigitalOcean Managed OpenSearch, the K-NN plugin is enabled by default and no additional configuration is required.
Implementing K-NN Vector Search
To use K-NN vector search, you must first create an index with vector fields. You can do so by navigating to the OpenSearch development console at https://${CLUSTER_HOST}/app/dev_tools#/console and submitting the following request. Alternatively, you can send these commands as HTTP requests to https://${CLUSTER_HOST}:9200.
PUT /my_vector_index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128
      }
    }
  }
}
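If you prefer the HTTP route over the console, an index-creation request like the one above can be sketched with Python's standard library. This assumes the k-NN plugin's `knn_vector` field type and the `index.knn` index setting, plus HTTP basic authentication; the helper name, host, and credentials are placeholders rather than values from a real cluster.

```python
import base64
import json
import urllib.request

def build_create_index_request(host, index, field, dimension, user, password):
    """Build (but do not send) a PUT request that creates a k-NN enabled index."""
    body = {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }
    # HTTP basic auth: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{host}:9200/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="PUT",
    )

request = build_create_index_request(
    "localhost", "my_vector_index", "my_vector", 128, "admin", "<your-strong-password>"
)
print(request.get_method(), request.full_url)
# To send it against a reachable cluster: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches the cluster.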
With this request, you’ve created an index, my_vector_index, which you can use to store and query data using 128-dimension embeddings. You can now begin adding documents along with their vector representations to the index with the following request.
PUT /my_vector_index/_doc/1
{
  "my_vector": [0.1, 0.2, ... , 0.128],
  "description": "Sample document"
}
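When you have many documents, sending them one at a time gets slow. The sketch below builds a newline-delimited payload for the `_bulk` API and checks each vector's length first, since a vector with the wrong number of values will not match the mapping. The helper name and the 4-dimensional sample vectors are assumptions made for illustration.

```python
import json

def build_bulk_payload(index, field, docs, dimension):
    """Return an NDJSON body for POST /_bulk: one action line plus one document line per doc."""
    lines = []
    for doc_id, (vector, description) in docs.items():
        if len(vector) != dimension:
            raise ValueError(
                f"document {doc_id}: expected {dimension} values, got {len(vector)}"
            )
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({field: vector, "description": description}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Toy 4-dimensional vectors; a real index would use its configured dimension.
sample_docs = {
    "1": ([0.1, 0.2, 0.3, 0.4], "Sample document"),
    "2": ([0.4, 0.3, 0.2, 0.1], "Another document"),
}
payload = build_bulk_payload("my_vector_index", "my_vector", sample_docs, dimension=4)
print(payload)
```

The payload would be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.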
Finally, to perform a K-NN search over these documents, you can use the following query.
POST /my_vector_index/_search
{
  "size": 5,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, ... , 0.128],
        "k": 5
      }
    }
  }
}
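In application code, it can help to build the search body programmatically and pull out only the fields you need from the response. The following sketch assumes the k-NN plugin's `knn` query clause and the standard `hits.hits` response layout; the helper names and the sample response are hand-written illustrations, not output from a real cluster.

```python
def build_knn_query(field, vector, k):
    """Build a search body that asks for the k nearest neighbors of `vector`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }

def extract_hits(response):
    """Return (document id, score, description) for each hit, best match first."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("description"))
        for hit in response["hits"]["hits"]
    ]

query_body = build_knn_query("my_vector", [0.1, 0.2, 0.3, 0.4], k=2)

# Illustrative response shape only; the scores here are invented.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 0.98, "_source": {"description": "Sample document"}},
            {"_id": "2", "_score": 0.71, "_source": {"description": "Another document"}},
        ]
    }
}
results = extract_hits(sample_response)
print(results)
```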
Use Cases and Applications
Let’s cover a few end-to-end applications that could make use of OpenSearch’s K-NN capabilities.
- Customer Support Chatbot: Vector search is often used to find semantically similar texts. A chatbot service might use a machine-learning model to embed an incoming query (e.g., “How can I reset my password?”) into a vector and then use K-NN vector search to find similar queries in the knowledge base, such as “I forgot my password, how do I reset it?”. The chatbot can use this information to give the user a more helpful response based on these similar queries.
- E-commerce Platform: K-NN vector search can enhance recommendation systems by finding items similar to a user’s preferences based on vector representations. For example, a user who buys a book from an online store might be recommended other books by the same author, books from the same genre, or even books that other users with similar preferences have bought. In this example, the vector representation of a book may include attributes like author, genre, ratings, and keywords from reviews.
- Fashion Retailer: By converting images into vectors using deep learning models, K-NN vector search can be used to retrieve visually similar images from a database. A user may upload a photo of a red dress. The system processes the image to create a vector representing the dress’s visual features. Using K-NN vector search, the platform retrieves and displays similar dresses in various shades of red, with similar cuts and designs, helping the user find exactly what they’re looking for.
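The chatbot scenario can be sketched end to end in a few lines. To keep the example self-contained, a simple word-count vector stands in for the machine-learning embedding model, so it only captures shared words rather than true semantic meaning; the knowledge base entries beyond the one quoted above are invented.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text, vocabulary):
    """Toy embedding: how often each vocabulary word occurs in `text`."""
    tokens = tokenize(text)
    return [float(tokens.count(word)) for word in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

knowledge_base = [
    "I forgot my password, how do I reset it?",
    "What are your shipping rates?",
    "How do I cancel my order?",
]
query = "How can I reset my password?"

# Shared vocabulary so that every text maps to a vector of the same length.
vocabulary = sorted({word for text in knowledge_base + [query] for word in tokenize(text)})
query_vector = embed(query, vocabulary)
ranked = sorted(
    knowledge_base,
    key=lambda text: cosine_similarity(query_vector, embed(text, vocabulary)),
    reverse=True,
)
best_match = ranked[0]
print(best_match)
```

In a real deployment, the embedding step would be replaced by a trained model and the ranking step by a K-NN query against the index.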
Challenges and Considerations When Using K-NN with OpenSearch
1. Vector Dimensionality: High-dimensional vectors can lead to increased computational complexity. It’s important to balance vector dimensions with performance requirements. Luckily, OpenSearch has multiple K-NN methods with their own performance characteristics. While each method aims to return vectors with the minimal distance to an incoming vector, some can be tuned to prioritize memory use, response time, or accuracy.
2. Data Normalization: Ensuring that data is normalized and consistent is crucial for the accuracy of K-NN search results.
3. Performance Tuning: Optimizing OpenSearch settings and hardware resources is essential for handling large-scale vector searches efficiently. See this article for more details on performance tuning.
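Two of these considerations lend themselves to a quick sketch. The first helper scales a vector to unit length, one common form of normalization. The second applies the rule of thumb from the OpenSearch K-NN documentation for HNSW graphs, roughly 1.1 * (4 * dimension + 8 * M) bytes per vector, to show how dimensionality drives memory use. The function names are hypothetical, and the estimate is a planning aid rather than a measurement.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

def estimate_hnsw_memory_bytes(num_vectors, dimension, m=16):
    """Approximate HNSW memory: 1.1 * (4 * dimension + 8 * m) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

unit = l2_normalize([3.0, 4.0])
print(unit)

# Compare one million vectors at two different dimensions.
for dim in (128, 256):
    gigabytes = estimate_hnsw_memory_bytes(1_000_000, dim) / 1e9
    print(f"dimension={dim}: about {gigabytes:.2f} GB")
```

Once vectors are normalized to unit length, ranking by Euclidean distance and ranking by cosine similarity give the same order, because the squared distance between two unit vectors equals 2 minus twice their cosine similarity.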
Conclusion
K-NN vector search opens up new possibilities for delivering highly relevant search results across various domains. By leveraging OpenSearch’s powerful capabilities, developers can implement advanced search functionalities with relative ease. Whether it’s for recommendation systems, image retrieval, or NLP applications, K-NN vector search with OpenSearch is a valuable tool in the search technology landscape.