Day 2: Vector Stores

Module Overview

Session: Vector Stores

Vector search algorithms
Vector databases and operational considerations
Applications: RAG and retrieval systems


What You’ll Learn Today

By the end of this session, you will:

Understand vector search algorithms and trade-offs
Know how to choose and deploy vector databases
Build RAG applications with embeddings and vector stores

Vector Search

The Nearest Neighbor Problem

Finding similar vectors at scale

Exact search: Brute force, too slow for large datasets
Approximate search: Trade accuracy for speed
Scale: Billions of vectors, milliseconds response time

Once you have embeddings, you need to search through them. This is the nearest neighbor problem: given a query vector, find the most similar vectors in your dataset.

Exact search - computing distances to every vector - is too slow for large datasets. If you have a billion vectors, you’d need to compute a billion distances for every query. That’s not practical.
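Exact search takes only a few lines of code. Below is a minimal sketch in plain Python. It assumes cosine similarity as the metric, and `cosine_similarity` and `exact_knn` are illustrative names. Every query is scored against every stored vector, so the work grows linearly with the dataset.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(query, vectors, k):
    """Brute force: one similarity computation per stored vector, keep the k best."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(exact_knn([1.0, 0.05], vectors, k=2))  # → [0, 1]
```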

Approximate search algorithms trade a small amount of accuracy for massive speed improvements. Instead of finding the exact nearest neighbors, they find approximate nearest neighbors - vectors that are very close, if not the closest.

The scale requirements are extreme: billions of vectors, milliseconds response time. This is what makes vector search algorithms so important - they’re what make semantic search possible at scale.

Important Vector Search Algorithms

Speed vs. accuracy trade-offs

LSH: Locality-Sensitive Hashing
HNSW: Hierarchical Navigable Small World
ScaNN: Google’s optimized ANN algorithm

There are several important vector search algorithms, each with different trade-offs:

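LSH - Locality-Sensitive Hashing - uses hash functions designed so that similar vectors land in the same bucket with high probability. At query time, only the vectors in the query’s bucket are compared. This makes it fast, but recall can be lower: a true nearest neighbor that hashes to a different bucket is never examined.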
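Random-hyperplane hashing, which targets cosine similarity, is one classic LSH scheme. This sketch uses it in plain Python with a single hash table. `RandomHyperplaneLSH` and its parameters are illustrative. Production LSH typically builds several tables and merges their candidates to recover recall.

```python
import random

class RandomHyperplaneLSH:
    """Single hash table; each random hyperplane contributes one bit of the bucket key."""

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        # Gaussian normal vectors define random hyperplanes through the origin.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        self.buckets = {}

    def _key(self, v):
        # Each bit records which side of a hyperplane v lies on. Vectors separated by a
        # small angle agree on most bits, so they tend to share a bucket.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in self.planes)

    def add(self, idx, v):
        self.buckets.setdefault(self._key(v), []).append(idx)

    def candidates(self, query):
        # Only the query's own bucket is examined; neighbors hashed elsewhere are missed.
        return self.buckets.get(self._key(query), [])

vectors = [[1.0, 0.0], [0.99, 0.05], [-1.0, 0.0]]
lsh = RandomHyperplaneLSH(dim=2, num_bits=4, seed=42)
for i, v in enumerate(vectors):
    lsh.add(i, v)
# Candidates would then be scored exactly; which ones appear depends on the random planes.
print(lsh.candidates([1.0, 0.02]))
```

Adding more bits makes buckets smaller and lookups faster, but it also splits true neighbors apart more often. That is the speed-versus-recall knob in miniature.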

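HNSW - Hierarchical Navigable Small World - builds a multi-layer graph in which each vector is a node linked to nearby vectors. Sparse upper layers allow long jumps across the dataset, and denser lower layers refine the search. A query moves greedily from node to node toward closer vectors and drops to the next layer when it can no longer improve. HNSW is very fast and has high recall, which is why it’s widely used. Its main cost is the memory needed to store the graph.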
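Below is a simplified sketch of the core idea. It uses a single-layer proximity graph and the beam ("ef") search that HNSW runs inside each layer, with squared Euclidean distance assumed. Real HNSW adds the layer hierarchy, builds the graph incrementally, and prunes neighbor lists with a heuristic. `build_knn_graph` and `greedy_search` are illustrative names, not a library API.

```python
import heapq

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(vectors, m):
    """Link each node to its m nearest neighbors (brute force here; HNSW inserts
    nodes one at a time instead). Edges are stored in both directions."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        nearest = sorted((j for j in range(len(vectors)) if j != i),
                         key=lambda j: sq_dist(v, vectors[j]))[:m]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def greedy_search(graph, vectors, query, entry, ef, k):
    """Best-first search keeping the ef closest nodes seen, as within one HNSW layer."""
    visited = {entry}
    d0 = sq_dist(query, vectors[entry])
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]        # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break                # nearest unexpanded node is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return [i for _, i in sorted((-nd, i) for nd, i in best)][:k]

vectors = [[float(i)] for i in range(10)]  # toy 1-D data: points 0..9
graph = build_knn_graph(vectors, m=2)
print(greedy_search(graph, vectors, [7.2], entry=0, ef=3, k=1))  # → [7]
```

A larger `ef` explores more of the graph before stopping, which raises recall at the cost of latency.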

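ScaNN - Scalable Nearest Neighbors - is Google’s optimized ANN library. It partitions the dataset so that only the most promising partitions are searched, which prunes most of the work. It then scores the remaining candidates using compressed (quantized) vectors and can re-score the top candidates with exact distances. Its quantization is tuned to preserve the rankings that matter for inner-product search, so it achieves very high speed while maintaining accuracy.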
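ScaNN’s full pipeline is too involved for a slide, but its pruning idea can be sketched: cluster the vectors, then search only the partitions whose centroids are closest to the query. The sketch below is an IVF-style index in plain Python, and `kmeans` and `PartitionedIndex` are illustrative names. It shows only partitioning plus exact re-scoring. ScaNN’s distinctive quantization step is deliberately left out, so treat this as an illustration of the concept rather than ScaNN itself.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, n_clusters, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the centroids."""
    centroids = [list(v) for v in random.Random(seed).sample(vectors, n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in range(n_clusters)]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: sq_dist(v, centroids[c]))
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class PartitionedIndex:
    def __init__(self, vectors, n_clusters, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_clusters, seed=seed)
        self.lists = [[] for _ in self.centroids]  # one list of vector IDs per partition
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, n_probe=1):
        # Prune: scan only the n_probe partitions whose centroids are closest to the query.
        ids = [i for c in self._nearest_centroids(query, n_probe) for i in self.lists[c]]
        # Score surviving candidates exactly (ScaNN would first use quantized scores).
        return sorted(ids, key=lambda i: sq_dist(query, self.vectors[i]))[:k]

vectors = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [10.0, 10.0], [10.5, 10.0], [10.0, 10.5]]
index = PartitionedIndex(vectors, n_clusters=2)
print(index.search([0.2, 0.1], k=2, n_probe=1))  # → [0, 1]
```

Raising `n_probe` scans more partitions, which trades speed back for recall.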

The choice depends on your requirements: how much accuracy can you sacrifice for speed? How much memory can you use? How often does your dataset change?
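A common way to put a number on "how much accuracy" is recall@k. It is the fraction of the true k nearest neighbors (from exact search) that the approximate search returns, usually averaged over a sample of queries. Here is a minimal sketch; the ID lists are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    k = len(exact_ids)
    return len(set(approx_ids[:k]) & set(exact_ids)) / k

def mean_recall(approx_results, exact_results):
    """Average recall@k over a batch of queries."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)

print(recall_at_k([3, 7, 9, 1], [3, 7, 2, 1]))  # → 0.75
print(mean_recall([[3, 7, 9, 1], [4, 5, 6, 8]], [[3, 7, 2, 1], [4, 5, 6, 8]]))  # → 0.875
```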

Many vector databases support more than one index type and let you choose based on your needs.

Vector Databases

Vector databases

Type	Optimized For	Storage	Best Use Case
OLTP	Writing & Single-Row Reads	Row-store	User management, E-commerce orders
OLAP	Reading & Aggregation	Column-store	Business Intelligence, Data Warehousing
Vector	Semantic Proximity	Vector index over embeddings	AI Search, Recommendation Engines

Vector databases are specialized systems designed for managing and querying embeddings. They’re not just regular databases that happen to store vectors - they’re optimized for vector operations.

They provide efficient storage for vectors, using compression and indexing techniques to minimize memory usage.
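Scalar quantization is one of the simplest compression techniques. Each float32 component is stored as an 8-bit integer plus one shared scale factor, which cuts raw vector storage by 4x at the cost of a small rounding error. The sketch below assumes symmetric per-vector int8 quantization, and the function names are illustrative. Production systems may use other schemes, such as product quantization.

```python
import array

def quantize_int8(vector):
    """Map floats to int8 codes in [-127, 127] using one scale factor per vector."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero for an all-zero vector
    scale = max_abs / 127.0
    codes = array.array("b", (round(x / scale) for x in vector))  # "b" = signed 8-bit
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction; each component is off by at most scale / 2."""
    return [c * scale for c in codes]

vec = [0.12, -0.5, 0.33, 0.9]
codes, scale = quantize_int8(vec)
as_float32 = array.array("f", vec)
print(as_float32.itemsize * len(as_float32), codes.itemsize * len(codes))  # → 16 4 (bytes)
print(dequantize(codes, scale))  # close to vec, not identical
```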

They provide fast approximate nearest neighbor search, using algorithms like HNSW or ScaNN that we just discussed.

And they provide database operations: create, read, update, delete, filtering, and metadata management. You can store not just vectors, but also metadata about each vector, and filter by metadata before or after vector search.
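Filtering before and after vector search can return different results, and a small example makes the difference concrete. This plain-Python sketch assumes a hypothetical item layout (dicts with `id`, `vec`, and `meta`). The `overfetch` parameter is illustrative: it asks the post-filter path to retrieve extra results to compensate for the ones the filter removes.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search_prefilter(query, items, k, predicate):
    """Filter on metadata first, then rank only the matching items."""
    matching = [it for it in items if predicate(it["meta"])]
    return [it["id"] for it in sorted(matching, key=lambda it: -cosine(query, it["vec"]))[:k]]

def search_postfilter(query, items, k, predicate, overfetch=2):
    """Rank first (as an ANN index would), then drop non-matching hits.
    Can return fewer than k results if too many top hits are filtered out."""
    ranked = sorted(items, key=lambda it: -cosine(query, it["vec"]))[: k * overfetch]
    return [it["id"] for it in ranked if predicate(it["meta"])][:k]

items = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vec": [0.8, 0.2], "meta": {"lang": "de"}},
    {"id": "d", "vec": [0.0, 1.0], "meta": {"lang": "en"}},
]
is_english = lambda meta: meta["lang"] == "en"
print(search_prefilter([1.0, 0.0], items, 2, is_english))               # → ['a', 'd']
print(search_postfilter([1.0, 0.0], items, 2, is_english, overfetch=1))  # → ['a']
```

Post-filtering with too little overfetch silently returns fewer than k results. Pre-filtering avoids that problem, but it has to search within an arbitrary subset, which some ANN indexes handle less efficiently.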

This combination makes vector databases essential for production applications. You can’t just store vectors in a general-purpose database without a vector index and expect good performance at scale, because every query would fall back to a brute-force scan.

Popular Vector Databases

Options for different needs

Pinecone: Fully managed, easy to use
Weaviate: Open source, self-hosted
ChromaDB: Lightweight, Python-first
pgvector: PostgreSQL extension
Vertex AI Vector Search: Google Cloud’s managed service

There are many vector databases to choose from, each with different strengths:

Pinecone is fully managed and very easy to use. It’s a good choice if you want to get started quickly and don’t want to manage infrastructure. But it’s a commercial service, so there’s a cost.

Weaviate is open source and can be self-hosted. It’s more flexible and you have more control, but you need to manage the infrastructure yourself.

ChromaDB is lightweight and Python-first. It’s good for development and smaller applications. It’s easy to integrate into Python applications.

pgvector is a PostgreSQL extension, so you can add vector search to an existing PostgreSQL database. This is great if you’re already using PostgreSQL and want to add vector capabilities.

Vertex AI Vector Search is Google Cloud’s managed service. It uses ScaNN and is optimized for very large scale.

The choice depends on your needs: scale, budget, infrastructure preferences, and feature requirements.

Operational Considerations

Production deployment factors

Scalability: Handle growing datasets
Latency: Response time requirements
Consistency: Update and synchronization
Cost: Storage and compute costs

When choosing and deploying a vector database, there are several operational considerations:

Scalability: Can it handle your dataset size? Can it scale as your data grows? Some databases are better at horizontal scaling than others.

Latency: What are your response time requirements? Vector search can be fast, but you need to make sure it meets your SLA. This depends on the algorithm, the dataset size, and the infrastructure.

Consistency: How do updates work? If you add new vectors, are they immediately searchable? How do you handle updates and deletions? Some databases have eventual consistency, which might be fine for your use case or might not.

Cost: What are the storage and compute costs? Managed services are convenient but can be expensive. Self-hosted gives you more control but you need to manage the infrastructure.
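A back-of-envelope storage estimate shows why cost and scale dominate these decisions. The sketch assumes 768-dimensional embeddings, a size chosen only for illustration. It counts only the raw vectors; index structures, metadata, and replicas come on top.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_component):
    """Bytes needed for the raw vectors alone (no index, metadata, or replicas)."""
    return num_vectors * dim * bytes_per_component

one_billion = 1_000_000_000
print(raw_vector_bytes(one_billion, 768, 4))  # float32 → 3072000000000 bytes (about 3.07 TB)
print(raw_vector_bytes(one_billion, 768, 1))  # int8    → 768000000000 bytes (about 0.77 TB)
```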

You also need to think about monitoring, backup, disaster recovery, and all the other operational concerns of any production system.

Applications

Retrieval Augmented Generation (RAG)

Combining retrieval with generation

Problem: LLMs have limited, static knowledge
Solution: Retrieve relevant context, then generate
Result: Up-to-date, source-attributed responses

In Detail:

Indexing: Embed documents, store in vector database
Query: Embed user query
Retrieval: Find similar documents
Generation: LLM generates response from context

Now let’s talk about the most important application: RAG - Retrieval Augmented Generation.

The problem is that LLMs have limited, static knowledge. They only know what they were trained on, and that knowledge is frozen at training time. They can’t access new information, and they can’t cite sources.

RAG solves this by combining retrieval with generation. First, you retrieve relevant documents from a knowledge base using vector search. Then, you pass those documents as context to the LLM, which generates a response based on the retrieved context.

The result is a system that can answer questions about up-to-date information, can cite sources, and can access information that wasn’t in the training data.

This is why embeddings and vector stores are so important today - they’re the foundation of RAG systems.

RAG Architecture

[Figure: detailed RAG architecture showing indexing, querying, retrieval, and generation steps]

Let’s break down the RAG architecture:

First, indexing: you take your documents, embed them using an embedding model, and store them in a vector database along with metadata.

Second, querying: when a user asks a question, you embed the query using the same embedding model.

Third, retrieval: you search the vector database for documents similar to the query embedding. You might retrieve the top K documents, or all documents above a similarity threshold.
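The two retrieval strategies behave differently. Top K returns a fixed number of results, while a threshold returns however many clear the bar, possibly none. Here is a minimal sketch over hypothetical (document ID, similarity) pairs.

```python
def top_k(scored, k):
    """scored: list of (doc_id, similarity). Returns the k highest-scoring IDs."""
    return [doc for doc, _ in sorted(scored, key=lambda p: p[1], reverse=True)[:k]]

def above_threshold(scored, threshold):
    """Returns every ID at or above the threshold; may be none or many."""
    return [doc for doc, s in sorted(scored, key=lambda p: p[1], reverse=True) if s >= threshold]

scored = [("d1", 0.82), ("d2", 0.41), ("d3", 0.77), ("d4", 0.35)]
print(top_k(scored, 3))               # → ['d1', 'd3', 'd2']
print(above_threshold(scored, 0.7))   # → ['d1', 'd3']
```

Top K can pass weakly related documents to the LLM (here `d2`), while a threshold can return nothing for an unusual query. Some systems combine the two, taking at most K results above a minimum similarity.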

Fourth, generation: you pass the retrieved documents as context to an LLM, along with the user’s query. The LLM generates a response that’s grounded in the retrieved context.
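The four steps fit in a short, self-contained sketch. It replaces the embedding model with a toy bag-of-words `embed` function and replaces the vector database with an in-memory list. The document texts and source labels are hypothetical. The LLM call itself is omitted, so the pipeline ends at the prompt that would be sent to the model.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

documents = [
    {"id": "doc1", "source": "notes/hnsw",
     "text": "HNSW builds a layered graph for fast approximate nearest neighbor search."},
    {"id": "doc2", "source": "notes/pgvector",
     "text": "pgvector is a PostgreSQL extension that adds vector similarity search."},
    {"id": "doc3", "source": "notes/chroma",
     "text": "ChromaDB is a lightweight, Python-first vector database."},
]

# Toy stand-in for an embedding model: L2-normalized word counts over the corpus
# vocabulary. It only captures shared words, not meaning.
VOCAB = {w: i for i, w in enumerate(sorted({w for d in documents for w in tokenize(d["text"])}))}

def embed(text):
    vec = [0.0] * len(VOCAB)
    for word in tokenize(text):
        if word in VOCAB:  # words outside the corpus vocabulary are ignored
            vec[VOCAB[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# 1. Indexing: embed each document and store the vector alongside its metadata.
index = [(embed(doc["text"]), doc) for doc in documents]

def retrieve(query, k=2):
    # 2. Query: embed with the same model used at indexing time.
    q = embed(query)
    # 3. Retrieval: vectors are unit length, so the dot product equals cosine similarity.
    ranked = sorted(index, key=lambda pair: sum(a * b for a, b in zip(q, pair[0])), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_prompt(query, docs):
    # 4. Generation: retrieved passages, tagged with their sources, become the LLM's context.
    # The LLM call is omitted; the prompt wording is illustrative.
    context = "\n".join(f"[{doc['source']}] {doc['text']}" for doc in docs)
    return (f"Answer using only the context below and cite the sources in brackets.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

question = "Which PostgreSQL extension adds vector search?"
print(build_prompt(question, retrieve(question, k=1)))
```

Tagging each passage with its source is what lets the LLM cite where an answer came from.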

This pipeline is what makes RAG work. Each step is important: good embeddings lead to good retrieval, which leads to good generation.

You can also add steps like re-ranking (using a more expensive model to re-rank the retrieved documents) or filtering (using metadata to filter before or after vector search).
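Re-ranking is a two-stage pattern. A fast vector search returns a small candidate set, then a slower, more accurate scorer re-orders only those candidates; that scorer is often a cross-encoder model that reads the query and document together. In the sketch below, `overlap_score` is a hypothetical stand-in for that expensive model, and the candidate strings are made up.

```python
def rerank(query, candidates, score_fn, k):
    """Second stage: score each (query, candidate) pair with a costlier scorer and
    keep the k best. Only the small candidate set pays that cost."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:k]

def overlap_score(query, doc):
    """Hypothetical stand-in for a re-ranking model: fraction of query words in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

candidates = [  # e.g. the top results from a first-stage vector search
    "vector databases store embeddings",
    "pgvector adds vector search to postgresql",
    "postgresql is a relational database",
]
print(rerank("vector search in postgresql", candidates, overlap_score, k=1))
# → ['pgvector adds vector search to postgresql']
```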

Other Applications

Beyond RAG: Retrieval, Semantic text similarity, Classification, Clustering, Reranking

RAG is the most talked-about application, but embeddings and vector stores are used in many other ways:

Semantic search: Instead of keyword matching, find documents based on meaning. This is a core building block of many modern search systems.

Recommendation systems: Find items similar to what a user likes, or users similar to a given user. Similarity search of this kind is a common ingredient in large-scale recommenders like those at Netflix, Amazon, and Spotify.
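One simple content-based approach averages the embeddings of the items a user liked into a profile vector, then recommends the most similar unseen items. This is a sketch of that idea only, not how any particular company’s recommender works. The item IDs and 2-D vectors are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend(liked_ids, item_vectors, k):
    """Average the liked items' embeddings into a profile, then return the k most
    similar items the user hasn't already liked."""
    liked = [item_vectors[i] for i in liked_ids]
    profile = [sum(col) / len(liked) for col in zip(*liked)]
    unseen = [i for i in item_vectors if i not in liked_ids]
    return sorted(unseen, key=lambda i: cosine(profile, item_vectors[i]), reverse=True)[:k]

item_vectors = {  # hypothetical item embeddings
    "jazz_1": [0.9, 0.1], "jazz_2": [0.8, 0.2], "jazz_3": [0.85, 0.15],
    "metal_1": [0.1, 0.9], "metal_2": [0.2, 0.8],
}
print(recommend(["jazz_1", "jazz_2"], item_vectors, k=1))  # → ['jazz_3']
```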

Deduplication: Find duplicate or near-duplicate content. This is useful for content moderation, data cleaning, and copyright detection.
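Near-duplicate detection reduces to a similarity threshold. The 0.95 cutoff below is an illustrative assumption that would need tuning per dataset. This sketch compares all pairs, which is O(n²). At scale, each item would instead be queried against an ANN index for its nearest neighbors.

```python
import itertools
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def near_duplicates(vectors, threshold=0.95):
    """Return index pairs whose cosine similarity is at or above the threshold."""
    return [(i, j) for i, j in itertools.combinations(range(len(vectors)), 2)
            if cosine(vectors[i], vectors[j]) >= threshold]

vectors = [[1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.0, 1.0, 0.0]]
print(near_duplicates(vectors))  # → [(0, 1)]
```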

Clustering: Group similar items together. This is useful for organizing content, discovering topics, and understanding data structure.

The common thread is that all of these applications benefit from being able to find similar items quickly. That’s what embeddings and vector stores enable.

Summary

Key Takeaways

Different types of embeddings for different data types
Evaluation is crucial for ensuring quality
Vector search algorithms trade accuracy for speed
Vector databases are essential for production systems
RAG combines retrieval with generation

Let’s summarize what we’ve covered today.

There are different types of embeddings for different data types: text, images, structured data, graphs. Each has its own techniques and use cases.

Evaluation is crucial. You can’t assume embeddings are good - you need to evaluate them for your specific use case.

Vector search algorithms like LSH, HNSW, and ScaNN trade a small amount of accuracy for massive speed improvements, making semantic search possible at scale.

Vector databases are specialized systems that provide efficient storage, fast search, and database operations. They’re essential for production applications.

RAG combines retrieval with generation, using embeddings and vector stores to give LLMs access to up-to-date, source-attributed information.

These concepts work together to enable modern AI applications. Understanding embeddings and vector stores is essential for building production-ready retrieval systems.

Thank you, and good luck building with embeddings and vector stores!