VectorStore, quickstart

NOTE: this uses Cassandra's experimental "Vector Similarity Search" capability. At the moment, it is available only by building and running an early alpha from a specific branch of the codebase.

In [1]:

                
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
)
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader

The following line imports the Cassandra flavor of a LangChain vector store:

In [2]:

                
from langchain.vectorstores.cassandra import Cassandra

As usual, a database connection is needed to access Cassandra. The following assumes that a vector-search-capable Cassandra cluster is running locally. Adjust as needed.

In [3]:

                
from cqlsession import getLocalSession, getLocalKeyspace
localSession = getLocalSession()
localKeyspace = getLocalKeyspace()
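
The `cqlsession` module imported above is a local helper whose source is not shown on this page. As a rough idea of what such a helper might contain, here is a minimal sketch built on the DataStax Python driver (`cassandra-driver`). The environment variable names, the default contact point and the default keyspace name are all assumptions made for illustration, and the sketch assumes a node that accepts unauthenticated connections.

```python
import os


def getLocalSession():
    """Open a session to a Cassandra node (assumed to need no authentication)."""
    # Imported inside the function so that this module can still be loaded
    # on a machine where cassandra-driver is not installed.
    from cassandra.cluster import Cluster

    # CASSANDRA_CONTACT_POINTS is a hypothetical variable: comma-separated hosts.
    hosts = os.environ.get("CASSANDRA_CONTACT_POINTS", "127.0.0.1").split(",")
    return Cluster(contact_points=hosts).connect()


def getLocalKeyspace():
    """Return the keyspace name to use (the default is a hypothetical name)."""
    return os.environ.get("CASSANDRA_KEYSPACE", "demo_keyspace")
```

The keyspace itself must already exist on the cluster; the sketch only returns its name.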

Both an LLM and an embedding function are required. These are from OpenAI:

In [4]:

                
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

llm = OpenAI(temperature=0)
myEmbedding = OpenAIEmbeddings()

A minimal example

The following is a minimal usage of the Cassandra vector store. The store is created and populated in a single step. It is then queried to retrieve the parts of the indexed text most relevant to a question, and these are "stuffed" into the prompt that the LLM uses to answer it.
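
To make the retrieve-then-stuff flow concrete before any database is involved, here is a toy sketch in plain Python. It is not how LangChain or Cassandra implement the search: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for the vector store, and the sample sentences and prompt wording are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(store, question, k=2):
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def stuff_prompt(chunks, question):
    """Paste the retrieved chunks into a single prompt for the LLM."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context.\n\n{context}\n\nQuestion: {question}\nAnswer:"


# "Create and fill" the store: each chunk is kept alongside its vector.
chunks = [
    "The cellar was cold and damp.",
    "The motto of the family is carved above the door.",
    "He wore a cap with bells.",
]
store = [(c, embed(c)) for c in chunks]

question = "What is the motto of the family?"
top = retrieve(store, question, k=1)
prompt = stuff_prompt(top, question)
```

In the real pipeline the final `prompt` would be sent to the LLM; here it is only built, so the sketch runs without any API key.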

Note: for the time being you have to explicitly turn on this experimental flag on the cassio side:

In [5]:

                
import cassio
cassio.globals.enableExperimentalVectorSearch()

The following creates an "index creator", which knows about the type of vector store, the embedding to use and how to preprocess the input text:

In [6]:

                
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Cassandra,
    embedding=myEmbedding,
    text_splitter=CharacterTextSplitter(
        chunk_size=400,
        chunk_overlap=0,
    ),
    vectorstore_kwargs={
        'session': localSession,
        'keyspace': localKeyspace,
        'table_name': 'vs_test1',
    },
)

Load a local text (a short story by E. A. Poe will do):

In [7]:

                
loader = TextLoader('texts/amontillado.txt', encoding='utf8')

This takes a few seconds to run, as it must calculate embedding vectors for a number of chunks of the input text:

In [8]:

                
index = index_creator.from_loaders([loader])

Created a chunk of size 603, which is longer than the specified 400
Created a chunk of size 609, which is longer than the specified 400
Created a chunk of size 808, which is longer than the specified 400
Created a chunk of size 648, which is longer than the specified 400
Created a chunk of size 879, which is longer than the specified 400
Created a chunk of size 546, which is longer than the specified 400
Created a chunk of size 525, which is longer than the specified 400
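
The messages above are expected. A character splitter of this kind cuts the text only at its separator (a blank line by default) and then merges neighbouring pieces up to `chunk_size`, so a single paragraph longer than 400 characters is kept whole and reported. The following is a simplified illustration of that behaviour in plain Python, not LangChain's actual implementation; the function name and the sample text are made up.

```python
def split_text(text, chunk_size, separator="\n\n"):
    """Split on `separator`, then greedily merge pieces up to `chunk_size` characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
            continue
        if current:
            chunks.append(current)  # close the chunk built so far
        if len(piece) > chunk_size:
            # An oversized paragraph is never cut in the middle, only reported.
            print(f"Created a chunk of size {len(piece)}, "
                  f"which is longer than the specified {chunk_size}")
        current = piece
    if current:
        chunks.append(current)
    return chunks


# Four "paragraphs" of 10, 50, 5 and 5 characters, with a limit of 20.
sample = "\n\n".join(["a" * 10, "b" * 50, "c" * 5, "d" * 5])
sample_chunks = split_text(sample, chunk_size=20)
```

Here the 50-character paragraph becomes a chunk of its own despite the limit, while the two short paragraphs at the end are merged into one chunk.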

In [9]:

                
query = "Who is Luchesi?"
index.query(query)

Out[9]:

" Luchesi is a friend of Fortunato's who has a critical turn and is known for his taste in wine."

Spawning a "retriever" from the index

In [10]:

                
retriever = index.vectorstore.as_retriever(search_kwargs={
    'k': 2,
})

In [11]:

                
retriever.get_relevant_documents(
    "Check the motto of the Montresors"
)

Out[11]:

[Document(page_content='"A huge human foot d\'or, in a field azure; the foot crushes a serpent\nrampant whose fangs are imbedded in the heel."\n\n"And the motto?"\n\n"_Nemo me impune lacessit_."\n\n"Good!" he said.', metadata={'source': 'texts/amontillado.txt'}),
 Document(page_content='He raised it to his lips with a leer.  He paused and nodded to me\nfamiliarly, while his bells jingled.\n\n"I drink," he said, "to the buried that repose around us."\n\n"And I to your long life."\n\nHe again took my arm, and we proceeded.\n\n"These vaults," he said, "are extensive."\n\n"The Montresors," I replied, "were a great and numerous family."\n\n"I forget your arms."', metadata={'source': 'texts/amontillado.txt'})]