Local RAG

This is the local RAG system that Marco showcased during the last session of Meeting 2. It requires you to install Ollama and pull the nomic-embed-text model (used to embed the document chunks and queries) and the llama3 model (used to generate alternative queries and to answer questions).
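If you don't have the models yet, both can be pulled with the standard Ollama CLI (run these in a terminal, or with the ! prefix inside the notebook):

!ollama pull nomic-embed-text
!ollama pull llama3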

!pip install -q unstructured langchain
!pip install -q "unstructured[all-docs]"
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader
local_path = "some.pdf"    # insert your PDF filename here
loader = UnstructuredPDFLoader(file_path=local_path)
data = loader.load()
data[0].page_content    # raw text of the PDF; by default, Unstructured loads the whole document as a single Document
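As an optional sanity check (not part of the original demo), you can confirm that the PDF came back as a single Document and see how much text was extracted:

len(data)                    # 1: the default "single" mode returns one Document for the whole PDF
len(data[0].page_content)    # total number of characters extracted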
!pip install -q chromadb
!pip install -q langchain-text-splitters
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
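Optionally, check how many chunks the splitter produced; judging from the embedding progress bar below, the demo PDF yields four:

len(chunks)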
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model='nomic-embed-text', show_progress=True),
    collection_name='local-rag',
)
OllamaEmbeddings: 100%|███████████████████████████| 4/4 [00:01<00:00,  2.73it/s]
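Before putting an LLM on top, you can sanity-check retrieval directly against the vector store (a minimal sketch; the query string is just an example):

vector_db.similarity_search('What is this document about?', k=3)    # the 3 most similar chunks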
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever
local_model = 'llama3'
llm = ChatOllama(model=local_model)
# Wraps a message in the Llama 3 chat template; see the Llama 3 documentation for the special tokens
def make_prompt(message):
    return f'<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>'
QUERY_PROMPT = PromptTemplate(
    input_variables=['question'],
    template=make_prompt('You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search. Provide these alternative questions separated by newlines.\n\nOriginal question: {question}'),
)
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT,
)
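The multi-query step can also be tested in isolation; this sketch (the query string is again just an example) returns the deduplicated set of chunks retrieved across all five generated questions:

docs = retriever.invoke('What is this document about?')
len(docs)    # number of unique chunks retrieved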
template = make_prompt("Answer the question based ONLY on the following context:\n\n{context}\n\nQuestion: {question}")

prompt = ChatPromptTemplate.from_template(template)
chain = (
    {'context': retriever, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke('What is this document about?')
'Based on the provided context, this document appears to be a thesis or research paper in the field of Natural Language Processing (NLP). Specifically, it discusses the application of machine translation models, particularly transformer models, to translate text from Swedish to Northern Sámi. The document also touches on topics such as preprocessing data, attention mechanisms in transformers, and the challenges of working with low-resource language settings.'
chain.invoke('What languages are mentioned in the paper?')
'According to the context, the following languages are mentioned:\n\n1. Swedish\n2. Northern Sámi\n3. Norwegian\n4. Finnish'
chain.invoke('What is the BLEU score of the final Swedish-Sámi model?')
"According to the text, the results from the evaluation for each language pair's baseline and final model are presented in Table 1. For the Swedish-Sámi model, the BLEU score of the final model is:\n\n24.35"
chain.invoke('List the main learnings articulated by the author.')
"Based on the provided context, the main learnings articulated by the author are:\n\n1. The importance of quality in machine translation, as highlighted by the BLEU scores and the difficulty in evaluating the performance of the model.\n2. The value of preprocessing data to prepare it for training, including techniques such as removing duplicate sentences and using byte-pair-encoding and stemming.\n3. The transformer architecture's ability to utilize attention mechanisms to allow models to consider how words relate to each other in a sentence during translation.\n4. The complexity and resource requirements of the transformer model, including its large number of parameters and need for long training times.\n\nThese learnings were gained through hands-on experience with the openNMT framework and by reading papers on machine translation, such as Stenlund et al. (2023) and Vaswani et al. (2023)."