Code samples showing how to include data stored in Backblaze B2 in a RAG application

backblaze-b2-samples/ai-rag-examples

Retrieval-Augmented Generation (RAG) with Backblaze B2

Organizations that want the benefits of AI, and of large language models (LLMs) in particular, must guard against the risks of using public services such as OpenAI's ChatGPT. One solution is to run a private LLM: you select a model and can more safely provide it with private data as context for generating responses.

This repository contains sample code showing how to build a retrieval-augmented generation (RAG) conversational chatbot application that loads context data, in the form of PDFs, from a private Backblaze B2 Cloud Object Storage bucket.
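As a rough illustration of the "load PDFs from a private bucket" step, here is a minimal sketch that lists the PDF objects in a bucket through B2's S3-compatible API. It assumes boto3 as the client library; the notebooks may use a different loader. The helper names `make_b2_client` and `iter_pdf_keys` are hypothetical and do not come from this repository.

```python
from typing import Any, Iterator


def make_b2_client(endpoint_url: str, key_id: str, application_key: str) -> Any:
    """Create an S3 client pointed at a B2 S3-compatible endpoint (assumes boto3)."""
    import boto3  # imported lazily so the listing helper below has no hard dependency

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=key_id,
        aws_secret_access_key=application_key,
    )


def iter_pdf_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield the key of every object under `prefix` whose name ends in .pdf."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing carry no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(".pdf"):
                yield obj["Key"]
```

In use, the endpoint URL takes the form `https://s3.<region>.backblazeb2.com`, and the credentials are an application key ID and application key with read access to the bucket. Each key returned can then be fetched with `client.download_file(bucket, key, local_path)` before the PDF is parsed.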

There are two Jupyter notebooks:

  • gpt4all_demo.ipynb uses GPT4All to load a large language model (LLM) and answer a series of related questions without any custom context. This is a minimal example to show the basics of working with LLMs on your own machine.
  • rag_demo.ipynb uses the LangChain framework to build a retrieval-augmented generation (RAG) chain that loads context from PDF data stored in a Backblaze B2 bucket and implements a conversational chatbot that can include message history in generating responses.
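To make the shape of a RAG conversation concrete, the following is a toy sketch in plain Python rather than the LangChain code from rag_demo.ipynb. It assumes keyword overlap as a stand-in for embedding-based retrieval, and it takes the model as a plain callable so any LLM can be plugged in. The names `retrieve`, `build_prompt` and `Chatbot` are illustrative only.

```python
import re
from typing import Callable, List, Set, Tuple

Message = Tuple[str, str]  # (role, text)


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return up to k chunks sharing the most tokens with the question.

    Keyword overlap stands in for the vector similarity search a real
    RAG chain would run against an embedding store.
    """
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return [c for c in ranked[:k] if q & tokenize(c)]


def build_prompt(question: str, context: List[str], history: List[Message]) -> str:
    """Assemble retrieved context, prior messages, and the new question."""
    lines = ["Answer using only the context below.", "", "Context:"]
    lines += [f"- {c}" for c in context]
    lines += ["", "Conversation so far:"]
    lines += [f"{role}: {text}" for role, text in history]
    lines += ["", f"user: {question}", "assistant:"]
    return "\n".join(lines)


class Chatbot:
    """Keeps message history so follow-up questions see earlier turns."""

    def __init__(self, chunks: List[str], llm: Callable[[str], str]) -> None:
        self.chunks = chunks
        self.llm = llm
        self.history: List[Message] = []

    def ask(self, question: str) -> str:
        context = retrieve(question, self.chunks)
        answer = self.llm(build_prompt(question, context, self.history))
        # Record both sides of the turn for use in later prompts.
        self.history += [("user", question), ("assistant", answer)]
        return answer
```

Each call to `ask` retrieves context for the new question only, then passes the accumulated history to the model alongside it. That is the general pattern behind a chatbot that includes message history when generating responses.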

You can browse the notebooks on GitHub and see sample output, or run them yourself.

The webinar, Leveraging your Cloud Storage Data in AI/ML Apps and Services, shows the Python applications on which these notebooks are based:

Backblaze AI/ML Webinar on YouTube

Running the Notebooks

Both notebooks should run on any Jupyter-compatible platform:

JupyterLab Settings

If you are deploying JupyterLab on a virtual machine at a cloud provider, you will need to configure it to accept connections from the internet. Here is the configuration we set in ~/.jupyter/jupyter_server_config.py for this purpose:

# Allow requests where the Host header doesn't point to a local server
c.ServerApp.allow_remote_access = True

# The IP address the Jupyter server will listen on.
# 0.0.0.0 = all addresses
c.ServerApp.ip = '0.0.0.0'

# Allow access to hidden files
# Set to True to allow access to .venv
c.ContentsManager.allow_hidden = True
