Document Q&A with LLMs Locally


Context

  • Third-party commercial large language model (LLM) providers like OpenAI's GPT-4 have democratized LLM use via simple API calls.
  • However, teams may require self-managed or private model deployments for reasons such as data privacy and data residency rules.
  • The proliferation of open-source LLMs has opened up a vast range of options, reducing reliance on these third-party providers.
  • When we host open-source LLMs locally, on-premise or in the cloud, dedicated compute capacity becomes a key issue. While GPU instances may seem the obvious choice, the costs can easily skyrocket beyond budget.
  • In this project, we will discover how to run quantized versions of open-source LLMs with local CPU inference for document question answering (Q&A).

    (architecture diagram)

Quickstart

  • Ensure you have downloaded one of the models for answer generation (an example config sketch follows this list):

    • the GGUF binary file of the English model TheBloke/Llama-2-7b from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF, with MODEL_BIN_PATH pointing to its location and MODEL_TYPE: 'llama' in config/config.yml.
    • the GGML binary file of the Chinese-English bilingual model THUDM/chatglm3-6b from https://modelscope.cn/models/tiansz/chatglm3-6b-ggml/, with MODEL_BIN_PATH pointing to its location and MODEL_TYPE: 'chatglm_cpp' in config/config.yml. You can also quantize it yourself with:
      # see: https://github.com/li-plus/chatglm.cpp
      # converts the Hugging Face checkpoint to a 4-bit (q4_0) GGML binary
      python -m chatglm_cpp.convert -i THUDM/chatglm3-6b -t q4_0 -o models/chatglm3-6b-ggml.q4_0.bin
    • all files of the Chinese-English bilingual model THUDM/chatglm2-6b from https://huggingface.co/THUDM/chatglm2-6b-int4, with MODEL_BIN_PATH pointing to their location and MODEL_TYPE: 'chatglm' in config/config.yml.
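
    For the Llama 2 option, a minimal config/config.yml might look like the sketch below. Only MODEL_BIN_PATH and MODEL_TYPE are named in this README; the file name is an illustrative assumption, so check config/config.yml in the repository for the actual schema.

      # config/config.yml (sketch; only MODEL_BIN_PATH and MODEL_TYPE are keys confirmed by this README)
      MODEL_BIN_PATH: 'models/llama-2-7b-chat.Q4_K_M.gguf'  # path to the downloaded model binary (file name illustrative)
      MODEL_TYPE: 'llama'                                   # 'llama', 'chatglm_cpp', or 'chatglm' depending on the model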
  • Ensure you have downloaded one of the models for text embeddings:

    • the English model sentence-transformers/all-MiniLM-L6-v2, or
    • the multilingual model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (both are described under Tools below).

  • Put your documents in the data/ directory, then launch a terminal from the project directory and run the following command to build the index (a sketch of this indexing step appears after this list):

    python db_build.py
  • To pass a user query to the application, run python main.py "<user query>". For example:

      python main.py "What are the highlights of OpenText IDOL?"
      python main.py "Opentext的IDOL有什么亮点?"   # the same question, asked in Chinese

    (demo screenshot)
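
For orientation, the indexing step performed by db_build.py can be sketched with the classic LangChain API as follows. This is a hedged sketch rather than the repository's actual script: the loader, chunk sizes, and the all-MiniLM-L6-v2 embedding model are illustrative choices based on the Tools section below, and the vectorstore/ output directory comes from the Files and Content section.

    # Sketch of a db_build.py-style indexing pipeline (illustrative, not the repo's exact code)
    from langchain.document_loaders import DirectoryLoader, PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS

    # Load every PDF under data/ (the README says to put your docs there)
    loader = DirectoryLoader("data/", glob="*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()

    # Split long documents into overlapping chunks; sizes here are illustrative
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(documents)

    # Embed chunks with the English Sentence-Transformers model listed under Tools
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # Build and persist the FAISS vector store that main.py will query later
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local("vectorstore")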


Tools

  • LangChain: Framework for developing applications powered by language models (a sketch of how these tools fit together follows this list)
  • C Transformers: Python bindings for Transformer models implemented in C/C++ using the GGML/GGUF library
  • ChatGLM.cpp: C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B and more LLMs for real-time chatting on your MacBook.
  • IDOL: Commercial enterprise search engine with vector index and search capabilities.
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors.
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer English model that embeds text into a 384-dimensional dense vector space for tasks like clustering or semantic search.
  • Sentence-Transformers (paraphrase-multilingual-MiniLM-L12-v2): Open-source pre-trained transformer multilingual model that embeds text into a 384-dimensional dense vector space for tasks like clustering or semantic search.
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.
  • ChatGLM3-6B: Open-source Chinese-English bilingual model popular in the Chinese community.
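
To make the division of labor concrete, here is a hedged sketch of how these tools could be composed at query time in the classic LangChain style. It is not the repository's actual src/llm.py or main.py code: the model path, generation settings, and retriever parameters are illustrative assumptions.

    # Sketch of the query path (illustrative; see src/llm.py and main.py for the real code)
    from langchain.llms import CTransformers
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    # Quantized Llama 2 chat model served on CPU via C Transformers (path is illustrative)
    llm = CTransformers(
        model="models/llama-2-7b-chat.Q4_K_M.gguf",
        model_type="llama",
        config={"max_new_tokens": 256, "temperature": 0.1},
    )

    # Reopen the FAISS store built by db_build.py, using the same embedding model
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    db = FAISS.load_local("vectorstore", embeddings)

    # Stuff the top-k retrieved chunks into the prompt and let the LLM answer
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 2}),
        return_source_documents=True,
    )

    result = qa({"query": "What are the highlights of OpenText IDOL?"})
    print(result["result"])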

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
  • /models: Binary file of the GGML- or GGUF-quantized LLM (e.g., Llama-2-7B-Chat)
  • /src: Python code for the key components of the LLM application, namely llm.py, utils.py, and prompts.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest dataset and generate FAISS vector store
  • main.py: Main Python script to launch the application and pass a user query via the command line
  • requirements.txt: List of Python dependencies (and version)
