WIP add any github repo #1

Open: wants to merge 3 commits into base: main
18 changes: 18 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,18 @@
+FROM python:3.10
+
+
+# Add non-root user
+ARG USERNAME=nonroot
+RUN groupadd --gid 1000 $USERNAME && \
+    useradd --uid 1000 --gid 1000 -m $USERNAME
+## Make sure to reflect new user in PATH
+ENV PATH="/home/${USERNAME}/.local/bin:${PATH}"
+USER $USERNAME
+
+# Upgrade pip
+RUN pip install --upgrade pip
+
+# Install production and dev dependencies
+COPY --chown=nonroot:1000 requirements.txt /tmp/requirements.txt
+RUN pip install --no-cache-dir -r /tmp/requirements.txt && \
+    rm /tmp/requirements.txt
29 changes: 29 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,29 @@
+{
+    "build": {
+        "dockerfile": "Dockerfile",
+        "context": ".."
+    },
+    "remoteUser": "nonroot",
+    "portsAttributes": {
+        "5005": {
+            "label": "flask",
+            "onAutoForward": "openBrowser"
+        }
+    },
+    "customizations": {
+        "vscode": {
+            "extensions": [
+                "ms-python.python",
+                "ms-azuretools.vscode-docker",
+                "github.copilot"
+            ],
+            "settings": {
+                "terminal.integrated.defaultProfile.linux": "bash"
+            }
+        }
+    },
+
+    "forwardPorts": [
+        5005
+    ]
+}
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
-.env
+.env
+repos
7 changes: 0 additions & 7 deletions Github.code-workspace

This file was deleted.

5 changes: 2 additions & 3 deletions app.py
@@ -1,6 +1,5 @@
 from flask import Flask, render_template, request, jsonify
 import os
-import getpass
 from langchain.embeddings.openai import OpenAIEmbeddings
 from langchain.vectorstores import DeepLake
 from langchain.chat_models import ChatOpenAI
@@ -16,7 +15,7 @@
 ACTIVELOOP_TOKEN = os.getenv('ACTIVELOOP_TOKEN')
 
 embeddings = OpenAIEmbeddings(disallowed_special=())
-db = DeepLake(dataset_path="hub://davitbun/twitter-algorithm", read_only=True, embedding_function=embeddings)
+db = DeepLake(dataset_path="hub://theodoremeynard/ddataflow", read_only=True, embedding_function=embeddings)
 retriever = db.as_retriever()
 retriever.search_kwargs['distance_metric'] = 'cos'
 retriever.search_kwargs['fetch_k'] = 100
@@ -50,4 +49,4 @@ def ask_question():
     return jsonify({"question": question, "answer": result['answer']})
 
 if __name__ == "__main__":
-    app.run(debug=True)
+    app.run(debug=True,port=5005)
21 changes: 18 additions & 3 deletions readme.md
@@ -1,6 +1,6 @@
-# Twitter Algorithm Chatbot
+# Algorithm Chatbot
 
-This is a simple chatbot that can answer questions about the [Twitter algorithm](https://github.com/twitter/the-algorithm). It is built using Python, HTML, CSS, and JavaScript.
+This is a simple chatbot that can answer questions about an algorithm like [twitter algorithm](https://github.com/twitter/the-algorithm). It is built using Python, HTML, CSS, and JavaScript.
 
 Please note that since we are using GPT-4, the response times will be higher and every query will cost more than GPT-3.5.
 
@@ -25,7 +25,22 @@ cd GitGPT
 python app.py
 ```
 
-The chatbot interface will appear, allowing you to ask questions about the Twitter algorithm.
+You can also directly use a devcontainer in vscode by clicking on the icon below
+
+[
+![Open in Remote - Containers](
+https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode
+)
+](
+https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/theodoremeynard/GitGPT
+)
+
+and then you just need to run
+```bash
+python app.py
+```
+
+The chatbot interface will appear, allowing you to ask questions about the algorithm.
 
 Enter your question in the input field and click the "Send" button or press the "Enter" key to submit your query. The chatbot will display a "Thinking..." message while it processes your request, and then it will display the answer to your question.
 
7 changes: 5 additions & 2 deletions requirements.txt
@@ -1,3 +1,6 @@
 Flask==2.0.1
-langchain==0.0.12
-dotenv==0.17.1
+langchain==0.0.170
+python-dotenv==1.0.0
+openai==0.27.6
+deeplake==3.4.3
+tiktoken==0.4.0
2 changes: 1 addition & 1 deletion templates/index.html
@@ -11,7 +11,7 @@
     <div id="chat-container">
         <div id="messages"></div>
         <div id="input-container">
-            <input id="query" type="text" placeholder="Ask me anything about the Twitter algorithm..."><!-- input messageL -->
+            <input id="query" type="text" placeholder="Ask me anything about the code..."><!-- input messageL -->
             <button id="send">Send</button>
         </div>
     </div>
54 changes: 54 additions & 0 deletions upload.py
@@ -0,0 +1,54 @@
+import subprocess
+import os
+from langchain.document_loaders import TextLoader
+from langchain.text_splitter import CharacterTextSplitter
+from langchain.embeddings.openai import OpenAIEmbeddings
+from langchain.vectorstores import DeepLake
+
+from dotenv import load_dotenv
+
+load_dotenv()
+
+def clone_repo(repo_url, location=""):
+    """
+    Clone a git repository into a specified location.
+
+    :param repo_url: The URL of the repository to clone.
+    :param location: The location to clone the repository into. Default is the current directory.
+    """
+    subprocess.run(["git", "clone", repo_url, location])
+
+def prepare_data(root_dir):
+    """
+    Prepare data from a root directory
+    """
+    docs = []
+    for dirpath, _, filenames in os.walk(root_dir):
+        for file in filenames:
+            if file.endswith('.py') or file.endswith(".md"):
+                try:
+                    loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
+                    docs.extend(loader.load_and_split())
+                except Exception as e:
+                    pass
+    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+    texts = text_splitter.split_documents(docs)
+    print(f"{len(texts)}")
+    return texts
+
+def push_data_to_deeplake(texts, dataset_path):
+    """
+    Push data to deeplake
+    """
+    embeddings = OpenAIEmbeddings()
+
+    db = DeepLake.from_documents(texts, embeddings, dataset_path=dataset_path)
+    return db
+
+
+
+# example usage
+if __name__ == "__main__":
+    clone_repo("https://github.com/getyourguide/DDataFlow.git", "./repos/DDataFlow")
+    texts = prepare_data("./repos/DDataFlow")
+    db = push_data_to_deeplake(texts, "hub://theodoremeynard/ddataflow")
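
A note on the file-walking step in `prepare_data`: the `except Exception: pass` branch means a file that fails to load is dropped without any trace. The following is a minimal standalone sketch of one way to make those failures visible. It is not part of this PR, it uses only the standard library rather than `TextLoader`, and the names `collect_source_files` and `read_text_files` are hypothetical helpers introduced for illustration.

```python
import os
from typing import Dict, List, Tuple


def collect_source_files(
    root_dir: str, extensions: Tuple[str, ...] = (".py", ".md")
) -> List[str]:
    """Return sorted paths under root_dir whose names end with one of the extensions."""
    matches = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            # str.endswith accepts a tuple, so one call covers every extension.
            if name.endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def read_text_files(paths: List[str]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Read each path as UTF-8 and report failures instead of hiding them.

    Returns (contents keyed by path, list of (path, error message)).
    """
    contents: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                contents[path] = handle.read()
        except (OSError, UnicodeDecodeError) as exc:
            # Record the failure so the caller can log or inspect it.
            failures.append((path, str(exc)))
    return contents, failures
```

The caller can then print or log `failures` before splitting and uploading, which makes it easier to see why the chunk count printed by `prepare_data` is lower than expected. Passing the extensions as a parameter would also let the same script index repositories that are not Python-only.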