Update 'Using_Pinecone_for_embeddings_search.ipynb' to current APIs #1355

Open

sheldonrampton

Summary

This PR updates the Pinecone example notebook, "Using_Pinecone_for_embeddings_search.ipynb", to use the current Pinecone and OpenAI APIs. It also fixes a mismatch between the embedding model specified in the notebook and the model that was used to create the embeddings file the notebook downloads.

Motivation

The Pinecone and OpenAI APIs that were used to create the notebook have both been revised since the notebook was created. I noticed this when I tried using the code and encountered error messages.

Besides the deprecated API call syntax, I noticed a mismatch between the embedding model specified in the notebook (text-embedding-3-small) and the model that was used to create the embeddings file referenced at embeddings_url: https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip

The embeddings file was created using text-embedding-ada-002 as its embedding model. As a result, running the query_article() function with text-embedding-3-small query embeddings produces nonsense results. Here are the results I got when I searched for "modern art in Europe" in the "title" namespace:

General Dynamics F-16 Fighting Falcon (score = 0.0341419838)
Mikoyan-Gurevich MiG-17 (score = 0.0325526334)
The Good, the Bad and the Ugly (score = 0.0281740129)
Mikoyan-Gurevich MiG-15 (score = 0.0260391217)
Musical genre (score = 0.0248822626)

And here are the results I got when I searched for "Famous battles in Scottish history" in the "content" namespace:

585 BC (score = 0.0467720367)
Order of the British Empire (score = 0.0448796861)
40s BC (score = 0.0444191061)
Order of the Bath (score = 0.0433623493)
Julius Caesar (score = 0.0405869484)
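
One plausible reading of scores this close to zero: text-embedding-ada-002 and text-embedding-3-small both return 1536-dimensional vectors by default, so the index accepts the query without a dimension error, but vectors from two unrelated embedding spaces behave roughly like random directions relative to each other. The sketch below illustrates only that geometric point with random Gaussian vectors; it assumes a cosine-similarity index and does not call either model.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


DIM = 1536  # default output size of both models named in this PR
rng = random.Random(0)  # fixed seed so the run is repeatable


def random_vec():
    """A random direction, standing in for an embedding from an unrelated space."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


query_vec = random_vec()                       # stand-in for the query embedding
stored = [random_vec() for _ in range(5)]      # stand-ins for stored embeddings
scores = sorted((cosine(query_vec, v) for v in stored), reverse=True)

for s in scores:
    print(f"score = {s:.4f}")
```

In 1536 dimensions, unrelated directions typically have cosine similarity within a few hundredths of zero, which is the same order of magnitude as the scores listed above.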

Once I switched back to the older text-embedding-ada-002 embedding model, the notebook produced correct results. The notebook should therefore use text-embedding-ada-002, or else the file vector_database_wikipedia_articles_embedded.zip should be regenerated with the newer embedding model.
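
A minimal sketch of the first option, pinning the query-time model to the one that produced the stored vectors. This is not the notebook's actual query_article(): the signature, the `embed` callable, and `FakeIndex` are hypothetical stand-ins so the example runs offline, and the dict-style response shape is an assumption modeled on Pinecone-style query results.

```python
# Must match the model that produced the stored vectors (ada-002, per this PR).
EMBEDDING_MODEL = "text-embedding-ada-002"


def query_article(query, namespace, embed, index, top_k=5):
    """Embed `query` with the pinned model and return (id, score) pairs.

    embed: hypothetical callable (text, model) -> list of floats.
    index: any object exposing query(vector=..., namespace=..., top_k=...).
    """
    vector = embed(query, EMBEDDING_MODEL)
    result = index.query(vector=vector, namespace=namespace, top_k=top_k)
    return [(m["id"], m["score"]) for m in result["matches"]]


class FakeIndex:
    """Offline stand-in for a vector index, for demonstration only."""

    def query(self, vector, namespace, top_k):
        matches = [{"id": "1", "score": 0.9}]
        return {"matches": matches[:top_k]}


calls = []  # records which model name each embedding request used


def fake_embed(text, model):
    """Offline stand-in for an embeddings API call."""
    calls.append(model)
    return [0.0, 1.0]


hits = query_article("modern art in Europe", "title", fake_embed, FakeIndex())
print(hits)
```

With real clients, `embed` could wrap `client.embeddings.create(input=text, model=model).data[0].embedding` from the openai 1.x package, and `index` could be the object returned by `Pinecone(api_key=...).Index(...)`. Keeping the model name in one constant makes it harder for the query path and the stored data to drift apart again.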
