Create Embedding Microservice Skeleton #4557

Open
albertisfu opened this issue Oct 11, 2024 · 8 comments

Comments

@albertisfu
Contributor

This is a follow-up to #4530.

Here are the tasks I think we should discuss before creating the corresponding PR:

1.1 Create Docker Skeleton for the Microservice

  • Decide between Django or FastAPI
  • Create the Docker structure following the practices used in our other projects; we can use Doctor as a base
  • Determine whether to add this microservice container to our docker-compose
  • Consider the model size: at ~2.7GB (as pointed out by @legaltextai), it may be too large for dev environments, so we have some options:
    • a. Is it possible to load a lightweight version of the model in development just for testing?
    • b. Use a microservice mock for dev/testing purposes
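For option (b), one possible shape for the mock is a function that returns deterministic pseudo-embeddings without loading any model. This is only a sketch under assumptions: the name `mock_embedding` and the default dimension of 8 are hypothetical, and the real service's vector size and response format have not been decided in this thread. The vectors carry no semantic meaning; they only let the rest of the pipeline run in dev and tests.

```python
import hashlib
import math
import struct


def mock_embedding(text: str, dim: int = 8) -> list[float]:
    """Return a deterministic, unit-length pseudo-embedding for `text`.

    Hypothetical stand-in for the real model: the same text always maps
    to the same vector, but similar texts do NOT get similar vectors.
    """
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Hash the text with a counter so we can produce as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Read the 32-byte digest as eight unsigned 32-bit integers.
        for (chunk,) in struct.iter_unpack(">I", digest):
            # Scale each integer into the range [-1, 1].
            values.append(chunk / 0xFFFFFFFF * 2.0 - 1.0)
        counter += 1
    values = values[:dim]
    # Normalize to unit length, as many embedding models do (assumption).
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Because the output depends only on the input text, tests can assert on stored vectors without pinning model weights. The trade-off is that search quality cannot be checked with it, which is the argument for option (a).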

Some questions for you, @legaltextai, so we can make some decisions here:

  • Do you feel more comfortable working with FastAPI?
  • Are you familiar with testing in FastAPI?
  • Do you know if it's possible to create a smaller version of the model, with fewer parameters than the production model, that takes up only a couple of hundred MB and can be used in development just to confirm everything is working? The alternative is to mock the embedding microservice for testing and development, but a small model would be better because it would let us keep the whole context in development, even if it doesn't return great results.

Additional question:

  • What should we call the microservice, so we can name the repository and project? @mlissner, do you have any ideas on this?

The output of this issue would be:

@mlissner
Member

I can't think of any good names and ChatGPT wasn't much help today. Maybe just vectorizer?

@mlissner
Member

Maybe dreamview, dreamviewer or dreamspace would be an OK name? There's something dreamy about trying to envision the multi-dimensional space created by vectorization, and the idea that dreams are a window into your inner thinking is similar to how embeddings find the inner meaning of texts.

@legaltextai
Contributor

Let's call it Inception or Interstellar.

@mlissner
Member

Both excellent movies, but go on?

@legaltextai
Contributor

Well, why don't you share the reasoning that went into the "Doctor" name?

@mlissner
Member

That was a surprisingly long conversation, but it does things to DOCuments, so DOCtor?

@legaltextai
Contributor

> Both excellent movies, but go on?

Multidimensional space, a cosmos vast,
Where thoughts and dreams intertwine.

Interstellar voids, a starlit path,
Guiding earthward our design.

Corn fields chase, golden hues,
Midwest bounty, nature's shrine.

Indiana's soil, rich and deep,
Nurtures adventures yet undefined.

Indiana Jones, with hat and whip,
Quests for treasures hard to find.

Holy Grail, the sacred cup,
Like law's wisdom, aged and fine.

Wine of knowledge, centuries old,
In this elixir, justice we enshrine.

To sum it all up: in vino veritas. Let's call it "wine".

@legaltextai
Contributor

legaltextai commented Oct 14, 2024

> That was a surprisingly long conversation, but it does things to DOCuments, so DOCtor?

Using that analogy, someone who helps find knowledge... We can call it Wong. A librarian like no other.
