# Add GPT-NeoX and SAE #791

Base: `main`
```diff
@@ -222,6 +222,13 @@ export const MODEL_LIBRARIES_UI_ELEMENTS = {
 		filter: false,
 		countDownloads: `path:"checkpoints/byt5_model.pt"`,
 	},
+	"gpt-neox": {
+		prettyLabel: "GPT-NeoX",
+		repoName: "GPT-NeoX",
+		repoUrl: "https://github.com/EleutherAI/gpt-neox",
+		filter: false,
+		countDownloads: `path:"config.json"`,
+	},
 	grok: {
 		prettyLabel: "Grok",
 		repoName: "Grok",
```

**Reviewer** (on the `"gpt-neox"` entry): It might be easier to split the PR into the two different libraries.
```diff
@@ -358,6 +365,13 @@ export const MODEL_LIBRARIES_UI_ELEMENTS = {
 		filter: false,
 		countDownloads: `path:"tokenizer.model"`,
 	},
+	"sae": {
+		prettyLabel: "SAE",
+		repoName: "sae",
+		repoUrl: "https://github.com/EleutherAI/sae",
+		filter: false,
+		countDownloads: `path_extension:"safetensors" OR path_extension:"bin"`,
+	},
```

**Reviewer** (on the `"sae"` entry): I haven't seen any models on the Hub tracked as […]

**Reviewer:** Yes, it would be great to add […]

**Author:** @Wauplin Yes, we have trained and uploaded a bunch of models using this library (and are actively working on more), but I figured we should add the connection to the back-end first and then add the tag to the models? Is that not the preferred order?

**Reviewer:** We usually recommend tagging at least a few models before merging the back-end PR. There is no hard requirement, but asking for it ensures that the library definition will actually be used. We don't want to add libraries and then realize model authors forgot to update their library tag. Once the PR is merged, we know we don't have to check anything else afterwards.

**Reviewer** (on the `countDownloads` line): Two questions: […]

**Reviewer:** Maybe let's use the […]

**Author:** Like the other example, this was me guessing. To make sure I understand best practices, we're looking for a file path pattern that […] Is that correct? Perhaps the easiest solution is to just create a new metadata file which (for now, at least) primarily exists to support the integration.

**Reviewer:** To be more precise, we are looking for a file path pattern that: […] For example, on Meta-Llama-3.1-8B-Instruct-GGUF, any download of a […]
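The `countDownloads` debate above comes down to which file paths a query like `path_extension:"safetensors" OR path_extension:"bin"` would actually count. As a rough illustration of the intended semantics (this is not the Hub's real implementation, and the helper name is made up):

```python
# Hypothetical sketch of the countDownloads filter semantics discussed above.
# NOT the Hub's actual implementation -- only an illustration of which file
# paths a `path_extension:"safetensors" OR path_extension:"bin"` query counts.

def matches_count_downloads(path: str, extensions=("safetensors", "bin")) -> bool:
    """Return True if the file's extension is one of the counted extensions."""
    ext = path.rsplit(".", 1)[-1] if "." in path else ""
    return ext in extensions

# Files that would be counted vs. ignored under that query:
assert matches_count_downloads("layers.10/sae.safetensors")
assert matches_count_downloads("pytorch_model.bin")
assert not matches_count_downloads("config.json")
assert not matches_count_downloads("README.md")
```

The point of the extension-based filter (as opposed to a fixed path like `path:"config.json"`) is that it keeps counting correctly if the repo layout changes, at the cost of matching any weights file rather than one canonical entry point.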
**Reviewer:** If you want, you can also provide a snippet to let users know how to instantiate the models. The snippets are usually generated using the […]

```python
from sae import Sae

# Load specific layer
sae = Sae.load_from_hub("EleutherAI/sae-llama-3-8b-32x", layer=10)

# Load all layers
saes = Sae.load_many_from_hub("EleutherAI/sae-llama-3-8b-32x")
saes["layer_10"]
```

It is not necessary to explain/document everything, since the SAE GitHub repo is better suited for that, but having the "Use this model" button on your repos could help get some traction.

**Author:** This sounds like a great idea. Tagging @norabelrose for suggested example code.

```diff
 	"sample-factory": {
 		prettyLabel: "sample-factory",
 		repoName: "sample-factory",
```
**Reviewer:** I noticed that `gpt_neox` has 3919 models while `gpt-neox` has 33 models. For consistency with other libraries, naming it `gpt-neox` as you did makes more sense, but it will mean most GPT-NeoX models will not be listed. A solution is to open a PR on all 3.9k models to update their metadata in the model card (we can provide a script for that), but it might be a bit too much. Any ideas, @osanseviero?

**Reviewer:** This PR is actually a bit problematic. As we have 4k models with the `gpt-neox` tag (automatically determined due to `model_type` in `config.json`), this will lead to some issues. I wonder if we could explore a different name, `gpt-neox-original` or something like that, for the tag (to be added to all repos that can run inference with the `gpt-neox` library). WDYT?

**Author:** As a side note, the above two comments are contradictory; to be clear, it looks to me like `gpt-neox` is the small one and `gpt_neox` is the big one. I'm not deeply attached to either stylism. If we can rename the 33 models and use a slightly different tag, that's fine with me. I think the core question is whether y'all like style consistency more than you dislike making a very large number of small edits. That said, if most of these models are getting automatically tagged, there may need to be a CI job that runs on new models to correct the tag each time. Would it be possible to hack the GPT-NeoX model class definition so that using `gpt_neox` induces the metadata entry `gpt-neox`? Or is that too wonky a patch? I assume changing the config name is non-viable at this point in time.
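The bulk-retag option floated above ("we can provide a script for that") is not part of this PR; as a rough sketch of what it might look like, the following uses `huggingface_hub`'s `list_models` and `metadata_update` (with `create_pr=True`, so each change lands as a reviewable PR rather than a direct commit). The function names, the tag-swap rule, and the decision to touch `tags` at all are assumptions, not the Hub team's actual script:

```python
def retag(tags, old="gpt_neox", new="gpt-neox"):
    """Pure helper: swap the old library tag for the new one, keeping order."""
    return [new if t == old else t for t in (tags or [])]


def open_retag_prs(token=None):
    """Open a metadata PR on every model currently tagged `gpt_neox`.

    Hypothetical sketch of the bulk-retag script discussed above, not an
    official tool. Requires `huggingface_hub`; imported lazily so the pure
    `retag` helper stays usable without it.
    """
    from huggingface_hub import HfApi, metadata_update

    api = HfApi(token=token)
    for model in api.list_models(filter="gpt_neox"):  # ~3.9k repos
        metadata_update(
            model.id,
            {"tags": retag(model.tags)},
            overwrite=True,
            create_pr=True,  # open a PR instead of pushing directly
            token=token,
        )
```

Separating the pure `retag` rule from the network loop keeps the rename logic trivially testable; whether renaming the tag is even the right move (versus a distinct tag like `gpt-neox-original`) is exactly the open question in this thread.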