[Feature request] Papers API #2553

nbroad1881 · 2024-09-18T14:44:40Z

I can do the following to search for papers: curl 'https://huggingface.co/api/papers/search?q=attention'

And I get this:

[{"id":"2409.07146","title":"Gated Slot Attention for Efficient Linear-Time Sequence Modeling","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.07146.png","source":"hf"},{"id":"2409.03752","title":"Attention Heads of Large Language Models: A Survey","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.03752.png","source":"hf"},{"id":"2409.00391","title":"Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.00391.png","source":"hf"},{"id":"2408.10945","title":"HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2408.10945.png","source":"hf"},{"id":"2408.12588","title":"Real-Time Video Generation with Pyramid Attention Broadcast","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2408.12588.png","source":"hf"},{"id":"2408.11237","title":"Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2408.11237.png","source":"hf"},{"id":"2408.00760","title":"Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2408.00760.png","source":"hf"},{"id":"2407.16291","title":"TAPTRv2: Attention-based Position Update Improves Tracking Any Point","thumbnailUrl":"https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2407.16291.png","source":"hf"}]

I can do this to get papers for a certain date:

curl 'https://huggingface.co/api/papers/search?date=2024-09-17'
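
For reference, the two curl calls above can be reproduced from Python with only the standard library. This is a minimal sketch: the helper names `build_search_url` and `search_papers` are made up for illustration and are not part of the hub client, and only the `q` and `date` parameters shown above are known to be accepted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl examples above.
SEARCH_URL = "https://huggingface.co/api/papers/search"


def build_search_url(**params):
    """Build the search URL, skipping parameters whose value is None."""
    query = {key: value for key, value in params.items() if value is not None}
    return f"{SEARCH_URL}?{urlencode(query)}" if query else SEARCH_URL


def search_papers(**params):
    """Call the endpoint and decode the JSON list it returns (hypothetical helper)."""
    with urlopen(build_search_url(**params), timeout=10) as response:
        return json.load(response)
```

Calling `search_papers(q="attention")` should return the same list of dicts as the first curl command, each with `id`, `title`, `thumbnailUrl` and `source` keys.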

I don't know what other query parameters I can use. It would be nice to have this as part of the hub client.

@Wauplin


Wauplin · 2024-09-18T14:57:22Z

So basically a list_papers method similar to list_models/list_datasets/list_spaces but with different query parameters, correct? What would your use case be?

We could start with a simple list_papers method with a single query parameter (date?) returning a PaperInfo dataclass and then expand to other parameters. We haven't had many requests for the Paper API yet, but maybe it's time to look into it :)
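
A rough sketch of what that starting point might look like, to make the idea concrete. Everything here is an assumption rather than the actual hub client API: the `PaperInfo` fields simply mirror the keys in the search response above, the `date` query is assumed to return the same shape, and the HTTP call is passed in as a `fetch` callable so the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class PaperInfo:
    """Hypothetical container; fields mirror the search response keys."""

    id: str
    title: Optional[str] = None
    thumbnail_url: Optional[str] = None
    source: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PaperInfo":
        # Only "id" is treated as required; the other keys are optional.
        return cls(
            id=data["id"],
            title=data.get("title"),
            thumbnail_url=data.get("thumbnailUrl"),
            source=data.get("source"),
        )


def list_papers(
    fetch: Callable[[Dict[str, str]], List[Dict[str, Any]]],
    *,
    date: Optional[str] = None,
) -> Iterator[PaperInfo]:
    """Yield PaperInfo objects for one date (single query parameter for now)."""
    params = {"date": date} if date is not None else {}
    for item in fetch(params):
        yield PaperInfo.from_dict(item)
```

Keeping `date` keyword-only leaves room to add more query parameters later without breaking callers.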

WizKnight · 2024-09-18T17:09:48Z

Hey @Wauplin :), I'm interested in working on this!

Raghul-M · 2024-09-18T20:01:02Z

Hey @Wauplin :), I'm interested in working on this! Can I get more context on this issue?

Wauplin · 2024-09-19T12:50:58Z

Thank you everyone for proposing your help and even creating a PR @hlky. I was not expecting such enthusiasm on this feature! 🤗 I'll need some time to check which direction we want to take and review the PR. I'll keep you posted :)

hlky · 2024-09-19T19:54:07Z

Just to note: for results from the search path, e.g. https://huggingface.co/api/papers/search?q=attention, we can access the full metadata of a paper at https://huggingface.co/api/papers/{paper_id}, e.g. https://huggingface.co/api/papers/2408.12588.

This almost matches the Paper dataclass/"paper" from the daily_papers endpoint, except that it includes submittedOnDailyAt and submittedOnDailyBy, which are not part of Paper/"paper" in the daily_papers endpoint; there they appear as publishedAt and submittedBy in DailyPaper, respectively. numComments is also missing from papers/{paper_id}.
Example:

daily_paper:

{
    "paper": {
        "id": "2409.11074",
        "authors": [
            {
                "_id": "66ead57361228b02f8144cdf",
                "name": "Adrian Cosma",
                "hidden": false
            },
            {
                "_id": "66ead57361228b02f8144ce0",
                "name": "Ana-Maria Bucur",
                "hidden": false
            },
            {
                "_id": "66ead57361228b02f8144ce1",
                "name": "Emilian Radoi",
                "hidden": false
            }
        ],
        "publishedAt": "2024-09-17T11:03:46.000Z",
        "title": "RoMath: A Mathematical Reasoning Benchmark in Romanian",
        "summary": "Mathematics has long been conveyed through natural language, primarily for\nhuman understanding. With the rise of mechanized mathematics and proof\nassistants, there is a growing need to understand informal mathematical text,\nyet most existing benchmarks focus solely on English, overlooking other\nlanguages. This paper introduces RoMath, a Romanian mathematical reasoning\nbenchmark suite comprising three datasets: RoMath-Baccalaureate,\nRoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical\ndomains and difficulty levels, aiming to improve non-English language models\nand promote multilingual AI development. By focusing on Romanian, a\nlow-resource language with unique linguistic features, RoMath addresses the\nlimitations of Anglo-centric models and emphasizes the need for dedicated\nresources beyond simple automatic translation. We benchmark several open-weight\nlanguage models, highlighting the importance of creating resources for\nunderrepresented languages. We make the code and dataset available.",
        "upvotes": 1,
        "discussionId": "66ead57461228b02f8144d31"
    },
    "publishedAt": "2024-09-19T17:17:31.279Z",
    "title": "RoMath: A Mathematical Reasoning Benchmark in Romanian",
    "thumbnail": "https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.11074.png",
    "numComments": 1,
    "submittedBy": {
        "avatarUrl": "/avatars/1208629f14f010dbc2cd94f3c30f9baf.svg",
        "fullname": "JB D.",
        "name": "IAMJB",
        "type": "user",
        "isPro": false,
        "isHf": true,
        "isMod": false
    }
}

papers/{paper_id}:

{
    "id": "2409.11074",
    "authors": [
        {
            "_id": "66ead57361228b02f8144cdf",
            "name": "Adrian Cosma",
            "hidden": false
        },
        {
            "_id": "66ead57361228b02f8144ce0",
            "name": "Ana-Maria Bucur",
            "hidden": false
        },
        {
            "_id": "66ead57361228b02f8144ce1",
            "name": "Emilian Radoi",
            "hidden": false
        }
    ],
    "publishedAt": "2024-09-17T11:03:46.000Z",
    "submittedOnDailyAt": "2024-09-19T17:17:31.279Z",
    "title": "RoMath: A Mathematical Reasoning Benchmark in Romanian",
    "submittedOnDailyBy": {
        "_id": "62716952bcef985363db8485",
        "avatarUrl": "/avatars/1208629f14f010dbc2cd94f3c30f9baf.svg",
        "isPro": false,
        "fullname": "JB D.",
        "user": "IAMJB",
        "type": "user"
    },
    "summary": "Mathematics has long been conveyed through natural language, primarily for\nhuman understanding. With the rise of mechanized mathematics and proof\nassistants, there is a growing need to understand informal mathematical text,\nyet most existing benchmarks focus solely on English, overlooking other\nlanguages. This paper introduces RoMath, a Romanian mathematical reasoning\nbenchmark suite comprising three datasets: RoMath-Baccalaureate,\nRoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical\ndomains and difficulty levels, aiming to improve non-English language models\nand promote multilingual AI development. By focusing on Romanian, a\nlow-resource language with unique linguistic features, RoMath addresses the\nlimitations of Anglo-centric models and emphasizes the need for dedicated\nresources beyond simple automatic translation. We benchmark several open-weight\nlanguage models, highlighting the importance of creating resources for\nunderrepresented languages. We make the code and dataset available.",
    "upvotes": 1,
    "discussionId": "66ead57461228b02f8144d31"
}
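
To illustrate the mapping between the two shapes, here is a minimal sketch that folds either payload into one flat record. The function name `normalize_paper` is hypothetical, and the choice to use the papers/{paper_id} key names as the common form is just one option. Note that the submitter objects also differ slightly between the two examples above ("name" vs "user", and "isHf"/"isMod" only appear in the daily_papers entry); the sketch passes them through unchanged.

```python
from typing import Any, Dict


def normalize_paper(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a flat record using the papers/{paper_id} key names."""
    if "paper" in payload:
        # daily_papers entry: paper metadata is nested, and the submission
        # fields live on the outer object under different names.
        record = dict(payload["paper"])
        record["submittedOnDailyAt"] = payload.get("publishedAt")
        record["submittedOnDailyBy"] = payload.get("submittedBy")
        record["numComments"] = payload.get("numComments")
    else:
        # papers/{paper_id} response: already flat, but has no numComments.
        record = dict(payload)
        record.setdefault("numComments", None)
    return record
```

With the two example payloads above, both calls give the same `id`, `publishedAt` and `submittedOnDailyAt` values; only `numComments` and the submitter details differ.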

Wauplin added the "good first issue" and "enhancement" labels Sep 18, 2024

hlky linked a pull request Sep 19, 2024 that will close this issue

Daily Papers API #2554

Open
