Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

005 switch to using httr2 #6

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Conversation

danymukesha
Copy link
Collaborator

@danymukesha danymukesha commented Aug 28, 2024

I switched from httr to httr2 for the function request. There is a remaining part for the error handling that I should complete.

  • I opted directly for json content from the response instead of text.
  • Parse JSON content is no longer needed

@danymukesha danymukesha linked an issue Aug 28, 2024 that may be closed by this pull request
Merge branch 'main' into 005_switch_to_using_httr2

# Conflicts:
#	R/request.R
@ramiromagno
Copy link
Member

@danymukesha: see the req.R file. I think we should go with functions along the lines of req() and reqs() instead of request() as it is now. I think the code of request() had a structure still resembling too much the logic of httr. These new functions are more flexible and more aligned with httr2.

There is an example of how you would run req() in req.R just before the definition of req() so that hopefully you understand the logic of it. It queries this endpoint: https://rest.ensembl.org/documentation/info/cafe_tree_member_symbol.

The .body argument, if different from NULL makes it a POST request, so I think it is nice to have such a lower level function as req() being reused for either GET or POST requests.

I don't think the Ensembl REST API has paginated responses, so that's one less headache. But it has throttling: https://github.com/Ensembl/ensembl-rest/wiki/Rate-Limits. So we should try to see how to accommodate that.

But let me know what do you think about these two functions first: req() and reqs().

@danymukesha
Copy link
Collaborator Author

danymukesha commented Sep 1, 2024

@ramiromagno
I just checked the new req. I think it is a good idea for future extensibility. I like that it also offers flexibility by allowing optional path parameters and dynamic reconstruction of the URL.

I believe we can reuse it for both GET and POST requests. We might also consider adding assertions and some error handling to improve user-friendliness.

I am not totaly sure about the Ensembl REST API not having paginated responses. Because, I think that if the response would contain huge genomic sequence(s) with formats like fasta (e.g: https://rest.ensembl.org/documentation/info/sequence_region), then (I guess) we should expect some kind of pagination.

@ramiromagno
Copy link
Member

ramiromagno commented Sep 8, 2024

Hi @danymukesha!

Really great work! I did quite a few changes but most are stylistic. The code and tests were very good already.

Here are a few points for discussion:

  1. I guess that the helper related to response headers should not be about creating a list of header values but instead a helper that allows us to extract a list of such response headers once we got a actual response , e.g. after a req_perform() call.
  2. Regarding the helper for request headers, I simplified a bit. Changed the name to match starting with the prefix req given that we had already the req(). I notice you were filtering out header values that were NULL but I decided to remove that code because httr2 does that for us, just check the docs of httr2::req_headers() for the parameter ....
  3. In the req() function I cleaned up a bit the passing of the headers to httr2::req_headers() by using the bang-bang-bang operator. This makes the code more flexible and easier to read. I noticed you were also augmenting the request object with the optional parameters as a way of telling apart which are mandatory and which are optional. I guess this was mostly for the testing part. But quite frankly I think it is just better to test if the whole URL is formed as expected, and that in and of itself will test everything we might want to validate about these parameters.

If you agree, I think next steps:

  1. Writing tests for reqs(), namely for testing the vectorization of parameters passed in ....
  2. Then creating a get() and a post() function. These should be relatively thin-wrappers around reqs() and in addition should actually make the requests themselves, i.e. using any of functions in https://httr2.r-lib.org/reference/index.html#perform-multiple-requests.

The general idea is the following:

  • req() creates one single request object.
  • reqs() creates many request objects (i.e., a list thereof).
  • get() or post() make actual requests, and both return a list of responses, even if that means a list of one response (if only one request was made)
  • Finally we create low-level functions for requesting actual Ensembl endpoints: get_XXX() or post_XXX(). Compared to get() or post() functions, the main difference is that these will perform validation on input parameters that are tailored to each specific endpoint, and return already an R object that is sensible for the type of information being returned by the endpoint.
  • Ultimately, we might have higher-level functions that call get_XXX() or post_XXX() and provide even easier interfaces for the user when requesting info from Ensembl.

The scope of this PR is getting the following functions implemented, documented and tested:

  1. req()
  2. reqs()
  3. get() and post()

@danymukesha
Copy link
Collaborator Author

danymukesha commented Sep 20, 2024

@ramiromagno
I have the impression that in the current vector recycling implementation we have in reqs does not permit the full complete permutation for all possible combinations. I don't know if what I am saying makes sense but let me break it down into an example:

take an example we have

  • res <- "/endpoint/{var1}/{var2}"
  • var1 <- c("a", "b")
  • var2 <- c("x", "y")

I then expect to have the following combinations of multiple requests:

  • https://rest.ensembl.org/endpoint/a/x
  • https://rest.ensembl.org/endpoint/b/x
  • https://rest.ensembl.org/endpoint/a/y
  • https://rest.ensembl.org/endpoint/b/x

Meaning we expect to have 4 different requests, right?

But in our case we have only two combinations:

> reqs(res, var1 = var1, var2 = var2)
[[1]]
<httr2_request>
GET https://rest.ensembl.org/endpoint/a/x
Headers:Content-Type: 'application/json'
Body: empty
Options:useragent: 'ensemblr (https://www.pattern.institute/ensemblr)'

[[2]]
<httr2_request>
GET https://rest.ensembl.org/endpoint/b/y
Headers:Content-Type: 'application/json'
Body: empty
Options:useragent: 'ensemblr (https://www.pattern.institute/ensemblr)'

Is that making sense?

@ramiromagno
Copy link
Member

Yeah, unless we have strong arguments for that, I would stick with https://vctrs.r-lib.org/reference/theory-faq-recycling.html.

@ramiromagno
Copy link
Member

If get() would take a body argument, then it would become a post(), isn't it?

@danymukesha
Copy link
Collaborator Author

Yeah I could confirm that. Providing the body argument defines the method to use automatically.
I even check the codebase of httr2 for the req_method function:
https://github.com/r-lib/httr2/blob/f6fb40865ef59eda55f10e6f846361a5718c078d/R/req-method.R#L5
image

https://github.com/r-lib/httr2/blob/f6fb40865ef59eda55f10e6f846361a5718c078d/R/req-body.R#L11
image

@danymukesha
Copy link
Collaborator Author

Do you think we need to assert the input arguments on get() and post() functions or should we leave that to the low-level functions get_XXX() and post_XXX()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate httr functionality to httr2
2 participants