Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-friendly (ish) interface for metadata extraction #39

Merged
merged 17 commits into from
Jul 20, 2023
Merged

Conversation

richfitz
Copy link
Member

@richfitz richfitz commented Jul 17, 2023

This has been requested in a few guises over the years, and I think it will perhaps form the basis of other tooling too.

Extract some fields from the metadata of a bunch of packets. The implementation is fairly rich, and is described in some detail in the docs (which I used to sketch out the interface before implementation). I think this strikes a reasonable balance of usability, generalisability and extensibility, but very open to suggestions.

The query syntax is deliberately not-very R-ish as we might want to adopt this into the rust tool in some guise?

This is a big PR, but quite a lot of it is docs, error handling and tests. The docs (in the Rd file) is probably the best place to start

@richfitz richfitz marked this pull request as ready for review July 19, 2023 09:12
Copy link
Contributor

@r-ash r-ash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small suggested changes but looks good.

Guess big question I'm interested in is why you wanted the type conversion to be explicit instead of leaving it up to R to convert (which generally works fairly well). Just thinking ahead to other language implementations of this where the automatic conversion might not be possible or the same?

##' but you can always do this yourself.
##'
##' In order to use function you need to know what metadata are
##' available; we will expand a the vignette with more worked examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
##' available; we will expand a the vignette with more worked examples
##' available; we will expand the vignette with more worked examples

##' (per packet), each with columns `packet` (id), `query` (string,
##' used to find `packet`) and `files` (another `data.frame` with
##' columns `there` and `here` corresponding to filenames upstream
##' and in this packet, repsectively)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
##' and in this packet, repsectively)
##' and in this packet, respectively)

Comment on lines +90 to +91
##' You must not provide `id`; it is always returned and always first
##' as a character vector column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this error if you provide id? Wonder if that should just do nothing if users specify it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, yes it errored. I wondered about being nice and just skipping over it but decided that every time I try and be helpful it creates complicated things to maintain later, so just threw instead. We can always walk that back later

Comment on lines 126 to 132
##' Note that this does not do any coersion to number, it will error
##' if a non-NULL non-numeric value is found. Valid types for use
##' with `is <type>` are `boolean`, `number` and `string` (note that
##' these differ slightly from R's names because we want to emphasise
##' that these are *scalar* quantities; also note that there is no
##' `integer` here as this may produce unexpected errors with
##' integer-like numeric values).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And list?

for (i in seq_len(nrow(extract))) {
from_i <- extract$from[[i]]
is_i <- extract$is[[i]]
value_i <- lapply(meta, function(x) x[[from_i]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not know you could index into a list like that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, total madness. This was the bit that I meant when I said "bits that you'd think are hard end up easy"

Copy link
Member Author

@richfitz richfitz Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this did remind me though that I wanted to prevent the opaque error you get when this fails (done now)

@richfitz
Copy link
Member Author

Guess big question I'm interested in is why you wanted the type conversion to be explicit instead of leaving it up to R to convert (which generally works fairly well). Just thinking ahead to other language implementations of this where the automatic conversion might not be possible or the same?

I don't think this works well in practice (thinking here of why we avoid sapply in general). In general very little of this is knowable and we'll need the user to guide us. Trying to be chill about it will mean that different calls will result in different classes (for example, pulling parameters might be a character vector in one case because there's only one type of parameter and its character, a list in others and a list of lists in another). This way, the same call will be type stable even when the underlying data changes

@richfitz richfitz merged commit 8968fd4 into main Jul 20, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants