
Revisit dependencies #504

Open
ldecicco-USGS opened this issue Jan 22, 2020 · 5 comments
Labels
Backlog Can't fix in near future, but don't want to forget

Comments

@ldecicco-USGS (Collaborator)

We chose readr initially because we thought data.table was a bigger learning curve. To simply convert to fread would be pretty easy and still save some time and reduce some (a lot?) of dependencies.

While this is explored, we could re-re-think the lubridate dependency.

@ldecicco-USGS (Collaborator, Author)

fread works great for WQP. Not as obviously great for RDB (those are the only 2 table readers we need). Here's the branch I'm working on:
https://github.com/ldecicco-USGS/dataRetrieval/tree/fread

The reason I'm thinking about this is that CRAN is potentially going to limit the number of total dependencies a package can have (that's the rumor at least....). readr brings in a bunch of dependencies. data.table stands alone (and is mostly more efficient).

Right now though, fread doesn't have a comment.char argument....which makes getting the RDB format very difficult (since you need to both ignore the # lines and skip 2 lines).

So, I'll follow this Issue:
Rdatatable/data.table#856
because I think if that was implemented, we could make a switch.

In the meantime, we'll cross our fingers that the total dependency limit doesn't affect this (still a bit early to tell).
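In the meantime, the missing comment.char could probably be worked around by stripping the "#" lines ourselves before handing the text to fread. A minimal sketch against a made-up RDB snippet (the fake data, column names, and the station row are all illustrative, not real NWIS output):

```r
library(data.table)

# A fake, minimal RDB payload: "#" comment lines, a header row,
# a type-code row ("15s", "50s"), then the data.
rdb_text <- c(
  "# U.S. Geological Survey",
  "# retrieved 2020-01-22",
  "site_no\tstation_nm",
  "15s\t50s",
  "05114000\tSOURIS RIVER NR SHERWOOD, ND"
)

# Drop comment lines ourselves, then drop the type-code row (body[2]),
# so fread never needs a comment.char argument:
body <- rdb_text[!startsWith(rdb_text, "#")]
df <- fread(text = paste(body[-2], collapse = "\n"),
            sep = "\t", header = TRUE, data.table = FALSE,
            colClasses = "character")

df$site_no
#> "05114000"
```

The same pre-filtering would work on text fetched from a URL with readLines(), at the cost of reading the response twice.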

@ldecicco-USGS (Collaborator, Author)

It turns out data.table ignores lines that start with "#" by default! So maybe it's doable in the near term. In my fiddling, though, it seems that if you use the default (ignore "#"), you can't also use skip with just a number (in our case, 2). If you use skip, it counts from the beginning of the file (so, not ideal in our case, but not a deal breaker). However, it's fast enough that we could read the whole thing in as character columns (which is how it seems to come in with our RDB formatting) and use the top row to convert types:

library(dataRetrieval)
library(data.table)

siteINFO <- readNWISsite("05114000")
url <- attr(siteINFO, "url")

# fread drops the leading "#" comment lines by default;
# read everything as character and sort out types afterwards:
colnames_data <- fread(url,
                       header = TRUE,
                       data.table = FALSE,
                       keepLeadingZeros = TRUE)

# The first data row of an RDB file holds the type codes (e.g. "5s", "8d"):
types <- unlist(colnames_data[1, ])
types <- gsub("\\d", "", types) # strip field widths (this is R 4.0 regex!)
types <- gsub("d", "Date", types)
types <- gsub("s", "character", types)
types[grepl("_va", names(types))] <- "numeric"

ret_df <- colnames_data[-1, ]
ret_df[, types == "numeric"] <- sapply(ret_df[, types == "numeric"], as.numeric)
This code chunk actually gets us pretty far.... "just" (HA!) need to figure out dates. A lot of the date logic though will still be baked in the codes, so it might not be too much more work.
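For the simple case, the date step might not be too bad either. A hedged sketch, using stand-in data in place of the NWIS response (the column names and values here are hypothetical; partial or coded dates would still need the existing dataRetrieval logic):

```r
# Stand-ins for the `types` vector and character data frame built above:
types <- c(site_no = "character",
           station_nm = "character",
           inventory_dt = "Date")

ret_df <- data.frame(site_no = "05114000",
                     station_nm = "SOURIS RIVER NR SHERWOOD, ND",
                     inventory_dt = "2019-10-01",
                     stringsAsFactors = FALSE)

# NWIS RDB dates are ISO "YYYY-MM-DD", so the as.Date() default format
# handles the straightforward columns:
date_cols <- names(types)[types == "Date"]
ret_df[date_cols] <- lapply(ret_df[date_cols], as.Date)

class(ret_df$inventory_dt)
#> "Date"
```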

@ldecicco-USGS ldecicco-USGS mentioned this issue May 8, 2020
@ldecicco-USGS (Collaborator, Author)

httr2?
https://httr2.r-lib.org/index.html
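For reference, building an NWIS request with httr2 might look something like this (the waterservices.usgs.gov endpoint and query parameters are assumptions for illustration; the request is only built, not performed):

```r
library(httr2)

# Assemble a request piece by piece, httr2-style:
req <- request("https://waterservices.usgs.gov/nwis/site/") |>
  req_url_query(sites = "05114000", format = "rdb") |>
  req_user_agent("dataRetrieval exploration")

# req_perform(req) would actually fetch it; here we just inspect the URL.
req$url
```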

@ldecicco-USGS ldecicco-USGS added the Backlog Can't fix in near future, but don't want to forget label Jan 3, 2024
@ldecicco-USGS (Collaborator, Author)

Not so much that it's backlogged, but that it's an ongoing task that should be revisited periodically. I don't think it's worth the effort to convert to data.table unless we do a major overhaul or readr starts causing problems.

@ldecicco-USGS (Collaborator, Author)

httr2, on the other hand....
