
Revisit dependencies #504

Open
ldecicco-USGS opened this issue Jan 22, 2020 · 5 comments
Labels
Backlog Can't fix in near future, but don't want to forget

Comments

@ldecicco-USGS (Collaborator)

We chose readr initially because we thought data.table was a bigger learning curve. To simply convert to fread would be pretty easy and still save some time and reduce some (a lot?) of dependencies.

While this is explored, we could re-re-think the lubridate dependency.

@ldecicco-USGS (Collaborator, Author)

fread works great for WQP. Not as obviously great for RDB (those are the only 2 table readers we need). Here's the branch I'm working on:
https://github.com/ldecicco-USGS/dataRetrieval/tree/fread

The reason I'm thinking about this is that CRAN is potentially going to limit the number of total dependencies a package can have (that's the rumor at least....). readr brings in a bunch of dependencies. data.table stands alone (and is mostly more efficient).

Right now though, fread doesn't have a comment.char argument....which makes getting the RDB format very difficult (since you need to both ignore the # lines and skip 2 lines).

So, I'll follow this Issue:
Rdatatable/data.table#856
because I think if that was implemented, we could make a switch.

In the meantime, we'll cross our fingers that the total dependency limit doesn't affect this (still a bit early to tell).
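In the meantime, the missing comment.char could probably be worked around by stripping the "#" lines ourselves before handing the text to fread. A minimal sketch against a made-up RDB snippet (the fake data, column names, and the station row are all illustrative, not real NWIS output):

```r
library(data.table)

# A fake, minimal RDB payload: "#" comment lines, a header row,
# a type-code row ("15s", "50s"), then the data.
rdb_text <- c(
  "# U.S. Geological Survey",
  "# retrieved 2020-01-22",
  "site_no\tstation_nm",
  "15s\t50s",
  "05114000\tSOURIS RIVER NR SHERWOOD, ND"
)

# Drop comment lines ourselves, then drop the type-code row (body[2]),
# so fread never needs a comment.char argument:
body <- rdb_text[!startsWith(rdb_text, "#")]
df <- fread(text = paste(body[-2], collapse = "\n"),
            sep = "\t", header = TRUE, data.table = FALSE,
            colClasses = "character")

df$site_no
#> "05114000"
```

The same pre-filtering would work on text fetched from a URL with readLines(), at the cost of reading the response twice.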

@ldecicco-USGS (Collaborator, Author)

It turns out data.table ignores lines that start with "#" by default! So maybe it's doable in the near term. In my fiddling, though, it seems that if you use the default (ignore "#"), you can't also use skip with just a number (in our case, 2). If you use skip, it counts from the beginning of the file (so, not ideal in our case, but not a deal breaker). However, it's fast enough that we could read the whole thing in as character columns (which is how it seems to come in with our RDB formatting) and use the top row to convert types:

library(dataRetrieval)
library(data.table)

siteINFO <- readNWISsite("05114000")
url <- attr(siteINFO, "url")

# fread drops the leading "#" comment lines by default;
# read everything as character and sort out types afterwards:
colnames_data <- fread(url,
                       header = TRUE,
                       data.table = FALSE,
                       keepLeadingZeros = TRUE)

# The first data row of an RDB file holds the type codes (e.g. "5s", "8d"):
types <- unlist(colnames_data[1, ])
types <- gsub("\\d", "", types) # strip field widths (this is R 4.0 regex!)
types <- gsub("d", "Date", types)
types <- gsub("s", "character", types)
types[grepl("_va", names(types))] <- "numeric"

ret_df <- colnames_data[-1, ]
ret_df[, types == "numeric"] <- sapply(ret_df[, types == "numeric"], as.numeric)
This code chunk actually gets us pretty far.... "just" (HA!) need to figure out dates. A lot of the date logic though will still be baked in the codes, so it might not be too much more work.
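For the simple case, the date step might not be too bad either. A hedged sketch, using stand-in data in place of the NWIS response (the column names and values here are hypothetical; partial or coded dates would still need the existing dataRetrieval logic):

```r
# Stand-ins for the `types` vector and character data frame built above:
types <- c(site_no = "character",
           station_nm = "character",
           inventory_dt = "Date")

ret_df <- data.frame(site_no = "05114000",
                     station_nm = "SOURIS RIVER NR SHERWOOD, ND",
                     inventory_dt = "2019-10-01",
                     stringsAsFactors = FALSE)

# NWIS RDB dates are ISO "YYYY-MM-DD", so the as.Date() default format
# handles the straightforward columns:
date_cols <- names(types)[types == "Date"]
ret_df[date_cols] <- lapply(ret_df[date_cols], as.Date)

class(ret_df$inventory_dt)
#> "Date"
```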

@ldecicco-USGS ldecicco-USGS mentioned this issue May 8, 2020
@ldecicco-USGS (Collaborator, Author)

httr2?
https://httr2.r-lib.org/index.html
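For reference, building an NWIS request with httr2 might look something like this (the waterservices.usgs.gov endpoint and query parameters are assumptions for illustration; the request is only built, not performed):

```r
library(httr2)

# Assemble a request piece by piece, httr2-style:
req <- request("https://waterservices.usgs.gov/nwis/site/") |>
  req_url_query(sites = "05114000", format = "rdb") |>
  req_user_agent("dataRetrieval exploration")

# req_perform(req) would actually fetch it; here we just inspect the URL.
req$url
```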

@ldecicco-USGS ldecicco-USGS added the Backlog Can't fix in near future, but don't want to forget label Jan 3, 2024
@ldecicco-USGS (Collaborator, Author)

Not so much that it's backlogged, but that it's an ongoing task that should be revisited periodically. I don't think it's worth the effort to convert to data.table unless we do a major overhaul or readr starts causing problems.

@ldecicco-USGS (Collaborator, Author)

httr2, on the other hand....
