Add code for processing datasets from Popler #95

ha0ye · 2019-02-28T20:56:02Z

Popler is a package for obtaining LTER datasets in a (somewhat) standardized way. We are going to need code that processes the data into the format that MATSS requires:

Obtaining the data files

• First, identify the datasets that match our needs by specifying the arguments for pplr_browse(...).
• Next, get the raw data for a particular dataset with pplr_get_data(...).

Metadata

• Identify the columns in data that do not change across observations; these are likely to be values that belong in the metadata list.
• From the community metadata (the output of pplr_browse?), extract the species table and add it to the metadata.
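A minimal sketch of the constant-column check described above, written in plain Python purely for illustration (the real implementation would presumably live in R alongside MATSS). The function name `split_constant_columns` and the example column names and values are hypothetical, not taken from Popler output.

```python
def split_constant_columns(rows):
    """Split a list of row dicts into (metadata, varying_columns).

    A column whose value is identical in every row is treated as
    dataset-level metadata; all other columns are reported as varying.
    """
    if not rows:
        return {}, []
    metadata = {}
    varying = []
    for col in rows[0].keys():
        distinct = {row[col] for row in rows}
        if len(distinct) == 1:
            metadata[col] = rows[0][col]
        else:
            varying.append(col)
    return metadata, varying


# Hypothetical records standing in for one downloaded dataset.
records = [
    {"proj_metadata_key": 1, "lterid": "SEV", "year": 2001, "sppcode": "DIME", "count": 4},
    {"proj_metadata_key": 1, "lterid": "SEV", "year": 2001, "sppcode": "PEFL", "count": 2},
    {"proj_metadata_key": 1, "lterid": "SEV", "year": 2002, "sppcode": "DIME", "count": 7},
]

metadata, varying = split_constant_columns(records)
```

Here `metadata` collects the two columns that never change, and `varying` lists the columns that would remain in the observation table.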

Covariates

• Parse the date fields needed to construct a time index column, and record that column's name in the metadata.
• Use pplr_cov_unpack(data) and munge the name and value columns into the covariates table.
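One possible shape for the time index step, sketched in plain Python for illustration only. It assumes each record carries `year`, `month`, and `day` fields (with month and day possibly missing) and that covariates have already been unpacked into ordinary columns; the column names, the `timename` metadata key, and both helper functions are assumptions for the sake of the example.

```python
from datetime import date


def make_time_index(row, default_month=1, default_day=1):
    """Build a date from year/month/day, filling missing parts with defaults."""
    month = row.get("month") or default_month
    day = row.get("day") or default_day
    return date(int(row["year"]), int(month), int(day))


def build_covariates(rows, covariate_cols):
    """Return one covariate record per distinct sampling date, in date order."""
    by_date = {}
    for row in rows:
        when = make_time_index(row)
        # Keep the first set of covariate values seen for each date.
        by_date.setdefault(when, {col: row[col] for col in covariate_cols})
    return [{"date": when, **by_date[when]} for when in sorted(by_date)]


# Hypothetical records; "temp" stands in for an unpacked covariate.
records = [
    {"year": 2001, "month": 5, "day": None, "temp": 18.5},
    {"year": 2001, "month": 5, "day": None, "temp": 18.5},
    {"year": 2000, "month": None, "day": None, "temp": 12.0},
]

covariates = build_covariates(records, ["temp"])
# Record which column holds the time index (assumed metadata key).
metadata = {"timename": "date"}
```

Defaulting a missing month or day to 1 is a simplification; datasets sampled at irregular resolutions may need a different rule.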


ha0ye · 2019-02-28T21:02:15Z

@bleds22e has been working with the popler package, and will make a new branch for this work (so that everyone can help out).

ha0ye · 2019-03-21T13:47:53Z

~~(currently on hold; see #101)~~
resolved

diazrenata · 2019-04-23T19:39:10Z

Currently, the next steps are:

• Figure out a permanent storage solution
• Write functions to process the data and add it to MATSS

ha0ye · 2019-05-28T14:18:03Z

Update on Popler data integration:
• the LTER sites that are included in Popler's database each have their site-specific data transformed into Popler's format
• this can contain mixtures of different data sampling schemes, so generating community time series data is non-trivial

thoughts on ways forward:
• contact LTER data managers for already-prepared time series datasets
• manually clean and transform each of (many) datasets ourselves
• see if Popler has information on the backend about the different types of datasets it's pulling in from each LTER; this might let us filter for time series data more quickly (contact Aldo about this?)
• what is the overlap of datasets with BioTime? (is it easier to try and get these datasets from BioTime?)

current status:
• we are compiling some summary tables on how the different LTER sites have their data organized hierarchically within Popler (Popler calls these "spatial replication levels")
• https://github.com/ha0ye/popler will eventually contain generated Rmarkdown reports for these summary tables (one report for each LTER dataset entry), which will be uploaded once they have been generated

😵 😫

ha0ye · 2019-05-29T16:09:48Z

@diazrenata also suggested we could do some digging through the source code for popler to see if that yielded any clues about how it might be processing data on its end.

ha0ye · 2019-06-20T16:04:00Z

> @diazrenata also suggested we could do some digging through the source code for popler to see if that yielded any clues about how it might be processing data on its end.

It sounds like there may be unique code for importing each dataset into popler, so this may not be a feasible path to lessen the workload of manually dealing with each dataset.

I think our steps forward are:

• Don't worry about replication levels too much; just check the names of the spatial replication level variables. If many of the high-level ones are named "site", that may be a good variable for splitting Popler datasets into separate communities. Otherwise, just assemble the communities as is (i.e. aggregate over the other spatial replication levels).
AND/OR
• Use BioTime for any Popler datasets that also appear in BioTime. Given the relative sizes of the databases, much overlap seems unlikely, and Popler probably has data that are definitely not in BioTime. Though maybe BioTime has pre-aggregated data that are more easily assembled into time series (i.e. abundances instead of a row for each raw count).
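To make the first option concrete, here is a rough sketch (plain Python, for illustration only) of splitting raw count rows by a "site" variable and summing over every other spatial replication level. The column names `site`, `plot`, `year`, `species`, and `count`, and the function `community_time_series`, are hypothetical; real Popler column names will differ.

```python
from collections import defaultdict


def community_time_series(rows, site_col="site"):
    """Sum raw counts into {site: {year: {species: total}}}.

    Rows are grouped by `site_col`; any finer replication level
    (e.g. plot) is aggregated over. Pass site_col=None to pool
    everything into a single community labelled "all".
    """
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for row in rows:
        site = row[site_col] if site_col else "all"
        totals[site][row["year"]][row["species"]] += row["count"]
    # Convert nested defaultdicts to plain dicts with years in order.
    return {
        site: {year: dict(years[year]) for year in sorted(years)}
        for site, years in totals.items()
    }


# Hypothetical raw counts: one row per observation.
raw = [
    {"site": "A", "plot": 1, "year": 2001, "species": "sp1", "count": 2},
    {"site": "A", "plot": 2, "year": 2001, "species": "sp1", "count": 3},
    {"site": "A", "plot": 1, "year": 2002, "species": "sp2", "count": 1},
    {"site": "B", "plot": 1, "year": 2001, "species": "sp1", "count": 4},
]

by_site = community_time_series(raw)
pooled = community_time_series(raw, site_col=None)
```

Summing counts assumes sampling effort is comparable across the levels being aggregated, which would need to be checked per dataset.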

ha0ye · 2020-01-17T19:11:00Z

This issue needs a decision one way or another (i.e. whether to try and include US LTER data via Popler for V1).

ha0ye assigned ha0ye and bleds22e Mar 14, 2019

ha0ye added this to the Spring Semester Goals milestone Mar 21, 2019

ha0ye added the "dataset" (adding new data to MATSS) label Mar 21, 2019

diazrenata mentioned this issue Apr 23, 2019

popler database appears down #101

Closed

ha0ye mentioned this issue Apr 23, 2019

Summer 2019 Roadmap #115

Closed

15 tasks

ha0ye mentioned this issue Aug 2, 2019

Roadmap (infrastructure paper) #147

Closed

5 tasks

ha0ye modified the milestones: Spring Semester Goals, Version 1 Release Jan 17, 2020
