Need to handle NEUS aggregation differently b/c duplicated wtcpue per spp #48

Open
rBatt opened this issue Feb 5, 2016 · 0 comments

rBatt commented Feb 5, 2016

The rows are per individual due to having length information, but the wtcpue column is for the species.

This is a problem because the intuitive approach only works when you assume that the original species names are all correct.

I checked, and this raises very few problems for NEUS if approached simply (i.e., take the mean of wtcpue within each unique combination of spp-haulid). However, there are a couple of cases in which a species name was corrected. In those cases, 2 taxonomic IDs originally had their own (different) wtcpue values in a given haul, and each of those taxa may have had some individuals lengthed, so each wtcpue value is repeated several times for its taxon. After correcting taxonomy, the 2 taxa are actually the same species. So you can't simply take the average (what you would do if all rows were the same taxon with duplicated wtcpue, as was probably the intended interpretation) or the sum of wtcpue (what you would do if multiple rows for the same species-haul did not have duplicated wtcpue).
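
To make the failure concrete, here is a minimal base-R sketch with made-up numbers. The column name `ref` (for the original taxonomic ID), the taxon names, and all values are assumptions for illustration, not NEUS data. It treats "one wtcpue per original taxon, summed across taxa" as the intended total, which is itself an assumption.

```r
# Toy haul: two original taxon IDs (hypothetical) later corrected to one spp.
# Each row is one lengthed individual; wtcpue repeats the taxon-level value.
d <- data.frame(
  haulid = "h1",
  ref    = c("Taxon A", "Taxon A", "Taxon A", "Taxon B", "Taxon B"),
  spp    = "Genus species",
  length = c(10, 12, 11, 30, 28),
  wtcpue = c(2, 2, 2, 5, 5)
)

# Mean within spp-haulid blends two different taxon-level values.
mean_wt <- mean(d$wtcpue)   # 3.2

# Sum within spp-haulid counts each value once per lengthed individual.
sum_wt <- sum(d$wtcpue)     # 16

# Assumed intent: one value per original taxon, then summed across taxa.
per_taxon <- tapply(d$wtcpue, d$ref, function(x) unique(x))
intended_wt <- sum(per_taxon)   # 7
```

Neither the mean (3.2) nor the sum (16) matches the assumed intended total (7).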

I hope this issue does not apply to sex too, but it could (i.e., when sex is listed, is the wtcpue sex-specific, or for the whole spp?).

One approach is to first aggregate while including wtcpue as a grouping factor. This can be done with trawlAgg(), because usually at this stage of data processing both space_lvl and time_lvl are "haulid", so one of those (probably time) can be changed to "wtcpue". However, this might become challenging when there are NAs etc. for wtcpue ... idk how the grouping would work.
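
As a rough illustration of grouping on wtcpue, the base-R sketch below collapses rows to unique haulid-spp-wtcpue combinations and then sums within haulid-spp. It does not use trawlAgg(), so it says nothing about how trawlAgg() itself treats NA groups; the data and the `sum_or_na` helper are hypothetical.

```r
# Hypothetical data: haul h1 has two distinct wtcpue values for one spp,
# haul h2 has only missing wtcpue.
d <- data.frame(
  haulid = c("h1", "h1", "h1", "h1", "h1", "h2", "h2"),
  spp    = "Genus species",
  wtcpue = c(2, 2, 2, 5, 5, NA, NA)
)

# Treat wtcpue as part of the grouping key: one row per unique combination.
# In base R, unique() on a data.frame keeps NA as its own combination.
g <- unique(d[, c("haulid", "spp", "wtcpue")])

# Hypothetical helper: sum ignoring NA, but return NA when everything is NA.
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Sum the distinct wtcpue values within each haulid-spp.
total <- aggregate(g["wtcpue"], by = g[c("haulid", "spp")], FUN = sum_or_na)
```

Here `g` has three rows, and `total` gives 7 for h1 and NA for h2. A different grouping engine may handle the NA key differently, which is the open question above.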

Another approach could be to make the bioFun argument something like function(x)sumna(una(x)), where x is "wtcpue" passed to bioCols argument. This assumes equivalent wtcpue are from duplicated rows that shouldn't be summed together to get the total wtcpue for a species in a haul. May or may not be true.
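
A small sketch of that bioFun idea follows. The definitions of `una` and `sumna` are stand-ins written for this example (assumed behavior: unique values, and an NA-tolerant sum); the package's own versions may differ.

```r
# Stand-in helpers (assumptions, not the package definitions).
una   <- function(x) unique(x)
sumna <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Candidate bioFun: sum only the distinct wtcpue values.
bioFun <- function(x) sumna(una(x))

# Two original taxa with different wtcpue, duplicated per individual.
ok <- bioFun(c(2, 2, 2, 5, 5))   # 7

# Where the assumption breaks: if two original taxa happen to share
# the same wtcpue (2 and 2), they collapse to a single value.
collapsed <- bioFun(c(2, 2, 2, 2))   # 2, whereas two taxa at 2 each would total 4
```

The second call shows why the assumption "equal values are duplicates" may or may not be true.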

Yet another approach could be to aggregate not by "spp", but by the original taxonomic ID column first. In that first aggregation, do bioFun = meanna. Then do the subsequent aggregation by "spp" with bioFun = sumna. This assumes that duplicate rows for a species within a haul should not be summed. It also obscures the potentially problematic scenario in which there actually are multiple wtcpue values ... maybe instead of meanna one could use something that lists the unique values and hopefully throws an error when there's more than 1.
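
The two-stage idea, with a strict replacement for meanna, might look like the base-R sketch below. It uses aggregate() rather than trawlAgg(). The helper `unique_or_stop`, the column name `ref` (original taxonomic ID), and the data are all hypothetical.

```r
# Return the single non-missing value, NA if there is none,
# and stop if more than one distinct value is present.
unique_or_stop <- function(x) {
  u <- unique(x[!is.na(x)])
  if (length(u) > 1L) {
    stop("more than one distinct wtcpue value: ", paste(u, collapse = ", "))
  }
  if (length(u) == 0L) NA_real_ else u
}

# Toy haul: two original IDs corrected to the same spp.
d <- data.frame(
  haulid = "h1",
  ref    = c("Taxon A", "Taxon A", "Taxon A", "Taxon B", "Taxon B"),
  spp    = "Genus species",
  wtcpue = c(2, 2, 2, 5, 5)
)

# Stage 1: collapse duplicated rows within haulid x original ID.
stage1 <- aggregate(d["wtcpue"], by = d[c("haulid", "ref", "spp")],
                    FUN = unique_or_stop)

# Stage 2: sum across original IDs that share a corrected spp.
stage2 <- aggregate(stage1["wtcpue"], by = stage1[c("haulid", "spp")],
                    FUN = sum)
```

With this toy data, `stage1` has one row per original ID (wtcpue 2 and 5) and `stage2` has a single row with wtcpue 7. If an original ID carried two different wtcpue values in one haul, stage 1 would stop with an error instead of silently averaging them.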
