Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beware of recapitation and missing data #1584

Open
petrelharp opened this issue Aug 22, 2024 · 1 comment
Open

beware of recapitation and missing data #1584

petrelharp opened this issue Aug 22, 2024 · 1 comment
Milestone

Comments

@petrelharp
Copy link
Contributor

A thing that caught me by surprise: now, if you simulate a chunk of a contig, you'll get 'missing data' outside that contig. This is represented as, basically, "no trees there". However, this will have ts.first().num_roots == ts.num_samples (and this is the reason for the root_threshold argument). This is fine and expected, if confusing - so maybe needs some documentation.

A particular gotcha is that if someone notices the ts doesn't look recapitated (although it is!), and recapitates again with pyslim.recapitate(ts, ...), then this will wrongly simulate coalescent history on the bits of chromosome that should be missing. I'm not sure if there's anything we can do to prevent this besides some documentation.

@petrelharp petrelharp added this to the Version 0.2.1 milestone Aug 22, 2024
@petrelharp
Copy link
Contributor Author

Example:

import stdpopsim

L = 1e6
engine = stdpopsim.get_engine("slim")
species = stdpopsim.get_species("CanFam")
contig = species.get_contig('1', left=1e7, right=1e7 + L)
model = stdpopsim.PiecewiseConstantSize(species.population_size, (100, 100))

samples = {"pop_0": 100}

ts = engine.simulate(model, contig, samples, slim_burn_in=100)

print([t.num_roots for t in ts.trees()])

gets

$ python test.py
[200, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 200]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant