Add pixel_filter_util.py and pool_decorator function #464
Yaoyx
commented
Oct 18, 2023
- add pixel_filter_util.py to sandbox. The script includes functions for generating a filtered cool file based on a cis-total ratio threshold.
- add pool_decorator function to lib/common.py: a decorator that enables multiprocessing for a given function.
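To make the first bullet concrete, here is a minimal pure-Python sketch of a cis-total ratio filter. It is not the code from pixel_filter_util.py: the function names, the `(bin1, bin2, count)` pixel tuples, the strict `>` comparison, and the choice to credit each pixel to both of its bins are all assumptions made for illustration.

```python
def cis_total_ratio(pixels, bin_chroms):
    """Per-bin fraction of contact counts that fall on the same chromosome.

    pixels: iterable of (bin1, bin2, count) tuples (hypothetical layout).
    bin_chroms: chromosome name for each bin index.
    """
    n_bins = len(bin_chroms)
    cis = [0] * n_bins
    total = [0] * n_bins
    for bin1, bin2, count in pixels:
        is_cis = bin_chroms[bin1] == bin_chroms[bin2]
        # Assumption: every pixel contributes to both of its bins.
        for b in (bin1, bin2):
            total[b] += count
            if is_cis:
                cis[b] += count
    # Bins with no contacts get a ratio of 0.0 and are therefore masked.
    return [c / t if t else 0.0 for c, t in zip(cis, total)]


def make_bin_mask(ratios, threshold=0.5):
    """True for bins whose cis-total ratio is strictly above the threshold."""
    return [r > threshold for r in ratios]


def filter_pixels(pixels, bin_mask):
    """Keep only pixels whose two bins both pass the mask."""
    return [p for p in pixels if bin_mask[p[0]] and bin_mask[p[1]]]
```

With three bins on `chr1`, `chr1`, `chr2`, a bin that only has trans contacts gets a ratio of 0.0 and every pixel touching it is dropped.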
yield chunk

def pool_decorator(func):
This one can probably be deleted and imported from lib instead?
updated
logger.addHandler(ch)

@curry
Curry looks nice here!
It would be great to add an ipynb with example usage of this.
@agalitsyna would you mind taking a look, since you mentioned this functionality could be useful for you?
'Update readthedoc from the original repo'
cooltools/lib/common.py
Outdated
pool = Pool(kwargs["nproc"])
mymap = pool.map
mp.Pool.map is not always what you want to use in the decorated function. It is eager and will block the Python interpreter until it materializes all outputs in a list before returning.
This is problematic if the function you are mapping is not reductive (e.g. when making a pixel chunk iterator, all chunks will be materialized in memory). Instead you want to use a lazier map implementation like imap or in some cases imap_unordered.
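The point about eager versus lazy maps could be sketched roughly as follows. This is a hypothetical stand-in, not the decorator in cooltools/lib/common.py: the name `pool_decorator_sketch`, the `nproc` and `map_functor` keyword names, and the example functions are assumptions for illustration only.

```python
import functools
import multiprocessing as mp


def pool_decorator_sketch(func):
    """Hypothetical stand-in for pool_decorator: hand `func` a lazy map."""

    @functools.wraps(func)
    def wrapper(*args, nproc=1, **kwargs):
        if "map_functor" in kwargs or nproc <= 1:
            # The caller supplied a map, or no parallelism was requested:
            # default to the lazy builtin map.
            kwargs.setdefault("map_functor", map)
            return func(*args, **kwargs)
        with mp.Pool(nproc) as pool:
            # Pool.imap returns an iterator that yields results in input
            # order as they become available; Pool.map would build the
            # complete result list before returning.
            # func must consume the iterator before it returns, because
            # leaving the with-block terminates the pool.
            return func(*args, map_functor=pool.imap, **kwargs)

    return wrapper


def square(x):
    # Defined at module level so worker processes can pickle it.
    return x * x


@pool_decorator_sketch
def sum_of_squares(values, map_functor=map):
    # Reductive use: results are consumed one at a time by sum().
    return sum(map_functor(square, values))


@pool_decorator_sketch
def which_map(map_functor=map):
    # Returns the map it was given, which makes the choice easy to test.
    return map_functor
```

Calling `sum_of_squares(range(10), nproc=4)` would route the work through `Pool.imap`, while `nproc=1` stays on the builtin `map` and starts no worker processes.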
One option for a test of this function would be to return which map the decorator is specifying.
It seems cooler did indeed use pool.imap for the tabix loader: https://github.com/open2c/cooler/blob/857601826fb3aa8ae4e4d3cc64afa61dec18c87e/src/cooler/cli/cload.py#L222
pool.imap_unordered was used for cooler.balance because the workers could hold nproc copies of the array in memory, and this collection is then summed over.
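A rough sketch of why an unordered lazy map suits a summed reduction: each worker returns one partial array, arrival order does not matter because addition is commutative, and each partial can be folded into the running total as it arrives. This is not cooler's balancing code; the function names and the `(bin1, bin2, count)` pixel layout are assumptions.

```python
import multiprocessing as mp


def partial_marginal(task):
    """Marginal (per-bin sum of counts) for a single chunk of pixels."""
    n_bins, chunk = task
    marg = [0] * n_bins
    for bin1, bin2, count in chunk:
        marg[bin1] += count
        marg[bin2] += count
    return marg


def accumulate(partials, n_bins):
    """Fold partial marginals into one total, one partial at a time."""
    total = [0] * n_bins
    for part in partials:
        for i, value in enumerate(part):
            total[i] += value
    return total


def marginal(chunks, n_bins, nproc=1):
    """Sum per-chunk marginals, optionally across worker processes."""
    tasks = ((n_bins, chunk) for chunk in chunks)
    if nproc <= 1:
        return accumulate(map(partial_marginal, tasks), n_bins)
    with mp.Pool(nproc) as pool:
        # Order is irrelevant for a sum, so take results as they finish.
        return accumulate(pool.imap_unordered(partial_marginal, tasks), n_bins)
```

The reduction happens inside the `with` block, so the pool is still alive while the iterator is being consumed.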
logger.addHandler(ch)

@curry
def cis_total_ratio_filter(clr, thres=0.5):
I would go with just threshold=0.5 instead of thres, because it may be easier to remember (e.g. I would forget if I should have thres or thresh)
cooler.create_cooler(
output_uri,
bins=bin_table,
pixels=map(pixels_filter, pixel_iter_chunks(clr, chunksize)),
You probably want either builtin map, which is lazy and returns an iterator, or mp.Pool.imap + ordered=True, or potentially mp.Pool.imap_unordered + ordered=False (this does a 2-pass creation algorithm).
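The laziness of the builtin map can be checked with a small stand-alone sketch. The chunk generator and filter below are hypothetical stand-ins for pixel_iter_chunks and pixels_filter, not the PR's functions; the `log` list exists only to show when each chunk is actually read.

```python
def pixel_chunks(n_chunks, log):
    """Hypothetical chunk generator that records when each chunk is read."""
    for i in range(n_chunks):
        log.append(f"read chunk {i}")
        yield [i, i + 1]


def keep_even(chunk):
    """Hypothetical filter: keep only the even values of a chunk."""
    return [x for x in chunk if x % 2 == 0]


log = []
filtered = map(keep_even, pixel_chunks(3, log))

# Building the map object reads nothing; chunks are pulled on demand.
print(log)             # []
print(next(filtered))  # [0]
print(log)             # ['read chunk 0']
```

A consumer that iterates over `filtered` therefore only ever holds one chunk at a time, which is the property the reviewer is asking for.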
"bin_mask should have the same length as bin table in cool file"
)
logger.debug("Start to create cooler file...")
bin_table = clr.bins()[:]
we should drop fold weights from the bin_table to minimize confusion!
…emove weight column from the bin table
Updates:
* add pixel_filter_util to sandbox;
* add pool_decorator function to lib/common.py

Co-authored-by: Yao Xiao <[email protected]>