Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pixel_filter_util.py and pool_decorator function #464

Merged
merged 6 commits into from
Nov 6, 2023

Conversation

Yaoyx
Copy link
Collaborator

@Yaoyx Yaoyx commented Oct 18, 2023

  • add pixel_filter_util.py to sandbox. The script includes functions for generating a filtered cool file based on cis-total ratio threshold
  • add pool_decorator function to lib/common.py. A decorator function that enables multiprocessing for a given function.

yield chunk


def pool_decorator(func):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably delete this one & import from lib?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

logger.addHandler(ch)


@curry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curry looks nice here!

@gfudenberg
Copy link
Member

gfudenberg commented Oct 18, 2023

Would be great to add ipynb with example usage of this cis_total_ratio_filter. Perhaps this can be put in a subdirectory of sandbox/cooler_filters with the pixel_filter_util

@gfudenberg
Copy link
Member

@agalitsyna would you mind taking a look, since you mentioned this functionality could be useful for you?

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@agalitsyna agalitsyna assigned agalitsyna and unassigned agalitsyna Oct 23, 2023
@agalitsyna agalitsyna self-requested a review October 23, 2023 16:00
Comment on lines 531 to 532
pool = Pool(kwargs["nproc"])
mymap = pool.map
Copy link
Member

@nvictus nvictus Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mp.Pool.map is not always what you want to use in the decorated function. It is eager and will block the Python interpreter until it materializes all outputs in a list before returning.

This is problematic if the function you are mapping is not reductive (e.g. when making a pixel chunk iterator, all chunks will be materialized in memory). Instead you want to use a lazier map implementation like imap or in some cases imap_unordered.

Copy link
Member

@gfudenberg gfudenberg Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

option for a test for this function could be returning what map the decorator is specifying

Copy link
Member

@gfudenberg gfudenberg Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@gfudenberg gfudenberg Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pool.imap_unordered was used for cooler.balance b/c could hold nproc copies of the array in memory and this collection is then summed over.

logger.addHandler(ch)

@curry
def cis_total_ratio_filter(clr, thres=0.5):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with just threshold=0.5 instead of thres, because it may be easier to remember (e.g. I would forget if I should have thres or thresh)

cooler.create_cooler(
output_uri,
bins=bin_table,
pixels=map(pixels_filter, pixel_iter_chunks(clr, chunksize)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want either builtin map, which is lazy and returns an iterator, or mp.Pool.imap + ordered=True, or potentially mp.Pool.imap_unordered + ordered=False (this does a 2-pass creation algorithm).

"bin_mask should have the same length as bin table in cool file"
)
logger.debug("Start to create cooler file...")
bin_table = clr.bins()[:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should drop fold weights from the bin_table to minimize confusion!

@Yaoyx
Copy link
Collaborator Author

Yaoyx commented Oct 25, 2023

Updates:

  • Change pool.map to pool.imap in pool_decorator

  • rename to in cis_total_ratio_filter

  • remove weight column from the bin table in create_filtered_cooler

@gfudenberg gfudenberg merged commit 950b62a into open2c:master Nov 6, 2023
4 checks passed
agalitsyna pushed a commit that referenced this pull request Feb 12, 2024
* add pixel_filter_util to sandbox; 
* add pool_decorator function to lib/common.py

---------

Co-authored-by: Yao Xiao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants