
[FEATURE] Idea for the reduction of mask sizes #999

Open
XylotrupesGideon opened this issue Aug 19, 2024 · 5 comments

Labels
enhancement New feature or request

Comments

@XylotrupesGideon

Use a Napari-labels-like data format to reduce the size of Cellpose mask output before stitching

I was working with a large (14 GB) image stack and segmented it in Cellpose. I was able to segment the individual z-planes, but when running the 3D stitch I ran out of memory very quickly.

I then realized that by loading the planes into Napari as label layers and immediately saving them again, I could reduce the file size by 56x(!) (Cellpose output: 18026 KB per z-plane vs. 46 KB per z-plane when saved as a Napari label layer). This allowed me to run the 3D stitch effortlessly on my computer.

I am not sure in what way the Napari output differs from that of Cellpose, but this difference in file size was an absolute life saver for me, and I guess it would be very helpful for anyone else who does not have access to a lot of RAM.
I would suggest taking a look at the Napari implementation and seeing whether it can be integrated into Cellpose.

The workflow I used was the following (probably not the most efficient route, but it works for me):

import napari
import numpy as np
import tifffile
import cellpose.utils
from cellpose import models
from PIL import Image

model = models.CellposeModel(pretrained_model=cp_pretrained_model)  # I used a custom model
planes_output_folder = "your_planes_folder/"  # folder with the per-plane .tif files (note trailing slash, paths are built by concatenation)
masks_output_folder = "your_output_folder/"

# segment the individual z-planes (each z-plane is in a separate .tif file)
for i in range(num_z_planes):
    image = tifffile.imread(planes_output_folder + f'z_plane_{i}.tif')
    masks, flows, _ = model.eval(image)
    tifffile.imwrite(masks_output_folder + f'mask_{i}.tif', masks)
# -> results in the large cellpose output files

# convert the output into napari label layer format = 56x filesize reduction
viewer = napari.Viewer()  # opens a napari viewer
for i in range(num_z_planes):
    viewer.open(masks_output_folder + f"mask_{i}.tif", layer_type="labels")  # load cellpose output mask file as label layer
    labels_layer = viewer.layers[f"mask_{i}"]  # grab the layer by its name
    labels_layer.save(masks_output_folder + f"mask_{i}_label.tif")  # save the layer again (much smaller file)

# create a stack of the unstitched mask z-planes (I guess that could be skipped and stitched directly)
layer_data = [layer.data for layer in viewer.layers]
z_stack = np.stack(layer_data, axis=0)
viewer.add_image(z_stack, name="Z-stack")
viewer.layers["Z-stack"].save(masks_output_folder + "z-stack.tif")
viewer.close()

# Open the z-stack .tif file (for some reason it did not want to load the whole stack directly; probably something with my machine, and this step can be skipped. I will leave it here anyway.)
with Image.open(masks_output_folder + "z-stack.tif") as img:
    # Initialize a list to store image frames
    frames = []
    # Iterate over all frames in the .tif file
    for frame in range(img.n_frames):
        img.seek(frame)  # Move to the specific frame
        frames.append(np.array(img))  # Convert the frame to a NumPy array

# Convert the list of frames into a 3D NumPy array
stack = np.stack(frames, axis=0)

# Now, pass the 3D stack to your stitch3D function
stitched_stack = cellpose.utils.stitch3D(stack, stitch_threshold=0.6)

# Convert your stitched_stack to float32 if it's not already
stitched_stack = stitched_stack.astype(np.float32)

# Define the path for the output file
output_path = masks_output_folder + r'stitched_60%_stack_.tif'

# Convert the entire stitched stack to PIL Image objects with float32 mode
images = [Image.fromarray(plane, mode='F') for plane in stitched_stack]

# Save the multi-page TIFF
if images:
    images[0].save(output_path, save_all=True, append_images=images[1:], compression='tiff_deflate')
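
As a side note, a more direct route (a sketch only, not tested on the same data) would be to load the per-plane Cellpose outputs, cast them to a compact dtype, and stitch them without the Napari round trip; the folder, file names, and num_z_planes below are the placeholders used above:

import numpy as np
import tifffile
from cellpose import utils

# load the per-plane masks saved by Cellpose and cast to uint16 to keep memory low
planes = [tifffile.imread(masks_output_folder + f"mask_{i}.tif").astype(np.uint16)
          for i in range(num_z_planes)]
stack = np.stack(planes, axis=0)

stitched = utils.stitch3D(stack, stitch_threshold=0.6)
tifffile.imwrite(masks_output_folder + "stitched_stack.tif", stitched, compression="zlib")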
XylotrupesGideon added the enhancement (New feature or request) label Aug 19, 2024
@carsen-stringer
Member

Thanks for reporting this. What is the dtype of the frames saved with napari?

I think our tiffs are large because we weren't using compression; I've accepted a pull request adding this, but that shouldn't change the underlying dtype.
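
For reference, here is a minimal sketch of how a per-plane mask could be written with compression and a compact integer dtype via tifffile; the "zlib" compression argument and the uint16/uint32 cast are illustrative assumptions, not necessarily what the merged pull request does:

import numpy as np
import tifffile

def save_mask_compressed(path, masks):
    # pick the smallest unsigned integer dtype that can hold the label count
    dtype = np.uint16 if masks.max() < 2**16 else np.uint32
    # deflate/zlib compression works well on label images with large uniform regions
    tifffile.imwrite(path, masks.astype(dtype), compression="zlib")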

@XylotrupesGideon
Author

XylotrupesGideon commented Sep 9, 2024

The dtype is np.ndarray after conversion.

I am still trying to figure out what exactly napari is doing to compress the labels.
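
A quick way to compare the two outputs is to load both files and print their dtype and in-memory size; the file names below are the placeholders from the workflow above:

import tifffile

cellpose_mask = tifffile.imread(masks_output_folder + "mask_0.tif")
napari_mask = tifffile.imread(masks_output_folder + "mask_0_label.tif")

for name, arr in [("cellpose", cellpose_mask), ("napari", napari_mask)]:
    # dtype and nbytes show whether the size difference comes from the element type
    print(name, arr.dtype, arr.shape, f"{arr.nbytes / 1e6:.1f} MB in memory")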

@XylotrupesGideon
Author

XylotrupesGideon commented Sep 9, 2024

Okay, it seems that this is the function it uses:
It seems to just create an integer ndarray and then populate it successively with the labels,
though it goes through some other functions which, as far as I can see, should not influence the output of a cellpose segmentation.

napari/napari/layers/shapes/_shape_list.py

def to_labels(self, labels_shape=None, zoom_factor=1, offset=(0, 0)):
        """Returns a integer labels image, where each shape is embedded in an
        array of shape labels_shape with the value of the index + 1
        corresponding to it, and 0 for background. For overlapping shapes
        z-ordering will be respected.

        Parameters
        ----------
        labels_shape : np.ndarray | tuple | None
            2-tuple defining shape of labels image to be generated. If non
            specified, takes the max of all the vertices
        zoom_factor : float
            Premultiplier applied to coordinates before generating mask. Used
            for generating as downsampled mask.
        offset : 2-tuple
            Offset subtracted from coordinates before multiplying by the
            zoom_factor. Used for putting negative coordinates into the mask.

        Returns
        -------
        labels : np.ndarray
            MxP integer array where each value is either 0 for background or an
            integer up to N for points inside the corresponding shape.
        """
        if labels_shape is None:
            labels_shape = self.displayed_vertices.max(axis=0).astype(int)

        labels = np.zeros(labels_shape, dtype=int)

        for ind in self._z_order[::-1]:
            mask = self.shapes[ind].to_mask(
                labels_shape, zoom_factor=zoom_factor, offset=offset
            )
            labels[mask] = ind + 1

        return labels
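
Note that np.zeros(labels_shape, dtype=int) allocates int64 on most 64-bit platforms, four times larger per element than the uint16 masks Cellpose typically returns. A small illustration (the plane shape here is just an example):

import numpy as np

shape = (2048, 2048)  # hypothetical single z-plane
print(np.zeros(shape, dtype=int).nbytes / 1e6)        # ~33.6 MB (int64 on most platforms)
print(np.zeros(shape, dtype=np.uint16).nbytes / 1e6)  # ~8.4 MB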

@carsen-stringer
Member

Thanks, okay, I'll see if I can replicate the increased memory usage and reduce it; we are using uint16 or uint32. Which OS are you on, and what were the dimensions of the stack (size in x, y, z)?

@carsen-stringer
Member

Okay, I am not using a big enough stack to replicate large differences in RAM, but I found where it could be slowed down: a type cast. Inside stitch3D we're using int, not the dtype of the masks from cellpose (which are usually uint16). I've updated the code to use masks.dtype, but I don't think it should make much of a difference.

Another alternative is that you have a bunch of small masks (<15 pixels) that are thrown out when not stitching but remain when stitching, and that's what is slowing things down. You can test this by turning off min_size when running plane-by-plane (model.eval(..., min_size=-1)) and seeing if you find a lot of small masks.
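
A quick sketch of that check, reusing model and image from the workflow above; the 15-pixel threshold mirrors the one mentioned here, and min_size=-1 disables the size filter:

import numpy as np

masks, flows, _ = model.eval(image, min_size=-1)  # keep masks of every size

# count labels (excluding background 0) that cover fewer than 15 pixels
labels, counts = np.unique(masks, return_counts=True)
total = int((labels > 0).sum())
small = int(np.sum(counts[labels > 0] < 15))
print(f"{small} of {total} masks have fewer than 15 pixels")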
