
very slow start time when parallelizing #167

Open
Remi-Gau opened this issue Sep 18, 2023 · 2 comments

Remi-Gau (Contributor) commented Sep 18, 2023

Testing on our large test nodes, the commands seem to work quite well for a single subject, and I would like to parallelize them to process my entire study. Participants each have around 30 sessions.
Attempting to parallelize one job per subject on our GPU clusters fails: the jobs keep getting killed for running out of memory. In fact, bidsmreye seems to take an extremely long time just to begin, on the order of several hours before the job starts doing any work.

#!/bin/bash -l

#SBATCH --job-name=[bidsmreye]
#SBATCH -o log/bidsmreye_%a.txt
#SBATCH -e log/bidsmreye_%a.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=8G
#SBATCH --account=DBIC
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:2
#SBATCH --time=7-01:00:00
#SBATCH --mail-type=FAIL,END
#SBATCH --requeue
#SBATCH --array=0-11

# Output and error log directories
output_log_dir="log"
error_log_dir="log"

# Create the directories if they don't exist
mkdir -p "$output_log_dir"
mkdir -p "$error_log_dir"

# Must run on a GPU node
module load cuda
module load TensorRT
nvidia-smi
echo $CUDA_VISIBLE_DEVICES
hostname

# bidsmreye requires input fmridata (fmriprep outputs) to be at least realigned
# Filenames and structure that conforms to a BIDS derivative dataset

# Had to add these lines to initialize conda
conda init bash
source ~/.bashrc
conda activate deepmreye

# Check if SLURM_ARRAY_TASK_ID is not set or is empty
if [ -z "$SLURM_ARRAY_TASK_ID" ]; then
    # Fall back to a default array index when running outside a SLURM array
    SLURM_ARRAY_TASK_ID=0
fi

bids_dir="/dartfs-hpc/rc/lab/C/CANlab/labdata/data/WASABI/derivatives/fmriprep-try2"
output_dir="/dartfs-hpc/rc/lab/C/CANlab/labdata/data/WASABI/derivatives/deepmreye"
SUBJECTS=(SID000002 SID000743 SID001567 SID001651 SID001804 SID001907 SID001641 SID001684 SID001852 SID002035 SID002263 SID002328)
SUBJ=${SUBJECTS[$SLURM_ARRAY_TASK_ID]}
echo "processing bidsmreye for ${SUBJ}..."

# Preparing the data, then computing the eye movements (action prepare; action generalize)
# Prepare: registers the data to MNI if that is not already the case, registers it to the deepmreye template, and extracts data from the eye mask
bidsmreye --action all \
    ${bids_dir} \
    ${output_dir} \
    participant --participant_label ${SUBJ} 
    
# Group Level Summary
bidsmreye --action qc \
    ${bids_dir} \
    ${output_dir} \
    participant --participant_label ${SUBJ} 

echo "processing complete"
github-actions (bot) commented

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

Michael-Sun commented

To further clarify this issue: it occurs when using the conda-installed bidsmreye. The following messages appear before processing begins:

2023-09-18 12:41:18.717612: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-18 12:41:25.070354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
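
One quick way to narrow this down (a minimal sketch, assuming the deepmreye conda environment is active on a GPU node): check whether this TensorFlow build sees the GPUs at all. An empty list would mean inference is running on CPU, which would be consistent with both the slow start and the memory pressure.

# Does the conda-installed TensorFlow see any GPU on this node?
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"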

@Remi-Gau Remi-Gau changed the title very slow start time when paralelizing very slow start time when paralellizing Sep 18, 2023
@Remi-Gau Remi-Gau changed the title very slow start time when paralellizing very slow start time when parallelizing Sep 18, 2023
@Remi-Gau Remi-Gau added this to the 0.4.0 milestone Aug 1, 2024