Running sf-pediatric without internet access
Some compute nodes do not have internet access at runtime. Since the pipeline pulls containers from the container registry during execution, it will fail with download errors on such nodes. Fortunately, containers can be downloaded prior to pipeline execution and fetched locally at runtime. To facilitate this use case, we provide a download script that fetches every container required at runtime.
Downloading the containers
1. Make sure you have all prerequisites
The script relies on three dependencies that should already be available on your compute cluster (a quick availability check is sketched after this list):
- nextflow >= 24.10.5
- apptainer or singularity
- jq
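If you are unsure whether these tools are visible in your current shell, a quick check like the one below can save a failed run. The module load line is only an example; module names vary between clusters, and singularity can stand in for apptainer:

```bash
# Load the tools if your cluster provides them as modules (names may differ).
module load nextflow apptainer

# Verify that every dependency is on the PATH (swap apptainer for singularity if needed).
for tool in nextflow apptainer jq; do
    command -v "$tool" >/dev/null || echo "Missing dependency: $tool"
done

# The script requires nextflow >= 24.10.5; print the version to double-check.
nextflow -version
```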
2. Run the download script
Once all prerequisites are loaded and available in your PATH, you can run the following command (if you are on a login node, it is recommended to use at most -d 2 for parallel downloads):

```bash
curl -fsSL https://raw.githubusercontent.com/nf-neuro/modules/main/assets/download_pipeline.sh | \
    bash -s -- -p scilus/sf-pediatric -r 0.2.2 -d 1 -o ./containers -c ./singularity_cache
```

Example output:

```
Validating required tools...
✓ All required tools are available

=== Nextflow Pipeline Offline Downloader ===
Pipeline: scilus/sf-pediatric
Revision: 0.2.2
Parallel downloads: 1
Container directory: ./containers
Cache directory: ./singularity_cache

Step 1: Pulling Nextflow pipeline...
Checking scilus/sf-pediatric ...
 done - revision: 17bb3e73e0 [0.2.2]
✓ Pipeline pulled successfully

Step 2: Inspecting pipeline for container requirements...
✓ Pipeline inspection complete
✓ Found 18 unique container(s)

Step 3: Downloading Apptainer/Singularity containers...
This may take a while depending on container sizes...
[1/18] Downloading: community.wave.seqera.io/library/pip_templateflow:2f726c524c63271e
[1/18] ✓ Downloaded: community.wave.seqera.io-library-pip_templateflow-2f726c524c63271e
[2/18] ✓ Already exists: docker.io-freesurfer-freesurfer-7.4.1
[3/18] ✓ Already exists: docker.io-freesurfer-synthstrip-1.5
[4/18] ✓ Already exists: docker.io-gagnonanthony-multiqc-neuroimaging-latest
[5/18] ✓ Already exists: docker.io-gagnonanthony-neurostatx-0.1.0
[6/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-atlases-1.1.0
[7/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-fastsurfer-v2.3.3
[8/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-freesurfer-8.0.0
[9/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-mcribs-2.1.0
[10/18] ✓ Already exists: docker.io-mrtrix3-mrtrix3-3.0.5
[11/18] ✓ Already exists: docker.io-scilus-scilpy-1.6.0
[12/18] ✓ Already exists: docker.io-scilus-scilpy-2.2.0_cpu
[13/18] ✓ Already exists: docker.io-scilus-scilpy-2.2.1_cpu
[14/18] ✓ Already exists: docker.io-scilus-scilpy-dev
[15/18] ✓ Already exists: docker.io-scilus-scilus-2.0.2
[16/18] ✓ Already exists: docker.io-scilus-scilus-2.1.0
[17/18] ✓ Already exists: docker.io-scilus-scilus-2.2.0
[18/18] ✓ Already exists: docker.io-scilus-scilus-2.2.1

=== Download Complete ===
✓ All containers downloaded successfully
Container location: /lustre10/scratch/agagnon/test_nfcore_download/containers

To use these containers offline, set the following environment variables in your shell:

export NXF_SINGULARITY_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'
export NXF_APPTAINER_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'
export SINGULARITY_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'
export APPTAINER_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'

You can then run your nextflow pipeline offline!
Refer to the pipeline documentation for usage details.
```
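Before moving on, it can be worth confirming that the images actually landed in the expected directory; the paths below assume the -o ./containers value used in the command above:

```bash
# Inspect the downloaded container images and their total size.
ls -lh ./containers
du -sh ./containers
```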
Running sf-pediatric using the SLURM executor
3. Set environment variables for pipeline execution
Now that all containers are downloaded, we need to tell Nextflow where to find them. This can be done using environment variables. For example, if you run the pipeline from the same directory where you downloaded the containers, you should set these environment variables:
```bash
export SINGULARITY_CACHEDIR="./singularity_cache"
export APPTAINER_CACHEDIR="./singularity_cache"
export NXF_SINGULARITY_CACHEDIR="./containers"
export NXF_APPTAINER_CACHEDIR="./containers"
```
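If your compute nodes have no internet access at all, you can additionally set Nextflow's NXF_OFFLINE variable so it never tries to contact remote repositories at runtime. This assumes the pipeline and every container were already fetched by the download script above:

```bash
# Optional: forbid Nextflow from reaching out to remote repositories.
export NXF_OFFLINE=true
```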
4. Launching sf-pediatric using the SLURM scheduler

One of the great advantages of nextflow is that it integrates natively with executors such as SLURM. This means it can handle job submission for you, by submitting a single job per nextflow process. We currently have a designated profile for that, enabled with -profile slurm,apptainer. However, since nextflow will handle all job submissions for you, it needs a bit of information before running the pipeline.
First, since nextflow will interact with the SLURM scheduler on your behalf, you can launch the pipeline directly on the login node using a terminal multiplexer (tmux/screen). The terminal multiplexer will keep the pipeline running even if you disconnect from the server.
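For example, a minimal tmux workflow could look like this (the session name sfpediatric is arbitrary):

```bash
# Start a named tmux session on the login node.
tmux new -s sfpediatric

# Launch the pipeline inside the session, then detach with Ctrl-b d.

# Later, reattach to check on the run.
tmux attach -t sfpediatric
```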
Second, inside your tmux session, make sure you set the environment variables for your containers as described in step 3 above.

Third, you need to set the allocation account to be used for submitting jobs. This can be done using environment variables:

```bash
export SLURM_ACCOUNT=<account_name> # Replace <account_name> with the name of the allocation to use.
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT
```

Finally, you can launch the pipeline (see the Running the pipeline section for more details) using a simple call to nextflow:
```bash
# Replace <input_folder> and <output_folder> with your paths, and add any
# processing profiles you want to run in place of "...".
nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input <input_folder> \
    --outdir <output_folder> \
    -profile ...,slurm,apptainer \
    -resume
```

Alternatively, you can put everything in a bash script, ensuring you load all the required modules and set the required environment variables. Here's an example:
```bash
#!/usr/bin/env bash

# Load the required modules.
module load nextflow apptainer

# Command-line arguments for when you launch the script:
# input folder, output folder, SLURM account and container directory.
input=$1
output=$2
account=$3
container_dir=$4

# Variables for containers, etc.
export NXF_APPTAINER_CACHEDIR=$container_dir
export SLURM_ACCOUNT=$account
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT

# Call for the pipeline execution. Add any processing profiles you want
# to run in place of "...".
nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input $input \
    --outdir $output \
    -profile ...,apptainer,slurm \
    -resume
```
You can then run this bash script inside your tmux session using a simple one-liner (replace each parameter with the one that fits your file structure):

```bash
bash cmd_sfpediatric.sh <path/to/input_data> <path/to/output_data> <slurm_account> <path/to/container_folder>
```
Running sf-pediatric within a single SBATCH job
Some clusters do not allow Nextflow pipelines to run on login nodes (even though the login node only handles job scheduling and orchestration, not the actual data processing). In this case, you should run the pipeline within a single SBATCH job. Here is an example, in which we request a single node with 40 CPUs.
```bash
#!/bin/sh
#SBATCH --mail-user=<email>   # Change this.
#SBATCH --mail-type=ALL
#SBATCH --account=<account>   # Change this.
#SBATCH --nodes=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=128G            # Please refer to your compute nodes' characteristics.
#SBATCH --time=48:00:00       # Might need adjusting depending on the number of subjects.

# Load the required modules.
module load nextflow apptainer

# Variables for containers, etc.
export NXF_APPTAINER_CACHEDIR=<path/to/container_folder> # Change this.

# Call for the pipeline execution. Add any processing profiles you want
# to run in place of "...".
nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input <path/to/input_folder> \
    --outdir <path/to/output_folder> \
    -profile ...,apptainer,slurm \
    -resume
```

You can then submit your job using the sbatch command:
```bash
sbatch cmd_sfpediatric_sbatch.sh
```
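Once the job has been submitted, you can monitor it from the login node; .nextflow.log is the default log file Nextflow writes in the directory the pipeline was launched from:

```bash
# Check the state of your jobs in the SLURM queue.
squeue -u $USER

# Follow the Nextflow log of the running pipeline.
tail -f .nextflow.log
```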
Downloading TemplateFlow templates

sf-pediatric allows users to select a specific output space in which all subjects will be registered. These templates are fetched from the TemplateFlow Archive (Ciric et al., 2022). While downloading is handled directly by sf-pediatric when internet access is available, for offline use the selected template (specified using the --template parameter) needs to be available locally. You can refer to the official TemplateFlow documentation for information on how to download the archive using either datalad or the Python client.
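As a minimal sketch of the Python-client route, the commands below pre-populate a local TemplateFlow directory from a node that does have internet access. MNIPediatricAsym and cohort=1 are placeholders only; substitute the template you pass to --template. TEMPLATEFLOW_HOME is the standard variable the client uses to locate its archive:

```bash
# Point TemplateFlow to a directory visible from the compute nodes. Change this.
export TEMPLATEFLOW_HOME=/path/to/templateflow

# Install the Python client and fetch a template (run where internet is available).
# "MNIPediatricAsym" and its cohort are placeholders; use your --template value.
pip install templateflow
python -c "from templateflow import api; api.get('MNIPediatricAsym', cohort=1)"
```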