Running sf-pediatric without internet access
Some compute nodes do not have internet access at runtime. Since the pipeline pulls containers from the container registry during execution, it will fail with download errors on such nodes. Fortunately, containers can be downloaded prior to pipeline execution and fetched locally at runtime. To facilitate this use case, we provide a download script that fetches every container required at runtime.
Downloading the containers
1. Make sure you have all prerequisites
The script relies on three dependencies that should already be available on your compute cluster (a quick availability check is sketched after this list):
- nextflow >= 24.10.5
- apptainer or singularity
- jq
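If you are unsure whether these tools are visible in your current shell, a quick check like the one below can save a failed run. The module load line is only an example; module names vary between clusters, and singularity can stand in for apptainer:

```bash
# Load the tools if your cluster provides them as modules (names may differ).
module load nextflow apptainer

# Verify that every dependency is on the PATH (swap apptainer for singularity if needed).
for tool in nextflow apptainer jq; do
    command -v "$tool" >/dev/null || echo "Missing dependency: $tool"
done

# The script requires nextflow >= 24.10.5; print the version to double-check.
nextflow -version
```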
2. Run the download script
Once all prerequisites are loaded and available in your PATH, you can run the following command (if you are on a login node, it is recommended to use at most -d 2 for parallel downloads):

```bash
curl -fsSL https://raw.githubusercontent.com/nf-neuro/modules/main/assets/download_pipeline.sh | \
    bash -s -- -p scilus/sf-pediatric -r 0.2.2 -d 1 -o ./containers -c ./singularity_cache
```

Example output:

```
Validating required tools...
✓ All required tools are available

=== Nextflow Pipeline Offline Downloader ===
Pipeline: scilus/sf-pediatric
Revision: 0.2.2
Parallel downloads: 1
Container directory: ./containers
Cache directory: ./singularity_cache

Step 1: Pulling Nextflow pipeline...
Checking scilus/sf-pediatric ...
 done - revision: 17bb3e73e0 [0.2.2]
✓ Pipeline pulled successfully

Step 2: Inspecting pipeline for container requirements...
✓ Pipeline inspection complete
✓ Found 18 unique container(s)

Step 3: Downloading Apptainer/Singularity containers...
This may take a while depending on container sizes...
[1/18] Downloading: community.wave.seqera.io/library/pip_templateflow:2f726c524c63271e
[1/18] ✓ Downloaded: community.wave.seqera.io-library-pip_templateflow-2f726c524c63271e
[2/18] ✓ Already exists: docker.io-freesurfer-freesurfer-7.4.1
[3/18] ✓ Already exists: docker.io-freesurfer-synthstrip-1.5
[4/18] ✓ Already exists: docker.io-gagnonanthony-multiqc-neuroimaging-latest
[5/18] ✓ Already exists: docker.io-gagnonanthony-neurostatx-0.1.0
[6/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-atlases-1.1.0
[7/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-fastsurfer-v2.3.3
[8/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-freesurfer-8.0.0
[9/18] ✓ Already exists: docker.io-gagnonanthony-nf-pediatric-mcribs-2.1.0
[10/18] ✓ Already exists: docker.io-mrtrix3-mrtrix3-3.0.5
[11/18] ✓ Already exists: docker.io-scilus-scilpy-1.6.0
[12/18] ✓ Already exists: docker.io-scilus-scilpy-2.2.0_cpu
[13/18] ✓ Already exists: docker.io-scilus-scilpy-2.2.1_cpu
[14/18] ✓ Already exists: docker.io-scilus-scilpy-dev
[15/18] ✓ Already exists: docker.io-scilus-scilus-2.0.2
[16/18] ✓ Already exists: docker.io-scilus-scilus-2.1.0
[17/18] ✓ Already exists: docker.io-scilus-scilus-2.2.0
[18/18] ✓ Already exists: docker.io-scilus-scilus-2.2.1

=== Download Complete ===
✓ All containers downloaded successfully
Container location: /lustre10/scratch/agagnon/test_nfcore_download/containers

To use these containers offline, set the following environment variables in your shell:

export NXF_SINGULARITY_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'
export NXF_APPTAINER_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'
export SINGULARITY_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'
export APPTAINER_CACHEDIR='/lustre10/scratch/agagnon/test_nfcore_download/containers'

You can then run your nextflow pipeline offline!
Refer to the pipeline documentation for usage details.
```
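Before moving on, it can be worth confirming that the images actually landed in the expected directory; the paths below assume the -o ./containers value used in the command above:

```bash
# Inspect the downloaded container images and their total size.
ls -lh ./containers
du -sh ./containers
```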
Running sf-pediatric using the SLURM executor
3. Set environment variables for pipeline execution
Now that all containers are downloaded, we need to tell Nextflow where to find them. This can be done using environment variables. For example, if you run the pipeline from the same directory where you downloaded the containers, you should set these environment variables:
```bash
export SINGULARITY_CACHEDIR="./singularity_cache"
export APPTAINER_CACHEDIR="./singularity_cache"
export NXF_SINGULARITY_CACHEDIR="./containers"
export NXF_APPTAINER_CACHEDIR="./containers"
```
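If your compute nodes have no internet access at all, you can additionally set Nextflow's NXF_OFFLINE variable so it never tries to contact remote repositories at runtime. This assumes the pipeline and every container were already fetched by the download script above:

```bash
# Optional: forbid Nextflow from reaching out to remote repositories.
export NXF_OFFLINE=true
```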
4. Launching sf-pediatric using the SLURM scheduler

One of the great advantages of nextflow is that it integrates natively with executors such as SLURM. This means it can handle job submission for you, by submitting a single job per nextflow process. We currently have a designated profile for that, enabled with -profile slurm,apptainer. However, since nextflow will handle all job submissions for you, it needs a bit of information before running the pipeline.
First, since nextflow will interact with the SLURM scheduler on your behalf, you can launch the pipeline directly on the login node using a terminal multiplexer (tmux/screen). The terminal multiplexer will keep the pipeline running even if you disconnect from the server.
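For example, a minimal tmux workflow could look like this (the session name sfpediatric is arbitrary):

```bash
# Start a named tmux session on the login node.
tmux new -s sfpediatric

# Launch the pipeline inside the session, then detach with Ctrl-b d.

# Later, reattach to check on the run.
tmux attach -t sfpediatric
```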
Second, inside your tmux session, make sure you set the environment variables for your containers as described in step 3 above.

Third, you need to set the allocation account to be used for submitting jobs. This can be done using environment variables:

```bash
export SLURM_ACCOUNT=<account_name> # Replace <account_name> with the name of the allocation to use.
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT
```

Finally, you can launch the pipeline (see the Running the pipeline section for more details) using a simple call to nextflow:
```bash
# Replace <input_folder> and <output_folder> with your paths, and add any
# processing profiles you want to run in place of "...".
nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input <input_folder> \
    --outdir <output_folder> \
    -profile ...,slurm,apptainer \
    -resume
```

Alternatively, you can put everything in a bash script, ensuring you load all the required modules and set the required environment variables. Here's an example:
```bash
#!/usr/bin/env bash

# Load the required modules.
module load nextflow apptainer

# Command-line arguments for when you launch the script:
# input folder, output folder, SLURM account and container directory.
input=$1
output=$2
account=$3
container_dir=$4

# Variables for containers, etc.
export NXF_APPTAINER_CACHEDIR=$container_dir
export SLURM_ACCOUNT=$account
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT

# Call for the pipeline execution. Add any processing profiles you want
# to run in place of "...".
nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input $input \
    --outdir $output \
    -profile ...,apptainer,slurm \
    -resume
```
You can then run this bash script inside your tmux session using a simple one-liner (replace each parameter with the one that fits your file structure):

```bash
bash cmd_sfpediatric.sh <path/to/input_data> <path/to/output_data> <slurm_account> <path/to/container_folder>
```
Running sf-pediatric within a single SBATCH job
Some clusters do not allow Nextflow pipelines to run on login nodes (even though the login node only handles job scheduling and orchestration, not the actual data processing). In this case, you should run the pipeline within a single SBATCH job. Here is an example, in which we request a single node with 40 CPUs.
```bash
#!/bin/sh
#SBATCH --mail-user=<email>   # Change this.
#SBATCH --mail-type=ALL
#SBATCH --account=<account>   # Change this.
#SBATCH --nodes=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=128G            # Please refer to your compute nodes' characteristics.
#SBATCH --time=48:00:00       # Might need adjusting depending on the number of subjects.

# Load the required modules.
module load nextflow apptainer

# Variables for containers, etc.
export NXF_APPTAINER_CACHEDIR=<path/to/container_folder> # Change this.

# Call for the pipeline execution. Add any processing profiles you want
# to run in place of "...".
nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input <path/to/input_folder> \
    --outdir <path/to/output_folder> \
    -profile ...,apptainer,slurm \
    -resume
```

You can then submit your job using the sbatch command:
```bash
sbatch cmd_sfpediatric_sbatch.sh
```
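Once the job has been submitted, you can monitor it from the login node; .nextflow.log is the default log file Nextflow writes in the directory the pipeline was launched from:

```bash
# Check the state of your jobs in the SLURM queue.
squeue -u $USER

# Follow the Nextflow log of the running pipeline.
tail -f .nextflow.log
```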
Downloading TemplateFlow templates

sf-pediatric allows users to select a specific output space in which all subjects will be registered. These templates are fetched from the TemplateFlow Archive (Ciric et al., 2022). While downloading is handled directly by sf-pediatric when internet access is available, for offline use the selected template (specified using the --template parameter) needs to be available locally. You can refer to the official TemplateFlow documentation for information on how to download the archive using either datalad or the Python client.
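As a minimal sketch of the Python-client route, the commands below pre-populate a local TemplateFlow directory from a node that does have internet access. MNIPediatricAsym and cohort=1 are placeholders only; substitute the template you pass to --template. TEMPLATEFLOW_HOME is the standard variable the client uses to locate its archive:

```bash
# Point TemplateFlow to a directory visible from the compute nodes. Change this.
export TEMPLATEFLOW_HOME=/path/to/templateflow

# Install the Python client and fetch a template (run where internet is available).
# "MNIPediatricAsym" and its cohort are placeholders; use your --template value.
pip install templateflow
python -c "from templateflow import api; api.get('MNIPediatricAsym', cohort=1)"
```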