
Running sf-pediatric without internet access

This content is for v0.2.2.

Some compute nodes do not have internet access at runtime. Since the pipeline pulls its containers from the container registry during execution, you will get errors if the compute nodes cannot reach the internet. Fortunately, containers can be downloaded before pipeline execution and fetched locally at runtime. To support this use case, we provide a download script that fetches every container required at runtime.

  1. Make sure you have all prerequisites

    The script relies on three dependencies that should already be available on your compute cluster:

    1. nextflow >= 24.10.5

    2. apptainer or singularity

    3. jq
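
    Before running the download script, you can confirm that the dependencies are visible in your PATH. A minimal sketch (the `check_dep` helper is hypothetical, not part of the pipeline; note that only one of apptainer or singularity is needed):

```sh
# check_dep: hypothetical helper that reports whether a tool is on PATH.
check_dep() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
  fi
}

for dep in nextflow apptainer singularity jq; do
  check_dep "$dep"
done
```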

  2. Run the download script

    Once all prerequisites are loaded and available in your PATH, you can run the following command (if you are on a login node, it is recommended to use at most -d 2 parallel downloads):

    Terminal window
    curl -fsSL https://raw.githubusercontent.com/nf-neuro/modules/main/assets/download_pipeline.sh | bash -s -- -p scilus/sf-pediatric -r 0.2.2 -d 1 -o ./containers -c ./singularity_cache
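
    Once the script completes, it is worth checking that image files actually landed in the output directory before moving to a node without internet. A hedged sketch (the `.sif`/`.img` extensions are assumptions; the download script controls the exact naming):

```sh
# Count container images in the download directory (extensions assumed).
count=$(find ./containers -type f \( -name '*.sif' -o -name '*.img' \) 2>/dev/null | wc -l)
if [ "$count" -eq 0 ]; then
  echo "no container images found -- re-run the download"
else
  echo "found $count container image(s)"
fi
```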

Running sf-pediatric using the SLURM executor

  1. Set environment variables for pipeline execution

    Now that all containers are downloaded, we need to tell Nextflow where to look for those containers. This can be done using environment variables. For example, if you run the pipeline in the same directory where you downloaded the containers, you should set these environment variables:

    Terminal window
    export SINGULARITY_CACHEDIR="./singularity_cache"
    export APPTAINER_CACHEDIR="./singularity_cache"
    export NXF_SINGULARITY_CACHEDIR="./containers"
    export NXF_APPTAINER_CACHEDIR="./containers"
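
    Note that the relative paths above only resolve correctly if you always launch the pipeline from the directory where you downloaded the containers. A sketch using absolute paths instead (directory names are the ones from the download step):

```sh
# Resolve cache locations to absolute paths so they keep working after `cd`.
CONTAINER_DIR="$(pwd)/containers"
CACHE_DIR="$(pwd)/singularity_cache"
mkdir -p "$CONTAINER_DIR" "$CACHE_DIR"
export SINGULARITY_CACHEDIR="$CACHE_DIR"
export APPTAINER_CACHEDIR="$CACHE_DIR"
export NXF_SINGULARITY_CACHEDIR="$CONTAINER_DIR"
export NXF_APPTAINER_CACHEDIR="$CONTAINER_DIR"
```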
  2. Launching sf-pediatric using the SLURM scheduler

    One of the great advantages of Nextflow is that it integrates natively with executors such as SLURM. This means it can handle job submission for you, submitting one job per Nextflow process. We currently have a designated profile for that, using -profile slurm,apptainer. However, since Nextflow will handle all job submissions for you, it needs some information prior to running the pipeline.

    First, since Nextflow interacts with the SLURM scheduler on your behalf, you can launch the pipeline directly on the login node using a terminal multiplexer (tmux/screen). The terminal multiplexer keeps the pipeline running even if you disconnect from the server.

    Second, inside your tmux session, make sure you set the container environment variables as described above. Third, you need to set the allocation account to be used for submitting jobs. This can be done using environment variables:

    Terminal window
    export SLURM_ACCOUNT=<account_name> # Replace the <account_name> with the name of the allocation to use.
    export SBATCH_ACCOUNT=$SLURM_ACCOUNT
    export SALLOC_ACCOUNT=$SLURM_ACCOUNT
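
    If the account variable is forgotten, SLURM will reject every submitted job later in the run. As a hedged sketch, a small helper can fail early instead (`set_slurm_account` is hypothetical, not a SLURM feature):

```sh
# set_slurm_account: hypothetical helper that propagates the allocation
# account to the variables SLURM tools read, failing when it is empty.
set_slurm_account() {
  if [ -z "$1" ]; then
    echo "error: no allocation account given" >&2
    return 1
  fi
  export SLURM_ACCOUNT="$1"
  export SBATCH_ACCOUNT="$1"
  export SALLOC_ACCOUNT="$1"
}
```

    Inside a launch script you would call it once, e.g. `set_slurm_account "<account_name>" || exit 1`.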

    Finally, you can launch the pipeline (see the Running the pipeline section for more details) using a simple call to nextflow:

    Terminal window
    # Replace <input_folder> and <output_folder>, and add any processing
    # profiles you want to run to -profile.
    nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input <input_folder> \
    --outdir <output_folder> \
    -profile ...,slurm,apptainer \
    -resume

    Alternatively, you can put everything in a bash script that loads the required modules and sets the required environment variables. Here’s an example:

    Terminal window
    #!/usr/bin/env bash
    # Load the required modules.
    module load nextflow apptainer
    # CLI arguments supplied when you launch the script:
    # input folder, output folder, SLURM account, and container folder.
    input=$1
    output=$2
    account=$3
    container_dir=$4
    # Variables for containers, etc.
    export NXF_APPTAINER_CACHEDIR=$container_dir
    export SLURM_ACCOUNT=$account
    export SBATCH_ACCOUNT=$SLURM_ACCOUNT
    export SALLOC_ACCOUNT=$SLURM_ACCOUNT
    # Call for the pipeline execution; add any processing profiles you
    # want to run to -profile.
    nextflow run scilus/sf-pediatric -r 0.2.2 \
    --input $input \
    --outdir $output \
    -profile ...,apptainer,slurm \
    -resume

    You can then run this bash script inside your tmux session using a simple one-liner (replace each parameter with the one that fits your file structure):

    Terminal window
    bash cmd_sfpediatric.sh <path/to/input_data> <path/to/output_data> <slurm_account> <path/to/container_folder>
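
    A forgotten parameter in the one-liner above will quietly produce a broken run. As a hedged sketch, you could add a small guard near the top of cmd_sfpediatric.sh (the `require_args` helper is hypothetical):

```sh
# require_args: hypothetical guard that prints a usage message and fails
# when the received argument count does not match the expected count.
require_args() {
  expected=$1
  received=$2
  if [ "$received" -ne "$expected" ]; then
    echo "Usage: bash cmd_sfpediatric.sh <input> <output> <account> <containers>" >&2
    return 1
  fi
  return 0
}
```

    Inside the script, call it right after the shebang with `require_args 4 "$#" || exit 1`.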

Running sf-pediatric within a single SBATCH job


Some clusters do not allow Nextflow pipelines to run on login nodes (even though the login node only handles job scheduling and orchestration, not the actual data processing). In this case, you should run the pipeline within a single SBATCH job. Here is an example, in which we request a single node with 40 CPUs.

#!/bin/sh
#SBATCH --mail-user=<email> # Change this.
#SBATCH --mail-type=ALL
#SBATCH --account=<account> # Change this.
#SBATCH --nodes=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=128G # Please refer to your compute nodes characteristics.
#SBATCH --time=48:00:00 # Adjust the time depending on the number of subjects.
# Load the required modules.
module load nextflow apptainer
# Variables for containers, etc.
export NXF_APPTAINER_CACHEDIR=<path/to/container_folder> # Change this.
# Call for the pipeline execution; replace the input and output paths
# and add any processing profiles you want to run to -profile.
nextflow run scilus/sf-pediatric -r 0.2.2 \
--input <path/to/input_folder> \
--outdir <path/to/output_folder> \
-profile ...,apptainer,slurm \
-resume

You can then submit your job by using the sbatch command.

Terminal window
sbatch cmd_sfpediatric_sbatch.sh

sf-pediatric allows users to select a specific output space in which all subjects will be registered. These templates are fetched from the TemplateFlow Archive (Ciric et al., 2022). Downloading is handled directly by sf-pediatric when internet is available; for offline use, the selected template (specified using the --template parameter) needs to be available locally. Refer to the official TemplateFlow documentation for how to download the archive using either datalad or the Python client.
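
As a sketch: the TemplateFlow client reads the TEMPLATEFLOW_HOME environment variable to locate its archive, so one offline workflow is to populate the archive on a machine with internet access, copy it to the cluster, and point the variable at it before launching (the path below is only an example):

```sh
# Point TemplateFlow at a pre-populated local archive for offline use.
export TEMPLATEFLOW_HOME="$HOME/.cache/templateflow"
mkdir -p "$TEMPLATEFLOW_HOME"
echo "TemplateFlow archive: $TEMPLATEFLOW_HOME"
```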