Part 1-2 Visualize and create input data

Part 1: Visualize input data

Open the main.nf file in VS Code. This file is pre-filled with a workflow named get_data, which is responsible for fetching input files from a specified directory. This step serves as a generic data-loading process commonly used at the start of a pipeline.

A key concept here is the Channel, which lets data flow asynchronously between the steps of a pipeline. The fromFilePairs() channel factory is typically used for paired-end sequencing data, but here it serves a more general purpose: grouping related files that share a common name prefix.
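As a rough illustration of how fromFilePairs() groups files, here is a minimal sketch. The data directory and the glob pattern are assumptions made for this example and may differ from what the pre-filled main.nf contains.

```groovy
// Sketch only: the "data" directory and the glob are assumptions for illustration.
workflow {
    // size: 3    -> each group is expected to hold exactly three files
    // flat: true -> emit [key, file1, file2, file3]
    //               instead of [key, [file1, file2, file3]]
    Channel
        .fromFilePairs("data/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true)
        .view()
}
```

Within each group, the files are listed in alphabetical order.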

To run the Nextflow pipeline, use the following command:

Terminal window
nextflow run main.nf --input data -profile docker
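The -profile docker option selects a configuration profile named docker. The tutorial's own configuration is not shown here, so the fragment below is only an assumption of what a minimal profile with that name could look like in a nextflow.config file.

```groovy
// nextflow.config (hypothetical minimal example, not the tutorial's actual file)
profiles {
    docker {
        // Run each process inside a Docker container
        docker.enabled = true
    }
}
```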

Part 2: Create input structure

1. Update data structure

Now, let’s modify the get_data workflow and the main workflow to fetch the test data. Replace the contents of the existing main.nf file with the following:

main.nf
#!/usr/bin/env nextflow

workflow get_data {
    main:
        // Stop early with a help message if no input directory was given.
        if ( !params.input ) {
            log.info "You must provide an input directory containing all images using:"
            log.info ""
            log.info " --input=/path/to/[input] Input directory containing your subjects"
            log.info " |"
            log.info " ├-- S1"
            log.info " | └-- ses-01"
            log.info " | | ├-- anat"
            log.info " | | | |--*t1.nii.gz"
            log.info " | | |--dwi"
            log.info " | | | |--*dwi.nii.gz"
            log.info " | | | ├-- *dwi.bval"
            log.info " | | | └-- *dwi.bvec"
            log.info " | └-- ses-02"
            log.info " └-- S2"
            log.info " └-- ses-01"
            log.info " | ├-- anat"
            log.info " | | |--*t1.nii.gz"
            log.info " | |--dwi"
            log.info " | | |--*dwi.nii.gz"
            log.info " | | ├-- *dwi.bval"
            log.info " | | └-- *dwi.bvec"
            log.info " └-- ses-02"
            log.info ""
            error "Please resubmit your command with the previous file structure."
        }

        input = file(params.input)

        // ** Loading all files. ** //
        dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true)

    emit:
        dwi = dwi_channel
}

workflow {
    // ** Now call your input workflow to fetch your files ** //
    data = get_data()
    data.dwi.view() // Contains your DWI data: [key, bval, bvec, dwi]
}

Now you can run Nextflow:

Terminal window
nextflow run main.nf --input data -profile docker

Each element in the output channel is a tuple containing, in this order:

  • A unique key identifier (subject/session)
  • The matching .bval file
  • The matching .bvec file
  • The matching .nii.gz file (DWI image)

The tuple follows this format:

Terminal window
[ subject_session_id,
  /path/to/subject/session/dwi/*dwi.bval,
  /path/to/subject/session/dwi/*dwi.bvec,
  /path/to/subject/session/dwi/*dwi.nii.gz ]
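To make the tuple layout easier to read on the terminal, the view() call in the main workflow can take a closure that names each element. This is only a sketch: the parameter names are illustrative, and it assumes the element order listed above.

```groovy
// Sketch only: replaces data.dwi.view() inside the main workflow.
// Parameter names are illustrative; the order matches the tuple described above.
data.dwi.view { key, bval, bvec, dwi ->
    "key=${key} bval=${bval.name} bvec=${bvec.name} dwi=${dwi.name}"
}
```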

2. Set the subject and session ID correctly

Now let’s modify the input structure so that the key identifier sub-003_ses-01_dir-AP becomes sub-003_ses-01. The fromFilePairs call stays the same, but it receives an additional grouping closure that builds the key from the subject and session directory names. Before the modification, the channel is defined as:

dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true)
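After the modification, the call could look like the sketch below. It assumes the subject/session/dwi layout printed by the help message, so that the third parent of each file is the subject directory and the second parent is the session directory.

```groovy
// Sketch only: assumes files live in <input>/<subject>/<session>/dwi/.
// The closure runs once per file and returns the grouping key,
// for example "sub-003_ses-01".
dwi_channel = Channel.fromFilePairs(
    "$input/**/dwi/*dwi.{nii.gz,bval,bvec}",
    size: 3,
    flat: true
) { it.parent.parent.parent.name + "_" + it.parent.parent.name }
```

Files that return the same key are grouped into the same tuple, so the key no longer depends on the file name itself.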

Now you can run Nextflow:

Terminal window
nextflow run main.nf --input data -profile docker

3. Organize data for processing

By default, the files in each tuple are sorted alphabetically (.bval, .bvec, .nii.gz), so you need to reorder them to get a specific file order. To do this, apply the map operator to the channel. At this point, the channel definition in main.nf, including the key closure from the previous step, reads:

dwi_channel = Channel.fromFilePairs(
    "$input/**/dwi/*dwi.{nii.gz,bval,bvec}",
    size: 3,
    flat: true
) { it.parent.parent.parent.name + "_" + it.parent.parent.name }
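One way to express the reordering is to chain map onto the channel and list the elements in the order you want. The sketch below is an assumption rather than the tutorial's exact code: the closure parameter names are illustrative, and the target order (image first, then b-values and b-vectors) is chosen for the example.

```groovy
// Sketch only: parameter names and target order are assumptions.
// Incoming order is alphabetical: [key, bval, bvec, dwi].
dwi_channel = Channel.fromFilePairs(
        "$input/**/dwi/*dwi.{nii.gz,bval,bvec}",
        size: 3,
        flat: true
    ) { it.parent.parent.parent.name + "_" + it.parent.parent.name }
    .map { sid, bval, bvec, dwi -> [sid, dwi, bval, bvec] }
```

Each emitted item then reads [key, dwi, bval, bvec].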

Now you can run Nextflow:

Terminal window
nextflow run main.nf --input data -profile docker

Now your pipeline input data is well structured, which simplifies processing in later pipeline stages. Each item carries a clearly labeled subject and session ID along with all the files needed for DWI processing: the DWI image, the b-values, and the b-vectors.