Part 1-2 Visualize and create input data
Part 1. Visualize input data
Open the `main.nf` file in VS Code. This file is pre-filled with a workflow named `get_data`, which is responsible for fetching input files from a specified directory. This step serves as a generic data-loading process commonly used at the start of a pipeline.
A key concept here is the use of `Channel`, which enables efficient, asynchronous data flow. The `fromFilePairs()` method is particularly useful for handling paired-end sequencing data, but in this case it helps group related files.
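To build intuition for what `fromFilePairs()` does, here is a minimal sketch in Python (illustration only — the pipeline itself stays in Nextflow). It groups a couple of hypothetical file names by their shared stem, which is the same idea the channel factory applies to the files matched by its glob pattern:

```python
import re
from collections import defaultdict

# Hypothetical file names, mimicking the tutorial's data directory.
files = ["data/participants.json", "data/participants.tsv"]

# Group files that share a common stem, similar in spirit to what
# Channel.fromFilePairs() does with its glob pattern.
groups = defaultdict(list)
for f in files:
    stem = re.sub(r"\.(json|tsv)$", "", f.rsplit("/", 1)[-1])
    groups[stem].append(f)

for key, members in sorted(groups.items()):
    print([key] + sorted(members))
# ['participants', 'data/participants.json', 'data/participants.tsv']
```

Each group becomes one channel element: a key followed by the grouped files, just like the tuple printed by the pipeline below.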
To run the Nextflow pipeline, use the following command:
```bash
nextflow run main.nf --input data -profile docker
```

You should see output similar to:

```
[participants_files, /workspaces/nf-neuro-tutorial_test/data/participants.json, /workspaces/nf-neuro-tutorial_test/data/participants.tsv]
```
Part 2. Create input structure
1. Update data structure
Now, let’s modify the `get_data` workflow and the main workflow to fetch the test data. Replace the existing `main.nf` file with the following.
```nextflow
#!/usr/bin/env nextflow

workflow get_data {
    main:
    if ( !params.input ) {
        log.info "You must provide an input directory containing all images using:"
        log.info ""
        log.info "    --input=/path/to/[input]   Input directory containing your subjects"
        log.info "        |"
        log.info "        ├-- S1"
        log.info "        |   └-- ses-01"
        log.info "        |   |   ├-- anat"
        log.info "        |   |   |   |-- *t1.nii.gz"
        log.info "        |   |   |-- dwi"
        log.info "        |   |   |   |-- *dwi.nii.gz"
        log.info "        |   |   |   ├-- *dwi.bval"
        log.info "        |   |   |   └-- *dwi.bvec"
        log.info "        |   └-- ses-02"
        log.info "        └-- S2"
        log.info "            └-- ses-01"
        log.info "            |   ├-- anat"
        log.info "            |   |   |-- *t1.nii.gz"
        log.info "            |   |-- dwi"
        log.info "            |   |   |-- *dwi.nii.gz"
        log.info "            |   |   ├-- *dwi.bval"
        log.info "            |   |   └-- *dwi.bvec"
        log.info "            └-- ses-02"
        log.info ""
        error "Please resubmit your command with the previous file structure."
    }

    input = file(params.input)

    // ** Loading all files. ** //
    dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true)

    emit:
    dwi = dwi_channel
}

workflow {
    // ** Now call your input workflow to fetch your files ** //
    data = get_data()
    data.dwi.view() // Contains your DWI data: [meta, dwi, bval, bvec]
}
```
Now, you can run Nextflow:
```bash
nextflow run main.nf --input data -profile docker
```

The output should look like this:

```
[sub-003_ses-01_dir-AP, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.bval, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.bvec, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.nii.gz]
```
Each element in the output channel is a tuple containing:

- the subject/session ID
- the `.bval` file
- the `.bvec` file
- the `.nii.gz` file (the DWI image)

It follows this format:

```
[ subject_session_id, /path/to/subject/session/dwi/*dwi.bval, /path/to/subject/session/dwi/*dwi.bvec, /path/to/subject/session/dwi/*dwi.nii.gz ]
```
2. Set the subject and session ID correctly
Now, let’s modify the input structure so that the key identifier `sub-003_ses-01_dir-AP` becomes `sub-003_ses-01`. We keep the current structure, but add a grouping-key closure that maps each matched file (`it`) to this identifier. Check the Before and After sections below to see the needed modification.
Before:

```nextflow
dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true)
```
After:

```nextflow
dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true) { it.parent.parent.parent.name + "_" + it.parent.parent.name }
```
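To see what the closure computes, here is a small Python sketch (illustration only — in the pipeline this logic is a Groovy closure receiving each matched file as `it`). It climbs from one of the matched files, using a hypothetical path mirroring the test data, up past the `dwi/` folder and joins the subject and session directory names:

```python
from pathlib import Path

# One of the matched files (hypothetical path mirroring the tutorial's dataset).
f = Path("/workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.nii.gz")

# Equivalent of { it.parent.parent.parent.name + "_" + it.parent.parent.name }:
# parent        -> .../dwi
# parent.parent -> .../ses-01
# parent.parent.parent -> .../sub-003
key = f.parent.parent.parent.name + "_" + f.parent.parent.name
print(key)  # sub-003_ses-01
```

All three files of a subject/session now map to the same key, so they stay grouped under `sub-003_ses-01` regardless of the `dir-AP` suffix in the file names.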
Now, you can run Nextflow:
```bash
nextflow run main.nf --input data -profile docker
```

The output should now be:

```
[sub-003_ses-01, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.bval, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.bvec, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.nii.gz]
```
3. Organizing Data for Processing
By default, files are sorted alphabetically, so you need to reorder them to get a specific file order. To do this, use the `map` operator and change `main.nf` as follows:
Before:

```nextflow
dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true) { it.parent.parent.parent.name + "_" + it.parent.parent.name }
```
After:

```nextflow
dwi_channel = Channel.fromFilePairs("$input/**/dwi/*dwi.{nii.gz,bval,bvec}", size: 3, flat: true) { it.parent.parent.parent.name + "_" + it.parent.parent.name }
    .map{ sid, bvals, bvecs, dwi -> [ [id: sid], dwi, bvals, bvecs ] } // Reordering the inputs.
```
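The reordering done by `map` can be sketched in Python (illustration only — in the pipeline this is the Groovy closure shown above; the file names here are hypothetical placeholders for the full paths):

```python
# Tuple emitted by fromFilePairs, in alphabetical file order: bval, bvec, nii.gz.
item = ("sub-003_ses-01", "dwi.bval", "dwi.bvec", "dwi.nii.gz")

# Equivalent of the Groovy closure:
# { sid, bvals, bvecs, dwi -> [ [id: sid], dwi, bvals, bvecs ] }
def reorder(sid, bvals, bvecs, dwi):
    # Wrap the ID in a map and move the DWI image ahead of bval/bvec.
    return [{"id": sid}, dwi, bvals, bvecs]

print(reorder(*item))
# [{'id': 'sub-003_ses-01'}, 'dwi.nii.gz', 'dwi.bval', 'dwi.bvec']
```

Wrapping the ID in a map (`[id: sid]`) gives each element a `meta` entry, matching the `[meta, dwi, bval, bvec]` layout expected downstream.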
Now, you can run Nextflow:
```bash
nextflow run main.nf --input data -profile docker
```

The output should now be:

```
[[id:sub-003_ses-01], /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.nii.gz, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.bval, /workspaces/nf-neuro-tutorial_test/data/sub-003/ses-01/dwi/sub-003_ses-01_dir-AP_dwi.bvec]
```
Now, your input pipeline data is well-structured, facilitating seamless processing in subsequent pipeline stages. Each dataset includes a clearly labeled subject ID and session, along with all necessary files for DWI processing — such as the DWI file, b-values, and b-vectors.