
Integrate MultiQC in your pipeline

Quality control in neuroimaging is an essential step to ensure good and reproducible results, but it is often painful and requires a lot of manual handling of the final results. Pipelines developed within the nf-core or nf-neuro ecosystem have access to a powerful tool to aggregate processing results into interactive HTML reports: MultiQC. This integration allows for an easy evaluation of key processing steps for every subject and/or a more global, population-level evaluation using quantitative metrics. In this tutorial, you will get an overview of how MultiQC can integrate your data into a clean report, and how to implement it in your own pipeline.

Prerequisites

You need a pipeline that follows the nf-core conventions and access to the nf-core tools commands.

For the purpose of this walkthrough, we will reuse the completed pipeline from the tutorial Your pipeline from A to Z.

Install MultiQC module

nf-core has already developed a dedicated module that runs MultiQC and creates the HTML report containing your data. However, you first need to install it in your pipeline. To do so, run the following command:

nf-core modules --git-remote https://github.com/nf-core/modules.git -b master install multiqc

Import MultiQC in your pipeline

Once your module is installed, you need to import it within the main.nf file of your pipeline.

include { MULTIQC } from "./modules/nf-core/multiqc/main"

1. Modify the MultiQC inputs

MultiQC was originally designed to perform quality control on genomic data at the population level. In the neuroimaging field, quality control is mostly done on a per-subject basis (sometimes with a general report summarizing quantitative data across the studied population). In this walkthrough, we will go over how to create two types of reports:

  • A subject-specific report allowing QC of key processing steps using figures (e.g., registration, segmentation).
  • A global report for quantitative metrics across a population.

Creating the global report is natively supported by MultiQC. However, we need to tweak the existing module inputs to create the subject-specific reports. Check the modifications needed in ./modules/nf-core/multiqc/main.nf:

    input:
    tuple val(meta), path(qc_images)     // New input: subject metadata and its QC files (qc_images is an illustrative name).
    path multiqc_files, stageAs: "?/*"   // Contains the files to extract data from.
    path(multiqc_config)                 // Configuration file for MultiQC.
    path(extra_multiqc_config)           // Additional configuration file for MultiQC.
    path(multiqc_logo)                   // Custom logo to include in the report.
    path(replace_names)                  // Option to replace sample names, not relevant to our case.
    path(sample_names)                   // Option to change sample names.
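Nextflow requires the number of arguments passed to a process to match the number of inputs it declares, so after editing the module it can help to count those declarations. The function below is a hypothetical helper, not part of nf-core tools; it is a rough awk-based sketch that assumes one declaration per line and an output: block following the input: block.

```shell
# Hypothetical helper: count the declarations in the input: block of a
# Nextflow module, assuming one declaration per line and an output: block after it.
count_module_inputs() {
  awk '
    /^[[:space:]]*output:/ { exit }                               # stop at the output block
    inside && /^[[:space:]]*(tuple|path|val)[[:space:](]/ { n++ } # one declaration per line
    /^[[:space:]]*input:/ { inside = 1 }                          # start counting after input:
    END { print n + 0 }
  ' "$1"
}

# Usage, from the root of your pipeline:
# count_module_inputs modules/nf-core/multiqc/main.nf
```

Compare the printed number with the number of arguments you pass when calling MULTIQC in your workflow.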

    2. Prefix your report with a subject-specific tag

    Now that we have a meta field that identifies our subjects, let’s use it to name the resulting report. To do that, we need to modify how the prefix variable is defined at the beginning of the script section. Copy this new line and replace the existing def prefix... line in your MultiQC module (main.nf).

    def prefix = task.ext.prefix ? "--filename ${task.ext.prefix}.html" : "--filename ${meta.id}_multiqc_report.html"

    3. Change MultiQC container

    We also want to change the container to use the official Docker container for MultiQC.

    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/multiqc:1.27--pyhdfd78af_0' :
        'biocontainers/multiqc:1.27--pyhdfd78af_0' }"

    4. Add a tag to the process execution

    Finally, we want to tag each execution of the module with its meta.id. To do so, let’s add tag "$meta.id" at the top of the process definition (main.nf).

    process MULTIQC {
        tag "$meta.id"
        label 'process_single'
        <...>
    }

    Your MultiQC module is now up-and-running for our neuroimaging needs!

    Create a subject MultiQC report

    Preferably, QC files (either images or tabular files) should be generated within your module and enabled with a flag passed through the task.ext arguments. In the tutorial pipeline, we perform simple operations to obtain FA values within gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) masks. To make sure we obtain good results, we should look at the FA map and check that it does not contain any artifacts. Luckily, the RECONST_DTIMETRICS module already includes a QC section that produces screenshots of the generated metric maps. We will leverage this to create the first section of our subject-level MultiQC report.

    1. Collect QC files

    1. To collect the QC files during pipeline execution we need to create an empty channel at the beginning of the workflow file. This empty channel will be populated when the modules output QC files. Add this line at the beginning of the workflow section in your ./main.nf file.

      ch_multiqc_files = Channel.empty()
    2. Since RECONST_DTIMETRICS is nested within the PREPROC_DIFF subworkflow, we also need to collect the files within the subworkflow to make them accessible in the main workflow. As in the step above, define an empty channel at the top of the PREPROC_DIFF subworkflow (./subworkflows/local/preproc_diff/main.nf), right below the main: section.

      ch_multiqc_files = Channel.empty()
    3. Once all of our empty channels are defined, we can start populating them! If you look at the outputs of RECONST_DTIMETRICS (either in the API documentation or directly in its main.nf), you will see an emitted channel called mqc. This channel contains the QC files generated within the module. Let’s append this channel to our previously empty channel using the .mix operator.

      // DTI-derived metrics
      RECONST_DTIMETRICS( input_dti )
      ch_multiqc_files = ch_multiqc_files.mix(RECONST_DTIMETRICS.out.mqc)
    4. We now need to output ch_multiqc_files from the subworkflow to make it accessible at the workflow level. We will add a new output, named mqc, in the emit: section of the PREPROC_DIFF subworkflow.

      emit:
      dwi          = ch_dwi_bvalbvec.dwi        // channel: [ val(meta), dwi-raw ]
      dwi_denoised = DENOISING_MPPCA.out.image  // channel: [ val(meta), dwi-after-mppca ]
      bvs_files    = ch_dwi_bvalbvec.bvs_files  // channel: [ val(meta), bval, bvec ]
      fa           = RECONST_DTIMETRICS.out.fa  // channel: [ val(meta), fa ]
      md           = RECONST_DTIMETRICS.out.md  // channel: [ val(meta), md ]
      mqc          = ch_multiqc_files           // channel: [ val(meta), qc-files ]
    5. We have now set up the structure to collect QC files and deliver them to the main workflow file. However, generating the QC files within RECONST_DTIMETRICS is optional (as it is for all nf-neuro modules that include QC). To actually produce the QC files at runtime, we need to set the ext.run_qc argument to true. In the nextflow.config file, locate the ext.run_qc argument and replace false with true. You can also enable other metrics (e.g., ext.ad or ext.rd) to make the QC report more meaningful.

      withName: "RECONST_DTIMETRICS" {
          ext.ad = false
          ext.evecs = false
          ext.evals = false
          ext.fa = true
          ext.ga = false
          ext.rgb = false
          ext.md = true
          ext.mode = false
          ext.norm = false
          ext.rd = false
          ext.tensor = false
          ext.nonphysical = false
          ext.pulsation = false
          ext.residual = false
          ext.b0_thr_extract_b0 = 10
          ext.dwi_shell_tolerance = 50
          ext.max_dti_shell_value = 1200
          ext.run_qc = true
      }
    6. The final step before supplying those files to the MultiQC module is to append the QC files from the PREPROC_DIFF subworkflow to ch_multiqc_files in the main workflow (./main.nf). We use the same approach as in step 3 above.

      // Processing DWI
      PREPROC_DIFF( inputs.dwi )
      ch_multiqc_files = ch_multiqc_files.mix(PREPROC_DIFF.out.mqc)
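Forgetting to flip ext.run_qc is an easy mistake: the mqc channel then stays empty and no MultiQC report is produced. As a rough pre-flight check, the function below (a hypothetical helper, not part of nf-neuro) looks inside a withName: "..." block of your config for ext.run_qc = true. It assumes the process selector is written with double quotes as in this tutorial, that the block contains no nested braces, and that its closing brace sits on its own line.

```shell
# Hypothetical helper: succeed (exit status 0) if the withName: "<process>" block
# of a Nextflow config sets ext.run_qc = true. Assumes the block has no nested
# braces and that its closing brace is on its own line.
run_qc_enabled() {
  awk -v proc="$2" '
    index($0, "withName: \"" proc "\"") { inside = 1; next }          # entering the block
    inside && /^[[:space:]]*[}]/ { inside = 0 }                        # leaving the block
    inside && /ext\.run_qc[[:space:]]*=[[:space:]]*true/ { found = 1 } # flag enabled
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Usage:
# run_qc_enabled nextflow.config RECONST_DTIMETRICS && echo "QC enabled" || echo "QC disabled"
```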

    2. Supply QC files to MultiQC

    Now that the QC files are accessible in the main workflow, we can supply them to the MultiQC module. However, we first need to reshape the channel into the format the module expects. The following steps go through this formatting and include the module in the main workflow.

    1. In the previous section, we appended our QC files to a single channel using the .mix Nextflow operator. This operator merges the items of several channels into one, a bit like appending items to a list in Python. When we have multiple subjects, this means the files of a given subject are not necessarily grouped together. Since we do not want to mix up subjects in our MultiQC reports, we will use the .groupTuple operator: because every item is tagged with metadata (the meta value), it groups all files sharing identical metadata, ensuring each report only contains a single subject. Finally, as we saw in the MultiQC inputs section, the module expects a tuple val(meta), path(files). To respect this input definition, we flatten the grouped files into a single list. Add the following lines at the end of our main workflow definition:

      ch_multiqc_files = ch_multiqc_files
          .groupTuple()
          .map { meta, files_list ->
              // Flatten the nested lists and drop missing entries
              def files = files_list.flatten().findAll { it != null }
              return tuple(meta, files)
          }
    2. Once the channel is properly reformatted, we can safely pass it to our MultiQC module. Since we already imported the module at the top of the ./main.nf file, we can simply call it and pass empty lists for the inputs we do not use.

      MULTIQC(
          ch_multiqc_files,
          [],
          [],
          [],
          [],
          [],
          []
      )
    3. Now, you can run the pipeline:

      nextflow run main.nf --input data -profile docker -resume

      Your resulting MultiQC report should look similar to this:

      Raw MultiQC Report

      As you can see, there isn’t much in the report for now other than our generated screenshot. Fortunately, we can customize this section to add a proper heading, description, etc.
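Because .groupTuple should produce one MULTIQC task per subject, a quick sanity check after the run is to count the generated reports and compare that number with the number of subjects in your input. The helper below is a hypothetical sketch; it assumes the reports are published somewhere under your output directory (for example results/, depending on your publishDir settings) and keep the *multiqc_report.html suffix.

```shell
# Hypothetical helper: count the MultiQC HTML reports found anywhere under a
# directory, e.g. your pipeline output directory.
count_reports() {
  # wc -l may pad its output with spaces on some systems, so strip them
  find "$1" -type f -name '*multiqc_report.html' | wc -l | tr -d ' '
}

# Usage (results/ is an assumption; use whatever your publishDir points to):
# count_reports results
```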

    3. Customize your MultiQC report

    We have now learned how to collect QC files, supply them to MultiQC, and generate a per-subject report. However, our report is not really informative as it does not contain any heading or description for our QC image. This can be solved using a custom MultiQC config file. This section will provide detailed instructions to create your own config file.

    1. Create a ./assets/multiqc_config.yml file.

      mkdir -p assets
      touch assets/multiqc_config.yml
    2. Let’s start by adding a general comment to describe the origin of the MultiQC report. This is done by adding a report_comment section at the top of the config file.

      report_comment: >
        This report has been generated by the nf-neuro tutorial!
    3. Then, we can define the order of our future sections, starting with a dti_qc section that will contain our FA QC image.

      report_section_order:
        dti_qc:
          order: -1001
    4. It is now time to add our custom_data section. This is where we will define the title of our section containing our FA QC image and its description.

      custom_data:
        dti_qc:
          file_format: "png"
          section_name: "DTI QC"
          description: |
            This section contains QC images for diffusion tensor imaging (DTI) metric
            maps. Add specifications regarding how to evaluate those images, for example:
            To assess the quality of the DTI metrics, ensure that FA highlights major white
            matter tracts with expected high values (e.g., corpus callosum, corticospinal tract, etc.)...
          plot_type: "image"
    5. In order for MultiQC to fetch the right files for each section, you need to specify which pattern to use to match your files for this specific section. This is done with the sp (search patterns) item.

      sp:
        dti_qc:
          fn: "*dti_mqc.png"

      You should now have your complete multiqc_config.yml file! It should look similar to this:

      report_comment: >
        This report has been generated by the nf-neuro tutorial!
      report_section_order:
        dti_qc:
          order: -1001
      custom_data:
        dti_qc:
          file_format: "png"
          section_name: "DTI QC"
          description: |
            This section contains QC images for diffusion tensor imaging (DTI) metric
            maps. Add specifications regarding how to evaluate those images, for example:
            To assess the quality of the DTI metrics, ensure that FA highlights major white
            matter tracts with expected high values (e.g., corpus callosum, corticospinal tract, etc.)...
          plot_type: "image"
      sp:
        dti_qc:
          fn: "*dti_mqc.png"
    6. The final step consists of supplying this config file to your MultiQC module. To fetch external files, you can use the Channel.fromPath() channel factory combined with Nextflow's implicit $projectDir variable. Let’s fetch the config file and supply it to the MultiQC module. Change your ./main.nf file:

      // MultiQC
      ch_multiqc_config = Channel.fromPath("$projectDir/assets/multiqc_config.yml", checkIfExists: true)

      MULTIQC(
          ch_multiqc_files,
          [],
          ch_multiqc_config.toList(), // .toList() gives a value channel, reused for every subject
          [],
          [],
          [],
          []
      )
    7. Now, you can run the pipeline again:

      nextflow run main.nf --input data -profile docker -resume

      Your newly generated MultiQC report should now look like this:

      MultiQC Clean Report

      As you can see, the report is now much cleaner and more informative!
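If the DTI QC section is missing from a report, the fn pattern under sp is a good first suspect. MultiQC matches fn against file names with shell-style wildcards, and find -name does the same kind of matching, so the hypothetical helper below lists which files a simple pattern such as "*dti_mqc.png" would pick up in a given folder. It is a debugging sketch only and ignores MultiQC's other matching options.

```shell
# Hypothetical helper: list the files under a directory whose name matches a
# MultiQC "fn" search pattern, sorted by name. find -name applies a shell-style
# glob to the file name only, like the simple fn patterns used in this tutorial.
list_section_files() {
  find "$2" -type f -name "$1" -exec basename {} \; | sort
}

# Usage, e.g. on your pipeline output directory (results/ is an assumption):
# list_section_files '*dti_mqc.png' results
```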

    Create a global MultiQC report

    As described above, you can also create a single report containing quantitative metrics across all subjects. This is particularly useful to spot outliers that need further quality control. This section is coming soon!