Test data infrastructure

nf-neuro provides an infrastructure to host and distribute test data, freely available to all contributors. This infrastructure is composed of three elements:

- A webserver hosting data packages, versioned using the DVC framework.
- A Nextflow subworkflow, LOAD_TEST_DATA, that downloads, caches and unpacks those data packages.
- A VS Code extension, Test Data Explorer, to browse test data packages, inspect their content and download them.

Test Data Explorer extension

If you work in the devcontainer, the extension is already installed and set up for you. Otherwise, go to the Extension Marketplace and install it. You’ll find the extension under the name Test Data Treeview.

Once installed, a new section appears in the Explorer panel.

Click the arrow icon to the right of an archive name to show its content. The archive is downloaded locally first, so this may take a moment.

Hovering over a file name displays options to interact with it:

- Display the file's content in the VS Code editor area.
  - VS Code selects the best-suited editor for the file type.
  - For example, NIfTI images open in the NiiVue 3D viewer.
- Copy the file from the cache to any location on your machine.

Usage outside of nf-neuro

By default, the extension is configured to work with the content of the nf-neuro repository. However, you can change its configuration to browse data organized in archives from any repository. The extension only needs two things to work:

- A webserver hosting the data archives and exposing them as HTTP links.
- A JSON file listing the archives and their locations on the webserver.

The JSON file contains a single object, a dictionary with the archives’ names as keys and their locations on the webserver as values:

{
  "archive1": "path/to/archive1.zip",
  "archive2": "path/to/archive2.zip"
}
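
If the archives already sit in a directory tree mirroring the webserver layout, the listing can be generated instead of written by hand. The Python sketch below is one possible way to do it, not part of the extension: the function names `build_listing` and `write_listing` are hypothetical, and using the file name without its extension as the key is an assumption based on the example above.

```python
import json
from pathlib import Path


def build_listing(archive_root: Path, suffix: str = ".zip") -> dict[str, str]:
    """Map each archive's name to its path relative to archive_root.

    Assumption: the key is the file name without its extension, as in the
    example listing. Archives sharing a name in different folders would
    overwrite each other, so names are expected to be unique.
    """
    listing = {}
    for archive in sorted(archive_root.rglob(f"*{suffix}")):
        # POSIX separators keep the value usable as a URL subpath.
        listing[archive.stem] = archive.relative_to(archive_root).as_posix()
    return listing


def write_listing(archive_root: Path, output_file: Path) -> None:
    """Write the listing as a single JSON object."""
    content = json.dumps(build_listing(archive_root), indent=2)
    output_file.write_text(content + "\n")
```

Calling `write_listing(Path("data"), Path("listing.json"))` would scan a hypothetical `data` directory and write the listing next to it.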

For flexibility, the base URI of the webserver is not hardcoded in the file but provided as a VS Code setting. Use the configuration options below:

Setting	Description	Default
testDataExplorer.dataserver	Base URI to the data server (without protocol)
testDataExplorer.serverdatalocation	Subpath to the data location on the server (appended to the dataserver URI)
testDataExplorer.localListingLocation	Path to the local JSON file listing the archives
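
To make the relation between these settings and the listing concrete, the Python sketch below shows one plausible way the pieces could combine into a download link. It is an illustration, not the extension's code: the function `archive_url` is hypothetical, and the `https` protocol is an assumption, since the settings only state that the base URI is given without one.

```python
def archive_url(dataserver: str, server_data_location: str,
                archive_path: str, protocol: str = "https") -> str:
    """Join the base URI, the server subpath and a listing value into a URL.

    Assumption: the segments are concatenated in this order and the
    protocol defaults to https.
    """
    parts = [dataserver, server_data_location, archive_path]
    # Strip stray slashes so segments join with exactly one "/" between them.
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return f"{protocol}://" + "/".join(cleaned)
```

With a hypothetical server `data.example.org` and subpath `test-data`, the listing value `path/to/archive1.zip` would resolve to `https://data.example.org/test-data/path/to/archive1.zip`.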

LOAD_TEST_DATA subworkflow

The LOAD_TEST_DATA subworkflow downloads and unpacks test data archives, and caches them locally to avoid unnecessary re-downloads and save bandwidth. To use it, include it in your main.nf file or main.nf.test test file. The relative path depends on where the including file is located, so pick the variant below that matches yours:

include { LOAD_TEST_DATA } from '../load_test_data/main'

include { LOAD_TEST_DATA } from '../../load_test_data/main'

include { LOAD_TEST_DATA } from '../../../../subworkflows/nf-neuro/load_test_data/main'

include { LOAD_TEST_DATA } from '../../../../../subworkflows/nf-neuro/load_test_data/main'
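
The download, cache and unpack behavior described above can be pictured with a short Python sketch. This is a conceptual illustration only, not the subworkflow's Nextflow implementation: the function `fetch_archive`, its parameters and the `.part` temporary file are all hypothetical.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_archive(name: str, url: str, cache_dir: Path, target_dir: Path) -> Path:
    """Download an archive only if it is not cached, then unpack it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{name}.zip"
    if not cached.exists():
        # Download under a temporary name first, so an interrupted transfer
        # never leaves a truncated file that looks like a valid cache entry.
        partial = cached.with_name(cached.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(cached)
    # Unpacking always reads from the cache, never from the network.
    destination = target_dir / name
    with zipfile.ZipFile(cached) as archive:
        archive.extractall(destination)
    return destination
```

A second call with the same name skips the download entirely and only unpacks the cached copy, which is the bandwidth saving the subworkflow aims for.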

The workflow takes two inputs:

- A channel containing a list of package names to download.
- A name for the temporary directory where the data will be placed.

To call it, use the following syntax:

archives = Channel.fromList( [ "<archive1>", "<archive2>", ... ] )
LOAD_TEST_DATA( archives, "<directory>" )

The archives' contents are accessed through the workflow output LOAD_TEST_DATA.out.test_data_directory. To define actual test inputs from it, use the .map operator:

input = LOAD_TEST_DATA.out.test_data_directory
  .map{ test_data_directory -> [
    [ id:'test', single_end:false ], // meta map
    file("${test_data_directory}/<file for input 1>"),
    file("${test_data_directory}/<file for input 2>"),
    ...
  ] }

Then, feed it to your subworkflow or module as follows.

For a subworkflow named preproc_A:

preproc_A( input )

For a module named DENOISING_ALGOA:

DENOISING_ALGOA( input )