reana.yaml

About reana.yaml

The REANA reproducible analysis platform requires a reana.yaml file to be present in your analysis source code. This is the so-called REANA specification file. Its purpose is to answer the Four Questions:

  1. What is your input data?
  2. Which code analyses it?
  3. What is your computing environment?
  4. Which computational steps do you take to arrive at results?

This reference guide describes how the answers to the Four Questions are expressed in the reana.yaml REANA specification file.

Note

The REANA specification file uses a human-readable data-serialisation language called YAML. YAML uses Python-like whitespace indentation and supports basic data types such as strings, lists, and dictionaries. More information can be found in the official YAML documentation.

Understanding reana.yaml

The reana.yaml file describes the analysis as a computational workflow with its inputs, workflow specification, and its outputs. The overall structure of reana.yaml looks as follows:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| version | string | optional | Specifies the REANA version for which the analysis was written. For example, "0.6.0". |
| inputs | dictionary | optional | Specifies all the high-level inputs to the workflow. Can be composed of "files", "directories", "parameters", "options". |
| workflow | dictionary | mandatory | Defines the computational workflow, using the CWL, Serial, or Yadage specifications. |
| outputs | dictionary | optional | Specifies all the high-level outputs of the workflow. Can be composed of "files". |

Each property will be described in detail in the following sections.
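
As a rough sketch of how the four top-level properties fit together, a small reana.yaml for a one-step Serial workflow might look as follows. The file names, container image, and command are placeholders in the style of the examples used later in this guide, not a real analysis.

```yaml
# Hypothetical minimal reana.yaml; all names below are placeholders.
version: 0.6.0
inputs:
  files:
    - myfile1.csv
  parameters:
    myfitparam: myvalue1
workflow:
  type: serial
  specification:
    steps:
      - name: fitdata
        environment: mydockerhuborganisation/myfitdockerimage:42.1
        commands:
          - ./myfitcommand myfile1.csv "${myfitparam}" > myplot.png
outputs:
  files:
    - myplot.png
```

Only the workflow property is mandatory; the other three can be omitted.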

reana.yaml version

The version property of reana.yaml specifies the REANA platform version for which the analysis was written. It can be useful for long-term preservation in case the structure of the REANA specification file changes in the future. The property is optional.

The version property value is a string:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| version | string | optional | Specifies the REANA version for which the analysis was written. For example, "0.6.0". |

The version property example:

version: 0.6.0

reana.yaml inputs

The inputs property of reana.yaml specifies all the workflow high-level inputs, be they files, directories, or parameters with values. The property serves to document all the initial inputs to the workflow. The reana-client upload command will seed the inputs to the workflow workspace before running the workflow. Note that the property is optional.

The inputs property is composed of:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| directories | list | optional | Lists all the input directories to the workflow. Will be seeded to the workspace before running. |
| files | list | optional | Lists all the input files to the workflow. Will be seeded to the workspace before running. |
| parameters | dictionary | optional | Specifies all the input parameters to the workflow. It is a dictionary of parameter names and their values expressed as strings. |
| options | dictionary | optional | Specifies operational options for each workflow engine, see below. |

The inputs.options property describes operational options that can be used for the different workflow engines. The available options are:

| Property | Type | Mandatory? | Workflow engine | Description |
|---|---|---|---|---|
| CACHE | string | optional | Serial | Whether the workflow engine should cache the results of each step for faster execution of subsequent workflow runs. Disabled by default. Can be on or off. |
| FROM | string | optional | Serial | Allows partial execution of a workflow starting from the beginning of a desired step. The value is the name of the desired starting step. Note that the FROM option can be combined with the TARGET option. |
| TARGET | string | optional | Serial, CWL | Allows partial execution of a workflow until the end of a desired step. The value is the name of the desired target step. Note that the TARGET option can be combined with the FROM option. |
| toplevel | string | optional | Yadage | Yadage toplevel argument. It represents the working directory or remote repository from which the workflow should be pulled. GitHub is supported as a remote repository, in the form github:<username/repo[@branch]>[:subpath]. |
| initdir | string | optional | Yadage | Yadage initdir argument. It represents the initial directory for workflows running locally. |
| initfiles | list | optional | Yadage | Yadage initfiles argument. A list of YAML files that pass initial parameters as workflow inputs. |

The inputs property example:

inputs:
  directories:
    - mydir1
    - mysubdirs/mydir2
  files:
    - myfile1.csv
    - mysubdirs/myotherdir/myfile2.csv
  parameters:
    myparam1: myvalue1
    myparam2: myvalue2
  options:
    CACHE: off
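
As an illustration of partial execution, the sketch below asks the Serial engine to stop after a given step. The step names gendata and fitdata are borrowed from the Serial workflow example in this guide; substitute names of steps that exist in your own workflow.

```yaml
inputs:
  options:
    # Run the workflow only until the end of the step named "gendata".
    TARGET: gendata
    # To start from the beginning of a later step instead, FROM could be used:
    # FROM: fitdata
```

According to the options table, FROM applies to Serial workflows only, whereas TARGET is also available for CWL.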

reana.yaml workflow

The workflow property of reana.yaml specifies the computational steps needed to obtain the results. REANA supports three different workflow specification languages (CWL, Serial, Yadage). Each workflow specification language expresses the computational steps differently.

The workflow property is composed of:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| type | string | mandatory | Specifies the workflow language type. Can be cwl, serial, or yadage. |
| file | string | mandatory if property specification is missing | For CWL and Yadage workflows, specifies workflow steps in an external file, using their respective workflow definition languages. |
| specification | dictionary | mandatory if property file is missing | For Serial workflows, specifies workflow steps internally in reana.yaml, see below. |

The workflow.specification property is used in Serial workflows and is further composed of:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| steps | list | mandatory | Lists all workflow steps that are to be run sequentially to obtain workflow results. |

The workflow.specification.steps property describes each individual computational step of the Serial workflow and is further composed of:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| name | string | optional | Provides the name of the given workflow step. |
| environment | string | mandatory | Specifies the runtime environment container image where the given workflow step commands will be run. |
| commands | list | mandatory | Lists all commands to be run in the runtime environment container image when the given workflow step is executed. Note that each command is executed as a separate containerised job. |

The workflow property examples:

  • For CWL workflows:
workflow:
  type: cwl
  file: myworkflow.cwl
  • For Serial workflows:
workflow:
  type: serial
  specification:
    steps:
      - name: gendata
        environment: mydockerhuborganisation/mygendockerimage:1.1
        commands:
        - ./mygencommand "${mygenparam}" > mydata.txt
        - ./mygenothercommand mydata.txt > mydata.csv
      - name: fitdata
        environment: mydockerhuborganisation/myfitdockerimage:42.1
        commands:
        - ./myfitcommand mydata.csv "${myfitparam}" > myplot.png
  • For Yadage workflows:
workflow:
  type: yadage
  file: myworkflow.yaml

More detailed information on the workflow specification languages is available on the corresponding CWL, Serial, and Yadage pages.
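
The Serial example above uses placeholders such as ${mygenparam} in its commands. The sketch below shows how such a placeholder might be paired with an entry in inputs.parameters, assuming that each placeholder is filled in from the parameter of the same name. The parameter value is hypothetical.

```yaml
inputs:
  parameters:
    mygenparam: myvalue1   # hypothetical value
workflow:
  type: serial
  specification:
    steps:
      - name: gendata
        environment: mydockerhuborganisation/mygendockerimage:1.1
        commands:
          # "${mygenparam}" is assumed to be replaced by "myvalue1" at run time.
          - ./mygencommand "${mygenparam}" > mydata.txt
```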

reana.yaml outputs

The outputs property of reana.yaml specifies all the workflow high-level outputs, consisting of a list of files. The reana-client download command will download all the specified files from the workflow workspace to the local filesystem. Note that the property is optional.

The outputs property is composed of:

| Property | Type | Mandatory? | Description |
|---|---|---|---|
| files | list | optional | Lists all the output files of the workflow. |

The outputs property example:

outputs:
  files:
    - myplot.png
    - mysubdirs/myotherdir/myotherplot.pdf

Note

Unlike the inputs property, the outputs property cannot specify directories, only files. Moreover, wildcards are not supported.
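
In practice this means that each output file has to be listed by its own path, even when several of them live in the same directory. A small sketch with hypothetical file names:

```yaml
outputs:
  files:
    # Each file is named explicitly.
    - results/plot1.png
    - results/plot2.png
    # Per the note above, entries such as "results/" (a directory) or
    # "results/*.png" (a wildcard) would not be accepted.
```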

Validating reana.yaml

You can use the reana-client validate command to make sure that your reana.yaml (or reana.yml) specification file conforms to the above standard:

$ reana-client validate
File my-analysis/reana.yaml is a valid REANA specification file.

If your workflow specification file is not named reana.yaml (or reana.yml), you can use the -f command-line option to specify the path to the file for the validation:

$ reana-client validate -f reana-debug.yaml
File my-analysis/reana-debug.yaml is a valid REANA specification file.

The reana-client validate command will warn you about any errors or problems in your reana.yaml files.
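
For illustration, here is a specification that validation would be expected to flag, because the mandatory workflow property described earlier is missing. The file name reana-broken.yaml and its contents are hypothetical, and the exact message printed by reana-client is not shown here.

```yaml
# Hypothetical reana-broken.yaml: no "workflow" property is defined,
# although the specification marks it as mandatory.
version: 0.6.0
inputs:
  files:
    - myfile1.csv
outputs:
  files:
    - myplot.png
```

Such a file could be checked with reana-client validate -f reana-broken.yaml.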

Examples

Many reana.yaml examples can be seen in the REANA demo example analyses: