Docker

Using an existing environment

In order to run your analysis, you can use a pre-existing container environment created by a third party, for example python:3.8 for Python programs or gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms for the CMS Offline Software framework. In this case you simply specify the container name and the version number in your workflow specification and you are good to go. This is usually the case when your code does not have to be compiled, for example Python scripts or ROOT macros.
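
For instance, you can try out such a third-party image locally with Docker before referencing it in your workflow specification, using the python:3.8 image from the example above:

$ docker pull docker.io/library/python:3.8
$ docker run --rm docker.io/library/python:3.8 python --version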

Note also that REANA offers a set of containers that can serve as examples of how to containerise popular analysis environments.

Building your own environment

Other times you may need to build your own container, for example to add a certain library on top of Python 2.7, or because your code needs to be compiled, as is usually the case for C++ analyses. This is the most typical use case and the one we address below.

If you need to create your own environment, you can do so by writing a suitable Dockerfile:

# Start from the Python 2.7 base image:
FROM docker.io/library/python:2.7

# Install HFtools:
RUN apt-get -y update && \
    apt-get -y install \
       python-pip \
       zip && \
    apt-get autoremove -y && \
    apt-get clean -y
RUN pip install hftools

# Copy our code into the image:
COPY code /code
WORKDIR /code

You can build this customised analysis environment image and give it a name, for example docker.io/johndoe/myenv:

$ docker build -f environment/myenv/Dockerfile -t docker.io/johndoe/myenv:1.0 .
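
Note that paths in the Dockerfile, such as code, are resolved against the build context given as the final . argument, so the command above assumes a repository layout along these lines:

$ ls
code/  environment/
$ ls environment/myenv
Dockerfile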

and push the created image to the Docker Hub image registry:

$ docker push docker.io/johndoe/myenv:1.0
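
If the registry rejects the push as unauthorised, make sure you are logged in first:

$ docker login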

Providing necessary shell

The Docker images used for executing user jobs in the REANA ecosystem need to contain the bash shell.

The bash shell is used in operational procedures to pass encoded and decoded job commands and parameters between the REANA workflow orchestration components, the job execution components and the compute backend itself, so that the job execution behaviour is consistent across the Kubernetes, HTCondor and Slurm backends for both the Docker and Singularity execution wrappers.

Therefore, please make sure that your Docker images contain the bash shell executable, even if it may not be the default shell.

For example, if you would like to use the tiny Alpine image, which uses the ash shell by default, you can add a command to your Dockerfile to install bash as follows:

FROM docker.io/library/alpine:3.17
RUN apk add --no-cache bash
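
You can quickly verify that the shell is indeed present in the resulting image; the image name below is purely illustrative:

$ docker build -t docker.io/johndoe/myalpine:1.0 .
$ docker run --rm docker.io/johndoe/myalpine:1.0 bash -c 'echo "bash works"'
bash works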

The bash shell is relatively widespread, so it is very probable that your base images contain it already. Note that it is not necessary for bash to be the default shell; only its presence is required. Please get in touch if this requirement causes any trouble and you cannot ensure the presence of bash in your job images.

Supporting arbitrary user IDs

In the Docker container ecosystem, processes run in containers under the root user identity by default. However, this may not be secure. If you want to improve the security of your environment, you can set up a dedicated user under whose identity the processes will run.

In order for processes to run under any user identity and still be able to write to shared workspaces, we use the GID=0 technique as used by OpenShift:

  • UID: you can use any user ID you want;
  • GID: you should add your user to the group with GID=0 (the root group).

This will ensure writable access to the workspace directories managed by the REANA platform.

For example, you can create the user johndoe with UID=501 and add the user to GID=0 by adding the following commands at the end of the previous Dockerfile:

# Set up the user and permissions:
RUN adduser --uid 501 --disabled-password --gecos "" johndoe
RUN usermod -a -G 0 johndoe
USER johndoe
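
After rebuilding the image, you can check that the user setup took effect; the output should look similar to the following:

$ docker build -f environment/myenv/Dockerfile -t docker.io/johndoe/myenv:1.0 .
$ docker run --rm docker.io/johndoe/myenv:1.0 id
uid=501(johndoe) gid=501(johndoe) groups=501(johndoe),0(root)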

Testing the environment

We now have a container image representing our computational environment, which we can use to run our analysis elsewhere in a reproducible manner.

We should test the containerised environment to ensure that it works properly, for example by checking that all the necessary libraries are present:

$ docker run -i -t --rm docker.io/johndoe/myenv:1.0 /bin/bash
container> python -V
Python 2.7.15
container> python mycode.py < mydata.csv > /tmp/mydata.tmp
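
You can also run such checks non-interactively. For instance, mounting your current working directory over the image's copy of the code lets you test local changes without rebuilding; the file names follow the example above:

$ docker run -i --rm -v "$PWD":/code docker.io/johndoe/myenv:1.0 \
      python /code/mycode.py < mydata.csv > mydata.tmp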

Multiple environments

Note that different steps of your analysis can run in different environments: for instance, a data filtering step can run on a big cloud, in an environment with the data selection libraries installed, while a data plotting step can run locally, in an environment containing only your preferred graphing system. You can prepare several different environments for your analysis if needed.
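
In practice this simply means building and pushing one image per environment; the image and directory names below are purely illustrative:

$ docker build -f environment/myfilter/Dockerfile -t docker.io/johndoe/myfilter:1.0 .
$ docker build -f environment/myplot/Dockerfile -t docker.io/johndoe/myplot:1.0 .
$ docker push docker.io/johndoe/myfilter:1.0
$ docker push docker.io/johndoe/myplot:1.0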