Deploying at scale

REANA can be easily deployed on large Kubernetes clusters consisting of many nodes. This is useful for production instances with many users and many concurrent jobs.

Pre-requisites

  • A Kubernetes cluster running version v1.21 or later;
  • Helm v3;
  • A shared POSIX file system volume (such as CephFS or NFS) to host the REANA infrastructure volumes and the user runtime workspaces. A shared file system is necessary for any multi-node deployment. See Configuring storage volumes.
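
As a quick sanity check before proceeding, you can verify your cluster and client tool versions (a minimal sketch; the exact output depends on your setup):

$ kubectl version
$ helm version --short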

Multi-node setup

For a scalable multi-user deployment of REANA, it is essential to use a Kubernetes cluster consisting of several nodes.

We shall separate the various REANA services onto dedicated nodes, in order to ensure that the user runtime workloads cannot interfere with the REANA infrastructure services that are critical for the platform to operate.

We recommend starting with at least six worker nodes:

$ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   97m   v1.21.2
node-0     Ready    <none>   97m   v1.21.2
node-1     Ready    <none>   97m   v1.21.2
node-2     Ready    <none>   97m   v1.21.2
node-3     Ready    <none>   97m   v1.21.2
node-4     Ready    <none>   97m   v1.21.2
node-5     Ready    <none>   97m   v1.21.2

The worker node roles are assigned by means of labelling the nodes:

  • 1 node labelled reana.io/system=infrastructure that will run the REANA infrastructure services such as the web interface application, the REST API server, and the workflow orchestration controller;

  • 1 node labelled reana.io/system=infrastructuredb that will run the PostgreSQL database service (unless you have an already-existing database service running outside of the cluster that could be reused; reusing such an external database, rather than hosting it yourself, is even preferable);

  • 1 node labelled reana.io/system=infrastructuremq that will run the RabbitMQ messaging service;

  • 1 node labelled reana.io/system=runtimebatch that will run the user runtime batch workflow orchestration pods (such as CWL, Snakemake or Yadage processes);

  • 1 node labelled reana.io/system=runtimejobs that will run the user runtime job workload pods (generated by the above workflow batch orchestration pods);

  • 1 node labelled reana.io/system=runtimesessions that will run the user interactive notebook sessions.

For example, you would label the above cluster nodes as follows:

$ kubectl label node node-0 reana.io/system=infrastructure
$ kubectl label node node-1 reana.io/system=infrastructuredb
$ kubectl label node node-2 reana.io/system=infrastructuremq
$ kubectl label node node-3 reana.io/system=runtimebatch
$ kubectl label node node-4 reana.io/system=runtimejobs
$ kubectl label node node-5 reana.io/system=runtimesessions
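
You can then verify that each node carries the expected role label, for example by displaying the reana.io/system label as an extra column:

$ kubectl get nodes -L reana.io/system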

You would then configure your REANA deployment via the Helm myvalues.yaml values file as follows:

node_label_infrastructure: reana.io/system=infrastructure
node_label_infrastructuredb: reana.io/system=infrastructuredb
node_label_infrastructuremq: reana.io/system=infrastructuremq
node_label_runtimebatch: reana.io/system=runtimebatch
node_label_runtimejobs: reana.io/system=runtimejobs
node_label_runtimesessions: reana.io/system=runtimesessions
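
If you would like to double-check how these values translate into pod scheduling constraints before deploying, you can render the chart templates locally and inspect the generated node selectors (a quick spot-check; the exact manifest layout may vary between chart versions):

$ helm template reana reanahub/reana -f myvalues.yaml | grep -B2 -A2 nodeSelector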

Deployment

You would deploy REANA as usual. Start by adding the REANA chart repository:

$ helm repo add reanahub https://reanahub.github.io/reana
"reanahub" has been added to your repositories
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "reanahub" chart repository
...Successfully got an update from the "cern" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈

Continue by deploying REANA using your myvalues.yaml Helm values file (see the list of supported Helm values):

$ vim myvalues.yaml  # customise your desired Helm values
$ helm install reana reanahub/reana -f myvalues.yaml --wait
NAME: reana
LAST DEPLOYED: Wed Mar 18 10:27:06 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thanks for flying REANA 🚀
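
Once the deployment finishes, you can check that the pods have been scheduled onto the intended nodes:

$ kubectl get pods -o wide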

Warning

Note that the above helm install command used reana as the Helm release name. You can choose any other name, provided that it is less than 13 characters long. (This is due to a current limitation on the length of generated pod names.)

Note

Note that you can deploy REANA in different namespaces by passing --namespace to helm install. Remember to pass --create-namespace if the namespace you want to use does not exist yet. For more information on how to work with namespaces, please see the Kubernetes namespace documentation.
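
For example, a deployment into a dedicated namespace could look like this (using a hypothetical namespace name reana):

$ helm install reana reanahub/reana --namespace reana --create-namespace -f myvalues.yaml --wait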

Scaling up

With the above multi-node deployment scenario, it is easy to scale the cluster up for running heavier workloads or for welcoming more concurrent users, should the service evolve in that direction. You would keep the three infrastructure nodes and scale the runtime nodes (batch, jobs) as your needs grow.

For example, you could add 50 new nodes to the cluster, 10 for batch and 40 for jobs, label these new nodes with the reana.io/system=runtimebatch and reana.io/system=runtimejobs labels, and REANA would automatically recognise and use the new nodes for executing user workloads without any further change.
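
Assuming the new nodes were named node-6 to node-55, the labelling could be scripted as follows (a sketch with hypothetical node names):

$ for n in $(seq 6 15); do kubectl label node node-$n reana.io/system=runtimebatch; done
$ for n in $(seq 16 55); do kubectl label node node-$n reana.io/system=runtimejobs; done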

Similarly, if you see that users prefer to run numerous Jupyter notebook sessions, you could add new nodes labelled reana.io/system=runtimesessions, and REANA would automatically use them to run Jupyter notebooks for users.

A typical production deployment could therefore look like:

  • 1 infrastructure app node (labelled reana.io/system=infrastructure)
  • 1 infrastructure DB node (labelled reana.io/system=infrastructuredb)
  • 1 infrastructure RabbitMQ node (labelled reana.io/system=infrastructuremq)
  • 5 runtime interactive session nodes (labelled reana.io/system=runtimesessions)
  • 10 runtime batch nodes (labelled reana.io/system=runtimebatch)
  • 40 runtime job nodes (labelled reana.io/system=runtimejobs)

Here, the first three infrastructure node roles should be kept stable, whilst the last three runtime node roles can be grown and shrunk at will, based on increasing or decreasing user workload.
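
To keep an eye on the current node role distribution, you can count the nodes per label, for example:

$ kubectl get nodes -l reana.io/system=runtimejobs --no-headers | wc -l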

We have been operating REANA deployments on clusters of the above setup consisting typically of 50-100 nodes and 500-1000 cores, with occasional tests using up to 5000 cores.

Designing cluster node roles

The optimal number of cluster nodes to reserve for runtime batch workflows, runtime job workloads, or runtime notebook sessions depends on your users and the typical research workflows that the cluster runs.

For example, assuming a cluster node of m2.large flavour, i.e. about 8 CPU cores and 16 GB of memory per node, one such runtime job node can comfortably run 8 concurrent user jobs at full speed (since 1 node has 8 CPU cores), provided the memory suffices. (The runtime batch nodes do not require a full CPU per workflow, since the workflow orchestration processes do not consume much CPU; they mostly launch user jobs and then wait for their execution.) If the workflows are memory-bound rather than CPU-bound, then node flavours with more RAM would be necessary.

Another important consideration is the typical parallelism of the user workflows. For example, if the physics workflows that are run the most on the system are such that 1 workflow typically generates 4 long-running parallel n-tupling jobs lasting hours, followed by relatively quicker statistical analysis jobs, then the overall job throughput would most likely be determined by the n-tupling jobs, and we may expect 1 runtime job node (with 8 CPU cores) to serve only up to 2 such workflows at a time. Hence, if we would like to run 80 such workflows concurrently, we would need about 40 runtime job nodes in order to run the user workloads at full sustainable speed.
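
As a back-of-the-envelope illustration of the above reasoning, using the hypothetical numbers from this example (8-core nodes, 4 long-running parallel jobs per workflow, 80 concurrent workflows):

$ # 80 workflows x 4 parallel long-running jobs / 8 cores per node = 40 nodes
$ echo $(( 80 * 4 / 8 ))
40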