Configuring storage volumes¶
As described in the Deploying at scale documentation page, REANA needs a shared filesystem to store the workflows' workspaces. It is also recommended to configure a second optional storage volume to be used by REANA's infrastructure components, such as the database and the message broker, in order to better isolate the infrastructure storage from possible disk exhaustion situations due to heavy user workloads.
The shared storage volume and the infrastructure storage volume can be configured by means of two Helm values:
shared_storageholds the configuration for the shared filesystem needed by REANA. It is mandatory to configure this volume.
infrastructure_storageconfigures the second optional volume used by the database and the message broker. If not configured, the shared storage volume is used instead.
This is how you would configure both volumes, with the shared storage volume being hosted on CephFS and the infrastructure one on NFS:
infrastructure_storage: backend: nfs volume_size: 10 access_modes: ReadWriteMany shared_storage: backend: cephfs volume_size: 200 access_modes: ReadWriteMany cephfs: ...
In the configuration examples below we'll use the
shared_storage volume, but the
infrastructure_storage volume can be configured in exactly the same way.
Note that typically, the shared storage volume can be huge (even tens of terabytes), while the shared storage volume can be as small as a few gigabytes, depending also on whether you are using an external database or not.
To use an existing CephFS share, set the
shared_storage.backend Helm value to
You can then configure the CephFS share by setting the appropriate values in
This is an example CephFS configuration, using the default values present in the Helm chart:
shared_storage: # or `infrastructure_storage` backend: cephfs volume_size: 200 access_modes: ReadWriteMany cephfs: provisioner: manila-provisioner type: "Geneva CephFS Testing" availability_zone: nova os_secret_name: os-trustee os_secret_namespace: kube-system cephfs_os_share_id: <cephfs-share-id> cephfs_os_share_access_id: <cephfs-share-access-id>
You can deploy an NFS file system inside your cluster using the official NFS Server Provisioner Helm chart.
Create your configuration file,
nfs-provisioner-values.yaml, following the documentation and adapting it to your needs. Remember to name the storage class as REANA expects it,
storageClass: defaultClass: true name: reana-shared-volume-storage-class
Install the NFS Server Provisioner
$ helm install reana-dev-storage stable/nfs-server-provisioner \ -f nfs-provisioner-values.yaml
Configure the NFS storage volume in your Helm configuration file
shared_storaged: # or `infrastructure_storage` backend: nfs volume_size: 200 access_modes: ReadWriteMany
REANA also supports using directories already present on the host node's filesystem. Note that this approach is not suitable for multi-node deployments, but only for local single-user single-node instances of REANA, for example during personal development.
Using HostPath volumes is not recommended, as they present many security risks. See the related Kubernetes documentation page for more information.
To use a local directory as a storage volume, set the
shared_storage.backend Helm value to
shared_storage.hostpath.root_path to the path of the directory you would like to use:
shared_storage: # or `infrastructure_storage` backend: hostpath hostpath: root_path: "/opt/myreana"
Separating infrastructure storage volume¶
REANA deployment chart uses only the single shared storage volume by default. If you have an existing cluster with many users with heavy disk space needs, and you would like to introduce the additional infrastructure storage volume to better isolate the infrastructure components' storage needs from user workloads, you can proceed as follows.
This procedure is delicate and it can lead to data loss. Carefully read the following instructions and, if possible, try them out on a test instance of REANA to make sure that everything works.
This guide requires to upgrade REANA a few times by using the
helm tool. To make sure that you are running the correct commands, you can use the
helm-diff plugin to inspect the changes that will be made to REANA's deployment.
- Make sure REANA is already updated to version 0.9.0 or greater. In the following steps, we will refer to this version as
Disable the scheduling of new workflows by setting
$ helm upgrade reana reanahub/reana \ --version $VERSION \ --values myvalues.yaml \ --set components.reana_server.environment.REANA_MAX_CONCURRENT_BATCH_WORKFLOWS=0
Wait for all the running workflows to finish. You can run
kubectl get podsto make sure that workflows are not running.
myvalues.yamlto configure the new infrastructure volume, as explained in the Configuration section. For example:
infrastructure_storage: backend: cephfs volume_size: 200 access_modes: ReadWriteMany cephfs: provisioner: manila-provisioner type: "Geneva CephFS Testing" availability_zone: nova os_secret_name: os-trustee os_secret_namespace: kube-system cephfs_os_share_id: <cephfs-share-id> cephfs_os_share_access_id: <cephfs-share-access-id>
helm upgradeto deploy the new volume and to enter maintenance mode.
$ helm upgrade reana reanahub/reana \ --version $VERSION \ --values myvalues.yaml \ --set components.reana_server.environment.REANA_MAX_CONCURRENT_BATCH_WORKFLOWS=0 \ --set maintenance.enabled=true
Make sure that all the pods were terminated by checking the output of
kubectl get pods.
Migrate the RabbitMQ and database directories from the shared volume to the new infrastructure volume.
As an example, this can be achieved by executing a custom pod that mounts both volumes. Please note that if volumes use the
hostpathbackend, then you need to amend
migration-pod.yamlto mount the correct paths. You also need to amend the pod's specification if you are using an external database, as in this case the database directory does not exist on the shared storage volume.
apiVersion: v1 kind: Pod metadata: name: reana-volume-migration spec: containers: - name: reana-volume-migration image: ubuntu:20.04 command: - "bash" - "-c" - | set -e SRC_DB=/var/reana-shared/db DST_DB=/var/reana-infrastructure/db SRC_MQ=/var/reana-shared/rabbitmq DST_MQ=/var/reana-infrastructure/rabbitmq if [ -e $DST_DB ]; then echo "$DST_DB already exists, aborting" exit 1 fi if [ -e $DST_MQ ]; then echo "$DST_MQ already exists, aborting" exit 1 fi echo "Moving DB to infrastructure volume" mv $SRC_DB $DST_DB echo "Moving MQ to infrastructure volume" mv $SRC_MQ $DST_MQ echo "All done!" volumeMounts: - name: reana-shared-storage mountPath: "/var/reana-shared" - name: reana-infrastructure-storage mountPath: "/var/reana-infrastructure" restartPolicy: Never volumes: - name: reana-shared-storage # hostPath: # path: "/var/reana" persistentVolumeClaim: claimName: reana-shared-persistent-volume - name: reana-infrastructure-storage # hostPath: # path: "/var/reana-infra" persistentVolumeClaim: claimName: reana-infrastructure-persistent-volume
To spawn the pod, simply run
kubectl apply -f migration-pod.yaml
Check the logs to make sure there are no errors and check that the directories of the database and message broker are now in the newly-created volume.
$ kubectl logs reana-volume-migration Moving DB to infrastructure volume Moving MQ to infrastructure volume All done!
Disable maintenance mode and scale up the cluster to start accepting new workflows.
$ helm diff upgrade reana helm/reana -f myvalues.yaml