Storage Class Definitions
A storage class definition defines how storage volumes are created/published/deleted/named on a cluster. Only organization owners can control storage class definitions for security reasons.
Below are example definitions of persistent and ephemeral storage classes.
Here is an example of a persistent storage class backed by an NFS share using the NFS storage driver
installed in the previous section. We assume access
to an NFS server at ${NFS_SERVER_IP} providing a /srv/fuzzball/storage_data share:
```yaml
version: v1
name: data
description: Persistent data
driver: nfs.csi.k8s.io
properties:
  persistent: true
  retainOnDelete: true
parameters:
  server: ${NFS_SERVER_IP}
  share: /srv/fuzzball/storage_data
capacity:
  size: 100
  unit: GiB
access:
  type: filesystem
  mode: multi_node_multi_writer
mount:
  options:
    - nfsvers=4
  user: user
  group: user
  permissions: 770
scope: user
volumes:
  nameArgs:
    - USERNAME
  nameFormat: "{{username}}"
  maxByAccount: 1
```
We will discuss the fields of this YAML definition in detail. First, it is necessary to provide the definition version, a name, and a description.
The name cannot be changed later, so ensure it is correct when first created.
```yaml
# always v1 for now
version: v1
# The name is mandatory and is used by Fuzzball as a reference to the volume
name: data
# A human-readable description for this storage class
description: Persistent data
# The storage driver to use
driver: nfs.csi.k8s.io
properties:
  persistent: true
  retainOnDelete: true
```
In the above example, `persistent: true` specifies that the storage volumes will persist and can be used for input and output in multiple concurrent and/or subsequent workflows. This is in contrast to ephemeral volumes, which only exist for the duration of a single workflow. `retainOnDelete: true` specifies that the content of a storage volume should be retained even when the storage volume itself is deleted in Fuzzball by its owner.

Note that the `retainOnDelete` setting only retains data when the owner deletes the volume itself. If the owner explicitly deletes the data within the volume, this setting will not allow it to be recovered.
The `parameters` field takes key/value pairs, and its content is driver dependent. Consult the CSI driver documentation to determine appropriate keys for a given driver. In this example, `server` and `share` are set for the NFS CSI driver to specify the NFS server and the export used for the volume mounts:
```yaml
parameters:
  server: ${NFS_SERVER_IP}
  share: /srv/fuzzball/storage_data
```
The following arguments can be used in parameter values and will be substituted:

- `{{uid}}`: set to the user ID as defined by `mount.user`
- `{{gid}}`: set to the group ID as defined by `mount.group`
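As an illustration, these placeholders could be used to point each user at a per-UID subdirectory of the export. The subdirectory layout below is an assumption for illustration, not part of the example class above:

```yaml
parameters:
  server: ${NFS_SERVER_IP}
  # hypothetical per-user subdirectory; {{uid}} is replaced with the user ID
  # derived from mount.user
  share: /srv/fuzzball/storage_data/{{uid}}
```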
```yaml
capacity:
  size: 100
  unit: GiB
```
This allows you to specify the storage volume capacity. The recognized unit values (case
insensitive) are as follows:
- MiB
- GiB
- TiB
- PiB
- EiB
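For example, a class providing larger volumes could declare its capacity in TiB (the size shown here is arbitrary):

```yaml
capacity:
  size: 2
  unit: TiB
```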
```yaml
access:
  type: filesystem
  mode: multi_node_multi_writer
```
This section defines the access type and the access mode of the volumes.

Recognized (case insensitive) values for `type`:

- `filesystem`

Recognized (case insensitive) values for `mode`:

- `MULTI_NODE_READER_ONLY`: allows one or multiple Substrate nodes to mount the volume in read-only mode
- `MULTI_NODE_MULTI_WRITER`: allows one or multiple Substrate nodes to mount the volume in read-write mode
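For instance, a class publishing shared reference data that jobs should only read could combine the filesystem type with the read-only mode. This is a hypothetical variant of the example class, not part of it:

```yaml
access:
  type: filesystem
  # nodes may mount the volume, but only read from it
  mode: multi_node_reader_only
```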
```yaml
mount:
  options:
    - nfsvers=4
  user: user
  group: user
  permissions: 770
```
The `mount` section specifies additional mount options to pass. Options are driver dependent (see the CSI driver documentation). In addition to mount options, an organization owner controls the ownership and permissions applied to storage volume directories when they are used. For that purpose, the `user` and `group` keys can be set as follows.

Recognized values for `user`:

- `user`: sets the storage volume directory owner to the user's UID
- `root`: sets the storage volume directory owner to root (UID=0)

Recognized values for `group`:

- `user`: sets the storage volume directory group to the user's primary group ID
- `root`: sets the storage volume directory group to the root group ID
- `account`: sets the storage volume directory group to the account group ID
Note that the value `user` requires that Keycloak be configured with an LDAP provider.
Additionally, an organization owner can define permission bits to apply to storage volume directories. Recognized values:

- 700
- 750
- 755
- 775
- 770
- 500
- 550
- 555
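As an example, a class whose volumes should be writable by every member of an account, but inaccessible to anyone else, might combine account group ownership with 770 permissions. This is a hypothetical variant of the example class:

```yaml
mount:
  options:
    - nfsvers=4
  # directories owned by the user's UID, grouped under the account's group ID
  user: user
  group: account
  permissions: 770
```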
Storage volumes can have a restricted scope. The `scope` parameter controls the creation scope (but not the usage scope). Possible scopes are `account`, `user`, or `all`.
```yaml
scope: user
```
The scope determines which account type can create a storage volume:
- `account`: storage volumes can be created from any account except the user account
- `user`: storage volumes can be created from the user account only
- `all`: storage volumes can be created from any account
Many CSI drivers use the storage volume name as the backing directory name when managing storage volumes. The `volumes` section below provides a flexible way to define storage volume names automatically:
```yaml
volumes:
  nameArgs:
    - USERNAME
  nameFormat: "{{username}}"
  # the maximum number of storage volumes that can be created by an account
  maxByAccount: 1
```
`nameArgs` and `nameFormat` work like `sprintf(nameFormat, nameArgs...)`: each `{{arg_name}}` placeholder in `nameFormat` is substituted with the value of the corresponding argument. Placeholder argument names are case-insensitive.
Recognized name arguments:
- `USERNAME`: substituted with the username of the user creating/using a storage volume
- `ORGANIZATION_ID`: substituted with the ID of the organization the user belongs to when creating/using a storage volume
- `ACCOUNT_ID`: substituted with the ID of the account the user is using when creating/using a storage volume
- `WORKFLOW_ID`: substituted with the ID of the workflow the user is running when creating/using a storage volume
- `CUSTOM_NAME`: substituted with the name the user provided when creating/using a storage volume
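For example, a class meant to provide one shared volume per account (rather than per user) could name its volumes by account ID. This is a hypothetical variant of the `volumes` section, not part of the example class:

```yaml
volumes:
  nameArgs:
    - ACCOUNT_ID
  nameFormat: "{{account_id}}"
  # with a maximum of one volume per account, the account ID alone
  # yields a unique volume name
  maxByAccount: 1
```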
In the example above, when user `userx` creates/uses a storage volume, the resulting volume name is `userx`. The NFS CSI driver uses the volume name as the directory name, so when `userx` creates/uses a storage volume based on this definition, the CSI driver will use `/srv/fuzzball/storage_data/userx` as the resulting volume path.
When `CUSTOM_NAME` is used as part of the volume name, users can affect the name of the volume by using volume references in the format `volume://<scope>/<storage_volume_name>/<optional_custom_name>`. This allows users to create multiple distinct persistent storage volumes for a storage class, up to the maximum specified with `maxByAccount`. If `CUSTOM_NAME` is not used, then the `<optional_custom_name>` part of the volume reference is ignored. If `CUSTOM_NAME` is used, it may be advisable to include additional information in the `nameFormat` to ensure unique paths on the storage backend. For example:

```yaml
volumes:
  nameArgs:
    - USERNAME
    - CUSTOM_NAME
  nameFormat: "{{username}}-{{custom_name}}"
  maxByAccount: 2
```
While it is possible to use `WORKFLOW_ID` for persistent volume naming, it is generally not helpful to include the workflow ID of the creating job in the name of a persistent volume.
The name arguments `ORGANIZATION_ID` and `ACCOUNT_ID` do not ensure a unique volume name, with the exception of `ACCOUNT_ID` when `maxByAccount` is set to 1. All other arguments can be used in isolation and will produce unique volume names.
A definition file for the storage class underlying ephemeral volumes for workflows is broadly similar:
```yaml
version: v1
name: scratch
description: Ephemeral Scratch Volumes
driver: nfs.csi.k8s.io
properties:
  persistent: false
  retainOnDelete: false
parameters:
  server: ${NFS_SERVER_IP}
  share: /srv/fuzzball/storage_scratch
capacity:
  size: 100
  unit: GiB
access:
  type: filesystem
  mode: multi_node_multi_writer
mount:
  options:
    - nfsvers=4
  user: user
  group: user
  permissions: 770
scope: user
volumes:
  nameArgs:
    - WORKFLOW_ID
  nameFormat: "{{workflow_id}}"
```
Here we use `WORKFLOW_ID` as the sole element for naming the ephemeral volumes, which ensures that each workflow gets a unique, private scratch volume backed by a subdirectory of the scratch share created earlier, named after the workflow ID.
To highlight the changes from the persistent example: the `persistent` property is set to `false`, and because we do not want to keep data after a workflow has finished, `retainOnDelete` is also set to `false`.