Storage Volumes and Workflows

Storage volumes can be created explicitly from the CLI or web UI, or automatically by workflows. Automatic creation is especially convenient for ephemeral volumes used by a single workflow.

Storage volume scope

We have seen that a storage class can restrict the storage volume scope within its definition. This scope governs the creation (but not the usage) of volumes. The scope can be one of:

group: Owners of a group can create storage volumes for their groups. This excludes the user’s personal group.
user: Single users can create storage volumes for their user only. This excludes other users and groups.
all: There is no scope restriction.

Why do these settings not control the usage scope?

Storage volumes scoped to a user are usable from any group the user is a member of. This allows, for example, a user of the group Apple to use their home directory while running workflows from the group Apple.

Storage volume references

Workflow volumes use storage volumes by reference. A storage volume reference includes, at a minimum, a scope and the storage class name. An optional custom name can be specified, depending on the storage class naming configuration. A storage volume reference therefore takes this form:

volume://<scope>/<storage_class_name>/<optional_custom_name>

Where scope can be user or group depending on the defined storage class scope, and storage_class_name is the name of the storage class (e.g., scratch, data, nfs). If the storage class scope is set to group, the reference will use the prefix volume://group/. If the storage class scope is set to user, the reference will use the prefix volume://user/.

So if you want to use a storage volume from an ephemeral storage class such as the scratch class we defined earlier (which has scope set to user), you would use the reference volume://user/scratch, where scratch is the storage class name. In our definition of the scratch ephemeral class, the custom name portion of the reference is automatically generated from the workflow ID, so you don’t need to include /<optional_custom_name> in the reference.
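The reference format described above can be sketched as a small parser. This is an illustrative Python sketch, not part of Fuzzball: the names `VolumeReference` and `parse_volume_reference` are hypothetical, and the rules it checks (scope is user or group, custom name is optional) come only from the description in this section. It also assumes that a custom name contains no slashes.

```python
from typing import NamedTuple, Optional

PREFIX = "volume://"
VALID_SCOPES = ("user", "group")  # scopes named in this section


class VolumeReference(NamedTuple):
    scope: str
    storage_class: str
    custom_name: Optional[str]  # None when the reference omits it


def parse_volume_reference(ref: str) -> VolumeReference:
    """Split volume://<scope>/<storage_class_name>/<optional_custom_name>."""
    if not ref.startswith(PREFIX):
        raise ValueError(f"not a volume reference: {ref!r}")
    parts = ref[len(PREFIX):].split("/")
    # Two parts (scope, class) or three (scope, class, custom name), none empty.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected <scope>/<class>[/<name>] in {ref!r}")
    scope, storage_class = parts[0], parts[1]
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}, got {scope!r}")
    custom_name = parts[2] if len(parts) == 3 else None
    return VolumeReference(scope, storage_class, custom_name)


if __name__ == "__main__":
    print(parse_volume_reference("volume://user/scratch"))
    print(parse_volume_reference("volume://group/data/results"))
```

For volume://user/scratch the custom name comes back as None, which matches the case where the storage class generates the name itself.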

Permissions

When a group is created, a group ID (GID) is allocated for the group. All group members have the allocated GID added to their user.

When a storage volume is created within a group, the group ownership of the volume’s top-level directory is set to either the primary group ID of the user or the root group ID. This choice is configured in the storage volume definition. Group permissions on the top-level directory are set to read/write. The setgid bit is also set on the top-level directory to ensure that data written to the storage volume has its group ownership set to the allocated GID.

By default, umask is typically set to 002. A umask of 002 removes only write permission for other users: the owning user and the group keep full access (read, write, execute) to newly created files, while other users can read and execute them but not alter them. As a result, group members can share data written to a storage volume. Setting a different umask in a workflow changes the permissions of files created within the storage volume.
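The effect of the umask can be worked through with a little arithmetic. The sketch below is plain Python and not specific to Fuzzball; the helper names `effective_mode` and `create_with_umask` are hypothetical. It assumes the usual POSIX convention that programs request mode 666 for new files and 777 for new directories, and that the bits set in the umask are cleared from the requested mode.

```python
import os
import stat
import tempfile


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits left after the umask clears its bits from the request."""
    return requested & ~umask & 0o777


def create_with_umask(umask: int) -> int:
    """Create a file in a temporary directory under the given umask
    and return its permission bits as reported by the filesystem."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.txt")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # restore the process umask


if __name__ == "__main__":
    for mask in (0o002, 0o022):
        files = effective_mode(0o666, mask)
        dirs = effective_mode(0o777, mask)
        print(f"umask {mask:03o}: files {files:03o}, directories {dirs:03o}")
```

With a umask of 002, group write permission survives (664 for files, 775 for directories), which is what lets group members modify each other's data. With 022 it is cleared (644 and 755), so a workflow that sets 022 would produce files that other group members can read but not change.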

Example workflow

Let’s use volumes from the scratch (ephemeral) and data (persistent) classes created previously in a workflow. Neither uses the optional custom name part of a volume reference. The persistent data volume for the user running the example workflow below may already exist, having been created either explicitly or automatically by a previous workflow. If it does not yet exist, it will be created automatically. The ephemeral scratch volume will be created automatically and deleted at the end of the workflow.

# this is test.fz
version: v1

volumes:
  data:
    reference: volume://user/data
  scratch:
    reference: volume://user/scratch
    ingress:
    - source:
        uri: https://raw.githubusercontent.com/ErikSchierboom/sentencegenerator/master/samples/the-king-james-bible.txt
      destination:
        uri: file://bible.txt

jobs:
  read:
    image:
      uri: docker://alpine:latest
    mounts:
      scratch:
        location: /scratch
      data:
        location: /data
    script: |
      #!/bin/sh
      cat /scratch/bible.txt | tr '[:lower:]' '[:upper:]' > /data/bible.txt
    policy:
      timeout:
        execute: 10m
      retry:
        attempts: 3
    resource:
      cpu:
        cores: 1
        affinity: NUMA
      memory:
        size: 1GiB

After creating the Fuzzfile, you can run the workflow like so:

# fuzzball workflow start test.fz
Workflow "9ce2772a-924c-4b1a-b10e-87a864fb16d7" started.

# fuzzball workflow describe 9ce2772a-924c-4b1a-b10e-87a864fb16d7
Name:      test.fz
Email:     bob@me.llc
UserId:    b8077c97-0185-437c-9425-feb063a5884b
Status:    STAGE_STATUS_FINISHED
Cluster:   unset-cluster
Created:   2025-04-15 02:02:17PM
Started:   2025-04-15 02:02:17PM
Finished:  2025-04-15 02:02:28PM
Error:


Stages:
KIND     | STATUS   | NAME                                          | STARTED               | FINISHED
Workflow | Finished | 9ce2772a-924c-4b1a-b10e-87a864fb16d7          | 2025-04-15 02:02:17PM | 2025-04-15 02:02:28PM
Volume   | Finished | data                                          | 2025-04-15 02:02:17PM | 2025-04-15 02:02:19PM
Volume   | Finished | scratch                                       | 2025-04-15 02:02:17PM | 2025-04-15 02:02:19PM
Image    | Finished | docker://alpine:latest                        | 2025-04-15 02:02:17PM | 2025-04-15 02:02:18PM
File     | Finished | https://raw.githubusercontent.com/ErikSchi... | 2025-04-15 02:02:22PM | 2025-04-15 02:02:23PM
Job      | Finished | read                                          | 2025-04-15 02:02:24PM | 2025-04-15 02:02:25PM

We can see the persistent storage volume on the NFS server containing the upper-cased text file.

# tree /srv/fuzzball/storage_data
/srv/fuzzball/storage_data
├── alice
│   ├── ...
└── bob
    └── bible.txt

2 directories, 4 files