Storage Volumes and Workflows
Storage volumes are used in workflows through the `volumes:` section using the V4 provisioner-based syntax. Persistent volumes must be created explicitly (via the CLI, web UI, or API) before they can be referenced in a workflow. Ephemeral volumes are created automatically when the workflow runs and destroyed when it completes.

Volumes in a workflow definition support these fields:

| Field | Type | Description |
|---|---|---|
| `use` | string | Provisioner name, or `ephemeral`/`persistent` for auto-selection |
| `name` | string | Persistent volume name (must already exist; created via CLI, web UI, or API) |
| `size` | string | Requested volume capacity (e.g., `10GB`, `1TiB`) |
| `annotations` | map | Key-value pairs for provisioner auto-selection |
| `ingress` | list | Data files to download into the volume before job execution |
| `egress` | list | Data files to upload from the volume after job execution |
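
The `size` values in the table mix decimal (`GB`) and binary (`TiB`) suffixes. The sketch below shows how such strings differ once converted to bytes. It is an illustration only: `parse_size` is a hypothetical helper, and the assumption that Fuzzball follows the usual SI/IEC meaning of these suffixes is not confirmed by this page.

```python
import re

# Assumed unit table: SI suffixes are powers of 10, IEC suffixes are powers of 2.
# Whether Fuzzball interprets suffixes this way is an assumption.
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}

_SIZE_RE = re.compile(r"^(\d+)\s*([A-Za-z]+)$")


def parse_size(text: str) -> int:
    """Convert a size string such as '10GB' or '1TiB' to a byte count."""
    match = _SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a size string: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * _UNITS[unit]


print(parse_size("10GB"))   # 10000000000
print(parse_size("1TiB"))   # 1099511627776
```

Under these assumptions, `1TiB` is roughly 10% larger than `1TB`, which is worth keeping in mind when sizing scratch space.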
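Ephemeral volume on a named provisioner:

    volumes:
      scratch:
        use: my-nfs-provisioner
        size: 50GB

Persistent volume on a named provisioner:

    volumes:
      data:
        use: shared-nfs
        name: project-data

Ephemeral volume with the provisioner auto-selected from annotations:

    volumes:
      scratch:
        size: 10GB
        annotations:
          tier: fast
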
Fuzzball selects the first provisioner that matches the requested annotations and grants ephemeral access to the user’s group. See Annotations and Selection for details.
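
The first-match rule described above can be sketched as follows. This is a simplified model, not Fuzzball's implementation: the `Provisioner` class, the provisioner names, and the rule that every requested key-value pair must match exactly are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Provisioner:
    """Hypothetical stand-in for a provisioner and the annotations it advertises."""
    name: str
    annotations: dict = field(default_factory=dict)


def select_provisioner(provisioners, requested):
    """Return the first provisioner carrying every requested annotation, or None."""
    for prov in provisioners:
        if all(prov.annotations.get(key) == value for key, value in requested.items()):
            return prov
    return None


# Hypothetical provisioners, listed in the order they would be searched.
provisioners = [
    Provisioner("bulk-nfs", {"tier": "slow"}),
    Provisioner("nvme-scratch", {"tier": "fast", "region": "east"}),
    Provisioner("ssd-scratch", {"tier": "fast"}),
]

chosen = select_provisioner(provisioners, {"tier": "fast"})
print(chosen.name)  # nvme-scratch
```

Because the first match wins, both `nvme-scratch` and `ssd-scratch` satisfy `tier: fast` here, but only the earlier one is chosen. Adding more annotations to the request narrows the set of candidates.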
Persistent volume located by name, with the provisioner auto-selected:

    volumes:
      data:
        name: my-dataset

Fuzzball searches all accessible provisioners for a volume with the specified name. If the
volume exists on exactly one accessible provisioner, it is used. If the same name exists on
multiple provisioners, specify the provisioner with use: to resolve the ambiguity.
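
The lookup rule above can be modeled in a few lines. This is a sketch under stated assumptions: `resolve_volume`, `AmbiguousVolumeError`, and the inventory mapping are hypothetical names, and the real lookup also depends on which provisioners the user can access.

```python
class AmbiguousVolumeError(Exception):
    """Raised when a volume name exists on more than one provisioner."""


def resolve_volume(name, inventory, use=None):
    """Return the provisioner that holds `name`.

    `inventory` maps provisioner name -> set of volume names (hypothetical shape).
    `use` restricts the search to one provisioner, mirroring the `use:` field.
    """
    if use is not None:
        candidates = [use] if name in inventory.get(use, set()) else []
    else:
        candidates = [prov for prov, vols in inventory.items() if name in vols]

    if not candidates:
        raise LookupError(f"no accessible provisioner has volume {name!r}")
    if len(candidates) > 1:
        raise AmbiguousVolumeError(
            f"{name!r} exists on {sorted(candidates)}; set use: to pick one"
        )
    return candidates[0]


# Hypothetical inventory of accessible provisioners and their volumes.
inventory = {
    "shared-nfs": {"project-data", "my-dataset"},
    "archive-nfs": {"project-data"},
}

print(resolve_volume("my-dataset", inventory))                      # shared-nfs
print(resolve_volume("project-data", inventory, use="archive-nfs"))  # archive-nfs
```

Calling `resolve_volume("project-data", inventory)` without `use` raises `AmbiguousVolumeError`, which corresponds to the case where the workflow must name the provisioner explicitly.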
When a volume is created within a group, the group ownership of the volume directory is set based on the user’s account. The storage driver applies POSIX ownership (UID, GID, and mode) during volume creation.
Group members can share data written to a volume because the setgid bit is set on the volume directory, so new files inherit the directory’s group. With the typical default umask of 002, the file creator and other group members keep full access to new files and directories, while everyone else loses write permission.
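
The effect of the umask can be checked with plain POSIX calls. The sketch below uses only the Python standard library on a temporary directory; it illustrates general POSIX behavior and is not how the Fuzzball storage driver is implemented. The directory and file names are placeholders.

```python
import os
import stat
import tempfile


def mode_after_umask(requested: int, umask: int) -> int:
    """Return the permission bits that remain after the umask clears its bits."""
    return requested & ~umask


def create_shared_file(umask: int = 0o002):
    """Create a setgid directory with one file in it; return (dir_mode, file_mode)."""
    previous = os.umask(umask)  # os.umask returns the previous value
    try:
        with tempfile.TemporaryDirectory() as root:
            shared = os.path.join(root, "volume")
            os.mkdir(shared)
            # 0o2775 is rwxrwsr-x: the leading 2 is the setgid bit.
            os.chmod(shared, 0o2775)
            path = os.path.join(shared, "results.csv")
            # Request rw-rw-rw-; the kernel applies the umask at creation time.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            dir_mode = stat.S_IMODE(os.stat(shared).st_mode)
            file_mode = stat.S_IMODE(os.stat(path).st_mode)
            return dir_mode, file_mode
    finally:
        os.umask(previous)  # restore the caller's umask


print(oct(mode_after_umask(0o666, 0o002)))  # 0o664
print(oct(mode_after_umask(0o666, 0o022)))  # 0o644
```

With umask 002 a new file ends up `rw-rw-r--`, so group members can modify it. With the stricter 022, the group is reduced to read-only access, which would prevent collaborators from updating each other's files.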
The ingress and egress URIs shown below are illustrative examples. Replace them with actual data sources and destinations appropriate for your environment.
This example uses an ephemeral scratch volume and a persistent data volume:

    version: v4
    volumes:
      data:
        use: shared-nfs
        name: project-data
      scratch:
        use: shared-nfs
        size: 10GB
        ingress:
          - source:
              uri: https://example.com/input-data.tar.gz
            destination:
              uri: file://input-data.tar.gz
    jobs:
      process:
        image:
          uri: docker://alpine:latest
        mounts:
          /scratch:
            volume: scratch
          /data:
            volume: data
        script: |
          #!/bin/sh
          tar xzf /scratch/input-data.tar.gz -C /scratch
          cp /scratch/results.csv /data/results.csv
        policy:
          timeout:
            execute: 10m
        resource:
          cpu:
            cores: 1
            affinity: NUMA
          memory:
            size: 1GiB

After this workflow completes, the project-data persistent volume contains
results.csv. The ephemeral scratch volume is automatically deleted.

The V1 `reference: volume://scope/class` syntax is still accepted for backward compatibility: Fuzzball auto-upgrades V1 workflows to V4 at execution time. For new workflows, use `version: v4` with the `use:`/`name:`/`size:`/`annotations:` fields.

To permanently convert an existing V1 Fuzzfile to V4 format:

    $ fuzzball workflow upgrade my-workflow.fz > my-workflow-v4.fz

See Upgrading existing V1 workflows for details.