Fuzzball Documentation

Fuzzfile Syntax Guide

Workflows are described by Fuzzfiles, which are YAML 1.2 files (a Fuzzfile may contain only a single YAML document) with specific fields and syntax used to define them. Below is a hierarchical tree of the fields that can be used to define a Fuzzfile, organized as they should appear.

# This example, saved as "workflow-syntax-example.yaml", can be validated with:
# $ fuzzball workflow validate workflow-syntax-example.yaml

version: v1

files:
  file1: |
    var,val
    x,42
    y,3.14    
  file2: |
    param1=true
    param2=19.3    

defaults:
  job:
    mounts:
      scratch-volume:
        location: /scratch
    env:
      - SCRATCH=/scratch

volumes:
  data-volume:
    reference: volume://user/persistent/data
  scratch-volume:
    reference: volume://user/ephemeral
    ingress:
      - source:
          uri: s3://bucket-name-1/data.tar.gz
          secret: secret://account/ACCOUNT_S3_SECRET
        destination:
          uri: file://inputs/data.tar.gz
        policy:
          retry:
            attempts: 2
          timeout:
            execute: 2m
      - source:
          uri: file://file1
        destination:
          uri: file://inputs/params.csv
    egress:
      - source:
          uri: file:///results/simulations.tar.gz
        destination:
          uri: s3://bucket-name-2/simulations.tar.gz
          secret: secret://account/ACCOUNT_S3_SECRET
jobs:
  one:
    image:
      uri: docker://your.registry.com/path/to/container-one:stable
      secret: secret://account/ACCOUNT_REGISTRY_SECRET
    command: [python3, script.py, "--tmp", "$SCRATCH", "$INPUT", "$OUTPUT"]
    env:
      - INPUT=/scratch/inputs/data.tar.gz
      - OUTPUT=/scratch/results/preproc_out
    policy:
      timeout:
        execute: 2h
      retry:
        attempts: 1
    resource:
      annotations:
        gpu_compute_capability: sm80
        cpu_arch: epyc
      cpu:
        cores: 32
        affinity: NUMA
        sockets: 1
        threads: false
      memory:
        size: 92GB
        by-core: false
      exclusive: true
      devices:
        nvidia.com/gpu: 1
        cerebras.net/wse2: 1
  two-a:
    image:
      uri: docker://your.registry.com/path/to/container-two-a:stable
      secret: secret://account/ACCOUNT_REGISTRY_SECRET
    command:
      - '/bin/sh'
      - '-c'
      - 'some_sim_tool --seed ${FB_TASK_ID} /scratch/results/preproc_out /scratch/results/simulations'
    cwd: /scratch
    requires: [one]
    task-array:
      start: 1
      end: 1000
      concurrency: 100
  two-b:
    image:
      uri: docker://some_mpi_tool:latest
    command:
      - 'some_mpi_tool'
      - '--in=/scratch/results/preproc_out'
      - '--out=/scratch/results/model.csv'
      - '--params=/scratch/inputs/params.csv'
    requires: [one]
    resource:
      cpu:
        cores: 4
        affinity: CORE
        sockets: 1
        threads: false
      memory:
        size: 6GB
        by-core: false
    multinode:
      implementation: openmpi
      nodes: 2
  three:
    image:
      uri: docker://your.registry.com/container-three:stable
      secret: secret://account/ACCOUNT_REGISTRY_SECRET
    command:
      - summarize
      - /scratch/results/model.csv
      - /scratch/results/simulations
      - /scratch/results/summary.html
    requires: [two-a, two-b]
    resource:
      cpu:
        cores: 2
        affinity: NUMA
        sockets: 1
        threads: false
      memory:
        size: 10GB
        by-core: false
  four:
    image:
      uri: docker://alpine:latest
    files:
      /tmp/config.toml: file://file2
    script: |
      #! /bin/sh
      # copy summary to persistent disk
      cp /scratch/results/summary.html /data/${FB_WORKFLOW_ID}-summary.html
      tar -czf /scratch/results/simulations.tar.gz /scratch/results/simulations      
    requires: [three]
    mounts:
      data-volume:
        location: /data
  • version: Denotes the workflow syntax version. Currently the only allowed value is v1

  • files: Optional section to define the contents of files inline. Inline file contents defined here can be referred to in volume ingress and jobs using a file://<arbitrary_name> uri. The name of each inline file is arbitrary (e.g. file1 and file2). In the example above the contents of the file1 section are saved as the file inputs/params.csv in the scratch volume with the following stanza:

    - source:
        uri: file://file1                 ## <- the contents of the file1 section
      destination:
        uri: file://inputs/params.csv     ## <- saved as inputs/params.csv in the volume
    

    And the contents of file2 are injected into the container of job four using bind mount syntax like so:

    files:
      /tmp/config.toml: file://file2    ## <- contents of the file2 section will be available as /tmp/config.toml in the container
    
  • defaults: Optional section to set up defaults that apply to all jobs:

    • env: Environment variables
    • mounts: Volume mounts

    In the example all jobs will mount the ephemeral scratch volume at /scratch. Job four will, in addition to scratch, also mount the persistent data volume at /data. Likewise, all jobs will set the environment variable $SCRATCH to /scratch. Job one will, in addition, also set the environment variables $INPUT and $OUTPUT.
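
    In the workflow above these defaults are given as:

    defaults:
      job:
        mounts:
          scratch-volume:
            location: /scratch
        env:
          - SCRATCH=/scratch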

  • volumes: Optional section to define a list of storage volumes that jobs can interact with. This section contains the specifications for each volume as well as all the workflow data ingress and egress. For each volume in the list:

    • name of your volume: A custom name for the volume that’s used to refer to it in the mounts: section of workflow jobs. This could be, for example, something like v1: or data:.

      • reference: Used to specify a volume from a storage class. Typically a Fuzzball cluster will have configured an ephemeral volume, which only persists for the life of a workflow (volume://user/ephemeral), and at least one persistent volume (volume://user/persistent/VOLUMENAME). Ephemeral volumes should be used when data is stored somewhere other than on-prem storage and needs to be downloaded - for example, from S3 or a similar cloud object storage provider.
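
        From the example above, a persistent data volume and an ephemeral scratch volume are declared like so:

        volumes:
          data-volume:
            reference: volume://user/persistent/data    ## <- persists after the workflow ends
          scratch-volume:
            reference: volume://user/ephemeral          ## <- exists only for the life of the workflow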

      • ingress: Declares a section that defines a list of files to be ingested into the volume at the beginning of a workflow. Each file must define a source, a destination, and optionally a policy for timeouts and retries.

        • source:

          • uri: The actual path to the data being moved into the workflow via the ingress. Can be of the form s3:// or http[s]://, ex. s3://bucket-name/information.tar.gz or https://website.com/data/information.tar.gz. It can also be of the form file://<arbitrary_name> to reference an inline file defined in the files: section, as shown above.

          • secret: Part of the secrets/credentials functionality that describes access keys needed to access the endpoint.

        • destination:

          • uri: The path on the volume where the ingressed data should be placed. Should be of the form file://, ex. file://inputs/information.tar.gz. This, prepended with the mounts: location specified in the jobs, gives the location of the data inside the workflow job container. For example, a file data.tar.gz placed into an inputs folder (as shown above with file://inputs/data.tar.gz) on the scratch volume being accessed by a job that mounts scratch at /scratch would be found in the container at /scratch/inputs/data.tar.gz.
      • egress: Declares a section that defines a list of files to be moved from the volume to external storage at the end of the workflow. Source, destination, and policy sections are defined as for ingress, except that the source is the local file and the destination is the remote storage, as shown below.
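
        For instance, the egress stanza from the example above moves the results archive off the scratch volume to object storage when the workflow finishes:

        egress:
          - source:
              uri: file:///results/simulations.tar.gz        ## <- file on the volume
            destination:
              uri: s3://bucket-name-2/simulations.tar.gz     ## <- remote object storage
              secret: secret://account/ACCOUNT_S3_SECRET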

  • jobs: This is where the actual compute jobs that the workflow will perform are defined. A workflow can have as many jobs as needed, or as few as a single one, and jobs can be used to accomplish all kinds of tasks. Jobs can be created for things like untarring ingressed data, running scientific software directly as it would be run on the command line, or calling scripts in Python or other languages that perform their own processing or runs of software.

    • name of your job: An arbitrary job name. This name is what shows up under commands like fuzzball workflow status as the name of the job. It can be anything - untar-data, run-training-alexnet, build-gene-database, etc. - as needed to describe what the job is doing. A minimal job is sketched below.
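
      A minimal sketch of a job (the job name untar-data and its command are illustrative only, and this sketch assumes the scratch volume is mounted at /scratch as in the workflow above):

      jobs:
        untar-data:
          image:
            uri: docker://alpine:latest
          command: ["tar", "-xzf", "/scratch/inputs/data.tar.gz", "-C", "/scratch/inputs"]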

      • image: Begins the description of the container image a job will use.

        • uri: Defines where the container image the job will run is located. Supported schemes are oras:// for Apptainer images and docker:// for Docker/OCI containers. The image contains all the software the job needs - which could be scientific software, compilers/interpreters, MPI/other parallelism suites, etc.

        • secret: Part of the secrets/credentials functionality that describes credentials needed to access the registry the container is hosted at, if necessary. For example, privately hosted registries, GitLab or similar registries, or pulling certain containers from Docker Hub or NVIDIA NGC may all require credentials to be specified.
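
          For example (the registry paths below are placeholders):

          image:
            uri: docker://your.registry.com/path/to/container-one:stable   ## Docker/OCI image
            secret: secret://account/ACCOUNT_REGISTRY_SECRET                ## only needed for protected registries
          ## or, for an Apptainer image pulled via ORAS (illustrative path):
          # image:
          #   uri: oras://your.registry.com/path/to/image:1.0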

      • command: The actual command that a job will run. This could be, for example, a Bash shell command, a script, or a call to an executable for a piece of scientific software. The command will be executed in the job container specified in the image: section, with other constraints applied as specified in other job: subfields. Examples of common command field patterns are command: [python3, script.py], command: ["/bin/sh", "-c", "ls && tar -xf information.tar.gz"], or command: [mpi-program, -in, /path/to/inputs.txt].

      • script: Script is mutually exclusive with command. A multiline script with a shebang (#!) line can be specified.
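
        Job four in the example above uses script in place of command:

        script: |
          #! /bin/sh
          # copy summary to persistent disk
          cp /scratch/results/summary.html /data/${FB_WORKFLOW_ID}-summary.html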

      • env: A list of environment variables in KEY=VALUE format. The variables are available in the code executed by the job command or script.

      • multinode: How a job should implement multinode parallelism through MPI or PGAS/GASNet.

        • implementation: The specific multinode wrapper implementation that Fuzzball should use. There are two MPI-based options, OpenMPI (openmpi) and MPICH (mpich), and one PGAS/GASNet option (gasnet).

        • nodes: How many nodes matching the job’s specified resource requirements the multinode job should spin up.
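
        Job two-b in the example above requests a two-node OpenMPI run:

        multinode:
          implementation: openmpi    ## or mpich, gasnet
          nodes: 2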

      • task-array: How a job should implement embarrassingly parallel-based workloads.

        • start: The starting task ID of the task array.

        • end: The ending task ID of the task array.

        • concurrency: The number of compute nodes that should be used to process the tasks in the task array.
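
        Job two-a in the example above runs tasks 1 through 1000 with a concurrency of 100; each task can read its own ID from ${FB_TASK_ID}:

        task-array:
          start: 1
          end: 1000
          concurrency: 100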

      • mounts: This specifies where the job will access the volumes defined previously in the workflow file.

        • name of your volume: The name of the volume as previously defined in the volumes section.

          • location: The path inside the job container where the volume will be mounted. For example, a file information.tar.gz placed into an inputs folder (with ex. file://inputs/information.tar.gz) on a volume named data being accessed by a job that mounts data at /data would be found in the container at /data/inputs/information.tar.gz.
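
        Job four in the example above mounts the persistent data volume in addition to the default scratch mount:

        mounts:
          data-volume:
            location: /data
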
      • policy: Defines specific rules and parameters related to how the job should run.

        • timeout: Beginning of timeout description for job.

          • execute: How long the job should be allowed to run before failing due to timeout, specified as a duration such as 2m (minutes) or 2h (hours).
        • retry: Rules related to rerunning a workflow step in case of failure.

          • attempts: The number of retry attempts to perform before failing the step completely.
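
        Job one in the example above allows a single retry and a two-hour execution window:

        policy:
          timeout:
            execute: 2h
          retry:
            attempts: 1
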
      • resource: Defines the resources the job will obtain before it runs. Resources are meant to be specified based on the context the workflow is running in and may be shared between jobs in that context. Fuzzball’s internal scheduler and resource manager will find where the desired resources are available, obtain them, and then start running the job on them. If the specified resources are unavailable, the workflow will halt at that job until they become available. This will be reflected in the fuzzball workflow status output as a job waiting in the Pending state.

        • annotations: Sets specific, custom attributes of the desired compute node OS. Essentially a custom way to select nodes by properties like CPU architecture, GPU compute capability, or specialized hardware accelerators. For example, knights-landing, sm-80, or gaudi.

        • cpu: Specifies how the job will utilize CPUs.

          • cores: How many CPU cores the job will attempt to claim when it starts.

          • affinity: Whether the workflow will attempt to claim any CPU core on a node (CORE), CPUs from the same socket on a node (SOCKET), or CPUs within the same NUMA (Non-uniform memory access) domain (NUMA). NUMA is a system architecture that co-locates each CPU core (or group of CPU cores, potentially) with a part of the overall system memory that the co-located core(s) have a physically and logically shorter path to, which can lead to drastic speedups for some workloads. A “socket” is a physical interface on the motherboard of the compute node that a CPU can fit into; if there is more than one socket, then the node can have more than one CPU. For example, a node may have two sockets, each with an 8 core CPU in them, for a total of 16 CPU cores on the system. If affinity is set to CORE, then the workflow will attempt to claim the requested number of cores from any CPU on the system, regardless of socket or NUMA domain. If affinity is set to SOCKET, then it will attempt to claim the requested number of cores from those available on a single CPU in a single socket. If affinity is set to NUMA, then it will attempt to minimize the number of utilized NUMA nodes - which typically also means minimizing the number of utilized sockets.

          • sockets: The number of sockets that the job should try to use.

          • threads: Whether or not hardware threads should be used. Hyper-threading allows for two threads to run on each core, but has varied results with high performance computing workloads; sometimes they see a performance increase, but also frequently see no speedup or even a performance decrease.

        • memory: Specifies how the job will use RAM.

          • size: The amount of RAM that the job will attempt to claim.

          • by-core: Whether the requested memory should be allocated per core or per node.

        • devices: Specifies which and how many devices the job will use. A device request takes the form <vendor>/<type>: <n>; for example, to request NVIDIA GPUs:

          • nvidia.com/gpu: 1
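
        Putting these fields together, a condensed version of the resource request from job one in the example above reads:

        resource:
          annotations:
            gpu_compute_capability: sm80
          cpu:
            cores: 32
            affinity: NUMA
            sockets: 1
            threads: false
          memory:
            size: 92GB
            by-core: false
          devices:
            nvidia.com/gpu: 1
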
      • requires: What other jobs have to be completed before this job can start. This provides an easy way to set up jobs that depend on the results of other jobs to run correctly - it specifies the workflow as a directed acyclic graph (DAG), meaning a graph with no cycles (which would represent circular, impossible-to-fulfill job dependencies) and no backtracking; jobs can only flow forward through the path of execution. If a job has no dependencies defined with requires:, it will be treated as an entry point to the workflow and started when the workflow begins. The dependency structure of the example above is sketched below.
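
        In the example above the requires fields form a simple DAG: two-a and two-b both fan out from one, and three fans back in:

        two-a:
          requires: [one]           ## <- starts only after job one completes
        two-b:
          requires: [one]
        three:
          requires: [two-a, two-b]  ## <- waits for both branches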

      • network: Specifies parameters related to how a network namespace for the container should be established.

        • isolated: Creates an isolated network namespace for the container, allowing the fuzzball workflow port-forward command to be used to forward a port from the workflow job container back to the user’s Fuzzball CLI. A bridge to the open internet is provided, so outbound connectivity is maintained as in the default configuration.
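
        The network section does not appear in the example workflow above; a minimal sketch, assuming isolated takes a boolean value, might look like:

        network:
          isolated: true    ## assumed form; enables fuzzball workflow port-forward for this job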