Fuzzball Documentation

Provisioner Configuration Reference

This document is an exhaustive reference for the configuration parameters available in the Fuzzball central configuration system. The central configuration is written in YAML and supports multiple provisioner types, each with its own parameters.

Configuration Structure Overview

# Global cluster settings
nodeAnnotations:
  # Map of global annotations applied to all nodes
  # For example:
  global.annotation: "cluster-wide-value"
  environment: "production"

softwareTokens:
  # Map of software license token limits
  # For example:
  matlab: 20
  ansys: 10

scheduler:
  queueDepth: 64
  # Recognized annotations that workflow jobs can pass to the
  # scheduler to match against provisioner definition annotations.
  # By default the recognized annotations are:
  # - nvidia.com/gpu.arch
  # - nvidia.com/gpu.model
  recognizedAnnotations:
    - custom.annotation/one
    - custom.annotation/two

definitions:
  # Array of provisioner definitions
  # For example:
  - id: compute-nodes
    provisioner: static
    # and more provisioner-specific configuration ...

nodeAnnotations

Global annotations applied to all cluster nodes.

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| nodeAnnotations | map[string]string | No | Key-value pairs of annotations applied globally to all nodes | cluster.name: "production" |

Example:

nodeAnnotations:
  cluster.name: "hpc-cluster-01"
  datacenter: "us-west-2"
  environment: "production"
  cost.center: "research"

softwareTokens

Software license token limits for concurrent usage control.

Software tokens are currently on the roadmap but not yet implemented.

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| softwareTokens | map[string]uint32 | No | Software name to maximum concurrent license count mapping | matlab: 25 |

Example:

softwareTokens:
  matlab: 25
  ansys: 15
  comsol: 8
  abaqus: 10

Scheduler Parameters

scheduler:
  # Maximum number of queued requests processed per scheduling iteration
  queueDepth: 64
  # Recognized annotations that workflow jobs can pass to the
  # scheduler to match against provisioner definition annotations.
  # By default the recognized annotations are:
  # - nvidia.com/gpu.arch
  # - nvidia.com/gpu.model
  recognizedAnnotations:
    - custom.annotation/one
    - custom.annotation/two

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| queueDepth | uint32 | No | Scheduler queue depth (default: 64) | 128 |
| recognizedAnnotations | []string | No | Set of recognized annotations used to match jobs against definitions | ["custom.annotation/one"] |

Provisioner Definitions

Each definition in the definitions array represents a compute resource provisioner.

Common Definition Parameters

These parameters are available for all provisioner types:

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| id | string | Yes | Unique identifier for the provisioner definition | "compute-nodes" |
| annotations | map[string]string | No | Key-value pairs of annotations specific to this definition | node.type: "compute" |
| provisioner | string | Yes | Provisioner backend type: static, aws, slurm, pbs | "static" |
| policy | string | No | Expression-based policy controlling access to this definition | request.owner.organization_id == "research" |
| ttl | uint32 | No | Node lifetime in seconds (ignored for the static provisioner) | 86400 |
| exclusive | string | No | Node exclusivity level: empty or none (default, shared), job (exclusive to one job), or workflow (exclusive to one workflow) | "job" |
| provisionerSpec | object | Yes | Provisioner-specific configuration (see sections below) | - |
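To show how the common parameters fit together, here is an illustrative definition; the values, and the Slurm-style provisionerSpec fields, are hypothetical:

```yaml
definitions:
  - id: example-nodes            # required: unique identifier
    provisioner: slurm           # required: static, aws, slurm, or pbs
    annotations:
      node.type: "compute"       # annotations specific to this definition
    policy: request.owner.organization_id == "research"
    ttl: 86400                   # node lifetime in seconds (ignored for static)
    exclusive: job               # shared by default; job or workflow for exclusivity
    provisionerSpec:             # required: provisioner-specific settings
      cpu: 16
      memory: "64GiB"
      partition: "compute"
```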

Node Exclusive

The exclusive parameter controls how nodes provisioned by this definition are shared among jobs:

  • If not specified or empty, nodes are shared and can run multiple jobs simultaneously. Multiple jobs from the same or different workflows can be scheduled on the same node based on available resources.

  • job: Nodes are exclusive to a single job allocation. Once a job is assigned to the node, no other jobs can use it until the job completes and the node is cleaned up. This ensures complete isolation at the job level.

  • workflow: Nodes are exclusive to a single workflow. All jobs within the same workflow can share the node, but jobs from other workflows cannot use it. This is useful for workflows that need dedicated resources but want to share nodes across their jobs.

Example:

definitions:
  # Shared nodes for general workloads
  - id: shared-compute
    provisioner: static
    exclusive: none
    provisionerSpec:
      condition: hostname() matches "shared-[0-9]+"

  # Job-exclusive nodes for sensitive workloads
  - id: exclusive-compute
    provisioner: pbs
    exclusive: job
    ttl: 3600
    provisionerSpec:
      cpu: 8
      memory: "32GiB"
      queue: "workq"

Static Provisioner Specifications

Static provisioners manage physical or pre-allocated compute resources.

Static provisionerSpec Parameters

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| condition | string | Yes | Expression-based condition for node matching | hostname() matches "compute-[0-9]+" |
| costPerHour | float64 | No | Cost per hour for resource usage (must be ≥ 0) | 0.25 |

Static Condition Expression Variables

The condition field supports these built-in variables and functions:

System Information (uname)

| Variable | Type | Description | Example Value |
|----------|------|-------------|---------------|
| uname.sysname | string | Operating system name | "Linux" |
| uname.nodename | string | Network node hostname | "compute-001" |
| uname.release | string | Operating system release | "5.4.0-74-generic" |
| uname.version | string | Operating system version | "#83-Ubuntu SMP" |
| uname.machine | string | Hardware machine type | "x86_64", "aarch64" |
| uname.domainname | string | Network domain name | "cluster.local" |

Operating System Details (osrelease)

| Variable | Type | Description | Example Value |
|----------|------|-------------|---------------|
| osrelease.name | string | OS name | "Ubuntu" |
| osrelease.id | string | OS identifier | "ubuntu" |
| osrelease.id_like | string | Similar OS identifiers | "debian" |
| osrelease.version | string | OS version string | "20.04.3 LTS (Focal Fossa)" |
| osrelease.version_id | string | OS version identifier | "20.04" |
| osrelease.version_codename | string | OS version codename | "focal" |

CPU Information (cpuinfo)

| Variable | Type | Description | Example Value |
|----------|------|-------------|---------------|
| cpuinfo.vendor_id | string | CPU vendor | "GenuineIntel", "AuthenticAMD" |
| cpuinfo.cpu_family | uint | CPU family number | 6 |
| cpuinfo.model | uint | CPU model number | 158 |
| cpuinfo.model_name | string | CPU model name string | "Intel(R) Xeon(R) CPU E5-2680 v4" |
| cpuinfo.microcode | uint | Microcode version | 240 |
| cpuinfo.cpu_cores | uint | Number of physical CPU cores | 16 |

Hardware Detection Functions

| Function | Return Type | Description | Example |
|----------|-------------|-------------|---------|
| hostname() | string | Returns the current hostname | "compute-001" |
| modalias.match(pattern) | bool | Matches hardware modalias patterns | modalias.match("pci:v000010DEd*") |

Common Modalias Patterns

# NVIDIA GPU (any model)
modalias.match("pci:v000010DEd*sv*sd*bc03sc*i*")

# Specific NVIDIA GPU models
modalias.match("pci:v000010DEd00001B06sv*sd*bc03sc*i*")  # GTX 1080 Ti
modalias.match("pci:v000010DEd00001E07sv*sd*bc03sc*i*")  # RTX 2080 Ti

# Intel Ethernet controllers
modalias.match("pci:v00008086d*sv*sd*bc02sc00i*")

# Mellanox InfiniBand adapters
modalias.match("pci:v000015B3d*sv*sd*bc0Csc06i*")

To find the pattern for a specific device, you can list the modalias of every PCI device on a node with the following one-liner:

$ IFS=$'\n'; for d in $(lspci); do modalias=$(cat /sys/bus/pci/devices/0000\:${d%% *}/modalias); echo "$modalias -> ${d#* }"; done

pci:v00008086d00004641sv00001D05sd00001174bc06sc00i00 -> Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
pci:v00008086d0000460Dsv00000000sd00000000bc06sc04i00 -> PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
pci:v00008086d000046A6sv00001D05sd00001174bc03sc00i00 -> VGA compatible controller: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] (rev 0c)
[snip...]
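Before committing a pattern to a condition, you can sanity-check it against a modalias string locally. This is a sketch that assumes bash's unquoted glob matching in [[ ]] behaves similarly to modalias pattern matching; the modalias below is a hypothetical NVIDIA GPU entry:

```shell
#!/usr/bin/env bash
# Hypothetical modalias for an NVIDIA GPU (vendor 10DE, base class 03 = display)
modalias="pci:v000010DEd00001B06sv00001043sd000085E4bc03sc00i00"

# Same glob used in the condition expression; keep $pattern unquoted on the
# right-hand side so bash performs pattern matching, not literal comparison.
pattern="pci:v000010DEd*sv*sd*bc03sc*i*"

if [[ "$modalias" == $pattern ]]; then
  echo "match"
else
  echo "no match"
fi
# prints "match"
```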

Static Provisioner Examples

definitions:
  # Basic compute nodes
  - id: compute-standard
    provisioner: static
    provisionerSpec:
      condition: |-
        hostname() matches "compute-[0-9]{3}" &&
        cpuinfo.vendor_id == "GenuineIntel" &&
        cpuinfo.cpu_cores >= 16
      costPerHour: 0.40

  # GPU nodes
  - id: gpu-nodes
    provisioner: static
    provisionerSpec:
      condition: |-
        hostname() matches "gpu-[0-9]+" &&
        modalias.match("pci:v000010DEd*sv*sd*bc03sc*i*")
      costPerHour: 2.50

  # High-memory nodes
  - id: highmem-nodes
    provisioner: static
    provisionerSpec:
      condition: |-
        hostname() matches "mem-[0-9]+" &&
        cpuinfo.cpu_cores >= 64
      costPerHour: 1.75

Static Provisioner Condition Examples

Operating System Matching

condition: |-
  osrelease.id == "ubuntu" &&
  osrelease.version_id >= "20.04"

CPU Architecture and Vendor

condition: |-
  uname.machine == "x86_64" &&
  cpuinfo.vendor_id == "GenuineIntel" &&
  cpuinfo.cpu_cores >= 16

Hostname Pattern Matching

condition: |-
  let compute_regex = "compute-[0-9]{3}";
  let gpu_regex = "gpu-[0-9]{2}";
  hostname() matches compute_regex || hostname() matches gpu_regex

Hardware Device Detection

condition: |-
  // Match NVIDIA GPU devices
  modalias.match("pci:v000010DEd*sv*sd*bc03sc*i*") &&
  cpuinfo.cpu_cores >= 8

Complex Multi-Condition Logic

condition: |-
  let is_compute_node = hostname() matches "compute-[0-9]+";
  let is_intel_cpu = cpuinfo.vendor_id == "GenuineIntel";
  let is_ubuntu = osrelease.id == "ubuntu";
  let has_enough_cores = cpuinfo.cpu_cores >= 16;

  is_compute_node && is_intel_cpu && is_ubuntu && has_enough_cores

AWS Provisioner Specifications

AWS provisioners support dynamic EC2 instance provisioning.

AWS provisionerSpec Parameters

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| instanceType | string | Yes | EC2 instance type or wildcard pattern | "t3.large", "c5.*" |
| spot | bool | No | Use spot instances | true, false |

AWS Instance Type Expansion

AWS provisioners support wildcard patterns that automatically expand to individual instance types:

  • t3.* expands to t3.nano, t3.micro, t3.small, etc.
  • c5.* expands to c5.large, c5.xlarge, c5.2xlarge, etc.
  • p3.* expands to p3.2xlarge, p3.8xlarge, p3.16xlarge

When using wildcards, the ${spec.instanceType} placeholder in the definition ID is replaced with the actual instance type.
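For example, a single wildcard definition behaves as if it were expanded into one definition per concrete instance type; the expansion below is illustrative, since the actual set depends on the instance types available:

```yaml
# One wildcard definition ...
- id: aws-${spec.instanceType}-spot
  provisioner: aws
  provisionerSpec:
    instanceType: t3.*
    spot: true
# ... effectively yields individual definitions such as:
# aws-t3.nano-spot, aws-t3.micro-spot, aws-t3.small-spot, aws-t3.medium-spot, ...
```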

AWS Provisioner Examples

definitions:
  # Spot instances for cost optimization
  - id: aws-${spec.instanceType}-spot
    provisioner: aws
    provisionerSpec:
      instanceType: t3.*
      spot: true
    policy: |-
      request.job_ttl <= 3600

  # On-demand compute instances
  - id: aws-${spec.instanceType}
    provisioner: aws
    provisionerSpec:
      instanceType: c5.*
      spot: false
    policy: |-
      request.job_kind == "service"

  # GPU instances for ML workloads
  - id: aws-${spec.instanceType}-gpu
    provisioner: aws
    provisionerSpec:
      instanceType: p3.*
      spot: false
    policy: |-
      request.job_resource.devices["nvidia.com/gpu"] > 0

Slurm Provisioner Specifications

Slurm provisioners integrate with existing Slurm clusters.

Slurm provisionerSpec Parameters

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| costPerHour | float64 | No | Cost per hour for resource usage (must be ≥ 0) | 0.30 |
| cpu | int | Yes | Number of CPU cores (must be > 0) | 16 |
| memory | string | Yes | Memory specification | "64GiB" |
| partition | string | Yes | Slurm partition name | "compute" |

Slurm Provisioner Examples

definitions:
  # Standard compute partition
  - id: slurm-compute
    provisioner: slurm
    provisionerSpec:
      costPerHour: 0.30
      cpu: 16
      memory: "64GiB"
      partition: "compute"
    policy: |-
      request.job_resource.cpu.cores <= 16

  # GPU partition
  - id: slurm-gpu
    provisioner: slurm
    ttl: 43200 # node lifetime set to 12h
    provisionerSpec:
      costPerHour: 1.80
      cpu: 8
      memory: "32GiB"
      partition: "gpu"
    policy: |-
      request.job_resource.devices["nvidia.com/gpu"] > 0

PBS Provisioner Specifications

PBS provisioners integrate with OpenPBS/PBS Pro clusters.

PBS provisionerSpec Parameters

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| cpu | int | Yes | Number of CPU cores (must be > 0) | 8 |
| memory | string | Yes | Memory specification | "32GiB" |
| gpus | int | No | Number of GPUs (must be ≥ 0) | 1 |
| queue | string | Yes | PBS queue name | "workq" |
| costPerHour | float64 | No | Cost per hour for resource usage (must be ≥ 0) | 0.30 |

PBS Provisioner Examples

definitions:
  # Standard PBS queue
  - id: pbs-compute
    provisioner: pbs
    provisionerSpec:
      cpu: 8
      memory: "32GiB"
      gpus: 0
      queue: "workq"
      costPerHour: 0.30

  # GPU queue
  - id: pbs-gpu
    provisioner: pbs
    provisionerSpec:
      cpu: 4
      memory: "16GiB"
      gpus: 1
      queue: "gpu"
      costPerHour: 1.00

Policy Expressions

Policy expressions control access to provisioner definitions and use the same expression language as static conditions.

Available Policy Variables

Request Owner Information

| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| request.owner.id | string | User ID | "user-123" |
| request.owner.organization_id | string | Organization ID | "org-research" |
| request.owner.email | string | User email address | "user@example.com" |
| request.owner.cluster_id | string | Cluster ID | "cluster-01" |
| request.owner.account_id | string | Group ID | "account-456" |

Job Information

| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| request.job_kind | string | Job type | "job", "service", "internal" |
| request.job_ttl | int | Job time-to-live in seconds | 3600 |
| request.job_annotations | map[string]string | Job annotation key-value pairs | request.job_annotations["tier"] |
| request.multinode_job | bool | True for multi-node jobs | true |
| request.task_array_job | bool | True for task array jobs | false |

Resource Requirements

| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| request.job_resource.cpu.affinity | string | CPU affinity | "none", "core", "socket", "numa" |
| request.job_resource.cpu.cores | int | Number of CPU cores requested | 4 |
| request.job_resource.cpu.threads | bool | Hyperthreading enabled | true |
| request.job_resource.cpu.sockets | int | Number of CPU sockets | 1 |
| request.job_resource.mem.bytes | int | Memory in bytes | 4294967296 |
| request.job_resource.mem.by_core | bool | Memory allocation per core | false |
| request.job_resource.devices | map[string]uint32 | Device requests | request.job_resource.devices["nvidia.com/gpu"] |
| request.job_resource.exclusive | bool | Exclusive node access | true |

Policy Examples

User and Organization Access Control

policy: |-
  request.owner.organization_id == "research" &&
  request.owner.account_id in ["2f0a8f4e-0a16-47d5-b541-05d3f9f44910", "c602cf05-7604-4f11-a690-79552b1fdbdd"]

Resource-Based Restrictions

policy: |-
  request.job_resource.cpu.cores <= 32 &&
  request.job_resource.mem.bytes <= (256 * 1024 * 1024 * 1024) &&
  !request.job_resource.exclusive

Job Type and Duration Policies

policy: |-
  request.job_kind == "job" &&
  request.job_ttl >= 300 &&
  request.job_ttl <= 86400

GPU Access Control

policy: |-
  let gpu_count = request.job_resource.devices["nvidia.com/gpu"];
  gpu_count > 0 && gpu_count <= 4 &&
  request.owner.organization_id == "280abb59-b765-4cdd-a538-6ab8f9b7927c"

Annotation-Based Policies

policy: |-
  request.job_annotations["priority"] == "high" &&
  request.job_annotations["project"] in ["proj-a", "proj-b"] &&
  request.owner.email matches "*@ciq.com"

Multi-Node Job Restrictions

policy: |-
  request.multinode_job ?
  request.job_resource.cpu.cores >= 4 &&
  request.owner.account_id == "092403fe-12ef-4465-bce4-18292fec13c8"
  :
  request.job_resource.cpu.cores <= 16

Time-Based Access

policy: |-
  let current_hour = time.Now().Hour();
  let is_business_hours = current_hour >= 9 && current_hour <= 17;

  request.job_annotations["priority"] == "low" ? !is_business_hours : true