Provisioner Configuration Reference

This document is an exhaustive reference for the configuration parameters available in the Fuzzball central configuration system. The central configuration is written in YAML and defines node provisioners for several backends, each with its own backend-specific parameters.

Configuration Structure Overview

# Global cluster settings
nodeAnnotations:
  # Map of global annotations applied to all nodes
  # For example:
  global.annotation: "cluster-wide-value"
  environment: "production"

softwareTokens:
  # Map of software license token limits
  # For example:
  matlab: 20
  ansys: 10

scheduler:
  queueDepth: 64
  # Annotation keys that scheduler annotation matching should skip when
  # comparing workflow job annotations against provisioner definitions'
  # Resource.Annotations. Add keys that are routed by a provisioner-
  # definition `policy:` expression rather than by Resource.Annotations.
  ignoredAnnotations:
    - nodepool

definitions:
  # Array of node provisioner definitions
  # For example:
  - id: compute-nodes
    provisioner: static
    # and more provisioner-specific configuration ...

nodeAnnotations

Global annotations applied to all cluster nodes.

Parameter	Type	Required	Description	Example
`nodeAnnotations`	`map[string]string`	No	Key-value pairs of annotations applied globally to all nodes	`cluster.name: "production"`

Example:

nodeAnnotations:
  cluster.name: "hpc-cluster-01"
  datacenter: "us-west-2"
  environment: "production"
  cost.center: "research"

softwareTokens

Software license token limits for concurrent usage control.

Software tokens are currently on the roadmap but not yet implemented.

Parameter	Type	Required	Description	Example
`softwareTokens`	`map[string]uint32`	No	Software name to maximum concurrent license count mapping	`matlab: 25`

Example:

softwareTokens:
  matlab: 25
  ansys: 15
  comsol: 8
  abaqus: 10

Scheduler Parameters

scheduler:
  # Maximum number of queued requests processed per scheduling iteration
  queueDepth: 64
  # How often the scheduler processes the queue
  interval: 60s
  # Cluster-wide expression (expr-lang) computing each allocation's scheduling
  # priority every tick; a higher value is scheduled sooner. When unset, the
  # default sums the organization, account, user, and workflow priority inputs
  # and ages each allocation by one unit per hour spent in the queue.
  priority: "organization.priority + account.priority + user.priority + workflow.priority"
  # Preemption: allow a blocked higher-priority allocation to evict a
  # preemptible, lower-priority running allocation. Disabled by default.
  # Evictions only happen when they make the blocked allocation placeable.
  preemptionEnabled: false
  # Pool utilization percentage (0-100) at or above which preemption may
  # evict; below it the blocked allocation is served by free or newly
  # provisioned capacity instead.
  preemptionThreshold: 75
  preemptionGap: 10
  minPreemptionRuntime: 60s
  # Allow evicting a set of victims for one blocked allocation in a single
  # scheduler pass (instead of at most one). Disabled by default.
  preemptionMultiVictimEnabled: false
  maxEvictionsPerTick: 8
  preemptionDrainTimeout: 30s
  # One-level (EASY-style) backfill: let lower-priority allocations fill a gap
  # behind a blocked head allocation without delaying it. Enabled by default;
  # set to false to schedule each node pool strictly in priority order — a
  # blocked allocation then stops lower-priority work on the same pool for
  # that pass.
  backfillEnabled: true
  # How long a running internal allocation (image or data fetch) is expected
  # to hold its resources, used by the backfill availability estimate in
  # place of the internal job's TTL. Estimate only: it never terminates a
  # fetch that runs longer.
  internalJobReleaseEstimate: 10m
  # Annotation keys that scheduler annotation matching should skip when
  # comparing job annotations against each candidate definition's
  # Resource.Annotations. See "Scheduler annotation matching" below.
  ignoredAnnotations:
    - nodepool

Parameter	Type	Required	Description	Example
`queueDepth`	`uint32`	No	Maximum number of queued requests processed per scheduling iteration (default: `64`)	`128`
`interval`	`duration`	No	How often the scheduler processes the queue (default: `60s`).	`30s`
`priority`	`string`	No	Cluster-wide expr-lang expression that computes each allocation’s scheduling priority every tick (higher is scheduled sooner). When empty, defaults to `organization.priority + account.priority + user.priority + workflow.priority` plus one priority unit per hour the allocation has spent in the queue. The per-entity priority inputs are set by admins via `group`/`organization`/`user update --priority`; the per-workflow input via `workflow start --priority` (signed, `<= 0`).	`"user.priority + workflow.priority"`
`preemptionEnabled`	`bool`	No	Enables the preemption pass, which may evict a preemptible, lower-priority running allocation in favor of a blocked higher-priority one (default: `false`).	`true`
`preemptionThreshold`	`float`	No	Pool utilization percentage (0–100) at or above which the preemption pass may evict; `0` is treated as unset (default: `75`). Utilization is measured per blocked allocation against the nodes it could actually run on, as the highest-utilized resource dimension it consumes (cores, memory, or a requested device kind such as GPUs — a saturated device kind the allocation does not request is ignored). Below the threshold — or while a dynamic provisioner definition can still provision nodes under its `maxNodes` cap — preemption is skipped and the blocked allocation waits for free or newly provisioned capacity instead. Note that utilization is a pool-level aggregate with no per-node fit awareness: a below-threshold pool whose free capacity is fragmented across nodes too small for the blocked allocation’s shape waits for natural drain rather than triggering preemption — a rising `below_threshold` miss count alongside a low `pool_utilization` gauge is the signal to lower the threshold.	`90`
`preemptionGap`	`float`	No	Minimum effective-priority delta between a blocked allocation and an eviction candidate before that candidate may be preempted (default: `10`).	`20`
`minPreemptionRuntime`	`duration`	No	Anti-thrash floor: a running allocation cannot be preempted until it has been running at least this long (default: `60s`).	`5m`
`preemptionMultiVictimEnabled`	`bool`	No	Allows the preemption pass to evict a set of victims for one blocked allocation in a single scheduler pass when no single victim frees enough capacity (default: `false` — at most one victim per blocked allocation per pass). Evictions always require that they make the blocked allocation placeable.	`true`
`maxEvictionsPerTick`	`uint32`	No	Caps the total victims the preemption pass may evict in one scheduler pass, across all blocked allocations (default: `8`).	`16`
`preemptionDrainTimeout`	`duration`	No	How long the preemption pass waits for an eviction’s freed capacity to appear before it may select new victims for the same blocked allocation (default: `30s`).	`1m`
`backfillEnabled`	`bool`	No	Enables one-level (EASY-style) backfill, letting lower-priority allocations fill a gap behind a blocked head allocation without delaying it (default: `true`). Set to `false` to disable backfill; a blocked allocation then stops lower-priority work on the same node pool for that scheduling pass.	`false`
`internalJobReleaseEstimate`	`duration`	No	How long a running internal allocation (image or data fetch) is expected to hold its resources, used by the backfill availability estimate in place of the internal job’s TTL (default: `10m`). An estimate only — a fetch running past it is treated as releasing its resources imminently and is never terminated. Raise it on deployments where image pulls or data staging routinely take longer (e.g. slow WAN links) to keep availability estimates realistic.	`30m`
`ignoredAnnotations`	`[]string`	No	Annotation keys to skip during scheduler annotation matching. A specific enumerated set of platform-internal `fuzzball.io/*` keys (e.g. `fuzzball.io/workflow.id`, `fuzzball.io/job.name`) is always ignored automatically — but this is an allowlist, not a prefix match, so user-defined keys placed under `fuzzball.io/` are NOT auto-exempt. This list is for additional keys handled by `policy:` expressions on your provisioner definitions.	`["nodepool"]`
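
The preemption parameters above interact, so a small model can help when tuning them. The following Python sketch is illustrative only and is not the scheduler's implementation: the function and field names are hypothetical, aging is assumed to accrue fractionally, and it leaves out the preemptible flag, the dynamic-provisioner `maxNodes` check, and the requirement that an eviction make the blocked allocation placeable.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Defaults mirror the documented defaults; field names are hypothetical.
    preemption_threshold: float = 75.0      # preemptionThreshold, percent
    preemption_gap: float = 10.0            # preemptionGap
    min_preemption_runtime_s: float = 60.0  # minPreemptionRuntime, seconds


def default_priority(organization, account, user, workflow, hours_queued):
    """Default priority: the four inputs plus one unit per hour in the queue.

    Assumption: aging accrues fractionally rather than in whole hours.
    """
    return organization + account + user + workflow + hours_queued


def pool_utilization(used, capacity, consumed_dims):
    """Highest utilization (percent) among the dimensions the blocked
    allocation consumes; dimensions it does not request are ignored."""
    return max(
        (100.0 * used[d] / capacity[d] for d in consumed_dims if capacity.get(d, 0) > 0),
        default=0.0,
    )


def may_preempt(cfg, blocked_priority, victim_priority, victim_runtime_s, utilization):
    """True when the threshold, priority gap, and runtime floor all allow eviction."""
    return (
        utilization >= cfg.preemption_threshold
        and blocked_priority - victim_priority >= cfg.preemption_gap
        and victim_runtime_s >= cfg.min_preemption_runtime_s
    )


cfg = SchedulerConfig()
# The blocked allocation requests cores and memory only, so the saturated
# GPU dimension does not count toward utilization.
util = pool_utilization(
    used={"cores": 90, "memory": 40, "gpu": 8},
    capacity={"cores": 100, "memory": 100, "gpu": 8},
    consumed_dims=["cores", "memory"],
)
print(util, may_preempt(cfg, blocked_priority=30, victim_priority=15,
                        victim_runtime_s=120, utilization=util))
```

In this model, raising `preemptionGap` or `minPreemptionRuntime` narrows the set of eligible victims, while lowering `preemptionThreshold` lets preemption start at lower pool utilization.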

Scheduler annotation matching

A node provisioner has two scoring knobs that determine its fitness for a given workflow job:

definition.annotations — values matched key-by-key against the job’s annotations by scheduler annotation matching, with built-in matchers per key (string equality by default; the GPU dimensions use substring or numeric-minimum matchers; see Built-in matchers below).
definition.policy — an Expr expression evaluated against the job’s request; returns a boolean that gates eligibility.

The two knobs are independent: scheduler annotation matching does not look at definition.policy, and policy evaluation does not look at definition.annotations. Either can route on the same annotation key; a deployment is free to use one, the other, or both.

By default every annotation key on a workflow job must be matched by an entry in definition.annotations on each candidate definition; otherwise the candidate is rejected for that job. The platform’s own annotation keys — a specific enumerated set of fuzzball.io/* keys including workflow/job/account identifiers (fuzzball.io/workflow.id, fuzzball.io/job.name, …) and the provisioner-definition pinning key — are skipped automatically. This is an allowlist, not a prefix match: the fuzzball.io/ prefix is reserved for the platform, and any user-defined key placed under that prefix is not auto-exempt and will still need to be added to scheduler.ignoredAnnotations. Cluster admins should declare only deployment-specific keys (their own labels for routing, etc.) in that list.

When to add a key to `ignoredAnnotations`

Add an annotation key to ignoredAnnotations when routing for that key is handled by definition.policy rather than by definition.annotations. Without the entry, scheduler annotation matching would additionally require the key on every candidate definition’s annotations map, redundant with what the policy already evaluates.

For example, a deployment whose node provisioners look like

definitions:
  - id: pool-small
    provisioner: pbs
    policy: |-
      request.job_annotations["nodepool"] in ["pbs-small", "small", ""]
    # ... no definition.annotations["nodepool"] needed; the policy handles it

should set:

scheduler:
  ignoredAnnotations:
    - nodepool

so that a workflow with resource.annotations.nodepool: small reaches the policy without being rejected by scheduler annotation matching first.

Built-in matchers

Scheduler annotation matching uses string equality (ExactMatch) by default. The following GPU dimensions have explicit built-in matchers, several of which are non-exact:

Annotation key	Matcher
`nvidia.com/gpu.arch`	`ExactMatch`
`nvidia.com/gpu.model`	`SubstringMatch` (case-insensitive)
`nvidia.com/gpu.family`	`ExactMatch`
`nvidia.com/gpu.product`	`SubstringMatch` (case-insensitive)
`nvidia.com/gpu.memory`	`MinimumMatch` (definition value ≥ requested)
`nvidia.com/gpu.compute.major`	`MinimumMatch`
`nvidia.com/gpu.compute.minor`	`ExactMatch`
`nvidia.com/gpu.count`	`MinimumMatch`
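
The matching rules described in this section can be modeled in a few lines of Python. This is a minimal sketch, not the scheduler's code: the function names are hypothetical, `SubstringMatch` is assumed to test whether the requested value occurs inside the definition's value, and `MinimumMatch` values are assumed to be plain integers. The automatically ignored platform keys are not modeled; pass them in `ignored` if needed.

```python
EXACT, SUBSTRING, MINIMUM = "ExactMatch", "SubstringMatch", "MinimumMatch"

# Matcher per annotation key, copied from the table above.
BUILTIN_MATCHERS = {
    "nvidia.com/gpu.arch": EXACT,
    "nvidia.com/gpu.model": SUBSTRING,
    "nvidia.com/gpu.family": EXACT,
    "nvidia.com/gpu.product": SUBSTRING,
    "nvidia.com/gpu.memory": MINIMUM,
    "nvidia.com/gpu.compute.major": MINIMUM,
    "nvidia.com/gpu.compute.minor": EXACT,
    "nvidia.com/gpu.count": MINIMUM,
}


def value_matches(key, requested, offered):
    """Compare one job annotation value against the definition's value."""
    matcher = BUILTIN_MATCHERS.get(key, EXACT)
    if matcher == SUBSTRING:
        # Assumption: the requested value must occur inside the definition value.
        return requested.lower() in offered.lower()
    if matcher == MINIMUM:
        # Assumption: both values are plain integers without units.
        return int(offered) >= int(requested)
    return requested == offered


def definition_matches(job_annotations, definition_annotations, ignored=()):
    """Every non-ignored job key must exist on the definition and match."""
    for key, requested in job_annotations.items():
        if key in ignored:
            continue
        if key not in definition_annotations:
            return False
        if not value_matches(key, requested, definition_annotations[key]):
            return False
    return True


job = {"nodepool": "small", "nvidia.com/gpu.model": "a100", "nvidia.com/gpu.memory": "40"}
definition = {"nvidia.com/gpu.model": "NVIDIA A100-SXM4-80GB", "nvidia.com/gpu.memory": "80"}

print(definition_matches(job, definition))                        # nodepool is unmatched
print(definition_matches(job, definition, ignored={"nodepool"}))  # nodepool is skipped
```

The first call rejects the definition because `nodepool` has no entry in its annotations; the second succeeds once `nodepool` is listed as ignored, which is the situation `scheduler.ignoredAnnotations` exists for.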

Node Provisioners

Each entry in the definitions array is a node provisioner: a configuration that, for a chosen backend, tells Fuzzball how to obtain a class of functionally-identical compute nodes with policy attached. The serialized form of a node provisioner is referred to as a node provisioner definition.

Common Parameters

These parameters are available for node provisioners across all backends:

Parameter	Type	Required	Description	Example
`id`	`string`	Yes	Unique identifier for the provisioner definition	`"compute-nodes"`
`annotations`	`map[string]string`	No	Key-value pairs of annotations specific to this definition	`node.type: "compute"`
`provisioner`	`string`	Yes	Node provisioner backend: `static`, `aws`, `slurm`, `pbs`, `coreweave`	`"static"`
`policy`	`string`	No	Expression-based policy controlling access to this definition	`request.owner.organization_id == "research"`
`ttl`	`uint32`	No	Node lifetime in seconds after provisioning. Required and must be > 0 for `pbs` and `slurm` definitions; must be 0 or omitted for `static` definitions (a non-zero value is rejected).	`86400`
`ttlBuffer`	`uint32`	No	Per-node buffer, in seconds, added to the allocation TTL and scaled by the number of nodes in the allocation. Accounts for provisioning and scheduling delays in multi-node jobs. Must be 0 or omitted for `static` definitions. Ignored when 0.	`300`
`exclusive`	`string`	No	Node exclusivity level: empty or `none` (default, shared), `job` (exclusive to one job), or `workflow` (exclusive to one workflow)	`"job"`
`provisionerSpec`	`object`	Yes	Provisioner-specific configuration (see sections below)	-

ttl and ttlBuffer are both uint32 values. When a non-static provisioner definition has ttlBuffer > 0 and a non-zero allocation TTL is being set for a node, the scheduler adds ttlBuffer × nodeCount to the allocation TTL before submitting the provisioning request. Both the multiplication and addition use saturating arithmetic: if either result would exceed 4,294,967,295 (roughly 136 years), it is clamped to that value rather than wrapping around. This prevents misconfigured large values from silently producing a much shorter TTL and causing nodes to be terminated before jobs complete.
This formula does not apply to static provisioners, or when either the allocation TTL or ttlBuffer is 0.
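
The arithmetic above can be checked with a short Python sketch. The helper names are hypothetical and the code only models the documented formula (allocation TTL plus `ttlBuffer × nodeCount`, both steps clamped to the `uint32` maximum); it is not taken from the scheduler.

```python
UINT32_MAX = 4_294_967_295


def sat_mul(a, b):
    """Multiply, clamping to UINT32_MAX instead of wrapping around."""
    return min(a * b, UINT32_MAX)


def sat_add(a, b):
    """Add, clamping to UINT32_MAX instead of wrapping around."""
    return min(a + b, UINT32_MAX)


def effective_ttl(allocation_ttl, ttl_buffer, node_count, static=False):
    """Allocation TTL in seconds after the per-node buffer is applied."""
    # The formula is skipped for static definitions and when either input is 0.
    if static or allocation_ttl == 0 or ttl_buffer == 0:
        return allocation_ttl
    return sat_add(allocation_ttl, sat_mul(ttl_buffer, node_count))


print(effective_ttl(3600, 300, 4))            # 3600 + 300 * 4 = 4800
print(effective_ttl(3600, 4_000_000_000, 2))  # clamped to 4294967295
```

With wrapping `uint32` arithmetic the second call would instead produce a small value, which is the early-termination failure mode the clamping avoids.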

Node Exclusive

The exclusive parameter controls how nodes provisioned by this definition are shared among jobs:

none (or not specified or empty): Nodes are shared and can run multiple jobs simultaneously. Multiple jobs from the same or different workflows can be scheduled on the same node based on available resources.
job: Nodes are exclusive to a single job allocation. Once a job is assigned to the node, no other jobs can use it until the job completes and the node is cleaned up. This ensures complete isolation at the job level.
workflow: Nodes are exclusive to a single workflow. All jobs within the same workflow can share the node, but jobs from other workflows cannot use it. This is useful for workflows that need dedicated resources but want to share nodes across their jobs.

Example:

definitions:
  # Shared nodes for general workloads
  - id: shared-compute
    provisioner: static
    exclusive: none
    provisionerSpec:
      condition: hostname() matches "shared-[0-9]+"

  # Job-exclusive nodes for sensitive workloads
  - id: exclusive-compute
    provisioner: pbs
    exclusive: job
    ttl: 3600
    provisionerSpec:
      cpu: 8
      memory: "32GiB"
      queue: "workq"
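
The three exclusivity levels reduce to a simple admission rule. The Python sketch below is a simplified model with hypothetical names; it ignores resource availability and node cleanup, and only shows which jobs may share a node at each level.

```python
def can_place(exclusive, running_workflows, candidate_workflow):
    """Decide whether a job may join a node under the given exclusive level.

    running_workflows: workflow IDs of the jobs currently on the node.
    candidate_workflow: workflow ID of the job being placed.
    """
    level = exclusive or "none"  # empty or omitted behaves like "none"
    if level == "none":
        return True  # shared; free resources are not modeled here
    if level == "job":
        return len(running_workflows) == 0
    if level == "workflow":
        return all(w == candidate_workflow for w in running_workflows)
    raise ValueError(f"unknown exclusive level: {exclusive!r}")


print(can_place("none", ["wf-a"], "wf-b"))      # shared node
print(can_place("job", ["wf-a"], "wf-a"))       # node already holds a job
print(can_place("workflow", ["wf-a"], "wf-a"))  # same workflow may share
print(can_place("workflow", ["wf-a"], "wf-b"))  # other workflows may not
```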

Static Provisioner Specifications

Static provisioners manage physical or pre-allocated compute resources.

Static provisionerSpec Parameters

Parameter	Type	Required	Description	Example
`condition`	`string`	Yes	Expression-based condition for node matching	`hostname() matches "compute-[0-9]+"`
`costPerHour`	`float64`	No	Cost per hour for resource usage (must be ≥ 0)	`0.25`

Static Condition Expression Variables

The condition field supports these built-in variables and functions:

System Information (`uname`)

Variable	Type	Description	Example Value
`uname.sysname`	`string`	Operating system name	`"Linux"`
`uname.nodename`	`string`	Network node hostname	`"compute-001"`
`uname.release`	`string`	Operating system release	`"5.4.0-74-generic"`
`uname.version`	`string`	Operating system version	`"#83-Ubuntu SMP"`
`uname.machine`	`string`	Hardware machine type	`"x86_64"`, `"aarch64"`
`uname.domainname`	`string`	Network domain name	`"cluster.local"`

Operating System Details (`osrelease`)

Variable	Type	Description	Example Value
`osrelease.name`	`string`	OS name	`"Ubuntu"`
`osrelease.id`	`string`	OS identifier	`"ubuntu"`
`osrelease.id_like`	`string`	Similar OS identifiers	`"debian"`
`osrelease.version`	`string`	OS version string	`"20.04.3 LTS (Focal Fossa)"`
`osrelease.version_id`	`string`	OS version identifier	`"20.04"`
`osrelease.version_codename`	`string`	OS version codename	`"focal"`

CPU Information (`cpuinfo`)

Variable	Type	Description	Example Value
`cpuinfo.vendor_id`	`string`	CPU vendor	`"GenuineIntel"`, `"AuthenticAMD"`
`cpuinfo.cpu_family`	`uint`	CPU family number	`6`
`cpuinfo.model`	`uint`	CPU model number	`158`
`cpuinfo.model_name`	`string`	CPU model name string	`"Intel(R) Xeon(R) CPU E5-2680 v4"`
`cpuinfo.microcode`	`uint`	Microcode version	`240`
`cpuinfo.cpu_cores`	`uint`	Number of physical CPU cores	`16`

Hardware Detection Functions

Function	Return Type	Description	Example
`hostname()`	`string`	Returns current hostname	`"compute-001"`
`modalias.match(pattern)`	`bool`	Matches hardware modalias patterns	`modalias.match("pci:v000010DEd*")`

Common Modalias Patterns

# NVIDIA GPU (any model)
modalias.match("pci:v000010DEd*sv*sd*bc03sc*i*")

# Specific NVIDIA GPU models
modalias.match("pci:v000010DEd00001B06sv*sd*bc03sc*i*")  # GTX 1080 Ti
modalias.match("pci:v000010DEd00001E07sv*sd*bc03sc*i*")  # RTX 2080 Ti

# Intel Ethernet controllers
modalias.match("pci:v00008086d*sv*sd*bc02sc00i*")

# Mellanox InfiniBand adapters
modalias.match("pci:v000015B3d*sv*sd*bc0Csc06i*")

You can also easily get the modalias for all the PCI devices on a node to match a specific device with the following one-liner:

$ IFS=$'\n'; for d in $(lspci); do modalias=$(cat /sys/bus/pci/devices/0000\:${d%% *}/modalias); echo "$modalias -> ${d#* }"; done

pci:v00008086d00004641sv00001D05sd00001174bc06sc00i00 -> Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
pci:v00008086d0000460Dsv00000000sd00000000bc06sc04i00 -> PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
pci:v00008086d000046A6sv00001D05sd00001174bc03sc00i00 -> VGA compatible controller: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] (rev 0c)
[snip...]
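
To try a pattern against collected modalias strings before putting it in a condition, a glob matcher is enough for a rough check. The Python sketch below assumes that `modalias.match` uses shell-style `*` wildcards and is case-sensitive, which the patterns above suggest but this document does not state; the helper name is hypothetical.

```python
from fnmatch import fnmatchcase


def modalias_match(pattern, modaliases):
    """Return True if any device modalias matches the glob pattern."""
    return any(fnmatchcase(alias, pattern) for alias in modaliases)


# Modalias strings taken from the sample output above.
node = [
    "pci:v00008086d00004641sv00001D05sd00001174bc06sc00i00",
    "pci:v00008086d0000460Dsv00000000sd00000000bc06sc04i00",
    "pci:v00008086d000046A6sv00001D05sd00001174bc03sc00i00",
]

# Intel display controller (vendor 8086, base class 03): present on this node.
print(modalias_match("pci:v00008086d*sv*sd*bc03sc*i*", node))
# NVIDIA display controller (vendor 10DE): absent on this node.
print(modalias_match("pci:v000010DEd*sv*sd*bc03sc*i*", node))
```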

Static Provisioner Examples

definitions:
  # Basic compute nodes
  - id: compute-standard
    provisioner: static
    provisionerSpec:
      condition: |-
        hostname() matches "compute-[0-9]{3}" &&
        cpuinfo.vendor_id == "GenuineIntel" &&
        cpuinfo.cpu_cores >= 16
      costPerHour: 0.40

  # GPU nodes
  - id: gpu-nodes
    provisioner: static
    provisionerSpec:
      condition: |-
        hostname() matches "gpu-[0-9]+" &&
        modalias.match("pci:v000010DEd*sv*sd*bc03sc*i*")
      costPerHour: 2.50

  # High-memory nodes
  - id: highmem-nodes
    provisioner: static
    provisionerSpec:
      condition: |-
        hostname() matches "mem-[0-9]+" &&
        cpuinfo.cpu_cores >= 64
      costPerHour: 1.75

Static Provisioner Condition Examples

Operating System Matching

condition: |-
  osrelease.id == "ubuntu" &&
  osrelease.version_id >= "20.04"

CPU Architecture and Vendor

condition: |-
  uname.machine == "x86_64" &&
  cpuinfo.vendor_id == "GenuineIntel" &&
  cpuinfo.cpu_cores >= 16

Hostname Pattern Matching

condition: |-
  let compute_regex = "compute-[0-9]{3}";
  let gpu_regex = "gpu-[0-9]{2}";
  hostname() matches compute_regex || hostname() matches gpu_regex

Hardware Device Detection

condition: |-
  // Match NVIDIA GPU devices
  modalias.match("pci:v000010DEd*sv*sd*bc03sc*i*") &&
  cpuinfo.cpu_cores >= 8

Complex Multi-Condition Logic

condition: |-
  let is_compute_node = hostname() matches "compute-[0-9]+";
  let is_intel_cpu = cpuinfo.vendor_id == "GenuineIntel";
  let is_ubuntu = osrelease.id == "ubuntu";
  let has_enough_cores = cpuinfo.cpu_cores >= 16;

  is_compute_node && is_intel_cpu && is_ubuntu && has_enough_cores

AWS Provisioner Specifications

AWS provisioners support dynamic EC2 instance provisioning.

AWS provisionerSpec Parameters

Parameter	Type	Required	Description	Example
`instanceType`	`string`	Yes	EC2 instance type or wildcard pattern	`"t3.large"`, `"c5.*"`
`spot`	`bool`	No	Use spot instances	`true`, `false`

AWS Instance Type Expansion

AWS provisioners support wildcard patterns that automatically expand to individual instance types:

t3.* expands to t3.nano, t3.micro, t3.small, etc.
c5.* expands to c5.large, c5.xlarge, c5.2xlarge, etc.
p3.* expands to p3.2xlarge, p3.8xlarge, p3.16xlarge

When using wildcards, the ${spec.instanceType} placeholder in the definition ID is replaced with the actual instance type.
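
The expansion can be pictured as one definition per matching instance type. The Python sketch below is an illustration under assumptions: the catalog list is hypothetical (how Fuzzball discovers the available instance types is not covered here), and shell-style glob matching stands in for whatever pattern syntax the provisioner actually uses.

```python
from fnmatch import fnmatchcase

PLACEHOLDER = "${spec.instanceType}"


def expand_definition(definition_id, instance_pattern, catalog):
    """Map each expanded definition ID to its concrete instance type."""
    expanded = {}
    for instance_type in catalog:
        if fnmatchcase(instance_type, instance_pattern):
            expanded[definition_id.replace(PLACEHOLDER, instance_type)] = instance_type
    return expanded


# Hypothetical catalog for illustration only.
catalog = ["c5.large", "p3.2xlarge", "p3.8xlarge", "p3.16xlarge"]

for new_id, instance_type in expand_definition("aws-${spec.instanceType}-gpu", "p3.*", catalog).items():
    print(new_id, "->", instance_type)
```

Including the placeholder in the `id` keeps the expanded definition IDs unique, since every definition requires a unique identifier.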

AWS Provisioner Examples

definitions:
  # Spot instances for cost optimization
  - id: aws-${spec.instanceType}-spot
    provisioner: aws
    provisionerSpec:
      instanceType: t3.*
      spot: true
    policy: |-
      request.job_ttl <= 3600

  # On-demand compute instances
  - id: aws-${spec.instanceType}
    provisioner: aws
    provisionerSpec:
      instanceType: c5.*
      spot: false
    policy: |-
      request.job_kind == "service"

  # GPU instances for ML workloads
  - id: aws-${spec.instanceType}-gpu
    provisioner: aws
    provisionerSpec:
      instanceType: p3.*
      spot: false
    policy: |-
      request.job_resource.devices["nvidia.com/gpu"] > 0

Slurm Provisioner Specifications

Slurm provisioners integrate with existing Slurm clusters.

Slurm provisionerSpec Parameters

Parameter	Type	Required	Description	Example
`costPerHour`	`float64`	No	Cost per hour for resource usage (must be ≥ 0)	`0.30`
`cpu`	`int`	Yes	Number of CPU cores (must be > 0)	`16`
`memory`	`string`	Yes	Memory specification	`"64GiB"`
`partition`	`string`	Yes	Slurm partition name	`"compute"`

Slurm Provisioner Examples

definitions:
  # Standard compute partition
  - id: slurm-compute
    provisioner: slurm
    ttl: 86400 # node lifetime set to 24h
    provisionerSpec:
      costPerHour: 0.30
      cpu: 16
      memory: "64GiB"
      partition: "compute"
    policy: |-
      request.job_resource.cpu.cores <= 16

  # GPU partition
  - id: slurm-gpu
    provisioner: slurm
    ttl: 43200 # node lifetime set to 12h
    provisionerSpec:
      costPerHour: 1.80
      cpu: 8
      memory: "32GiB"
      partition: "gpu"
    policy: |-
      request.job_resource.devices["nvidia.com/gpu"] > 0

PBS Provisioner Specifications

PBS provisioners integrate with OpenPBS/PBS Pro clusters.

PBS provisionerSpec Parameters

Parameter	Type	Required	Description	Example
`cpu`	`int`	Yes	Number of CPU cores (must be > 0)	`8`
`memory`	`string`	Yes	Memory specification	`"32GiB"`
`gpus`	`int`	No	Number of GPUs (must be ≥ 0)	`1`
`queue`	`string`	Yes	PBS queue name	`"workq"`
`costPerHour`	`float64`	No	Cost per hour for resource usage (must be ≥ 0)	`0.30`

PBS Provisioner Examples

definitions:
  # Standard PBS queue
  - id: pbs-compute
    provisioner: pbs
    ttl: 86400 # node lifetime set to 24h
    provisionerSpec:
      cpu: 8
      memory: "32GiB"
      gpus: 0
      queue: "workq"
      costPerHour: 0.30

  # GPU queue
  - id: pbs-gpu
    provisioner: pbs
    ttl: 86400 # node lifetime set to 24h
    provisionerSpec:
      cpu: 4
      memory: "16GiB"
      gpus: 1
      queue: "gpu"
      costPerHour: 1.00

CoreWeave Provisioner Specifications

CoreWeave provisioners support dynamic instance provisioning on CoreWeave’s cloud infrastructure, optimized for GPU workloads.

CoreWeave provisionerSpec Parameters

Parameter	Type	Required	Description	Example
`instanceType`	`string`	Yes	CoreWeave instance type identifier	`"cd-a40-24gb"`
`costPerHour`	`float64`	No	Cost per hour for resource usage (must be ≥ 0)	`15.00`

CoreWeave Provisioner Examples

definitions:
  # CPU instance for standard compute workloads
  - id: coreweave-cpu-small
    provisioner: coreweave
    provisionerSpec:
      instanceType: "cd-gp-a192-genoa"
      costPerHour: 7.78
    policy: |-
      request.job_resource.cpu.cores >= 4 &&
      request.job_resource.cpu.cores <= 32

  # GPU instance for ML/AI workloads
  - id: coreweave-gpu-a40
    provisioner: coreweave
    ttl: 3600
    provisionerSpec:
      instanceType: "cd-a40-24gb"
      costPerHour: 15.00
    policy: |-
      request.job_resource.devices["nvidia.com/gpu"] > 0

CoreWeave provisioners support both dynamic provisioning (on-demand node creation) and static provisioning (pre-existing node pools). For static provisioning with pre-existing CoreWeave node pools, see the CoreWeave Static Provisioning guide.

Policy Expressions

Policy expressions control access to node provisioners and use the same expression language as static conditions.

Available Policy Variables

Request Owner Information

Variable	Type	Description	Example
`request.owner.id`	`string`	User ID	`"user-123"`
`request.owner.organization_id`	`string`	Organization ID	`"org-research"`
`request.owner.email`	`string`	User email address	`"user@example.com"`
`request.owner.cluster_id`	`string`	Cluster ID	`"cluster-01"`
`request.owner.account_id`	`string`	Account (group) ID	`"account-456"`

Job Information

Variable	Type	Description	Example
`request.job_kind`	`string`	Job type	`"job"`, `"service"`, `"internal"`
`request.job_ttl`	`int`	Job time-to-live in seconds	`3600`
`request.job_annotations`	`map[string]string`	Job annotation key-value pairs	`request.job_annotations["tier"]`
`request.multinode_job`	`bool`	True for multi-node jobs	`true`
`request.task_array_job`	`bool`	True for task array jobs	`false`
`request.multinode_nodes`	`int`	Node count requested by a multi-node job (`0` otherwise)	`4`
`request.task_array_concurrency`	`int`	Concurrency requested by a task array job (`0` otherwise)	`8`

Definition Information

Variable	Type	Description	Example
`definition.nodes`	`int`	The size of the pool this definition can offer an allocation. For a static definition, the count of usable nodes as they exist — including fully allocated ones, since a busy pool is still a pool — capped by an explicitly configured `maxNodes`. For a dynamic definition, its `maxNodes` capacity (what it may grow to).	`definition.nodes >= request.multinode_nodes`
`definition.max_nodes`	`int`	The definition’s `maxNodes`: the configured value when set; otherwise unlimited for a static definition, or the default cap (`16`) for a dynamic one.	`definition.max_nodes >= 4`

Resource Requirements

Variable	Type	Description	Example
`request.job_resource.cpu.affinity`	`string`	CPU affinity	`"none"`, `"core"`, `"socket"`, `"numa"`
`request.job_resource.cpu.cores`	`int`	Number of CPU cores requested	`4`
`request.job_resource.cpu.threads`	`bool`	Hyperthreading enabled	`true`
`request.job_resource.cpu.sockets`	`int`	Number of CPU sockets	`1`
`request.job_resource.mem.bytes`	`int`	Memory in bytes	`4294967296`
`request.job_resource.mem.by_core`	`bool`	Memory allocation per core	`false`
`request.job_resource.devices`	`map[string]uint32`	Device requests	`request.job_resource.devices["nvidia.com/gpu"]`
`request.job_resource.exclusive`	`bool`	Exclusive node access	`true`

Policy Examples

User and Organization Access Control

policy: |-
  request.owner.organization_id == "research" &&
  request.owner.account_id in ["2f0a8f4e-0a16-47d5-b541-05d3f9f44910", "c602cf05-7604-4f11-a690-79552b1fdbdd"]

Resource-Based Restrictions

policy: |-
  request.job_resource.cpu.cores <= 32 &&
  request.job_resource.mem.bytes <= (256 * 1024 * 1024 * 1024) &&
  !request.job_resource.exclusive

Job Type and Duration Policies

policy: |-
  request.job_kind == "job" &&
  request.job_ttl >= 300 &&
  request.job_ttl <= 86400

GPU Access Control

policy: |-
  let gpu_count = request.job_resource.devices["nvidia.com/gpu"];
  gpu_count > 0 && gpu_count <= 4 &&
  request.owner.organization_id == "280abb59-b765-4cdd-a538-6ab8f9b7927c"

Pool-Size Gating for Parallel Jobs

Reject multi-node or task-array jobs that can never be satisfied by this definition’s pool. Because definition.nodes counts a static pool’s usable nodes even while they are fully occupied, a busy pool keeps accepting jobs (they queue until nodes drain) instead of being rejected at its busiest. definition.nodes never exceeds definition.max_nodes — an explicitly configured maxNodes already caps it — so gating on definition.nodes alone is sufficient.

policy: |-
  request.multinode_nodes <= definition.nodes &&
  request.task_array_concurrency <= definition.nodes
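
The values this policy compares can be modeled as follows. The Python sketch uses hypothetical function names and restates the rules from the Definition Information table (static pools count usable nodes capped by an explicit `maxNodes`; dynamic pools use `maxNodes`, defaulting to `16`); it is not the policy engine itself.

```python
DEFAULT_DYNAMIC_MAX_NODES = 16


def definition_nodes(static, usable_nodes=0, max_nodes=None):
    """Model of definition.nodes for a static or dynamic definition."""
    if static:
        # Busy nodes still count; only an explicit maxNodes caps the pool.
        return usable_nodes if max_nodes is None else min(usable_nodes, max_nodes)
    return DEFAULT_DYNAMIC_MAX_NODES if max_nodes is None else max_nodes


def admits(nodes, multinode_nodes=0, task_array_concurrency=0):
    """Model of the pool-size gating policy shown above."""
    return multinode_nodes <= nodes and task_array_concurrency <= nodes


static_pool = definition_nodes(static=True, usable_nodes=6)
print(admits(static_pool, multinode_nodes=4))  # fits in a 6-node pool
print(admits(static_pool, multinode_nodes=8))  # can never be satisfied
print(admits(static_pool))                     # ordinary job: both inputs are 0
```

Because both request fields are `0` for jobs that are neither multi-node nor task arrays, the policy never rejects ordinary jobs.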

Annotation-Based Policies

policy: |-
  request.job_annotations["priority"] == "high" &&
  request.job_annotations["project"] in ["proj-a", "proj-b"] &&
  request.owner.email endsWith "@ciq.com"

Multi-Node Job Restrictions

policy: |-
  request.multinode_job ?
  request.job_resource.cpu.cores >= 4 &&
  request.owner.account_id == "092403fe-12ef-4465-bce4-18292fec13c8"
  :
  request.job_resource.cpu.cores <= 16

Time-Based Access

policy: |-
  let current_hour = time.Now().Hour();
  let is_business_hours = current_hour >= 9 && current_hour <= 17;

  request.job_annotations["priority"] == "low" ? !is_business_hours : true
