Fuzzball Documentation

Orchestrate Configuration

The Orchestrate configuration defines how Fuzzball connects to your Slurm or PBS cluster. This configuration is specified in the FuzzballOrchestrate Custom Resource Definition (CRD) and controls SSH authentication, connection parameters, and options specific to the batch scheduling system in use.

This page covers both schedulers: the Slurm instructions come first, followed by the equivalent PBS instructions.

Slurm Configuration

Basic Configuration

The minimal configuration requires SSH connection details and authentication credentials:

apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        sshPort: 22
        username: "fuzzball-service"
        password: "secure-password"  # OR use SSH key (recommended)
        skipHostKeyVerification: false

Authentication Methods

Password Authentication

Password authentication is the simplest method but less secure for production use:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"

SSH Key Authentication

For production environments, use SSH key-based authentication:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshHostPublicKey: "10.0.0.4 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="

Double-check the format of the private key and the host public key. At present, the Fuzzball Slurm integration supports only ECDSA keys, and the host public key must be entered in the format printed by the ssh-keyscan command (host, key type, base64 key).

SSH Key with Passphrase

If your SSH key is encrypted with a passphrase:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "key-passphrase"
        sshHostPublicKey: "10.0.0.4 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="

Configuration Parameters

Parameter                Type     Required  Default        Description
enabled                  boolean  Yes       false          Enable the Slurm provisioner
sshHost                  string   Yes       -              Hostname or IP address of the Slurm head node
sshPort                  integer  No        22             SSH port on the Slurm head node
username                 string   Yes       -              SSH username for authentication
password                 string   No*       -              SSH password for password authentication
sshPrivateKeyPem         string   No*       -              SSH private key in PEM format
sshPrivateKeyPassPhrase  string   No        -              Passphrase for encrypted SSH private key
sshHostPublicKey         string   No**      -              SSH host public key for verification
skipHostKeyVerification  boolean  No        false          Skip SSH host key verification (not recommended)
connectionTimeout        integer  No        30             SSH connection timeout in seconds (1-120)
binaryPath               string   No        -              Custom path to Slurm binaries (if not in $PATH)
sudoPath                 string   No        /usr/bin/sudo  Path to sudo binary on compute nodes
options                  map      No        {}             Additional Slurm sbatch options

* Either password or sshPrivateKeyPem must be provided for authentication

** Required unless skipHostKeyVerification is true
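
The two footnote rules can be checked before the resource is applied. The following is a minimal sketch and not part of Fuzzball: check_ssh_settings is a hypothetical helper that takes the password, private key, host public key, and skipHostKeyVerification values as positional arguments.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the footnote rules above.
# Usage: check_ssh_settings PASSWORD PRIVATE_KEY HOST_KEY [SKIP_VERIFY]
check_ssh_settings() {
  password=$1
  private_key=$2
  host_key=$3
  skip_verify=${4:-false}

  # Rule *: at least one authentication method must be provided.
  if [ -z "$password" ] && [ -z "$private_key" ]; then
    echo "error: set either password or sshPrivateKeyPem" >&2
    return 1
  fi

  # Rule **: a host key is required unless verification is skipped.
  if [ "$skip_verify" != "true" ] && [ -z "$host_key" ]; then
    echo "error: set sshHostPublicKey or skipHostKeyVerification: true" >&2
    return 1
  fi

  return 0
}

# Example: key authentication with a host key and verification enabled.
check_ssh_settings "" "dummy-key" "10.0.0.4 ecdsa-sha2-nistp256 AAAA" false \
  && echo "settings look consistent"
```

The helper only checks that the required fields are present; it does not validate the key material itself.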

Advanced Configuration

Custom Binary Path

If Slurm binaries are not in the default $PATH, specify their location:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        binaryPath: "/opt/slurm/bin"

This affects the following commands:

  • sbatch: Job submission
  • squeue: Job status queries (if sacct is unavailable)
  • scancel: Job cancellation
  • sacct: Job status and accounting (if available)
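
Before setting binaryPath, it can help to confirm on the head node that the directory really contains these commands. The sketch below assumes a hypothetical helper named missing_binaries (not a Fuzzball or Slurm tool) that prints any command not found as an executable in the given directory.

```shell
#!/bin/sh
# Hypothetical helper: list commands that are not executable in a directory.
# Usage: missing_binaries DIR CMD...
# Prints the missing command names and returns 1 if any are missing.
missing_binaries() {
  dir=$1
  shift
  missing=""
  for cmd in "$@"; do
    # -x is true only for an existing, executable file.
    [ -x "$dir/$cmd" ] || missing="$missing $cmd"
  done
  missing=${missing# }   # drop the leading space
  [ -z "$missing" ] && return 0
  echo "$missing"
  return 1
}

# Example (run on the head node; the path is the one used above):
#   missing_binaries /opt/slurm/bin sbatch squeue scancel sacct
```

Because squeue is used only when sacct is unavailable, a report that lists just one of the two is not necessarily a problem.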

Connection Timeout

Configure SSH connection timeout for reliability in high-latency environments:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        connectionTimeout: 60  # Wait up to 60 seconds for SSH connection

Valid range: 1-120 seconds. Default: 30 seconds.
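
The range rule is simple enough to check before editing the resource. A minimal sketch, assuming a hypothetical helper named valid_timeout that is not part of Fuzzball:

```shell
#!/bin/sh
# Hypothetical helper: succeed only for an integer between 1 and 120.
valid_timeout() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 120 ]
}

# Example: check the value used above.
if valid_timeout 60; then
  echo "connectionTimeout 60 is within range"
fi
```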

Custom Sudo Path

If sudo is installed in a non-standard location on compute nodes:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        sudoPath: "/usr/local/bin/sudo"

This path is used when executing the Substrate binary on compute nodes.

Slurm Batch Options

You can specify additional Slurm options that will be applied to all submitted jobs:

spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        options:
          partition: "compute"        # Default partition for all jobs
          account: "fuzzball-project" # Account to charge
          qos: "normal"               # Quality of Service
          constraint: "haswell"       # Node constraints
          comment: "Fuzzball workflow" # Job comment

Supported Slurm Options

The following standard sbatch options are supported:

Option       Description                                  Example
account      Account to charge for resources              "research-group"
clusters     Cluster names for multi-cluster federations  "cluster1,cluster2"
constraint   Node feature constraints                     "gpu|haswell"
exclude      Nodes to exclude from allocation             "node[01-05]"
partition    Partition for job submission                 "gpu"
prefer       Preferred node features (soft constraint)    "haswell"
qos          Quality of Service level                     "high"
reservation  Reservation name to use                      "maintenance"
comment      Job comment field                            "Production workflow"

For more information about these options, refer to the Slurm sbatch documentation.
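
How Fuzzball turns the options map into an sbatch command line is not described here. As an illustration only, the sketch below assumes each key maps to the sbatch long option of the same name and rejects keys outside the supported list. to_sbatch_flags is a hypothetical helper, not a Fuzzball command.

```shell
#!/bin/sh
# Hypothetical helper: turn key=value pairs into sbatch long options.
# Usage: to_sbatch_flags KEY=VALUE...
to_sbatch_flags() {
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    case $key in
      account|clusters|constraint|exclude|partition|prefer|qos|reservation|comment) ;;
      *) echo "unsupported option: $key" >&2; return 1 ;;
    esac
    # Print the leading dashes as data so printf does not parse them as an option.
    printf '%s%s=%s\n' '--' "$key" "$value"
  done
}

# Example: the first three options from the configuration above.
to_sbatch_flags partition=compute account=fuzzball-project qos=normal
```

Each flag is printed on its own line, so values that contain spaces (such as a job comment) stay intact when read line by line.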

Security Best Practices

Host Key Verification

Always verify SSH host keys to prevent man-in-the-middle attacks:

  1. Obtain the host public key:

    $ ssh-keyscan -t ecdsa slurm-head.example.com

  2. Include the output in your configuration:

    sshHostPublicKey: "slurm-head.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL..."

  3. Keep skipHostKeyVerification: false (the default)
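
Since the integration currently accepts only ECDSA keys in ssh-keyscan format, a quick format check on the line can catch a wrong key type before it reaches the configuration. A minimal sketch, assuming a hypothetical helper named is_ecdsa_host_key; it checks the shape of the line only, not the validity of the key.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a line looks like "host ecdsa-sha2-* base64key".
is_ecdsa_host_key() {
  # Split the line into whitespace-separated fields.
  set -- $1
  [ "$#" -eq 3 ] || return 1
  case $2 in
    ecdsa-sha2-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with live output (requires network access to the head node):
#   ssh-keyscan -t ecdsa slurm-head.example.com 2>/dev/null |
#     while read -r line; do
#       is_ecdsa_host_key "$line" && echo "$line"
#     done
```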

SSH Key Management

For production deployments:

  1. Generate a dedicated SSH key pair for the Fuzzball service:

    $ ssh-keygen -t ecdsa -C "fuzzball-service" -f fuzzball-slurm-key

  2. Add the public key to the Slurm head node:

    $ ssh-copy-id -i fuzzball-slurm-key.pub fuzzball-service@slurm-head.example.com

  3. Store the private key securely in the Kubernetes Secret used by the operator

  4. Use SSH key passphrases for additional security and store them separately

User Permissions

Create a dedicated Slurm user account with minimal necessary permissions:

  1. Job submission permissions: Ability to run sbatch
  2. Job monitoring permissions: Access to squeue and/or sacct
  3. Job control permissions: Ability to run scancel for jobs it owns
  4. Sudo permissions: Required for fuzzball-substrate to create isolated namespaces

Example /etc/sudoers.d/fuzzball-service configuration:

fuzzball-service ALL=(ALL) NOPASSWD: /usr/local/bin/fuzzball-substrate

Network Security

  1. Restrict SSH access to the Fuzzball Orchestrate IP address
  2. Configure firewall rules to allow bidirectional communication between Orchestrate and compute nodes
  3. Use VPN or private networks for communication between Fuzzball and Slurm clusters
  4. Enable SSH rate limiting to prevent brute force attacks

Validation

After applying your configuration, verify the connection:

  1. Check the Fuzzball Orchestrate logs:

    $ kubectl logs -n fuzzball-system deployment/fuzzball-orchestrator | grep -i slurm

  2. Verify SSH connectivity from within the Orchestrate pod:

    $ ssh fuzzball-service@slurm-head.example.com 'sbatch --version'

  3. Test Substrate availability on a Slurm compute node:

    $ which fuzzball-substrate
    $ sudo fuzzball-substrate --version

Complete Example of spec.orchestrator.provisioner.slurm section

Here’s a complete production-ready configuration:

apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      slurm:
        # Enable Slurm provisioner
        enabled: true

        # SSH connection details
        sshHost: "slurm-head.hpc.example.com"
        sshPort: 22
        username: "fuzzball-service"

        # SSH key authentication (recommended for production)
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "my-secure-passphrase"

        # Host key verification (required for security)
        sshHostPublicKey: "slurm-head.hpc.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL..."
        skipHostKeyVerification: false

        # Connection settings
        connectionTimeout: 60

        # Custom binary paths (if needed)
        binaryPath: "/opt/slurm/bin"
        sudoPath: "/usr/bin/sudo"

        # Default Slurm options
        options:
          partition: "fuzzball"
          account: "research"
          qos: "normal"
          constraint: "haswell|broadwell"
          comment: "Fuzzball orchestrated workflow"

Troubleshooting

SSH Connection Failures

Symptom: Orchestrate logs show “Failed to connect to SSH host”

Solutions:

  1. Verify SSH host is reachable: ping slurm-head.example.com
  2. Check SSH port: nc -zv slurm-head.example.com 22
  3. Test credentials manually: ssh fuzzball-service@slurm-head.example.com
  4. Review firewall rules between Orchestrate and Slurm head node

Authentication Errors

Symptom: “Permission denied” or “Authentication failed” errors

Solutions:

  1. For password auth: Verify password is correct and account is not locked
  2. For key auth: Check private key format and ensure public key is installed
  3. Verify SSH user exists and has proper permissions
  4. Check /var/log/secure or /var/log/auth.log on the Slurm head node

Host Key Verification Issues

Symptom: “Host key verification failed” errors

Solutions:

  1. Obtain correct host key: ssh-keyscan slurm-head.example.com
  2. Ensure host key matches configuration
  3. For testing only: set skipHostKeyVerification: true (not recommended for production)

Command Not Found

Symptom: “sbatch: command not found” or similar errors

Solutions:

  1. Check if Slurm is installed on head node: which sbatch
  2. Set binaryPath to the directory containing Slurm commands
  3. Verify SSH user’s $PATH includes Slurm binaries
  4. Check if SSH user’s shell initialization files are properly configured

For more troubleshooting guidance, see the Troubleshooting Guide.

PBS Configuration

Basic Configuration

The minimal configuration requires SSH connection details and authentication credentials:

apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        sshPort: 22
        username: "fuzzball-service"
        password: "secure-password"  # OR use SSH key (recommended)
        skipHostKeyVerification: false

Authentication Methods

Password Authentication

Password authentication is the simplest method but less secure for production use:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"

SSH Key Authentication

For production environments, use SSH key-based authentication:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshHostPublicKey: "10.0.0.5 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="

Double-check the format of the private key and the host public key. At present, the Fuzzball PBS integration supports only ECDSA keys, and the host public key must be entered in the format printed by the ssh-keyscan command (host, key type, base64 key).

SSH Key with Passphrase

If your SSH key is encrypted with a passphrase:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "key-passphrase"
        sshHostPublicKey: "10.0.0.5 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="

Configuration Parameters

Parameter                Type     Required  Default        Description
enabled                  boolean  Yes       false          Enable the PBS provisioner
sshHost                  string   Yes       -              Hostname or IP address of the PBS head node
sshPort                  integer  No        22             SSH port on the PBS head node
username                 string   Yes       -              SSH username for authentication
password                 string   No*       -              SSH password for password authentication
sshPrivateKeyPem         string   No*       -              SSH private key in PEM format
sshPrivateKeyPassPhrase  string   No        -              Passphrase for encrypted SSH private key
sshHostPublicKey         string   No**      -              SSH host public key for verification
skipHostKeyVerification  boolean  No        false          Skip SSH host key verification (not recommended)
connectionTimeout        integer  No        30             SSH connection timeout in seconds (1-120)
binaryPath               string   No        -              Custom path to PBS binaries (if not in $PATH)
sudoPath                 string   No        /usr/bin/sudo  Path to sudo binary on compute nodes
pbsServer                string   No        -              PBS server hostname (if different from SSH host)
validateSubstrate        boolean  No        false          Validate Substrate before job submission
defaultQueue             string   No        -              Default PBS queue for all jobs
defaultAccount           string   No        -              Default PBS account for all jobs
defaultResourceList      string   No        -              Default PBS resource list for all jobs
options                  map      No        {}             Additional PBS qsub options

* Either password or sshPrivateKeyPem must be provided for authentication

** Required unless skipHostKeyVerification is true

Advanced Configuration

Custom Binary Path

If PBS binaries are not in the default $PATH, specify their location:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        binaryPath: "/opt/pbs/bin"

This affects the following commands:

  • qsub: Job submission
  • qstat: Job status queries
  • qsig: Job signaling for graceful termination

Connection Timeout

Configure SSH connection timeout for reliability in high-latency environments:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        connectionTimeout: 60  # Wait up to 60 seconds for SSH connection

Valid range: 1-120 seconds. Default: 30 seconds.

Custom Sudo Path

If sudo is installed in a non-standard location on compute nodes:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        sudoPath: "/usr/local/bin/sudo"

This path is used when executing the Substrate binary on compute nodes.

PBS Server Configuration

If your PBS server hostname differs from the SSH host:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        pbsServer: "pbs-server.example.com"

Default Queue and Account

Set default queue and account for all PBS jobs:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        defaultQueue: "workq"
        defaultAccount: "research-group"

PBS Job Options

You can specify additional PBS options that will be applied to all submitted jobs:

spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        options:
          "-A": "research-account"     # Account to charge
          "-q": "high-priority"        # Queue name
          "-N": "fuzzball-job"         # Default job name prefix

Supported PBS Options

The following standard qsub options are supported:

Option  Description                       Example
-A      Account to charge for resources   "research-group"
-q      Queue name for job submission     "workq"
-l      Resource list specification       "nodes=1:ppn=4"
-N      Job name                          "my-job"
-o      Standard output file path         "/path/to/output.log"
-e      Standard error file path          "/path/to/error.log"

For more information about these options, refer to the PBS Professional or Torque documentation for your specific PBS implementation.
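
How Fuzzball passes the options map to qsub is not described here. As an illustration only, the sketch below assumes each key is used as a qsub flag followed by its value, and rejects flags outside the supported list. to_qsub_args is a hypothetical helper, not a Fuzzball or PBS command.

```shell
#!/bin/sh
# Hypothetical helper: turn FLAG=VALUE pairs into a qsub argument string.
# Usage: to_qsub_args FLAG=VALUE...
to_qsub_args() {
  args=""
  for pair in "$@"; do
    flag=${pair%%=*}    # text before the first '='
    value=${pair#*=}    # text after the first '='
    case $flag in
      -A|-q|-l|-N|-o|-e) args="$args $flag $value" ;;
      *) echo "unsupported option: $flag" >&2; return 1 ;;
    esac
  done
  # printf avoids echo treating a leading flag as its own option.
  printf '%s\n' "${args# }"
}

# Example: the options from the configuration above.
to_qsub_args -A=research-account -q=high-priority -N=fuzzball-job
```

The result is a single space-separated string, so this sketch is only suitable for values without spaces.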

Security Best Practices

Host Key Verification

Always verify SSH host keys to prevent man-in-the-middle attacks:

  1. Obtain the host public key:

    $ ssh-keyscan -t ecdsa pbs-head.example.com

  2. Include the output in your configuration:

    sshHostPublicKey: "pbs-head.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL..."

  3. Keep skipHostKeyVerification: false (the default)

SSH Key Management

For production deployments:

  1. Generate a dedicated SSH key pair for the Fuzzball service:

    $ ssh-keygen -t ecdsa -C "fuzzball-service" -f fuzzball-pbs-key

  2. Add the public key to the PBS head node:

    $ ssh-copy-id -i fuzzball-pbs-key.pub fuzzball-service@pbs-head.example.com

  3. Store the private key securely in the Kubernetes Secret used by the operator

  4. Use SSH key passphrases for additional security and store them separately

User Permissions

Create a dedicated PBS user account with minimal necessary permissions:

  1. Job submission permissions: Ability to run qsub
  2. Job monitoring permissions: Access to qstat
  3. Job control permissions: Ability to run qsig for jobs it owns
  4. Sudo permissions: Required for fuzzball-substrate to create isolated namespaces

Example /etc/sudoers.d/fuzzball-service configuration:

fuzzball-service ALL=(ALL) NOPASSWD: /usr/local/bin/fuzzball-substrate

Network Security

  1. Restrict SSH access to the Fuzzball Orchestrate IP address
  2. Configure firewall rules to allow bidirectional communication between Orchestrate and compute nodes
  3. Use VPN or private networks for communication between Fuzzball and PBS clusters
  4. Enable SSH rate limiting to prevent brute force attacks

Validation

After applying your configuration, verify the connection:

  1. Check the Fuzzball Orchestrate logs:

    $ kubectl logs -n fuzzball-system deployment/fuzzball-orchestrator | grep -i pbs

  2. Verify SSH connectivity from within the Orchestrate pod:

    $ ssh fuzzball-service@pbs-head.example.com 'qstat --version'

  3. Test Substrate availability on a PBS compute node:

    $ which fuzzball-substrate
    $ sudo fuzzball-substrate --version

Complete Example of spec.orchestrator.provisioner.pbs section

Here’s a complete production-ready configuration:

apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      pbs:
        # Enable PBS provisioner
        enabled: true

        # SSH connection details
        sshHost: "pbs-head.hpc.example.com"
        sshPort: 22
        username: "fuzzball-service"

        # SSH key authentication (recommended for production)
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "my-secure-passphrase"

        # Host key verification (required for security)
        sshHostPublicKey: "pbs-head.hpc.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL..."
        skipHostKeyVerification: false

        # Connection settings
        connectionTimeout: 60

        # Custom binary paths (if needed)
        binaryPath: "/opt/pbs/bin"
        sudoPath: "/usr/bin/sudo"

        # PBS server (if different from SSH host)
        pbsServer: "pbs-server.hpc.example.com"

        # Default settings
        defaultQueue: "fuzzball"
        defaultAccount: "research"

        # Additional PBS options
        options:
          "-A": "research-account"

Troubleshooting

SSH Connection Failures

Symptom: Orchestrate logs show “Failed to connect to SSH host”

Solutions:

  1. Verify SSH host is reachable: ping pbs-head.example.com
  2. Check SSH port: nc -zv pbs-head.example.com 22
  3. Test credentials manually: ssh fuzzball-service@pbs-head.example.com
  4. Review firewall rules between Orchestrate and PBS head node

Authentication Errors

Symptom: “Permission denied” or “Authentication failed” errors

Solutions:

  1. For password auth: Verify password is correct and account is not locked
  2. For key auth: Check private key format and ensure public key is installed
  3. Verify SSH user exists and has proper permissions
  4. Check /var/log/secure or /var/log/auth.log on the PBS head node

Host Key Verification Issues

Symptom: “Host key verification failed” errors

Solutions:

  1. Obtain correct host key: ssh-keyscan pbs-head.example.com
  2. Ensure host key matches configuration
  3. For testing only: set skipHostKeyVerification: true (not recommended for production)

Command Not Found

Symptom: “qsub: command not found” or similar errors

Solutions:

  1. Check if PBS is installed on head node: which qsub
  2. Set binaryPath to the directory containing PBS commands
  3. Verify SSH user’s $PATH includes PBS binaries
  4. Check if SSH user’s shell initialization files are properly configured

For more troubleshooting guidance, see the Troubleshooting Guide.