Orchestrate Configuration
The Orchestrate configuration defines how Fuzzball connects to your Slurm or PBS cluster. It is specified in the FuzzballOrchestrate custom resource (defined by a Custom Resource Definition, or CRD) and controls SSH authentication, connection parameters, and options specific to the batch scheduling system in use.
Slurm Configuration
The minimal configuration requires SSH connection details and authentication credentials:

```yaml
apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        sshPort: 22
        username: "fuzzball-service"
        password: "secure-password" # OR use SSH key (recommended)
        skipHostKeyVerification: false
```
Password authentication is the simplest method but less secure for production use:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
```
For production environments, use SSH key-based authentication:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshHostPublicKey: "10.0.0.4 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
```
Double-check the private key format. At present, the Fuzzball Slurm integration supports only ECDSA keys, and the `sshHostPublicKey` value must match the format output by the `ssh-keyscan` command (for example, `ssh-keyscan -t ecdsa slurm-head.example.com`).
If your SSH key is encrypted with a passphrase:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "key-passphrase"
        sshHostPublicKey: "10.0.0.4 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | boolean | Yes | `false` | Enable the Slurm provisioner |
| `sshHost` | string | Yes | - | Hostname or IP address of the Slurm head node |
| `sshPort` | integer | No | `22` | SSH port on the Slurm head node |
| `username` | string | Yes | - | SSH username for authentication |
| `password` | string | No\* | - | SSH password for password authentication |
| `sshPrivateKeyPem` | string | No\* | - | SSH private key in PEM format |
| `sshPrivateKeyPassPhrase` | string | No | - | Passphrase for encrypted SSH private key |
| `sshHostPublicKey` | string | No\*\* | - | SSH host public key for verification |
| `skipHostKeyVerification` | boolean | No | `false` | Skip SSH host key verification (not recommended) |
| `connectionTimeout` | integer | No | `30` | SSH connection timeout in seconds (1-120) |
| `binaryPath` | string | No | - | Custom path to Slurm binaries (if not in `$PATH`) |
| `sudoPath` | string | No | `/usr/bin/sudo` | Path to sudo binary on compute nodes |
| `options` | map | No | `{}` | Additional Slurm `sbatch` options |

\* Either `password` or `sshPrivateKeyPem` must be provided for authentication.
\*\* Required unless `skipHostKeyVerification` is `true`.
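The footnoted requirements are easy to miss when hand-writing YAML. As an illustrative sketch (not part of Fuzzball; the function name and structure are assumptions), the constraints from the table can be expressed as a small pre-flight check:

```python
# Illustrative pre-flight check for a `slurm:` provisioner mapping.
# Encodes the Required/Default rules from the parameter table; not Fuzzball code.
def validate_slurm_config(cfg: dict) -> list:
    """Return a list of problems found in a slurm provisioner mapping."""
    problems = []
    if not cfg.get("sshHost"):
        problems.append("sshHost is required")
    if not cfg.get("username"):
        problems.append("username is required")
    # Footnote *: either password or sshPrivateKeyPem must be provided
    if not cfg.get("password") and not cfg.get("sshPrivateKeyPem"):
        problems.append("either password or sshPrivateKeyPem is required")
    # Footnote **: sshHostPublicKey is required unless verification is skipped
    if not cfg.get("skipHostKeyVerification") and not cfg.get("sshHostPublicKey"):
        problems.append("sshHostPublicKey is required when host key verification is enabled")
    # connectionTimeout must stay within the documented 1-120 second range
    timeout = cfg.get("connectionTimeout", 30)
    if not 1 <= timeout <= 120:
        problems.append("connectionTimeout must be between 1 and 120 seconds")
    return problems
```

Running a check like this before `kubectl apply` catches configurations that would otherwise fail only at connection time.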
If Slurm binaries are not in the default $PATH, specify their location:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        binaryPath: "/opt/slurm/bin"
```
This affects the following commands:
- `sbatch`: Job submission
- `squeue`: Job status queries (if `sacct` is unavailable)
- `scancel`: Job cancellation
- `sacct`: Job status and accounting (if available)
Configure SSH connection timeout for reliability in high-latency environments:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        connectionTimeout: 60 # Wait up to 60 seconds for SSH connection
```
Valid range: 1-120 seconds. Default: 30 seconds.
If sudo is installed in a non-standard location on compute nodes:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        sudoPath: "/usr/local/bin/sudo"
```
This path is used when executing the Substrate binary on compute nodes.
You can specify additional Slurm options that will be applied to all submitted jobs:
```yaml
spec:
  orchestrator:
    provisioner:
      slurm:
        enabled: true
        sshHost: "slurm-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        options:
          partition: "compute"         # Default partition for all jobs
          account: "fuzzball-project"  # Account to charge
          qos: "normal"                # Quality of Service
          constraint: "haswell"        # Node constraints
          comment: "Fuzzball workflow" # Job comment
```
The following standard `sbatch` options are supported:

| Option | Description | Example |
|---|---|---|
| `account` | Account to charge for resources | `"research-group"` |
| `clusters` | Cluster names for multi-cluster federations | `"cluster1,cluster2"` |
| `constraint` | Node feature constraints | `"gpu\|haswell"` |
| `exclude` | Nodes to exclude from allocation | `"node[01-05]"` |
| `partition` | Partition for job submission | `"gpu"` |
| `prefer` | Preferred nodes for allocation | `"node[10-20]"` |
| `qos` | Quality of Service level | `"high"` |
| `reservation` | Reservation name to use | `"maintenance"` |
| `comment` | Job comment field | `"Production workflow"` |
For more information about these options, refer to the Slurm sbatch documentation.
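How Fuzzball assembles the actual submission command is internal to the provisioner, but each key in the `options` map corresponds to a long-form `sbatch` flag, so the mapping looks roughly like the following sketch (the function name and quoting are illustrative assumptions, not Fuzzball source):

```python
# Illustrative only: render an `options:` map as long-form sbatch flags.
import shlex

def sbatch_args(options: dict) -> str:
    """Build a shell-safe sbatch command line from an options map."""
    parts = ["sbatch"]
    for key, value in sorted(options.items()):
        parts.append(f"--{key}={value}")
    # shlex.quote protects values containing spaces (e.g. a job comment)
    return " ".join(shlex.quote(p) for p in parts)

print(sbatch_args({"partition": "compute", "account": "fuzzball-project"}))
# sbatch --account=fuzzball-project --partition=compute
```

Note that a value with whitespace, such as `comment: "Fuzzball workflow"`, must survive the shell intact; that is why real submission paths quote each argument.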
Always verify SSH host keys to prevent man-in-the-middle attacks:

1. Obtain the host public key:

   ```shell
   $ ssh-keyscan -t ecdsa slurm-head.example.com
   ```

2. Include the output in your configuration:

   ```yaml
   sshHostPublicKey: "slurm-head.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
   ```

3. Keep `skipHostKeyVerification: false` (the default).
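A `sshHostPublicKey` entry should look like a single line of `ssh-keyscan` output: `<host> <key-type> <base64-key>`. A minimal format sanity check (an illustrative sketch, not Fuzzball code) can catch a pasted-in value that was truncated or mangled:

```python
# Illustrative check that a host key entry looks like one ssh-keyscan line.
import base64

def looks_like_keyscan_line(entry: str) -> bool:
    fields = entry.split()
    if len(fields) != 3:
        return False  # expected "<host> <key-type> <base64-key>"
    host, key_type, key_b64 = fields
    if not key_type.startswith(("ecdsa-", "ssh-")):
        return False  # not a recognizable SSH key type
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(key_b64, validate=True)
    except Exception:
        return False
    return True
```

This does not prove the key is correct for the host, only that the string is shaped like keyscan output; the real verification happens during the SSH handshake.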
For production deployments:

1. Generate a dedicated SSH key pair for the Fuzzball service (`-m PEM` writes the private key in the PEM format that `sshPrivateKeyPem` expects):

   ```shell
   $ ssh-keygen -t ecdsa -m PEM -C "fuzzball-service" -f fuzzball-slurm-key
   ```

2. Add the public key to the Slurm head node:

   ```shell
   $ ssh-copy-id -i fuzzball-slurm-key.pub fuzzball-service@slurm-head.example.com
   ```

3. Store the private key securely in the Kubernetes Secret used by the operator.
4. Use SSH key passphrases for additional security and store them separately.
Create a dedicated Slurm user account with minimal necessary permissions:
- Job submission permissions: Ability to run `sbatch`
- Job monitoring permissions: Access to `squeue` and/or `sacct`
- Job control permissions: Ability to run `scancel` for jobs it owns
- Sudo permissions: Required for `fuzzball-substrate` to create isolated namespaces
Example `/etc/sudoers.d/fuzzball-service` configuration:

```
fuzzball-service ALL=(ALL) NOPASSWD: /usr/local/bin/fuzzball-substrate
```

- Restrict SSH access to the Fuzzball Orchestrate IP address
- Configure firewall rules to allow bidirectional communication between Orchestrate and compute nodes
- Use VPN or private networks for communication between Fuzzball and Slurm clusters
- Enable SSH rate limiting to prevent brute-force attacks
After applying your configuration, verify the connection:

1. Check the Fuzzball Orchestrate logs:

   ```shell
   $ kubectl logs -n fuzzball-system deployment/fuzzball-orchestrator | grep -i slurm
   ```

2. Verify SSH connectivity from within the Orchestrate pod:

   ```shell
   $ ssh fuzzball-service@slurm-head.example.com 'sbatch --version'
   ```

3. Test Substrate availability on a Slurm compute node:

   ```shell
   $ which fuzzball-substrate
   $ sudo fuzzball-substrate --version
   ```
Here’s a complete production-ready configuration:
```yaml
apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      slurm:
        # Enable Slurm provisioner
        enabled: true

        # SSH connection details
        sshHost: "slurm-head.hpc.example.com"
        sshPort: 22
        username: "fuzzball-service"

        # SSH key authentication (recommended for production)
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "my-secure-passphrase"

        # Host key verification (required for security)
        sshHostPublicKey: "slurm-head.hpc.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
        skipHostKeyVerification: false

        # Connection settings
        connectionTimeout: 60

        # Custom binary paths (if needed)
        binaryPath: "/opt/slurm/bin"
        sudoPath: "/usr/bin/sudo"

        # Default Slurm options
        options:
          partition: "fuzzball"
          account: "research"
          qos: "normal"
          constraint: "haswell|broadwell"
          comment: "Fuzzball orchestrated workflow"
```
Symptom: Orchestrate logs show "Failed to connect to SSH host"

Solutions:
- Verify the SSH host is reachable: `ping slurm-head.example.com`
- Check the SSH port: `nc -zv slurm-head.example.com 22`
- Test credentials manually: `ssh fuzzball-service@slurm-head.example.com`
- Review firewall rules between Orchestrate and the Slurm head node

Symptom: "Permission denied" or "Authentication failed" errors

Solutions:
- For password auth: Verify the password is correct and the account is not locked
- For key auth: Check the private key format and ensure the public key is installed
- Verify the SSH user exists and has proper permissions
- Check `/var/log/secure` or `/var/log/auth.log` on the Slurm head node

Symptom: "Host key verification failed" errors

Solutions:
- Obtain the correct host key: `ssh-keyscan slurm-head.example.com`
- Ensure the host key matches the configuration
- For testing only: set `skipHostKeyVerification: true` (not recommended for production)

Symptom: "sbatch: command not found" or similar errors

Solutions:
- Check if Slurm is installed on the head node: `which sbatch`
- Set `binaryPath` to the directory containing the Slurm commands
- Verify the SSH user's `$PATH` includes the Slurm binaries
- Check that the SSH user's shell initialization files are properly configured
For more troubleshooting guidance, see the Troubleshooting Guide.
PBS Configuration
The minimal configuration requires SSH connection details and authentication credentials:

```yaml
apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        sshPort: 22
        username: "fuzzball-service"
        password: "secure-password" # OR use SSH key (recommended)
        skipHostKeyVerification: false
```
Password authentication is the simplest method but less secure for production use:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
```
For production environments, use SSH key-based authentication:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshHostPublicKey: "10.0.0.5 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
```
Double-check the private key format. At present, the Fuzzball PBS integration supports only ECDSA keys, and the `sshHostPublicKey` value must match the format output by the `ssh-keyscan` command (for example, `ssh-keyscan -t ecdsa pbs-head.example.com`).
If your SSH key is encrypted with a passphrase:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "key-passphrase"
        sshHostPublicKey: "10.0.0.5 ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | boolean | Yes | `false` | Enable the PBS provisioner |
| `sshHost` | string | Yes | - | Hostname or IP address of the PBS head node |
| `sshPort` | integer | No | `22` | SSH port on the PBS head node |
| `username` | string | Yes | - | SSH username for authentication |
| `password` | string | No\* | - | SSH password for password authentication |
| `sshPrivateKeyPem` | string | No\* | - | SSH private key in PEM format |
| `sshPrivateKeyPassPhrase` | string | No | - | Passphrase for encrypted SSH private key |
| `sshHostPublicKey` | string | No\*\* | - | SSH host public key for verification |
| `skipHostKeyVerification` | boolean | No | `false` | Skip SSH host key verification (not recommended) |
| `connectionTimeout` | integer | No | `30` | SSH connection timeout in seconds (1-120) |
| `binaryPath` | string | No | - | Custom path to PBS binaries (if not in `$PATH`) |
| `sudoPath` | string | No | `/usr/bin/sudo` | Path to sudo binary on compute nodes |
| `pbsServer` | string | No | - | PBS server hostname (if different from SSH host) |
| `validateSubstrate` | boolean | No | `false` | Validate Substrate before job submission |
| `defaultQueue` | string | No | - | Default PBS queue for all jobs |
| `defaultAccount` | string | No | - | Default PBS account for all jobs |
| `defaultResourceList` | string | No | - | Default PBS resource list for all jobs |
| `options` | map | No | `{}` | Additional PBS `qsub` options |

\* Either `password` or `sshPrivateKeyPem` must be provided for authentication.
\*\* Required unless `skipHostKeyVerification` is `true`.
If PBS binaries are not in the default $PATH, specify their location:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        binaryPath: "/opt/pbs/bin"
```
This affects the following commands:
- `qsub`: Job submission
- `qstat`: Job status queries
- `qsig`: Job signaling for graceful termination
Configure SSH connection timeout for reliability in high-latency environments:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        connectionTimeout: 60 # Wait up to 60 seconds for SSH connection
```
Valid range: 1-120 seconds. Default: 30 seconds.
If sudo is installed in a non-standard location on compute nodes:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        sudoPath: "/usr/local/bin/sudo"
```
This path is used when executing the Substrate binary on compute nodes.
If your PBS server hostname differs from the SSH host:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        pbsServer: "pbs-server.example.com"
```
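The parameter description implies that when `pbsServer` is unset, the SSH host doubles as the PBS server. That fallback, stated here as an assumption drawn from the parameter table rather than from Fuzzball source, amounts to:

```python
# Assumed fallback behavior: use pbsServer when set, otherwise the SSH host.
def effective_pbs_server(cfg: dict) -> str:
    return cfg.get("pbsServer") or cfg["sshHost"]
```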
Set default queue and account for all PBS jobs:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        defaultQueue: "workq"
        defaultAccount: "research-group"
```
You can specify additional PBS options that will be applied to all submitted jobs:
```yaml
spec:
  orchestrator:
    provisioner:
      pbs:
        enabled: true
        sshHost: "pbs-head.example.com"
        username: "fuzzball-service"
        password: "secure-password"
        options:
          "-A": "research-account" # Account to charge
          "-q": "high-priority"    # Queue name
          "-N": "fuzzball-job"     # Default job name prefix
```
The following standard `qsub` options are supported:

| Option | Description | Example |
|---|---|---|
| `-A` | Account to charge for resources | `"research-group"` |
| `-q` | Queue name for job submission | `"workq"` |
| `-l` | Resource list specification | `"nodes=1:ppn=4"` |
| `-N` | Job name | `"my-job"` |
| `-o` | Standard output file path | `"/path/to/output.log"` |
| `-e` | Standard error file path | `"/path/to/error.log"` |
For more information about these options, refer to the PBS Professional or Torque documentation for your specific PBS implementation.
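As with Slurm, the real `qsub` invocation is assembled inside the provisioner. The sketch below only illustrates how the `options` map and the `defaultQueue`/`defaultAccount` parameters plausibly combine into flags; treating the defaults as `-q`/`-A` when the options map does not already set them is an assumption, not documented behavior:

```python
# Illustrative only: combine PBS defaults and options into qsub flags.
import shlex

def qsub_args(cfg: dict) -> str:
    """Build a shell-safe qsub command line from a pbs provisioner mapping."""
    options = dict(cfg.get("options", {}))
    # Assumption: defaults apply only when not overridden in options
    if cfg.get("defaultQueue") and "-q" not in options:
        options["-q"] = cfg["defaultQueue"]
    if cfg.get("defaultAccount") and "-A" not in options:
        options["-A"] = cfg["defaultAccount"]
    parts = ["qsub"]
    for flag, value in sorted(options.items()):
        parts += [flag, str(value)]
    return " ".join(shlex.quote(p) for p in parts)
```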
Always verify SSH host keys to prevent man-in-the-middle attacks:

1. Obtain the host public key:

   ```shell
   $ ssh-keyscan -t ecdsa pbs-head.example.com
   ```

2. Include the output in your configuration:

   ```yaml
   sshHostPublicKey: "pbs-head.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
   ```

3. Keep `skipHostKeyVerification: false` (the default).
For production deployments:

1. Generate a dedicated SSH key pair for the Fuzzball service (`-m PEM` writes the private key in the PEM format that `sshPrivateKeyPem` expects):

   ```shell
   $ ssh-keygen -t ecdsa -m PEM -C "fuzzball-service" -f fuzzball-pbs-key
   ```

2. Add the public key to the PBS head node:

   ```shell
   $ ssh-copy-id -i fuzzball-pbs-key.pub fuzzball-service@pbs-head.example.com
   ```

3. Store the private key securely in the Kubernetes Secret used by the operator.
4. Use SSH key passphrases for additional security and store them separately.
Create a dedicated PBS user account with minimal necessary permissions:
- Job submission permissions: Ability to run `qsub`
- Job monitoring permissions: Access to `qstat`
- Job control permissions: Ability to run `qsig` for jobs it owns
- Sudo permissions: Required for `fuzzball-substrate` to create isolated namespaces
Example `/etc/sudoers.d/fuzzball-service` configuration:

```
fuzzball-service ALL=(ALL) NOPASSWD: /usr/local/bin/fuzzball-substrate
```

- Restrict SSH access to the Fuzzball Orchestrate IP address
- Configure firewall rules to allow bidirectional communication between Orchestrate and compute nodes
- Use VPN or private networks for communication between Fuzzball and PBS clusters
- Enable SSH rate limiting to prevent brute-force attacks
After applying your configuration, verify the connection:

1. Check the Fuzzball Orchestrate logs:

   ```shell
   $ kubectl logs -n fuzzball-system deployment/fuzzball-orchestrator | grep -i pbs
   ```

2. Verify SSH connectivity from within the Orchestrate pod:

   ```shell
   $ ssh fuzzball-service@pbs-head.example.com 'qstat --version'
   ```

3. Test Substrate availability on a PBS compute node:

   ```shell
   $ which fuzzball-substrate
   $ sudo fuzzball-substrate --version
   ```
Here’s a complete production-ready configuration:
```yaml
apiVersion: deployment.ciq.com/v1alpha1
kind: FuzzballOrchestrate
metadata:
  name: fuzzball
  namespace: fuzzball-system
spec:
  orchestrator:
    enabled: true
    provisioner:
      enabled: true
      pbs:
        # Enable PBS provisioner
        enabled: true

        # SSH connection details
        sshHost: "pbs-head.hpc.example.com"
        sshPort: 22
        username: "fuzzball-service"

        # SSH key authentication (recommended for production)
        sshPrivateKeyPem: |
          <full private key in PEM format>
        sshPrivateKeyPassPhrase: "my-secure-passphrase"

        # Host key verification (required for security)
        sshHostPublicKey: "pbs-head.hpc.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhL...full-public-key="
        skipHostKeyVerification: false

        # Connection settings
        connectionTimeout: 60

        # Custom binary paths (if needed)
        binaryPath: "/opt/pbs/bin"
        sudoPath: "/usr/bin/sudo"

        # PBS server (if different from SSH host)
        pbsServer: "pbs-server.hpc.example.com"

        # Default settings
        defaultQueue: "fuzzball"
        defaultAccount: "research"

        # Additional PBS options
        options:
          "-A": "research-account"
```
Symptom: Orchestrate logs show "Failed to connect to SSH host"

Solutions:
- Verify the SSH host is reachable: `ping pbs-head.example.com`
- Check the SSH port: `nc -zv pbs-head.example.com 22`
- Test credentials manually: `ssh fuzzball-service@pbs-head.example.com`
- Review firewall rules between Orchestrate and the PBS head node

Symptom: "Permission denied" or "Authentication failed" errors

Solutions:
- For password auth: Verify the password is correct and the account is not locked
- For key auth: Check the private key format and ensure the public key is installed
- Verify the SSH user exists and has proper permissions
- Check `/var/log/secure` or `/var/log/auth.log` on the PBS head node

Symptom: "Host key verification failed" errors

Solutions:
- Obtain the correct host key: `ssh-keyscan pbs-head.example.com`
- Ensure the host key matches the configuration
- For testing only: set `skipHostKeyVerification: true` (not recommended for production)

Symptom: "qsub: command not found" or similar errors

Solutions:
- Check if PBS is installed on the head node: `which qsub`
- Set `binaryPath` to the directory containing the PBS commands
- Verify the SSH user's `$PATH` includes the PBS binaries
- Check that the SSH user's shell initialization files are properly configured
For more troubleshooting guidance, see the Troubleshooting Guide.