Host Networking for Multinode Services
Multinode services can opt in to host networking by setting the network.host field to true.
Host networking lets distributed workloads bypass network namespace isolation and run directly on the host's network stack.
This option is useful for applications that require low-latency, high-bandwidth communication between nodes or direct access to high-speed interconnects like InfiniBand or RoCE.
Enable host networking (network.host: true) when your multinode service:
- Requires high-speed interconnects: Your application needs direct access to InfiniBand, RoCE, or other RDMA-capable networks
- Needs low-latency communication: Network namespace isolation overhead is unacceptable for your workload
- Has compatibility requirements: Your distributed framework requires host network access (e.g., some MPI implementations)
Avoid host networking when:
- Your application works fine with standard container networking
- You want stronger network isolation for security
- Your service doesn’t require special network access
Host networking means the service shares the node’s network namespace. This can expose additional network interfaces and may have security implications. Use it only when necessary for performance or compatibility.
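Before opting in, it can help to confirm that a node actually has an RDMA-capable device. On Linux, devices registered with the kernel's RDMA subsystem appear under /sys/class/infiniband. The sketch below lists them. The helper name `list_rdma_devices` is hypothetical, and whether that sysfs path is visible from inside a service container is an assumption that depends on the runtime.

```shell
#!/bin/sh
# list_rdma_devices [SYSFS_DIR]   (hypothetical helper, for illustration only)
# Print the name of each entry under SYSFS_DIR (default: /sys/class/infiniband).
# Prints nothing and returns 1 when the directory is missing or empty.
list_rdma_devices() {
    dir=${1:-/sys/class/infiniband}
    found=1
    for d in "$dir"/*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$d" ] || continue
        basename "$d"
        found=0
    done
    return "$found"
}

if list_rdma_devices; then
    echo "RDMA devices found"
else
    echo "no RDMA devices visible"
fi
```

If no device is listed, host networking is unlikely to help with interconnect access on that node, although it may still be needed for framework compatibility.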
Add the host field under the network section of your multinode service:
services:
  distributed-inference:
    image:
      uri: docker://myregistry/mpi-inference:latest
    script: |
      #!/bin/bash
      python -m inference_server --model /models/llm
    multinode:
      nodes: 4
      implementation: ompi
    resource:
      cpu:
        cores: 16
        affinity: NUMA
      memory:
        size: 64GB
      devices:
        nvidia.com/gpu: 2
    network:
      host: true # Enable host networking
      ports:
        - name: http
          port: 8080
      endpoints:
        - name: inference
          port-name: http
          protocol: https
          type: subdomain
          scope: group
- network.host: Boolean field to enable host networking (default: false)
- multinode.nodes: Number of compute nodes to allocate
- multinode.implementation: Coordination framework (ompi, mpich, gasnet, generic)
- resource: Specifies resources per node (total resources = nodes × per-node resources)
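As a worked example of the per-node rule, the sketch below multiplies the per-node values from the 4-node configuration above by the node count. The function name `total_resources` is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# total_resources NODES PER_NODE   (hypothetical helper, for illustration only)
# Multiply a per-node quantity by the node count to get the service-wide total.
total_resources() {
    nodes=$1
    per_node=$2
    echo $((nodes * per_node))
}

# Per-node values from the 4-node example: 16 cores, 64 GB memory, 2 GPUs.
echo "cpu cores: $(total_resources 4 16)"
echo "memory GB: $(total_resources 4 64)"
echo "gpus:      $(total_resources 4 2)"
```

For that configuration the service as a whole requests 64 CPU cores, 256 GB of memory, and 8 GPUs.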
In a multinode service:
- Rank 0 is the first node (node 0) and serves as the primary endpoint
- Ranks 1, 2, 3, … are worker nodes that participate in distributed computation
- All network ports and endpoints are exposed only on rank 0
- Clients connect to rank 0, which coordinates with other ranks internally
For example, in a 4-node distributed inference service:
- Rank 0 runs the API server and accepts HTTP requests (rank 0 can also be a worker)
- Ranks 1-3 run workers that process inference requests coordinated by rank 0
- Clients send requests to rank 0’s HTTPS endpoint
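One way a script can tell which role it is playing is to look up its own position in the host list. The sketch below assumes that `MULTINODE_HOSTLIST_NOSLOTS` (the variable used by the vLLM example on this page) holds a comma-separated list of hostnames and that list order matches rank order. The second point is an assumption, not something this page states. `rank_of` is a hypothetical helper.

```shell
#!/bin/sh
# rank_of HOST HOSTLIST   (hypothetical helper, for illustration only)
# Print the zero-based position of HOST in the comma-separated HOSTLIST.
# Returns non-zero if HOST is not in the list.
rank_of() {
    target=$1
    i=0
    old_ifs=$IFS
    IFS=','
    for h in $2; do
        if [ "$h" = "$target" ]; then
            IFS=$old_ifs
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    IFS=$old_ifs
    return 1
}

# Treat position 0 as the API node and every other position as a worker.
# Outside a multinode service the variable is unset, so fall back to this host.
rank=$(rank_of "$(hostname)" "${MULTINODE_HOSTLIST_NOSLOTS:-$(hostname)}")
if [ "$rank" -eq 0 ]; then
    echo "rank 0: start the API server"
else
    echo "rank $rank: start a worker"
fi
```

Only the branch taken on rank 0 needs to bind the ports declared under network.ports, since those are exposed on rank 0 alone.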
version: v1
volumes:
  models:
    reference: volume://user/persistent/vllm-cache
services:
  vllm:
    cwd: /home/vllm
    env:
      - MULTINODE_WRAPPER_FORWARD_STREAMS=1
    image:
      uri: >-
        docker://public.ecr.aws/deep-learning-containers/vllm:0.20.0-gpu-py312-cu130-ubuntu22.04-ec2-v1.1-2026-04-29-18-08-36-soci
    mounts:
      models: # must match a name declared under volumes
        location: /home/vllm
    script: |
      #!/bin/sh
      HOSTNAME=$(hostname)
      # Rank 0: start the API server in the background on this node.
      vllm serve "openai/gpt-oss-20b" \
        --dtype auto \
        --tool-call-parser openai \
        --reasoning-parser openai_gptoss \
        --enable-auto-tool-choice \
        --tensor-parallel-size 2 \
        --nnodes 2 --node-rank 0 \
        --master-addr ${MULTINODE_NODE_IP} \
        --gpu-memory-utilization 0.9 \
        --kv-cache-dtype auto \
        --max-num-batched-tokens 2048 \
        --max-model-len 131072 \
        --host 0.0.0.0 \
        --port 8080 &
      pid=$!
      # Start a headless worker on every other host in the comma-separated list.
      IFS=','
      for HOST in ${MULTINODE_HOSTLIST_NOSLOTS}; do
        if [ "$HOSTNAME" != "$HOST" ]; then
          $MULTINODE_SSH_WRAPPER $HOST vllm serve "openai/gpt-oss-20b" \
            --dtype auto \
            --tool-call-parser openai \
            --reasoning-parser openai_gptoss \
            --enable-auto-tool-choice \
            --tensor-parallel-size 2 \
            --nnodes 2 --node-rank 1 \
            --master-addr ${MULTINODE_NODE_IP} --headless \
            --gpu-memory-utilization 0.9 \
            --kv-cache-dtype auto \
            --max-num-batched-tokens 2048 \
            --max-model-len 131072 \
            --host 0.0.0.0 \
            --port 8080
        fi
      done
      wait $pid
    resource:
      cpu:
        cores: 16
        affinity: NUMA
      memory:
        size: 120GB
      devices:
        nvidia.com/gpu: 1
      annotations:
        nvidia.com/gpu.model: NVIDIA L4
    multinode:
      nodes: 2
      implementation: generic
    network:
      host: true # Enable host networking for high-speed interconnects
      ports:
        - name: openai-api
          port: 8080
          protocol: tcp
      endpoints:
        - name: openai-vllm
          type: subdomain
          scope: public
          protocol: http
          port-name: openai-api
    readiness-probe:
      tcp-socket:
        port: 8080
      period-seconds: 30
      failure-threshold: 60
      success-threshold: 1
      initial-delay-seconds: 60
    persist: true
This example demonstrates:
- A 2-node multinode service with 1 GPU per node (2 GPUs total)
- Host networking enabled for high-speed node-to-node communication
- Rank 0 serves the vLLM API on port 8080
- An HTTP subdomain endpoint (openai-vllm) exposed with public scope
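The script in this example passes a fixed --node-rank 1 to every remote host, which is only correct for two nodes. For a larger service, each worker would need its own rank. The following sketch shows one way to derive those ranks from the host list. It assumes the local host is rank 0 and that remote hosts can be numbered in list order, and `worker_ranks` is a hypothetical helper rather than part of the platform.

```shell
#!/bin/sh
# worker_ranks LOCAL_HOST HOSTLIST   (hypothetical helper, for illustration only)
# Print "HOST RANK" for every host other than LOCAL_HOST, numbering the
# remote hosts 1, 2, 3, ... in list order (LOCAL_HOST is taken to be rank 0).
worker_ranks() {
    local_host=$1
    rank=1
    old_ifs=$IFS
    IFS=','
    for h in $2; do
        if [ "$h" != "$local_host" ]; then
            echo "$h $rank"
            rank=$((rank + 1))
        fi
    done
    IFS=$old_ifs
}

# Dry run: show the rank each remote host would receive.
worker_ranks "$(hostname)" "${MULTINODE_HOSTLIST_NOSLOTS:-}" |
while read -r host rank; do
    echo "would launch on $host with --node-rank $rank"
done
```

In a real script the echo would be replaced by the `$MULTINODE_SSH_WRAPPER` call from the example, with --nnodes set to the node count. With more than one worker, each launch would also need to run in the background so the loop does not block on the first one.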
Related documentation:
- Distributed Workflows with MPI for MPI job configuration
- Distributed Workflows with Generic Multinode for custom distributed workloads
- Workflow Syntax Guide for complete YAML reference