Locust loadgen

Locust loadgen is a chaos fault that runs a Locust load-test script against HOST from REPLICA helper pods inside the chaos infrastructure cluster for TOTAL_CHAOS_DURATION seconds, then stops. Each runner spawns USERS virtual users at SPAWN_RATE users per second. The Locust task file is supplied via a ConfigMap mounted at CONFIG_MAP_FILE (default /tmp/config.py), so the load profile is fully driven by the script.

Use this fault to test how a target workload behaves under sustained Python-driven synthetic load: whether application latency stays inside the SLA, whether autoscaling kicks in, whether circuit breakers and rate limiters work as expected, and whether monitoring detects the saturation within the alerting SLA.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.


Use cases

Run this fault when you want to answer concrete questions like:

  • API latency under load: When USERS virtual users hit HOST at SPAWN_RATE users/second, does p95/p99 stay inside the SLA?
  • Autoscaling fidelity: Does HPA, KEDA, or a custom autoscaler add capacity inside the alerting SLA?
  • Rate limiting and quotas: Does the rate limiter return 429s correctly under sustained burst traffic without leaking errors downstream?
  • OAuth/credentialed APIs: Use GRANT_TYPE=client_credentials (or another grant) in the Locust script to test authenticated endpoints under load.
  • Distributed load: Drive load from REPLICA runner pods (optionally pinned to the nodes listed in NODE_NAMES) to spread egress and avoid hitting per-pod limits.

Prerequisites

  • Kubernetes version: 1.21 or later for the cluster running the chaos infrastructure.
  • Locust task file: A ConfigMap in the chaos infrastructure namespace that contains the Locust task file (config.py). Provide it through the helper pod spec so that it is mounted into the runner pods at CONFIG_MAP_FILE.
  • Target host reachable: HOST is reachable from the chaos infrastructure cluster.
  • Image accessible: LOAD_IMAGE (default chaosnative/locust-loadgen:latest) is pullable from the cluster, or mirror it to your own registry and override the tunable.
  • Node selectors (optional): If NODE_NAMES is set, the runner pods are scheduled only on the listed nodes (comma-separated).
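
As a sketch of the task-file prerequisite, the snippet below builds a ConfigMap manifest that carries config.py and prints it as JSON, which kubectl apply -f - accepts. The ConfigMap name locust-task-file, the hce namespace, and the config.py key are assumptions for illustration; use whatever names your helper pod spec expects.

```python
import json

# A minimal Locust task file to embed in the ConfigMap.
TASK_FILE = '''\
from locust import HttpUser, task, between


class WebsiteUser(HttpUser):
    wait_time = between(0.5, 1.5)

    @task
    def index(self):
        self.client.get("/")
'''


def build_configmap(name, namespace, task_source, key="config.py"):
    """Return a core/v1 ConfigMap manifest holding the Locust task file."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {key: task_source},
    }


if __name__ == "__main__":
    # Hypothetical names; pipe the output to `kubectl apply -f -`.
    manifest = build_configmap("locust-task-file", "hce", TASK_FILE)
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from a script keeps the task file in a normal .py source that can be linted before it is embedded.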

Supported environments

Platform | Support status
Self-hosted Kubernetes (1.21+) | Supported
Managed Kubernetes (EKS, GKE, AKS, OKE) | Supported
OpenShift | Supported
Targets running on AWS, Azure, GCP, or any reachable host | Supported

Permissions required

This fault is classified as a Basic Load fault. The chaos service account needs the following Kubernetes RBAC permissions.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: locust-loadgen
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines", "chaosexperiments", "chaosresults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]

Fault tunables

Configure the following fault parameters when you add Locust loadgen to an experiment in Chaos Studio. Defaults are shown for reference.

Required parameters

Tunable | Description | Default
HOST | URL of the target host (for example https://api.example.com). | https://google.com

Chaos parameters

Tunable | Description | Default
TOTAL_CHAOS_DURATION | Total duration of the fault, in seconds. Locust runs for this period. | 60
CHAOS_INTERVAL | Delay in seconds between successive iterations when running for more than one cycle. | 60
USERS | Number of concurrent virtual users per runner pod. | 300
SPAWN_RATE | Number of virtual users to spawn per second. | 300
LOAD_TYPE | Type of load produced by the script (used by the Locust task to choose a profile, for example load, spike, stress). | load
GRANT_TYPE | OAuth grant type used by the script when the target needs credentials (for example client_credentials). | client_credentials
REPLICA | Number of runner pods to launch. Total virtual users equal USERS x REPLICA. | 3
NODE_NAMES | Comma-separated node names where runner pods are scheduled. Empty means any node. | ""
CONFIG_MAP_FILE | Path inside the runner pod where the Locust task file is mounted from the supplied ConfigMap. | /tmp/config.py
LOAD_IMAGE | Container image used to run Locust inside each runner pod. | chaosnative/locust-loadgen:latest
RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0

Tunables that apply to every fault are documented in common tunables for all faults.
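
To make the sizing arithmetic concrete, the sketch below derives the aggregate load from the per-runner tunables. It assumes every runner pod ramps up independently and at the same time, so with the defaults the fleet reaches 900 virtual users after about one second. The function name and the returned keys are illustrative only.

```python
def load_summary(users=300, spawn_rate=300, replica=3):
    """Summarize the aggregate load implied by the per-runner tunables."""
    if min(users, spawn_rate, replica) <= 0:
        raise ValueError("USERS, SPAWN_RATE, and REPLICA must be positive")
    return {
        # USERS x REPLICA: virtual users across all runner pods.
        "total_users": users * replica,
        # Seconds for a single runner to reach its USERS target.
        "ramp_seconds": users / spawn_rate,
        # Users started per second while all runners ramp together.
        "peak_spawn_per_second": spawn_rate * replica,
    }


if __name__ == "__main__":
    print(load_summary())                                   # table defaults
    print(load_summary(users=500, spawn_rate=50, replica=4))
```

A lower SPAWN_RATE relative to USERS gives the target a gentler ramp, which can help separate ramp-up effects from steady-state behavior.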


Sample Locust task

The ConfigMap mounted at CONFIG_MAP_FILE contains the Locust task file. A minimal config.py:

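from locust import HttpUser, task, between


class WebsiteUser(HttpUser):
    # Each virtual user pauses 0.5 to 1.5 seconds between tasks.
    wait_time = between(0.5, 1.5)

    @task
    def index(self):
        self.client.get("/")

    # Weight 3: picked three times as often as index.
    @task(3)
    def health(self):
        self.client.get("/health")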

Go to the Locust documentation to read the task reference.
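
The LOAD_TYPE tunable is described as a hint the task file can use to choose a profile. The sketch below shows one way a script might map it to a staged user count. The stage shapes are invented for illustration, and the snippet assumes the tunables reach the script as environment variables, which this page does not confirm.

```python
import os


def stages_for(load_type, users, duration):
    """Return (end_second, user_count) stages for a hypothetical profile."""
    if load_type == "load":      # hold a steady level for the whole run
        return [(duration, users)]
    if load_type == "spike":     # quiet baseline, short burst, quiet again
        third = duration // 3
        return [(third, users // 10), (2 * third, users), (duration, users // 10)]
    if load_type == "stress":    # step up in four equal increments
        step = duration // 4
        return [(step * i, users * i // 4) for i in range(1, 5)]
    raise ValueError(f"unknown LOAD_TYPE: {load_type!r}")


def users_at(stages, second):
    """Return the target user count at `second`, or None once the run is over."""
    for end_second, user_count in stages:
        if second < end_second:
            return user_count
    return None


if __name__ == "__main__":
    # Assumption: the tunables are exposed to the script as environment variables.
    profile = stages_for(
        os.environ.get("LOAD_TYPE", "load"),
        int(os.environ.get("USERS", "300")),
        int(os.environ.get("TOTAL_CHAOS_DURATION", "60")),
    )
    print(profile)
```

In a task file, a Locust LoadTestShape subclass could call users_at(stages, self.get_run_time()) inside tick() and return the result together with a spawn rate, or None to end the run.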


Fault execution in brief

The fault launches REPLICA runner pods running LOAD_IMAGE and mounts the Locust task file from the ConfigMap at CONFIG_MAP_FILE. Each runner then runs Locust against HOST with USERS concurrent virtual users, spawned at SPAWN_RATE per second, for TOTAL_CHAOS_DURATION seconds. When the duration elapses, the fault tears the runner pods down.


Expected behavior during fault execution

  • The target host sees sustained synthetic traffic for TOTAL_CHAOS_DURATION seconds at the rate driven by USERS, SPAWN_RATE, and REPLICA.
  • Application metrics on the target (latency, throughput, error rate) shift in line with the load profile.
  • Autoscalers (HPA, KEDA) may add capacity if CPU/RPS thresholds are reached.
  • After the duration ends, the runner pods are deleted; traffic from the fault stops within seconds.

When the fault ends

The chaos pod stops Locust and deletes the runner pods when TOTAL_CHAOS_DURATION elapses. Synthetic traffic stops within seconds; in-flight requests complete naturally.

Signals to watch

Attach resilience probes to assert each layer:

  • Application latency: Use a Prometheus probe on the application's request-duration histogram and assert p95/p99 stays inside the SLA.
  • Error rate: Use a Prometheus probe on the application's 5xx counter and assert it stays below threshold.
  • Autoscaling reaction: Use a command probe running kubectl get hpa <name> to assert replicas grew.
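
As an illustration of the latency check, the sketch below builds a p95 query and reads the answer from the Prometheus HTTP API instant-query endpoint. It is not the probe configuration itself. The metric name http_request_duration_seconds_bucket and the helper names are assumptions; substitute the histogram your application actually exports.

```python
import json
import urllib.parse
import urllib.request


def p95_query(metric="http_request_duration_seconds_bucket", window="1m"):
    """PromQL for p95 latency from a histogram (metric name is hypothetical)."""
    return f"histogram_quantile(0.95, sum(rate({metric}[{window}])) by (le))"


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def first_value(response):
    """Extract the first sample of an instant-query response as a float."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    result = response["data"]["result"]
    if not result:
        raise LookupError("query returned no series")
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def within_sla(base_url, sla_seconds, timeout=10):
    """Return True when the current p95 is at or below the SLA."""
    url = query_url(base_url, p95_query())
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return first_value(json.load(resp)) <= sla_seconds
```

For example, within_sla("http://prometheus:9090", 0.5) would check p95 against a 500 ms SLA, where the Prometheus address is a placeholder.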

Verify the fault execution effect

  1. Watch the runner pod logs for Locust output.

    kubectl logs -n <chaos-infra-namespace> -l name=locust-load-generator -f

    You should see Locust's per-second stats lines (RPS, response times, failures).

  2. Inspect target metrics.

    Use your APM tool (Prometheus, Datadog, New Relic) to confirm RPS and latency rose during the chaos window and recovered afterwards.

  3. Confirm the runner pods were cleaned up.

    kubectl get pods -n <chaos-infra-namespace> -l name=locust-load-generator

    The pods should be gone after the experiment ends.
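
The cleanup check can also be scripted. The sketch below is one possible approach: it runs the same kubectl query with JSON output and lists any runner pods that remain. The helper names are illustrative, and the label selector is the one used in the commands above.

```python
import json
import subprocess

LABEL = "name=locust-load-generator"


def leftover_pods(pod_list):
    """Return the names of pods present in a `kubectl get pods -o json` result."""
    return [item["metadata"]["name"] for item in pod_list.get("items", [])]


def fetch_pods(namespace, label=LABEL):
    """Run kubectl and parse its JSON output into a dict."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", label, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def cleanup_complete(namespace):
    """Return True when no runner pods with the label remain in the namespace."""
    return not leftover_pods(fetch_pods(namespace))
```

Calling cleanup_complete with your chaos infrastructure namespace returns True once every runner pod is gone, which makes it easy to wire into a post-experiment check.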


Recovery and cleanup

  • End of duration: The chaos pod stops Locust and deletes the runner pods when TOTAL_CHAOS_DURATION elapses.
  • Abort the experiment: Stopping the experiment from Chaos Studio also stops Locust and cleans up the runner pods.
  • Manual recovery: If the runner pods survive an abort, delete them with kubectl delete pods -n <chaos-infra-namespace> -l name=locust-load-generator.
  • Workload recovery: Application metrics recover as soon as synthetic traffic stops; HPA-driven replicas scale back in over the autoscaler cooldown.

Limitations

  • Single host per run: The fault drives load against one HOST; running against many hosts requires that the script address them inside one task file.
  • Distributed runners are independent: Each runner pod tracks its own users and statistics; aggregate them in the target's APM, not in the per-pod logs.
  • No mid-flight reconfigure: Changes to USERS, SPAWN_RATE, or the task file require re-running the experiment.
  • Cluster network egress: Synthetic traffic leaves the chaos infrastructure cluster; egress costs and per-host rate limits apply.
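
If you still need a rough fleet-wide figure from per-runner totals, counts can be summed and means can be weighted by request count, as in the sketch below. Percentiles cannot be combined this way, which is one reason to read them from the target's APM. The input field names and the sample numbers are hypothetical, not the format of Locust's own output.

```python
def aggregate(runners, duration_seconds):
    """Combine per-runner request totals into fleet-wide figures."""
    requests = sum(r["requests"] for r in runners)
    failures = sum(r["failures"] for r in runners)
    if requests == 0:
        return {"rps": 0.0, "failure_ratio": 0.0, "mean_ms": 0.0}
    # Weight each runner's mean latency by the requests it actually sent.
    weighted = sum(r["mean_ms"] * r["requests"] for r in runners)
    return {
        "rps": requests / duration_seconds,
        "failure_ratio": failures / requests,
        "mean_ms": weighted / requests,
    }


if __name__ == "__main__":
    # Hypothetical totals copied by hand from three runner pods.
    sample = [
        {"requests": 6000, "failures": 60, "mean_ms": 100.0},
        {"requests": 3000, "failures": 0, "mean_ms": 130.0},
        {"requests": 3000, "failures": 60, "mean_ms": 90.0},
    ]
    print(aggregate(sample, duration_seconds=60))
```

A plain average of the three means would give about 106.7 ms; weighting by request count gives 105 ms because the busiest runner counts for more.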

Troubleshooting

Locust loadgen runner pods fail to start in Harness Chaos Engineering

Confirm the ConfigMap containing config.py exists in the chaos infrastructure namespace and is mounted at CONFIG_MAP_FILE. If NODE_NAMES is set, verify each node name matches kubectl get nodes exactly.

Locust runner pod stuck in ImagePullBackOff

LOAD_IMAGE defaults to chaosnative/locust-loadgen:latest. If your cluster cannot reach docker.io, mirror the image to a registry you can reach and set LOAD_IMAGE to that path.

Target host not reachable from the chaos infrastructure cluster

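Confirm HOST is reachable from inside the cluster, replacing <HOST> with the target URL: kubectl run debug -n <chaos-infra-namespace> --image=alpine --rm -it --restart=Never -- wget -qO- <HOST>. If the request fails, adjust network policies, security groups, or egress rules to allow traffic from the chaos infra namespace.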

Locust reports zero RPS or all failures

Check the task file (config.py) and confirm self.client points at a valid path under HOST. If HOST needs authentication, confirm the task fetches an OAuth token using GRANT_TYPE and attaches it to subsequent calls.
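
For the authentication case, the helpers below sketch the two pieces a task file needs for the OAuth 2.0 client_credentials grant: the token request and the header attached to later calls. They assume the identity provider accepts HTTP Basic client authentication and that the credentials contain no characters that need escaping. The function names are illustrative and only this one grant type is covered.

```python
import base64


def token_request(grant_type, client_id, client_secret, scope=None):
    """Return (form_fields, headers) for an OAuth 2.0 token request."""
    if grant_type != "client_credentials":
        raise NotImplementedError(
            f"this sketch only covers client_credentials, not {grant_type!r}"
        )
    fields = {"grant_type": grant_type}
    if scope:
        fields["scope"] = scope
    # Client authentication via HTTP Basic: base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return fields, {"Authorization": f"Basic {basic}"}


def bearer_header(token_response):
    """Build the Authorization header from a parsed token response."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

Inside a Locust user, on_start could post the form fields with self.client.post("/oauth/token", data=fields, headers=headers), where the token path is a placeholder, and then pass bearer_header(response.json()) as headers on subsequent requests.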


Related faults

  • K6 loadgen: Generate load with k6 (JavaScript-based) instead of Locust.
  • Pod HTTP latency: Inject latency on the server side instead of driving load from outside.
  • Pod CPU hog: Stress server CPU instead of driving traffic.