239 docs tagged with "chaos-engineering"

ALB AZ down

Detach one or more availability zones from an Application Load Balancer for a configurable duration so you can test how clients, target groups, and AZ-aware routing behave when a zone is taken out of the load balancer rotation.

AWS EC2 Instance Status Check

Built-in Command Probe template that validates the state of one or more Amazon EC2 instances during a chaos experiment.

AWS ECS Service Status Check

Built-in Command Probe template that validates whether an Amazon ECS service has reached its desired state during a chaos experiment.

AWS Lambda Function Status Check

Built-in Command Probe template that validates whether an AWS Lambda function exists and is in the Active state during a chaos experiment.

AWS Load Balancer AZ Check

Built-in Command Probe template that validates the availability of target availability zones in an ALB or CLB during a chaos experiment.

AWS Security Group Rule Check

Built-in Command Probe template that validates whether AWS security groups have rules configured during a chaos experiment.

AZ blackhole

Isolate network traffic for one or more AWS Availability Zones (optionally scoped to specific VPCs or subnets) for a configurable duration and restore connectivity afterwards so you can test how multi-AZ workloads handle a zone-level outage.

Azure AKS node down

Deallocate a percentage of AKS worker VMs (selected by node pool and zone) for a configurable duration so you can test how the workload behaves when AKS nodes disappear.

Azure disk loss

Detach one or more managed data disks from an Azure VM for a configurable duration, then reattach them, so you can test how the workload behaves when its storage disappears.

Azure instance CPU hog

Drive CPU utilization to a configurable target on one or more Azure VMs for a configurable duration so you can test how the workload behaves when compute headroom shrinks.

Azure instance IO stress

Drive disk IO load on one or more Azure VMs for a configurable duration so you can test how the workload behaves when the storage subsystem is saturated.

Azure instance memory hog

Consume a configurable amount of memory on one or more Azure VMs for a configurable duration so you can test how the workload behaves when memory headroom shrinks.

Azure instance stop

Stop one or more Azure VM instances by name for a configurable duration, then start them again, so you can test how the workload behaves when a VM disappears.

Azure Service Bus queue state change

Change the operational status of one or more Azure Service Bus queues (Disabled, SendDisabled, ReceiveDisabled) for a configurable duration so you can test how producers and consumers handle queue-state disruptions.

Azure web app access restrict

Add an Access Restriction rule to one or more Azure App Service web apps for a configurable duration so you can test how clients behave when traffic to the web app is blocked.

Azure web app stop

Stop one or more Azure App Service web apps for a configurable duration, then start them again, so you can test how clients behave when the web app is unavailable.

Centralized delegate approach

Run a single Harness Delegate on central infrastructure to orchestrate chaos experiments across multiple target clusters through Kubernetes connectors.

CF app container kill

Kill the container of a Cloud Foundry app instance so you can test how the platform reschedules it and how peers absorb traffic during the gap.

CF app JVM CPU stress

Drive CPU saturation inside the JVM of a Cloud Foundry app instance so you can test how the application and the platform react to sustained CPU pressure.

CF app JVM memory stress

Drive heap or non-heap memory pressure inside the JVM of a Cloud Foundry app instance so you can test how the application reacts to sustained memory exhaustion.

CF app JVM method exception

Make a specific JVM method throw an exception inside a Cloud Foundry app instance so you can test how callers handle synchronous failures.

CF app JVM method latency

Add artificial latency to a specific JVM method inside a Cloud Foundry app instance so you can test how slow downstream calls cascade through the system.

CF app JVM modify return

Override the return value of a specific JVM method inside a Cloud Foundry app instance so you can test caller behavior against unexpected return values.

CF app JVM trigger GC

Trigger a full garbage collection cycle inside the JVM of a Cloud Foundry app instance so you can measure pause time and tail-latency impact.

CF app network corruption

Corrupt a configurable percentage of egress packets from a Cloud Foundry app instance so you can test how TCP retransmissions and protocol handlers cope.

CF app network duplication

Duplicate a configurable percentage of egress packets from a Cloud Foundry app instance so you can test deduplication logic and idempotency assumptions.

CF app network latency

Inject network latency on the egress of a Cloud Foundry app instance so you can test how the app and its callers behave when downstream calls become slow.

CF app network loss

Drop a configurable percentage of egress packets from a Cloud Foundry app instance so you can test retry, timeout, and circuit-breaker behavior.

CF app route unmap

Temporarily unmap a route from a Cloud Foundry app so you can test how upstream consumers behave when the app becomes unreachable via that route.

CF app stop

Stop a Cloud Foundry app for a configurable duration, then restart it, so you can test how the platform and dependents react when the app goes offline.

Chaos faults for Cloud Foundry

Catalog of Cloud Foundry chaos faults that disrupt apps, JVM runtimes, and the network between app instances and their dependencies.

CLB AZ down

Disable one or more availability zones on a Classic Load Balancer for a configurable duration so you can test how clients and back-end instances behave when an AZ is removed from the load balancer rotation.

Cluster permissions

Kubernetes API permissions the chaos service account needs on the target cluster to inject DDCR-based faults, plus copy-paste RBAC manifests for the two install topologies.

Common node fault tunables

Environment variables shared by node-level chaos faults for selecting target nodes by name, by label, or by percentage.

Container kill

Kill a specific container inside a Kubernetes pod to test restart loops, sidecar resilience, probe tuning, and multi-container coordination.

Container Restart Check

Built-in Command Probe template that validates whether container restart counts stay within an acceptable threshold during a chaos experiment.

Datadog Avg Latency Check

Built-in Datadog APM Probe template that validates average latency during a chaos experiment.

Datadog CPU Check

Built-in Datadog APM Probe template that validates container CPU utilization during a chaos experiment.

Datadog Error Rate Check

Built-in Datadog APM Probe template that validates service error rate during a chaos experiment.

Datadog Memory Check

Built-in Datadog APM Probe template that validates container memory utilization during a chaos experiment.

Datadog P95 Latency Check

Built-in Datadog APM Probe template that validates p95 latency during a chaos experiment.

Datadog P99 Latency Check

Built-in Datadog APM Probe template that validates p99 latency during a chaos experiment.

Dedicated delegate approach

Install the Harness Delegate inside the target cluster and create a Kubernetes chaos infrastructure. Covers the standard cluster-admin install and a least-privilege install scoped to a dedicated namespace.

Disk fill

Fill a target Kubernetes container's ephemeral storage as a percentage of its limit to test ephemeral-storage eviction, retention, and back-pressure logic.

DynamoDB replication pause

Pause cross-region replication on one or more Amazon DynamoDB global tables for a configurable duration using an AWS Fault Injection Service (FIS) experiment so you can test how your application handles a temporary lapse in multi-region consistency.

EBS loss by ID

Detach an EBS volume by volume ID for a configurable duration and reattach it afterwards so you can test how a workload behaves when its storage disappears.

EBS loss by tag

Detach EBS volumes selected by tag for a configurable duration and reattach them afterwards so you can test how workloads behave when a tagged subset of storage disappears.

EC2 CPU hog

Stress a configurable number of CPU cores inside a target EC2 instance via AWS Systems Manager so you can test how the workload behaves when the host is starved of CPU.

EC2 DNS chaos

Block or redirect DNS resolution for selected hostnames on a target EC2 instance via AWS Systems Manager so you can test how the workload reacts when a dependency cannot be resolved.

EC2 HTTP latency

Add latency to inbound HTTP traffic on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when an HTTP service responds slowly.

EC2 HTTP modify body

Replace HTTP response bodies on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when an upstream returns unexpected content.

EC2 HTTP modify header

Add, change, or remove HTTP headers on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when headers are missing or malformed.

EC2 HTTP reset peer

Reset TCP connections to an HTTP service on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when the server tears down connections mid-flight.

EC2 HTTP status code

Rewrite HTTP response status codes on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react to specific error codes returned by an upstream service.

EC2 IO stress

Generate sustained filesystem read and write load on a target EC2 instance via AWS Systems Manager so you can test how the workload behaves under disk pressure or near-full storage.

EC2 memory hog

Consume a configurable amount of memory inside a target EC2 instance via AWS Systems Manager so you can test how the workload behaves when the host is starved of memory.

EC2 network latency

Add configurable latency and jitter to outbound traffic on an EC2 instance via AWS Systems Manager so you can test how the workload reacts when network round-trip times grow.

EC2 network loss

Drop a configurable percentage of outbound packets on a target EC2 instance via AWS Systems Manager so you can test how the workload reacts when network reliability degrades.

EC2 process kill

Kill one or more processes by PID inside a target EC2 instance via AWS Systems Manager so you can test how the workload recovers when a critical process disappears while the host itself stays up.

EC2 stop by ID

Stop one or more EC2 instances selected by instance ID for a configurable duration so you can test how the workload running on those instances behaves during and after the outage.

EC2 stop by tag

Stop EC2 instances selected by tag for a configurable duration so you can test how the workload running on those instances behaves when a tagged subset disappears.

ECS agent stop

Stop the ECS container agent on every container instance in an ECS cluster for a configurable duration so you can test how tasks, scheduling, and self-healing behave when the cluster temporarily loses its agent.

ECS container CPU hog

Stress a configurable number of CPU cores at a configurable load percentage inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves under sustained CPU pressure.

ECS container HTTP latency

Add latency to inbound HTTP traffic on a specific port inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how clients behave when the HTTP service responds slowly.

ECS container HTTP modify body

Replace HTTP response bodies on a specific port inside a percentage of running ECS tasks (EC2 launch type) with a configurable string for a configurable duration so you can test how clients behave when the response body is unexpected.

ECS container HTTP reset peer

Reset TCP connections to HTTP clients on a specific port after a configurable timeout inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how clients behave when the server abruptly closes the connection.

ECS container HTTP status code

Return a configurable HTTP status code (and optionally rewrite the body) on a specific port inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how clients behave when the service returns an unexpected status.

ECS container IO stress

Stress filesystem IO using a configurable number of workers writing to a configurable mount path inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves when disk IO is saturated.

ECS container memory hog

Consume a configurable amount of memory (absolute or percentage) using a configurable number of workers inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves under sustained memory pressure.

ECS container network latency

Add a configurable amount of network latency on a specific interface inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves when the network is slow.

ECS container network loss

Drop a configurable percentage of network packets on a specific interface inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves when the network is lossy.

ECS container volume detach

Detach the data volume attached to a percentage of running ECS tasks for a configurable duration so you can test how the workload behaves when its storage disappears.

ECS Fargate CPU hog

Inject CPU stress inside a percentage of running ECS Fargate tasks for a configurable duration via a sidecar container so you can test how the service behaves under sustained CPU pressure.

ECS Fargate memory hog

Consume a configurable amount of memory inside a percentage of running ECS Fargate tasks for a configurable duration via a sidecar container so you can test how the service behaves under sustained memory pressure.

ECS instance stop

Stop one or more EC2 container instances that back an ECS cluster for a configurable duration so you can test how the cluster reschedules tasks, drains workloads, and recovers when capacity disappears.

ECS invalid container image

Swap the container image of an ECS service to an invalid value for a configurable duration so you can test how ECS, your deployment guardrails, and your alerting respond to a failed image pull.

ECS network restrict

Add or remove a network rule (ingress or egress, by IP and port range) for the security group of one or more ECS services for a configurable duration so you can test how the workload behaves when network access is partially restricted.

ECS task scale

Force one or more ECS services to a configurable replica count for a configurable duration so you can test how the workload, dependent services, and autoscaling logic behave when capacity is suddenly scaled up or down.

ECS task stop

Stop a configurable percentage of ECS tasks (selected by task ID or by service) for a configurable duration so you can test how the service reschedules, how dependent traffic reroutes, and how the workload recovers.

ECS update container resource limit

Re-register the task definition of an ECS service with smaller CPU and memory limits for a configurable duration so you can test how the workload behaves when its container resources shrink.

ECS update container timeout

Re-register the task definition of an ECS service with chaos values for container start and stop timeouts for a configurable duration so you can test how the workload behaves when ECS no longer waits long enough for containers to start or drain.

ECS update task role

Replace the task role of an ECS service with a chaos value (or clear it) for a configurable duration so you can test how the workload behaves when its IAM identity loses or changes permissions.

FS fill

Write a configurable amount of data into a specific path inside a Kubernetes container to test mounted-volume capacity, eviction, and write-failure handling.

GCP SQL instance failover

Trigger a failover on a GCP Cloud SQL high-availability instance so you can test how the application behaves when the primary node fails over to its standby.

GCP SQL Instance Status Check

Built-in Command Probe template that validates whether a GCP Cloud SQL instance is in the Running state during a chaos experiment.

GCP VM disk loss

Detach one or more non-boot persistent disks from GCP VM instances for a configurable duration, then reattach them, so you can test how the workload behaves when its storage disappears.

GCP VM disk loss by label

Detach a percentage of non-boot persistent disks selected by label from GCP VM instances for a configurable duration, then reattach them, so you can test how the workload behaves when a labeled subset of storage disappears.

GCP VM Disk Status Check

Built-in Command Probe template that validates whether GCP Compute Engine persistent disks are in a ready-to-use state during a chaos experiment.

GCP VM Instance Status Check

Built-in Command Probe template that validates whether GCP Compute Engine VM instances are in the Running state during a chaos experiment.

GCP VM instance stop

Stop one or more GCP Compute Engine VM instances by name for a configurable duration, then start them again, so you can test how the workload behaves when a VM disappears.

GCP VM instance stop by label

Stop a percentage of GCP Compute Engine VM instances selected by label for a configurable duration, then start them again, so you can test how the workload behaves when a labeled subset of VMs disappears.

Generic FIS experiment template

Trigger any pre-built AWS Fault Injection Service (FIS) experiment template by ID from Harness Chaos Engineering so you can fold native AWS-managed faults into your chaos experiments and probe, verify, and report on the results as you would for any other Harness fault.

Infrastructure settings

Reference for the Settings panel exposed when you create or edit a Kubernetes (Harness Infrastructure) chaos infrastructure.

K6 loadgen

Generate a configurable load against a target endpoint with a k6 script for a configurable duration so you can test how the workload behaves under sustained traffic.

Kubelet density

Create a configurable number of pods on a target Kubernetes node so you can test how the node, kubelet, and workload behave during a sudden pod storm.

Kubelet service kill

Stop the kubelet on a Kubernetes node to simulate node loss without rebooting, and test eviction, rescheduling, and recovery behavior.

Kubernetes (Harness Infrastructure)

Overview of the Delegate-Driven Chaos Runner (DDCR), how chaos experiments execute on top of the Harness Delegate, and the two supported install approaches.

Lambda block TCP connection

Block outbound TCP connections from an AWS Lambda function to one or more target hostnames for a configurable duration so you can test how the function behaves when a TCP-based dependency is unreachable.

Lambda delete event source mapping

Delete one or more event source mappings on an AWS Lambda function for a configurable duration and recreate them afterwards so you can test how the workload behaves when the function stops receiving events from its source.

Lambda delete function concurrency

Delete the reserved concurrency configuration on an AWS Lambda function for a configurable duration and restore it afterwards so you can test how the workload behaves when the function has to share account-level concurrency with other functions.

Lambda function layer detach

Detach a specified Lambda layer from a target AWS Lambda function for a configurable duration and reattach it afterwards so you can test how the workload behaves when a shared dependency layer disappears.

Lambda inject latency

Inject runtime latency into an AWS Lambda function for a configurable duration so you can test how upstream callers and downstream consumers handle slower-than-expected responses, cold-start spikes, and resource contention.

Lambda inject status code

Override the HTTP status code returned by an AWS Lambda function for a configurable duration so you can test how upstream callers and downstream consumers handle unexpected error status responses.

Lambda modify response body

Override the response body returned by an AWS Lambda function for a configurable duration so you can test how upstream callers and client applications handle unexpected payload shapes and corrupted data.

Lambda toggle event mapping state

Disable one or more event source mappings on an AWS Lambda function for a configurable duration and re-enable them afterwards so you can test how the workload behaves when the function temporarily stops receiving events from its source.

Lambda update function memory

Lower the memory allocation of an AWS Lambda function for a configurable duration and restore it afterwards so you can test how the workload behaves with less memory and a proportionally smaller CPU share.

Lambda update function timeout

Lower the configured timeout of an AWS Lambda function for a configurable duration and restore it afterwards so you can test how the workload behaves when invocations are cut short.

Lambda update role permission

Detach a specified IAM policy from the execution role attached to an AWS Lambda function for a configurable duration and reattach it afterwards so you can test how the workload behaves when the function loses permission to call a downstream AWS service.

Linux API block

Block API requests passing through a target Linux machine for a configurable duration by returning a configured status code, so you can test how callers handle a sudden API outage.

Linux API latency

Add latency to API requests passing through a target Linux machine for a configurable duration so you can test how callers handle slow API responses.

Linux API modify body

Replace API request or response bodies passing through a target Linux machine for a configurable duration so you can test how callers handle unexpected payloads.

Linux API modify header

Override HTTP headers on API requests or responses passing through a target Linux machine for a configurable duration so you can test how callers handle altered headers.

Linux API status code

Override the HTTP status code (and optionally the response body) of API responses passing through a target Linux machine for a configurable duration so you can test how callers handle unexpected status codes.

Linux CPU stress

Apply CPU load to a target Linux machine for a configurable duration so you can test how the workload behaves when compute is starved.

Linux disk fill

Fill a disk path on a target Linux machine to a configured size for a configurable duration so you can test how the workload behaves when storage runs out.

Linux disk I/O stress

Apply disk I/O load to a target Linux machine for a configurable duration so you can test how the workload behaves when disk bandwidth is saturated.

Linux DNS error

Force DNS resolution failures for target host names on a Linux machine for a configurable duration so you can test how the workload behaves during a DNS outage.

Linux DNS spoof

Return spoofed IP addresses for target host names on a Linux machine for a configurable duration so you can test how the workload behaves when DNS resolves to unexpected endpoints.

Linux fs fill

Fill a filesystem path on a target Linux machine to a configured size for a configurable duration so you can test how the workload behaves when storage runs out.

Linux infrastructure

Linux Chaos Infrastructure for chaos experiments on Linux VMs and Cloud Foundry. Walks through the three-step create wizard, root-user options, and SELinux setup.

Linux JVM CPU stress

Apply CPU stress inside a target Java process on a Linux machine for a configurable duration so you can test how the JVM behaves under compute pressure.

Linux JVM memory stress

Apply memory stress inside a target Java process on a Linux machine for a configurable duration so you can test how the JVM behaves under memory pressure.

Linux JVM method exception

Throw a configured exception from a target class and method in a Java process on a Linux machine so you can test how the application handles unexpected exceptions.

Linux JVM method latency

Add latency to a target class and method in a Java process on a Linux machine so you can test how the application behaves when an internal method slows down.

Linux JVM modify return

Override the return value of a target class and method in a Java process on a Linux machine so you can test how callers handle unexpected return data.

Linux JVM trigger GC

Force garbage collection in a target Java process on a Linux machine for a configurable duration so you can test how the workload behaves under repeated GC events.

Linux memory stress

Consume memory on a target Linux machine for a configurable duration so you can test how the workload behaves under memory pressure and OOM conditions.

Linux network corruption

Corrupt a percentage of network packets leaving a target Linux machine for a configurable duration so you can test how the workload behaves when packet contents are damaged.

Linux network duplication

Duplicate a percentage of network packets leaving a target Linux machine for a configurable duration so you can test how the workload behaves when packets are duplicated.

Linux network latency

Add network latency to traffic leaving a target Linux machine for a configurable duration so you can test how the workload behaves when the network is slow.

Linux network loss

Drop a percentage of network packets leaving a target Linux machine for a configurable duration so you can test how the workload behaves when packets are lost.

Linux network rate limit

Throttle network bandwidth leaving a target Linux machine for a configurable duration so you can test how the workload behaves when bandwidth is constrained.

Linux process kill

Kill target processes on a Linux machine for a configurable duration so you can test how the workload behaves when a critical process disappears.

Linux service restart

Stop and restart systemd services on a target Linux machine for a configurable duration so you can test how the workload behaves when a service flaps.

Linux time chaos

Skew the system clock on a target Linux machine for a configurable duration so you can test how the workload behaves when time jumps forward or backward.

Locust loadgen

Generate a configurable load against a target host with a Locust script for a configurable duration so you can test how the workload behaves under sustained traffic.

Network configuration

Configure mTLS and Harness Network Proxy (HNP) for the chaos runner and Discovery Agent on Kubernetes chaos infrastructure.

NLB AZ down

Detach one or more availability zones from a Network Load Balancer for a configurable duration so you can test how clients, target groups, and AZ-aware routing behave when a zone is taken out of the load balancer rotation.

Node CPU hog

Exhaust CPU on a Kubernetes node to test scheduler behavior, pod eviction under pressure, HPA reactions, and noisy-neighbor isolation.

Node drain

Cordon and drain a Kubernetes node using the Eviction API to test PodDisruptionBudget enforcement, graceful shutdown, and rescheduling behavior.

Node I/O stress

Stress disk I/O on a Kubernetes node to test ephemeral-storage eviction, etcd write tolerance, log shipper backpressure, and noisy-neighbor isolation.

Node memory hog

Exhaust memory on a Kubernetes node to test kubelet eviction order, QoS-based pod prioritization, OOM behavior, and noisy-neighbor isolation.

Node network latency

Inject configurable network latency on a Kubernetes node's interface to test application timeouts, retry tuning, and tail-latency resilience.

Node network loss

Drop a configurable percentage of packets on a Kubernetes node's network interface to test cluster, application, and control-plane resilience.

Node restart

Reboot a Kubernetes node over SSH to test how the cluster handles sudden node loss, pod rescheduling, and stateful recovery.

Node Status Check

Built-in Command Probe template that validates the current state of Kubernetes nodes during a chaos experiment.

Node taint

Apply a temporary taint to a Kubernetes node to test toleration correctness, scheduling policies, and NoExecute eviction behavior.

OpenShift

Run Kubernetes chaos infrastructure on OpenShift, including the Security Context Constraint (SCC) the chaos service account needs and the CRI-O fault tunables.

Overview

Overview of the Kubernetes, Linux, and Windows chaos infrastructure types in Harness Chaos Testing, and where to manage them in the UI.

Pod API block

Block selected API requests or responses on a target Kubernetes pod using path, method, header, query parameter, and source or destination filters to test client retry and failover behavior.

Pod API latency

Add a configurable delay to selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client timeouts, retries, and tail-latency budgets.

Pod API modify body

Overwrite API request or response bodies on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client behavior under corrupted payloads.

Pod API modify header

Override API request or response headers on a target Kubernetes pod using path, method, query, and source or destination filters to test resilience to missing, altered, or unexpected header values.

Pod API modify response custom

Combine status code, header, and body modifications on selected API calls of a target Kubernetes pod in a single fault, with filtering by path, method, query, source, or destination.

Pod API status code

Override the HTTP status code returned by selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client error handling and circuit-breaker behavior.

Pod application function error

Inject a configurable error into a specific function of an instrumented application running in a Kubernetes pod so you can test how callers and dependents handle the failure.

Pod application function exception

Throw a configurable exception from a specific function of an instrumented application running in a Kubernetes pod so you can test how callers and dependents handle the failure.

Pod application function latency

Add a configurable delay to a specific function of an instrumented application running in a Kubernetes pod so you can test timeout, retry, and tail-latency behavior of callers.

Pod autoscaler

Scale a Kubernetes workload's replicas up to a target count to test cluster capacity, node autoscaling, scheduling pressure, and rollback behavior.

Pod CPU hog

Consume CPU on a target Kubernetes pod's container to test autoscaling, throttling, latency budgets, and noisy-neighbor tolerance.

Pod delete

Delete one or more pods of a Kubernetes workload to test replica availability, controller recovery, graceful termination, and disruption budgets.

Pod DNS error

Block DNS resolution for selected hostnames inside a target Kubernetes pod to test how the application handles upstream lookup failures and cluster DNS outages.

Pod DNS spoof

Redirect DNS lookups for selected hostnames inside a target Kubernetes pod to a different address to test how the application handles misdirected upstream traffic and cache poisoning.

Pod HTTP latency

Add a configurable delay to HTTP responses served by a target Kubernetes pod to test timeouts, retries, and tail-latency behavior at the application protocol layer.

Pod HTTP modify body

Overwrite the HTTP response body returned by a target Kubernetes pod to test client behavior under corrupted, empty, or unexpected response payloads.

Pod HTTP modify header

Override HTTP request or response headers served by a target Kubernetes pod to test client and server resilience to missing, altered, or unexpected header values.

Pod HTTP reset peer

Forcibly reset TCP connections carrying HTTP requests to a target Kubernetes pod to test client retry, connection-pool, and circuit-breaker behavior on abrupt disconnects.

Pod HTTP status code

Override the HTTP response status code returned by a target Kubernetes pod to test client error handling, retry classification, and circuit-breaker behavior on specific HTTP status codes.

Pod IO attribute override

Override file attributes (such as permissions, size, or ownership) returned by stat syscalls on a target Kubernetes pod's mounted volume to test how the application reacts to changed metadata.

Pod IO error

Make filesystem syscalls on a target Kubernetes pod's mounted volume return a configurable error code so you can validate how the application handles failed reads, writes, and opens.

Pod IO latency

Add a configurable delay to filesystem syscalls against a target Kubernetes pod's mounted volume so you can test how the application behaves under slow storage.

Pod IO mistake

Seed wrong data into reads or writes against a target Kubernetes pod's mounted volume so you can validate how the application detects and recovers from silent data corruption.

Pod IO stress

Generate sustained filesystem read and write load inside a target Kubernetes pod to test how the application handles disk pressure, slow IO, and ephemeral storage exhaustion.

Pod JVM CPU stress

Generate sustained CPU load inside a JVM running in a target Kubernetes pod to test how the application behaves when its Java process is starved of CPU.

Pod JVM Kafka exception

Cause Kafka producer or consumer calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen topic so you can test caller error handling.

Pod JVM Kafka latency

Add a configurable delay to Kafka producer or consumer calls from a JVM running in a target Kubernetes pod, scoped by topic, so you can test timeout, back-pressure, and lag behavior under slow Kafka traffic.

Pod JVM method exception

Cause a specific Java method in a JVM running in a target Kubernetes pod to throw a configurable exception so you can test how callers handle the failure.

Pod JVM method latency

Add a configurable delay to every invocation of a specific Java method in a JVM running in a target Kubernetes pod so you can test how callers and dependents behave under slow methods.

Pod JVM modify return

Override the return value of a specific Java method in a JVM running in a target Kubernetes pod so you can test how callers behave when a method silently returns wrong data.

Pod JVM Mongo exception

Cause MongoDB operations from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen database, collection, and operation so you can test caller error handling.

Pod JVM Mongo latency

Add a configurable delay to MongoDB operations from a JVM running in a target Kubernetes pod, scoped by database, collection, and operation, so you can test timeout and back-pressure behavior under a slow MongoDB.

Pod JVM Solace exception

Cause Solace publisher or subscriber calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen topic or queue so you can test caller error handling.

Pod JVM Solace latency

Add a configurable delay to Solace publisher or subscriber calls from a JVM running in a target Kubernetes pod, scoped by topic or queue, so you can test timeout and back-pressure behavior under slow Solace messaging.

Pod JVM SQL exception

Cause JDBC calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen table and SQL operation so you can test caller error handling.

Pod JVM SQL latency

Add a configurable delay to JDBC calls from a JVM running in a target Kubernetes pod, scoped by table and SQL operation, so you can test timeout and back-pressure behavior under a slow database.

Pod JVM trigger GC

Force the JVM in a target Kubernetes pod to run garbage collection on a configurable schedule so you can test how the application behaves under repeated GC pauses.

Pod memory hog

Consume memory inside a target Kubernetes pod's container to test OOM behavior, eviction order, request handling under pressure, and limit enforcement.

Pod network corruption

Corrupt a configurable percentage of packets on a target Kubernetes pod's network namespace to test checksum, retransmit, and integrity behavior.

Pod network duplication

Duplicate a configurable percentage of packets on a target Kubernetes pod's network namespace to test idempotency and deduplication behavior.

Pod network latency

Add a configurable delay to packets on a target Kubernetes pod's network path to test timeout, retry, and tail-latency behavior of upstream and downstream calls.

Pod network loss

Drop a configurable percentage of packets on a target Kubernetes pod's network path to test retry, timeout, and failover behavior.

Pod network partition

Apply a temporary Kubernetes NetworkPolicy to isolate a target pod from its peers, dependencies, or namespaces and test split-brain behavior.

Pod network rate limit

Cap bandwidth on a target Kubernetes pod's network path to test throughput-sensitive workloads, batch jobs, and bandwidth-bound flows.

Pod Replica Count Check

Built-in Command Probe template that validates whether a Kubernetes workload keeps its minimum healthy replica count during a chaos experiment.

Pod Resource Utilisation Check

Built-in Command Probe template that validates whether Kubernetes pod CPU or memory usage stays within a limit during a chaos experiment.

Pod Startup Time Check

Built-in Command Probe template that validates whether Kubernetes pods start within an acceptable duration during a chaos experiment.

Pod Status Check

Built-in Command Probe template that validates the current state of Kubernetes pods during a chaos experiment.

Pod Warnings Check

Built-in Command Probe template that checks for warning events on Kubernetes pods during a chaos experiment.

RDS instance delete

Delete a target RDS DB instance so you can test how applications behave when a database disappears permanently and how disaster-recovery procedures handle the loss.

RDS instance reboot

Reboot a target RDS DB instance (with optional Multi-AZ failover) for a configurable duration so you can test how applications behave when their database restarts.

Redis cache expire

Expire one or more keys (or all keys) in a target Redis instance for a configurable duration so you can test how the application behaves when its cache is suddenly evicted.

Redis cache limit

Cap the maximum memory of a target Redis instance to force evictions and write errors so you can test how the application behaves when Redis runs out of memory.

Redis cache penetration

Generate a configurable burst of cache-miss requests against a target Redis instance so you can test how the application and its downstream database behave when the cache is bypassed.

Resource access restrict

Temporarily strip ingress or egress rules from one or more AWS security groups for a configurable duration and restore them afterwards so you can test how the workload behaves when network access to (or from) an AWS resource disappears.

SSH chaos

Run a custom chaos script and matching abort script on a remote VM over SSH for a configurable duration so you can build any host-level fault that the built-in fault library does not cover out of the box.

SSM chaos by ID

Run an arbitrary AWS Systems Manager document against a target EC2 instance selected by ID so you can inject custom chaos that is not covered by a dedicated fault.

SSM chaos by tag

Run an arbitrary AWS Systems Manager document against EC2 instances selected by tag so you can inject custom chaos against a logical group of hosts.

Time chaos

Shift the wall-clock time observed by selected processes inside a target Kubernetes pod to test application behavior under clock skew, token expiry, and time-based scheduling errors.

VMware CPU hog

Consume CPU resources on a Linux VMware VM for a configurable duration so you can test how the workload behaves when compute headroom shrinks.

VMware DNS chaos

Force DNS resolution failures for specific hostnames inside a Linux VMware VM so you can test how the workload behaves when DNS is unhealthy.

VMware HTTP latency

Inject HTTP response latency on a target service running inside a Linux VMware VM so you can test how callers behave when a downstream service slows down.

VMware HTTP reset peer

Reset TCP connections to an HTTP service running inside a Linux VMware VM so you can test how callers behave when the service abruptly drops connections.

VMware HTTP response modify

Rewrite HTTP responses (status code, body, headers) from a service running inside a Linux VMware VM so you can test how callers behave when responses are corrupted.

VMware IO stress

Drive disk IO load on a Linux VMware VM for a configurable duration so you can test how the workload behaves when storage throughput is saturated.

VMware memory hog

Consume a configurable amount of memory on a Linux VMware VM for a configurable duration so you can test how the workload behaves when memory headroom shrinks.

VMware network latency

Inject network latency on egress traffic from a Linux VMware VM for a configurable duration so you can test how the workload behaves under slow networks.

VMware network loss

Drop a configurable percentage of egress packets on a Linux VMware VM so you can test how the workload behaves when packet loss spikes.

VMware network rate limit

Cap egress bandwidth on a Linux VMware VM so you can test how the workload behaves when network throughput is throttled.

VMware process kill

Kill one or more processes inside a Linux VMware VM for a configurable duration so you can test how supervisors and application logic recover.

VMware service stop

Stop one or more services inside a Linux VMware VM for a configurable duration so you can test how the workload behaves when a managed service is down.

VMware VM power off (by MOID)

Power off one or more VMware VMs (identified by Managed Object ID) for a configurable duration so you can test how applications behave when a VM disappears.

VMware VM power off (by name)

Power off one or more VMware VMs (identified by name) for a configurable duration so you can test how applications behave when a VM disappears.

VPC route misconfiguration

Temporarily remove specified CIDR routes from one or more VPC route tables for a configurable duration and restore them afterwards so you can test how the workload behaves when egress to a Transit Gateway, NAT Gateway, VPC peer, or internet gateway disappears.

Windows blackhole chaos

Block all network traffic to selected destination hosts or IP addresses from a Windows VM so you can test how the workload behaves during a network blackout.

Windows CPU stress

Consume CPU resources on a Windows VM for a configurable duration so you can test how the workload behaves when compute headroom shrinks.

Windows disk stress

Drive disk IO load on a Windows VM for a configurable duration so you can test how the workload behaves when storage throughput is saturated.

Windows EC2 blackhole chaos

Blackhole all network traffic destined for specific IPs or hosts on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when a specific dependency is completely unreachable.

Windows EC2 CPU hog

Stress a configurable number of CPU cores at a configurable percentage on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave under sustained CPU pressure.

Windows EC2 memory hog

Consume a configurable amount of memory (absolute or percentage) on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave under sustained memory pressure.

Windows EC2 network latency

Add a configurable amount of latency to network traffic destined for specific IPs or hosts on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when the network is slow.

Windows EC2 network loss

Drop a configurable percentage of network packets destined for specific IPs or hosts on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when the network is lossy.