ALB AZ down
Detach one or more availability zones from an Application Load Balancer for a configurable duration so you can test how clients, target groups, and AZ-aware routing behave when a zone is taken out of the load balancer rotation.
Disable one or more availability zones on a Classic Load Balancer for a configurable duration so you can test how clients and back-end instances behave when an AZ is removed from the load balancer rotation.
Environment variables shared by node-level chaos faults for selecting target nodes by name, by label, or by percentage.
Kill a specific container inside a Kubernetes pod to test restart loops, sidecar resilience, probe tuning, and multi-container coordination.
Fill a target Kubernetes container's ephemeral storage as a percentage of its limit to test ephemeral-storage eviction, retention, and back-pressure logic.
Detach an EBS volume by volume ID for a configurable duration and reattach it afterwards so you can test how a workload behaves when its storage disappears.
Detach EBS volumes selected by tag for a configurable duration and reattach them afterwards so you can test how workloads behave when a tagged subset of storage disappears.
Stress a configurable number of CPU cores inside a target EC2 instance via AWS Systems Manager so you can test how the workload behaves when the host is starved of CPU.
Block or redirect DNS resolution for selected hostnames on a target EC2 instance via AWS Systems Manager so you can test how the workload reacts when a dependency cannot be resolved.
Add latency to inbound HTTP traffic on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when an HTTP service responds slowly.
Replace HTTP response bodies on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when an upstream returns unexpected content.
Add, change, or remove HTTP headers on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when headers are missing or malformed.
Reset TCP connections to an HTTP service on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react when the server tears down connections mid-flight.
Rewrite HTTP response status codes on a configurable port of a target EC2 instance via AWS Systems Manager so you can test how clients react to specific error codes returned by an upstream service.
Generate sustained filesystem read and write load on a target EC2 instance via AWS Systems Manager so you can test how the workload behaves under disk pressure or near-full storage.
Consume a configurable amount of memory inside a target EC2 instance via AWS Systems Manager so you can test how the workload behaves when the host is starved of memory.
Add configurable latency and jitter to outbound traffic on an EC2 instance via AWS Systems Manager so you can test how the workload reacts when network round-trip times grow.
Drop a configurable percentage of outbound packets on a target EC2 instance via AWS Systems Manager so you can test how the workload reacts when network reliability degrades.
Kill one or more processes by PID inside a target EC2 instance via AWS Systems Manager so you can test how the workload recovers when a critical process dies while the host itself stays up.
Stop one or more EC2 instances selected by instance ID for a configurable duration so you can test how the workload running on those instances behaves during and after the outage.
Stop EC2 instances selected by tag for a configurable duration so you can test how the workload running on those instances behaves when a tagged subset disappears.
Stop the ECS container agent on every container instance in an ECS cluster for a configurable duration so you can test how tasks, scheduling, and self-healing behave when the cluster temporarily loses its agent.
Stress a configurable number of CPU cores at a configurable load percentage inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves under sustained CPU pressure.
Add latency to inbound HTTP traffic on a specific port inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how clients behave when the HTTP service responds slowly.
Replace HTTP response bodies on a specific port inside a percentage of running ECS tasks (EC2 launch type) with a configurable string for a configurable duration so you can test how clients behave when the response body is unexpected.
Reset TCP connections to HTTP clients on a specific port after a configurable timeout inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how clients behave when the server abruptly closes the connection.
Return a configurable HTTP status code (and optionally rewrite the body) on a specific port inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how clients behave when the service returns an unexpected status.
Stress filesystem IO using a configurable number of workers writing to a configurable mount path inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves when disk IO is saturated.
Consume a configurable amount of memory (absolute or percentage) using a configurable number of workers inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves under sustained memory pressure.
Add a configurable amount of network latency on a specific interface inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves when the network is slow.
Drop a configurable percentage of network packets on a specific interface inside a percentage of running ECS tasks (EC2 launch type) for a configurable duration so you can test how the workload behaves when the network is lossy.
Detach the data volume attached to a percentage of running ECS tasks for a configurable duration so you can test how the workload behaves when its storage disappears.
Inject CPU stress inside a percentage of running ECS Fargate tasks for a configurable duration via a sidecar container so you can test how the service behaves under sustained CPU pressure.
Consume a configurable amount of memory inside a percentage of running ECS Fargate tasks for a configurable duration via a sidecar container so you can test how the service behaves under sustained memory pressure.
Stop one or more EC2 container instances that back an ECS cluster for a configurable duration so you can test how the cluster reschedules tasks, drains workloads, and recovers when capacity disappears.
Swap the container image of an ECS service to an invalid value for a configurable duration so you can test how ECS, your deployment guardrails, and your alerting respond to a failed image pull.
Add or remove a network rule (ingress or egress, by IP and port range) for the security group of one or more ECS services for a configurable duration so you can test how the workload behaves when network access is partially restricted.
Force one or more ECS services to a configurable replica count for a configurable duration so you can test how the workload, dependent services, and autoscaling logic behave when capacity is suddenly scaled up or down.
Stop a configurable percentage of ECS tasks (selected by task ID or by service) for a configurable duration so you can test how the service reschedules, how dependent traffic reroutes, and how the workload recovers.
Re-register the task definition of an ECS service with smaller CPU and memory limits for a configurable duration so you can test how the workload behaves when its container resources shrink.
Re-register the task definition of an ECS service with chaos values for container start and stop timeouts for a configurable duration so you can test how the workload behaves when ECS no longer waits long enough for containers to start or drain.
Swap the task role of an ECS service to a chaos value (or empty) for a configurable duration so you can test how the workload behaves when its IAM identity loses or changes permissions.
Write a configurable amount of data into a specific path inside a Kubernetes container to test mounted-volume capacity, eviction, and write-failure handling.
Stop the kubelet on a Kubernetes node to simulate node loss without rebooting, and test eviction, rescheduling, and recovery behavior.
Detach one or more availability zones from a Network Load Balancer for a configurable duration so you can test how clients, target groups, and AZ-aware routing behave when a zone is taken out of the load balancer rotation.
Exhaust CPU on a Kubernetes node to test scheduler behavior, pod eviction under pressure, HPA reactions, and noisy-neighbor isolation.
Cordon and drain a Kubernetes node using the Eviction API to test PodDisruptionBudget enforcement, graceful shutdown, and rescheduling behavior.
Stress disk I/O on a Kubernetes node to test ephemeral-storage eviction, etcd write tolerance, log shipper backpressure, and noisy-neighbor isolation.
Exhaust memory on a Kubernetes node to test kubelet eviction order, QoS-based pod prioritization, OOM behavior, and noisy-neighbor isolation.
Inject configurable network latency on a Kubernetes node's interface to test application timeouts, retry tuning, and tail-latency resilience.
Drop a configurable percentage of packets on a Kubernetes node's network interface to test cluster, application, and control-plane resilience.
Reboot a Kubernetes node over SSH to test how the cluster handles sudden node loss, pod rescheduling, and stateful recovery.
Apply a temporary taint to a Kubernetes node to test toleration correctness, scheduling policies, and NoExecute eviction behavior.
Block selected API requests or responses on a target Kubernetes pod using path, method, header, query parameter, and source or destination filters to test client retry and failover behavior.
Add a configurable delay to selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client timeouts, retries, and tail-latency budgets.
Overwrite API request or response bodies on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client behavior under corrupted payloads.
Override API request or response headers on a target Kubernetes pod using path, method, query, and source or destination filters to test resilience to missing, altered, or unexpected header values.
Combine status code, header, and body modifications on selected API calls of a target Kubernetes pod in a single fault, with filtering by path, method, query, source, or destination.
Override the HTTP status code returned by selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client error handling and circuit-breaker behavior.
Inject a configurable error into a specific function of an instrumented application running in a Kubernetes pod so you can test how callers and dependents handle the failure.
Throw a configurable exception from a specific function of an instrumented application running in a Kubernetes pod so you can test how callers and dependents handle the failure.
Add a configurable delay to a specific function of an instrumented application running in a Kubernetes pod so you can test timeout, retry, and tail-latency behavior of callers.
Scale a Kubernetes workload's replicas up to a target count to test cluster capacity, node autoscaling, scheduling pressure, and rollback behavior.
Consume CPU on a target Kubernetes pod's container to test autoscaling, throttling, latency budgets, and noisy-neighbor tolerance.
Delete one or more pods of a Kubernetes workload to test replica availability, controller recovery, graceful termination, and disruption budgets.
Block DNS resolution for selected hostnames inside a target Kubernetes pod to test how the application handles upstream lookup failures and cluster DNS outages.
Redirect DNS lookups for selected hostnames inside a target Kubernetes pod to a different address to test how the application handles misdirected upstream traffic and cache poisoning.
Add a configurable delay to HTTP responses served by a target Kubernetes pod to test timeouts, retries, and tail-latency behavior at the application protocol layer.
Overwrite the HTTP response body returned by a target Kubernetes pod to test client behavior under corrupted, empty, or unexpected response payloads.
Override HTTP request or response headers served by a target Kubernetes pod to test client and server resilience to missing, altered, or unexpected header values.
Forcibly reset TCP connections carrying HTTP requests to a target Kubernetes pod to test client retry, connection-pool, and circuit-breaker behavior on abrupt disconnects.
Override the HTTP response status code returned by a target Kubernetes pod to test client error handling, retry classification, and circuit-breaker behavior on specific HTTP status codes.
Override file attributes (such as permissions, size, or ownership) returned by stat syscalls on a target Kubernetes pod's mounted volume to test how the application reacts to changed metadata.
Make filesystem syscalls on a target Kubernetes pod's mounted volume return a configurable error code, so you can validate how the application handles failed reads, writes, and opens.
Add configurable delay to filesystem syscalls against a target Kubernetes pod's mounted volume so you can test how the application behaves under slow storage.
Inject incorrect data into reads or writes against a target Kubernetes pod's mounted volume so you can validate how the application detects and recovers from silent data corruption.
Generate sustained filesystem read and write load inside a target Kubernetes pod to test how the application handles disk pressure, slow IO, and ephemeral storage exhaustion.
Generate sustained CPU load inside a JVM running in a target Kubernetes pod to test how the application behaves when its Java process is starved of CPU.
Cause Kafka producer or consumer calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen topic so you can test caller error handling.
Add a configurable delay to Kafka producer or consumer calls from a JVM running in a target Kubernetes pod, scoped by topic, so you can test timeout, back-pressure, and lag behavior under slow Kafka traffic.
Cause a specific Java method in a JVM running in a target Kubernetes pod to throw a configurable exception so you can test how callers handle the failure.
Add a configurable delay to every invocation of a specific Java method in a JVM running in a target Kubernetes pod so you can test how callers and dependents behave under slow methods.
Override the return value of a specific Java method in a JVM running in a target Kubernetes pod so you can test how callers behave when a method silently returns wrong data.
Cause MongoDB operations from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen database, collection, and operation so you can test caller error handling.
Add a configurable delay to MongoDB operations from a JVM running in a target Kubernetes pod, scoped by database, collection, and operation, so you can test timeout and back-pressure behavior under a slow MongoDB.
Cause Solace publisher or subscriber calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen topic or queue so you can test caller error handling.
Add a configurable delay to Solace publisher or subscriber calls from a JVM running in a target Kubernetes pod, scoped by topic or queue, so you can test timeout and back-pressure behavior under slow Solace messaging.
Cause JDBC calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen table and SQL operation so you can test caller error handling.
Add a configurable delay to JDBC calls from a JVM running in a target Kubernetes pod, scoped by table and SQL operation, so you can test timeout and back-pressure behavior under a slow database.
Force the JVM in a target Kubernetes pod to run garbage collection on a configurable schedule so you can test how the application behaves under repeated GC pauses.
Consume memory inside a target Kubernetes pod's container to test OOM behavior, eviction order, request handling under pressure, and limit enforcement.
Corrupt a configurable percentage of packets on a target Kubernetes pod's network namespace to test checksum, retransmit, and integrity behavior.
Duplicate a configurable percentage of packets on a target Kubernetes pod's network namespace to test idempotency and deduplication behavior.
Add a configurable delay to packets on a target Kubernetes pod's network path to test timeout, retry, and tail-latency behavior of upstream and downstream calls.
Drop a configurable percentage of packets on a target Kubernetes pod's network path to test retry, timeout, and failover behavior.
Apply a temporary Kubernetes NetworkPolicy to isolate a target pod from its peers, dependencies, or namespaces and test split-brain behavior.
Cap bandwidth on a target Kubernetes pod's network path to test throughput-sensitive workloads, batch jobs, and bandwidth-bound flows.
Delete a target RDS DB instance so you can test how applications behave when a database disappears permanently and how disaster-recovery procedures handle the loss.
Reboot a target RDS DB instance (with optional Multi-AZ failover) for a configurable duration so you can test how applications behave when their database restarts.
Expire one or more keys (or all keys) in a target Redis instance for a configurable duration so you can test how the application behaves when its cache is suddenly evicted.
Cap the maximum memory of a target Redis instance to force evictions and write errors so you can test how the application behaves when Redis runs out of memory.
Generate a configurable burst of cache-miss requests against a target Redis instance so you can test how the application and its downstream database behave when the cache is bypassed.
Run an arbitrary AWS Systems Manager document against a target EC2 instance selected by ID so you can inject custom chaos that is not covered by a dedicated fault.
Run an arbitrary AWS Systems Manager document against EC2 instances selected by tag so you can inject custom chaos against a logical group of hosts.
Shift the wall-clock time observed by selected processes inside a target Kubernetes pod to test application behavior under clock skew, token expiry, and time-based scheduling errors.
Blackhole all network traffic destined for specific IPs or hosts on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when a specific dependency is completely unreachable.
Stress a configurable number of CPU cores at a configurable percentage on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave under sustained CPU pressure.
Consume a configurable amount of memory (absolute or percentage) on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave under sustained memory pressure.
Add a configurable amount of latency to network traffic destined for specific IPs or hosts on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when the network is slow.
Drop a configurable percentage of network packets destined for specific IPs or hosts on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when the network is lossy.
Kill one or more processes (selected by PID or process name) on one or more Windows EC2 instances (selected by ID or tag) for a configurable duration so you can test how Windows-hosted workloads behave when their backing processes die.