Chaos faults for Azure
Introduction
Azure faults disrupt resources that run on Microsoft Azure: Virtual Machines (standalone or VMSS), Azure Kubernetes Service (AKS) node pools, managed disks, App Service web apps, and Service Bus queues. Each fault calls the Azure Resource Manager API to inject the disruption, then reverses it cleanly at the end of the configured duration. Go to Azure authentication methods to set up service principals, workload identity, or managed identity, and to Azure fault permissions to grant the right RBAC roles.
Azure AKS node down
Deallocate a percentage of AKS worker VMs (selected by node pool and zone) for a configurable duration, then start them again.
Azure disk loss
Detach one or more managed data disks from their attached VMs for a configurable duration, then reattach them on the same LUN.
Azure instance CPU hog
Drive CPU utilization to a configurable target on one or more Azure VMs for a configurable duration via the VM run-command extension.
Azure instance IO stress
Drive sustained disk read/write IO on one or more Azure VMs for a configurable duration via the VM run-command extension.
Azure instance memory hog
Consume a configurable amount of memory on one or more Azure VMs for a configurable duration via the VM run-command extension.
Azure instance stop
Stop one or more Azure VMs (or VMSS instances) for a configurable duration, then start them again.
Azure AKS node down
Azure AKS node down resolves the VMSS instances backing an AKS cluster (filtered by TARGET_NODE_POOL_NAMES and TARGET_ZONES), picks NODE_AFFECTED_PERCENTAGE of them, deallocates them for TOTAL_CHAOS_DURATION seconds, then starts them again.Use cases
Azure disk loss
Azure disk loss detaches one or more managed data disks (VIRTUAL_DISK_NAMES) from their attached VMs for TOTAL_CHAOS_DURATION seconds, then reattaches them on the same LUN. OS disks are excluded by design.Use cases
Azure instance CPU hog
Azure instance CPU hog drives CPU utilization to CPU_LOAD percent (or saturates CPU_CORES cores) on the target VMs for TOTAL_CHAOS_DURATION seconds via the Azure VM run-command extension.Use cases
Azure instance IO stress
Azure instance IO stress drives sustained disk read/write IO on the volume mounted at VOLUME_MOUNT_PATH of the target VMs for TOTAL_CHAOS_DURATION seconds via the Azure VM run-command extension.Use cases
Azure instance memory hog
Azure instance memory hog consumes MEMORY_CONSUMPTION MB (or MEMORY_PERCENTAGE percent of RAM) through NUMBER_OF_WORKERS workers on the target VMs for TOTAL_CHAOS_DURATION seconds via the Azure VM run-command extension.Use cases
Azure instance stop
Azure instance stop deallocates one or more VMs listed in AZURE_INSTANCE_NAMES (or VMSS instance IDs when SCALE_SET=enable) for TOTAL_CHAOS_DURATION seconds, then starts them again.Use cases
Azure Service Bus queue state change
Azure Service Bus queue state change sets the status of one or more queues to disabled, sendDisabled, or receiveDisabled (via ACTION) for TOTAL_CHAOS_DURATION seconds, then restores Active.Use cases
Azure web app access restrict
Azure web app access restrict adds an Access Restriction rule (RULE_NAME) to one or more App Service web apps with ACTION against IP_ADDRESS_BLOCK for TOTAL_CHAOS_DURATION seconds, then removes the rule.Use cases
Azure web app stop
Azure web app stop calls the App Service stop API on one or more web apps for TOTAL_CHAOS_DURATION seconds, then starts them again.Use cases