CF app container kill

CF app container kill is a Cloud Foundry chaos fault that terminates the container holding one or more instances of an app in a given organization and space. Cloud Foundry detects the missing instance and restarts it elsewhere in the Diego cluster, exercising the platform's self-healing behavior.

Use this fault to validate how the application and its consumers behave when an individual app instance disappears: whether peers absorb the load, whether the app rejoins the route map cleanly, whether SLOs hold during the rescheduling window, and whether alerts fire only when the platform fails to recover.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the Linux chaos infrastructure and run an experiment end to end.


Use cases

  • Instance loss resilience: Confirm peer instances absorb traffic when one container is killed.
  • Platform self-healing: Validate Cloud Foundry's Diego scheduler restarts the instance inside its expected window.
  • Connection draining: Verify in-flight requests on the killed instance fail cleanly (not silently hang) and clients retry.
  • Alert tuning: Tune health-check thresholds so single-instance failures do not page on-call.

Before you begin

  • Chaos infrastructure: A Linux chaos infrastructure (LCI) installed in one of the supported deployment models. Go to Cloud Foundry chaos deployment to read the options.
  • CF and BOSH credentials: The chaos infrastructure host has CF_*, UAA_SERVER_ENDPOINT, and BOSH_* credentials configured (see Authentication).
  • Target identifiers: You know the organization, space, and app name, and the boshDeployment that manages the CF cluster (find with bosh deployments).
  • Multiple instances recommended: The app runs more than one instance, so the experiment can validate peer absorption.

Supported environments

| Platform | Support status |
| --- | --- |
| Cloud Foundry (TAS, PCF, open-source) running on BOSH-managed Diego cells | Supported |
| Single-instance apps | Supported (no peer absorption check possible) |

Permissions required

| Action | Requirement |
| --- | --- |
| List apps the CF user can access | SpaceDeveloper, SpaceAuditor, OrgManager, or OrgAuditor; scopes cloud_controller.read or cloud_controller.admin |
| List BOSH deployments | BOSH user with bosh.read scope (typically admin or a read-only operator) |
| Establish a BOSH SSH session to a Diego cell | BOSH UAA token with bosh.ssh or bosh.admin scope |
| Locate and terminate the target container on the cell | Operator with SSH and sudo on the cell host |

Authentication

| Layer | Where to provide | Tunables |
| --- | --- | --- |
| Cloud Foundry API + BOSH director | /etc/linux-chaos-infrastructure/cf.env on the LCI host | CF_API_ENDPOINT, CF_USERNAME, CF_PASSWORD, UAA_SERVER_ENDPOINT, BOSH_CLIENT, BOSH_CLIENT_SECRET, BOSH_CA_CERT, BOSH_ENVIRONMENT |
| vSphere (only when faultInjectorLocation: vSphere) | /etc/linux-chaos-infrastructure/vsphere.env | GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD, GOVC_INSECURE, VM_NAME, VM_USERNAME, VM_PASSWORD |

Fault tunables

Required parameters

| Tunable | Description | Default |
| --- | --- | --- |
| deploymentModel | LCI placement model. One of model-1 or model-2. For model-1, boshDeployment and faultInjectorLocation are not required. Go to Cloud Foundry chaos deployment. | (required) |
| organization | CF organization that owns the app. | (required) |
| space | CF space within the organization. | (required) |
| app | App whose container instance is targeted. | (required) |
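
The dependency between deploymentModel and the BOSH-related tunables can be checked before an experiment is submitted. The following is a minimal Python sketch, not part of the product: the function name validate_inputs is hypothetical, and it takes the stricter reading that faultInjectorLocation must be set explicitly for model-2 even though a default of local is documented.

```python
# Hypothetical pre-flight check for the fault inputs described in this page.
VALID_MODELS = {"model-1", "model-2"}
VALID_LOCATIONS = {"local", "vSphere"}
REQUIRED = ("deploymentModel", "organization", "space", "app")


def validate_inputs(inputs: dict) -> list:
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = [f"missing required tunable: {key}" for key in REQUIRED if not inputs.get(key)]

    model = inputs.get("deploymentModel")
    if model and model not in VALID_MODELS:
        problems.append("deploymentModel must be model-1 or model-2")

    # model-1 does not need boshDeployment or faultInjectorLocation.
    if model == "model-2":
        if not inputs.get("boshDeployment"):
            problems.append("boshDeployment is required for model-2")
        location = inputs.get("faultInjectorLocation")
        if not location:
            # Assumption: require an explicit value rather than relying on the default.
            problems.append("faultInjectorLocation is required for model-2")
        elif location not in VALID_LOCATIONS:
            problems.append("faultInjectorLocation must be local or vSphere")
    return problems
```

For example, validate_inputs({"deploymentModel": "model-2", "organization": "dev-org", "space": "dev-space", "app": "cf-app"}) reports the two missing model-2 tunables.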

Chaos parameters

| Tunable | Description | Default |
| --- | --- | --- |
| signal | Termination signal sent to the container's main process. | SIGKILL |
| instanceAffectedPercentage | Percentage of app instances to target. Default of 0 means exactly one instance. Go to Instance affected percentage. | 0 |
| boshDeployment | BOSH deployment name that manages the Diego cells. Required for deploymentModel: model-2. | "" |
| faultInjectorLocation | Where the fault-injector runs. Supports local and vSphere. Required for deploymentModel: model-2. | local |
| faultInjectorPort | Local port used by the fault-injector. If unavailable, a random port in 50320-51320 is chosen. | 50320 |
| duration | Total chaos duration. | 30s |
| skipSSLValidation | Skip SSL validation when calling CF APIs. | false |
| rampTime | Wait period in seconds before and after the fault. | 0 |
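
The faultInjectorPort fallback can be pictured with a short Python sketch. It is an illustration under assumptions, not the fault-injector's actual code: the names choose_port and port_is_free are hypothetical, and whether the real implementation re-checks the random port for availability is not documented here.

```python
import random
import socket

PORT_RANGE = (50320, 51320)  # fallback range quoted in the table above


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def choose_port(preferred: int = 50320, is_free=port_is_free, attempts: int = 20) -> int:
    """Use the preferred port if it is free, else try random ports in PORT_RANGE."""
    if is_free(preferred):
        return preferred
    low, high = PORT_RANGE
    for _ in range(attempts):
        candidate = random.randint(low, high)  # both bounds inclusive
        if is_free(candidate):  # assumption: the fallback port is also checked
            return candidate
    raise RuntimeError("no free port found in the fallback range")
```

Passing is_free as a parameter keeps the selection logic testable without opening real sockets.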

Tunables that apply to every fault are documented in common tunables for all faults.


Fault execution in brief

The fault authenticates to Cloud Foundry and BOSH, identifies the Diego cell that hosts the target app instance(s), terminates the container holding each instance using the configured signal, and then waits for Cloud Foundry to reschedule the instance. The fault exits once the reschedule completes or duration elapses.


Expected behavior during fault execution

  • The targeted instance disappears from the app's instance list briefly; CF marks it CRASHED then STARTING.
  • The CF router stops sending requests to the killed instance until it returns to RUNNING.
  • Peer instances absorb traffic during the rescheduling window.
  • After recovery, the app's instance count returns to the configured value.

Signals to watch

  • App health: Use an HTTP probe on a route mapped to the app and assert a healthy 2xx response throughout the experiment.
  • Instance count: Use a command probe running cf app <name> and assert the instance count matches the desired state after recovery.
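
A command probe for the instance count needs to turn the output of cf app <name> into a pass or fail result. The sketch below shows one way to do that in Python. It assumes the output contains a line of the form instances: running/desired, as recent cf CLI versions print; the function names are hypothetical and the sample text in the usage note is illustrative, not captured output.

```python
import re

# Assumption: cf app prints a line such as "instances:       2/3".
INSTANCES_LINE = re.compile(r"^[ \t]*instances:[ \t]*(\d+)/(\d+)", re.MULTILINE)


def instance_counts(cf_app_output: str) -> tuple:
    """Return (running, desired) parsed from the text printed by cf app."""
    match = INSTANCES_LINE.search(cf_app_output)
    if match is None:
        raise ValueError("no 'instances: running/desired' line found")
    return int(match.group(1)), int(match.group(2))


def recovered(cf_app_output: str) -> bool:
    """True when every desired instance is reported as running."""
    running, desired = instance_counts(cf_app_output)
    return running == desired
```

Given text containing the line instances: 2/3, instance_counts returns (2, 3) and recovered returns False, which is the state to expect during the rescheduling window.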

Recovery and cleanup

  • CF automatically reschedules the killed instance. No manual cleanup is needed.
  • If the platform fails to reschedule within duration, investigate the Diego scheduler logs and the app's resource quotas.

Limitations

  • The killed container is restarted by Cloud Foundry, not by the fault itself. Recovery time depends on the cluster's scheduling capacity.
  • With instanceAffectedPercentage: 100 and an app that runs a single instance, brief downtime is expected (no peers to absorb traffic).
  • Requires BOSH access; standalone Diego deployments without BOSH are not supported.

Troubleshooting

CF app container kill fails to locate the target instance in Harness Chaos Engineering

Verify that boshDeployment matches a deployment name listed by bosh deployments. Run cf app <name> --guid to confirm the app exists in the given organization and space.

BOSH SSH session is rejected

Confirm the BOSH UAA client has the bosh.ssh (or bosh.admin) scope. Re-issue BOSH_CLIENT_SECRET if the existing token has expired.

App stays in CRASHED state after the fault ends

The platform attempted a restart and the app did not recover. Run cf logs <app> --recent to inspect crash output, then check Diego scheduler logs and the org/space quota.


Common configurations

Instance affected percentage

instanceAffectedPercentage controls how many instances of the app are targeted. The default 0 translates to exactly one instance. Set a value between 1 and 100 to target a percentage of the running instances.

apiVersion: litmuchaos.io/v1alpha1
kind: LinuxFault
metadata:
  name: cf-app-container-kill
  labels:
    name: app-container-kill
spec:
  cfAppContainerKill/inputs:
    duration: 30s
    deploymentModel: model-2
    faultInjectorLocation: vSphere
    app: cf-app
    organization: dev-org
    space: dev-space
    boshDeployment: cf
    instanceAffectedPercentage: 50
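
To reason about how many instances a given percentage touches, the following Python sketch may help. The rule that 0 means exactly one instance comes from this page; rounding up for fractional results and the minimum of one instance are assumptions of the sketch, since the exact rounding used by the fault is not documented here. The function name is hypothetical.

```python
def instances_to_target(running_instances: int, affected_percentage: int) -> int:
    """Estimate how many instances a given instanceAffectedPercentage selects."""
    if running_instances < 1:
        raise ValueError("the app must have at least one running instance")
    if not 0 <= affected_percentage <= 100:
        raise ValueError("instanceAffectedPercentage must be between 0 and 100")
    if affected_percentage == 0:
        return 1  # documented: the default 0 means exactly one instance
    # Assumption: round up, so any non-zero percentage hits at least one instance.
    return (running_instances * affected_percentage + 99) // 100
```

Under these assumptions, an app with 4 running instances and instanceAffectedPercentage: 50 has 2 instances targeted.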

Signal

signal controls how the container's main process is terminated. Use SIGKILL to simulate an abrupt crash, or SIGTERM to test graceful shutdown handling.

apiVersion: litmuchaos.io/v1alpha1
kind: LinuxFault
metadata:
  name: cf-app-container-kill
  labels:
    name: app-container-kill
spec:
  cfAppContainerKill/inputs:
    duration: 30s
    deploymentModel: model-2
    faultInjectorLocation: vSphere
    app: cf-app
    organization: dev-org
    space: dev-space
    boshDeployment: cf
    signal: SIGTERM

CF secrets

The Cloud Foundry secrets reside on the machine where the chaos infrastructure runs. Provide them in the /etc/linux-chaos-infrastructure/cf.env file in the following format:

CF_API_ENDPOINT=XXXXXXXXXXXXXXXXXXX
CF_USERNAME=XXXXXXXXXXXXXXXXXXXXXXX
CF_PASSWORD=XXXXXXXXXXXXXXXXXXXXXXX
UAA_SERVER_ENDPOINT=XXXXXXXXXXXXXXX
BOSH_CLIENT=XXXXXXXXXXXXXXXXXXXXXXX
BOSH_CLIENT_SECRET=XXXXXXXXXXXXXXXX
BOSH_CA_CERT=XXXXXXXXXXXXXXXXXXXXXX
BOSH_ENVIRONMENT=XXXXXXXXXXXXXXXXXX

Info: If the secrets file is not provided, the fault-injector attempts to derive the secrets from environment variables and the configuration file.

| ENV name | Description | Example |
| --- | --- | --- |
| CF_API_ENDPOINT | API endpoint for the CF setup | https://api.system.cf-setup.com |
| CF_USERNAME | Username for the CF user | username |
| CF_PASSWORD | Password for the CF user | password |
| UAA_SERVER_ENDPOINT | API endpoint of the UAA server for the CF setup | https://uaa.system.cf-setup.com |
| BOSH_CLIENT | BOSH client, used by the bosh CLI | admin |
| BOSH_CLIENT_SECRET | BOSH client secret, used by the bosh CLI | UBu9Fu3oW35sO6fw12auPH76gsRTy7 |
| BOSH_CA_CERT | File path of the BOSH CA certificate, used by the bosh CLI | /root/root_ca_certificate |
| BOSH_ENVIRONMENT | BOSH environment, used by the bosh CLI | bosh.corp.local |
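
A quick pre-flight check on cf.env can catch an empty or missing value before an experiment runs. The Python sketch below parses KEY=VALUE lines and reports which of the keys listed above are absent. Treating all eight keys as mandatory is an assumption of the sketch, and the function names are hypothetical.

```python
# Keys documented for /etc/linux-chaos-infrastructure/cf.env.
EXPECTED_KEYS = (
    "CF_API_ENDPOINT", "CF_USERNAME", "CF_PASSWORD", "UAA_SERVER_ENDPOINT",
    "BOSH_CLIENT", "BOSH_CLIENT_SECRET", "BOSH_CA_CERT", "BOSH_ENVIRONMENT",
)


def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list:
    """Return the expected keys that are absent or empty in the file content."""
    values = parse_env_file(text)
    return [key for key in EXPECTED_KEYS if not values.get(key)]
```

For example, missing_keys(open("/etc/linux-chaos-infrastructure/cf.env").read()) returns an empty list when every key has a value.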

Fault injector ENVs and config file

If the /etc/linux-chaos-infrastructure/cf.env file is not provided, the fault-injector attempts to derive the secrets from environment variables or a configuration file. When a secret is declared in more than one source, the source with the higher precedence wins. The sources, in order of decreasing precedence, are:

  1. /etc/linux-chaos-infrastructure/cf.env file
  2. Environment variables
  3. Configuration file

The configuration file should be provided at /etc/linux-chaos-infrastructure/cf-fault-injector.yaml:

cf-api-endpoint: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
username: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
password: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
uaa-server-endpoint: XXXXXXXXXXXXXXXXXXXXXXXXXX
bosh-client: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
bosh-client-secret: XXXXXXXXXXXXXXXXXXXXXXXXXXX
bosh-ca-cert: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
bosh-environment: XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The key names in the three formats map to each other as follows:

| cf.env | ENV | cf-fault-injector.yaml |
| --- | --- | --- |
| CF_API_ENDPOINT | CF_API_ENDPOINT | cf-api-endpoint |
| CF_USERNAME | USERNAME | username |
| CF_PASSWORD | PASSWORD | password |
| UAA_SERVER_ENDPOINT | UAA_SERVER_ENDPOINT | uaa-server-endpoint |
| BOSH_CLIENT | BOSH_CLIENT | bosh-client |
| BOSH_CLIENT_SECRET | BOSH_CLIENT_SECRET | bosh-client-secret |
| BOSH_CA_CERT | BOSH_CA_CERT | bosh-ca-cert |
| BOSH_ENVIRONMENT | BOSH_ENVIRONMENT | bosh-environment |
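
The precedence order and the key mapping can be combined in a short Python sketch. It is an interpretation, not the fault-injector's code: it assumes precedence is applied per secret (a value missing from a higher-precedence source falls through to the next one), and the function name resolve_secrets is hypothetical. Each argument is an already-parsed dictionary for one source.

```python
# cf.env name -> (environment variable name, cf-fault-injector.yaml key)
KEY_MAP = {
    "CF_API_ENDPOINT": ("CF_API_ENDPOINT", "cf-api-endpoint"),
    "CF_USERNAME": ("USERNAME", "username"),
    "CF_PASSWORD": ("PASSWORD", "password"),
    "UAA_SERVER_ENDPOINT": ("UAA_SERVER_ENDPOINT", "uaa-server-endpoint"),
    "BOSH_CLIENT": ("BOSH_CLIENT", "bosh-client"),
    "BOSH_CLIENT_SECRET": ("BOSH_CLIENT_SECRET", "bosh-client-secret"),
    "BOSH_CA_CERT": ("BOSH_CA_CERT", "bosh-ca-cert"),
    "BOSH_ENVIRONMENT": ("BOSH_ENVIRONMENT", "bosh-environment"),
}


def resolve_secrets(cf_env: dict, environ: dict, config: dict) -> dict:
    """Pick each secret from the highest-precedence source that defines it."""
    resolved = {}
    for name, (env_name, config_key) in KEY_MAP.items():
        # Order matters: cf.env first, then environment variables, then the config file.
        for source, key in ((cf_env, name), (environ, env_name), (config, config_key)):
            value = source.get(key)
            if value:
                resolved[name] = value
                break
    return resolved
```

With cf_env={"CF_USERNAME": "a"} and environ={"USERNAME": "b"}, the resolved CF_USERNAME is "a", because cf.env outranks environment variables.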

vSphere secrets

These secrets are provided only if vSphere is used as the deployment platform for CF.

The vSphere secrets reside on the machine where the chaos infrastructure runs. Provide them in the /etc/linux-chaos-infrastructure/vsphere.env file in the following format:

GOVC_URL=XXXXXXXXXXXXXXXXXXXXXX
GOVC_USERNAME=XXXXXXXXXXXXXXXXX
GOVC_PASSWORD=XXXXXXXXXXXXXXXXX
GOVC_INSECURE=XXXXXXXXXXXXXXXXX
VM_NAME=XXXXXXXXXXXXXXXXXXXXXXX
VM_USERNAME=XXXXXXXXXXXXXXXXXXX
VM_PASSWORD=XXXXXXXXXXXXXXXXXXX

| ENV name | Description | Notes |
| --- | --- | --- |
| GOVC_URL | Endpoint for vSphere | For example, 192.168.214.244 |
| GOVC_USERNAME | Username for the vSphere user | For example, username |
| GOVC_PASSWORD | Password for the vSphere user | For example, password |
| GOVC_INSECURE | Skip SSL validation for govc commands | For example, true |
| VM_NAME | Name of the vSphere VM where the fault-injector utility is installed | For example, cf-vm |
| VM_USERNAME | Username for the VM guest user | For example, root |
| VM_PASSWORD | Password for the VM guest user | For example, password |

Related faults

  • CF app stop: Stop the whole app instead of a single instance.
  • CF app route unmap: Disconnect the app from its route without touching containers.