September 24-27, 2018  /  Washington, D.C.

Chaos Engineering for PCF

Cloud Foundry

Modern Internet-scale microservice architectures exhibit complex communication patterns and failure scenarios whose chaotic behavior (a.k.a. the Butterfly Effect) may lead to large-scale disruptive events. This complexity comes from the Pivotal Cloud Foundry (PCF) components, the services running on them, and the underlying infrastructure necessary to provide highly available compute, network, security, storage, and persistence services. For a distributed microservice architecture to function well, these elements must all work in tandem and tolerate failure. Systematically verifying that a system can tolerate failure requires a disciplined approach. One such approach is Chaos Engineering. This proposal demonstrates the approach and the custom tools T-Mobile is building to purposefully break systems, identify weaknesses, and take corrective actions. The tooling is an enhanced API on top of the ChaosLemur project for introducing more complex failure scenarios into the PCF environment.

September 26, 2018
4:20 pm - 4:50 pm
National Harbor 2-3

Karun Chennuri
Sr. Engineer, T-Mobile

Ramesh Krishnaram
Sr. Manager, Platform Engineering, T-Mobile