Sep 1–2, 2021

Schedule

“Sh*^%# on Fire, Yo!”: A True Story Inspired by Real Events

Managing large-scale distributed systems comes with many challenges around security, compliance, logging, monitoring and capacity management. As your footprint expands and your customer base grows, expectations around uptime and availability rise sharply. Service level objectives (SLOs) and service level agreements (SLAs) are not just three-letter acronyms (TLAs) anymore; they become the mantra that you need to live and breathe.

Outage and your customer

An outage is an event that disrupts your customers’ experience. Big or small, every outage costs you one thing: customer trust. To keep that trust and not fill your customers with outrage, you must be prepared to fail fast and fail forward. You must accept that failure is inevitable while continuing to iterate and improve. Come join us for an interactive session where we’ll share our lessons learned across people, process and tech.

Brendan Aye

Technical Director, Platform Architecture, T-Mobile

James Webb

MTS, T-Mobile

Track: Agile Leadership