Obviously a Major Malfunction... Lessons 35 Years after the Challenger Disaster

Workshop Stage 1

Robert Barron
IBM, AIOps Lead, IBM Garage for Technical Solution Acceleration

Robert works for IBM, helping clients improve their IT Operations. He is an SRE and AIOps evangelist who enjoys helping others solve problems even more than he enjoys solving them himself.

Robert has over 20 years of experience in IT development & operations and is happiest when learning something new. He lives in Israel with his wonderful wife and two children.
His hobbies include history, space exploration, and bird photography.

The Space Shuttle was the most advanced machine ever designed. It was a triumph and a marvel of the modern world.

And on January 28, 1986, the shuttle Challenger disintegrated seconds after launch. This session will discuss how and why the disaster occurred and what lessons modern DevOps and Site Reliability Engineers should learn from it.

The Challenger disaster was not only a failure of the technology, but also a failure of the engineering and management culture at NASA. While engineers were aware of problems in the technology stack, there was not enough awareness of the risks they actually posed to the spacecraft. Management had shifted the focus from “prove that it’s safe to launch” to “prove that it’s unsafe to launch”.

This session will present the risk analysis (or lack thereof) of the Shuttle program and draw parallels to modern software development. In the end, launching a shuttle is an extremely complex deployment to the cloud… and above it.