Wednesday, April 28, 2021
BUSINESS PROBLEM & CHALLENGE
Network automation was neither well practiced nor well understood within our network engineering team, but it was sorely needed. We needed to reduce the effort and mistakes involved in daily management tasks by minimizing direct human interaction with network devices. High on our list of goals were improving network security by recognizing and fixing security vulnerabilities, and increasing network performance.
HOW WE OVERCAME THE CHALLENGE
We started by simplifying daily workflows, baselining our configurations and removing snowflakes. While this can be very labour-intensive at the outset when you’re working on a global scale in a highly critical customer environment, the long-term benefits far outweighed the labour.
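To make "baselining" and "snowflakes" concrete, here is a minimal sketch of how a baseline check could look. It is an illustration under stated assumptions, not the tooling described in this post: the baseline lines, hostnames, addresses and the `baseline_drift` and `find_snowflakes` helpers are all hypothetical, and the comparison is a simple line-by-line match that ignores ordering and nested configuration blocks.

```python
from typing import Dict, List, Tuple

# Hypothetical baseline: lines every device is expected to carry.
BASELINE = """\
ntp server 192.0.2.123
logging host 192.0.2.50
no ip http server
"""

# Hypothetical running configs keyed by hostname. In practice these
# would be collected from the devices rather than hard-coded.
RUNNING_CONFIGS = {
    "edge-rtr-01": BASELINE,
    "edge-rtr-02": """\
ntp server 192.0.2.99
logging host 192.0.2.50
""",
}


def baseline_drift(baseline: str, running: str) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) lines relative to the baseline."""
    base = [line.strip() for line in baseline.splitlines() if line.strip()]
    run = [line.strip() for line in running.splitlines() if line.strip()]
    missing = [line for line in base if line not in run]
    unexpected = [line for line in run if line not in base]
    return missing, unexpected


def find_snowflakes(
    baseline: str, configs: Dict[str, str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each deviating hostname to its drift; compliant devices are omitted."""
    snowflakes = {}
    for hostname, running in configs.items():
        missing, unexpected = baseline_drift(baseline, running)
        if missing or unexpected:
            snowflakes[hostname] = (missing, unexpected)
    return snowflakes


for host, (missing, unexpected) in find_snowflakes(BASELINE, RUNNING_CONFIGS).items():
    print(host, "missing:", missing, "unexpected:", unexpected)
```

A report like this gives a worklist of devices to bring back in line with the baseline before any automated change is pushed to them.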
Next, we created an inventory file which listed all network devices by type, model, location and IP address. This enabled us to retrieve information about devices and, using network programming and automation, deploy to all devices or only a subset (e.g. only those in a specific area), depending on what was needed. The benefit was that we avoided logging into hundreds of different devices to configure each one manually.
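The idea of an inventory that can be filtered down to a subset of devices can be sketched as follows. This is only an illustration under assumptions: the CSV layout, the `hostname` column, the sample devices and addresses, and the `load_inventory` and `select_devices` helpers are hypothetical. Only the type, model, location and IP address fields come from the description above.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Hypothetical inventory contents; a real inventory would be read from a file.
INVENTORY_CSV = """\
hostname,device_type,model,location,ip_address
edge-rtr-01,router,model-a,london,192.0.2.1
core-sw-01,switch,model-b,london,192.0.2.10
edge-rtr-02,router,model-a,sydney,198.51.100.1
"""


@dataclass(frozen=True)
class Device:
    hostname: str
    device_type: str
    model: str
    location: str
    ip_address: str


def load_inventory(text: str) -> List[Device]:
    """Parse CSV text into Device records, one per row."""
    return [Device(**row) for row in csv.DictReader(io.StringIO(text))]


def select_devices(devices: List[Device], **criteria: str) -> List[Device]:
    """Return devices whose fields equal every given criterion.

    With no criteria, all devices are returned.
    """
    return [
        device
        for device in devices
        if all(getattr(device, field) == value for field, value in criteria.items())
    ]


devices = load_inventory(INVENTORY_CSV)

# Target only the routers in one location instead of the whole estate.
for device in select_devices(devices, device_type="router", location="london"):
    print(device.hostname, device.ip_address)
```

The list returned by `select_devices` is what a deployment script would loop over, so the same change can be aimed at every device or at a single site just by changing the filter.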
Overcoming these two big challenges set us up for success and enabled us to deploy at a global scale. We lived by the mantra:
“If it’s not repeatable, it’s not automatable. And if it’s not automatable, it’s not scalable.”
LEARNINGS AND MEASURABLE OUTCOMES
So what did we learn? For starters, it can be hard to automate a use case or test in the same way you would perform it manually. Testing that requires physical action, for example simulating the loss of service provider links or a hardware failure, is also a challenge, as automating something like that is very tricky. We also learned that code reviews are extremely important, because shared code ownership means the entire team can make changes anywhere, at any time.
And what were the measurable outcomes?
Faster deployment times - we were able to efficiently push changes to over 300 network devices and audit the configuration of our global network, cutting execution time from days down to hours.
Removed the fear of large and complex network changes - the accuracy and efficiency with which we were able to deploy at scale gave the business and its leadership more confidence in subsequent large-scale network changes and deployments.
Faster feedback on network changes - by treating infrastructure as code (IaC), we could put network configuration changes under version control and have them peer reviewed.
Helped us meet challenging PSIRT/CSIRT timeframes for addressing security vulnerabilities.
Speed of deployment; speed of feedback on network changes; speed of adherence to PSIRT/CSIRT timeframes; confidence and buy-in from senior leadership on subsequent deployments!