A DevOps War Story
A small team of Proekspert devs was set loose in the world of DevOps. Here’s how they built something beautiful despite the odds against them.
A client actually said this to us: “What would you change in our current software architecture to make it better? Make a proposal and we’ll give you free hands to do it.” Many developers dream of such a task, but the opportunity rarely arises. So, it will come as no surprise that we were very excited.
What’s to blame?
Our customer is a major telecom operator in our region, and we are building a number of their customer-facing systems, such as self-service portals. Such software is plagued by the same problems as any other complex system. On the one hand, we want to turn new ideas into reality as fast as possible. On the other, we have to be careful not to break things in the process. For example, an unresponsive or misconfigured background system might completely ruin the experience for many end customers. Ideally, these kinds of problems should be avoidable. But when they do happen, every second counts.
These were the main worries of our customer:
· How fast can we pinpoint a problem and fix it?
· How do we avoid such incidents in the first place?
We figured that these concerns stemmed from the lack of automation in the testing, delivery, and monitoring of software, the very same problem that was making our lives as developers hard. We had been living in a world of mysterious, pre-configured bare metal infrastructure, with scary, manual late-night releases and very involved incident investigation sessions. So, we set out to build a new architecture: one that avoided many such problems and made the remaining ones easier to fix, and one that would enable us to release software faster. This is what DevOps is all about.
But the journey did not start without doubts or fears. First, we didn’t have the flexibility of a startup. We couldn’t just move to the cloud, for example. Everything we required we would have to build ourselves on the infrastructure we already had, and we’d need to do it within the strict security confines of our system. It didn’t help that we were first and foremost developers with little previous experience in operations. Can you hope to succeed in building a modern system under these circumstances? This is our war story.
A fresh look
There are various ways to look at your infrastructure. You can see it as a set of carefully crafted machines that live and evolve with the system. Or you can take a colder attitude and see them as disposable machines necessary to run your apps, machines that can be destroyed and quickly rebuilt every time something goes wrong, or any time you just want to make a change. This notion can be described as immutable infrastructure, and moving towards it was our first major decision. Configuring environments on the fly, even when using scripts, had proven to be a challenge time and again. Misconfiguration had been a source of numerous issues and confusion. Making our environments disposable gave us a way to make a lot more sense of the environments we were working with.
The right tools would greatly reduce the effort needed to set up automated delivery of our software. Furthermore, we could keep our whole environment in version control. This basically means we have a full description of our software system at any given point in time, a very valuable asset for tracking and rollbacks in case of trouble. What was left to do was to find the right tools. Remember, we were still dealing with a limited number of bare metal machines. To get the immutability we were seeking, we opted for containers.
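To make the idea of disposable, version-controlled environments more concrete, here is a minimal Python sketch. It is not our actual tooling: the `EnvironmentSpec` class, the `build` function, the revision ids, and the image names are all hypothetical and exist only for illustration. The point is that an environment is never patched in place. It is always rebuilt from a recorded description, so a rollback is just a rebuild from an earlier revision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical, versioned description of one environment."""
    revision: str    # e.g. a version-control commit id (made up here)
    image: str       # container image to run (made up here)
    settings: tuple  # (key, value) pairs, kept immutable on purpose


def build(spec: EnvironmentSpec) -> dict:
    """Create a fresh environment from the spec; nothing is patched in place."""
    return {
        "revision": spec.revision,
        "image": spec.image,
        "settings": dict(spec.settings),
    }


# The history stands in for what version control would hold.
history = [
    EnvironmentSpec("rev-1", "portal:1.0", (("timeout_s", 30),)),
    EnvironmentSpec("rev-2", "portal:1.1", (("timeout_s", 5),)),
]

running = build(history[-1])  # deploy the latest revision
running = build(history[0])   # rollback: discard it and rebuild from an earlier one
```

Because the spec is frozen, the only way to change an environment in this sketch is to record a new revision and build again, which is exactly the discipline we were after.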
Tools of the trade
Containers gave us a way to package the necessary environment together with the application. Docker, the best known of the container engines, has become insanely popular. But for us Docker was more of a means to an end. With Docker came our first reality check. We quickly ran into problems that could be described as a leaky abstraction. That is, in order to put it to good use, we sometimes had to dig deeper into its inner workings than initially anticipated. For example, to debug a mysterious issue Docker had with the version of Linux we were running, we found ourselves knee-deep in the complexities of Docker’s storage drivers. But at the end of the day, we still have great faith in container technology.
Deploying a single containerized application is very convenient. But we were going to deploy lots of them. We needed some kind of orchestration. We also needed to balance the load among the applications and make sure applications could easily find each other. We researched several options and settled on Kubernetes. Kubernetes solved these problems and at the same time was still relatively focused in its feature set. We had grown wary of do-it-all solutions. Kubernetes provided its own opinionated model of the building blocks of a software system. But we found these opinions to be very agreeable. Kubernetes also provided the dearly missed declarative way of describing the software system we wished to run. Having an orchestration tool such as Kubernetes really starts to pay off once the number of apps starts to grow. If you are planning to go for a microservices architecture, it becomes absolutely essential.
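The declarative idea is easier to appreciate with a toy example. The Python sketch below is not Kubernetes code and does not use any Kubernetes API. The `reconcile` function and the application names are assumptions made up for illustration. It only shows the principle: you describe the state you want, and a control loop works out the steps needed to get there from whatever is currently running.

```python
def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[str]:
    """Return the actions needed to move observed replica counts to the desired ones."""
    actions = []
    # Look at every app that is either wanted or currently running.
    for app in sorted(desired.keys() | observed.keys()):
        want = desired.get(app, 0)
        have = observed.get(app, 0)
        if want > have:
            actions.append(f"start {want - have} x {app}")
        elif want < have:
            actions.append(f"stop {have - want} x {app}")
    return actions


# Hypothetical desired state (what we declare) and observed state (what runs now).
desired = {"self-service-portal": 3, "billing-api": 2}
observed = {"self-service-portal": 1, "billing-api": 2, "old-job": 1}

plan = reconcile(desired, observed)
# plan == ["stop 1 x old-job", "start 2 x self-service-portal"]
```

Running the same function again once the system matches the declaration yields an empty plan, which is what makes this style safe to repeat as often as you like.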
Once we had the first applications running on the new stack, we needed to monitor them. Without an investment in monitoring, a system can seem opaque. No single person has a clear overview of what’s really going on under the hood. So, we set up a centralized solution to collect logs and metrics from all of the applications using Elasticsearch and InfluxDB. With Kibana and Grafana, we had all the interesting data together on just two dashboards.
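Centralized logging works best when every application emits logs in the same machine-readable shape. The sketch below illustrates that idea with Python’s standard `logging` module. It is an assumption rather than a description of our actual pipeline: the `JsonFormatter` class, the field names, and the logger name are invented for the example, and the step that ships the lines to a store such as Elasticsearch is left out entirely.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "app": record.name,
            "message": record.getMessage(),
        })


# The in-memory stream stands in for stdout, which a log collector would read.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("self-service-portal")  # hypothetical application name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the example's output in our stream only

log.info("login failed for %s", "user-42")

# A collector could parse each line without guessing at its format.
event = json.loads(stream.getvalue())
```

With every application logging this way, a single query on a field such as `app` or `level` can cut across the whole system instead of one machine at a time.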
No pain, no gain
These are the tools we chose. But they did not come without cost. I already mentioned the gotchas we experienced with Docker. The difficulty with Kubernetes originates from its complexity, the same complexity that makes it extremely robust. Namely, it is made up of several smaller components, each with its own specific purpose. Each of them needs to be configured correctly to make the whole work seamlessly. This makes the setup a non-trivial task. While admirable effort is being put into making it easier with installation scripts, we felt the need to fully understand each small part. After all, we were the ones who would maintain it later on. So we rolled up our sleeves and set up Kubernetes from scratch.
Another hurdle was setting up the networking correctly. We were used to a much simpler life where the routes between applications were well known and few in number. With the new approach, there was no knowing which application would end up on which physical host. The physical hosts didn’t matter anymore. This made the network topology considerably more complex and setting up firewalls more challenging. The strict security regulations made us doubt the feasibility of the new setup. We were moving into operations territory previously foreign to us, and we had to take a crash course in networking and firewalls. In hindsight, we realized that fear has big eyes. As we gained more insight, the picture cleared and the networking turned out to be a non-issue.
An ongoing task is moving everything onto the new architecture. Of course, we didn’t tear down the old system altogether to replace it with the new architecture. You don’t spit in the old well before the new one is ready. We set up the new architecture alongside the old one and started migrating to it in small steps. The migration to the new system is, of course, a technical and social challenge in itself.
Time to count the chickens
So where are we today? Thanks to the new tools, we are able to release software faster and avoid many of the problems that come from making too many big changes at once. More automation helps to improve testing of the software, and hopefully we will see fewer bugs reach the end user. If things should start to go south, we want to know about it before the end customer does. The logging and metrics tools now in place will help with that.
As a developer, I’m really happy with the outcome. This undertaking gave me lots of confidence and knowledge of new areas. And personally, I feel that a lot of pieces that previously seemed out of place in software delivery have suddenly fallen into place. Software delivery doesn’t seem like a chore anymore.
If you are building on a cloud platform today, these benefits are probably already at your disposal. But even if you’re stuck on run-of-the-mill infrastructure, with a little effort you can still enjoy much of what modern technology has to offer. The benefits are real.