Proactive Ops

Archive

Automated Policy Enforcement

Rules are mostly made to be broken and are too often for the lazy to hide behind.

― Douglas MacArthur

There is merit in MacArthur’s often-butchered quote about rules. Some rules are made to be broken.

Other rules are made for good reasons - such as security and privacy policies. For policies to be effective, they need to be enforced. There is no point in having policies that are ignored.

Enforcing policies with automation is seen as an easy path to compliance. But careful consideration needs to be given to every enforcement action.

#10
May 18, 2023
Read more

Managing CI/CD Configuration: Makefile or CI Native?

For many years I used Makefiles as an abstraction layer in my CI/CD tooling. make allows the same sequence of actions to be run both by the CI pipeline and developers locally. This is great for consistency and provides a fast feedback loop for the team.

Using Makefiles as an abstraction has saved me a tonne of effort when migrating between CI/CD platforms on short notice. It was as easy as grabbing the example config files for the languages I was using, adding make build, make test and a few other steps, and I was ready to test. When migrating at scale, a script can raise templated pull requests for a list of code repositories.

Makefiles are useful when using one platform to test and another for deployment. For a long time, the easiest and most secure way to deploy to AWS was with CodePipeline, but the best developer experience for running tests was a dedicated CI platform. A Makefile ensured the same steps were used to test changes with both toolsets.
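The kind of Makefile described above can be sketched in a few lines. The tool commands are placeholders, assuming a Go project - swap in whatever your stack uses. CI config and developers then both run the same make build and make test targets:

```make
# Shared entry points for CI pipelines and local development.
# Commands are illustrative; replace with your project's tooling.
.PHONY: build test lint

build:
	go build ./...

test:
	go test ./...

lint:
	golangci-lint run
```

Because the CI config only ever calls make targets, migrating platforms means porting a handful of one-line steps rather than the whole build logic.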

Enter GitHub Actions

#9
May 3, 2023
Read more

Pivoting

This week I’m going to keep things short.

Over the last few weeks I’ve been reflecting on Proactive Ops. I am going to move to publishing fortnightly. This gives me more opportunities to spread the word about proactive ops on other platforms.

If you want to read an article on something related to Proactive Ops this week, I recommend Chris Allen’s Serverless Toolbox post. It is a good example of building a feature for reuse, especially when combined with the Lambda Invoke API. I disagree with Chris’s API client, though - I have a different approach to solving that problem coming soon.

Next week will be the first of the fortnightly posts.

#8
April 26, 2023
Read more

Amazon EventBridge as Your Proactive Ops Engine

Three lego buses on a road. One of them is full of pink bricks. Another bus has some pink bricks behind it.

One of the key activities in Proactive Ops is analysing events to identify potential issues and remediating them without human intervention. These events need to be collected and routed to handlers. If you are building your Proactive Ops platform on AWS, Amazon EventBridge (née Amazon CloudWatch Events) is a foundational component of your tech stack.

Rather than bore you with all the details of Amazon EventBridge, I will give you a quick run-through of the key features. EventBridge is a managed, scalable event bus that includes the following features:

  • Native integrations with over 200 Amazon services, as well as third party SaaS products and your own applications
  • Powerful filtering and routing of events
  • Event routing between AWS regions and accounts
  • Event archiving and replay for backfilling new services and recovering from failures
  • Dead letter queues for handling undeliverable events
  • Schema Registry for capturing and sharing event structure data between teams
  • Integration with X-Ray for distributed tracing and observability

There is so much to learn about EventBridge that I can’t cover it all in this post. When David Boyne releases his book Amazon EventBridge: The Definitive Guide, grab a copy. In the meantime, check out the official Amazon EventBridge documentation for all the stuff I skipped.
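The filtering mentioned above works by matching an event pattern against each event. As a rough illustration (not the real implementation), the basic exact-match semantics can be sketched in Python - a pattern’s leaf values are lists of acceptable values, and nested objects are matched recursively:

```python
# Sketch of EventBridge's basic exact-match filtering semantics.
# Real patterns also support prefix, numeric, and anything-but matching;
# this only covers "the event's value must be one of the listed values".

def matches(pattern: dict, event: dict) -> bool:
    for key, condition in pattern.items():
        if key not in event:
            return False
        if isinstance(condition, dict):
            # Nested pattern object: recurse into the event's sub-object.
            if not isinstance(event[key], dict) or not matches(condition, event[key]):
                return False
        elif event[key] not in condition:
            # Leaf condition: a list of acceptable values.
            return False
    return True

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"state": "terminated"},
}
pattern = {"source": ["aws.ec2"], "detail": {"state": ["terminated", "stopped"]}}
print(matches(pattern, event))  # True
```

A rule with that pattern would route terminated and stopped instance events to its targets and silently drop everything else.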

#7
April 19, 2023
Read more

You Don't Need More Than Two Environments

Duplo scene. Woman driving a yellow truck to collect bricks from a conveyor belt. Crane operator moving brick from forklift to the conveyor belt.

Traditional thinking in application development is that there should always be at least three environments - dev, stage and prod(uction). Some teams or organisations use different labels, but the purpose of each environment is the same.

For over 5 years, my teams have used two environments with over 100 microservices. These environments are strictly segregated, so dev can’t access production data or services and vice versa. If a service integrates with an external service, it connects to a dedicated instance. This strict separation avoids the inevitable “whoops! [Name] just messed up [important data] in prod” moments.

Testing with Two Environments

You’re probably wondering how we test with only two environments.

#6
April 12, 2023
Read more

Experimenting Doesn’t Need a Lab

Flat bottom flask, 2 beakers, 6 test tubes, all filled with lego bricks.

experiment (noun):

an operation or procedure carried out under controlled conditions in order to discover an unknown effect or law, to test or establish a hypothesis, or to illustrate a known law

Source: https://www.merriam-webster.com/dictionary/experiment

#5
April 5, 2023
Read more

Crafting Events

Photograph of a Lego scene. Woman helping child post letters while a bus passes on the street. Another child skates past. A bird is sitting in a palm tree.

Many teams aspire to build out a collection of loosely coupled, domain-driven microservices. Some get there, while others struggle. Engineers spend countless hours defining the boundaries of their contexts, designing models and mapping out their endpoints.

One area that often doesn’t get the attention it deserves is event design.

Events are how services communicate. We have established standards for communication. When humans want to use written or spoken words, the standard they use is called language. In tech we have protocols. Both spoken languages and network protocols allow for efficient communication. Imagine a Ukrainian speaker trying to negotiate with a Portuguese speaker, trying to connect an acoustic coupler to a 5G handset, or using HTTP to send email. It’s 2023, so of course you can use HTTP to send email, but you get the idea - we need standards for communicating.

I won’t be reopening the fat vs thin events debate. Just use fat events and move on. Rather than looking at how much data goes into an event, I want to explore the structure of the event.
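To make the structure question concrete, here is a hypothetical fat event sketched in Python. The envelope and field names are assumptions for illustration, not a published standard - the point is the split between a metadata envelope for routing and tracing, and a data payload complete enough that consumers don’t need a follow-up API call:

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical "fat" event. The metadata envelope carries cross-cutting
# concerns (identity, type, source, tracing); the data payload carries
# the full entity state.
event = {
    "metadata": {
        "id": str(uuid.uuid4()),
        "type": "customer.address.updated",
        "source": "crm-service",
        "time": datetime.now(timezone.utc).isoformat(),
        "correlation_id": str(uuid.uuid4()),
    },
    "data": {
        "customer_id": "cust-42",
        "address": {"street": "1 Example St", "city": "Melbourne"},
    },
}

print(json.dumps(event, indent=2))
```

Consumers can route and trace on the envelope alone, while the domain payload stays isolated under a single key that can evolve with the owning service.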

#4
March 29, 2023
Read more

Step Functions: Unix Pipes in the Cloud

Lego Super Mario Bros with Mario emerging from a pipe, a 1UP being triggered, a goomba under a cloud, and a cloud emerging from another pipe.

Pipes are a key component of Unix. Pipes pass text streams from one process to the next. This allows users to execute a series of commands where the output of one command becomes the input of the next. Pipes can help you find the 5 processes consuming the most CPU time. They can also fetch a webpage and output all the external links in alphabetical order. The possibilities are almost endless.
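The philosophy translates beyond the shell. As a rough sketch, a Unix-style pipeline can be mimicked in Python by chaining generators, each stage consuming the stream produced by the previous one (the stages here are illustrative):

```python
# Each stage takes an iterator and yields to the next, mirroring how a
# shell pipeline streams text from one process to the next.

def cat(lines):
    yield from lines

def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line

def sort(lines):
    yield from sorted(lines)

log = ["GET /home", "POST /login", "GET /about"]
# Equivalent of: cat log | grep GET | sort
result = list(sort(grep("GET", cat(log))))
print(result)  # ['GET /about', 'GET /home']
```

As with real pipes, each stage does one thing and knows nothing about its neighbours, so stages can be recombined freely.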

Back in 2007, Yahoo! was so inspired by Unix pipes that it released its own product, “Pipes”. Until it was shut down in 2015, Yahoo Pipes allowed users to fetch data from the web and “remix” it by passing it through a series of commands. It was the first product implementing the pipes philosophy for the web to build a mass user base.

Streaming JSON not Text

#3
March 22, 2023
Read more

Lambda Invoke: The Minimal Loveable API

Lego crane lifting bricks into position while woman guides the crane operator.

This post is the first in a series that will explore the technical foundations of an automation platform that supports Proactive Ops. This series will primarily focus on AWS, but the concepts can be applied to other cloud platforms.

Monolithic web apps often feature a REST API bolted onto the side. When first adopting serverless patterns, teams often implement API-first microservices. Using this common pattern can reduce the cognitive load during a period of significant change. At the same time, API-first isn’t always the most efficient way for services to interact. This is especially true if the services are only consumed by internal components of the application.

Before we look at alternatives, let’s compare the monolithic flow to one for serverless microservices. On the surface, these two approaches appear to be very similar. While the logic is broken out differently, the key flow is a request hitting an API and getting a response. This simplification is deceiving.
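One alternative worth previewing is calling a function directly. Here is a sketch of a direct, synchronous Lambda invocation; the invoke_service helper and the function name are hypothetical, and in production the client would be boto3.client("lambda") - a stub stands in here so the flow runs without AWS credentials:

```python
import io
import json

# Sketch of a direct, synchronous Lambda-to-Lambda call using the
# Invoke API, with the client injected so it can be stubbed out.

def invoke_service(client, function_name: str, payload: dict) -> dict:
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous request/response
        Payload=json.dumps(payload).encode(),
    )
    # The real boto3 response's Payload is a streaming body; read and decode.
    return json.loads(response["Payload"].read())

class StubLambdaClient:
    """Mimics the small slice of the boto3 Lambda client used above."""
    def invoke(self, FunctionName, InvocationType, Payload):
        request = json.loads(Payload)
        return {"Payload": io.BytesIO(json.dumps({"echoed": request}).encode())}

result = invoke_service(StubLambdaClient(), "orders-service", {"order_id": 7})
print(result)  # {'echoed': {'order_id': 7}}
```

The caller skips the API gateway layer entirely - there is no HTTP request, routing or authentication middleware between the two services, just a function call with a JSON payload.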

#2
March 15, 2023
Read more

What is Proactive Ops?

Photograph of a Duplo aeroplane with two technicians performing maintenance work. The plane is flown by a pig. A chicken is watching the workers. Safety barriers protect the worksite.

At its core, traditional IT Operations is overwhelmingly reactive - humans respond to incidents and tickets. There is so much noise! Alerts, tickets, audits, chat threads and more. What if ops could catch the small things before they turn into big problems? What if your ops team built tools that allowed them to be proactive?

A Proactive Ops team is an IT Ops team engaged in software engineering. There will still be tickets and alerts for the team to deal with, but there should be fewer issues fighting for the team's attention. With most routine issues resolved by scripts, only the remaining tickets need a human to resolve them.

Many organisations have embraced some version of the DevOps culture of shared ownership. Some teams took this too far - all developers have full access to all systems, so everything is always on fire. In other organisations, DevOps became a job title for a sys admin who builds out and deploys to cloud environments. Then there is the enterprise: in this space, Ops is often outsourced. Rather than focusing on needs and outcomes, engineers narrowly follow documented processes.

#1
March 8, 2023
Read more
Find Proactive Ops elsewhere: GitHub Twitter Linkedin Dave Hall Consulting