Crafting Events
Events are at the core of a Proactive Ops platform. In this post I explore what to consider when designing your events.
Many teams aspire to build a collection of loosely coupled, domain-driven microservices. Some get there, while others struggle. Engineers spend countless hours defining the boundaries of their contexts, designing models and mapping out their endpoints.
One area that often doesn’t get the attention it deserves is event design.
Events are how services communicate. We have established standards for communication. When humans want to use written or spoken words, the standard they use is called language. In tech we have protocols. Both spoken languages and network protocols allow for efficient communication. Imagine a Ukrainian speaker trying to negotiate with a Portuguese speaker, trying to connect an acoustic coupler to a 5G handset, or using HTTP to send email. It’s 2023, so of course you can use HTTP to send email, but you get the idea - we need standards for communicating.
I won’t be reopening the fat vs thin events debate. Just use fat events and move on. Rather than looking at how much data goes into an event, I want to explore the structure of the event.
A Proactive Ops platform relies on the efficient sharing of state changes via events. Aligning on what those events should look like is key.
Envelopes
Back in the 20th century, when we used to send handwritten letters, those scrawls were placed inside an envelope, a stamp was affixed, and the envelope was dropped into a mailbox. A few days or weeks later, it would arrive as a pleasant surprise for the receiver. The front of the envelope would usually have the name, street, town or suburb, state, post code and possibly the country of the receiver. The back would contain the same details for the sender in smaller print.
This standard form of addressing mail made it easy for postal workers and sorting machines to route a message to its destination. It also made it easy for the receiver to notice if they got a letter intended for someone else.
We can wrap events in envelopes to make it easier to route the messages. An event bus checks each message against a list of rules and, if there is a match, sends it on to the target. Buses can support matching on payload properties too, but this is usually less efficient.
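As a minimal sketch of envelope-based routing on EventBridge using boto3, a rule can match on envelope fields like source and detail-type. The bus, rule, service and target names below are made up for illustration.

# Sketch: route on envelope fields (source and detail-type) rather than
# digging into the payload. All names here are hypothetical.
import json

import boto3

events = boto3.client("events")

pattern = {
    "source": ["warehouse-service"],
    "detail-type": ["parcel-packed"],
}

events.put_rule(
    Name="route-parcel-packed",
    EventBusName="internal-bus",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="route-parcel-packed",
    EventBusName="internal-bus",
    Targets=[{
        "Id": "dispatch-handler",
        "Arn": "arn:aws:lambda:ap-southeast-2:123456789012:function:dispatch-handler",
    }],
)

Matching on fields inside the payload is possible too, it just pushes more work onto the bus.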
As is the case with the traditional mail service, the carrier - not the sender - specifies the envelope and addressing standards. In most cases your choice of bus will dictate the envelope format you’re stuck with.
Many vendors are gravitating towards the CNCF’s CloudEvents specification as their event envelope standard. The notable exception here is Amazon, despite some initial interest in CloudEvents. It’s disappointing that AWS doesn’t support this emerging standard.
Let’s compare the JSON event envelopes used by CloudEvents and AWS EventBridge.
CloudEvents
{
"specversion": "1.0",
"type": "<Event producer. Reverse DNS prefix encouraged>",
"source": "<URI reference of producer>",
"id": "<Unique identifer for the event>",
"time": "<RFC 3339 formatted date time UTC>",
"datacontenttype": "application/json",
"data": "{/* payload as a JSON object */}"
}
EventBridge
{
"version": "0",
"id": "<UUID>",
"detail-type": "<Name/description of the event>",
"source": "<Service that emitted the event>",
"account": "<AWS account that emitted event>",
"time": "<ISO 8601 formatted date time UTC>",
"region": "<AWS region that emitted the event>",
"resources": [
"/* Optional list of relevant resources as strings. AWS events use ARNs */"
],
"detail": "{/* payload as a JSON object */}"
}
The two event formats are similar, and their structures reflect their origins. CloudEvents is more generic, while the EventBridge format is more AWS specific.
If you’re considering building your own internal event router or bus from scratch, stop. It’s not worth the effort. Use your cloud provider’s bus. If you really, really want to run your own bus, pick TriggerMesh or Apache EventMesh. They’re both open source and support CloudEvents.
What’s Inside?
Once the event gets to its destination, the handler needs to process it. This usually means someone has to mash on the keyboard to turn a JSON encoded string into a native object. That object is then passed around while further actions are taken based on the values of its properties. Efficient event design can reduce the amount of effort needed to process an event, which speeds up development cycle times.
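To make that concrete, here is a rough sketch of a handler that turns an EventBridge envelope (delivered to a Lambda target as an already decoded dict) into a small native object and branches on the event type. The event names and downstream actions are hypothetical.

# Sketch: deserialise the envelope into a native object, then act on it.
# Event names and downstream actions are illustrative only.
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    id: str
    source: str
    detail_type: str
    detail: dict[str, Any]


def parse(envelope: dict[str, Any]) -> Event:
    return Event(
        id=envelope["id"],
        source=envelope["source"],
        detail_type=envelope["detail-type"],
        detail=envelope["detail"],
    )


def notify_customer(detail: dict[str, Any]) -> None:
    print(f"notify customer about parcel {detail.get('parcel_id')}")


def schedule_pickup(detail: dict[str, Any]) -> None:
    print(f"schedule pickup for parcel {detail.get('parcel_id')}")


def handler(envelope: dict[str, Any], context: Any = None) -> None:
    event = parse(envelope)
    if event.detail_type == "parcel-dispatched":
        notify_customer(event.detail)
    elif event.detail_type == "parcel-packed":
        schedule_pickup(event.detail)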
If each service emits events with unique structures, it adds to the cognitive load of all engineers - both those producing events and those consuming them. “Oh that’s right, this is the parcel dispatched event from the logistics service, not the parcel packed event from the warehouse service. That’s why all the properties are camel cased 🤦”. You don’t want that!
There need to be some basic rules that all your internal events follow, and they need to be documented. Here are a few items to get you started:
- Are properties snake_case or camelCase? snake_case.
- What are the naming conventions for events? noun-verbed.
- Is there a common standard for naming entity ids? Is it entity_type_id or plain old id?
- Are there any standard properties all events include? Probably not a bad idea.
- Is there a consistent property that can be used as an idempotency key? That’s a really good idea (see the sketch after this list).
- Can a consumer obtain the chain of events the current one belongs to? It would be nice.
- How do engineers see all the events that services emit and their properties? We’ll cover that in another post.
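As a sketch of the idempotency key and event chain questions above, the snippet below dedupes on an event_id and reads correlation and causation identifiers. The property names and the in-memory store are assumptions for illustration, not a standard.

# Sketch: skip events we have already processed, and read the chain the
# event belongs to. Property names and the in-memory store are assumptions.
processed: set[str] = set()


def handle_once(event: dict) -> None:
    event_id = event["event_id"]  # the idempotency key
    if event_id in processed:
        return  # already handled, safe to drop
    processed.add(event_id)

    # correlation_id ties this event to the chain it belongs to,
    # causation_id points at the event that directly triggered it.
    chain = (event.get("correlation_id"), event.get("causation_id"))
    print(f"processing {event_id} in chain {chain}")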
Documenting Your Standards
Defining some basic standards makes it easier for everyone to know what to expect. Document your rules in a concise format. A numbered or bulleted list encourages succinct items. Numbered items can become easy shorthand in code reviews. Here’s an example you can draw inspiration from:
- The event schema must be documented
- We emit events, not commands
- Event names must be noun-verbed and never noun-verbing
- Thou shall not camelCase
- Every event must contain data, meta and error properties
- event_id is the idempotency key, so it must be present in every event
- Breaking backwards compatibility is a big deal, it should be the option of last resort
- Each service interacts with a single event bus
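To make the example rules concrete, here is what a conforming event might look like, expressed as a Python dict with a couple of sanity checks. The event itself, the entity names and the placement of event_id under meta are all assumptions for illustration.

# A hypothetical parcel-dispatched event that follows the example rules.
sample_event = {
    "event_name": "parcel-dispatched",  # noun-verbed, never noun-verbing
    "data": {
        "parcel_id": "parcel_01h8xa",   # snake_case, entity-prefixed id
        "dispatched_at": "2023-05-01T03:14:15Z",
    },
    "meta": {
        "event_id": "01h8xb",           # the idempotency key, always present
        "emitted_by": "logistics-service",
    },
    "error": None,                      # present even when there is nothing to report
}

# Quick checks against the rules above.
assert {"data", "meta", "error"} <= sample_event.keys()
assert "event_id" in sample_event["meta"]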
Put your rules at the top of your event standards documentation. It will be one of the first things new engineers read when they’re exploring your event system.
Versioning
Changing the envelope or common structure of your event schema isn’t trivial. Once events are flying around, it is very difficult to make radical changes. Teams depend on the format they use today.
Adding new common properties to the schema is relatively easy. Consumers should allow these additional properties, even if they don’t use them.
Documenting the schema assists event producers and consumers. Like events, documentation provides a lightweight interface for teams to collaborate. Use a common standard such as JSONSchema or OpenAPI to document the schema. Share the schema in an accessible location, such as a private or internal GitHub repository. Schemas should use semantic versioning.
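As a sketch, the common envelope could be described with JSONSchema, here expressed as a Python dict and checked with the jsonschema package. The $id URL, the version in the path and the required properties are placeholders, not a recommendation.

# Sketch: document the common event schema and validate events against it.
# The $id URL, version and property names are illustrative assumptions.
from jsonschema import validate

EVENT_SCHEMA = {
    "$id": "https://schemas.example.com/events/1.2.0/event.json",  # semver in the path
    "type": "object",
    "required": ["data", "meta", "error"],
    "properties": {
        "data": {"type": "object"},
        "meta": {
            "type": "object",
            "required": ["event_id"],
            "properties": {"event_id": {"type": "string"}},
        },
        "error": {"type": ["object", "null"]},
    },
    # Unknown properties are allowed, so adding a new common field is a
    # minor version bump rather than a breaking change for consumers.
    "additionalProperties": True,
}

event = {
    "data": {"parcel_id": "parcel_01h8xa"},
    "meta": {"event_id": "01h8xb"},
    "error": None,
}

validate(instance=event, schema=EVENT_SCHEMA)  # raises ValidationError if it does not conform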
The size of the organisation, the number of teams, the volume of events and the availability of budget are all factors that will influence the appetite for change. A small organisation with one or two teams is likely to face fewer barriers to making the change, but there may not be funding for the work. The inertia in a large organisation may result in a lengthy, potentially multi-year, transition where some events use the new schema and others use the legacy structure. Such a transition can result in higher cognitive load for all teams and slow delivery. This may be enough to kill the proposal.
Have Opinions
Event standards should be opinionated. There needs to be some consistency to make it easier for teams to collaborate at the boundaries of their services.
Domain specific events should mirror the models of the domain. They should be a public statement of the opinions of the team who built the service.
The producer decides what an event should contain, not the consumer. Each event should contain enough information for a consumer to make an informed decision, or at least the key identifiers needed to fetch the data to make that decision. While sourcehut’s webhook implementation is an interesting idea, it inverts the control of the event structure. This may save some bandwidth, but it adds complexity for both the consumer and producer.
This post focused on an event bus being the transport and routing layer for the events the systems emit and process. There are other transport layers and patterns, but the same principles apply to them. 🌊
Need Help?
If you want to adopt Proactive Ops, but you're not sure where to start, get in touch! I am happy to help get you started.
Proactive Ops is produced on the unceded territory of the Ngunnawal people. We acknowledge the Traditional Owners and pay respect to Elders past and present.