# EDA: ‘Event-Oriented Architecture’ Manifesto for Distributed Enterprise Applications

Cameron HUNT
12 min read · Jan 24, 2021


This Manifesto builds upon and adds further detail to its companion, the ‘Event-Driven Architecture’ Manifesto for Distributed Enterprise Applications. Both take the February 2017 Blog by respected industry expert Martin Fowler, What do you mean by “Event-Driven”?, as their starting point.

The ‘Event-Driven Architecture’ Manifesto for Distributed Enterprise Applications declares four fundamental principles:

1) ‘Always Asynchronous’ communication between software components

2) Each ‘Entity’ should be owned by a single ‘Owner-Component’

3) Exposure by individual software components of Entity Type-level, asynchronous ‘Macroservices’ to other software components of the Distributed Application

4) Use of an ‘Entity Query Store’ as a read-only, shared persistence layer

We will now cover each of these principles in further detail.

## Always Asynchronous?

Martin refers to ‘Asynchronous Communication’ in his Blog as “a known complexity-booster”. Given that the Blog — What do you mean by “Event-Driven”? — is unmistakably aimed at clarifying terms heard within the ‘Event-Driven’ context, I would like to begin at the very beginning, with the term ‘Asynchronous’. Let’s start with a definition provided indirectly by Martin in his Blog: “a marked separation between the logic flow that sends the event and any logic flow that responds to some reaction to that event”.

Whilst such use of the term is common, this isn’t in fact the definition of ‘Asynchronous’, which is defined by the Oxford Dictionary as: “Not existing or occurring at the same time”. I state this because “a marked separation between … the event and … some reaction to that event” of a few milliseconds — which is very typical — means the reaction is fairly obviously “occurring at the same time”. This might sound pedantic, but it actually helps to explain the lack of clarity around another common term: ‘Real-Time’. What is interesting about ‘Real-Time’ operations today is that the best, and maybe the only, way we know how to build them is by making such operations ‘Event-Triggered’. Given this, the best technique we have today to facilitate ‘Real-Time’ operations is actually premised upon “a marked separation between … the event and … some reaction to that event” being triggered within milliseconds. Such a rapid “reaction to that event” will often be referred to — not necessarily inaccurately — as “synchronous”.

It would appear that the concept that we are actually trying to explain when we use the term ‘Asynchronous’ — within the context of EDA — might be better termed: ‘Monologueous Communication’. I will use Martin’s own words, from the same Blog, to define this new term: “the source system doesn’t really care much about the response. Often it doesn’t expect any answer at all, or if there is a response that the source does care about, it’s indirect”. Such a definition is far closer to the idea that we are actually trying to describe: ‘Asynchronous Communication’ very often expects a response, whilst ‘Monologueous Communication’ rarely expects a response. To take the point a little further, we should note that events are typically published to Brokers synchronously, via request and response, and then often exchanged with subscribers synchronously. Should such events therefore be labelled synchronous or asynchronous? In the name of clarity, probably neither.

Now for the hard part: once events are posted by a ‘Publisher’ to a given Broker, does that Broker ‘Push’ those new events to their ‘Subscriber(s)’, or do its clients need to ‘Pull’ new events from the Broker (via polling)? Whilst such a distinction may seem trivial, it utterly determines the list of possible Brokers that can be used when constructing a Distributed Application. To provide some real-world context, RabbitMQ — the World’s most widely deployed open-source Broker — will ‘Push’ new events to connected ‘Subscriber(s)’, whilst Kafka — another World-class open-source Broker — allows its clients to ‘Pull’ events from the Broker themselves, at a time of their choosing. You may have noticed that I use the term ‘Subscriber’ in the Push-scenario, but ‘Client’ in the Pull-scenario. This distinction is also fundamental in the context of ‘Event-Driven’ Architecture.
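To make the distinction concrete, below is a minimal sketch of the two interaction styles in Python, assuming brokers running on localhost; the ‘quotes’ queue/topic and the group name are purely illustrative, and these are sketches of the two styles rather than production consumers.

```python
# PUSH: RabbitMQ delivers each new message to the Subscriber's callback.
import pika  # assumes a RabbitMQ broker on localhost

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='quotes')  # illustrative queue name

def on_event(ch, method, properties, body):
    print(f"Pushed to me: {body}")     # the broker drives this call

channel.basic_consume(queue='quotes', on_message_callback=on_event, auto_ack=True)
channel.start_consuming()              # idle until the broker pushes work
```

```python
# PULL: the Kafka client polls the broker at a time of its own choosing.
from confluent_kafka import Consumer   # assumes a Kafka cluster on localhost

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'quote-service',       # illustrative group name
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['quotes'])

while True:
    msg = consumer.poll(timeout=1.0)   # the client drives retrieval
    if msg is None:
        continue                       # we polled, and there was nothing new
    print(f"Pulled by me: {msg.value()}")
```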

This subtlety is best described by way of an analogy. If you ‘Subscribe’ to The Guardian, then you pay an annual fee in order to wake up every morning and be able to find a copy of that newspaper somewhere in your front garden. So what about a second scenario, where you pay The Guardian an annual fee, but must nonetheless put on your bathrobe and walk to the newsagent every morning in order to collect your newspaper? And what if your annoying neighbour, on the other hand, didn’t pay the annual fee, but instead walks to the same newsagent as you each morning to collect his own copy of the same newspaper, paying cash instead; refusing to ‘subscribe’? You probably get the point: only ‘Push’ supports true subscriptions. ‘Pull’ does not, as it requires constant ‘Polling’ of the Broker to be performed by the clients themselves; yet one of the key advantages of a true EDA is that computing resources should remain idle until there is certain work to be performed (vs. endless polling). You probably find this analogy so self-evident that you wonder why you bothered reading it? So why then are we told in Confluent’s “Apache Kafka 101” that Kafka is “a distributed pub/sub messaging system”? What’s more, if Kafka is a “messaging system”, why does it store “records” and not ‘messages’ (or ‘events’)? And why does Kafka use ‘Partitions’, and not ‘Message Queues’?

Does it matter? Some of you will have noticed the subtle change in title from ‘Event-Driven Architecture’ Manifesto, to ‘Event-Oriented Architecture’ Manifesto. This is because it seems equally self-evident that in a ‘Pull’ scenario — which is always based upon polling of the Broker by the client (at a frequency of its own choosing) — nothing can ever possibly be “Driven” by events. The client will never even know if there are new events until it performs its next poll of the Broker; it is the client itself that ‘drives’ operations (just like the good-old-days). Should we ever care to use terms accurately, only ‘Push’ scenarios — genuine ‘Publish-Subscribe’ topologies — can fully support a true, real-time ‘Event-Driven Architecture’. Only when new events that are published on the Broker are immediately pushed to the nominated ‘Subscriber(s)’, can any operations ever be “Driven” by those events. To conclude: only true ‘Pub/Sub’ topologies can ever enable ‘Event-Driven’ operations, as only they guarantee that new events will be pushed/driven to their registered Subscriber(s) exactly when they occur, by design; everything else is ‘Event-Oriented’.

Now that the jargon has hopefully been demystified, it’s time for a surprising conclusion: real-time ‘Publish-Subscribe’ topologies are usually not the best choice for ‘Distributed Enterprise Applications’ — the best choice is instead an ‘Event-Oriented’ Architecture. Publish-Subscribe topologies are not only the best choice, but indeed a necessary choice, in cases where multiple instances of the same Software Component run in parallel — as is the case with Microservices — each working on the same inbound messages, which must be consumed ‘At Most Once’ (so-called ‘Competing Consumers’). In the case of Enterprise Applications (e.g. ERP Procurement), however, there will typically only ever be a single instance of each individual Software Component.

In this latter case, ‘Pub/Sub’ would add a great deal of complexity to the Distributed Application, but no value (see ‘Long Polling’), whilst simultaneously hampering flexibility. It would likewise risk overloading some of the target software components, for which there can be no horizontal scalability — unlike the case of Microservices. Interestingly, this fundamental distinction is not between ‘Microservices’ and ‘Monoliths’, but between ‘Micro-Servers’ (i.e. Containerization) and ‘Mono-Servers’ (where Events can only ever be processed by one consumer, and therefore only ever one time).

The ‘Push’ scenario requires the use of ‘Message Queues’ — such as those provided by RabbitMQ — and these are always initialized for known subscribers, each of which will have new messages pushed to it via its preconfigured Message Queues. It is RabbitMQ that manages where each and every Subscriber is in its corresponding Queue(s), to ensure that messages are sent to subscribers ‘At-Most-Once’. The reason this sounds exactly like Point-to-Point Communication is because it is difficult to ‘Push’ a message to a Target if you don’t know who the Target is! So what happens if you wish to add new subscribers to your Distributed Application, here-and-there, when using RabbitMQ? Best if you don’t…

Conversely, Kafka — sometimes referred to as a ‘Dumb Broker’ — lets each client manage for itself exactly where it is in any given Topic ‘Partition’ to which it has access. It lets you subscribe new clients to long-existing Topics at any point in time, and they can start reading from any ‘Offset’ of their choice. And what happens if one of your Software Components crashes, and you need to restore it from last week’s Backup? Well, in the case of Kafka’s Pull mechanism, you restart your restored Component, and it picks up precisely where it left off: because it must manage its own reads, it persists its last committed Topic ‘Offsets’ locally. And what would be the restore process in the case of RabbitMQ’s broker-managed Message Queues? I prefer not to think about it…
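As a minimal sketch of this restore scenario (my illustration, not a prescribed implementation), the Component below persists its own last Offset in a hypothetical local file, and simply resumes from it on restart:

```python
# The component manages its own reads: it persists the next offset to read
# in a local file (the file name is illustrative), so a restored instance
# picks up precisely where it left off.
import json
from confluent_kafka import Consumer, TopicPartition

OFFSET_FILE = 'quotes.offset.json'     # hypothetical local persistence

def load_offset() -> int:
    try:
        with open(OFFSET_FILE) as f:
            return json.load(f)['offset']
    except FileNotFoundError:
        return 0                       # first run: start from the beginning

consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                     'group.id': 'quote-service',
                     'enable.auto.commit': False})   # we manage offsets ourselves
consumer.assign([TopicPartition('quotes', 0, load_offset())])

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    print(f"Processing: {msg.value()}")              # business logic goes here
    with open(OFFSET_FILE, 'w') as f:
        json.dump({'offset': msg.offset() + 1}, f)   # next offset to read
```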

## Single-Entity-Owner?

As we continue down Martin Fowler’s Blog, we are warned: “The problem is that it can be hard to see … a [distributed] flow as it’s not explicit in any program text. Often the only way to figure out this flow, is from monitoring a live system. This can make it hard to debug”. In the case of Distributed Enterprise Applications, we should rarely-if-ever need to debug Application ‘flows’. Each software component should instead have an assigned team that is responsible for assuring its correct functioning: debugging should only ever be required for the inner workings of a given software component’s operations (i.e. ‘Unit Tests’). Consequently, the ‘Integration Tests’ associated with application flows become incredibly straightforward to perform, as the trigger for any inter-component operation should never be more than a simple event (whose payload can be easily edited in any text editor). Once the unit tests pass, the risk of bugs being detected during integration tests between software components should be little-to-none. If teams find themselves debugging across different software components, this suggests that the Entity upon which the Macroservice is based is being updated by more than one component: a violation of the Single-Entity-Owner principle.

The obvious question becomes: what if it is not technically possible to have a single Owner-Component for a certain Entity given that, for example, it is used by completely different business domains that have no logical or practical connection? In such cases, you could consider the possibility of splitting an Entity Type into two, each owned by different components. For example, ‘Customer’ might be split into ‘SalesCustomer’ and ‘CustomerCreditStatus’. Whilst these two Entity Types should always remain conceptually 1:1, they could be said to belong to different business domains, and consequently different software components, if necessary. There is not even an absolute requirement for them to use the same identifiers — on different software components — just so long as some form of mapping between them is always possible, if necessary.
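As a minimal sketch of such a split (the Entity Type names come from the example above; the fields, owning components, and identifiers are illustrative assumptions):

```python
# Two conceptually 1:1 Entity Types, each owned by a different Software
# Component; a mapping between their identifiers must always be possible.
from dataclasses import dataclass

@dataclass
class SalesCustomer:                   # owned by, say, the Sales component
    sales_customer_id: str
    name: str
    delivery_address: str

@dataclass
class CustomerCreditStatus:            # owned by, say, the Finance component
    credit_customer_id: str
    credit_limit: float
    blocked: bool

# The identifiers need not match, so long as a mapping always exists.
id_mapping = {'S-1001': 'F-77'}        # sales_customer_id -> credit_customer_id
```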

Should an Entity Type be split, it must be remembered that for any fields common to both Entity Types, only one Software Component should ever be nominated as the owner of any one of the shared fields. More importantly, it must be remembered that concurrent Entity/record-level update-locks will once again rear their ugly head, introducing far greater complexity to the Distributed Application: ‘Retries’ — but not ‘Rollbacks’ — must probably be supported if updates of (conceptually-related) Entity Types are split across distinct software components. A simpler, and therefore much cheaper, alternative to implementing (n) ‘Retries’ would be the use of scheduled updates during lunch times and night times, hours at which they are very likely to avoid any Entity/record-level update-locks. This possibility should in any case always be considered as the first option, because it serves as a very important test: if it is decided that the target system cannot possibly wait for these updates, this should be taken as an important indication that either the Entity Type cannot be split in the imagined way, or that the problematic Field(s) are owned by the wrong component. Such a high level of interdependence between the business domains of distinct software components should also be taken as a hint that those software components ought to be merged, as their supposedly independent ‘Bounded Contexts’ are in fact heavily intertwined.

## Macroservices?

Whilst Macroservices should always be Event-triggered, and should always be built at ‘Entity Type-level’, what about clients external to the Distributed Application (i.e. with no access to the Event Broker, its central nervous system) that would like it to perform certain operations? In fact, the discussed principles apply only to communication between software components of the Distributed Application itself. Nothing forbids those same software components from exposing either synchronous or asynchronous services to external clients; it is for each individual Software Component to maintain its own internal consistency.

Having said that, a major advantage of exposing Event-based services via an ‘Application Eventing Interface’ (AEI) is that the use of JSON event payloads, combined with a technique I call ‘Version-Stacking’, can completely eliminate the API-versioning dilemma that shackles even the most modern services developed by our most competent experts today. Using this approach, each software component interprets received JSON payloads only up to its supported version, and simply ignores any more recent payload content.

For example, a Component that supports version 1.1 of the “Quote.Requested” Payload would keep the two v1.1 elements — “make” and “model” — whilst ‘Collapsing’ the Payload, but would ignore the deprecated v1.0 “makeAndModel”, as well as the v1.2 elements “maximumKilometers” and “color”, which it does not yet support. Another Component might support version 1.2 of the “Quote.Requested” Payload, and would instead choose to include these more recent elements — from exactly the same JSON payload — in its Quote calculation:

```json
{
  "type": "Quote.Requested",
  "source": "SoftCompA",
  "requestID": "180767",
  "time": "2021-01-24T12:13:14+00:00",
  "data": {
    "1.0": {
      "makeAndModel": "Ducati Monster"
    },
    "1.1": {
      "makeAndModel": "-",
      "make": "Ducati",
      "model": "Monster"
    },
    "1.2": {
      "maximumKilometers": 20000,
      "color": "Red"
    }
  }
}
```
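To illustrate, here is a minimal sketch of how a receiving Component might ‘Collapse’ such a Version-Stacked payload. The collapse function, and the reading of “-” as a deprecation marker (as used for “makeAndModel” in v1.1 above), are my interpretation of the technique rather than a fixed specification:

```python
# Collapse a Version-Stacked payload: merge the versioned blocks in order,
# up to this component's supported version, ignoring everything newer.
SUPPORTED_VERSION = "1.1"              # this component's version ceiling

def collapse(payload: dict, supported: str = SUPPORTED_VERSION) -> dict:
    def as_tuple(v: str):
        return tuple(int(p) for p in v.split("."))
    entity = {}
    for version in sorted(payload["data"], key=as_tuple):
        if as_tuple(version) > as_tuple(supported):
            break                      # ignore more recent payload content
        entity.update(payload["data"][version])
    # "-" marks a field deprecated by a later version, so drop it
    return {k: v for k, v in entity.items() if v != "-"}

# For the "Quote.Requested" payload above, a v1.1 component obtains:
# {'make': 'Ducati', 'model': 'Monster'}
# whilst a v1.2 component would also obtain 'maximumKilometers' and 'color'.
```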

Interestingly, we are once again back at the distinction between ‘Micro-Servers’ — Containers with their own individual databases — and ‘Mono-Servers’ — Servers operating with a single database. Macroservices can be most easily implemented upon (Enterprise-grade) ‘Mono-Servers’. ‘Micro-Servers’, hosting independent Microservices in active competition for workload, cannot employ my proposed ‘Requested’ Event-Pattern (with optimal resource utilization). Although new events can be distributed to ‘Competing Consumers’ on a Round-Robin basis — via pre-configured message queues — it is impossible to know how long each individual request will take to process, and therefore how much longer a Microservice needs to finish its current workload at the time the message broker must assign it new tasks.

## Entity Query Store?

As already discussed, an obvious exception to ‘Always Asynchronous’ concerns communication with external clients, which will often be synchronous. Mobile devices are an obvious example of such external clients, which might wish to send, for example, an Inbound ‘Command’ — to a given Software Component — to update a customer’s address. However, what about the need of mobile clients for regular data-synchronisation via ‘queries’/‘polling’, to support ‘Offline’ scenarios? It seems unlikely that mobile devices (or PCs) will always be able to subscribe directly to the Event Broker, so how should this be achieved? Whilst mobile clients could obviously perform polling of the relevant software component(s) — multiplied by the number of mobile devices — there is thankfully a better way. For regular ‘Query’ scenarios, we simply need to expose to external clients the ‘Entity Query Store’ (EQS) that we placed at the hub of our Distributed Application. Such external clients could then have a single go-to Querymart for all of their read-only ‘Data’ needs.

Looking beyond ‘Data’, there is a common perception that the ‘Event Sourcing’ Pattern has little role to play in enterprise applications: we have done just fine with databases — ‘Data Stores’ — for decades. Yet ‘Data’ is in reality the product of ‘Event Sourcing’: it is the (persisted) ‘Folded State’ of each Entity, in the form of a ‘record’. It is for this reason that State-changing events always result in updates to Data-bases, and precisely why there is (almost) always a 1:1 relationship between any given business Entity Type (e.g. ‘Product’), and the Data-base table in which its latest state is persisted (e.g. ‘Product_Table’). We already persist the ‘Folded State’ of all our business entities in enterprise applications, and have done so since the beginning of time. If the full ‘Event Sourcing’ Pattern is desired, we need do no more than add an ‘Event Store’ to our Distributed Application: precisely the role played by a modern Event Broker.
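To make the point concrete, here is a minimal sketch of ‘Folding’ a handful of State-changing events into the record we would persist; the event shapes are illustrative:

```python
# 'Data' as the Folded State of an Entity: the record is simply the
# left-fold of all state-changing events for that Entity.
from functools import reduce

events = [
    {"type": "Product.Created", "id": "P-1", "name": "Monster", "price": 11999},
    {"type": "Product.PriceChanged", "id": "P-1", "price": 11499},
]

def apply(state: dict, event: dict) -> dict:
    # fold one event into the current state (the 'type' field is metadata)
    return {**state, **{k: v for k, v in event.items() if k != "type"}}

folded = reduce(apply, events, {})
print(folded)   # {'id': 'P-1', 'name': 'Monster', 'price': 11499}
                # i.e. what the 'Product_Table' row would hold
```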

At this point, our project sponsors are likely to ask what purpose the ‘Event Sourcing’ Pattern can serve (even though we get it for free). The answer — which is especially pertinent in the case of offline mobile scenarios — is the need to fully support Delta-Queries (e.g. OData) with good performance. If a Broker is queried with a given timestamp, it can very efficiently return all newer Events related to the desired Entity Type (providing, in the case of Kafka, there is one Partition per Entity Type). After this Query, which should take only a few milliseconds to execute, the next step is to query the ‘Entity Store’ of that Entity Type (e.g. its ‘Table’), using the returned entity keys. A few milliseconds later you have your Delta-Query results: no Folding, no Change Data Capture, and no duplication of ‘Data’. Who knows, such a pattern might one day come to be known as ‘DQRS’…
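As a minimal sketch of such a ‘DQRS’ Delta-Query, assuming a Kafka Topic named ‘Product’ with a single Partition, events keyed by entity ID, and a relational Entity Store (the table and file names are illustrative):

```python
# Step 1: ask the Broker for every event newer than the client's timestamp;
# Step 2: read the latest Folded State of the changed entities from the
# Entity Store. No Folding, no Change Data Capture, no duplicated 'Data'.
import sqlite3
from confluent_kafka import Consumer, TopicPartition

def delta_query(since_ms: int) -> list:
    consumer = Consumer({'bootstrap.servers': 'localhost:9092',
                         'group.id': 'delta-query',
                         'enable.auto.commit': False})
    # translate the timestamp into the first offset at or after it
    start = consumer.offsets_for_times([TopicPartition('Product', 0, since_ms)])
    consumer.assign(start)
    changed_keys = set()
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            break                      # no newer events left to read
        changed_keys.add(msg.key().decode())
    consumer.close()
    if not changed_keys:
        return []
    # fetch the current state of just those entities from the Entity Store
    db = sqlite3.connect('eqs.db')
    placeholders = ','.join('?' * len(changed_keys))
    rows = db.execute(f"SELECT * FROM Product_Table WHERE id IN ({placeholders})",
                      list(changed_keys)).fetchall()
    db.close()
    return rows
```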
