‘Event-Oriented Architecture’ Manifesto for Distributed Enterprise Applications
This Manifesto builds upon and adds further details to its companion, the ‘Event-Driven Architecture’ Manifesto for Distributed Enterprise Applications. Both take the February 2017 Blog by respected industry expert Martin Fowler, What do you mean by “Event-Driven”?, as their starting point.
The ‘Event-Driven Architecture’ Manifesto for Distributed Enterprise Applications declared four fundamental principles:
1) ‘Always Asynchronous’ communication between software components
2) Each ‘Entity Type’ should be owned by a single ‘Owner-Component’
3) Provision of single-software-component, Entity Type-level ‘Macroservices’ for the exposure of operations to other software components of the Distributed Application
4) Use of an ‘Entity Query Store’ as a read-only, shared persistence layer
We will now cover each of these principles in further detail.
Always Asynchronous ?
Martin refers to ‘Asynchronous Communication’ in his Blog as “a known complexity-booster”. Given that the Blog — What do you mean by “Event-Driven”? — is unmistakably aimed at clarifying terms heard within the ‘Event-Driven’ context, I would like to start at the very beginning with the term: ‘Asynchronous’. Let’s start with a definition provided indirectly by Martin in his Blog: “a marked separation between the logic flow that sends the event and any logic flow that responds to some reaction to that event”.
Whilst such use of the term is common, this isn’t in fact the definition of ‘Asynchronous’, which is defined by the Oxford Dictionary as: “Not existing or occurring at the same time”. The reason I state this is because “a marked separation between … the event and … some reaction to that event” of a few milliseconds — which is very typical — is fairly obviously “occurring at the same time” (at least in terms of human perception). This might sound pedantic, but it actually helps to explain the lack of clarity around another common term: ‘real-time’. What is interesting about ‘real-time’ operations today, is that the best, and maybe the only way we know how to build them, is by making such operations ‘Event-Triggered’. Given this, the best technique we have today to facilitate ‘real-time’ operations, is actually premised upon “a marked separation between … the event and … some reaction to that event” within milliseconds. Such a rapid “reaction to that event” will often be referred to — not necessarily inaccurately — as “synchronous”.
It would appear that the concept that we are actually trying to explain when we use the term ‘Asynchronous’ — within the context of EDA — might be better termed: ‘Monologueous Communication’. I will use Martin’s own words, from the same Blog, to define this new term: “the source system doesn’t really care much about the response. Often it doesn’t expect any answer at all, or if there is a response that the source does care about, it’s indirect”. Such a definition is far closer to the idea that we are actually trying to describe: ‘Asynchronous Communication’ very often expects a response, whilst ‘Monologueous Communication’ rarely expects a response. To push the point a little further, we should also note that events are typically — within the context of EDA — published to a Broker synchronously. Should such events therefore be labelled synchronous or asynchronous? In the name of clarity, probably neither.
Now for the hard part. Once events are written by a ‘Publisher’ to a given Broker, does that Broker then ‘Push’ those new events to their ‘Subscriber(s)’, or do its clients need to ‘Pull’ new events from the Broker (via polling)? Whilst such a distinction may seem trivial, it utterly determines the list of possible Brokers that can be used when constructing a Distributed Application. To provide some real-world context, RabbitMQ — the World’s most widely deployed open-source Broker — will ‘Push’ new events to ‘Subscriber(s)’, whilst Kafka — another World-class open-source Broker — demands that its clients ‘Pull’ new events from the Broker themselves. You may have noticed that I use the term ‘Subscriber’ in the Push-scenario, but ‘Client’ in the Pull-scenario. This distinction is also fundamental in the context of ‘Event-Driven’ Architecture.
This subtlety is best described by way of an analogy. If you ‘Subscribe’ to TheGuardian, then you pay an annual fee in order to wake up every morning, and be able to find a copy of that newspaper somewhere in your front garden. So what about a second scenario where you pay TheGuardian an annual fee, but must nonetheless put on your bathrobe and walk to the newsagent every morning in order to collect your newspaper? And what if your annoying neighbour, on the other hand, didn’t pay the annual fee, but instead walks to the same newsagent as you each morning to collect his own copy of the same newspaper, paying cash instead; refusing to ‘subscribe’? You probably get the point: only ‘Push’ supports true subscriptions. ‘Pull’ does not, as it requires constant ‘Polling’ of the Broker to be performed by the clients themselves; yet one of the key advantages of a true EDA is that computing resources should remain idle until there is certain work to be performed (vs. endless polling). You probably find this analogy so self-evident that you wonder why you bothered reading it? So why then are we told in Confluent’s “Apache Kafka 101”, that Kafka is “a distributed pub/sub messaging system”? What’s more, if Kafka is a “messaging system”, why does it store “records” and not ‘messages’ (or ‘events’)? And why does Kafka use ‘Partitions’, and not ‘Message Queues’?
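The Push/Pull distinction maps onto code quite directly. Below is a minimal, in-memory Python sketch of the two delivery models; the classes are illustrative stand-ins only, not the actual RabbitMQ or Kafka APIs:

```python
class PushBroker:
    """Push model: the broker knows its subscribers and delivers to them."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for deliver in self.subscribers:   # the broker drives delivery
            deliver(event)


class PullBroker:
    """Pull model: the broker merely appends to a log; clients must poll."""
    def __init__(self):
        self.log = []

    def publish(self, event):
        self.log.append(event)

    def poll(self, offset):
        return self.log[offset:]           # the client drives consumption


# Push: the subscriber is invoked the instant the event is published.
push = PushBroker()
received = []
push.subscribe(received.append)
push.publish("OrderPlaced")

# Pull: nothing happens until the client chooses to poll.
pull = PullBroker()
pull.publish("OrderPlaced")
offset = 0
new_events = pull.poll(offset)
offset += len(new_events)                  # the client tracks its own position
```

Note where the control flow lives in each case: in the Push model the broker calls the subscriber; in the Pull model the broker does nothing until asked.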
Does it matter? Some of you will have noticed the subtle change in title from ‘Event-Driven Architecture’ Manifesto, to ‘Event-Oriented Architecture’ Manifesto. This is because it seems equally self-evident that in a ‘Pull’ scenario — which is always based upon polling of the Broker by the client (at a frequency of its own choosing) — nothing can ever possibly be “Driven” by events. The client will never even know if there are new events until it performs its next poll of the Broker; it is the client itself that ‘drives’ operations (just like the good-old-days). Should we ever care to use terms accurately, only ‘Push’ scenarios — genuine ‘Publish-Subscribe’ topologies — can fully support a true, real-time ‘Event-Driven Architecture’. Only when new events that are published on the Broker are immediately pushed to the nominated ‘Subscriber(s)’, can any operations ever be “Driven” by those events. To conclude: only true ‘Pub/Sub’ topologies can ever enable ‘Event-Driven’ operations, as only they guarantee that new events will be pushed/driven to their registered Subscriber(s) exactly when they occur, by design; everything else is ‘Event-Oriented’.
Now that the jargon has hopefully been demystified, it’s time for a surprising conclusion: real-time, ‘Publish-Subscribe’ topologies are usually not the best choice for ‘Distributed Enterprise Applications’; the best choice is instead an ‘Event-Oriented’ Architecture. Whilst Publish-Subscribe topologies are not only the best but indeed a necessary choice where multiple instances of the same Software Component run in parallel, each working on the same messages that must be consumed ‘At Most Once’ — as is the case with Microservices — in the case of Enterprise Applications there will typically only ever be a single runtime instance of each individual Software Component (e.g. ERP Procurement). In the latter case, ‘Pub/Sub’ would add a great deal of complexity to the Distributed Application, but no value (see ‘Long Polling’), whilst simultaneously hampering flexibility. It would likewise risk overloading some of the target software components, for which there can be no horizontal scalability — unlike the case of Microservices. Interestingly, the necessary distinction is not between ‘Microservices’ and ‘Monoliths’, but between ‘Micro-Servers’ (i.e. Containerization) and ‘Mono-Servers’ (where Events can only ever be processed once, and only ever in the correct order).
The ‘Push’ scenario requires the use of ‘Message Queues’ — such as those provided by RabbitMQ — and these are always built for a known number of subscribers, each of which will have messages pushed to it from its preconfigured Message Queue(s). It is RabbitMQ that manages where each and every Subscriber is in its corresponding Queue(s), to ensure that messages are sent to subscribers ‘At-Most-Once’. The reason this sounds exactly like Point-to-Point Communication is because it is difficult to ‘Push’ a message to a Target if you don’t know who the Target is! So what happens if you wish to add more subscribers to your Distributed Application, from time to time, when using RabbitMQ? Best if you don’t…
Conversely, Kafka — sometimes referred to as a ‘Dumb Broker’ — lets each client manage for itself, exactly where it is in any given Topic ‘Partition’ to which it has access; it lets you add new clients to (long-existing) Topic Partitions at any point in time, and they start reading from any ‘Offset’ of their choice. And what happens if one of your Software Components crashes, and you need to restore it from last week’s Backup? Well, in the case of a Kafka’s Pull mechanism, you restart your restored Component, and it picks up precisely where it left off; it must manage its own reads, so it persists its last read Topic ‘Offsets’ locally. And in the case of RabbitMQ’s broker-managed Message Queues? I prefer not to think about it…
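The self-managed ‘Offset’ mechanism described above can be sketched in a few lines of Python. This is a hedged illustration, not a real Kafka client: an in-memory list stands in for a Topic Partition, and the offset file name is purely illustrative.

```python
import os
import tempfile

# Illustrative location for the locally persisted offset of one component.
OFFSET_FILE = os.path.join(tempfile.gettempdir(), "procurement.offset")

def load_offset():
    """Read the last persisted offset; a fresh component starts at 0."""
    try:
        with open(OFFSET_FILE) as f:
            return int(f.read())
    except FileNotFoundError:
        return 0

def save_offset(offset):
    with open(OFFSET_FILE, "w") as f:
        f.write(str(offset))

def consume(log):
    """Pull all events past the locally persisted offset, then advance it."""
    offset = load_offset()
    new_events = log[offset:]
    save_offset(offset + len(new_events))
    return new_events

log = ["e1", "e2", "e3"]    # stand-in for a Topic Partition
save_offset(0)              # fresh component
first = consume(log)        # reads e1..e3; offset file now holds 3

log += ["e4"]
second = consume(log)       # after a "restart", resumes precisely at e4
```

Because the offset lives with the component (not the broker), restoring the component from a backup restores its reading position along with it, which is exactly the recovery property claimed above.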
As we continue down Martin Fowler’s Blog, we are warned: “The problem is that it can be hard to see … a [distributed] flow as it’s not explicit in any program text. Often the only way to figure out this flow, is from monitoring a live system. This can make it hard to debug”. In the case of Distributed Enterprise Applications, we should rarely, if ever, need to debug Application ‘flows’. Each software component should instead have an assigned team that is responsible for assuring its correct functioning: debugging should only ever be required for the inner workings of a given software component’s operations (i.e. ‘Unit Tests’). Consequently, the ‘Integration Tests’ associated with application flows become incredibly straightforward to perform, as the trigger for any inter-component operation should never be more than a simple event (whose payload can be easily edited in any text editor). Once the unit tests are passed, the risk of bugs being detected during integration tests between software components should be little to none. If teams find themselves debugging across different software components, this suggests that the Entity Type upon which the Macroservice is based is being updated by more than one component: a violation of the Single-Entity-Owner principle.
The obvious question becomes: what if it is not technically possible to have a single Owner-Component for a certain Entity Type given that, for example, it is used by completely different business domains that have no logical or practical connection? In such cases, you could consider the possibility of splitting an Entity Type into two, each owned by different components. For example, ‘Customer’ might be split into ‘SalesCustomer’ and ‘CustomerCreditStatus’. Whilst these two Entity Types should always remain conceptually 1:1, they could be said to belong to different business domains, and consequently different software components, if necessary. There is not even an absolute requirement for them to use the same identifiers — on different software components — just so long as some form of mapping between them is always possible, if necessary.
Should an Entity Type be split, it must be remembered that for any fields common to both Entity Types, only one Software Component should ever be nominated as the owner of any one of the shared fields. More importantly, it must be remembered that concurrent Entity/record-level update-locks will once again rear their ugly head, introducing far greater complexity to the Distributed Application: ‘Retries’ — but not ‘Rollbacks’ — must probably be supported if updates of (conceptually-related) Entity Types are split across distinct software components. A simpler, and therefore much cheaper, alternative to implementing (n) ‘Retries’ would be the use of scheduled updates during lunch and night times, when they are very likely to avoid any Entity/record-level update-locks. This possibility should in any case always be considered as the first option, because it serves as a very important test: if it is decided that the target system cannot possibly wait for these updates, this should be taken as an important indication that either the Entity Type cannot be split in the imagined way, or that the problematic Field(s) are owned by the wrong component. Such a high level of interdependence between the business domains of distinct software components should also be taken as a hint that those software components ought to be merged, as their supposedly independent ‘Bounded Contexts’ are in fact heavily intertwined.
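A bounded ‘Retries’ (no ‘Rollbacks’) loop of the kind just described might be sketched as follows. The exception type and the update function are hypothetical stand-ins, not a real locking API:

```python
import time

class RecordLockedError(Exception):
    """Hypothetical: raised when the target record is update-locked."""
    pass

def update_with_retries(apply_update, max_retries=3, delay_seconds=0.01):
    """Try the update up to max_retries times; retry on lock, never roll back."""
    for attempt in range(1, max_retries + 1):
        try:
            return apply_update()
        except RecordLockedError:
            if attempt == max_retries:
                raise                  # escalate: schedule for off-peak instead
            time.sleep(delay_seconds)  # back off before retrying

# Simulated target component whose record is locked on the first attempt only.
attempts = {"n": 0}
def apply_update():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RecordLockedError()
    return "updated"

result = update_with_retries(apply_update)
```

If the final retry still fails, the sensible fallback is the scheduled off-peak update described above, rather than any attempt at a distributed rollback.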
Whilst Macroservices should always be Event-based, and should always be built at ‘Entity Type-level’, what about clients external to the Distributed Application that would like it to perform certain operations? In fact, the discussed principles apply only to communication between the software components of the Distributed Application. Nothing forbids those same software components from providing (synchronous or asynchronous) interfaces — API, Command or Event-based — to external clients; it is for each individual Software Component to maintain its own internal consistency.
Having said that, a major advantage of exposing ‘Event-based Services’ is that the use of JSON event payloads — combined with a technique I call ‘Version-Stacking’ — can completely eliminate the API-versioning dilemma that shackles even the most modern services developed by our most competent experts today. Using this approach, each software component interprets received JSON payloads only up to its supported version, and simply ignores any more recent payload content. For example, a Component that supports version 1.1 of the “Quote.Requested” Payload would keep the two v1.1 elements — “make” and “model” — whilst ‘Collapsing’ the payload below, but would ignore the deprecated v1.0 “makeAndModel”, as well as the v1.2 elements “maximumKilometers” and “color”, which it does not yet support. Another Component might support version 1.2 of the “Quote.Requested” Payload, and would instead choose to include these more recent elements — from exactly the same JSON payload — in its Quote calculation:
{ "Quote.Requested": {
    "v1.0": {
      "makeAndModel": "Ducati Monster" },
    "v1.1": {
      "make": "Ducati",
      "model": "Monster" },
    "v1.2": {
      "maximumKilometers": 30000,
      "color": "Red" } } }
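The ‘Collapsing’ step can be sketched in a few lines of Python. This is a minimal illustration under the assumption of a version-stacked payload of the shape above; the helper names and the deprecated-version list are mine, not part of any standard:

```python
payload = {
    "Quote.Requested": {
        "v1.0": {"makeAndModel": "Ducati Monster"},
        "v1.1": {"make": "Ducati", "model": "Monster"},
        "v1.2": {"maximumKilometers": 30000, "color": "Red"},
    }
}

def vtuple(version):
    """'v1.1' -> (1, 1), for ordering and comparing version keys."""
    return tuple(int(part) for part in version.lstrip("v").split("."))

def collapse(stacked, supported, deprecated=("v1.0",)):
    """Merge all version blocks up to `supported`, skipping deprecated ones."""
    result = {}
    for version in sorted(stacked, key=vtuple):
        if version in deprecated or vtuple(version) > vtuple(supported):
            continue                      # ignore deprecated and too-new content
        result.update(stacked[version])
    return result

# A v1.1 Component keeps "make"/"model"; a v1.2 Component also keeps
# "maximumKilometers"/"color" — both from exactly the same payload.
v11_view = collapse(payload["Quote.Requested"], supported="v1.1")
v12_view = collapse(payload["Quote.Requested"], supported="v1.2")
```

Note that no version negotiation ever takes place between Publisher and Subscriber: each consumer decides locally how much of the stacked payload it understands.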
Entity Query Store ?
As already discussed, an obvious exception to ‘Always Asynchronous’ concerns communication with external clients, which will often be synchronous. Mobile devices are an obvious example of such external clients that might wish to send, for example, an Inbound ‘Command’ — to a given Software Component — to update a customer’s address. However, what about the need of mobile clients for regular data-synchronisation via ‘queries’/’polling’, to support ‘Offline’ scenarios? It seems highly unlikely that mobile devices (or PCs for that matter) will be able to subscribe directly to the Event Broker — although external servers probably ought to — so how should this be achieved? Whilst mobile clients could obviously perform polling of the relevant software component(s) — multiplied by the number of mobile devices — there is thankfully a better way. For regular ‘Query’ scenarios, we simply need to expose the ‘Entity Query Store’ (EQS) we placed at the hub of our Distributed Application to external clients. Such external clients could have a single go-to Querymart for all of their read-only ‘data’ needs.
Looking beyond ‘data’, there is a common perception that the ‘Event Sourcing’ Pattern (see Links) has little role to play in enterprise applications: we have done just fine with databases — ‘Data Stores’ — for decades. Yet ‘data’ is in reality the product of ‘Event Sourcing’: it is the persisted ‘Folded State’ of each Entity (in the form of a ‘record’). It is for that reason that State-changing events always result in updates to Data-bases, and precisely why there is (almost) always a 1:1 relationship between any given business Entity Type (e.g. ‘Product’), and the Data-base table in which its latest state is persisted (e.g. ‘Product_Table’). We already persist the ‘Folded State’ — otherwise referred to as ‘Snapshots’ — of all our business entities in enterprise applications and have done so since the beginning of time. If the full ‘Event Sourcing’ pattern is desired, we need do no more than add an ‘Event Store’ to our Distributed Application: precisely the role played by a modern Event Broker.
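The relationship between events and ‘data’ claimed above can be shown in a few lines. The sketch below uses an illustrative event shape: folding one Entity’s events yields exactly the record its table would persist.

```python
from functools import reduce

# Illustrative event stream for one 'Product' Entity (shape is an assumption).
events = [
    {"type": "Product.Created", "id": "P1", "name": "Monster", "price": 11000},
    {"type": "Product.PriceChanged", "id": "P1", "price": 10500},
    {"type": "Product.Renamed", "id": "P1", "name": "Monster 937"},
]

def fold(state, event):
    """Apply one event to the Entity's current snapshot."""
    updated = dict(state)
    updated.update({k: v for k, v in event.items() if k != "type"})
    return updated

# The 'Folded State' — the record a Product_Table row would persist.
snapshot = reduce(fold, events, {})
```

The database row is thus nothing more than a cached result of this fold, which is why keeping an ‘Event Store’ alongside it (the Broker’s job) costs so little extra.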
At this point in time, our project sponsors are likely to ask what purpose the ‘Event Sourcing’ Pattern can serve (even though we have it for free)? The answer — which is especially pertinent in the case of (offline) mobile devices — is the need to fully support Delta-Queries (e.g. OData), with good performance. If an Event Broker Topic is queried with a given timestamp, it can very efficiently return all Events related to the desired Entity Type (e.g. providing there is one Partition per Entity Type, in the case of Kafka), that occurred since that time. After this Query, which should only take a few milliseconds, the next step is to query the ‘Entity Store’ of that Entity Type (i.e. its ‘Table’), using the returned entity keys. A few milliseconds later you have your Delta-Query results: no Folding, no Change Data Capture, and no duplication of ‘data’. Who knows, such a pattern might one day come to be known as ‘DQRS’…
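The two-step Delta-Query described above can be sketched as follows. Both data structures are illustrative stand-ins: a list for the Entity Type’s Broker Topic, and a dict for its Entity Store ‘Table’.

```python
# Step-1 source: the Entity Type's event log (one Topic/Partition per Entity Type).
event_log = [
    {"ts": 100, "key": "C1"},
    {"ts": 200, "key": "C2"},
    {"ts": 300, "key": "C1"},
]

# Step-2 source: the Entity Store, holding the latest Folded State per entity.
entity_store = {
    "C1": {"name": "Ada", "city": "London"},
    "C2": {"name": "Bob", "city": "Paris"},
}

def delta_query(since_ts):
    # Step 1: keys of entities changed since the timestamp (from the event log).
    changed_keys = {e["key"] for e in event_log if e["ts"] > since_ts}
    # Step 2: current snapshots for just those keys (from the entity store).
    return {key: entity_store[key] for key in changed_keys}

delta = delta_query(since_ts=150)   # C1 and C2 both changed after ts=150
```

No Folding is performed at query time, no Change Data Capture pipeline exists, and no ‘data’ is duplicated: the event log supplies the "what changed", and the Entity Store supplies the "current state".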
Links
‘Event-Driven Architecture’ Manifesto for Distributed Enterprise Applications
What do you mean by “Event-Driven”? (Martin Fowler, February 2017)