A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do.
~ Ralph W. Emerson (in Self-Reliance, pointing out the misguided kind of consistency)
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
~ Leslie Lamport (ushering us into the wonders of the distributed universe)
Don't interrupt me while I'm interrupting.
~ Winston S. Churchill (elaborating in his inimitable style on one of the perennially consistent, finer points of concurrency gotchas)
You can have it good, you can have it fast, you can have it cheap; pick two.
~ Software development folklore (Brewer's CAP theorem, put another way)
When you eventually see
through the veils to how things really are,
you will keep saying again and again,
"This is certainly not like we thought it was!"
~ Jelaluddin Rumi (translated by Coleman Barks, The Essential Rumi, HarperCollins)
The artist is a receptacle for emotions that come from all over the place: from the sky, from the earth, from a scrap of paper, from a passing shape, from a spider's web.
~ Pablo Picasso (painter, sculptor, printmaker, poet, and playwright)
Let's not beat around the bush: as members of a generation that grew up on ACID (hey, not that kind of acid) we get butterflies in our stomachs when dealing with a system that doesn't provide ironclad transactional guarantees. When confronting a system that reeks of the absence of transactions, we feel (and that is precisely how I used to feel not so long ago) like we're no longer on terra firma.
On top of that, I'll be eating my own words in this essay. I had made the bold statement elsewhere, and fairly recently too, that
…I generally avoid putting together (technical) tutorials, incredibly helpful as they are, simply because the internet is already awash with them; my inclination is to dig deeper, and share my findings with you.
But first, taking a step back and synchronizing our terminology so as to be on the same page, let's hear what is, IMHO, the finest definition that captures the essence of what exactly a transaction is. So let's hear from the legendary Jim Gray (the American computer scientist who received the Turing Award in 1998 for seminal contributions to database and transaction processing research, and for technical leadership in system implementation). As he aptly put it, a transaction is "a transformation of state" which has ACID properties. (Acidity, à la the pH scale? Chemistry, anyone? Or perhaps Maalox antacid?) Frankly, I can't recall coming across a definition that better captures the gestalt of transaction processing.
And speaking of ACID, let's pass the properties around one by one, shall we?
- Atomic semantics imply an all-or-nothing, one-shot operation: every update within the transaction must succeed for the transaction as a whole to be called successful. There is no such thing as a partially applied transaction.
- Consistent means that data transitions from one proper state to another proper state; there is no chance that readers could view different values that do not make sense together.
- Isolated semantics imply that transactions executing concurrently won't become enmeshed with one another; each transaction executes in its own quarantined space.
- Durable guarantees that once a transaction has succeeded, its state changes will persist and won't be lost.
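To see the atomic, all-or-nothing guarantee concretely, here is a minimal sketch using Python's built-in sqlite3 module; the accounts table and the simulated mid-transaction failure are contrived purely for illustration:

```python
import sqlite3

# In-memory database; any transactional store exhibits the same behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# Atomicity: transfer funds; if anything fails mid-flight, undo everything.
try:
    conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
    # Simulate a failure before bob is credited.
    raise RuntimeError("disk on fire")
    conn.execute("UPDATE accounts SET balance = balance + 70 WHERE name = 'bob'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # all-or-nothing: alice's debit is undone too

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- no partial update survived
```

No half-finished transfer is ever visible: the debit that did execute vanishes along with the failed transaction.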
If we were in the 1960s, this would probably be the most popular section in the book. In today's technology world, however, ACID means something different than it did back then…
You can't always get what you want
You can't always get what you want
You can't always get what you want
But if you try sometimes well you might find
You get what you need
I went down to the demonstration
To get my fair share of abuse
Singing, "We're gonna vent our frustration
If we don't we're gonna blow a 50-amp fuse"
~ The Rolling Stones (lyrics from You Can't Always Get What You Want)
The swimmingly elegant term eventual consistency (I cringe to think of it going by a less cool name) was popularized by Werner Vogels, CTO of Amazon, circa 2007. True enough, you might remind me: Hey Akram, a rose by any other name would smell as sweet. Yes it does, I'll politely reply. But then again, these matters have a way of transcending the realm of reason. To wit, let's digress, and I promise that this digression (may I gently remind the reader that this eponymous blog boldly proclaims the moniker Programming Digressions in its title) will be ever so brief. I mean, we've got to have ourselves some fun along the ride, don't we?
So let's hear what Steven Levy tells us with great relish, on this selfsame theme, in his sparkling book In The Plex: How Google Thinks, Works, and Shapes Our Lives. Levy also happens to be the author of one of my all-time favorites, the classic Hackers: Heroes of the Computer Revolution, but pursuing Hackers would take us too far afield; let's save that for another date. Back to the gleaming pages of In The Plex, where Levy regales us with how
They built a system that implemented "checkpointing," a way for the index to hold its place if a calamity befell a server or hard disk. But the new system went further—it used a different way to handle a cluster of disks, more akin to the parallel-processing style of computing (where a computational task would be split among multiple computers or processors) than the "sharding" technique Google had used, which was to split up the web and assign regions of it to individual computers. (Those familiar with computer terms may know this technique as "partitioning," but, as Dean says, "everyone at Google calls it sharding because it sounds cooler." Among Google's infrastructure wizards, it's key jargon.)
Look around your data pipelines, and hear the murmurs of torrential petabytes funneling through those very pipelines, our attempts at complexity management made manifest.
Now, to understand the raison d'être for eventual consistency, let's take a step back and have a slightly deeper look at the state of affairs (in particular, the heavy burden that transactions place on systems, especially in the Big Data terrain) which led directly to a serious revisitation, and widespread adoption, of the eventual consistency wherewithal. As Martin Fowler aptly noted in his classic book Patterns of Enterprise Application Architecture (Pearson Education):
Most enterprise applications run into transactions in terms of databases. But there are plenty of other things that can be controlled using transactions, such as message queues, printers, and ATMs. As a result, technical discussions of transactions use the term "transactional resource" to mean anything that's transactional—that is, that uses transactions to control concurrency. "Transactional resource" is a bit of a mouthful, so we just use "database," since that's the most common case. But when we say "database," the same applies for any other transactional resource.
To handle the greatest throughput, modern transaction systems are designed to keep transactions as short as possible. As a result the general advice is to never make a transaction span multiple requests. A transaction that spans multiple requests is generally known as a long transaction.
For this reason a common approach is to start a transaction at the beginning of a request and complete it at the end. This request transaction is a nice simple model, and a number of environments make it easy to do declaratively, by just tagging methods as transactional.
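Fowler's request transaction model is easy to sketch. Below is a toy, hand-rolled take in Python on declaratively tagging a handler as transactional; the decorator and handler names are my own invented stand-ins, not any particular framework's API:

```python
import sqlite3
from functools import wraps

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.commit()

def transactional(handler):
    """Tag a request handler so it runs inside one short transaction."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            result = handler(*args, **kwargs)
            conn.commit()      # the transaction ends with the request
            return result
        except Exception:
            conn.rollback()    # any failure undoes the whole request
            raise
    return wrapper

@transactional
def place_order(item):
    conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    return "ok"

place_order("widget")
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1
```

The handler body knows nothing about commits or rollbacks; the transaction boundary lives entirely in the tag, which is exactly the appeal of the declarative style.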
And speaking of PoEAA, Fowler goes on to provide a marvelously readable survey of some related, hardcore topics, such as reducing transaction isolation for liveness, patterns for offline concurrency control, and even (this one admittedly not for the faint of heart) application server concurrency. In fact, for application server concurrency in particular, I cannot think of a better single resource than the fine book Pattern-Oriented Software Architecture, Volume 2: Patterns for Concurrent and Networked Objects (John Wiley, 2000) by Schmidt, Stal, Rohnert, and Buschmann. Referring to that book, Fowler aptly points out that it is
…more for people who design application servers than for those who use application servers, but it's good to have some knowledge of these ideas when you use the results (italics mine).
So there.
With that, let's bring replication into the picture. Stripped to its essence, replication means keeping a copy of the same data on multiple machines which are connected to one another in a network with a given topology. And lest anyone accuse me of putting the cart before the horse, I hasten to add that the whole reason we even have replication is that we wish to:
- scale out the servers which can serve queries (think increased throughput on the read side of things)
- keep data in geographical proximity to users (think reduced latency)
- allow systems to continue functioning in the face of partial failure (think increased availability)
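A toy model makes the read-scaling motivation above concrete. Everything here (the class names, the unrealistically synchronous replication) is a simplifying assumption for illustration only:

```python
import itertools

class ReplicatedStore:
    """Toy model: one leader takes writes; reads fan out across replicas."""
    def __init__(self, n_replicas):
        self.leader = {}
        self.replicas = [dict() for _ in range(n_replicas)]
        self._rr = itertools.cycle(range(n_replicas))

    def write(self, key, value):
        self.leader[key] = value
        for r in self.replicas:   # synchronous replication, for simplicity
            r[key] = value

    def read(self, key):
        # Round-robin across replicas: read throughput scales with replica count.
        return self.replicas[next(self._rr)].get(key)

store = ReplicatedStore(n_replicas=3)
store.write("x", 42)
print([store.read("x") for _ in range(3)])  # [42, 42, 42]
```

Each of the three reads lands on a different replica, which is precisely the "scale out the read side" win; geo-proximity and fault tolerance follow from placing those replicas in different locations.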
But here's the kicker: let's say that the data you have on your hands happens, as it will in all likelihood, to be subject to change, and remains in flux. Oh, the searing pain of dealing with ever-present change (changing requirements, anyone?) has singed its way into the collective psyche of us tech types, hasn't it?
Sleight of hand and twist of fate
On a bed of nails she makes me wait
And I wait without you
With or without you
With or without you
Through the storm we reach the shore
You give it all but I want more
And I'm waiting for you
~ U2 (lyrics from With or Without You)
Look around your software systems, and divine which components can benefit from a move away from, or perhaps even toward, a transactional approach.
I concede that the replication of databases is an antiquated topic; it is hoary by the scale of internet years, because the foundations haven't changed since their introduction (circa 1970), and networking principles have stayed unchanged, mercifully enough, if I may add. Most developers, though, go about their work with the deeply ingrained idea that any given database is made up of a single node; clearly, though, the trend has squarely and unerringly converged on databases that are distributed; so thoroughly mainstream have distributed databases become that it's almost unthinkable nowadays to even contemplate that it was ever otherwise.
My best guess is that therein lies the genesis of the phenomenon I sketched at the very outset of this essay. Look, I'll be the first to confess that eventual consistency (an eminently sensible strategy, as I have come to view it) takes a little getting used to before it begins feeling normal, until it eventually becomes second nature (pun sort of intended).
In fact, I would surmise that the next generation of technologists, who will scarcely have seen anything but NoSQL (with nary a palpable trace of a transaction processing mechanism in there), will take to the notion of eventual consistency like ducks take to water. The rest of us, raised as we were on a steady diet of transaction processing, have had to go through, and perhaps continue to go through, a period of sorting out the all-too-natural confusion that has surely lurked in our minds, even if diffusely so, until the proverbial light bulb went on and we saw the light (at the end of the tunnel). I just had to wedge the tunnel metaphor in there, didn't I? My story is that poetic license beckoned, and I'm sticking by my story.
Some related topics that I'll only mention in passing (topics that are well worth your while to dig into deeper) include monotonic read consistency and read-your-writes consistency. Meanwhile, let's round out our exploration of eventual consistency with a foray through the vital concerns of leaders and followers. Essentially, we start with the premise that a replica is simply a node which stores a complete copy of a given database, and we want to formulate any and all questions worth addressing in order to move forward.
First question, anyone? Okay, I'll throw this one out there: how do we make sure that the selfsame data consistently ends up on all replicas? If we think about it, each database write had better be processed by all replicas; if that does not happen, the replicas will diverge from a common dataset. Again, there are tons of gory details that we'll gloss over here, but suffice it to say that the predominantly accepted solution to precisely this problem is known as leader-based replication. But I hear you asking: how does one achieve high availability with leader-based replication? Great question, and one that takes us down the path of devising strategies for handling node outages, which is a deep subject right there (straddling the broad areas of software design, operations, and maintenance), so I'll be doing some vigorous hand-waving here.
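Stripped of the gory details (failure handling, asynchrony, log truncation), leader-based replication can be caricatured in a few lines of Python. The class names are my own, and real systems ship the log asynchronously with acknowledgments rather than calling followers inline:

```python
class Follower:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class Leader:
    """Toy leader-based replication: the leader sequences all writes
    and ships them, in order, to every follower."""
    def __init__(self, followers):
        self.data, self.log, self.followers = {}, [], followers

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))   # a single, totally ordered write log
        for f in self.followers:
            f.apply(key, value)         # replicate the change to every follower

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("stock", 7)
print([f.data["stock"] for f in followers])  # [7, 7] -- no divergence
```

Because every write flows through the one leader, all replicas see the same writes in the same order, which is what keeps them from diverging.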
Remember, though, that the construct of a failover features prominently here. It will also behoove us to remain mindful of the many, many things which can go awry during a failover, including, but not limited to: unreliable networks, failing nodes, and striking a decidedly delicate balance between the consistency, durability, availability, and latency of replicas.
Come to think of it, addressing these problems (and others, to be sure) takes us romping into the territory of distributed systems central, so to speak. One phrase to keep in mind through it all: trade-offs. Keep the centrality of trade-offs in mind, and you'll be good. Yes, it can be a circus and a half, but we need to hang in there and remain undeterred as we go about solving those wicked problems.
Returning now to the core topic of this essay, eventual consistency: we have at this point built up a critical mass of the ideas, concepts, and background essential to grokking eventual consistency itself. The crux of the matter is that if an application reads from an asynchronous follower that has fallen behind the leader, it will possibly see outdated data. Yep, you guessed it: this leads to perplexing inconsistencies (the same query, run simultaneously on the leader and a follower, might well return divergent results), simply because not all writes have yet propagated through to the follower. The inconsistency is transient, though: all followers will eventually catch up with their leader, becoming consistent with it once again. And just to drive home the point (at the risk of sounding like a broken record), this phenomenon is the genesis of the phrase eventual consistency.
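Here is a tiny simulation of that replication lag, with an explicit queue standing in for network delay; a read before the follower catches up is stale, a read afterward is fresh:

```python
from collections import deque

class AsyncFollower:
    """Follower that applies the leader's writes only when poked --
    a stand-in for network delay in asynchronous replication."""
    def __init__(self):
        self.data, self.pending = {}, deque()
    def enqueue(self, key, value):
        self.pending.append((key, value))
    def catch_up(self):
        while self.pending:
            k, v = self.pending.popleft()
            self.data[k] = v

leader_data = {}
follower = AsyncFollower()

def write(key, value):
    leader_data[key] = value
    follower.enqueue(key, value)   # replication is in flight, not yet applied

write("price", 100)
stale = follower.data.get("price")   # None: the follower lags the leader
follower.catch_up()                  # ...but it eventually converges
fresh = follower.data.get("price")
print(stale, fresh)  # None 100
```

The same query against leader and follower disagrees during the lag window, and agrees again once the follower drains its queue: eventual consistency in miniature.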
There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else.
~ Sam Walton
Once again, to drive home the point (at the risk of sounding like a broken record), I should divulge that I was endearingly impressed by the onion metaphor, as well as the undertones of constantly revisiting the basics for ever more faithful conceptualizations. I trace this to my undergrad years, when I spent countless hours poring over the mesmerizing pages of the classic MIT textbook Circuits, Signals, and Systems by William M. Siebert.
Yeah, right. The complexity of doing this sort of thing (in application code) is typically prohibitive… Wouldn't it be far easier to have us application developers remain carefree (in this regard, anyway) and let the underlying database behave intelligently (OK, no AI, just plain old DB management) and handle replication issues as and when they arise? And that's what all this was building up to: the reason transactions are even around, being as they are a reliable mechanism for databases to deliver guarantees, thereby simplifying application code by obviating the need to handle the aforementioned complexity (groan). So there you have it: a transaction is the abstraction layer which empowers application code to go merrily about its way, pretending that certain types of concurrency problems have simply vanished into non-existence; should you run afoul of errors, all you need to do is abort the transaction, and simply have your application code retry until success is achieved.
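That abort-and-retry loop can be sketched as follows. The "serialization failure" on the first two attempts is simulated here for determinism; real databases raise their own conflict errors:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")
conn.commit()

attempts_made = {"count": 0}

def flaky_increment():
    """Increment, but hit a (simulated) concurrency conflict twice."""
    conn.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")
    attempts_made["count"] += 1
    if attempts_made["count"] < 3:
        raise sqlite3.OperationalError("serialization failure (simulated)")

def with_retries(txn, attempts=10):
    for _ in range(attempts):
        try:
            txn()
            conn.commit()
            return True
        except sqlite3.OperationalError:
            conn.rollback()    # abort, then simply try again
    return False

assert with_retries(flaky_increment)
n = conn.execute("SELECT n FROM counters WHERE name = 'hits'").fetchone()[0]
print(n)  # 1 -- the rolled-back attempts left no trace
```

The application code stays blissfully simple: each failed attempt is wiped out wholesale by the rollback, so retrying never double-counts.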
Look around and peer into your persistence layer, and give yourself over to the joy of adjusting and tweaking away with the wrench of tuneable consistency.
You know that people are really having a ton of fun when they start getting carried away (in a very good way, I hasten to add) with pushing the envelope on the limits of existing technologies. Emboldened, they tweak and tune stuff, metamorphosing it, alchemy-like, into something superlative… Enter tuneable consistency. Here I cannot do much better than have us hear it from the horse's mouth, so to speak. In their first-class book Cassandra: The Definitive Guide (O'Reilly), authors Jeff Carpenter and Eben Hewitt enlighten us that
Consistency essentially means that a read always returns the most recently written value. Consider two customers attempting to put the same item into their shopping carts on an e-commerce site. If I place the last item in stock into my cart an instant after you do, you should get the item added to your cart, and I should be informed that the item is no longer available for purchase. This is guaranteed to happen when the state of a write is consistent among all nodes that have that data. But as we'll see later, scaling data stores means making certain trade-offs between data consistency, node availability, and partition tolerance. Cassandra is frequently called eventually consistent, which is a bit misleading. Out of the box, Cassandra trades some consistency in order to achieve total availability. But Cassandra is more accurately termed tuneably consistent, which means it allows you to easily decide the level of consistency you require, in balance with the level of availability. Let's take a moment to unpack this, as the term "eventual consistency" has caused some uproar in the industry. Some practitioners hesitate to use a system that is described as eventually consistent (italics mine).
In a truly distributed environment, and when writes involve quorums, you can tune how many nodes need to have successfully accepted a write so that the operation as a whole is a success. If you choose a W value less than the number of replicas, the remaining replicas that were not involved in the write will receive the data eventually. Again, we're talking milliseconds in common cases, but it can be a noticeable lag, and your application should be ready to deal with cases like that. In every scenario, the common thing is that a write will reach all the relevant nodes eventually, so that all nodes have the same data. It will take some time, but eventually the data in the whole cluster will be consistent for this particular piece of data, even after network partitions. Hence the name eventual consistency. Once again it's not really a specific feature of NoSQL databases; every time you have a setup involving masters and slaves, eventual consistency will strike with furious anger. The term was originally coined by Werner Vogels, Amazon's CTO, in 2007. The paper he wrote about it is well worth reading. Being the biggest e-commerce site out there, Amazon had a big influence on a whole slew of databases.
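The tuning knob the authors describe reduces to simple arithmetic: with N replicas, writes acknowledged by W nodes and reads consulting R nodes are guaranteed to overlap on at least one up-to-date replica whenever R + W > N. A few lines of Python capture the rule of thumb:

```python
def strongly_consistent(n, w, r):
    """Quorum rule of thumb: reads and writes overlap when R + W > N,
    so a read is guaranteed to see the latest successful write."""
    return r + w > n

# Cassandra-style tuning: pick W and R per operation,
# trading latency and availability for stronger guarantees.
N = 3
print(strongly_consistent(N, w=2, r=2))  # True  (quorum writes + quorum reads)
print(strongly_consistent(N, w=1, r=1))  # False (fast, but only eventually consistent)
```

Dialing W and R per operation is exactly what "tuneably consistent" means in practice: same cluster, different guarantees for different requests.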
CAP is an abbreviation for consistency, availability, and partition tolerance. The basic idea is that in a distributed system, you can have only two of these properties, but not all three at once.
Wait a second: how in the universe did we manage to gloss over Brewer's CAP Theorem until now? Well, there's no time to waste. A remarkably readable discussion of exactly what the CAP Theorem is all about can be found in the pages of Cassandra: The Definitive Guide (O'Reilly) by Jeff Carpenter and Eben Hewitt. In their words:
The theorem states that within a large-scale distributed data system, there are three requirements that have a relationship of sliding dependency:
- Consistency: All database clients will read the same value for the same query, even given concurrent updates
- Availability: All database clients will always be able to read and write data
- Partition tolerance: The database can be split into multiple machines; it can continue functioning in the face of network segmentation breaks
And there is a whole lot more that the authors share in crisp descriptions such as the one above. If your interest is piqued, I can only recommend grabbing a copy of the fine book; you won't regret it, nearly (eventually?) guaranteed (pun partially intended).
One other book which will serve you swimmingly well is Lars George's HBase: The Definitive Guide (O'Reilly). His sparklingly clear narrative is nearly up there with Tom White's; the latter, of course, is the author of the classic Hadoop: The Definitive Guide (O'Reilly), now in its fourth edition! To digress a tad, White writes in the preface to his Hadoop book that
Martin Gardner, the mathematics and science writer, once said in an interview:
Beyond calculus, I am lost. That was the secret of my column's success. It took me so long to understand what I was writing about that I knew how to write in a way most readers would understand.
In many ways, this is how I feel about Hadoop. Its inner workings are complex, resting as they do on a mixture of distributed systems theory, practical engineering, and common sense. And to the uninitiated, Hadoop can appear alien.
But it doesn't need to be like this. Stripped to its core, the tools that Hadoop provides for working with big data are simple. If there's a common theme, it is about raising the level of abstraction—to create building blocks for programmers who have lots of data to store and analyze, and who don't have the time, the skill, or the inclination to become distributed systems experts to build the infrastructure to handle it.
It seems fitting to talk about consistency a bit more, since it is mentioned often throughout this book. At the outset, consistency is about guaranteeing that a database always appears truthful to its clients.
Every operation on the database must carry its state from one consistent state to the next. How this is achieved or implemented is not specified explicitly, so that a system has multiple choices. In the end, it has to get to the next consistent state, or return to the previous consistent state, to fulfill its obligation.
Consistency can be classified in, for example, decreasing order of its properties, or guarantees offered to clients. Here is an informal list:
- Strict: The changes to the data are atomic and appear to take effect instantaneously. This is the highest form of consistency.
- Sequential: Every client sees all changes in the same order they were applied.
- Causal: All changes that are causally related are observed in the same order by all clients.
- Eventual: When no updates occur for a period of time, eventually all updates will propagate through the system and all replicas will be consistent.
- Weak: No guarantee is made that all updates will propagate, and changes may appear out of order to various clients.
The class of systems adhering to eventual consistency can be further divided into subtler sets, where those sets can also coexist. Werner Vogels, CTO of Amazon, lists them in his post titled "Eventually Consistent". The article also picks up on the topic of the CAP Theorem, which states that a distributed system can only achieve two out of the following three properties: consistency, availability, and partition tolerance.
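To make the "eventual" rung of that ladder concrete, here is one common convergence trick, last-writer-wins merging by timestamp, in a few lines of Python. The timestamps and cart data are invented for illustration; real systems often reach for vector clocks or hybrid logical clocks to pick winners more carefully:

```python
def lww_merge(a, b):
    """Last-writer-wins merge: given per-key (timestamp, value) pairs,
    keep the entry with the highest timestamp -- one simple way for
    replicas to converge on the same state."""
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

# Two replicas saw updates arrive in different orders.
replica1 = {"cart": (2, ["book", "pen"])}
replica2 = {"cart": (1, ["book"])}

# Once updates stop and the replicas exchange state, both converge.
merged = lww_merge(replica1, replica2)
print(merged)  # {'cart': (2, ['book', 'pen'])}
assert lww_merge(replica2, replica1) == merged  # exchange order doesn't matter
```

Because the merge is commutative, it doesn't matter who gossips with whom first: once the updates quiesce, every replica lands on the same answer, which is precisely the eventual guarantee.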
Look around and descend into your messaging layer, and honestly assess whether the imprints of the classic work on Enterprise Integration Patterns (the measure and yardstick of messaging wisdom) are writ large.
Now here is an oldie but goldie. It shows, in my opinion, our industry's rich heritage, in that it reverberates with echoes of (recent) history and radiantly glows with just how far we've come since its publication (circa the turn of the new millennium; the year 2001, to be precise). It is in the public domain, should you wish to head over and browse online: the article "Lessons from Giant-Scale Services" in the periodical IEEE Internet Computing, Volume 5, Issue 4. So without further ado, here is the promised oldie but goldie, offered by way of the abstract which accompanies the aforesaid article, where we are boldly told how
Web portals and ISPs such as AOL, Microsoft Network, and Yahoo have grown more than tenfold in the past five years (1996-2001). Despite their scale, growth rates, and rapid evolution of content and features, these sites and other "giant-scale" services like instant messaging and Napster must be always available. Many other major Web sites such as eBay, CNN, and Wal-Mart, have similar availability requirements. The article looks at the basic model for such services, focusing on the key real-world challenges they face (high availability, evolution, and growth), and developing some principles for attacking these problems. Few of the points made in the article are addressed in the literature, and most of the conclusions take the form of principles and approaches rather than absolute quantitative evaluations. This is due partly to the author's focus on high-level design, partly to the newness of the area, and partly to the proprietary nature of some of the information (which represents 15-20 very large sites). Nonetheless, the lessons are easy to understand and apply, and they simplify the design of large systems.
~ Published in: IEEE Internet Computing (Volume 5, Issue 4, Jul/Aug 2001)
ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than coordination. It exposes a simple API, inspired by the filesystem API, that allows developers to implement common coordination tasks, such as electing a master server, managing group membership, and managing metadata. ZooKeeper is an application library with two principal implementations of the APIs (Java and C) and a service component implemented in Java that runs on an ensemble of dedicated servers. Having an ensemble of servers enables ZooKeeper to tolerate faults and scale throughput.
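The fault-tolerance arithmetic behind that ensemble is worth spelling out: a majority quorum of N servers survives floor((N-1)/2) failures, which is why odd ensemble sizes are the norm. A quick back-of-the-envelope check:

```python
def tolerated_failures(ensemble_size):
    """A majority-quorum ensemble (as ZooKeeper uses) stays available as
    long as more than half its servers are up, so an ensemble of N servers
    tolerates floor((N - 1) / 2) failures."""
    return (ensemble_size - 1) // 2

for n in (3, 4, 5):
    print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
# 3 and 4 servers both tolerate just 1 failure; 5 servers tolerate 2.
```

Adding a fourth server buys no extra fault tolerance over three, only more machines that must agree, hence the conventional advice to run ensembles of 3, 5, or 7.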
Work. Keep digging your well.
Don't think about getting off from work.
Water is there somewhere.
Submit to a daily practice.
Your loyalty to that
is a ring on the door.
Keep knocking, and the joy inside
will eventually open a window
and look out to see who's there (italics are mine)