A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines 🔮 With consistency a great soul has simply nothing to do.
~ Ralph W. Emerson (In Self-Reliance, pointing out the misguided kind of consistency)
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable 📺
~ Leslie Lamport (Ushering us into the wonders of the distributed universe)
Don’t interrupt me while I’m interrupting 📣
~ Winston S. Churchill (Elaborating in his inimitable style on one of the perennially consistent, finer points of concurrency gotchas)
You can have it good, you can have it fast, you can have it cheap; pick two 🐘 🐎 🐑
~ Software development folklore (Brewer’s CAP theorem, put another way)
When you eventually see
through the veils to how things really are,
you will keep saying again and again,
“This is certainly not like we thought it was!” 👻
~ Jelaluddin Rumi (translated by Coleman Barks, The Essential Rumi—HarperCollins)
The artist is a receptacle for emotions that come from all over the place: from the sky, from the earth, from a scrap of paper, from a passing shape, from a spider’s web 🐚
~ Pablo Picasso (painter, sculptor, printmaker, poet, and playwright)
Let’s not beat about the bush: As members of a generation that grew up on ACID—hey not that kind of acid—we get butterflies in our stomach when dealing with a system that doesn’t provide ironclad transactional guarantees. When confronting a system that reeks of the absence of transactions, we feel—that is precisely how I used to feel not so long ago—like we’re no longer on terra firma.
On top of that, I’ll be eating my own words in this essay: I had made the bold statement elsewhere, and fairly recently too, that
…I generally avoid putting together (technical) tutorials, incredibly helpful as they are, simply because the internet is already awash with them; my inclination is to dig deeper, and share my findings with you.
But first, taking a step back and synchronizing our terminology so as to be on the same page, let’s hear what is, IMHO, the finest definition that captures the essence of what exactly a transaction is. It comes from the legendary Jim Gray (American computer scientist who received the Turing Award in 1998 for seminal contributions to database and transaction processing research and technical leadership in system implementation). As he aptly put it, a transaction is “a transformation of state” which has ACID properties—acidity, à la the pH scale? chemistry, anyone? or perhaps Maalox antacid lol? ♨ Frankly, I can’t recall coming across a definition that better captures the gestalt of transaction processing.
And speaking of ACID, let’s pass them around, shall we? 🏀
- Atomic semantics imply an all-or-nothing, one-shot operation. Every update within the transaction must succeed for the transaction as a whole to be deemed successful. There is no partial failure 🐙
- Consistent suggests that data transitions from one proper state to another proper state, i.e. there is no chance that readers could view different values that do not make sense together 🐌
- Isolated semantics imply that transactions which execute concurrently won’t become enmeshed with one another. Each transaction executes in its own quarantined space 🐚
- Durable guarantees suggest that once a transaction has succeeded, the state changes will persist, and won’t be lost 🐢
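To make the all-or-nothing bit concrete, here’s a minimal sketch using Python’s built-in sqlite3 module (the ledger table and the transfer helper are my own made-up illustration, not from any particular system): a transfer either debits and credits together, or touches nothing at all.

```python
import sqlite3

# A toy bank ledger; a transfer must debit and credit atomically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """All-or-nothing: either both updates land, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            cur = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces a rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

transfer(conn, "alice", "bob", 60)   # succeeds: alice 40, bob 60
transfer(conn, "alice", "bob", 100)  # fails: rolled back, balances unchanged
```

Note the partial-failure case: the debit had already executed when we bailed out, yet the rollback erases it, which is precisely the atomicity guarantee at work.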
If we were in the 1960s this would probably be the most popular section in the book. However, in today’s technology world ACID means something different than it did in the 1960s…
You can’t always get what you want
You can’t always get what you want
You can’t always get what you want
But if you try sometimes well you might find
You get what you need
I went down to the demonstration
To get my fair share of abuse
Singing, “We’re gonna vent our frustration
If we don’t we’re gonna blow a 50-amp fuse”
~ The Rolling Stones (lyrics from You Can’t Always Get What You Want)
The swimmingly elegant term eventual consistency—I cringe to think of it being called by some less cool name—was originally coined by Werner Vogels, CTO of Amazon (circa 2007). True enough, and you remind me as much by telling me, Hey Akram, a rose by any other name would smell as sweet. 🌹 Yes it does, I’ll politely reply. But then again, these matters have a way of transcending the realm of reason. To wit, let’s digress, and I promise that this digression—may I gently remind the reader that this eponymous blog boldly proclaims the moniker Programming Digressions somewhere in its title—will be ever so brief 😇 I mean, we’ve got to have ourselves some fun along the way, don’t we?
So let’s hear what Steven Levy tells us with great relish in his sparkling book entitled In The Plex: How Google Thinks, Works, and Shapes Our Lives on this selfsame theme. Levy happens to also be the author of one of my all-time favorites, the classic Hackers: Heroes of the Computer Revolution, but pursuing Hackers would take us too far afield. Let’s save that for another date. Back to the gleaming pages of In The Plex, where Levy is regaling us with how
They built a system that implemented “checkpointing,” a way for the index to hold its place if a calamity befell a server or hard disk. But the new system went further—it used a different way to handle a cluster of disks, more akin to the parallel-processing style of computing (where a computational task would be split among multiple computers or processors) than the “sharding” technique Google had used, which was to split up the web and assign regions of it to individual computers. (Those familiar with computer terms may know this technique as “partitioning,” but, as Dean says, “everyone at Google calls it sharding because it sounds cooler.” Among Google’s infrastructure wizards, it’s key jargon.) 🚰
Enter the juggernaut of Big Data 🐘
Look around your data pipelines, and hear the murmurs of torrential petabytes funneling through those very pipelines of our attempts at complexity management 🔩
Now, to understand the raison d’être for eventual consistency, let’s take a step back and have ourselves a slightly deeper look at the state of affairs—a consideration of the heavy burden that transactions place, in the Big Data terrain especially—which directly led to a serious revisitation, and widespread adoption, of the eventual consistency wherewithal. As Martin Fowler aptly noted in his classic book entitled Patterns of Enterprise Application Architecture (Pearson Education):
Most enterprise applications run into transactions in terms of databases. But there are plenty of other things that can be controlled using transactions, such as message queues, printers, and ATMs. As a result, technical discussions of transactions use the term “transactional resource” to mean anything that’s transactional—that is, that uses transactions to control concurrency. “Transactional resource” is a bit of a mouthful, so we just use “database,” since that’s the most common case. But when we say “database,” the same applies for any other transactional resource.
To handle the greatest throughput, modern transaction systems are designed to keep transactions as short as possible. As a result the general advice is to never make a transaction span multiple requests. A transaction that spans multiple requests is generally known as a long transaction. For this reason a common approach is to start a transaction at the beginning of a request and complete it at the end. This request transaction is a nice simple model, and a number of environments make it easy to do declaratively, by just tagging methods as transactional.
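Fowler’s request transaction—begin at the start of the request, commit at the end—can be sketched as a decorator. Everything below, from the transactional wrapper to the handle_signup handler, is a hypothetical illustration of mine using Python’s built-in sqlite3, not any particular framework’s API.

```python
import sqlite3
from functools import wraps

def transactional(handler):
    """Tag a request handler as transactional: begin at entry,
    commit at exit, roll back if the handler raises."""
    @wraps(handler)
    def wrapper(conn, *args, **kwargs):
        try:
            with conn:  # one transaction spanning exactly one request
                return handler(conn, *args, **kwargs)
        except sqlite3.Error:
            return "rolled back"
    return wrapper

@transactional
def handle_signup(conn, user):
    conn.execute("INSERT INTO users VALUES (?)", (user,))
    return "ok"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
handle_signup(conn, "alice")   # commits
handle_signup(conn, "alice")   # duplicate key: the whole request rolls back
```

This is the declarative flavor Fowler alludes to: the handler’s body stays blissfully unaware of transaction plumbing.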
And speaking of PoEAA, Fowler goes on to provide a marvelously readable survey of (some related, hard-core topics such as) reducing transaction isolation for liveness, patterns for offline concurrency control, and even—this one admittedly not for the faint of heart—application server concurrency. In fact, for application server concurrency in particular, I cannot think of a better single resource than the fine book entitled Pattern-Oriented Software Architecture, Volume 2: Patterns for Concurrent and Networked Objects (John Wiley, 2000) by Schmidt, Stal, Rohnert, and Buschmann. Referring to that book, Fowler aptly points out that it is 😅
…more for people who design application servers than for those who use application servers, but it’s good to have some knowledge of these ideas when you use the results (italics mine).
So there 😎
With that, let’s bring replication into the picture. Stripped to its essence, replication means keeping a copy of the same data on multiple machines which are connected to one another in a network with a given topology. 🐴 And lest anyone accuse me of putting the cart before the horse, I hasten to add that the whole reason we even have replication is that we wish to
- scale out the servers which can serve queries (think increased throughput on the read side of things) 📶
- keep data in geographical proximity to users (think reduced latency) 🏠
- allow systems to continue functioning in the face of partial failure (think increased availability) 🔥
But here’s the kicker: Let’s say that the data that you have on your hands—as you will in all likelihood—happens to be subject to change, and remains in flux. Oh, the searing pain of dealing with ever-present change (changing requirements, anyone?) has singed its way into the collective psyche of us tech types, hasn’t it? 😹
Sleight of hand and twist of fate
On a bed of nails she makes me wait
And I wait without you
With or without you
With or without you
Through the storm we reach the shore
You give it all but I want more
And I’m waiting for you
~ U2 (lyrics from With Or Without You)
Look around your software systems, and divine which components can benefit from a move away from—or perhaps even toward—a transactional approach 🏁
I concede that the replication of databases is an antiquated topic; it is hoary even by the scale of internet years, because the foundations haven’t changed since their introduction (circa 1970), plus networking principles have stayed unchanged, mercifully enough, if I may add 😂 Most developers, though, go about their work with the deeply ingrained idea that any given database is made up of a single node; clearly, though, the trend has squarely and unerringly converged on databases that are distributed; so thoroughly mainstream have distributed databases become that it’s almost unthinkable nowadays to even contemplate that it was ever otherwise 💭
My best guess is that therein lies the genesis of the phenomenon that I sketched at the very outset of this essay. Look, I’ll be the first one to confess that eventual consistency—an eminently sensible strategy, as I have come to view it—takes a little getting used to before it begins feeling normal, until it eventually becomes second nature (pun sort of intended) 👻
In fact, I would surmise that the next generation of technologists, who will scarcely have seen anything but NoSQL—with nary a palpable trace of transaction processing mechanisms in there—will take to the notion of eventual consistency like ducks take to water 🐢 The rest of us, raised as we were on a steady diet of transaction processing, have had to go through, and perhaps continue to go through, a period of sorting out the all-too-natural confusion that has surely lurked in our minds, even if diffusely so, until the proverbial light bulb went on and we saw the light (at the end of the tunnel). I just had to wedge the tunnel metaphor in there, didn’t I? My story is that poetic license beckoned, and I’m sticking by my story 🔦
Some related topics that I’ll only mention in passing—topics that are well worth your while to dig into deeper—include monotonic read consistency and read-your-writes consistency. Meanwhile, let’s round out our exploration of eventual consistency with a foray through the vital concerns of leaders and followers. Essentially, we start with the premise that a replica is simply a node which stores a complete copy of a given database, and we want to formulate any and all questions that would be worth addressing to move forward.
First question, anyone? Okay, I’ll throw this one out there: How to make sure that the selfsame data consistently ends up on all replicas? If we think about it, each database write had better be processed by all replicas. If that does not happen, the replicas would diverge from a common dataset. Again, tons of gory details that we’ll gloss over here, but suffice it to say that the predominantly accepted solution for precisely this problem is known as leader-based replication. But I hear you asking, How does one achieve high availability with leader-based replication? Great question, and one that will take us down the path of coming up with strategies for handling node outages, which is a deep subject right there—straddling the broad areas of software design, operations, and maintenance—so I’ll be doing some vigorous hand waving here 🙌 😱
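Stripped to a toy, leader-based replication looks something like this (the class names and the synchronous log-shipping shortcut are mine; real systems ship the log asynchronously, elect leaders, handle failures, and much, much more):

```python
class Replica:
    """A node storing a complete copy of the database."""
    def __init__(self):
        self.data = {}
        self.log = []          # replication log of applied writes

    def apply(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Leader(Replica):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers

    def write(self, key, value):
        # Every write goes through the leader, which ships it to all
        # followers so that no replica diverges from the common dataset.
        self.apply(key, value)
        for follower in self.followers:
            follower.apply(key, value)

followers = [Replica(), Replica()]
leader = Leader(followers)
leader.write("user:42", "akram")
# all three replicas now hold the same data
```

The crucial invariant: writes have a single entry point, so every replica applies the same writes in the same order.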
Remember, though, that the construct of a failover features prominently here. It will also behoove us to remain mindful of the many, many things which can go awry during a failover, including, but not limited to: unreliable networks, failing nodes, and striking a decidedly delicate balance between consistency, durability, availability, and latency of replicas.
Come to think of it, these problems (and others, to be sure) are the very problems that take us romping into the territory of distributed systems central, so to say. One phrase to keep in mind through it all: trade-offs. Keep the centrality of trade-offs in mind, and you’ll be good. Yes, it can be a circus and a half, but we need to hang in there and remain undeterred as we go about solving those wicked problems 🎪
Returning now to the core topic of this essay—eventual consistency—we have at this point built up enough of a critical mass of the relevant ideas, concepts, and background to grok eventual consistency itself. The crux of the matter is that if an application reads from an asynchronous follower that has fallen behind (the leader), it will possibly read outdated data. Yep, you guessed it: this will inevitably lead to perplexing inconsistencies in the database (the same query, when simultaneously run on the leader and a follower, might well return divergent results). Yes, you guessed it again: this happens simply because not all writes have propagated through to the follower. Given the transient nature of this inconsistency, all followers will eventually catch up with their leader, becoming matched up (consistency-wise) with it. And just to drive home the point—at the risk of sounding like a broken record—this phenomenon is the genesis of the phrase eventual consistency 🛁
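Here is the phenomenon in miniature—a leader, a lagging asynchronous follower, a stale read, and eventual catch-up. All names are simulated stand-ins of mine; in real systems the lag is measured in milliseconds, not method calls.

```python
class Leader:
    def __init__(self):
        self.data, self.log = {}, []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class LaggingFollower:
    """An asynchronous follower that applies the leader's log on its own schedule."""
    def __init__(self, leader):
        self.leader, self.data, self.applied = leader, {}, 0

    def replicate(self, entries=1):
        # Apply up to `entries` pending log records.
        for key, value in self.leader.log[self.applied:self.applied + entries]:
            self.data[key] = value
            self.applied += 1

    def read(self, key):
        return self.data.get(key)

leader = Leader()
follower = LaggingFollower(leader)
leader.write("headline", "v1")
leader.write("headline", "v2")
follower.replicate(1)
stale = follower.read("headline")   # "v1": the follower has fallen behind
follower.replicate(1)
fresh = follower.read("headline")   # "v2": caught up—eventually consistent
```

Run the same read against the leader and the follower between those two replicate calls and you get the divergent results described above; let replication finish and the divergence evaporates.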
There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else 💳 💲 💴
~ Sam Walton (founder of Walmart)
Once again, to drive home the point—at the risk of sounding like a broken record—I should divulge that I was endearingly impressed by the onion metaphor as well as the undertones of constantly revisiting the basics for ever-increasingly faithful conceptualizations. I trace this to my undergrad years when I spent countless hours poring over the mesmerizing pages of the classic MIT textbook on Circuits, Signals, and Systems by William M. Siebert 🚂
Yeah, right. The complexity of doing this sort of thing (in application code) is typically prohibitive… Wouldn’t it be far easier to have us application developers remain carefree—in this regard anyway—and let the allied database behave intelligently (OK, no AI, just plain old DB management) and handle replication issues as and when they arise? And that’s what all this was building up to: the reason why transactions are even around, being as they are a reliable mechanism for databases to deliver guarantees, thereby simplifying application code by obviating the need for handling the aforementioned complexity (groan). So there you have it: a transaction is the abstraction layer which empowers application code to go merrily about its way, pretending that certain types of concurrency problems have simply vanished into non-existence; should you run afoul of errors, all you need to do is abort the transaction, and simply have your application code retry until success is achieved 🎉
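That abort-and-retry loop—the one that lets application code stay delightfully simple—can be sketched like so (the TransientConflict exception and the flaky_transfer transaction are stand-ins of my own invention, not any database’s API):

```python
class TransientConflict(Exception):
    """Stand-in for a serialization failure or deadlock abort."""

def with_retries(txn, max_attempts=5):
    # Abort on error, retry until the transaction succeeds
    # (or we run out of patience).
    for _ in range(max_attempts):
        try:
            return txn()
        except TransientConflict:
            continue  # the database aborted us; simply try again
    raise RuntimeError("gave up after %d attempts" % max_attempts)

# A hypothetical transaction that conflicts twice, then succeeds.
attempts = {"n": 0}
def flaky_transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientConflict
    return "committed"

result = with_retries(flaky_transfer)   # "committed", on the third try
```

One design note: retries only make sense when the transaction body is idempotent or the database discards aborted work entirely, which is exactly what the abstraction guarantees.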
Look around and peer into your persistence layer, and revel in the joy of adjusting and tweaking away with the wrench of tuneable consistency 🔱
You know that people are really having a ton of fun when they start getting carried away—in a very good way, I hasten to add—with pushing the envelope on the limits of existing technologies. Emboldened, they tweak and tune stuff, metamorphosing it (alchemy, of a sort) into something superlative… Enter tuneable consistency. Here I cannot do much better than have us hear from the horse’s mouth, so to say. In their first-class book entitled Cassandra: The Definitive Guide (O’Reilly), authors Jeff Carpenter and Eben Hewitt enlighten us that 🎧
Consistency essentially means that a read always returns the most recently written value. Consider two customers are attempting to put the same item into their shopping carts on an e-commerce site. If I place the last item in stock into my cart an instant after you do, you should get the item added to your cart, and I should be informed that the item is no longer available for purchase. This is guaranteed to happen when the state of a write is consistent among all nodes that have that data. But as we’ll see later, scaling data stores means making certain trade-offs between data consistency, node availability, and partition tolerance. Cassandra is frequently called eventually consistent, which is a bit misleading. Out of the box, Cassandra trades some consistency in order to achieve total availability. But Cassandra is more accurately termed tuneably consistent, which means it allows you to easily decide the level of consistency you require, in balance with the level of availability. Let’s take a moment to unpack this, as the term “eventual consistency” has caused some uproar in the industry. Some practitioners hesitate to use a system that is described as eventually consistent (italics mine).
In a truly distributed environment, and when writes involve quorums, you can tune how many nodes need to have successfully accepted a write so that the operation as a whole is a success. If you choose a W value less than the number of replicas, the remaining replicas that were not involved in the write will receive the data eventually. Again, we’re talking milliseconds in common cases, but it can be a noticeable lag, and your application should be ready to deal with cases like that. In every scenario, the common thing is that a write will reach all the relevant nodes eventually, so that all nodes have the same data. It will take some time, but eventually the data in the whole cluster will be consistent for this particular piece of data, even after network partitions. Hence the name eventual consistency. Once again it’s not really a specific feature of NoSQL databases, every time you have a setup involving masters and slaves, eventual consistency will strike with furious anger. The term was originally coined by Werner Vogels, Amazon’s CTO, in 2007. The paper he wrote about it is well worth reading. Being the biggest e-commerce site out there, Amazon had a big influence on a whole slew of databases 🏧
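The W (and R) tuning described above boils down to quorum arithmetic—choose read and write quorums that overlap, i.e. R + W > N—which a toy simulation can illustrate. The replica layout and helper names below are mine, not Cassandra’s API; real coordinators pick replica sets dynamically and reconcile via timestamps much as the max() below does.

```python
N = 3  # replication factor

def strongly_consistent(r, w, n=N):
    # The classic tuneable-consistency rule of thumb: make every
    # read quorum intersect every write quorum.
    return r + w > n

replicas = [{} for _ in range(N)]

def quorum_write(key, value, ts, w):
    # Succeed once `w` replicas have accepted the write; the rest
    # receive it "eventually" (elided in this sketch).
    for replica in replicas[:w]:
        replica[key] = (ts, value)

def quorum_read(key, r):
    # Ask `r` replicas and keep the freshest (highest-timestamp) answer.
    answers = [rep[key] for rep in replicas[:r] if key in rep]
    return max(answers)[1] if answers else None

quorum_write("cart", "1 item", ts=1, w=2)
quorum_write("cart", "2 items", ts=2, w=2)
value = quorum_read("cart", r=2)   # R + W = 4 > N = 3, so we see the latest write
```

Drop R or W low enough that R + W ≤ N and the overlap guarantee disappears: you have traded consistency for availability, which is precisely the dial the authors are describing.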
CAP is an abbreviation for consistency, availability, and partition tolerance. The basic idea is that in a distributed system, you can have only two of these properties, but not all three at once.
Wait a second, how in the universe did we manage to gloss over Brewer’s CAP Theorem until now? Well, there’s no time to waste ⛄ A remarkably readable discussion of exactly what the CAP Theorem is all about can be found in the pages of Cassandra: The Definitive Guide (O’Reilly) by Jeff Carpenter and Eben Hewitt. In their words:
The theorem states that within a large-scale distributed data system, there are three requirements that have a relationship of sliding dependency:
- Consistency: All database clients will read the same value for the same query, even given concurrent updates ☀
- Availability: All database clients will always be able to read and write data ☁
- Partition tolerance: The database can be split into multiple machines; it can continue functioning in the face of network segmentation breaks ☂
And there is a whole lot more that the authors share in crisp descriptions such as the one above. If your interest was piqued, I can only recommend grabbing a copy of the fine book; you won’t regret it, nearly (eventually?) guaranteed (pun partially intended) 😆
One other book which will serve you swimmingly well is Lars George’s HBase: The Definitive Guide (O’Reilly). His sparkling clear narrative is nearly up there with Tom White’s. The latter, of course, being the author of the classic Hadoop: The Definitive Guide (O’Reilly), now in its fourth edition! To digress a tad, White writes in the Preface to his Hadoop book that 🐘
Martin Gardner, the mathematics and science writer, once said in an interview:
Beyond calculus, I am lost. That was the secret of my column’s success. It took me so long to understand what I was writing about that I knew how to write in a way most readers would understand.
In many ways, this is how I feel about Hadoop. Its inner workings are complex, resting as they do on a mixture of distributed systems theory, practical engineering, and common sense. And to the uninitiated, Hadoop can appear alien.
But it doesn’t need to be like this. Stripped to its core, the tools that Hadoop provides for working with big data are simple. If there’s a common theme, it is about raising the level of abstraction—to create building blocks for programmers who have lots of data to store and analyze, and who don’t have the time, the skill, or the inclination to become distributed systems experts to build the infrastructure to handle it 🐝
It seems fitting to talk about consistency a bit more since it is mentioned often throughout this book. On the outset, consistency is about guaranteeing that a database always appears truthful to its clients.
Every operation on the database must carry its state from one consistent state to the next. How this is achieved or implemented is not specified explicitly so that a system has multiple choices. In the end, it has to get to the next consistent state, or return to the previous consistent state, to fulfill its obligation.
Consistency can be classified in, for example, decreasing order of its properties, or guarantees offered to clients. Here is an informal list:
Strict: The changes to the data are atomic and appear to take effect instantaneously. This is the highest form of consistency 🚧
Sequential: Every client sees all changes in the same order they were applied 🚃
Causal: All changes that are causally related are observed in the same order by all clients 🚠
Eventual: When no updates occur for a period of time, eventually all updates will propagate through the system and all replicas will be consistent 🚏
Weak: No guarantee is made that all updates will propagate and changes may appear out of order to various clients 🚁
The class of system adhering to eventual consistency can be even further divided into subtler sets, where those sets can also coexist. Werner Vogels, CTO of Amazon, lists them in his post titled “Eventually Consistent”. The article also picks up on the topic of the CAP Theorem, which states that a distributed system can only achieve two out of the following three properties: consistency, availability, and partition tolerance.
Look around and descend into your messaging layer, and honestly assess whether the imprints of the classic work on Enterprise Integration Patterns—the measure and yardstick of messaging wisdom—are writ large ⛷
Now here is an oldie, but goldie 🎁 It shows, in my opinion, our industry’s rich heritage in that it reverberates with echoes of (recent) history, and radiantly glows with the aura of just how far we’ve come since its publication (circa the turn of a new millennium, the year 2001 to be precise). It’s in the public domain, should you wish to head over and browse online the IEEE Internet Computing article entitled “Lessons from Giant-Scale Services” (Volume 5, Issue 4). So without further ado, here is the promised oldie, but goldie 🎁 In particular, it’s offered here by way of the Abstract which accompanies the aforesaid article, where we are boldly told how
Web portals and ISPs such as AOL, Microsoft Network, and Yahoo have grown more than tenfold in the past five years (1996-2001). Despite their scale, growth rates, and rapid evolution of content and features, these sites and other “giant-scale” services like instant messaging and Napster must be always available. Many other major Web sites such as eBay, CNN, and Wal-Mart, have similar availability requirements. The article looks at the basic model for such services, focusing on the key real-world challenges they face (high availability, evolution, and growth), and developing some principles for attacking these problems. Few of the points made in the article are addressed in the literature, and most of the conclusions take the form of principles and approaches rather than absolute quantitative evaluations. This is due partly to the author’s focus on high-level design, partly to the newness of the area, and partly to the proprietary nature of some of the information (which represents 15-20 very large sites). Nonetheless, the lessons are easy to understand and apply, and they simplify the design of large systems.
~ Published in: IEEE Internet Computing ( Volume: 5, Issue: 4, Jul/Aug 2001 )
ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than coordination. It exposes a simple API, inspired by the filesystem API, that allows developers to implement common coordination tasks, such as electing a master server, managing group membership, and managing metadata. ZooKeeper is an application library with two principal implementations of the APIs—Java and C—and a service component implemented in Java that runs on an ensemble of dedicated servers. Having an ensemble of servers enables ZooKeeper to tolerate faults and scale throughput.
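The master-election recipe hinted at above—each contender creates an ephemeral, sequential znode, and the lowest sequence number wins—can be mimicked with a toy in-memory dictionary. No real ZooKeeper ensemble here; the paths, names, and crash helper are all simulated stand-ins (in practice you would use a client library against real znodes).

```python
import itertools

_seq = itertools.count()
znodes = {}   # path -> owner, standing in for ephemeral sequential znodes

def volunteer(owner, prefix="/election/guid-n"):
    # Each contender creates a sequentially numbered, ephemeral node.
    path = "%s%010d" % (prefix, next(_seq))
    znodes[path] = owner
    return path

def current_leader():
    # The contender holding the lowest sequence number is the leader.
    return znodes[min(znodes)] if znodes else None

def crash(path):
    # An ephemeral znode vanishes when its owner's session dies,
    # which automatically promotes the next-lowest contender.
    znodes.pop(path, None)

p1 = volunteer("server-a")
p2 = volunteer("server-b")
leader_before = current_leader()   # "server-a": lowest sequence number
crash(p1)
leader_after = current_leader()    # "server-b": promoted after the crash
```

The zero-padded sequence numbers make lexicographic order agree with numeric order, which is why min() over the paths suffices—the same trick the real recipe leans on.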
Work. Keep digging your well.
Don’t think about getting off from work.
Water is there somewhere.
Submit to a daily practice.
Your loyalty to that
is a ring on the door.
Keep knocking, and the joy inside
will eventually open a window
and look out to see who’s there (italics are mine)