A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do.
~ Ralph W. Emerson (in Self-Reliance, pointing out the misguided kind of consistency)
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
~ Leslie Lamport (ushering us into the wonders of the distributed universe)
Don't interrupt me while I'm interrupting.
~ Winston S. Churchill (elaborating in his inimitable style on one of the perennially consistent, finer points of concurrency gotchas)
You can have it good, you can have it fast, you can have it cheap; pick two.
~ Software development folklore (Brewer's CAP theorem, put another way)
When you eventually see
through the veils to how things really are,
you will keep saying again and again,
"This is certainly not like we thought it was!"
~ Jelaluddin Rumi (translated by Coleman Barks, The Essential Rumi, HarperCollins)
The artist is a receptacle for emotions that come from all over the place: from the sky, from the earth, from a scrap of paper, from a passing shape, from a spider's web.
~ Pablo Picasso (painter, sculptor, printmaker, poet, and playwright)
Let's not beat around the bush: as members of a generation that grew up on ACID (hey, not that kind of acid) we get butterflies in our stomachs when dealing with a system that doesn't provide ironclad transactional guarantees. When confronting a system that reeks of the absence of transactions, we feel (and that is precisely how I used to feel not so long ago) like we're no longer on terra firma.
On top of that, I'll be eating my own words in this essay. I had made the bold statement elsewhere, and fairly recently too, that
…I generally avoid putting together (technical) tutorials, incredibly helpful as they are, simply because the internet is already awash with them; my inclination is to dig deeper, and share my findings with you.
But first, taking a step back and synchronizing our terminology so as to be on the same page, let's hear what is, IMHO, the finest definition that captures the essence of what exactly a transaction is. So let's hear from the legendary Jim Gray (the American computer scientist who received the Turing Award in 1998 for seminal contributions to database and transaction processing research, and for technical leadership in system implementation). As he aptly put it, a transaction is "a transformation of state" which has ACID properties. (Acidity, à la the pH scale? Chemistry, anyone? Or perhaps Maalox antacid?) Frankly, I can't recall coming across a definition that better captures the gestalt of transaction processing.
And speaking of ACID, let's pass the properties around one by one, shall we?
- Atomic semantics imply an all-or-nothing, one-shot operation: every update within the transaction must succeed for the transaction as a whole to be called successful. There is no such thing as a partially applied transaction.
- Consistent means that data transitions from one proper state to another proper state; there is no chance that readers could view different values that do not make sense together.
- Isolated semantics imply that transactions executing concurrently won't become enmeshed with one another; each transaction executes in its own quarantined space.
- Durable guarantees that once a transaction has succeeded, its state changes will persist and won't be lost.
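To see the atomic, all-or-nothing guarantee concretely, here is a minimal sketch using Python's built-in sqlite3 module; the accounts table and the simulated mid-transaction failure are contrived purely for illustration:

```python
import sqlite3

# In-memory database; any transactional store exhibits the same behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# Atomicity: transfer funds; if anything fails mid-flight, undo everything.
try:
    conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
    # Simulate a failure before bob is credited.
    raise RuntimeError("disk on fire")
    conn.execute("UPDATE accounts SET balance = balance + 70 WHERE name = 'bob'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # all-or-nothing: alice's debit is undone too

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- no partial update survived
```

No half-finished transfer is ever visible: the debit that did execute vanishes along with the failed transaction.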
If we were in the 1960s, this would probably be the most popular section in the book. In today's technology world, however, ACID means something different than it did back then…
You can't always get what you want
You can't always get what you want
You can't always get what you want
But if you try sometimes well you might find
You get what you need
I went down to the demonstration
To get my fair share of abuse
Singing, "We're gonna vent our frustration
If we don't we're gonna blow a 50-amp fuse"
~ The Rolling Stones (lyrics from You Can't Always Get What You Want)
The swimmingly elegant term eventual consistency (I cringe to think of it going by a less cool name) was popularized by Werner Vogels, CTO of Amazon, circa 2007. True enough, you might remind me: Hey Akram, a rose by any other name would smell as sweet. Yes it does, I'll politely reply. But then again, these matters have a way of transcending the realm of reason. To wit, let's digress, and I promise that this digression (may I gently remind the reader that this eponymous blog boldly proclaims the moniker Programming Digressions in its title) will be ever so brief. I mean, we've got to have ourselves some fun along the ride, don't we?
So let's hear what Steven Levy tells us with great relish, on this selfsame theme, in his sparkling book In The Plex: How Google Thinks, Works, and Shapes Our Lives. Levy also happens to be the author of one of my all-time favorites, the classic Hackers: Heroes of the Computer Revolution, but pursuing Hackers would take us too far afield; let's save that for another date. Back to the gleaming pages of In The Plex, where Levy regales us with how
They built a system that implemented "checkpointing," a way for the index to hold its place if a calamity befell a server or hard disk. But the new system went further—it used a different way to handle a cluster of disks, more akin to the parallel-processing style of computing (where a computational task would be split among multiple computers or processors) than the "sharding" technique Google had used, which was to split up the web and assign regions of it to individual computers. (Those familiar with computer terms may know this technique as "partitioning," but, as Dean says, "everyone at Google calls it sharding because it sounds cooler." Among Google's infrastructure wizards, it's key jargon.)
Look around your data pipelines, and hear the murmurs of torrential petabytes funneling through those very pipelines, our attempts at complexity management made manifest.
Now, to understand the raison d'être for eventual consistency, let's take a step back and have a slightly deeper look at the state of affairs (in particular, the heavy burden that transactions place on systems, especially in the Big Data terrain) which led directly to a serious revisitation, and widespread adoption, of the eventual consistency wherewithal. As Martin Fowler aptly noted in his classic book Patterns of Enterprise Application Architecture (Pearson Education):
Most enterprise applications run into transactions in terms of databases. But there are plenty of other things that can be controlled using transactions, such as message queues, printers, and ATMs. As a result, technical discussions of transactions use the term "transactional resource" to mean anything that's transactional—that is, that uses transactions to control concurrency. "Transactional resource" is a bit of a mouthful, so we just use "database," since that's the most common case. But when we say "database," the same applies for any other transactional resource.
To handle the greatest throughput, modern transaction systems are designed to keep transactions as short as possible. As a result the general advice is to never make a transaction span multiple requests. A transaction that spans multiple requests is generally known as a long transaction.
For this reason a common approach is to start a transaction at the beginning of a request and complete it at the end. This request transaction is a nice simple model, and a number of environments make it easy to do declaratively, by just tagging methods as transactional.
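Fowler's request transaction model is easy to sketch. Below is a toy, hand-rolled take in Python on declaratively tagging a handler as transactional; the decorator and handler names are my own invented stand-ins, not any particular framework's API:

```python
import sqlite3
from functools import wraps

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.commit()

def transactional(handler):
    """Tag a request handler so it runs inside one short transaction."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            result = handler(*args, **kwargs)
            conn.commit()      # the transaction ends with the request
            return result
        except Exception:
            conn.rollback()    # any failure undoes the whole request
            raise
    return wrapper

@transactional
def place_order(item):
    conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    return "ok"

place_order("widget")
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1
```

The handler body knows nothing about commits or rollbacks; the transaction boundary lives entirely in the tag, which is exactly the appeal of the declarative style.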
And speaking of PoEAA, Fowler goes on to provide a marvelously readable survey of some related, hardcore topics, such as reducing transaction isolation for liveness, patterns for offline concurrency control, and even (this one admittedly not for the faint of heart) application server concurrency. In fact, for application server concurrency in particular, I cannot think of a better single resource than the fine book Pattern-Oriented Software Architecture, Volume 2: Patterns for Concurrent and Networked Objects (John Wiley, 2000) by Schmidt, Stal, Rohnert, and Buschmann. Referring to that book, Fowler aptly points out that it is
…more for people who design application servers than for those who use application servers, but it's good to have some knowledge of these ideas when you use the results (italics mine).
So there.
With that, let's bring replication into the picture. Stripped to its essence, replication means keeping a copy of the same data on multiple machines which are connected to one another in a network with a given topology. And lest anyone accuse me of putting the cart before the horse, I hasten to add that the whole reason we even have replication is that we wish to:
- scale out the servers which can serve queries (think increased throughput on the read side of things)
- keep data in geographical proximity to users (think reduced latency)
- allow systems to continue functioning in the face of partial failure (think increased availability)
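A toy model makes the read-scaling motivation above concrete. Everything here (the class names, the unrealistically synchronous replication) is a simplifying assumption for illustration only:

```python
import itertools

class ReplicatedStore:
    """Toy model: one leader takes writes; reads fan out across replicas."""
    def __init__(self, n_replicas):
        self.leader = {}
        self.replicas = [dict() for _ in range(n_replicas)]
        self._rr = itertools.cycle(range(n_replicas))

    def write(self, key, value):
        self.leader[key] = value
        for r in self.replicas:   # synchronous replication, for simplicity
            r[key] = value

    def read(self, key):
        # Round-robin across replicas: read throughput scales with replica count.
        return self.replicas[next(self._rr)].get(key)

store = ReplicatedStore(n_replicas=3)
store.write("x", 42)
print([store.read("x") for _ in range(3)])  # [42, 42, 42]
```

Each of the three reads lands on a different replica, which is precisely the "scale out the read side" win; geo-proximity and fault tolerance follow from placing those replicas in different locations.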
But here's the kicker: let's say that the data you have on your hands happens, as it will in all likelihood, to be subject to change, and remains in flux. Oh, the searing pain of dealing with ever-present change (changing requirements, anyone?) has singed its way into the collective psyche of us tech types, hasn't it?
Sleight of hand and twist of fate
On a bed of nails she makes me wait
And I wait without you
With or without you
With or without you
Through the storm we reach the shore
You give it all but I want more
And I'm waiting for you
~ U2 (lyrics from With or Without You)
Look around your software systems, and divine which components can benefit from a move away from, or perhaps even toward, a transactional approach.
I concede that the replication of databases is an antiquated topic; it is hoary by the scale of internet years, because the foundations haven't changed since their introduction (circa 1970), and networking principles have stayed unchanged, mercifully enough, if I may add. Most developers, though, go about their work with the deeply ingrained idea that any given database is made up of a single node; clearly, though, the trend has squarely and unerringly converged on databases that are distributed; so thoroughly mainstream have distributed databases become that it's almost unthinkable nowadays to even contemplate that it was ever otherwise.
My best guess is that therein lies the genesis of the phenomenon I sketched at the very outset of this essay. Look, I'll be the first to confess that eventual consistency (an eminently sensible strategy, as I have come to view it) takes a little getting used to before it begins feeling normal, until it eventually becomes second nature (pun sort of intended).
In fact, I would surmise that the next generation of technologists, who will scarcely have seen anything but NoSQL (with nary a palpable trace of a transaction processing mechanism in there), will take to the notion of eventual consistency like ducks take to water. The rest of us, raised as we were on a steady diet of transaction processing, have had to go through, and perhaps continue to go through, a period of sorting out the all-too-natural confusion that has surely lurked in our minds, even if diffusely so, until the proverbial light bulb went on and we saw the light (at the end of the tunnel). I just had to wedge the tunnel metaphor in there, didn't I? My story is that poetic license beckoned, and I'm sticking by my story.
Some related topics that I'll only mention in passing (topics that are well worth your while to dig into deeper) include monotonic read consistency and read-your-writes consistency. Meanwhile, let's round out our exploration of eventual consistency with a foray through the vital concerns of leaders and followers. Essentially, we start with the premise that a replica is simply a node which stores a complete copy of a given database, and we want to formulate any and all questions worth addressing in order to move forward.
First question, anyone? Okay, I'll throw this one out there: how do we make sure that the selfsame data consistently ends up on all replicas? If we think about it, each database write had better be processed by all replicas; if that does not happen, the replicas will diverge from a common dataset. Again, there are tons of gory details that we'll gloss over here, but suffice it to say that the predominantly accepted solution to precisely this problem is known as leader-based replication. But I hear you asking: how does one achieve high availability with leader-based replication? Great question, and one that takes us down the path of devising strategies for handling node outages, which is a deep subject right there (straddling the broad areas of software design, operations, and maintenance), so I'll be doing some vigorous hand-waving here.
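Stripped of the gory details (failure handling, asynchrony, log truncation), leader-based replication can be caricatured in a few lines of Python. The class names are my own, and real systems ship the log asynchronously with acknowledgments rather than calling followers inline:

```python
class Follower:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class Leader:
    """Toy leader-based replication: the leader sequences all writes
    and ships them, in order, to every follower."""
    def __init__(self, followers):
        self.data, self.log, self.followers = {}, [], followers

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))   # a single, totally ordered write log
        for f in self.followers:
            f.apply(key, value)         # replicate the change to every follower

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("stock", 7)
print([f.data["stock"] for f in followers])  # [7, 7] -- no divergence
```

Because every write flows through the one leader, all replicas see the same writes in the same order, which is what keeps them from diverging.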
Remember, though, that the construct of a failover features prominently here. It will also behoove us to remain mindful of the many, many things which can go awry during a failover, including, but not limited to: unreliable networks, failing nodes, and striking a decidedly delicate balance between the consistency, durability, availability, and latency of replicas.
Come to think of it, addressing these problems (and others, to be sure) takes us romping into the territory of distributed systems central, so to speak. One phrase to keep in mind through it all: trade-offs. Keep the centrality of trade-offs in mind, and you'll be good. Yes, it can be a circus and a half, but we need to hang in there and remain undeterred as we go about solving those wicked problems.
Returning now to the core topic of this essay, eventual consistency: we have at this point built up a critical mass of the ideas, concepts, and background essential to grokking eventual consistency itself. The crux of the matter is that if an application reads from an asynchronous follower that has fallen behind the leader, it will possibly see outdated data. Yep, you guessed it: this leads to perplexing inconsistencies (the same query, run simultaneously on the leader and a follower, might well return divergent results), simply because not all writes have yet propagated through to the follower. The inconsistency is transient, though: all followers will eventually catch up with their leader, becoming consistent with it once again. And just to drive home the point (at the risk of sounding like a broken record), this phenomenon is the genesis of the phrase eventual consistency.
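Here is a tiny simulation of that replication lag, with an explicit queue standing in for network delay; a read before the follower catches up is stale, a read afterward is fresh:

```python
from collections import deque

class AsyncFollower:
    """Follower that applies the leader's writes only when poked --
    a stand-in for network delay in asynchronous replication."""
    def __init__(self):
        self.data, self.pending = {}, deque()
    def enqueue(self, key, value):
        self.pending.append((key, value))
    def catch_up(self):
        while self.pending:
            k, v = self.pending.popleft()
            self.data[k] = v

leader_data = {}
follower = AsyncFollower()

def write(key, value):
    leader_data[key] = value
    follower.enqueue(key, value)   # replication is in flight, not yet applied

write("price", 100)
stale = follower.data.get("price")   # None: the follower lags the leader
follower.catch_up()                  # ...but it eventually converges
fresh = follower.data.get("price")
print(stale, fresh)  # None 100
```

The same query against leader and follower disagrees during the lag window, and agrees again once the follower drains its queue: eventual consistency in miniature.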
There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else.
~ Sam Walton
Once again, to drive home the point (at the risk of sounding like a broken record), I should divulge that I was endearingly impressed by the onion metaphor, as well as the undertones of constantly revisiting the basics for ever more faithful conceptualizations. I trace this to my undergrad years, when I spent countless hours poring over the mesmerizing pages of the classic MIT textbook Circuits, Signals, and Systems by William M. Siebert.
Yeah, right. The complexity of doing this sort of thing (in application code) is typically prohibitive… Wouldn't it be far easier to have us application developers remain carefree (in this regard, anyway) and let the underlying database behave intelligently (OK, no AI, just plain old DB management) and handle replication issues as and when they arise? And that's what all this was building up to: the reason transactions are even around, being as they are a reliable mechanism for databases to deliver guarantees, thereby simplifying application code by obviating the need to handle the aforementioned complexity (groan). So there you have it: a transaction is the abstraction layer which empowers application code to go merrily about its way, pretending that certain types of concurrency problems have simply vanished into non-existence; should you run afoul of errors, all you need to do is abort the transaction, and simply have your application code retry until success is achieved.
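That abort-and-retry loop can be sketched as follows. The "serialization failure" on the first two attempts is simulated here for determinism; real databases raise their own conflict errors:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")
conn.commit()

attempts_made = {"count": 0}

def flaky_increment():
    """Increment, but hit a (simulated) concurrency conflict twice."""
    conn.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")
    attempts_made["count"] += 1
    if attempts_made["count"] < 3:
        raise sqlite3.OperationalError("serialization failure (simulated)")

def with_retries(txn, attempts=10):
    for _ in range(attempts):
        try:
            txn()
            conn.commit()
            return True
        except sqlite3.OperationalError:
            conn.rollback()    # abort, then simply try again
    return False

assert with_retries(flaky_increment)
n = conn.execute("SELECT n FROM counters WHERE name = 'hits'").fetchone()[0]
print(n)  # 1 -- the rolled-back attempts left no trace
```

The application code stays blissfully simple: each failed attempt is wiped out wholesale by the rollback, so retrying never double-counts.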
Look around and peer into your persistence layer, and give yourself over to the joy of adjusting and tweaking away with the wrench of tuneable consistency.
You know that people are really having a ton of fun when they start getting carried away (in a very good way, I hasten to add) with pushing the envelope on the limits of existing technologies. Emboldened, they tweak and tune stuff, metamorphosing it, alchemy-like, into something superlative… Enter tuneable consistency. Here I cannot do much better than have us hear it from the horse's mouth, so to speak. In their first-class book Cassandra: The Definitive Guide (O'Reilly), authors Jeff Carpenter and Eben Hewitt enlighten us that
Consistency essentially means that a read always returns the most recently written value. Consider two customers attempting to put the same item into their shopping carts on an e-commerce site. If I place the last item in stock into my cart an instant after you do, you should get the item added to your cart, and I should be informed that the item is no longer available for purchase. This is guaranteed to happen when the state of a write is consistent among all nodes that have that data. But as we'll see later, scaling data stores means making certain trade-offs between data consistency, node availability, and partition tolerance. Cassandra is frequently called eventually consistent, which is a bit misleading. Out of the box, Cassandra trades some consistency in order to achieve total availability. But Cassandra is more accurately termed tuneably consistent, which means it allows you to easily decide the level of consistency you require, in balance with the level of availability. Let's take a moment to unpack this, as the term "eventual consistency" has caused some uproar in the industry. Some practitioners hesitate to use a system that is described as eventually consistent (italics mine).
In a truly distributed environment, and when writes involve quorums, you can tune how many nodes need to have successfully accepted a write so that the operation as a whole is a success. If you choose a W value less than the number of replicas, the remaining replicas that were not involved in the write will receive the data eventually. Again, we're talking milliseconds in common cases, but it can be a noticeable lag, and your application should be ready to deal with cases like that. In every scenario, the common thing is that a write will reach all the relevant nodes eventually, so that all nodes have the same data. It will take some time, but eventually the data in the whole cluster will be consistent for this particular piece of data, even after network partitions. Hence the name eventual consistency. Once again it's not really a specific feature of NoSQL databases; every time you have a setup involving masters and slaves, eventual consistency will strike with furious anger. The term was originally coined by Werner Vogels, Amazon's CTO, in 2007. The paper he wrote about it is well worth reading. Being the biggest e-commerce site out there, Amazon had a big influence on a whole slew of databases.
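The tuning knob the authors describe reduces to simple arithmetic: with N replicas, writes acknowledged by W nodes and reads consulting R nodes are guaranteed to overlap on at least one up-to-date replica whenever R + W > N. A few lines of Python capture the rule of thumb:

```python
def strongly_consistent(n, w, r):
    """Quorum rule of thumb: reads and writes overlap when R + W > N,
    so a read is guaranteed to see the latest successful write."""
    return r + w > n

# Cassandra-style tuning: pick W and R per operation,
# trading latency and availability for stronger guarantees.
N = 3
print(strongly_consistent(N, w=2, r=2))  # True  (quorum writes + quorum reads)
print(strongly_consistent(N, w=1, r=1))  # False (fast, but only eventually consistent)
```

Dialing W and R per operation is exactly what "tuneably consistent" means in practice: same cluster, different guarantees for different requests.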
CAP is an abbreviation for consistency, availability, and partition tolerance. The basic idea is that in a distributed system, you can have only two of these properties, but not all three at once.
Wait a second: how in the universe did we manage to gloss over Brewer's CAP Theorem until now? Well, there's no time to waste. A remarkably readable discussion of exactly what the CAP Theorem is all about can be found in the pages of Cassandra: The Definitive Guide (O'Reilly) by Jeff Carpenter and Eben Hewitt. In their words:
The theorem states that within a large-scale distributed data system, there are three requirements that have a relationship of sliding dependency:
- Consistency: All database clients will read the same value for the same query, even given concurrent updates
- Availability: All database clients will always be able to read and write data
- Partition tolerance: The database can be split into multiple machines; it can continue functioning in the face of network segmentation breaks
And there is a whole lot more that the authors share in crisp descriptions such as the one above. If your interest is piqued, I can only recommend grabbing a copy of the fine book; you won't regret it, nearly (eventually?) guaranteed (pun partially intended).
One other book which will serve you swimmingly well is Lars George's HBase: The Definitive Guide (O'Reilly). His sparklingly clear narrative is nearly up there with Tom White's; the latter, of course, is the author of the classic Hadoop: The Definitive Guide (O'Reilly), now in its fourth edition! To digress a tad, White writes in the preface to his Hadoop book that
Martin Gardner, the mathematics and science writer, once said in an interview:
Beyond calculus, I am lost. That was the secret of my column's success. It took me so long to understand what I was writing about that I knew how to write in a way most readers would understand.
In many ways, this is how I feel about Hadoop. Its inner workings are complex, resting as they do on a mixture of distributed systems theory, practical engineering, and common sense. And to the uninitiated, Hadoop can appear alien.
But it doesn't need to be like this. Stripped to its core, the tools that Hadoop provides for working with big data are simple. If there's a common theme, it is about raising the level of abstraction—to create building blocks for programmers who have lots of data to store and analyze, and who don't have the time, the skill, or the inclination to become distributed systems experts to build the infrastructure to handle it.
It seems fitting to talk about consistency a bit more, since it is mentioned often throughout this book. At the outset, consistency is about guaranteeing that a database always appears truthful to its clients.
Every operation on the database must carry its state from one consistent state to the next. How this is achieved or implemented is not specified explicitly, so that a system has multiple choices. In the end, it has to get to the next consistent state, or return to the previous consistent state, to fulfill its obligation.
Consistency can be classified in, for example, decreasing order of its properties, or guarantees offered to clients. Here is an informal list:
- Strict: The changes to the data are atomic and appear to take effect instantaneously. This is the highest form of consistency.
- Sequential: Every client sees all changes in the same order they were applied.
- Causal: All changes that are causally related are observed in the same order by all clients.
- Eventual: When no updates occur for a period of time, eventually all updates will propagate through the system and all replicas will be consistent.
- Weak: No guarantee is made that all updates will propagate, and changes may appear out of order to various clients.
The class of systems adhering to eventual consistency can be further divided into subtler sets, where those sets can also coexist. Werner Vogels, CTO of Amazon, lists them in his post titled "Eventually Consistent". The article also picks up on the topic of the CAP Theorem, which states that a distributed system can only achieve two out of the following three properties: consistency, availability, and partition tolerance.
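To make the "eventual" rung of that ladder concrete, here is one common convergence trick, last-writer-wins merging by timestamp, in a few lines of Python. The timestamps and cart data are invented for illustration; real systems often reach for vector clocks or hybrid logical clocks to pick winners more carefully:

```python
def lww_merge(a, b):
    """Last-writer-wins merge: given per-key (timestamp, value) pairs,
    keep the entry with the highest timestamp -- one simple way for
    replicas to converge on the same state."""
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

# Two replicas saw updates arrive in different orders.
replica1 = {"cart": (2, ["book", "pen"])}
replica2 = {"cart": (1, ["book"])}

# Once updates stop and the replicas exchange state, both converge.
merged = lww_merge(replica1, replica2)
print(merged)  # {'cart': (2, ['book', 'pen'])}
assert lww_merge(replica2, replica1) == merged  # exchange order doesn't matter
```

Because the merge is commutative, it doesn't matter who gossips with whom first: once the updates quiesce, every replica lands on the same answer, which is precisely the eventual guarantee.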
Look around and descend into your messaging layer, and honestly assess whether the imprints of the classic work on Enterprise Integration Patterns (the measure and yardstick of messaging wisdom) are writ large.
Now here is an oldie but goldie. It shows, in my opinion, our industry's rich heritage, in that it reverberates with echoes of (recent) history and radiantly glows with just how far we've come since its publication (circa the turn of the new millennium; the year 2001, to be precise). It is in the public domain, should you wish to head over and browse online: the article "Lessons from Giant-Scale Services" in the periodical IEEE Internet Computing, Volume 5, Issue 4. So without further ado, here is the promised oldie but goldie, offered by way of the abstract which accompanies the aforesaid article, where we are boldly told how
Web portals and ISPs such as AOL, Microsoft Network, and Yahoo have grown more than tenfold in the past five years (1996-2001). Despite their scale, growth rates, and rapid evolution of content and features, these sites and other "giant-scale" services like instant messaging and Napster must be always available. Many other major Web sites such as eBay, CNN, and Wal-Mart, have similar availability requirements. The article looks at the basic model for such services, focusing on the key real-world challenges they face (high availability, evolution, and growth), and developing some principles for attacking these problems. Few of the points made in the article are addressed in the literature, and most of the conclusions take the form of principles and approaches rather than absolute quantitative evaluations. This is due partly to the author's focus on high-level design, partly to the newness of the area, and partly to the proprietary nature of some of the information (which represents 15-20 very large sites). Nonetheless, the lessons are easy to understand and apply, and they simplify the design of large systems.
~ Published in: IEEE Internet Computing (Volume 5, Issue 4, Jul/Aug 2001)
ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than coordination. It exposes a simple API, inspired by the filesystem API, that allows developers to implement common coordination tasks, such as electing a master server, managing group membership, and managing metadata. ZooKeeper is an application library with two principal implementations of the APIs (Java and C) and a service component implemented in Java that runs on an ensemble of dedicated servers. Having an ensemble of servers enables ZooKeeper to tolerate faults and scale throughput.
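The fault-tolerance arithmetic behind that ensemble is worth spelling out: a majority quorum of N servers survives floor((N-1)/2) failures, which is why odd ensemble sizes are the norm. A quick back-of-the-envelope check:

```python
def tolerated_failures(ensemble_size):
    """A majority-quorum ensemble (as ZooKeeper uses) stays available as
    long as more than half its servers are up, so an ensemble of N servers
    tolerates floor((N - 1) / 2) failures."""
    return (ensemble_size - 1) // 2

for n in (3, 4, 5):
    print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
# 3 and 4 servers both tolerate just 1 failure; 5 servers tolerate 2.
```

Adding a fourth server buys no extra fault tolerance over three, only more machines that must agree, hence the conventional advice to run ensembles of 3, 5, or 7.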
Work. Keep digging your well.
Don't think about getting off from work.
Water is there somewhere.
Submit to a daily practice.
Your loyalty to that
is a ring on the door.
Keep knocking, and the joy inside
will eventually open a window
and look out to see who's there (italics are mine)