CAP Theorem - System Design

The CAP theorem is a fundamental concept in distributed computing that deals with the trade-offs between three key properties in a distributed system: Consistency, Availability, and Partition Tolerance.

Consistency (C)

In the context of the CAP theorem, consistency means that all nodes in a distributed system see the same data at the same time: every read returns the most recent successful write or an error. In other words, if you write a value through one node and then read it from another, you should get that value back, never a stale one.
Achieving strong consistency usually requires coordination and synchronisation between nodes, which increases latency and can reduce availability during network partitions.

Availability (A)

Availability, in the context of the CAP theorem, means that every request received by a non-failing node, whether it’s a read or a write, gets a non-error response, even if other nodes in the system have failed. The response is not guaranteed to reflect the most recent write.
High availability implies that the system remains operational and responsive, even when certain nodes are down or unreachable. This can be crucial for mission-critical applications.

Partition Tolerance (P)

Partition tolerance refers to the system’s ability to continue functioning even when there are network partitions or communication failures between nodes in the distributed system. Network partitions can result in delayed or lost messages between nodes.
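Ensuring partition tolerance means that the system keeps operating when the network splits the nodes into groups that cannot talk to each other, although it must then give up either consistency or availability for as long as the partition lasts.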

The CAP theorem states that a distributed system can guarantee at most two of the three properties at the same time. This leads to three primary trade-off scenarios:

CP (Consistency and Partition Tolerance):

In CP systems, data consistency is the top priority: every node that answers a request returns consistent data, even in the presence of network partitions. The cost is reduced availability during partitions, because nodes that cannot confirm they hold the latest data (for example, those cut off from the majority) must reject requests or wait rather than risk returning stale results.
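As a rough illustration of the CP behaviour described above, the sketch below models a single replica that refuses reads and writes whenever it cannot reach a majority of the cluster. The names `CPReplica`, `Unavailable`, and `reachable_peers` are hypothetical and chosen only for this example; a real system would detect reachability through heartbeats or a consensus protocol rather than a counter set by hand.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to protect consistency."""


class CPReplica:
    """Toy replica that only serves requests while it can reach a majority."""

    def __init__(self, name, cluster_size):
        self.name = name
        self.cluster_size = cluster_size
        # Assumption: reachability is tracked as a simple counter for the demo.
        self.reachable_peers = cluster_size - 1
        self.data = {}

    def has_majority(self):
        # Count this node plus the peers it can currently contact.
        return (1 + self.reachable_peers) > self.cluster_size // 2

    def write(self, key, value):
        if not self.has_majority():
            raise Unavailable(f"{self.name}: no majority, rejecting write")
        self.data[key] = value
        return "ok"

    def read(self, key):
        if not self.has_majority():
            raise Unavailable(f"{self.name}: no majority, rejecting read")
        return self.data.get(key)


replica = CPReplica("n1", cluster_size=5)
replica.write("balance", 100)      # accepted: all peers reachable

replica.reachable_peers = 1        # partition: n1 now sees only 2 of 5 nodes
try:
    replica.write("balance", 50)
except Unavailable as err:
    print("request refused:", err)
```

The replica on the minority side gives up availability (it returns an error) so that it never accepts a write the majority side does not know about.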

CA (Consistency and Availability):

CA systems prioritise both data consistency and high availability. They ensure that all nodes provide consistent data and are available to respond to requests, but they make no guarantees once a network partition occurs. Such systems only work where partitions are effectively ruled out, for example a database running on a single node.
However, in a distributed system network failures are unavoidable, so partition tolerance is a requirement rather than an option. In practice, distributed systems therefore choose between CP and AP, and a truly CA distributed system cannot exist.

AP (Availability and Partition Tolerance):

AP systems prioritise availability and partition tolerance over strong consistency. They aim to provide high availability even during network partitions, but this may result in temporary inconsistencies in the data. These are resolved once the partition heals and the replicas reconcile their differences, a model known as eventual consistency.
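The AP behaviour above can be sketched in the same toy style. Here each replica always accepts writes locally, and the replicas converge later by exchanging their data. The class `APReplica`, the function `reconcile`, and the last-write-wins rule based on a caller-supplied timestamp are assumptions made for this example; last-write-wins is only one possible conflict-resolution strategy, and it silently discards the older of two concurrent updates.

```python
class APReplica:
    """Toy replica that always accepts writes and reconciles later."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Accept the write locally even if no peer is reachable.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Last-write-wins: keep whichever version has the newer timestamp.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


def reconcile(a, b):
    """Exchange state in both directions once the partition has healed."""
    a.merge_from(b)
    b.merge_from(a)


left, right = APReplica("left"), APReplica("right")

# During the partition both sides stay available and diverge.
left.write("cart", "book", timestamp=1)
right.write("cart", "pen", timestamp=2)
print(left.read("cart"), right.read("cart"))

# After the partition heals, the replicas converge on one value.
reconcile(left, right)
print(left.read("cart"), right.read("cart"))
```

Between the two writes and the call to `reconcile`, a client can read different values depending on which replica it reaches, which is exactly the temporary inconsistency an AP system accepts.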

It’s important to note that the CAP theorem doesn’t dictate a one-size-fits-all solution for distributed systems. The choice between consistency, availability, and partition tolerance depends on the specific requirements and constraints of the application. Many distributed databases and systems are designed to provide tunable levels of consistency and availability, allowing developers to make trade-offs based on their application’s needs.
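One common form of tunable consistency is quorum replication: with N replicas, a write must be acknowledged by W of them and a read must consult R of them. When R + W > N, every read quorum overlaps every write quorum, so a read always reaches at least one replica holding the latest write. The sketch below checks that rule for a few settings; the function names `quorums_overlap` and `tolerated_failures` are hypothetical and not taken from any particular database.

```python
def quorums_overlap(n, r, w):
    """Return True if every read quorum shares a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def tolerated_failures(n, r, w):
    """Replicas that can be down while both reads and writes still reach a quorum."""
    return n - max(r, w)


# Compare a few settings for a 3-replica cluster.
for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(
        f"N=3 R={r} W={w}: "
        f"overlap={quorums_overlap(3, r, w)}, "
        f"tolerated failures={tolerated_failures(3, r, w)}"
    )
```

Lowering R and W favours availability and latency, since fewer replicas must respond; raising them until the quorums overlap favours consistency, at the price of tolerating fewer unreachable replicas.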