Data durability in cloud storage systems
- This dissertation challenges widely held assumptions about data replication in cloud storage systems. It demonstrates that existing cloud storage techniques are far from optimal for guarding against different types of node failure events. The dissertation provides novel methodologies for analyzing node failures and designing non-random replication schemes that offer significantly higher durability than existing techniques, at the same storage cost and cluster performance. Popular cloud storage systems typically replicate their data on random nodes to guard against data loss due to node failures. Node failures fall into two categories: independent and correlated. Independent node failures are typically caused by isolated server and hardware faults, and occur hundreds of times a year in a cluster of thousands of nodes. Correlated node failures cause multiple nodes to fail simultaneously, and occur a handful of times a year or less. Examples of correlated failures include recovery following a power outage or a large-scale network failure. The conventional wisdom for guarding against node failures is to replicate each node's data three times within the same cluster, and also geo-replicate the entire cluster to a separate location to protect against correlated failures. The dissertation shows that random replication within a cluster is almost guaranteed to lose data under common scenarios of correlated node failures. Due to the high fixed cost of each incident of data loss, many data center operators prefer to minimize the frequency of such events at the expense of losing more data in each event. The dissertation introduces Copyset Replication, a novel general-purpose replication technique that significantly reduces the frequency of data loss events within a cluster.
It also presents an implementation and evaluation of Copyset Replication on two open-source cloud storage systems, HDFS and RAMCloud, and shows that it incurs low overhead on all operations. Such systems require that each node's data be scattered across several nodes for parallel data recovery and access. Copyset Replication presents a near-optimal trade-off between the number of nodes on which the data is scattered and the probability of data loss. For example, in a 5000-node RAMCloud cluster under a power outage, Copyset Replication reduces the probability of data loss from 99.99% to 0.15%. For Facebook's HDFS cluster, it reduces the probability from 22.8% to 0.78%. The dissertation also demonstrates that with any replication scheme (including Copyset Replication), two replicas are sufficient for protecting against independent node failures within a cluster, while three replicas are inadequate for protecting against correlated node failures. Given that in many storage systems the third or n-th replica was introduced for durability rather than performance, storage systems can change the placement of the last replica to address correlated failures, which are the main vulnerability of cloud storage systems. The dissertation presents Tiered Replication, a replication scheme that splits the cluster into a primary and a backup tier. The first two replicas are stored on the primary tier and are used to recover data in the case of independent node failures, while the third replica is stored on the backup tier and is used for correlated failures. The key insight behind Tiered Replication is that, since the third replicas are rarely read, the backup tier can be placed on separate physical infrastructure or in a remote location without affecting performance. This separation significantly increases the resilience of the storage system to correlated failures and presents a low-cost alternative to geo-replication of an entire cluster.
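The RAMCloud numbers above can be reproduced to first order with a simple counting model: a replica group ("copyset") loses data only if all of its nodes are among the simultaneously failed ones, so the probability of any data loss grows with the number of distinct copysets a scheme uses. The sketch below is illustrative, not the dissertation's exact analysis; the chunk count per node (8000) and the 1% simultaneous-failure rate are assumed parameters.

```python
from math import comb

def p_data_loss(n, r, f, num_copysets):
    """Probability that at least one replica group loses all r replicas
    when f of n nodes fail simultaneously. A group is lost iff all r of
    its nodes are among the f failed nodes; groups are treated as
    independent uniform samples (a first-order approximation)."""
    p_group_lost = comb(f, r) / comb(n, r)
    return 1 - (1 - p_group_lost) ** num_copysets

n, r = 5000, 3          # cluster size and replication factor
f = n // 100            # a power outage takes down ~1% of nodes

# Random replication: with thousands of chunks per node, nearly every
# chunk ends up on its own distinct copyset (assumed 8000 chunks/node).
random_copysets = n * 8000 // r

# Copyset Replication at minimum scatter width: the cluster is carved
# into roughly n/r disjoint copysets, so far fewer groups are exposed.
copyset_copysets = n // r

print(p_data_loss(n, r, f, random_copysets))   # near-certain data loss
print(p_data_loss(n, r, f, copyset_copysets))  # roughly 0.16%
```

The model recovers the qualitative result: random replication exposes millions of distinct copysets and is almost guaranteed to lose some data in a 1% correlated failure, while restricting placement to a small fixed set of copysets drives the event probability down by orders of magnitude at the same storage cost.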
In addition, the Tiered Replication algorithm minimizes the probability of data loss under correlated failures. Tiered Replication can be executed incrementally for each cluster change, which allows it to support dynamic environments where nodes join and leave the cluster, and it accommodates additional data placement constraints required by the storage designer, such as network and rack awareness. Tiered Replication was implemented on HyperDex, an open-source cloud storage system, and the dissertation demonstrates that it incurs a small performance overhead. Tiered Replication improves the cluster-wide MTTF by a factor of 100,000 compared with random replication, without increasing the amount of storage.
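The tier-splitting idea described above can be sketched in a few lines. This is a simplified illustration under assumed names (`place_replicas`, `primary_nodes`, `backup_nodes`), not the dissertation's actual algorithm, which additionally chooses copysets incrementally to minimize data loss probability and to respect rack- and network-awareness constraints.

```python
import random

def place_replicas(primary_nodes, backup_nodes, r=3, rng=random):
    """Illustrative tiered placement: the first r-1 replicas go to the
    primary tier, which serves reads and handles recovery from common
    independent node failures; the last replica goes to the backup
    tier, which is rarely read and can therefore sit on separate
    physical infrastructure or in a remote location."""
    primaries = rng.sample(primary_nodes, r - 1)
    backup = rng.choice(backup_nodes)
    return primaries + [backup]

primary_nodes = [f"primary-{i}" for i in range(20)]
backup_nodes = [f"backup-{i}" for i in range(10)]
print(place_replicas(primary_nodes, backup_nodes))
```

Because only the final replica crosses the tier boundary, normal-case read and write latency is governed by the primary tier alone, which is what makes this a low-cost alternative to geo-replicating the whole cluster.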
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Stanford University, Department of Electrical Engineering.
|Kozyrakis, Christoforos, 1974-
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.)--Stanford University, 2014.
- © 2014 by Asaf Cidon
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).