Marek Zawirski, PhD Student, INRIA
Modern web applications are backed by geo-replicated data stores that offer highly available and low latency data access by replicating and updating data at a data center close to the client. Some applications require even greater availability and responsiveness, and cache and update data directly in mobile device or web browser. We propose to integrate client-side replication into the geo-replicated store in order to support an arbitrarily large database, and to ensure consistency and availability. Such design goals translate into a difficult set of somewhat conflicting requirements. (i) Consistency: To ensure Causal+ Consistency (the strongest consistency model for highly available systems), all the way to the client. (ii) Scalability: To replicate only a small part of the database and metadata at a client, and to keep metadata size bounded and independent of extreme numbers of widely-spread clients. (iii) Fault-tolerance: To remain available despite network, data center and client failures.
We present the design of SwiftCloud, the first distributed data store that satisfy this combination of requirements. A client replicates only the data and metadata of interest to him. The consistency protocol ensures causally-consistency access, even when client switches between data centers, by trading-off data freshness for safety. SwiftCloud keeps metadata small thanks to appropriate separation of concerns: by using small vector clocks for causality, and scalar timestamps for uniqueness. This allows to prune update logs safely even when some client replicas are unavailable.