Motivation

Given a cluster with an arbitrary number of nodes, we want to spread data across those nodes to reduce the load that any single node bears. Under a steady stream of requests, load is distributed throughout the system because the data relevant to different requests is on different nodes. We also want data stored on multiple nodes so that we can continue to respond to queries in the event of arbitrary network or node failure.

Partition Factor

The partition_factor (pf) is a number in [1, 100] that determines the partition pool (pp): the maximum number of nodes across which data for a particular locator is stored. This is the upper bound on the number of nodes we need to query for a read that touches the entire locator (e.g. audit(record)).

The number of nodes holding data for a locator falls in [1, pp], where pp = ceil((pf / 100) * n) and n is the number of nodes. Here pf is read as a percentage of n, as its [1, 100] range suggests.
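As a rough illustration of the bound above, the following Python sketch computes the partition pool size. It assumes pf is a percentage of the cluster, so the formula becomes ceil((pf / 100) * n); the function name partition_pool_size is hypothetical and not part of any existing API.

```python
def partition_pool_size(pf: int, n: int) -> int:
    """Upper bound on the number of nodes that hold data for one locator.

    Assumption: pf is a percentage in [1, 100] of the n nodes in the cluster.
    """
    if not 1 <= pf <= 100:
        raise ValueError("pf must be in [1, 100]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Integer form of ceil((pf / 100) * n); avoids floating-point rounding.
    return (pf * n + 99) // 100


if __name__ == "__main__":
    # With 10 nodes, pf = 25 allows a locator to span at most 3 nodes.
    print(partition_pool_size(25, 10))
    # pf = 100 lets a locator span the whole cluster.
    print(partition_pool_size(100, 10))
```

Because of the ceiling, even the smallest pf yields a pool of at least one node, so every locator always has somewhere to live.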

Redundancy Factor

The redundancy_factor (rf) is a number in [1, 100] that determines the redundancy pool (rp): the number of nodes on which data for a particular key/locator pair is stored. This is the exact number of nodes that are eligible to respond to a read that touches a key in a record. The redundancy pool is a subset of the partition pool.

rp = ceil((rf / 100) * pp), so 1 <= rp <= pp. Here rf is read as a percentage of pp, as its [1, 100] range suggests.
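To make the relationship between the two pools concrete, here is a small Python sketch that derives rp from rf, pf, and n. It assumes both factors are percentages (rf of the partition pool, pf of the cluster); the names ceil_percent and redundancy_pool_size are hypothetical and used only for illustration.

```python
def ceil_percent(percent: int, total: int) -> int:
    """Return ceil((percent / 100) * total) using integer arithmetic."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be in [1, 100]")
    if total < 1:
        raise ValueError("total must be at least 1")
    return (percent * total + 99) // 100


def redundancy_pool_size(rf: int, pf: int, n: int) -> int:
    """Number of nodes storing data for one key/locator pair.

    Assumption: pf is a percentage of the n nodes and rf is a percentage
    of the resulting partition pool.
    """
    pp = ceil_percent(pf, n)      # partition pool: nodes per locator
    return ceil_percent(rf, pp)   # redundancy pool: nodes per key/locator


if __name__ == "__main__":
    # 10 nodes, pf = 50 gives pp = 5; rf = 40 then gives rp = 2.
    print(redundancy_pool_size(40, 50, 10))
```

Since rf is at most 100, rp can never exceed pp, which matches the statement that the redundancy pool is a subset of the partition pool.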
