Motivation
Given a cluster with an arbitrary number of nodes, we want to spread the data across those nodes so that we reduce the amount of load that any single node bears. The idea is that, in a system with constant requests, load will be distributed throughout the system because the data relevant to different requests lives on different nodes. We also want to store data on multiple nodes so that we can continue to respond to queries in the event of arbitrary network/node failure.
Partition Factor
The partition_factor (pf) is a number in [1, 100], interpreted as a percentage, that determines the partition pool (pp): the maximum number of nodes across which data for a particular locator is stored. This is the upper bound on the number of nodes we need to query for a read that touches the entire locator (i.e. audit(record)).
pp = max(1, ceil((pf / 100) * n)), where n is the number of nodes in the cluster.
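For example, with pf = 50 in a 6-node cluster, pp = max(1, ceil(0.50 * 6)) = 3, so data for any given locator lives on at most 3 of the 6 nodes.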
Redundancy Factor
The redundancy_factor (rf) is a number in [1, 100], interpreted as a percentage, that determines the redundancy pool (rp): the number of nodes on which data for a particular key/locator pair is stored. This is the exact number of nodes that are eligible to respond to a query for a read that touches a key in a record. The redundancy pool is a subset of the partition pool.
rp = max(1, ceil((rf / 100) * pp))
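To make the arithmetic concrete, here is a minimal sketch of how the pool sizes could be computed, assuming pf and rf are percentages as described above; the method names are illustrative and not part of the design.

```java
// Illustrative helpers for computing pool sizes (names are assumptions).
static int partitionPoolSize(int pf, int n) {
    return Math.max(1, (int) Math.ceil((pf / 100.0) * n)); // pf in [1, 100], n = cluster size
}

static int redundancyPoolSize(int rf, int pp) {
    return Math.max(1, (int) Math.ceil((rf / 100.0) * pp)); // rf in [1, 100]
}

// e.g. in a 6-node cluster with pf = 50 and rf = 50:
// pp = partitionPoolSize(50, 6) == 3 and rp = redundancyPoolSize(50, 3) == 2
```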
Distribution
Consider a cluster that has 6 nodes, identified by the slots 0 through 5:

| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|

We're also making a few assumptions that are worth calling out:
- The server prefs file contains an ordered (comma-separated) list of node addresses
- The server reads that list of nodes on startup and caches the list somewhere on disk
- If the server starts up and notices that the list of nodes in the prefs file is different from the one it cached, then it needs to do a rebalance (see the sketch after this list)!
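A minimal sketch of that startup check, assuming hypothetical file locations (server.prefs for the prefs file and nodes.cache for the cached copy) and a placeholder rebalance() routine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class StartupNodeCheck {

    // Hypothetical locations; the real paths are an implementation detail.
    private static final Path PREFS = Path.of("server.prefs");
    private static final Path CACHE = Path.of("nodes.cache");

    public static void check() throws IOException {
        List<String> current = Arrays.asList(
                Files.readString(PREFS).trim().split("\\s*,\\s*"));
        List<String> cached = Files.exists(CACHE)
                ? Arrays.asList(Files.readString(CACHE).trim().split("\\s*,\\s*"))
                : current;
        if(!current.equals(cached)) {
            rebalance(cached, current); // membership or order changed, so data must move
        }
        Files.writeString(CACHE, String.join(",", current)); // refresh the cached list
    }

    private static void rebalance(List<String> oldNodes, List<String> newNodes) {
        // Placeholder: re-run the hashing described below and move data so that
        // every locator/key lands on the nodes the new list selects.
    }
}
```

The order of the list matters because the hashing below maps to a node by index, so reordering the same addresses also forces a rebalance.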
See http://en.wikipedia.org/wiki/List_of_hash_functions for a 64-bit hash function, OR use a stronger hash function but only take the first 8 bytes of the output.
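For example, one way to get a 64-bit value (a sketch, not a prescription) is to take the first 8 bytes of a SHA-256 digest; this also provides the hash() helper that the pseudocode below assumes, with an overload for re-hashing a previous hash:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Hashing {

    // Take the first 8 bytes of a SHA-256 digest to get a 64-bit hash.
    public static long hash(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest, 0, 8).getLong();
        }
        catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is always available on the JVM
        }
    }

    // Overload for re-hashing a previous 64-bit hash.
    public static long hash(long previous) {
        return hash(Long.toString(previous));
    }
}
```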
Locator Hashing
```java
// We need to get pp distinct nodes for the pool, so we continually hash the
// hash of the locator until we have everything we need.
Integer[] partitionPool = new Integer[pp];
Long hash = null;
for (int i = 0; i < pp; i++) {
    int slot = -1;
    while (slot == -1 || Arrays.asList(partitionPool).contains(slot)) {
        hash = hash == null ? hash(locator) : hash(hash); // re-hash the previous hash
        slot = (int) Math.floorMod(hash, (long) n); // n is the number of nodes
    }
    partitionPool[i] = slot;
}
// We now have an array of node identifiers that form our partition pool.
```
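Because the pool is derived purely from the locator, pf and the ordered node list, any node that agrees on that list can compute the same partition pool independently, without coordinating with its peers.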
Locator and Key Hashing
```java
// We need to get rp distinct nodes from the partition pool, so we continually hash
// the hash of the locator and key until we have everything we need.
Integer[] redundancyPool = new Integer[rp];
Long hash = null;
for (int i = 0; i < rp; i++) {
    int slot = -1;
    while (slot == -1 || Arrays.asList(redundancyPool).contains(partitionPool[slot])) {
        hash = hash == null ? hash(locator + key) : hash(hash); // re-hash the previous hash
        slot = (int) Math.floorMod(hash, (long) partitionPool.length);
    }
    redundancyPool[i] = partitionPool[slot];
}
// We now have an array of node identifiers that form our redundancy pool.
```
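To tie the two pools together, here is a rough sketch of how they might be used on the write and read paths; getPartitionPool, getRedundancyPool and replicateTo are illustrative names only, not part of this design.

```java
// Illustrative only: the surrounding API surface is not specified in this document.
Integer[] partitionPool = getPartitionPool(locator);        // up to pp node ids
Integer[] redundancyPool = getRedundancyPool(locator, key); // rp of those node ids

// A write for the key/locator pair is sent to every node in the redundancy pool.
for (int node : redundancyPool) {
    replicateTo(node, key, value, locator);
}

// A read that touches just this key can be answered by any one of the rp nodes,
// while a read that touches the entire locator (e.g. audit(record)) may need to
// consult up to pp nodes from the partition pool.
```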