...
Warning |
---|
This is a DRAFT! Nothing in this proposal should be considered final or binding. |
Ping Process
- Runs asynchronously every 5 seconds (one thread per node in the system)
- For each node, conduct the PING PROTOCOL
- After the status of each node is updated
  - If all nodes are available, delete any writes that the master node currently has in its buffer
  - Else keep the writes in the buffer
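
The buffer decision at the end of a ping round can be sketched as follows. This is a minimal illustration, not the proposal's implementation: `NodeStatus`, `PingRound`, and the string-typed node IDs and writes are hypothetical names, and using a single scheduler thread to trigger each round is an assumption (the per-node ping threads are left out).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical status values, taken from the labels used in the PING PROTOCOL.
enum NodeStatus { AVAILABLE, REPAIRABLE, UNAVAILABLE }

final class PingRound {
    // Runs after every node's status has been refreshed.
    // Clears the buffer only if every node is AVAILABLE; returns true if it was cleared.
    // Note: an empty status map counts as "all available" in this sketch.
    static boolean afterPingRound(Map<String, NodeStatus> statuses, List<String> writeBuffer) {
        boolean allAvailable = statuses.values().stream()
                .allMatch(status -> status == NodeStatus.AVAILABLE);
        if (allAvailable) {
            writeBuffer.clear(); // every node has caught up, so nothing needs replaying
        }
        return allAvailable;     // otherwise the buffer is kept for later repairs
    }

    // Assumption: one scheduler thread triggers a round every 5 seconds.
    static ScheduledExecutorService schedule(Runnable pingRound) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(pingRound, 0, 5, TimeUnit.SECONDS);
        return scheduler;
    }
}
```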
...
Ping Protocol
- A ping is sent to the node
- The node either
  - Does not respond and is marked as "unavailable"
  - Responds with a version that is < the version of the master node and is marked as "repairable"
  - Responds with a version that is == the version of the master node and is marked as "available"
Note |
---|
Don't lock the collection of statuses on the master node; just use compareAndSet |
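
A minimal sketch of the classification and the lock-free status update, assuming each node's status lives in its own `java.util.concurrent.atomic.AtomicReference`. `NodeStatus`, `PingProtocol`, `classify`, and `updateStatus` are hypothetical names. The draft does not say what happens when a node reports a version ahead of the master, so throwing in that case is purely an assumption.

```java
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicReference;

enum NodeStatus { AVAILABLE, REPAIRABLE, UNAVAILABLE }

final class PingProtocol {
    // An empty reportedVersion models "the node did not respond".
    static NodeStatus classify(OptionalLong reportedVersion, long masterVersion) {
        if (!reportedVersion.isPresent()) {
            return NodeStatus.UNAVAILABLE;
        }
        long nodeVersion = reportedVersion.getAsLong();
        if (nodeVersion == masterVersion) {
            return NodeStatus.AVAILABLE;
        }
        if (nodeVersion < masterVersion) {
            return NodeStatus.REPAIRABLE;
        }
        // Assumption: a node ahead of the master is not covered by the draft.
        throw new IllegalStateException("node version is ahead of the master");
    }

    // Updates one node's status without locking the whole collection.
    // Returns false if another thread changed the status first.
    static boolean updateStatus(AtomicReference<NodeStatus> slot,
                                NodeStatus expected, NodeStatus next) {
        return slot.compareAndSet(expected, next);
    }
}
```

If `updateStatus` returns false, the caller saw a stale status and can re-read the slot and decide whether its own update still applies.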
...
Repair Process
For each node marked as repairable:
Use a separate asynchronous thread to conduct the REPAIR PROTOCOL
...
Repair Protocol
- Given a repairable node
- Read lock the buffer of writes
- Conduct the WRITE PROTOCOL for each write between the node's version and the master node's version
- Unlock the buffer of writes
- If the node fails in the middle of the repair, the failure propagates through the WRITE PROTOCOL and up to the PING PROCESS. The remaining writes are not sent, but they are not lost because the master node keeps them in its buffer
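
The replay step can be sketched as below, under several assumptions: the buffer is a list of versioned writes guarded by a `ReentrantReadWriteLock`, "between the node's version and the master node's version" means versions greater than the node's and up to the master's, and `BufferedWrite`, `RepairProtocol`, and the `sender` callback are hypothetical names standing in for the real send step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical buffered write: a version number plus an opaque payload.
final class BufferedWrite {
    final long version;
    final String payload;

    BufferedWrite(long version, String payload) {
        this.version = version;
        this.payload = payload;
    }
}

final class RepairProtocol {
    private final ReentrantReadWriteLock bufferLock = new ReentrantReadWriteLock();
    private final List<BufferedWrite> buffer = new ArrayList<>();

    void append(BufferedWrite write) {
        bufferLock.writeLock().lock();
        try {
            buffer.add(write);
        } finally {
            bufferLock.writeLock().unlock();
        }
    }

    // Replays every buffered write the node is missing and returns how many were sent.
    // If sender throws (the node failed mid-repair), the exception propagates to the
    // caller, the lock is still released, and the buffer is left untouched.
    int repair(long nodeVersion, long masterVersion, Consumer<BufferedWrite> sender) {
        bufferLock.readLock().lock();
        try {
            int sent = 0;
            for (BufferedWrite write : buffer) {
                if (write.version > nodeVersion && write.version <= masterVersion) {
                    sender.accept(write);
                    sent++;
                }
            }
            return sent;
        } finally {
            bufferLock.readLock().unlock();
        }
    }
}
```

One design point to keep in mind: `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, so a replay that re-entered a step that write-locks the same buffer while still holding the read lock would block. This sketch only performs the send step for that reason.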
...
Write Protocol
- Given a write
- Write lock the buffer of writes
- Write lock the master node's version to prevent it from being read by other protocols/processes
- The master node appends the write to its buffer and updates its version
- The master node determines
  - A: the nodes for the PrimaryRecord
  - B: the nodes for the SecondaryIndex and SearchIndex
- Read lock the collection of statuses
- The master node asynchronously sends
  - A Write to the available nodes in the intersection of A and B
  - A PrimaryWrite to the available nodes that are uniquely in A
  - An IndexWrite to the available nodes that are uniquely in B
  - A VersionWrite to the available nodes that are outside of A and B
- The master node DOES NOT wait for acknowledgement from the nodes
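
The routing decision in the steps above can be sketched as a pure function. Here A is taken to be the set of nodes for the PrimaryRecord and B the set of nodes for the SecondaryIndex and SearchIndex. `WriteKind`, `WriteRouter`, and the string node IDs are hypothetical names, and the actual asynchronous send is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical tags for the four kinds of message the master node sends.
enum WriteKind { WRITE, PRIMARY_WRITE, INDEX_WRITE, VERSION_WRITE }

final class WriteRouter {
    // Decides what each AVAILABLE node receives.
    // Nodes that are not in `available` get nothing and are left to the repair process.
    static Map<String, WriteKind> route(Set<String> available, Set<String> a, Set<String> b) {
        Map<String, WriteKind> plan = new HashMap<>();
        for (String node : available) {
            boolean inA = a.contains(node);
            boolean inB = b.contains(node);
            WriteKind kind;
            if (inA && inB) {
                kind = WriteKind.WRITE;          // in the intersection of A and B
            } else if (inA) {
                kind = WriteKind.PRIMARY_WRITE;  // uniquely in A
            } else if (inB) {
                kind = WriteKind.INDEX_WRITE;    // uniquely in B
            } else {
                kind = WriteKind.VERSION_WRITE;  // outside A and B: only the version advances
            }
            plan.put(node, kind);
        }
        return plan;
    }
}
```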
Each node updates its version only AFTER it processes the write (if applicable). This means nodes need logic to determine whether they are processing a previously accepted write, which can happen if a node fails after it has processed a write but before it updates its version.
Note: We can only send to nodes that are marked as available, to ensure that we don't skip version updates. For example, suppose a node is unavailable for write A but becomes available for write B before it is repaired. If we just send write B to that node, it will update its version to the current state but never actually receive write A, and will therefore lose data. Nodes that are marked as available are guaranteed to have a version that is at most 1 behind the master node's current version, so we can send the write to those nodes and be sure that no data will be lost if the write is accepted. If the write can't be accepted because the node goes down in the middle of the protocol, that's okay: the node's version won't change and it will be repaired when it becomes available again.
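
One possible shape for the node-side check is sketched below. It is only an illustration under assumptions the draft does not state: each write carries the master version it was assigned, and the node records the version of the last write it processed together with the data itself (`lastProcessed`), separately from the version it reports in pings. `NodeState`, `Outcome`, and the `failBeforeVersionUpdate` flag (which simulates the failure window) are hypothetical.

```java
final class NodeState {
    enum Outcome { APPLIED, ALREADY_PROCESSED, STALE, REJECTED_GAP, FAILED_BEFORE_VERSION_UPDATE }

    private long version;        // what the node reports in ping responses
    private long lastProcessed;  // assumed to be recorded together with the data
    private int applyCount;      // stand-in for the actual data changes

    NodeState(long version) {
        this.version = version;
        this.lastProcessed = version;
    }

    synchronized Outcome receive(long writeVersion, boolean failBeforeVersionUpdate) {
        if (writeVersion <= version) {
            return Outcome.STALE;            // already fully accepted earlier
        }
        if (writeVersion > version + 1) {
            return Outcome.REJECTED_GAP;     // accepting would skip a write and lose data
        }
        if (lastProcessed == writeVersion) {
            version = writeVersion;          // data was applied before the failure
            return Outcome.ALREADY_PROCESSED;
        }
        applyCount++;                        // process the write
        lastProcessed = writeVersion;
        if (failBeforeVersionUpdate) {
            return Outcome.FAILED_BEFORE_VERSION_UPDATE; // version stays behind
        }
        version = writeVersion;              // the version moves only after processing
        return Outcome.APPLIED;
    }

    synchronized long version() { return version; }

    synchronized int applyCount() { return applyCount; }
}
```

With this shape, a write that is re-sent during repair after a mid-write failure advances the version without being applied twice.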
...
Write Request
- Given a request:
- Convert the request to the appropriate Write
- Conduct the WRITE PROTOCOL
Read Protocol
- Determine the record to read
- Determine the nodes where that record is stored
- Shuffle the nodes and iterate; for each node, conduct the PING PROTOCOL
  - If the node is not available
    - Get the next node
  - If the node is available
    - Process the read
- If there are no available relevant nodes
  - The system is unavailable
  - Maybe add some retry/timeout logic
- If the node is not available
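
The node selection for a read can be sketched as follows. `ReadProtocol`, `pickNode`, and the `isAvailable` predicate (standing in for a PING PROTOCOL call) are hypothetical names, and retry/timeout handling is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class ReadProtocol {
    // Returns the first available replica in random order.
    // An empty Optional means no relevant node is available (the system is unavailable).
    static Optional<String> pickNode(List<String> replicas, Predicate<String> isAvailable) {
        List<String> shuffled = new ArrayList<>(replicas); // leave the caller's list untouched
        Collections.shuffle(shuffled);                     // spread reads across replicas
        for (String node : shuffled) {
            if (isAvailable.test(node)) {
                return Optional.of(node);                  // process the read on this node
            }
            // not available: fall through to the next node
        }
        return Optional.empty();
    }
}
```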