Distributed Node Communication Proposal

PING PROCESS

Runs asynchronously every 5 seconds, with one thread per node in the system
For each node, conduct the PING PROTOCOL
After the status of every node has been updated:
- If all nodes are available, delete any writes that the master node currently has in its buffer
- Otherwise, keep the writes in the buffer
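
A minimal sketch of one ping round, under some assumptions: `run_ping_round`, the `ping` callable, and the list-based `buffer` are hypothetical names, and `ping` is assumed to return one of the three status strings from the PING PROTOCOL. The 5-second scheduling is left out; this only shows the per-node threads and the buffer decision.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ping_round(nodes, ping, buffer):
    """Ping every node on its own thread, then decide what to do with the buffer.

    `ping` is a hypothetical callable that returns "available",
    "repairable", or "unavailable" for a given node.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map preserves input order, so zip pairs each node with its status
        statuses = dict(zip(nodes, pool.map(ping, nodes)))

    if all(status == "available" for status in statuses.values()):
        # Every node has caught up, so the buffered writes are no longer needed
        buffer.clear()
    # Otherwise the buffer is left untouched for later repairs
    return statuses
```

With two nodes where one reports "repairable", the buffer is returned unchanged; once both report "available", it is emptied.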

PING PROTOCOL

A ping is sent to the node
The node either
- Does not respond, and is marked as "unavailable"
- Responds with a version < the master node's version, and is marked as "repairable"
- Responds with a version == the master node's version, and is marked as "available"
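
The three outcomes above can be sketched as a single classification function. The name `classify_node` and the use of `None` for "no response" are assumptions made for illustration. The proposal does not say what happens when a node reports a version greater than the master's, so this sketch raises an error in that case rather than guessing.

```python
from typing import Optional


def classify_node(node_version: Optional[int], master_version: int) -> str:
    """Map a ping response to a node status.

    `node_version` is None when the node did not respond (an assumption
    of this sketch; the proposal does not define the wire format).
    """
    if node_version is None:
        return "unavailable"
    if node_version < master_version:
        return "repairable"
    if node_version == master_version:
        return "available"
    # Not covered by the proposal: a node ahead of the master
    raise ValueError("node version is ahead of the master version")
```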

Don't lock the collection of statuses on the master node; update each node's status with compareAndSet instead
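
To illustrate the compareAndSet idea: Python has no built-in compare-and-set primitive, so the sketch below emulates one with a small per-cell guard. In Java, `AtomicReference.compareAndSet` provides this directly without the guard. The class name `AtomicStatus` is hypothetical. The point is only that a status update succeeds if and only if the status still holds the value the caller last observed, and the collection as a whole is never locked.

```python
import threading


class AtomicStatus:
    """Emulated compare-and-set cell holding one node's status."""

    def __init__(self, initial):
        self._value = initial
        self._guard = threading.Lock()  # stands in for a hardware CAS

    def get(self):
        with self._guard:
            return self._value

    def compare_and_set(self, expected, new):
        """Set the status to `new` only if it currently equals `expected`."""
        with self._guard:
            if self._value != expected:
                return False  # another thread changed the status first
            self._value = new
            return True


# One cell per node; no lock around the dictionary itself
statuses = {"node-1": AtomicStatus("unavailable")}
```

A ping thread that read "unavailable" would call `statuses["node-1"].compare_and_set("unavailable", "repairable")` and could re-read the status and retry if the call returns False.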

REPAIR PROCESS

For each node marked as repairable:
- Use a separate asynchronous thread to conduct the REPAIR PROTOCOL

REPAIR PROTOCOL

Given a repairable node
Read lock the buffer of writes
Conduct the WRITE PROTOCOL for each write between the node's version and the master node's version
Unlock the buffer of writes
If the node fails in the middle of the repair, the failure propagates through the WRITE PROTOCOL and up to the PING PROCESS, so the remaining writes are not sent. No data is lost, because the master node keeps those writes in its buffer
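
A minimal sketch of the repair loop, assuming the buffer is a dictionary keyed by the version each write produced and that `send` is a hypothetical callable that raises when the node fails. Python's standard library has no read/write lock, so a plain `threading.Lock` stands in for the read lock here.

```python
import threading


def repair(node_version, master_version, buffer, lock, send):
    """Replay the writes a repairable node is missing, in version order.

    Returns the version the node has reached. If `send` raises, the
    exception propagates to the caller and the buffer is left intact.
    """
    with lock:  # released automatically, even if send() raises
        for version, write in sorted(buffer.items()):
            if node_version < version <= master_version:
                send(version, write)
                node_version = version
    return node_version
```

For a node at version 2 and a master at version 4, only the writes for versions 3 and 4 are replayed.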

WRITE PROTOCOL

Given a write
Write lock the buffer of writes
Write lock the master node's version to prevent it from being read by other protocols/processes
The master node appends the write to its buffer and updates its version
The master node determines
- A: the nodes for the PrimaryRecord
- B: the nodes for the SecondaryIndex and SearchIndex
Read lock the collection of statuses
The master node asynchronously sends
- A Write to the available nodes in the intersection of A and B
- A PrimaryWrite to the available nodes that are uniquely in A
- An IndexWrite to the available nodes that are uniquely in B
- A VersionWrite to the available nodes that are outside of A and B

The master node DOES NOT wait for acknowledgement from the nodes
AFTER each node processes the write (if applicable), it updates its version. This means nodes will need logic to determine whether they are processing a previously accepted write (in the event that a node fails after it has processed a write, but before it updates its version)
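
One possible way for a node to recognise a previously accepted write is sketched below. The proposal does not specify a mechanism, so everything here is an assumption: each write is assumed to carry the master version it was assigned, and the node is assumed to record that version together with the data. The `Node` class and the `crash_before_version_update` flag exist only to simulate the failure window.

```python
class Node:
    """Hypothetical storage node that tolerates redelivered writes."""

    def __init__(self):
        self.version = 0
        self.applied = set()  # versions of writes whose data is already stored
        self.data = []

    def handle(self, write_version, payload, crash_before_version_update=False):
        if write_version not in self.applied:
            # First delivery: store the data and remember which write it was
            self.data.append(payload)
            self.applied.add(write_version)
        if crash_before_version_update:
            return  # simulates failing after processing, before the version update
        self.version = max(self.version, write_version)
```

If the node "crashes" on version 1 and the same write is redelivered during repair, the payload is stored once and the version then advances to 1.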
  
  We can only send to nodes that are marked as available, to ensure that we don't skip version updates. For example, suppose a node is unavailable for write X but becomes available for write Y before it is repaired. If we just send write Y to that node, it will update its version to the current state without ever receiving write X, and will therefore lose data. Nodes that are marked as available are guaranteed to have a version that is at most 1 behind the master node's current version, so we can send the write to those nodes and be sure that no data will be lost if the write is accepted. If the write can't be accepted because the node goes down in the middle of the protocol, that's okay: the node's version won't change, and it will be repaired when it becomes available again
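
The routing rules above reduce to set operations. The sketch below takes A to be the PrimaryRecord nodes and B to be the SecondaryIndex and SearchIndex nodes, which is how the list above appears to use them. `route_write` and its parameters are hypothetical names, and `statuses` is assumed to map each node to its status string.

```python
def route_write(all_nodes, a, b, statuses):
    """Decide which kind of write each available node should receive.

    `all_nodes`, `a`, and `b` are sets of node ids. Nodes that are not
    marked "available" are skipped entirely and left for repair.
    """
    available = {n for n in all_nodes if statuses.get(n) == "available"}
    return {
        "Write": (a & b) & available,                       # in both A and B
        "PrimaryWrite": (a - b) & available,                # uniquely in A
        "IndexWrite": (b - a) & available,                  # uniquely in B
        "VersionWrite": (all_nodes - (a | b)) & available,  # outside A and B
    }
```

Because the four groups are disjoint, every available node receives exactly one message per write, so its version advances in step with the master's.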

WRITE REQUEST

Given a request:
Convert the request to the appropriate Write
Conduct the WRITE PROTOCOL

READ PROTOCOL

Determine the record to read
Determine the nodes where that record is stored
Shuffle the nodes, then for each node in turn conduct the PING PROTOCOL
- If the node is not available
  - Get the next node
- If the node is available
  - Process the read
- If there are no available relevant nodes
  - The system is unavailable
  - Consider adding retry/timeout logic
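
The read path can be sketched as a shuffle followed by a failover loop. `read_record`, `ping`, and `read` are hypothetical names; `ping` is assumed to return the node's status string, and retry/timeout logic is left out.

```python
import random


def read_record(nodes, ping, read, rng=random):
    """Try the nodes that hold a record in random order until one is available."""
    candidates = list(nodes)
    rng.shuffle(candidates)  # spread reads across the relevant nodes
    for node in candidates:
        if ping(node) == "available":
            return read(node)
        # Not available: fall through to the next node
    raise RuntimeError("system unavailable: no available node holds the record")
```

Shuffling before iterating keeps any one node from absorbing every read when several nodes hold the same record.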