PING PROCESS

  • Runs asynchronously every 5 seconds, with one thread for each node in the system
  • For each node, conduct the PING PROTOCOL
  • After the status of every node has been updated
    • If all nodes are available, delete any writes that the master node currently has in its buffer
    • Otherwise, keep the writes in the buffer
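
One round of the PING PROCESS could look roughly like the sketch below. It is only an illustration: `ping_round`, the `Status` enum, and the `ping` callback (a stand-in for the PING PROTOCOL) are hypothetical names, and the 5-second scheduling loop is left to the caller.

```python
import threading
from enum import Enum

PING_INTERVAL_SECONDS = 5  # interval from the notes; the scheduling loop itself is not shown


class Status(Enum):
    UNAVAILABLE = "unavailable"
    REPAIRABLE = "repairable"
    AVAILABLE = "available"


def ping_round(nodes, ping, statuses, write_buffer):
    """Ping every node on its own thread, then prune the buffer if all are available."""

    def worker(node):
        # ping() is a hypothetical callback that runs the PING PROTOCOL for one node.
        statuses[node] = ping(node)

    threads = [threading.Thread(target=worker, args=(node,)) for node in nodes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until the status of every node has been updated

    if all(statuses[node] is Status.AVAILABLE for node in nodes):
        write_buffer.clear()  # every node is caught up, so the buffered writes can go
    # otherwise the writes stay in the buffer for a later repair
```

Calling `ping_round` with one node that is still repairable leaves the buffer untouched; once every node reports available, the same call empties it.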

PING PROTOCOL

  • A ping is sent to the node
  • The node does one of the following
    • Does not respond, and is marked as "unavailable"
    • Responds with a version lower than the master node's version, and is marked as "repairable"
    • Responds with a version equal to the master node's version, and is marked as "available"
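
The three outcomes can be written as a small classification function. This is a sketch under assumptions: versions are taken to be integers, a missing response is modelled as `None`, and the case of a node reporting a version higher than the master's is not covered by these notes, so the sketch simply raises.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UNAVAILABLE = "unavailable"
    REPAIRABLE = "repairable"
    AVAILABLE = "available"


def classify(node_version: Optional[int], master_version: int) -> Status:
    """Map a ping result to a status; None means the node did not respond."""
    if node_version is None:
        return Status.UNAVAILABLE
    if node_version < master_version:
        return Status.REPAIRABLE
    if node_version == master_version:
        return Status.AVAILABLE
    # Assumption: a node ahead of the master is not defined in the notes.
    raise ValueError("node version is ahead of the master version")
```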

Don't lock the collection of statuses on the master node; update each status with compareAndSet instead
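
For illustration, a compare-and-set update of a single status slot might look like the sketch below. Python has no built-in compareAndSet, so the hypothetical `AtomicCell` class emulates one with a small guard around a single slot; in Java the same role would be played by `AtomicReference.compareAndSet`. The point is that no lock is held over the whole collection.

```python
import threading


class AtomicCell:
    """A single status slot with an emulated compare-and-set operation."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # guards this one slot only, never the collection

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        """Store `new` only if the slot still holds `expected`; report whether it did."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


# One cell per node; a pinger that loses the race sees False and can re-read the status.
statuses = {"n1": AtomicCell("available")}
```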

REPAIR PROCESS

  • For each node marked as repairable:

    • Use a separate asynchronous thread to conduct the REPAIR PROTOCOL
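
A minimal sketch of this fan-out, assuming statuses are plain strings and that `repair` is a hypothetical callable standing in for the REPAIR PROTOCOL:

```python
from concurrent.futures import ThreadPoolExecutor


def start_repairs(statuses, repair):
    """Submit one asynchronous repair task per node marked "repairable"."""
    repairable = [node for node, status in statuses.items() if status == "repairable"]
    pool = ThreadPoolExecutor(max_workers=max(1, len(repairable)))
    futures = {node: pool.submit(repair, node) for node in repairable}
    pool.shutdown(wait=False)  # do not block the caller; submitted tasks still run
    return futures
```

The returned futures let the caller check later whether a given repair finished or raised.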

REPAIR PROTOCOL

  • Given a repairable node
  • Read lock the buffer of writes
  • Conduct the WRITE PROTOCOL for each write between the node's version and the master node's version
  • Unlock the buffer of writes
  • If the node fails in the middle of the repair, the failure propagates through the WRITE PROTOCOL and up to the PING PROCESS. The data is not actually sent, but it is not lost, because the master node keeps it in its buffer
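
The replay step can be sketched as follows. Several things here are assumptions rather than part of the notes: versions are consecutive integers, the buffer maps each version to its write, "between" means after the node's version up to and including the master's version, and a plain `threading.Lock` stands in for the read lock because the Python standard library has no reader-writer lock. `send` is a hypothetical callable that delivers one write to the node.

```python
import threading

buffer_lock = threading.Lock()  # stand-in for the read lock on the buffer of writes


def repair(node_version, master_version, write_buffer, send):
    """Replay the buffered writes the node is missing, in version order."""
    with buffer_lock:
        for version in range(node_version + 1, master_version + 1):
            # If the node fails here, send() raises and the error propagates to the
            # caller; the buffer is never modified, so nothing is lost.
            send(write_buffer[version])
    return master_version
```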

WRITE PROTOCOL

  • Given a write
  • Write lock the buffer of writes
  • Write lock the master node's version to prevent it from being read by other protocols/processes
  • The master node appends the write to its buffer and updates its version
  • The master node determines
    • The nodes for the PrimaryRecord (call this set A)
    • The nodes for the SecondaryIndex and SearchIndex (call this set B)
  • Read lock the collection of statuses
  • The master node asynchronously sends
    • A Write to the available nodes in the intersection of A and B
    • A PrimaryWrite to the available nodes that are only in A
    • An IndexWrite to the available nodes that are only in B
    • A VersionWrite to the available nodes that are outside of both A and B
    • The master node DOES NOT wait for acknowledgement from the nodes
    • AFTER each node processes the write (if applicable), it updates its version. This means we'll need logic for nodes to determine whether they are processing a previously accepted write (in the event that a node fails after it has processed a write, but before it updates its version)

      We can only send to nodes that are marked as available, to ensure that we don't skip version updates. For example, suppose a node is unavailable for write X but becomes available for write Y before it is repaired. If we just send write Y to that node, it will update its version to the current state, but it will never actually get write X and will therefore lose data. Nodes that are marked as available are guaranteed to have a version that is at most 1 behind the master node's current version, so we can send the write to those nodes and be sure that no data will be lost if the write is accepted. If the write can't be accepted because the node goes down in the middle of the protocol, that's okay, because the version of the node won't change and it will be repaired when it becomes available again
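
The routing rule and the node-side duplicate check can be sketched together. This is an illustration under assumptions, not the actual design: `route_write` and `Node` are hypothetical names, A and B are passed in as `primary_nodes` and `index_nodes`, and each write is assumed to carry the master version it produces as its id, which the node uses to recognise a write it already processed.

```python
def route_write(all_nodes, primary_nodes, index_nodes, available):
    """Pick the message kind for each available node; other nodes are skipped."""
    a, b = set(primary_nodes), set(index_nodes)
    plan = {}
    for node in all_nodes:
        if node not in available:
            continue  # never send to a non-available node; repair catches it up later
        if node in a and node in b:
            plan[node] = "Write"
        elif node in a:
            plan[node] = "PrimaryWrite"
        elif node in b:
            plan[node] = "IndexWrite"
        else:
            plan[node] = "VersionWrite"
    return plan


class Node:
    """Node-side handling that tolerates a write being delivered twice."""

    def __init__(self):
        self.version = 0
        self.applied = set()  # ids of writes already processed (assumed dedup log)
        self.data = []

    def handle(self, write_id, payload):
        if write_id not in self.applied:
            self.data.append(payload)  # process the write first...
            self.applied.add(write_id)
        self.version = max(self.version, write_id)  # ...then update the version
```

If a node crashes after processing a write but before updating its version, the repair resends that write; the `applied` check keeps the payload from being stored a second time.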

WRITE REQUEST

  • Given a request:
  • Convert the request to the appropriate Write
  • Conduct the WRITE PROTOCOL

READ PROTOCOL

  • Determine the record to read
  • Determine the nodes where that record is stored
  • Shuffle the nodes, then iterate over them, conducting the PING PROTOCOL for each node
    • If the node is not available
      • Get the next node
    • If the node is available
      • Process the read
    • If there are no available relevant nodes
      • The system is unavailable
      • Maybe add some retry/timeout logic
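
A minimal sketch of the read path, assuming hypothetical callables `is_available` (which would run the PING PROTOCOL for a node) and `read` (which fetches the record from a node). Retry/timeout logic is left out, as in the notes.

```python
import random


def read_record(nodes, is_available, read, rng=random):
    """Try the nodes holding the record in random order; read from the first available one."""
    candidates = list(nodes)
    rng.shuffle(candidates)  # spread reads across the replicas
    for node in candidates:
        if is_available(node):
            return read(node)
    # No relevant node is available, so the system is unavailable for this record.
    raise RuntimeError("system unavailable: no available node holds the record")
```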