Add data in bulk using JSON

The idea is to allow the client to insert data in bulk by specifying a JSON string whose contents are inserted into a record.

NOTE: This needs to be done as an atomic operation on the server side (i.e. verifyAndSwap)!
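The all-or-nothing behavior this note asks for can be sketched as a verify phase followed by an apply phase. This is only a minimal illustration under stated assumptions: `AtomicBulkAdd`, its in-memory map, and the `values` helper are hypothetical stand-ins, not the server's actual Engine or its verifyAndSwap implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the server-side store (not the real Engine).
class AtomicBulkAdd {
    // record id -> key -> values stored under that key
    private final Map<Long, Map<String, Set<Object>>> records = new HashMap<>();

    // Adds every key/value pair to the record, or adds nothing at all.
    // Returns false, without retrying, if any pair is already stored.
    synchronized boolean insert(long record, Map<String, List<Object>> data) {
        Map<String, Set<Object>> existing =
                records.getOrDefault(record, Collections.emptyMap());
        // Phase 1: verify. Nothing is written while checking.
        for (Map.Entry<String, List<Object>> entry : data.entrySet()) {
            Set<Object> stored =
                    existing.getOrDefault(entry.getKey(), Collections.emptySet());
            for (Object value : entry.getValue()) {
                if (stored.contains(value)) {
                    return false;
                }
            }
        }
        // Phase 2: apply. Reached only when every pair is new.
        Map<String, Set<Object>> target =
                records.computeIfAbsent(record, id -> new HashMap<>());
        for (Map.Entry<String, List<Object>> entry : data.entrySet()) {
            target.computeIfAbsent(entry.getKey(), key -> new HashSet<>())
                    .addAll(entry.getValue());
        }
        return true;
    }

    // Read helper used to inspect the result; returns a copy of the stored values.
    synchronized Set<Object> values(long record, String key) {
        return new HashSet<>(records
                .getOrDefault(record, Collections.emptyMap())
                .getOrDefault(key, Collections.emptySet()));
    }
}
```

Because the check completes before any write happens, a failed insert leaves the record exactly as it was.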

...

Summary

Allow the client to efficiently insert many key/value pairs into one or more records by reducing network overhead and offloading work from the client to the server.

Motivation

Since each public API call must go over the wire at least once, adding a large amount of data to a record incurs significant round-trip network overhead. Therefore, we want to make it possible for the client to send all of that data over the wire once and have it added to a record. Once this functionality is in place, we can drastically improve the performance of the Import Framework.

Use Cases

  • I want to quickly seed a new record with data
  • I want to quickly insert new data into an existing record

Important Questions and Semantics

  1. Data from a bulk insert is converted/processed on the server.
    1. The client should only send data over the wire once, at which point the server is responsible for the rest of the work.
  2. We "add" instead of "set" data from a bulk insert.
    1. The term "insert" does not imply the deletion of any data, so "adding" is more intuitive.
    2. Unlike the "set" method, "add" is defined at the Engine level, which means it can be used in an atomic operation.
  3. A bulk insert is a non-retry atomic operation that can fail (similar to verifyAndSwap).
    1. This means that a bulk insert cannot contain any data that already exists in the record.
  4. A JSON string is used to define the data for a bulk insert.
    1. JSON is simple and has support in many languages
    2. We imagine that most data to be bulk inserted will come from web forms via AJAX in the form of JSON, so the application backend can pass it off to the database directly without the need to do additional conversion.
  5. JSON value conversion semantics:
    1. Number → the appropriate integer, long, float or double
      1. May make sense to leverage convert code being written for the Import Framework
    2. String → string
    3. Boolean → boolean
    4. Object → a record is created and the key is linked to that record
    5. Array → Each element of the array is added as a value to the key
    6. Null → not permitted
  6. There are features we are intentionally not supporting at the moment:
    1. Bulk Delete
    2. A compound operation to do a bulk insert for many records in one call (i.e. a mapping from record to JSON string as a param, or specifying multiple records in the JSON string)
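The value conversion semantics listed above could look roughly like the following sketch. The class and method names are hypothetical, and the rule for choosing between integer, long, float and double (pick the narrowest type that represents the literal exactly) is an assumption, since the design does not define "appropriate". Object-to-link conversion is left out because it requires creating records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the real converter may reuse Import Framework code instead.
class JsonValueConversion {

    // Converts a JSON number literal to an Integer, Long, Float or Double.
    // Assumed rule: use the narrowest type that holds the value exactly.
    static Number convertNumber(String literal) {
        if (literal.matches("-?\\d+")) {
            try {
                return Integer.parseInt(literal);
            } catch (NumberFormatException tooLargeForInt) {
                // fall through and try long
            }
            try {
                return Long.parseLong(literal);
            } catch (NumberFormatException tooLargeForLong) {
                // fall through and treat as floating point
            }
        }
        double asDouble = Double.parseDouble(literal);
        float asFloat = Float.parseFloat(literal);
        if ((double) asFloat == asDouble) {
            return asFloat; // the float represents the same value as the double
        }
        return asDouble;
    }

    // Turns an already-parsed JSON value into the values to add under one key:
    // an array contributes each element, a scalar contributes itself,
    // and null is rejected.
    static List<Object> valuesFor(Object parsed) {
        if (parsed == null) {
            throw new IllegalArgumentException("null values are not permitted");
        }
        List<Object> values = new ArrayList<>();
        if (parsed instanceof List) {
            for (Object element : (List<?>) parsed) {
                if (element == null) {
                    throw new IllegalArgumentException("null values are not permitted");
                }
                values.add(element);
            }
        } else {
            values.add(parsed);
        }
        return values;
    }
}
```

Under this assumed rule, a literal such as `1.5` becomes a float while `0.1` becomes a double, because `0.1` cannot be represented by a float without changing its value.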

Effects on Public API

The following methods will be added to the Public API and will therefore have a client-side implementation:

  • public boolean insert(String json)
  • public boolean insert(String json, long record)
  • @CompoundOperation public Map<Long, Boolean> insert(String json, Collection<Long> records)
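As a rough illustration of how the compound method relates to the single-record one, the sketch below expresses the three signatures as an interface whose collection overload simply calls the single-record overload once per record. The interface name `BulkInsertApi` and the use of a default method are assumptions made for this example only.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface mirroring the three proposed method signatures.
interface BulkInsertApi {

    // Inserts the JSON data without naming a record.
    boolean insert(String json);

    // Inserts the JSON data into an existing record.
    boolean insert(String json, long record);

    // Compound operation: one single-record insert per record, with the
    // outcome of each insert reported separately.
    default Map<Long, Boolean> insert(String json, Collection<Long> records) {
        Map<Long, Boolean> results = new LinkedHashMap<>();
        for (long record : records) {
            results.put(record, insert(json, record));
        }
        return results;
    }
}
```

In this sketch each record succeeds or fails on its own, so a caller inspects the returned map to see which inserts were rejected.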

Effects on Thrift API

The following methods will be added to the Thrift API and will therefore have a server-side implementation:

  • bool insert(1:string json, 2:i64 record)
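Since the Thrift API exposes a single method, the client-side methods presumably funnel into it. The sketch below shows one way that delegation might look. `InsertHook` is a hypothetical stand-in for the generated Thrift client, and the `LongSupplier` used to pick a record for the record-less overload is purely an assumption; the design does not say how that record is chosen.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the generated Thrift client; it mirrors
// bool insert(1:string json, 2:i64 record).
interface InsertHook {
    boolean insert(String json, long record);
}

// Hypothetical client-side wrapper around the single Thrift method.
class BulkInsertClient {
    private final InsertHook server;
    private final LongSupplier newRecord; // assumed source of unused record ids

    BulkInsertClient(InsertHook server, LongSupplier newRecord) {
        this.server = server;
        this.newRecord = newRecord;
    }

    // Record-less overload: choose a record, then make one call over the wire.
    boolean insert(String json) {
        return server.insert(json, newRecord.getAsLong());
    }

    // Single-record overload: pass straight through to the Thrift method.
    boolean insert(String json, long record) {
        return server.insert(json, record);
    }
}
```

Either way the JSON string crosses the wire once per call, which is the overhead reduction the design is after.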

Implementation Plan

Feature | Description | Notes
Converter for JSON values to Java objects | Leverage code used in the Import Framework that can convert strings to Java objects | (tick)
Update Thrift API | Add new insert method to Thrift API | (tick)
Server-side implementation | Implement the logic for parsing a json string and adding the appropriate keys/values/links in an atomic operation on the server | (tick)
Update Public API | Add new insert methods to the Public API | (tick)
Client-side implementations | Implement the logic for calling the new thrift #insert hook from the client | (tick)