Summary
Allow the client to efficiently insert many key/value pairs into one or more records by reducing network overhead and offloading work from the client to the server.
Motivation
Because each public API call must go over the wire at least once, adding a large amount of data to a record incurs significant round-trip network overhead. We therefore want the client to be able to send the data over the wire once and have all of it added to a record. Once this functionality is in place, we can drastically improve the performance of the Import Framework.
Use Cases
- I want to quickly seed a new record with data
- I want to quickly insert new data into an existing record
Important Questions and Semantics
- Data from a bulk insert is converted/processed on the server.
- The client should only send data over the wire once, at which point the server is responsible for the rest of the work.
- We "add" instead of "set" data from a bulk insert.
- The term "insert" does not imply the deletion of any data, so "adding" is more intuitive.
- Unlike the "set" method, "add" is defined at the Engine level, which means it can be used in an atomic operation.
- A bulk insert is an atomic operation that is not retried and can fail (similar to verifyAndSwap).
- This means that a bulk insert cannot contain any data that already exists in the record.
- A JSON string is used to define the data for a bulk insert.
- JSON is simple and has support in many languages
- We imagine that most data to be bulk inserted will come from web forms via AJAX in the form of JSON, so the application backend can pass it off to the database directly without the need to do additional conversion.
- JSON value conversion semantics:
- Number → the appropriate integer, long, float or double (it may make sense to leverage the conversion code being written for the Import Framework)
- String → string
- Boolean → boolean
- Object → a record is created and the key is linked to that record
- Array → each element of the array is added as a value to the key
- Null → not permitted
- There are features we are intentionally not supporting at the moment:
- Bulk Delete
- A compound operation to do a bulk insert for many records in one call (i.e. a mapping from record to JSON string as a parameter, or specifying multiple records in the JSON string)
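The Number rule in the conversion semantics above does not say how "the appropriate" type is chosen. The following is a minimal sketch of one possible policy, assuming the narrowest type that represents the value without loss is preferred. The class `JsonNumbers` and the method `narrow` are hypothetical names used for illustration only; they are not part of the proposal or of the Import Framework.

```java
// Hypothetical helper (not part of the proposal): narrows a JSON number
// token to the smallest Java numeric type that represents it exactly.
final class JsonNumbers {

    private JsonNumbers() {}

    static Number narrow(String token) {
        // Assumption: a token without a fraction or exponent is integral.
        boolean integral = !token.contains(".")
                && !token.contains("e")
                && !token.contains("E");
        if (integral) {
            // Throws NumberFormatException if the value does not fit in a long.
            long value = Long.parseLong(token);
            if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
                return (int) value; // boxed as Integer
            }
            return value; // boxed as Long
        }
        double asDouble = Double.parseDouble(token);
        float asFloat = (float) asDouble;
        // Keep the float only if widening it back to double loses nothing.
        if (!Float.isInfinite(asFloat) && (double) asFloat == asDouble) {
            return asFloat; // boxed as Float
        }
        return asDouble; // boxed as Double
    }
}
```

Under this policy `1.5` becomes a float while `0.1` stays a double, because `0.1` cannot be represented exactly as a float. A different policy (for example, always using double for non-integral values) would be equally consistent with the rule as written.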
Effects on Public API
The following methods will be added to the Public API and will therefore have a client-side implementation:
- public boolean insert(String json)
- public boolean insert(String json, long record)
- @CompoundOperation public Map<Long, Boolean> insert(String json, Collection<Long> records)
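Because the Thrift API below exposes only a single-record insert, one plausible reading of the compound method is that the client calls the single-record method once per record and collects the results. The sketch below illustrates that reading. The interface name `BulkInsertClient` is hypothetical; only the two method signatures mirror the list above.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side interface; only the method signatures come
// from the proposal.
interface BulkInsertClient {

    // Single-record insert: assumed to map to one Thrift call.
    boolean insert(String json, long record);

    // Compound insert: delegates to the single-record call for each record
    // and reports a per-record result. Each record succeeds or fails on its
    // own; there is no atomicity across records in this sketch.
    default Map<Long, Boolean> insert(String json, Collection<Long> records) {
        Map<Long, Boolean> results = new LinkedHashMap<>();
        for (long record : records) {
            results.put(record, insert(json, record));
        }
        return results;
    }
}
```

With this shape, a caller can inspect the returned map to see which records rejected the insert (for example, because some of the data already existed there) and handle each failure individually.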
Effects on Thrift API
The following methods will be added to the Thrift API and will therefore have a server-side implementation:
- bool insert(1:string json, 2:i64 record)
Implementation Plan
Feature | Description | Notes |
---|---|---|
Converter for JSON values to Java objects | Leverage code used in the Import Framework that can convert strings to Java objects | |
Update Thrift API | Add new insert method to Thrift API | |
Server-side implementation | Implement the logic for parsing a JSON string and adding the appropriate keys/values/links in an atomic operation on the server | |
Update Public API | Add new insert methods to the Public API | |
Client-side implementations | Implement the logic for calling the new Thrift #insert hook from the client | |
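The server-side row above depends on the "add, atomically, or fail" semantics described earlier. As a rough illustration, the sketch below models a single record in memory and applies an already-parsed batch in two phases: validate everything, then apply. `InMemoryRecord` is a hypothetical stand-in, not the Engine API, and treating a value repeated within the same batch as a failure is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for one record; the real Engine API
// is not shown in this document.
final class InMemoryRecord {

    private final Map<String, Set<Object>> data = new HashMap<>();

    // Adds every key/value pair or none of them. Returns false and leaves
    // the record untouched if any value is null, already exists in the
    // record, or (assumption) is repeated within the batch.
    boolean insert(Map<String, List<Object>> parsed) {
        // Phase 1: validate the whole batch before changing anything.
        for (Map.Entry<String, List<Object>> entry : parsed.entrySet()) {
            Set<Object> existing =
                    data.getOrDefault(entry.getKey(), Collections.emptySet());
            Set<Object> seen = new HashSet<>();
            for (Object value : entry.getValue()) {
                if (value == null || existing.contains(value) || !seen.add(value)) {
                    return false;
                }
            }
        }
        // Phase 2: apply the batch; nothing can be rejected at this point.
        for (Map.Entry<String, List<Object>> entry : parsed.entrySet()) {
            data.computeIfAbsent(entry.getKey(), k -> new HashSet<>())
                    .addAll(entry.getValue());
        }
        return true;
    }

    boolean contains(String key, Object value) {
        return data.getOrDefault(key, Collections.emptySet()).contains(value);
    }
}
```

A real implementation would also need to create and link records for nested JSON objects and to isolate the operation from concurrent writers, neither of which this sketch attempts.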