...
- Should the appropriate processing and post-filtering for a complex find be done client or server side?
- We should probably do this on the server so that these queries are performant. I'm really concerned about all the network overhead of the client sending multiple separate queries over the wire.
- If this is done on the client, then complex queries are somewhat analogous to a compound operation
- This means that our results can become invalid if a spurious write changes the data in between network calls
- Example: Assume my criteria is find all records where "name = 'John' AND age = 23". If the client has to do separate calls and post filter the results then its possible a record that matches the first query can be changed to no longer match the first query after the result is returned and the client moves onto the second query.
- This means that our results can become invalid if a spurious write changes the data in between network calls
- If this is done on the server, the complex query is analogous to a repeatable atomic operation
- Should we deprecate the #find methods that take simple parameters?
- No: Builders are inherently not user friendly so we should not force the user to use one of a simple query
- Yes: Overloaded methods that take different kinds of parameters is bad API design
- Why a programatic builder as opposed to a grammar parser?
- Don't trust raw string input from users. We can regulate things a lot more if this is done programmatically with a builder.
- This project may form the basis of a larger database language (e.g. Concourse Action Language (CAL))
- This means that we need to write this feature in a way that can easily be leveraged and extended in the future
Building Process
- User creates Criteria from builder.
- The client goes through each symbol in the Criteria and converts them to the appropriate TSymbol, building a TCriteria along the way
- All sub criteria is expanded and prepended/appended with the appropriate parenthesis symbols
- The server receives the TCriteria, goes through the list of TSymbols and converts them back to Symbols
- The server does the shunting yard alg on the list of Symbols.
Context Free Grammar
We need to build a CFG that can control the state of the builder and also parse hand written queries. The grammar must support the following cases:
...
Code Block | ||
---|---|---|
| ||
enum TSymbolType {
CONJUNCTION = 1,
KEY = 2,
VALUE = 3,
PARENTHESIS = 4
}
|
TSymbol
A representation for a Symbol that can be passed over the wire via Thrift. Once passed over the wire, the server uses information about the symbol type to parse the string representation of the symbol to an actual object.
...
A wrapper around an in-order collection of symbols. A Criteria is generated from the completion of a state machine that comprises a builder.
Code Block |
---|
// When the Shunting-yard alg encounters The client must convert a Criteria to a TCriteria and while doing so and encountering an embedded Criteria, it should add a open paren then add all the symbols from the // sub criteria and finally add a closing paren paren. Then the server will take the TCriteria and make a Criteria that has no sub Criteria objects but has parenthesis in the list // of symbols. public class Criteria implements Symbol { public static Builder builder() { return new StartState(new Criteria()); } private final List<Symbol> symbols = Lists.newArrayList(); protected Criteria(); protected void addSymbol(Symbol symbol); } |
...
Code Block |
---|
class Value implements Symbol {
// When the State creates this symbol it must convert the object to a TObject so that it can be passed over the wire via Thrift
public TObject getValue();
} |
Parenthesis (only defined on the server)
A symbol that represents a parenthesis in a Criteria. Note that the builder doesn't actually need to use these since Criteria objects are embedded to signify groups.
Code Block |
---|
enum Parenthesis implements Symbol {
LEFT, RIGHT;
} |
Order of Operations
Given validly formed grouped criteria, we must process everything with the correct order of operations. For this, we should use the Shunting-yard algorithm. I've written this algorithm in PHP so we should port it to Java.
...