...
- Criteria
- CriteriaBuilder (or Criteria.Builder)
Effects on Thrift API
If we decide to support complex criteria on the server, then we'll need to modify the Thrift API.
Important Questions and Semantics
- Should the appropriate processing and post-filtering for a complex find be done client or server side?
- Should we deprecate the #find methods that take simple parameters?
- Why a programatic builder as opposed to a grammar parser?
- This project may form the basis of a larger database language (e.g. Concourse Action Language (CAL))
- This means that we need to write this feature in a way that can easily be leveraged and extended in the future
Context Free Grammar
We need to build a CFG that can control the state of the builder and also parse hand written queries. The grammar must support the following cases:
- Simple queries: key operator value
- Conjunctive queries: key operator value AND key operator value
- Disjunctive queries: key1 operator value1 OR key2 operator value2
- Combined queries: key1 operator value1 AND key2 operator value2 OR key3 operator value3
- Grouped queries: (key1 operator value1 AND key2 operator value2) OR key3 operator value3
Code Block |
---|
S → SS
S → (S)
S → ()
S → key KEY
KEY → operator OPERATOR
OPERATOR → value VALUE
OPERATOR → value
VALUE → value VALUE
VALUE → value
VALUE → and AND
VALUE → or OR
AND → S
OR → S |
Order of Operations
Given validly formed grouped criteria, we must process everything with the correct order of operations. For this, we should use the Shunting-yard algorithm. I've written this algorithm in PHP so we should port it to Java.
...
- Add a TCriteria struct (this must be wrapped by the Criteria object in the public api)
- Must deprecated or change the current #find() method
- Implement #find() method or new method that takes a Criteria as a param (i.e. set<i64> find0(data.Criteria criteria))
- Make sure that ConcourseServer takes simple queries and turns them to a Criteria and sends them to the appropriate thrift ConcourseService method.
Important Questions and Semantics
- Should the appropriate processing and post-filtering for a complex find be done client or server side?
- We should probably do this on the server so that these queries are performant. I'm really concerned about all the network overhead of the client sending multiple separate queries over the wire.
- If this is done on the client, then complex queries are somewhat analogous to a compound operation
- This means that our results can become invalid if a spurious write changes the data in between network calls
- Example: Assume my criteria is find all records where "name = 'John' AND age = 23". If the client has to do separate calls and post filter the results then its possible a record that matches the first query can be changed to no longer match the first query after the result is returned and the client moves onto the second query.
- This means that our results can become invalid if a spurious write changes the data in between network calls
- If this is done on the server, the complex query is analogous to a repeatable atomic operation
- Should we deprecate the #find methods that take simple parameters?
- No: Builders are inherently not user friendly so we should not force the user to use one of a simple query
- Yes: Overloaded methods that take different kinds of parameters is bad API design
- Why a programatic builder as opposed to a grammar parser?
- Don't trust raw string input from users. We can regulate things a lot more if this is done programmatically with a builder.
- This project may form the basis of a larger database language (e.g. Concourse Action Language (CAL))
- This means that we need to write this feature in a way that can easily be leveraged and extended in the future
- Will this builder be usable in CaSH?
Building Process
- User creates Criteria from builder.
- The client goes through each symbol in the Criteria and converts them to the appropriate TSymbol, building a TCriteria along the way
- All sub criteria is expanded and prepended/appended with the appropriate parenthesis symbols
- The server receives the TCriteria, goes through the list of TSymbols and converts them back to Symbols
- The server does the shunting yard alg on the list of Symbols.
Context Free Grammar
We need to build a CFG that can control the state of the builder and also parse hand written queries. The grammar must support the following cases:
- Simple queries: key operator value
- Conjunctive queries: key operator value AND key operator value
- Disjunctive queries: key1 operator value1 OR key2 operator value2
- Combined queries: key1 operator value1 AND key2 operator value2 OR key3 operator value3
- Grouped queries: (key1 operator value1 AND key2 operator value2) OR key3 operator value3
Code Block |
---|
S → SS
S → (S)
S → ()
S → key KEY
KEY → operator OPERATOR
OPERATOR → value VALUE
OPERATOR → value
VALUE → value VALUE
VALUE → value
VALUE → and AND
VALUE → or OR
AND → S
OR → S |
Psuedocode
TSymbolType
A representation for an enum that declares the type of a TSymbol.
Code Block | ||
---|---|---|
| ||
enum TSymbolType {
CONJUNCTION = 1,
KEY = 2,
VALUE = 3,
PARENTHESIS = 4
}
|
TSymbol
A representation for a Symbol that can be passed over the wire via Thrift. Once passed over the wire, the server uses information about the symbol type to parse the string representation of the symbol to an actual object.
Code Block | ||
---|---|---|
| ||
struct TSymbol {
1:required TSymbolType type;
2:required string symbol;
}
|
TCriteria
A representation for a Criteria that can be passed over the wire via Thrift. Once passed over the write, the server goes through the list of TSymbols and converts them to actual Symbol objects which can then be used in the shunting-yard algorithm.
Code Block | ||
---|---|---|
| ||
struct TCriteria {
1:required list<TSymbol> symbols
}
|
Criteria
A wrapper around an in-order collection of symbols. A Criteria is generated from the completion of a state machine that comprises a builder.
Code Block |
---|
// The client must convert a Criteria to a TCriteria and while doing so and encountering an embedded Criteria, it should add a open paren then add all the symbols from the
// sub criteria and finally add a closing paren. Then the server will take the TCriteria and make a Criteria that has no sub Criteria objects but has parenthesis in the list
// of symbols.
public class Criteria implements Symbol {
public static Builder builder() {
return new StartState(new Criteria());
}
private final List<Symbol> symbols = Lists.newArrayList();
protected Criteria();
protected void addSymbol(Symbol symbol);
} |
State
The base class and marker for any valid state for the builder. Each state is passed the current Criteria and holds a reference. For any method called from the state, a symbol is added to the Criteria or the Criteria is returned.
Code Block |
---|
abstract class State {
Criteria criteria;
} |
StartState
The state that marks the beginning of a new Criteria.
Code Block |
---|
class StartState extends State {
StartState group(Criteria criteria);
KeyState key(String key);
Criteria build();
} |
KeyState
The state that expects the next symbol/token to be an operator specification.
Code Block |
---|
class KeyState extends State {
OperatorState operator(Operator operator);
} |
OperatorState
The state that expects the next symbol/token to be a value specification.
Code Block |
---|
class OperatorState extends State {
ValueState value(Object value);
} |
ValueState
The state that expects the current symbol/token to be the last or the next symbol/token to be value or conjunction specification.
Code Block |
---|
class ValueState extends State {
ValueState value(Object value);
StartState and();
StartState or();
Criteria build();
} |
Symbol
A marker interface to indicate an object that can be included in a Criteria
Code Block |
---|
interface Symbol {} //marker interface |
Note |
---|
Each subclass implementation of the Symbol interface should have some public static method that allows the creation of an Object in that class from a string parameter. |
ConjunctionSymbol
An enum that represents the possible conjunction symbols.
Code Block |
---|
enum ConjunctionSymbol implements Symbol {
AND, OR;
} |
KeySymbol
A symbol that represents a "key" in a Criteria.
Code Block |
---|
class KeySymbol implements Symbol {
public String getKey();
} |
OperatorSymbol
A symbol that represents an "operator" in a Criteria.
Code Block |
---|
class OperatorSymbol implements Symbol {
public Operator getOperator();
} |
ValueSymbol
A symbol that represents a "value" in a Criteria.
Code Block |
---|
class ValueSymbol implements Symbol {
// When the State creates this symbol it must convert the object to a TObject so that it can be passed over the wire via Thrift
public TObject getValue();
} |
ParenthesisSymbol (only defined on the server)
A symbol that represents a parenthesis in a Criteria. Note that the builder doesn't actually need to use these since Criteria objects are embedded to signify groups.
Code Block |
---|
enum ParenthesisSymbol implements Symbol {
LEFT, RIGHT;
} |
Order of Operations
Given validly formed grouped criteria, we must process everything with the correct order of operations. For this, we should use the Shunting-yard algorithm. I've written this algorithm in PHP so we should port it to Java.
Code Block |
---|
/** * Breakdown a valid selection criteria using the Shunting-yard algorithin * @see http://en.wikipedia.org/wiki/Shunting-yard_algorithm * @param string $criteria * @return array $criteria in reverse polish notation {@link http://en.wikipedia.org/wiki/Reverse_Polish_notation} * @since 1.1.0 */ private static function breakdownSelectionCriteria($criteria) { $rangeSeparator = "*\*"; $spaceSeperator = "++\++"; $_criteria = $criteria; $stack = array(); $queue = array(); $criteria = str_replace("(", " ( ", $criteria); $criteria = str_replace(")", " ) ", $criteria); $criteria = preg_replace("/(range) ([0-9]+) ([0-9]+)/i", array_push($stack"$1$2$rangeSeparator$3", $token$criteria); $criteria }= preg_replace("/'([a-zA-Z0-9]+)([\s])+([a-zA-Z0-9]+)'[\s]*/i", else if (strcasecmp($token, "($1$spaceSeperator$3") == 0) {, $criteria); foreach (self::$_SELECTION_CRITERIA_OPERATORS as $operator) { array_push($stack, $token); $criteria = str_ireplace(" $operator", $operator, }$criteria); else$criteria if= (strcasecmp($token, ")") == 0) {str_ireplace("$operator ", $operator, $criteria); } $foundLeftParen$tokens = false; array_filter(explode(" ", $criteria)); foreach ($tokens while (!empty($stack))as $token) { $token = str_replace($rangeSeparator, " ", $token); $topOfStack = $stack[count($stack) - 1]; $token = str_replace($spaceSeperator, " ", $token); if (strcasecmp($topOfStack$token, "(and") == 0) { || strcasecmp($token, "or") == 0) { while (!empty($stack)) { $foundLeftParen = true; $topOfStack = $topOfStack = $stack[count($stack) - break1]; }if (strcasecmp($token, "or") == 0 && (strcasecmp($topOfStack, else { "or") == 0 $queue[] = array_pop($stack);|| strcasecmp($topOfStack, "and") } == 0)) { } $queue[] if= array_pop(!$foundLeftParen$stack); { } throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis"); else { } array_pop($stack); break; } } else { } $queue[] = $token; }array_push($stack, $token); } } while (!empty($stack)) { else if $topOfStack = $stack[count($stack) - 1];(strcasecmp($token, "(") == 0) { if (strcasecmp($topOfStack, ")") == 0 || strcasecmp($topOfStack, "(") array_push($stack, $token); } == 0) { else if (strcasecmp($token, ")") == 0) { throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis")$foundLeftParen = false; } while (!empty($stack)) { $queue[] = array_pop($stack); } return $queue; } |
Implementation Plan
...
$topOfStack = $stack[count($stack) - 1];
if (strcasecmp($topOfStack, "(") == 0) {
$foundLeftParen = true;
break;
}
else {
$queue[] = array_pop($stack);
}
}
if (!$foundLeftParen) {
throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis");
}
array_pop($stack);
}
else {
$queue[] = $token;
}
}
while (!empty($stack)) {
$topOfStack = $stack[count($stack) - 1];
if (strcasecmp($topOfStack, ")") == 0 || strcasecmp($topOfStack, "(")
== 0) {
throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis");
}
$queue[] = array_pop($stack);
}
return $queue;
}
|
Implementation Plan
Task | Description | Notes |
---|---|---|
Define Criteria grammar | Implement the rules of the grammar, define all the appropriate symbols, and define all the appropriate operations | |
Port Shunting-yard to Java | Given a collection of symbols, implement the algorithm that returns those symbols in reverse polish notation. | |
Implement Criteria builder | Implement a builder that allows the programatic creation of a Criteria | |
Edit Public API | Add the appropriate methods to the public api | |
Integrate with CaSH |