Allow users to programmatically build complex find queries (e.g. conjunctive, disjunctive, combination, etc).
Right now, there is no support for building complex find queries. Users can do this manually by doing a bunch of separate queries and post filtering the results, but this isn't an ideal long-term SOP. If this is supported in the product, then it lowers the barrier to entry and it gives us the opportunity to make those complex queries more efficient (i.e. we can cache certain sub queries, do query planning, etc).
The following methods will be added to the Public API and will therefore have a client-side implementation:
The following classes will be added to the Public API:
We need to build a CFG that can control the state of the builder and also parse hand written queries. The grammar must support the following cases:
S → SS S → (S) S → () S → key KEY KEY → operator OPERATOR OPERATOR → value VALUE OPERATOR → value VALUE → value VALUE VALUE → value VALUE → and AND VALUE → or OR AND → S OR → S |
A representation for an enum that declares the type of a TSymbol.
enum TSymbolType { CONJUNCTION = 1, KEY = 2, VALUE = 3, PARENTHESIS = 4 } |
A representation for a Symbol that can be passed over the wire via Thrift. Once passed over the wire, the server uses information about the symbol type to parse the string representation of the symbol to an actual object.
struct TSymbol { 1:required TSymbolType type; 2:required string symbol; } |
A representation for a Criteria that can be passed over the wire via Thrift. Once passed over the write, the server goes through the list of TSymbols and converts them to actual Symbol objects which can then be used in the shunting-yard algorithm.
struct TCriteria { 1:required list<TSymbol> symbols } |
A wrapper around an in-order collection of symbols. A Criteria is generated from the completion of a state machine that comprises a builder.
// The client must convert a Criteria to a TCriteria and while doing so and encountering an embedded Criteria, it should add a open paren then add all the symbols from the // sub criteria and finally add a closing paren. Then the server will take the TCriteria and make a Criteria that has no sub Criteria objects but has parenthesis in the list // of symbols. public class Criteria implements Symbol { public static Builder builder() { return new StartState(new Criteria()); } private final List<Symbol> symbols = Lists.newArrayList(); protected Criteria(); protected void addSymbol(Symbol symbol); } |
The base class and marker for any valid state for the builder. Each state is passed the current Criteria and holds a reference. For any method called from the state, a symbol is added to the Criteria or the Criteria is returned.
abstract class State { Criteria criteria; } |
The state that marks the beginning of a new Criteria.
class StartState extends State { StartState group(Criteria criteria); KeyState key(String key); Criteria build(); } |
The state that expects the next symbol/token to be an operator specification.
class KeyState extends State { OperatorState operator(Operator operator); } |
The state that expects the next symbol/token to be a value specification.
class OperatorState extends State { ValueState value(Object value); } |
The state that expects the current symbol/token to be the last or the next symbol/token to be value or conjunction specification.
class ValueState extends State { ValueState value(Object value); StartState and(); StartState or(); Criteria build(); } |
A marker interface to indicate an object that can be included in a Criteria
interface Symbol {} //marker interface |
Each subclass implementation of the Symbol interface should have some public static method that allows the creation of an Object in that class from a string parameter. |
An enum that represents the possible conjunction symbols.
enum ConjunctionSymbol implements Symbol { AND, OR; } |
A symbol that represents a "key" in a Criteria.
class KeySymbol implements Symbol { public String getKey(); } |
A symbol that represents an "operator" in a Criteria.
class OperatorSymbol implements Symbol { public Operator getOperator(); } |
A symbol that represents a "value" in a Criteria.
class ValueSymbol implements Symbol { // When the State creates this symbol it must convert the object to a TObject so that it can be passed over the wire via Thrift public TObject getValue(); } |
A symbol that represents a parenthesis in a Criteria. Note that the builder doesn't actually need to use these since Criteria objects are embedded to signify groups.
enum ParenthesisSymbol implements Symbol { LEFT, RIGHT; } |
Given validly formed grouped criteria, we must process everything with the correct order of operations. For this, we should use the Shunting-yard algorithm. I've written this algorithm in PHP so we should port it to Java.
/** * Breakdown a valid selection criteria using the Shunting-yard algorithin * @see http://en.wikipedia.org/wiki/Shunting-yard_algorithm * @param string $criteria * @return array $criteria in reverse polish notation {@link http://en.wikipedia.org/wiki/Reverse_Polish_notation} * @since 1.1.0 */ private static function breakdownSelectionCriteria($criteria) { $rangeSeparator = "*\*"; $spaceSeperator = "++\++"; $_criteria = $criteria; $stack = array(); $queue = array(); $criteria = str_replace("(", " ( ", $criteria); $criteria = str_replace(")", " ) ", $criteria); $criteria = preg_replace("/(range) ([0-9]+) ([0-9]+)/i", "$1$2$rangeSeparator$3", $criteria); $criteria = preg_replace("/'([a-zA-Z0-9]+)([\s])+([a-zA-Z0-9]+)'[\s]*/i", "$1$spaceSeperator$3", $criteria); foreach (self::$_SELECTION_CRITERIA_OPERATORS as $operator) { $criteria = str_ireplace(" $operator", $operator, $criteria); $criteria = str_ireplace("$operator ", $operator, $criteria); } $tokens = array_filter(explode(" ", $criteria)); foreach ($tokens as $token) { $token = str_replace($rangeSeparator, " ", $token); $token = str_replace($spaceSeperator, " ", $token); if (strcasecmp($token, "and") == 0 || strcasecmp($token, "or") == 0) { while (!empty($stack)) { $topOfStack = $topOfStack = $stack[count($stack) - 1]; if (strcasecmp($token, "or") == 0 && (strcasecmp($topOfStack, "or") == 0 || strcasecmp($topOfStack, "and") == 0)) { $queue[] = array_pop($stack); } else { break; } } array_push($stack, $token); } else if (strcasecmp($token, "(") == 0) { array_push($stack, $token); } else if (strcasecmp($token, ")") == 0) { $foundLeftParen = false; while (!empty($stack)) { $topOfStack = $stack[count($stack) - 1]; if (strcasecmp($topOfStack, "(") == 0) { $foundLeftParen = true; break; } else { $queue[] = array_pop($stack); } } if (!$foundLeftParen) { throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis"); } array_pop($stack); } else { $queue[] = $token; } } while (!empty($stack)) { $topOfStack = $stack[count($stack) - 1]; if (strcasecmp($topOfStack, ")") == 0 || strcasecmp($topOfStack, "(") == 0) { throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis"); } $queue[] = array_pop($stack); } return $queue; } |
Task | Description | Notes |
---|---|---|
Define Criteria grammar | Implement the rules of the grammar, define all the appropriate symbols, and define all the appropriate operations | ![]() |
Port Shunting-yard to Java | Given a collection of symbols, implement the algorithm that returns those symbols in reverse polish notation. | ![]() |
Implement Criteria builder | Implement a builder that allows the programatic creation of a Criteria | ![]() |
Edit Public API | Add the appropriate methods to the public api | ![]() |
Integrate with CaSH | ![]() |