Table of Contents |
---|
Summary
Allow users to programmatically build complex find queries (e.g. conjunctive, disjunctive, combination, etc).
Motivation
Right now, there is no support for building complex find queries. Users can do this manually by doing a bunch of separate queries and post filtering the results, but this isn't an ideal long-term SOP. If this is supported in the product, then it lowers the barrier to entry and it gives us the opportunity to make those complex queries more efficient (i.e. we can cache certain sub queries, do query planning, etc).
Use Cases
- I want to find records that match all of several criteria (AND)
- I want to find records that match one of several criteria (OR)
- I want to find records that match all of several criteria and/or one of several criteria (AND/OR)
- I want to group different criteria in my query (parenthesis, order of operations)
Effects on Public API
The following methods will be added to the Public API and will therefore have a client-side implementation:
- public Set<Long> find(Criteria criteria)
The following classes will be added to the Public API:
- Criteria
- CriteriaBuilder (or Criteria.Builder)
Effects on Thrift API
If we decide to support complex criteria on the server, then we'll need to modify the Thrift API.
Important Questions and Semantics
- Should the appropriate processing and post-filtering for a complex find be done client or server side?
- Should we deprecate the #find methods that take simple parameters?
- Why a programatic builder as opposed to a grammar parser?
- This project may form the basis of a larger database language (e.g. Concourse Action Language (CAL))
- This means that we need to write this feature in a way that can easily be leveraged and extended in the future
Context Free Grammar
We need to build a CFG that can control the state of the builder and also parse hand written queries. The grammar must support the following cases:
- Simple queries: key operator value
- Conjunctive queries: key operator value AND key operator value
- Disjunctive queries: key1 operator value1 OR key2 operator value2
- Combined queries: key1 operator value1 AND key2 operator value2 OR key3 operator value3
- Grouped queries: (key1 operator value1 AND key2 operator value2) OR key3 operator value3
Code Block |
---|
S → SS |
...
S → (S) |
...
S → () |
...
S → key KEY |
...
KEY → operator OPERATOR |
...
OPERATOR → value VALUE |
...
OPERATOR → |
...
VALUE → value VALUE
VALUE → value
VALUE → and AND
VALUE → or OR
AND → S
OR → S
...
value
VALUE → value VALUE
VALUE → value
VALUE → and AND
VALUE → or OR
AND → S
OR → S |
Order of Operations
Given validly formed grouped criteria, we must process everything with the correct order of operations. For this, we should use the Shunting-yard algorithm. I've written this algorithm in PHP so we should port it to Java.
Code Block |
---|
/**
* Breakdown a valid selection criteria using the Shunting-yard algorithin
* @see http://en.wikipedia.org/wiki/Shunting-yard_algorithm
* @param string $criteria
* @return array $criteria in reverse polish notation {@link http://en.wikipedia.org/wiki/Reverse_Polish_notation}
* @since 1.1.0
*/
private static function breakdownSelectionCriteria($criteria) {
$rangeSeparator = "*\*";
$spaceSeperator = "++\++";
$_criteria = $criteria;
$stack = array();
$queue = array();
$criteria = str_replace("(", " ( ", $criteria);
$criteria = str_replace(")", " ) ", $criteria);
$criteria = preg_replace("/(range) ([0-9]+) ([0-9]+)/i",
"$1$2$rangeSeparator$3", $criteria);
$criteria = preg_replace("/'([a-zA-Z0-9]+)([\s])+([a-zA-Z0-9]+)'[\s]*/i",
"$1$spaceSeperator$3", $criteria);
foreach (self::$_SELECTION_CRITERIA_OPERATORS as $operator) {
$criteria = str_ireplace(" $operator", $operator, $criteria);
$criteria = str_ireplace("$operator ", $operator, $criteria);
}
$tokens = array_filter(explode(" ", $criteria));
foreach ($tokens as $token) {
$token = str_replace($rangeSeparator, " ", $token);
$token = str_replace($spaceSeperator, " ", $token);
if (strcasecmp($token, "and") == 0 || strcasecmp($token, "or") == 0) {
while (!empty($stack)) {
$topOfStack = $topOfStack = $stack[count($stack) - 1];
if (strcasecmp($token, "or") == 0 && (strcasecmp($topOfStack,
"or") == 0 || strcasecmp($topOfStack, "and")
== 0)) {
$queue[] = array_pop($stack);
}
else {
break;
}
}
array_push($stack, $token);
}
else if (strcasecmp($token, "(") == 0) {
array_push($stack, $token);
}
else if (strcasecmp($token, ")") == 0) {
$foundLeftParen = false;
while (!empty($stack)) {
$topOfStack = $stack[count($stack) - 1];
if (strcasecmp($topOfStack, "(") == 0) {
$foundLeftParen = true;
break;
}
else {
$queue[] = array_pop($stack);
}
}
if (!$foundLeftParen) {
throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis");
}
array_pop($stack);
}
else {
$queue[] = $token;
}
}
while (!empty($stack)) {
$topOfStack = $stack[count($stack) - 1];
if (strcasecmp($topOfStack, ")") == 0 || strcasecmp($topOfStack, "(")
== 0) {
throw new \Exception("Syntax error in criteria $_criteria. Mismatched parenthesis");
}
$queue[] = array_pop($stack);
}
return $queue;
}
|
Implementation Plan