Concourse Criteria Language (CCL)

Summary

Allow Concourse to parse structured statements written in natural language into Criteria objects. This is not designed to be a full query language akin to SQL. Instead, we want to provide a limited language that lets users express complex criteria much as they would with the Criteria builder, but using natural language as the interface.

Motivation

The Criteria builder is good for letting a user specify a complex query that is handled entirely on the server, but it is not very usable. It works okay in an IDE with code completion, but it's not intuitive in other environments like CaSH where code completion is not available. This feature should make querying Concourse less intimidating and more familiar for users with a SQL background.

Additionally, this feature is an integral part of the JSON Dumping project that will allow users to export certain records from the command line by specifying a Criteria using CCL. I couldn't imagine forcing users to use the Criteria builder for this purpose.

Use Cases

  • From CaSH, I want to quickly query Concourse using a complex criteria.
  • From the command line, I want to describe the records I want to export with a Criteria.
  • From the web (Conquest), I want to view records that match a Criteria.

Effects on Public API

  • We must add a case to the find(Object) method that checks to see if the object is a string, and if so, tries to parse it into a Criteria object.
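A minimal sketch of the extra case described above. StubCriteria, parse() and toCriteria() are hypothetical stand-ins invented for illustration (they are not the real Concourse classes or methods); the only point is the instanceof check that routes a String through the parser before the query is handled.

```java
final class FindDispatchSketch {

    /** Hypothetical stand-in for the real Criteria class. */
    static final class StubCriteria {
        final String source;

        StubCriteria(String source) {
            this.source = source;
        }
    }

    /** Hypothetical stand-in for the CCL parser; it only trims the text. */
    static StubCriteria parse(String ccl) {
        return new StubCriteria(ccl.trim());
    }

    /** Normalize the argument of a find(Object)-style method to a criteria object. */
    static StubCriteria toCriteria(Object criteria) {
        if (criteria instanceof StubCriteria) {
            // Already built, e.g. by the Criteria builder
            return (StubCriteria) criteria;
        } else if (criteria instanceof String) {
            // New case: treat the string as CCL and try to parse it
            return parse((String) criteria);
        } else {
            throw new IllegalArgumentException(
                    "Cannot interpret " + criteria + " as a Criteria");
        }
    }
}
```

With this shape, existing callers that pass a built criteria object are unaffected, and a parse failure would surface from the same method the user called.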

Effects on Thrift API

None

Important Questions and Semantics

  • CCL is pronounced like the name Cecil (/ˈsiːsəl/)
  • Even though there are parser building tools like ANTLR, we are building our own because we don't want to deal with the overhead of an additional dependency at this time and we only want to do very simple parsing. Moreover, our parsing is merely designed to convert text to a Criteria object, which is what we use on the backend to handle queries. It's possible we may expand "language" support in Concourse in the future, at which point it might make sense to reconsider ANTLR or some other parsing tool.
  • The accept() method for the ValueState has a corner case where it must deal with the "at" keyword which signals that the next token is a timestamp, but doesn't actually transition to a new state
    • private boolean readyToAcceptTimestamp = false;
      public State accept(String input){
          if(readyToAcceptTimestamp){
              readyToAcceptTimestamp = false; // the pending "at" is consumed by this token
              return at(input); //TODO need to convert the input string to a Timestamp object
          }
          else if(input.equalsIgnoreCase("at")){
              readyToAcceptTimestamp = true; // stay in this state; the next token is a timestamp
              return this;
          }
          else{
              return value(Convert.stringToJava(input));
          }
      }
    • This approach also means that it's possible for users to input a CCL statement with a hanging "at" that will still be parsed without error (e.g. key = value at).
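The corner case above can be tried in isolation with a simplified sketch. ValueStateSketch and hasHangingAt() are hypothetical names; unlike the real state, this version keeps raw strings and always returns itself instead of transitioning to a new state. It shows one possible way a parser could notice a hanging "at" if we ever decide to reject it.

```java
final class ValueStateSketch {

    private boolean readyToAcceptTimestamp = false;
    private String value;     // raw value token (no type conversion in this sketch)
    private String timestamp; // raw timestamp token (no Timestamp conversion)

    /** Accept the next token; "at" only flips a flag and never changes state. */
    ValueStateSketch accept(String input) {
        if (readyToAcceptTimestamp) {
            timestamp = input;
            readyToAcceptTimestamp = false;
        } else if (input.equalsIgnoreCase("at")) {
            readyToAcceptTimestamp = true;
        } else {
            value = input;
        }
        return this;
    }

    /** True if "at" was seen but no timestamp token followed it. */
    boolean hasHangingAt() {
        return readyToAcceptTimestamp;
    }

    String value() {
        return value;
    }

    String timestamp() {
        return timestamp;
    }
}
```

A parser that wanted to be strict could check hasHangingAt() once the tokens run out and raise an error; leaving the check out reproduces the lenient behavior described above.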

Parsing Algorithm

The parsing algorithm must take a string and convert it to a Criteria object.

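import java.util.ArrayDeque;
import java.util.Deque;

public static Criteria parse(String input){
    Deque<State> stack = new ArrayDeque<State>();
    stack.push(Criteria.where());
    String[] tokens = Strings.splitByDelimeterAndRespectQuotes(input);
    for(String token : tokens){
        if(token.equals("(")){ // compare contents; == only compares references
            stack.push(Criteria.where()); // start building a nested group
        }
        else if(token.equals(")")){
            Criteria criteria = stack.pop().build(); // assumes the popped state is a BuildableState
            stack.push(stack.pop().accept(criteria)); //the state now at the top of the stack should be a BuildableState
        }
        else{
            stack.push(stack.pop().accept(token)); // accept() returns the next state, so replace the top
        }
    }
    Criteria criteria = stack.pop().build();
    return criteria;
}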
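The stack handling for parentheses can be exercised without the real State classes. The sketch below is only an illustration under that assumption: Group is a hypothetical stand-in for a Criteria under construction that records tokens and nested groups, and the two IllegalStateException checks are one possible way to report misplaced parentheses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class GroupingParseSketch {

    /** Hypothetical stand-in for a Criteria being built. */
    static final class Group {
        final List<Object> parts = new ArrayList<>();

        void accept(String token) {
            parts.add(token);
        }

        void accept(Group nested) {
            parts.add(nested);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (Object part : parts) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                // Nested groups are rendered back inside parentheses
                sb.append(part instanceof Group ? "(" + part + ")" : part);
            }
            return sb.toString();
        }
    }

    static Group parse(List<String> tokens) {
        Deque<Group> stack = new ArrayDeque<>();
        stack.push(new Group());
        for (String token : tokens) {
            if (token.equals("(")) {
                stack.push(new Group()); // open a nested group
            } else if (token.equals(")")) {
                if (stack.size() < 2) {
                    throw new IllegalStateException("Unmatched ')'");
                }
                Group nested = stack.pop();
                stack.peek().accept(nested); // hand the finished group to its parent
            } else {
                stack.peek().accept(token);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalStateException("Unmatched '('");
        }
        return stack.pop();
    }
}
```

Parsing the tokens a, =, 1, and, (, b, =, 2, or, c, =, 3, ) yields a group whose toString is "a = 1 and (b = 2 or c = 3)", the same round-trip property the unit tests are meant to check.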

Implementation Plan

Task | Notes
Define Grammar |
Add accept() method to each State |
Move splitStringByDelimiterAndRespectQuotes() method to concourse project | Create a util class called "Strings" in the org.cinchapi.concourse.util package. This method must respect single and double quotes equally. The current implementation in the concourse-import project only respects double quotes.
Add parse method to Criteria class | See the above pseudocode. There are lots of error cases to handle (e.g. NullPointerException if the stack is empty prematurely, too many elements left on the stack meaning that parentheses were in the wrong place, exceptions thrown from states when they can't accept certain input, etc.)
Unit tests | Unit tests should focus on whether input was correctly parsed. In almost all cases, the parsed Criteria should have a toString output that matches the original input.
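For the string-splitting task, here is a rough sketch of a tokenizer that treats single and double quotes equally. QuoteAwareSplitter is a hypothetical name, and this is not the existing concourse-import implementation. It assumes a space delimiter, strips the quote characters from the returned tokens, and rejects unbalanced quotes. It does not handle escaped quotes or apostrophes inside unquoted words, and parentheses only become separate tokens if the input puts spaces around them.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplitter {

    /** Split on spaces, keeping text inside '...' or "..." together as one token. */
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char openQuote = 0;     // 0 means we are not inside a quoted section
        boolean quoted = false; // lets an empty quoted token ("") survive
        for (char c : input.toCharArray()) {
            if (openQuote != 0) {
                if (c == openQuote) {
                    openQuote = 0; // the matching quote closes the section
                } else {
                    current.append(c); // spaces and the other quote type are literal here
                }
            } else if (c == '"' || c == '\'') {
                openQuote = c;
                quoted = true;
            } else if (c == ' ') {
                if (current.length() > 0 || quoted) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    quoted = false;
                }
            } else {
                current.append(c);
            }
        }
        if (openQuote != 0) {
            throw new IllegalArgumentException("Unbalanced quote in: " + input);
        }
        if (current.length() > 0 || quoted) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Under these assumptions, both name = "Jeff Nelson" and name = 'Jeff Nelson' split into the three tokens name, =, and Jeff Nelson.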