Storage Model

Because Concourse is both schemaless and optimized for ad hoc queries, it automatically indexes everything it stores. To maintain high write throughput despite that indexing cost, Concourse uses a diff-based, buffered storage system: each Write is immediately added to a durable, append-only Buffer and is later transported to the Database in the background for rich indexing and permanent storage. Every read consults both the Buffer and the Database, so data is available immediately after it is written.
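
The write and read paths described above can be illustrated with a small sketch. This is a toy model, not Concourse's implementation: the class name BufferedStore and its methods are hypothetical, the "Database" is just an in-memory dictionary, transport is triggered by hand rather than by a background job, and removals are ignored.

```python
from collections import defaultdict


class BufferedStore:
    """Toy model: writes land in an append-only buffer, then move to an indexed store."""

    def __init__(self):
        self.buffer = []                  # append-only list of (key, value, record)
        self.database = defaultdict(set)  # (key, record) -> set of values

    def add(self, key, value, record):
        # A write only appends to the buffer, so it stays cheap.
        self.buffer.append((key, value, record))

    def transport(self):
        # Stand-in for the background job: index the buffered writes, then drop them.
        for key, value, record in self.buffer:
            self.database[(key, record)].add(value)
        self.buffer = []

    def select(self, key, record):
        # A read merges indexed data with whatever is still waiting in the buffer.
        values = set(self.database.get((key, record), set()))
        for k, v, r in self.buffer:
            if k == key and r == record:
                values.add(v)
        return values
```

In this sketch, a value returned by select is the same before and after transport runs, which is the property the paragraph above describes: a reader does not need to know whether a Write has been indexed yet.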

The Buffer

The Buffer is divided into multiple pages and adjusts its capacity dynamically as needed. Each Write in the Buffer is described by a key, a value, and a record. Because the Buffer is append-only, a removal is simulated by appending a new Write with the same key, value, and record as the data being removed. During each read, the Buffer uses bloom filters to test whether the requested data might exist on a page and performs a linear scan only if necessary.
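
As a rough illustration of the page structure, the sketch below pairs each page with a small bloom filter and treats a removal as a second, identical Write. It rests on assumptions that the text does not state: that a value counts as present when its matching Writes appear an odd number of times, and that each page has its own filter. The names BloomFilter, Page, and verify, along with the filter size and hash scheme, are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)  # one byte per bit, for simplicity

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class Page:
    """An append-only list of Writes guarded by a bloom filter."""

    def __init__(self):
        self.writes = []
        self.filter = BloomFilter()

    def append(self, key, value, record):
        write = (key, value, record)
        self.writes.append(write)
        self.filter.add(write)

    def count(self, key, value, record):
        write = (key, value, record)
        if not self.filter.might_contain(write):
            return 0  # definitely not on this page, so skip the scan
        return sum(1 for w in self.writes if w == write)  # linear scan


def verify(pages, key, value, record):
    # Assumption: an odd number of matching Writes means the value is present.
    total = sum(page.count(key, value, record) for page in pages)
    return total % 2 == 1
```

Appending the same Write a second time flips verify back to False, so nothing in a page is ever modified. A false positive from the filter costs only an unnecessary scan and never produces a wrong answer.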

The Database

Once a Buffer page fills, its Writes are transported to the Database, which indexes and stores data in various Blocks that are optimized for retrieving, querying, or searching. Each Block accumulates and sorts data in memory before syncing to disk, at which point the Block becomes immutable and the corresponding Buffer page is deleted. Blocks are kept on disk, but each one is associated with an in-memory bloom filter and index. These are used to test whether a Block contains the requested data and, if so, to find its exact location on disk, so each read maps only relevant information into memory for fast processing. The Database also caches recently read data to further improve performance.
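
The Block lifecycle can be sketched in the same spirit. Everything here is a simplification under stated assumptions: the class name Block and its methods are hypothetical, a Python set stands in for the bloom filter, a tuple stands in for the on-disk file, and the index maps a locator (for example, a record number) to a range of positions rather than byte offsets.

```python
class Block:
    """Toy Block: mutable while accumulating, sorted and immutable after sync."""

    def __init__(self):
        self._pending = []    # entries accumulated in memory
        self._entries = None  # sorted, immutable stand-in for on-disk data
        self._filter = set()  # stand-in for the in-memory bloom filter
        self._index = {}      # locator -> (start, end) positions in _entries

    def insert(self, locator, key, value):
        if self._entries is not None:
            raise RuntimeError("block is immutable after sync")
        self._pending.append((locator, key, value))

    def sync(self):
        # Sort once, then record where each locator's run of entries lives.
        entries = tuple(sorted(self._pending))
        for i, (locator, _, _) in enumerate(entries):
            start, _ = self._index.get(locator, (i, i))
            self._index[locator] = (start, i + 1)
            self._filter.add(locator)
        self._entries = entries
        self._pending = []

    def seek(self, locator):
        # The filter rules out blocks that cannot hold the locator;
        # the index then limits the read to the relevant slice.
        if self._entries is None or locator not in self._filter:
            return []
        start, end = self._index[locator]
        return list(self._entries[start:end])
```

Because the entries are sorted before the index is built, all entries for one locator are contiguous, so a single (start, end) pair is enough to locate them without touching the rest of the Block.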