Sunday, October 27, 2013

Column-oriented database systems and query execution

Column-oriented databases have been evolving quickly these days, driven by the demands of storing and querying "big data". I've made some notes here based on Daniel Abadi's doctoral dissertation, which won the SIGMOD Jim Gray Doctoral Dissertation Award. The notes also draw on material from the VLDB '09 tutorial on column-oriented database systems.

In a column-store DBMS, a query reads only the attributes it touches, but writing a single tuple requires a seek per column. Column stores are therefore generally a good fit for read-intensive data warehouses, whose workloads are dominated by column scans and batch writes.

The simplest storage layout uses tuple IDs: each column becomes a separate table of (tuple ID, value) pairs, with its own table header. Query optimization is very expensive under this representation. Instead, we can drop the tuple ID and store each column's values in the original tuple order, with the table header also kept in a separate column; the i-th value of every column then belongs to the i-th tuple. For each query, we first join all accessed attributes back into traditional row tuples, then apply the usual row-oriented optimization.
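To make the two layouts concrete, here's a minimal Python sketch (the column contents and the early_materialize helper are my own illustration, not from the dissertation):

```python
# Layout 1: each column is a separate table of (tuple ID, value) pairs.
name_col = [(0, "alice"), (1, "bob"), (2, "carol")]
age_col  = [(0, 30), (1, 25), (2, 41)]

# Layout 2: drop the tuple ID and rely on positional alignment:
# the i-th value of every column belongs to the i-th tuple.
names = ["alice", "bob", "carol"]
ages  = [30, 25, 41]

def early_materialize(*columns):
    """Stitch aligned columns back into row tuples, so a conventional
    row-oriented optimizer and executor can take over from here."""
    return list(zip(*columns))

print(early_materialize(names, ages))
# [('alice', 30), ('bob', 25), ('carol', 41)]
```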
 
Observing that I/O is the dominant cost, the opportunity for a column store is to exploit the high value locality within a column and apply compression. For example, a sorted column can be run-length encoded as (value, start index, run length) triples to avoid storing repetitions.
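A rough sketch of this run-length encoding in Python (the rle_encode helper and the sample column are invented for illustration):

```python
def rle_encode(sorted_values):
    """Run-length encode a sorted column as (value, start index, length)
    triples, so each run of repeated values is stored once."""
    runs = []
    for i, v in enumerate(sorted_values):
        if runs and runs[-1][0] == v:
            value, start, length = runs[-1]
            runs[-1] = (value, start, length + 1)
        else:
            runs.append((v, i, 1))
    return runs

# A sorted column with heavy repetition collapses to a handful of triples:
quarter = ["Q1"] * 300 + ["Q2"] * 350 + ["Q3"] * 400
print(rle_encode(quarter))
# [('Q1', 0, 300), ('Q2', 300, 350), ('Q3', 650, 400)]
```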
We can also apply dictionary encoding to values drawn from a limited domain. In general, all values in a column come from the same domain and thus contain significantly less entropy than a row, which yields better compression efficiency: row stores typically compress at around 1:3, while column stores can reach 1:10. The space saved can be spent on storing multiple copies of the data in different sort orders to further speed up query execution, since sorted columns compress better and answer range queries faster. Column data can also be stored in a B-tree, with the values compacted in the leaf nodes.
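Dictionary encoding for such a limited-domain column might look like this sketch (helper name and data are made up):

```python
def dict_encode(column):
    """Dictionary-encode a low-cardinality column: store each distinct
    value once and replace occurrences with small integer codes."""
    codes, dictionary = [], {}
    for v in column:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        codes.append(dictionary[v])
    return codes, dictionary

states = ["CA", "NY", "CA", "CA", "TX", "NY"]
codes, dictionary = dict_encode(states)
# codes = [0, 1, 0, 0, 2, 1], dictionary = {'CA': 0, 'NY': 1, 'TX': 2}
```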
Column stores also offer more opportunities for block-at-a-time and vectorized processing. Queries can often operate on compressed data directly and avoid decompression, so the CPU we trade for the I/O savings stays small.
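As a sketch of operating on compressed data directly, here's a selection evaluated per run over the (value, start, length) triples produced by the hypothetical rle_encode above, without ever decompressing the column:

```python
def select_on_rle(runs, predicate):
    """Evaluate a predicate directly on (value, start, length) runs:
    one comparison per run instead of one per tuple, with the matches
    returned as [start, end) position ranges."""
    return [(start, start + length)
            for value, start, length in runs
            if predicate(value)]

runs = [("Q1", 0, 300), ("Q2", 300, 350), ("Q3", 650, 400)]
print(select_on_rle(runs, lambda q: q >= "Q2"))
# [(300, 650), (650, 1050)]
```

One comparison per run replaces one comparison per tuple, and the matches come back as position ranges that downstream operators can consume directly.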
A tuple can be materialized either at the beginning of the query plan (early materialization) or as late as possible (late materialization). The latter is usually more efficient: we operate on the columns for as long as possible (see the sketch after the list below). Basically we can
- first compute the predicates on the selected columns,
- then intersect (AND) the resulting bit arrays or position lists,
- then project the values of the queried attributes at the qualifying positions,
- and finally aggregate over the projected values.
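Here's a toy end-to-end sketch of those four steps (all names and sample data are invented; a real engine would run vectorized primitives over compressed blocks rather than Python lists):

```python
def late_materialized_query(cols, predicates, project, agg):
    """Late materialization: predicates per column, bitmap intersection,
    positional projection, then aggregation."""
    n = len(next(iter(cols.values())))
    # 1. Evaluate each predicate on its own column, yielding one bit array each.
    bitmaps = [[predicates[c](v) for v in cols[c]] for c in predicates]
    # 2. Intersect (AND) the bit arrays into a single position list.
    positions = [i for i in range(n) if all(b[i] for b in bitmaps)]
    # 3. Project: fetch the queried attribute only at the surviving positions.
    values = [cols[project][i] for i in positions]
    # 4. Aggregate over the projected values.
    return agg(values)

cols = {"price": [5, 12, 7, 30], "qty": [1, 3, 2, 4]}
# SELECT SUM(qty) WHERE price > 6 AND qty >= 2
print(late_materialized_query(
    cols,
    {"price": lambda p: p > 6, "qty": lambda q: q >= 2},
    project="qty", agg=sum))
# 3 + 2 + 4 = 9
```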
