Monday, November 30, 2015

Apache Kudu, a mimic of SAP Hana?

Cloudera released Apache Kudu in late September. The technical paper is published here.
Kudu's primary strength is fast data access for both sequential scans and random access, which is similar to what SAP Hana claims. Reading the paper above, the implementation details look similar too: Kudu keeps in-memory RowSets for each table's tablets, along with delta stores that record modifications. Maintenance consists of flushing and compacting the deltas and RowSets. The market is about to get interesting ... My guess is that Kudu targets the NoSQL world and is meant to work with the Hadoop/Spark stack, while Hana is still an RDBMS solution.
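
To make the RowSet/delta idea a bit more concrete, here is a toy Python sketch of how I picture it. This is only my own illustration, not Kudu's (or Hana's) actual code: the class `ToyTablet` and all of its method names are made up, plain dicts stand in for on-disk storage, and real systems also handle concurrency, timestamps and columnar encoding, which I skip entirely.

```python
class ToyTablet:
    """Toy model (hypothetical): base rows plus a delta store, merged on read."""

    def __init__(self):
        self.mem_rowset = {}       # key -> row dict, recent inserts held in memory
        self.flushed_rowsets = []  # immutable dicts standing in for flushed rowsets
        self.deltas = {}           # key -> list of column changes, oldest first

    def _find_base(self, key):
        # Look for the base version of a row, newest data first.
        if key in self.mem_rowset:
            return self.mem_rowset[key]
        for rowset in self.flushed_rowsets:
            if key in rowset:
                return rowset[key]
        return None

    def insert(self, key, row):
        if self._find_base(key) is not None:
            raise KeyError(f"duplicate key: {key!r}")
        self.mem_rowset[key] = dict(row)

    def update(self, key, changes):
        # Updates never rewrite the base row; they are appended as deltas.
        if self._find_base(key) is None:
            raise KeyError(f"unknown key: {key!r}")
        self.deltas.setdefault(key, []).append(dict(changes))

    def get(self, key):
        # A read merges the base row with its deltas, in order.
        base = self._find_base(key)
        if base is None:
            return None
        row = dict(base)
        for change in self.deltas.get(key, []):
            row.update(change)
        return row

    def flush(self):
        # Move the in-memory rowset to the "flushed" list.
        if self.mem_rowset:
            self.flushed_rowsets.append(self.mem_rowset)
            self.mem_rowset = {}

    def compact(self):
        # Merge all flushed rowsets into one and fold their deltas into the rows.
        merged = {}
        for rowset in self.flushed_rowsets:
            for key, base in rowset.items():
                row = dict(base)
                for change in self.deltas.pop(key, []):
                    row.update(change)
                merged[key] = row
        self.flushed_rowsets = [merged] if merged else []
```

The point of the design, as far as I understand it, is that writes stay cheap (append a delta) while reads pay a small merge cost, and compaction periodically pays that cost once so later reads do not have to.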

Friday, November 13, 2015

Taobao's distributed file system

Taobao, China’s biggest online shopping website (similar to Amazon/eBay), just set a single-day sales record of $14.3bn on Nov. 11. It has its own file system called TFS (Taobao File System), open sourced at http://code.taobao.org/p/tfs/src/

From the description on this Chinese website, I can see it's a classic HDFS file system, integrated with ZooKeeper's leader election algorithm for the data nodes, and Hive's MySQL metadata store. It also says it's going to integrate with erasure coding, which I believe is the version just open sourced by Cloudera this September. These Chinese companies are really good at integrating the latest open source technologies and putting them to work at scale!
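
For anyone unfamiliar with erasure coding, the idea is to store a small amount of parity data instead of full replicas, and rebuild a lost block from the survivors. Below is a minimal Python sketch using the simplest possible scheme, a single XOR parity block. This is an assumption for illustration only: I have not checked which code TFS uses, production systems typically use something like Reed-Solomon that tolerates more than one loss, and the function names `xor_parity` and `recover_missing` are my own.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving only the bytes of the missing block.
    return xor_parity(list(surviving_blocks) + [parity])


blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(blocks)
rebuilt = recover_missing([blocks[0], blocks[2]], parity)
```

With three data blocks and one parity block, the storage overhead is about 1.33x instead of the 3x of triple replication, at the cost of extra computation when a block has to be rebuilt.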