The number of entries in the block index is stored in the fixed file trailer and has to be passed in to the method that reads the block index. One of the limitations of the block index in version 1 is that it does not provide the compressed size of a block, which turns out to be necessary for decompression. HFile version 2, introduced in HBase 0.92, fixed this limitation.

In version 2, every block in the data section contains the following fields:

- 8 bytes: block type, a sequence of bytes equivalent to version 1’s “magic records”.
- Compressed size of the block’s data, not including the header (int). Can be used for skipping the current data block when scanning HFile data.
- Uncompressed size of the block’s data, not including the header (int). This is equal to the compressed size if the compression algorithm is NONE.
- File offset of the previous block of the same type (long). Can be used for seeking to the previous data/index block.
- Compressed data (or uncompressed data if the compression algorithm is NONE).

The above block format is used in the following HFile sections:

- Scanned block section. The section is named so because it contains all data blocks that need to be read when an HFile is scanned sequentially. It also contains leaf block index and Bloom chunk blocks.
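To make the field layout concrete, here is a minimal sketch that parses the fixed-size header described above from a `ByteBuffer`. The class name `BlockHeaderSketch` and its methods are hypothetical and are not part of the HBase API. The big-endian byte order and the `DATABLK*` magic value used in `main` are assumptions made for illustration. The sketch covers only the fields listed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HBase class: models the version 2 block header fields.
final class BlockHeaderSketch {
    // 8-byte block type + int compressed size + int uncompressed size + long previous offset.
    static final int HEADER_SIZE = 8 + 4 + 4 + 8;

    final String blockType;
    final int compressedSize;    // size of the block's data, excluding the header
    final int uncompressedSize;  // equals compressedSize when compression is NONE
    final long prevBlockOffset;  // file offset of the previous block of the same type

    private BlockHeaderSketch(String blockType, int compressedSize,
                              int uncompressedSize, long prevBlockOffset) {
        this.blockType = blockType;
        this.compressedSize = compressedSize;
        this.uncompressedSize = uncompressedSize;
        this.prevBlockOffset = prevBlockOffset;
    }

    // Reads the header fields in the order they are listed in the text.
    // Assumes the buffer uses ByteBuffer's default big-endian order.
    static BlockHeaderSketch parse(ByteBuffer buf) {
        byte[] magic = new byte[8];
        buf.get(magic);
        int compressed = buf.getInt();
        int uncompressed = buf.getInt();
        long prev = buf.getLong();
        return new BlockHeaderSketch(
                new String(magic, StandardCharsets.US_ASCII), compressed, uncompressed, prev);
    }

    // The compressed size lets a scanner skip this block without decompressing it.
    long nextBlockOffset(long thisBlockOffset) {
        return thisBlockOffset + HEADER_SIZE + compressedSize;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.put("DATABLK*".getBytes(StandardCharsets.US_ASCII)); // assumed magic value
        buf.putInt(100).putInt(250).putLong(-1L);
        buf.flip();
        BlockHeaderSketch h = parse(buf);
        System.out.println(h.blockType + " next block at " + h.nextBlockOffset(0L));
    }
}
```

Because the compressed size sits in the header, a reader can compute where the next block starts from the current block's offset alone, which is the capability the version 1 block index lacked.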
- An HBase table consists of multiple rows.
- A table is split into regions based on the lexicographical order of its rows.
- A Store corresponds to a column family for a table for a given region.
- A Store hosts a MemStore and 0 or more StoreFiles (HFiles).
- The MemStore holds in-memory modifications to the Store.

A widely circulated diagram of the HBase architecture is as follows. I drew a simplified version from the region’s viewpoint.

For high performance and availability, tables are split into “regions”. A region contains a continuous range of rows. Since rows are sorted lexicographically, it’s easy to deduce that all rows between a region’s start key and end key are stored in that region. A single row belongs to exactly one region at any time. A region is served by exactly one RegionServer, while one RegionServer can hold many regions at the same time. Regions are distributed across the cluster, so client requests can be processed by each RegionServer process independently.

An HBase table can be pre-split into regions at creation time. After that, regions split when they reach a configured threshold. You can also force a split manually, but there are some rules of thumb to respect. Conversely, regions can be merged if too many regions exist.

public interface Region extends ConfigurationObserver
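The rule that a row belongs to exactly one region follows from the sorted key ranges, and it can be sketched with a sorted map keyed by region start key. This is a toy model, not HBase code: `RegionLookupSketch`, `addRegion`, and `regionFor` are hypothetical names, and row keys are modeled as Java strings for readability, whereas HBase compares raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of region lookup; not part of the HBase API.
final class RegionLookupSketch {
    // Maps each region's start key to a region name, kept in lexicographic order.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key: " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        // Pre-split into three regions; the first region starts at the empty key.
        table.addRegion("", "region-1");
        table.addRegion("g", "region-2");
        table.addRegion("p", "region-3");
        System.out.println(table.regionFor("apple"));
        System.out.println(table.regionFor("zebra"));
    }
}
```

Each start key is also the exclusive end key of the preceding region, so the ranges never overlap and every row key maps to exactly one region. Splitting a region in this model would amount to adding one more start key inside an existing range.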