Apache HBase is an open source, scalable, consistent, low latency, random access data store
Source from Infinite Skills
Features
Horizontally Scalable
Linear increase in servers results in linear increases in storage capacity and I/O operations
CAP Trade off
In CAP theory, Hbase is more likely a CP type of system
- Consistency: ACID(atomicity, consistency, isolation, durability) garantees on rows
- Availability: Response time 2-3ms from cache, 10-20ms from disk
- Partition Tolerance: Failures don’t block system. It might take longer to response to maintain consistency
Dependencies
Apache ZooKeeper
- Use for distributed coordination of leaders for high availability
- Optimized to be highly avaiable for reads
- Not designed to scale for high write throughput
Apache Hadoop HDFS
- Provide data durability and reliability
- Optimized for sequential reads and writes of large files
- Does not provide random updates, only simple API for rando reads
- Cannot scale tens of billions of small entities (less then a few hundred MB)
Both system have their strengths but do not individually provide the same properties as HBase
Random Access
Optimized for small random reads
- Entities indexed for efficient random reads
Optimized for high throughput random writes
- Updates without requiring read
- Random writes via Log Structured Merge (LSM)
Short History
Inspired from Google’s Bigtable
Bigtable: A Distributed Storage System for Structured Data(2006)
BigTable
Datastore for Google’s Web Crawl Table
- Store web page content
- Web URL as key
- Use MapReduce to find links and generate backlinks
- Calculate page rank to build the Google index
Later, it also used as backend for Gmail, GA, Google Earth etc.
Hadoop HDFS
Inspired by Google distributed file system GFS
Timeline
Since 2009, many compaies (Yahoo, Facebook, eBay etc.) chose to use HBase for large scale production use case
In 2015, Google announced BigTable with HBase 1.0 compatible API support for its compute engine users
2017, HBase 2.0.0
2020, HBase 3.0.0
Despite being bucketed into NoSQL category of data storage, some of intresting are moving NoSQL back to SQL, by using HBase as a storage engine for SQL compliant OLTP database system.
Use case
HBase’s strengths are its ability to scale and sustain high write throughputs
Many HBase apps are:
- Ports from RDBMS to HBase
- New low-latency big data apps
How to Porting RDBMS to HBase?
- Many RDBMS are painful to scale
- Scale up is no longer pratical for massive data
- Data inconsistency was not acceptable when scaling reads
- Operationally gets more complicated as the number of replicas increases
- Operational techniques not sufficient when scaling writes
To make it easier to scale, we need to discard the fundamental features that RDBMS provides, such as:
- text search (LIKE)
- joins
- foreign keys and avoid constraint checks
Changing the schema, make it only contains denormalized tables, we won’t incur replication IO when sharding the RDBMS
Now you’re relatively straightforward porting RDBMS to HBase
Why choosing HBase instead?
- When your apps need high wirte and read throughput
- When you tired of RDMS’s fragile scaling operations
Data Volumes
- Entity data: information about the current state of a particular persion or thing
- Event data(or time series data): Records events that are generally spaced over many time intervals
Data volume explods when we need both of them
HBase or Not
Q: Does your app expect new data to be vailable immediately after an update?
- Yes: Use HBase
- When data queried, must reflect the most recent values
- Expect query responses in milliseconds
- No: No need for HBase
Q: Whether your app analytical or operational?
- Analytical: Not optimal for HBase
- Look for large set of data
- Often filter for particular time range
- Better choose Hadoop
- Operational: Use HBase
- Look for single or small set of entities
Q: Does your app expect updates to be available immediately after an update?
- Yes: Use HBase
- Frequently modified
- Pinpoint deletes
- Updates must be reflected within milliseconds
- No: No need for HBase
- Data is append-only
- Deletes in bulk or never
- Updates can be ignored until the next report is run
comparison
Workload | HBase | Hadoop |
---|---|---|
Low Latency | 1ms from cache 10ms from disk | 1min vis MR/Spark 1s via Impala |
Random Read | Rowkey is primary index | The small file problem |
Short Scan | Sorted and efficient | Bespoke partitioning can help |
Full Scan | Possible but non-optimal Improved pref w/MR on snapshots | Optimized with MR, Hive, Impala |
Updates | Optimized | Not supported |
Logical Data Model
- Data is stored as Bigtable
- Tables consist of rows
- Each row has a primary row key
- Rows are sorted by row key
- Each row has a set of columns
- Each row my have same columns
- Each row’s column is a cell
- Cells may contain a byte[] value
- Specify none or the lack of a property is HBase’s sparse row property
Example model: HBase commiters
row key | d:hair | d:beard |
---|---|---|
apurtell | brown | grey |
busbey | brown | scruffy |
enis | black | black |
lhofhansl | silver | |
jmhsieh | black | |
stack |
Operation Behavior
ACID reads and writes per row
ACID: Automicity, Consistency, Isolation, Durable
Traditional databases offer absolute guarantees for all r/w even if they touch multiple rows or tables.
One design decision made in HBase is the limit and guarantees to a per row granularity.
For automicity:
- Reads will only see complete writes and no intermediate state
For consistency, isolation:
- Subsequent reads will only read up-to-date writes
- If there are two concurrent writes, one will win
Multiversion concurrency control (MCC/MVCC) per row
- Does not require reads in order to write data. HBase doesn’t check for pre-existsing values
- Every cell write gets assigned a timestamp. Use timestamp mechanism to track and determine which value get read.
- A cell deletion also use a delete marker with timestamp to mask out the previous value
row key | d:hair | d:beard |
---|---|---|
apurtell | brown@1 | grey@1 |
busbey | brown@1 | trim@2 scruffy@1 |
enis | black@1 | black@1 |
lhofhansl | silver@1 | |
jmhsieh | delete@3 black@1 | |
stack |
Physical Data Model
Optimizing HBase apps require understanding some iternals.
Understanding the data layout and how I/O is done is crucial for read performance.
- Optimize your schema for your quires
- Predict latency of random read operations
- Minimize I/O necessary for random access reads
Logical
row key | d:hair | d:beard |
---|---|---|
busbey | brown@1 | trim@2 scruffy@1 |
Phycial
row key | column key | timestamp | cell value |
---|---|---|---|
busbey | d:hair | 1 | brown |
busbey | d:beard | 1 | scruffy |
busbey | d:beard | 2 | trim |
Hbase data is laid out on disk as an indexed, sorted, list of cells. Crucial for optimizing read performance.
- Indexing is base on cell coordinates(row key, col, ts)
- Single seek to read specific cells
- Single seed to start a scan
- Sorting allows for efficient scanning of a single row’s data or for efficient scans of serveral related rows of data
Column Families (CF)
A CF is a set of physically related columns that can help read performance by giving developers who understancd their workloads the ability to control physical layouts.
Column Family provides facilities to group the columns that have similar read access patterns, so those patterns can be optimized.
Let’s say we add some new columns to our table schema, to records the committer’s code contributions.
row key | d:hair | d:beard | j:150715 | j:150716 |
---|---|---|---|---|
apurtell | brown | grey | HBASE-8642 | HBASE-14048 |
busbey | brown | trim | HBASE-14027 | |
enis | black | black | ||
lhofhansl | silver | |||
jmhsieh | black | |||
stack | HBASE-14102 |
If we want to do the analysis of code contributions, without CF, the app would have to iter over each commiter with 6 I/O operation and have to read then skip the irrelevant cells.
To add a CF, prefixing data with the CF name(keep it short) and separating from the qualifier with a colon. By doing this, we essentially broken one table up into two parallel tables under the cover.
Logical
row key | d:hair | d:beard | j:150715 | j:150716 |
---|---|---|---|---|
apurtell | brown | grey | HBASE-8642 | HBASE-14048 |
Phycial (CF/d)
row key | column key | timestamp | cell value |
---|---|---|---|
apurtell | d:hair | 1 | brown |
apurtell | d:beard | 1 | grey |
Phycial (CF/j)
row key | column key | timestamp | cell value |
---|---|---|---|
apurtell | j:150715 | 1 | HBASE-8642 |
apurtell | j:150716 | 1 | HBASE-14048 |
However, if we want to read a single row, we now need to search multiple column family locations and pay for the I/O to do this.
Conclusion
CF is an advanced feature to control data layout
- Design decision that need to be made very early
- Has trade offs – more is usually not better
Run HBase with Docker
Download Docker Desktop for Mac or Windows. Docker Compose will be automatically installed. On Linux, make sure you have the latest version of Compose.
The project we use here is big-data-europe/docker-hbase.
Running standalone HBase, then visit the Web UI at http://localhost:16010
The Dockerfile install HBase on jdk environment using tarballs:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
FROM openjdk:8 MAINTAINER Ivan Ermilov <ivan.s.ermilov@gmail.com> RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends net-tools curl netcat ENV HBASE_VERSION 1.2.6 ENV HBASE_URL http://www.apache.org/dist/hbase/$HBASE_VERSION/hbase-$HBASE_VERSION-bin.tar.gz RUN set -x \ && curl -fSL "$HBASE_URL" -o /tmp/hbase.tar.gz \ && curl -fSL "$HBASE_URL.asc" -o /tmp/hbase.tar.gz.asc \ && tar -xvf /tmp/hbase.tar.gz -C /opt/ \ && rm /tmp/hbase.tar.gz* RUN ln -s /opt/hbase-$HBASE_VERSION/conf /etc/hbase RUN mkdir /opt/hbase-$HBASE_VERSION/logs RUN mkdir /hadoop-data ENV HBASE_PREFIX=/opt/hbase-$HBASE_VERSION ENV HBASE_CONF_DIR=/etc/hbase ENV USER=root ENV PATH $HBASE_PREFIX/bin/:$PATH ADD entrypoint.sh /entrypoint.sh RUN chmod a+x /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] |
HBase Shell
- An interactive scriptable interface
- A JRuby Read-Eval-Print Loop(REPL)
- Scriptable access to all HBase client APIs
- Useful for Data Definition Language(DDL) commands
- Useful for tuning, operations and debugging
- Should not be primary mechanism to access data
Let’s allocate a HBase shell pseudo-TTY:
1 2 3 4 5 6 7 |
$ docker exec -it hbase hbase shell HBase Shell; enter 'help<RETURN>' for list of supported commands. Type "exit<RETURN>" to leave the HBase Shell Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017 hbase(main):001:0> |
Create a table with one column family
1 2 3 4 5 6 7 8 9 10 11 12 13 |
hbase(main):005:0> create 'person', 'd' 0 row(s) in 1.2410 seconds => Hbase::Table - person hbase(main):007:0> describe 'person' Table person is ENABLED person COLUMN FAMILIES DESCRIPTION {NAME => 'd', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FAL SE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLO CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 1 row(s) in 0.0890 seconds |
Put and retrieve row data
1 2 3 4 5 6 7 8 |
hbase(main):009:0> put 'person', 'jmhsieh', 'd:hair', 'black' 0 row(s) in 0.0810 seconds hbase(main):010:0> get 'person', 'jmhsieh' COLUMN CELL d:hair timestamp=1595400870510, value=black 1 row(s) in 0.0450 seconds |
Put more data then scan the table
1 2 3 4 5 6 7 8 |
hbase(main):016:0> scan 'person' ROW COLUMN+CELL apurtell column=d:beard, timestamp=1595401071313, value=grey busbey column=d:beard, timestamp=1595401058797, value=scruffy enis column=d:hair, timestamp=1595401027066, value=brown jmhsieh column=d:hair, timestamp=1595400870510, value=black stack column=d:hair, timestamp=1595401038437, value=none |
Delete cell
1 2 |
hbase(main):016:0> delete 'person', 'stack', 'd:hair' |
Delete row
1 2 |
hbase(main):016:0> delete 'person', 'stack' |
Disable then drop the table
1 2 3 |
hbase(main):016:0> disable 'person' hbase(main):016:0> drop 'person' |
Run a script with --noninteractive
parameter
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
$ docker exec -it hbase bash root@606dc89bb131:/# cat >> person.rb if !list.include?("person") create 'person', 'd' end put 'person', 'jmhsieh', 'd:hair', 'black' put 'person', 'enis', 'd:hair', 'brown' put 'person', 'stack', 'd:hair', 'none' put 'person', 'busbey', 'd:beard', 'scruffy' put 'person', 'apurtell', 'd:beard', 'grey' exit root@606dc89bb131:/# hbase shell -n person.rb 1 row(s) in 0.2130 seconds 0 row(s) in 1.3140 seconds 0 row(s) in 0.1560 seconds 0 row(s) in 0.0150 seconds 0 row(s) in 0.0060 seconds 0 row(s) in 0.0130 seconds 0 row(s) in 0.0050 seconds |
Region Server
Distributed mode
- Need to spread r/w across the cluster
- Load balancing is critical
Scale Out
HBase tables are partitioned in to many regions.
Hbase serves regions with Region servers, a region is assigned to exactly one region server.
Regions can be assigned to any region server, when a region goes down, its region will automatically assign to another region server.
The region assignment is fast that usually completes in under a second.
Data Life Cycle
Write Path
HBase write path is designed to handle a high throughput stream of edits while maintaining durability, consistency and scalability.
Durability
Guarantees that data does not get lost or corrupted after client done a sucessfully write.
Since writing data only in memory can result in data loss, HBase also stores data to a Write Ahead Log(WAL/H log) which store data in disk.
Writes must make it both to the WAL and memory before acknowledge to the client.
Consistency
Guarantees when the client sucessfully write the data, read of that data will get that version or a newer version
HBase guarantees strong consistency on data in a row
WAL recovery guarantees the latest data is returned and order of writes if presverved. HBase server will block the request until the recovery is completed.
Scalability
Bottlenecks on a regrion server are: Memory, Disk, Network and CPU
Usually, Memory capacity is orders of magnitude smaller than disk capacity which will run out first.
Writes to HBase accumulate in the memstore, which will eventually use all the available RAM, we need a mechanism to periodically get data out of RAM.
The operation accomplishes this is flush. When memstore full and not accept new write, the flush operation takes all the data in memstore and flushes to disk as a read optimized file format called a HFile.
HFiles are memstore recovery checkpoints, allowing removal of old entries.
The flush operation frees the memory, the new write will then be able to work again.
Conclution
- Every write to HBase gets written to disk
- HBase write edits to the WAL
- HBase writes batches of edits to the HFiles with flush operation: 1. caps the memory for new writes 2. checkpoint data for faster recovery
Read Path
- Reads get data from memstore and HFiles
- Reading from the HFiles is a costly I/O operation
Block Cache
HBase add an in-memory block cache to reduce costly I/O operations by taking advantage of spatial and temporal locality
Compaction
As more HFiles are flushed, more I/O operations HBase needed to find data.
HBase will periodically run a compaction to reoptimize and consolidate HFiles.
After compaction, HBase only hit one disk operation for each read.
IO Path
Region Split
If all regions are the same size, load balacing is easier.
HBase limit the region size, and automatically performs the region split to avoid skew caused by hot regions or large regions.
Optimizing Tradeoff
Flush, compaction and split operations rewrite data previously written in the WAL or from previous compactions, these extra writes, a.k.a write amplification, consume extra I/O budget.
The operations take time to complete, may cause blocking I/O.