Why Apache Hadoop and HBase?
For realtime data?
To analyze it, obviously, but why these tools?
FB's stack before HBase:
LAMP.
+ memcached for caching.
+ Hadoop for data analysis.
Powerful stuff, but there are problems with the existing stack:
MySQL is not inherently distributed.
table size limits.
inflexible schema.
Hadoop is scalable, but:
MapReduce is slow and difficult.
does not support random writes.
poor support for random reads.
Facebook took specialized solutions:
High-throughput persistent key-value store -> Tokyo Cabinet
Large-scale data warehouse -> Hive
Photo store -> Haystack
Custom C++ servers for lots of other stuff.
# The above scale well enough, but no single system fits every service at FB.
What do we need in a data store?
Requirements for Facebook Messages:
massive datasets, with large subsets of cold data.
Elasticity and high availability are important.
need many servers.
strong consistency within a datacenter.
fault isolation.
Some non-requirements:
tolerating network partitions within a single datacenter -> the in-datacenter network is doubled (redundant).
Active-active serving from multiple datacenters.
HBase satisfied our requirements.
In early 2010, engineers at FB compared databases:
Apache Cassandra, Apache HBase, sharded MySQL.
Compared performance, scalability, and features.
HBase gave excellent write performance and good reads.
HBase already included many nice-to-have features.
Introduction to HBase.
What is HBase?
distributed.
designed for clusters of commodity nodes.
Column-oriented.
think map data structures, not spreadsheets.
Multi-dimensional and sorted.
tables are fully sorted on three dimensions: row key, column, timestamp (see the sketch after this list).
Highly available.
tolerant of node failure.
High performance.
log structured for high write throughput.
Originally part of Hadoop.
HBase adds random read/write access to HDFS.
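The "multi-dimensional sorted map" point above can be pictured like this (an analogy only, not how HBase is actually implemented): a table behaves like nested sorted maps keyed by row, then column, then timestamp. A minimal Java sketch of the idea:

    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Analogy only: HBase's logical model as nested sorted maps.
    // row key -> column ("family:qualifier") -> timestamp -> value
    public class SortedMapModel {
        public static void main(String[] args) {
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> table =
                    new TreeMap<>();

            // "put" two versions of one cell: row "user123", column "msg:subject"
            table.computeIfAbsent("user123", r -> new TreeMap<>())
                 .computeIfAbsent("msg:subject", c -> new TreeMap<>())
                 .put(1300000000L, "Hello");
            table.get("user123").get("msg:subject").put(1300000100L, "Hello again");

            // a read returns the newest version: the largest timestamp in the sorted map
            System.out.println(table.get("user123").get("msg:subject").lastEntry());
        }
    }

Because everything stays sorted by row key, scans over adjacent rows are cheap; most HBase schema design leans on that property.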
Required some Hadoop changes for FB's usage:
File appends (with the so-called sync call, other readers can see the data before the file is closed; see the sketch below).
HA NameNode.
Read optimizations.
Plus ZooKeeper!
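A rough idea of what the file-append/sync change buys, sketched against the public HDFS client API (hflush() on FSDataOutputStream; older Hadoop releases exposed it as sync()). The NameNode URI and path are placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: write a record to an HDFS file and make it visible to readers
    // before the file is closed. hflush() was called sync() in older Hadoop.
    public class SyncExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf); // placeholder URI

            try (FSDataOutputStream out = fs.create(new Path("/logs/commitlog"))) {
                out.writeBytes("one log record\n");
                out.hflush();   // concurrent readers can now see this record
                // ... keep appending and flushing
            }
        }
    }

This is the property a write-ahead log needs: the record is durable and readable even though the file stays open.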
Quick overview of HBase # nice image here.
HBase sits on top of HDFS.
ZooKeeper for coordination.
HBase Key Properties.
High write throughput.
Horizontal scalability.
Automatic failover.
Regions sharded dynamically.
HBase uses HDFS and ZooKeeper (FB already uses them).
High Write Throughput.
Commit log.
Memstore (in memory).
Inserting a new piece of data:
first it is appended to the commit log -> a sequential write to HDFS.
then it goes into the in-memory memstore.
when the memstore grows big enough, it is flushed to disk.
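From the client's point of view the whole write path is just a put; the commit-log append and memstore update happen on the server. A sketch using the 0.90-era HBase client API (HTable/Put), roughly what was current at the time of this talk; the table and column names are made up:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch with the old (0.90-era) client API; names are made up.
    public class WritePathExample {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "messages");

            Put put = new Put(Bytes.toBytes("user123"));      // row key
            put.add(Bytes.toBytes("msg"),                     // column family
                    Bytes.toBytes("subject"),                 // qualifier
                    Bytes.toBytes("Hello"));                  // value

            // Server side: append to the commit log (a sequential HDFS write),
            // then update the in-memory memstore; flushed to an HFile later.
            table.put(put);
            table.close();
        }
    }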
Horizontal scalability.
a table's row-key space (0 to infinity) is split into regions.
each region is a shard.
if you add a new node, HBase rebalances regions onto it.
Automatic failover.
HBase automatically reassigns a failed server's regions to other nodes.
Applications of HBase at Facebook.
Some specific applications at Facebook.
Regions can be sharded dynamically.
HBase uses HDFS.
we get the benefits of HDFS as a storage system for free.
fault tolerance.
scalability.
checksums to detect and fix corruption.
MapReduce.
HDFS is heavily tested and used at petabyte scale at FB.
3,000 nodes, 40 PB.
Use case 1: Titan.
Facebook Messages.
The new FB Messages:
combines several communication channels into one.
Largest engineering effort in the history of FB.
15 engineers over more than a year.
incorporates over 20 infrastructure technologies.
Hadoop, HBase, Haystack, ZooKeeper, ...
A product at massive scale on day one.
hundreds of millions.
300 TB.
Messaging Challenges.
High write throughput.
Every message, instant message, SMS, and email.
search indexes for all of the above.
Denormalized schema -> a message with 10 recipients is copied 10 times (see the schema sketch after this list).
Massive clusters.
so much data and usage require a large server footprint.
do not want outages.
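To make the denormalization concrete, a hypothetical sketch (not the actual FB Messages schema): each recipient's row gets its own copy of the message plus keyword-index columns, so reads and searches stay single-row. Family and qualifier names are invented:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical denormalized layout: one row per recipient, each row holds
    // its own copy of the message body plus keyword index columns.
    public class DenormalizedMessage {
        static List<Put> storeMessage(String msgId, String body, List<String> recipients) {
            List<Put> puts = new ArrayList<>();
            for (String user : recipients) {              // 10 recipients -> 10 copies
                Put p = new Put(Bytes.toBytes(user));     // row key = recipient
                p.add(Bytes.toBytes("msg"), Bytes.toBytes(msgId), Bytes.toBytes(body));
                for (String word : body.toLowerCase().split("\\s+")) {
                    // keyword -> message id, so search is a single-row lookup
                    p.add(Bytes.toBytes("idx"), Bytes.toBytes(word + ":" + msgId),
                          Bytes.toBytes(""));
                }
                puts.add(p);
            }
            return puts;
        }
    }

The trade is deliberate: the extra copies cost write throughput and storage, which HBase handles well, in exchange for cheap single-row reads.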
Typical cell layout.
Multiple cells for messaging.
20 servers/rack, 5 or more racks per cluster.
Controllers (master/ZooKeeper) spread across racks. ### image
Use case 2: Puma.
Before Puma: traditional ETL with Hadoop.
Web tier (many 1000s of nodes)
-> Scribe (collects the web logs)
-> HDFS
-> MySQL --> the web tier can then query it.
Latency: 15 minutes to 24 hours.
Puma: a realtime ETL system.
Web tier
-> Scribe -> HDFS (uses the HDFS append feature)
-> Puma (uses PTail to collect the web logs)
-> HBase --> Thrift --> web tier.
Latency: 10-30 seconds.
Realtime Data Pipeline.
Utilizes the existing log aggregation pipeline (Scribe-HDFS).
Extends the low-latency capability of HDFS (sync + PTail).
High-throughput writes (HBase).
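PTail itself is Facebook-internal, so the following is only an illustration of the idea: poll a growing HDFS file, read whatever lies beyond a checkpointed offset, and hand it downstream. (Tailing a file that is still being written needs more care in practice, e.g. around how much data HDFS reports as visible.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative tailer in the spirit of PTail: poll an HDFS file, read
    // anything new past our checkpoint, repeat. The path is a placeholder.
    public class HdfsTailer {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path log = new Path("/logs/clicks/current");
            long offset = 0;                    // checkpoint of what we've consumed
            byte[] buf = new byte[64 * 1024];

            while (true) {
                long visible = fs.getFileStatus(log).getLen();  // grows as writers flush
                if (visible > offset) {
                    try (FSDataInputStream in = fs.open(log)) {
                        in.seek(offset);
                        int n;
                        while (offset < visible && (n = in.read(buf)) > 0) {
                            offset += n;
                            // hand buf[0..n) to the Puma aggregation step here
                        }
                    }
                }
                Thread.sleep(1000);             // poll interval
            }
        }
    }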
Support for Realtime Aggregation.
Utilizes HBase atomic increments to maintain rollups.
Complex HBase schemas for unique-user aggregations.
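Why atomic increments matter: the counter update is a server-side read-modify-write, so many Puma workers can bump the same rollup cell without a get-then-put race. A sketch with the 0.90-era incrementColumnValue() call; the table name, row-key layout, and column names are made up:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: maintain hourly rollups with atomic counters.
    public class RollupCounters {
        public static void main(String[] args) throws Exception {
            HTable counters = new HTable(HBaseConfiguration.create(), "insights");

            byte[] row = Bytes.toBytes("example.com:2011041510");  // domain + hour bucket
            byte[] cf  = Bytes.toBytes("c");

            // Atomic server-side read-modify-write; no client-side get/put race.
            counters.incrementColumnValue(row, cf, Bytes.toBytes("clicks"), 1);
            counters.incrementColumnValue(row, cf, Bytes.toBytes("likes"), 1);

            counters.close();
        }
    }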
Puma as Realtime MapReduce.
Map phase with PTail:
divides the input log stream into N shards.
First version only supported random bucketing.
Now supports application-level bucketing (see the sharding sketch below).
Reduce phase with HBase:
every row+column in HBase is an output key.
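A sketch of the "map" side only: deciding which of N shards (and therefore which Puma node) a log line belongs to. The field layout and shard count are invented; the real PTail/Puma plumbing is internal to Facebook:

    // Route each log line to one of N shards.
    public class LogSharder {
        static final int N = 16;   // number of shards / Puma nodes (made up)

        // first version: random bucketing
        static int randomShard() {
            return (int) (Math.random() * N);
        }

        // now: application-level bucketing, e.g. by domain, so every event
        // for a given domain is reduced on the same node
        static int shardFor(String logLine) {
            String domain = logLine.split("\t")[0];   // assume domain is field 0
            return Math.floorMod(domain.hashCode(), N);
        }

        public static void main(String[] args) {
            System.out.println(shardFor("example.com\t/page\tclick"));
        }
    }

Bucketing by an application key such as the domain means all events for that key land on one node, which is what makes per-domain rollups and unique counts feasible.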
Puma for Facebook Insights.
Realtime URL/domain insights.
Domain owners can see deep analytics for their site.
Detailed demographic breakdowns.
Top URLs calculated per-domain and globally.
Massive Throughput.
Billions of URLs.
1 million counter increments within 1 second.
-> click-count analytics, etc.
Use case 3: ODS.
Facebook's internal Operational Data Store.
System metrics (CPU, memory, IO, network).
Application metrics (web, DB, caches).
FB metrics (Usage, Revenue).
easily graph this data over time.
Difficult to scale with MySQL.
Millions of unique time-series with billions of points.
irregular data growth patterns.
MySQL can't automatically shard, sadly.
-> they end up being nothing but MySQL DBAs.
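One hedged illustration (not necessarily what ODS actually does) of how millions of time series can be laid out so regions shard automatically by metric and time: row key = metric name plus a coarse time bucket, one column per offset inside the bucket:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical time-series layout (not the real ODS schema):
    // row key  = metric name + hour bucket
    // column   = second offset within the hour
    // value    = the data point
    public class TimeSeriesPoint {
        static Put pointFor(String metric, long epochSeconds, double value) {
            long hourBucket = epochSeconds / 3600;
            long offset = epochSeconds % 3600;

            Put p = new Put(Bytes.toBytes(metric + ":" + hourBucket));
            p.add(Bytes.toBytes("d"), Bytes.toBytes(offset), Bytes.toBytes(value));
            return p;
        }
    }

Because rows sort by metric and time, graphing a metric over a window is a simple range scan, and new metrics become new rows instead of new MySQL shards.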
Future of HBase at FB.
User and Graph Data in HBase.
Looking at HBase to augment MySQL.
only the single-row ACID properties of MySQL are used.
DBs are always fronted by an in-memory cache.
HBase is great at storing dictionaries and lists.
DB tier size determined by IOPS.
Cost savings -> FB managed to scale MySQL, but it costs too much.
Lower IOPS translate to lower cost.
Larger tables on denser, cheaper, commodity hardware.
MySQL, ~300 GB, augmented with flash memory.