Verifying HBase Cluster Replication

  1. Prerequisites

Suppose two pseudo-distributed HBase clusters are already running with the following settings:

Parameter in hbase-site.xml            Source   Destination
hbase.zookeeper.quorum                 macos    ubuntu
hbase.zookeeper.property.clientPort    2181     2181
zookeeper.znode.parent                 /hbase   /hbase
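The three settings above are what the source cluster needs to know about the destination. As a minimal sketch, the hypothetical helper get_hbase_prop below reads one value out of an hbase-site.xml file with awk. It assumes each <name> and <value> element sits on its own line (or both on one line), as in stock configuration files; it is not a real XML parser. If a property is absent, HBase falls back to its defaults (2181 for the client port, /hbase for the znode parent).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one property from hbase-site.xml.
# Assumption: <name> and <value> are on their own lines (or together on one
# line). Prints nothing if the property is not found.
get_hbase_prop() {
  local file="$1" prop="$2"
  awk -v want="$prop" '
    /<name>/ {
      # Remember the most recent property name, stripped of its tags.
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      # First value that follows the wanted name: strip tags and stop.
      value = $0
      sub(/.*<value>[ \t]*/, "", value)
      sub(/[ \t]*<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$file"
}

# Example (on the destination host):
#   get_hbase_prop "$HOME_HBASE/conf/hbase-site.xml" hbase.zookeeper.quorum
```

For the setup above, the example call on the destination host should print ubuntu.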
  2. Create the table for replication
    1. Start the HBase shell on the source cluster and create the table:
$ cd $HOME_HBASE
$ bin/hbase shell
> create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
    2. Create exactly the same table on the destination cluster.
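Because the destination table should match the source schema exactly, one way to avoid copy-paste drift is to keep the create statement in a single file and feed that same file to both shells. The helper name write_pe_table_ddl and the file name pe_table.rb are hypothetical, and the usage comment assumes an HBase 2.x shell that runs a script file non-interactively with -n.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the peTable DDL (copied from the step above)
# to one file, so both clusters are created from identical text.
write_pe_table_ddl() {
  local out="$1"
  cat > "$out" <<'EOF'
create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
EOF
}

# Usage sketch (needs running clusters, not executed here):
#   write_pe_table_ddl /tmp/pe_table.rb
#   # copy /tmp/pe_table.rb to both hosts, then on each of them:
#   cd $HOME_HBASE && bin/hbase shell -n /tmp/pe_table.rb
```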
  3. Add the destination cluster as a peer in the source cluster's HBase shell:
> add_peer 'ubt_pe', CLUSTER_KEY => "ubuntu:2181:/hbase", TABLE_CFS => { "peTable" => []}
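The CLUSTER_KEY passed to add_peer is built from the destination's three settings in the table above, joined as quorum:clientPort:znode.parent; a multi-host quorum is written comma-separated, for example zk1,zk2,zk3:2181:/hbase. The hypothetical cluster_key helper below sketches that composition and applies the HBase defaults when the port or the parent znode is omitted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose an add_peer CLUSTER_KEY as
#   <hbase.zookeeper.quorum>:<hbase.zookeeper.property.clientPort>:<zookeeper.znode.parent>
# Port defaults to 2181 and parent znode to /hbase when not given.
cluster_key() {
  local quorum="$1" port="${2:-2181}" parent="${3:-/hbase}"
  case "$parent" in
    /*) ;;  # parent znode must be an absolute path
    *) echo "znode parent must start with '/': $parent" >&2; return 1 ;;
  esac
  printf '%s:%s:%s\n' "$quorum" "$port" "$parent"
}

cluster_key ubuntu 2181 /hbase   # prints ubuntu:2181:/hbase
```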
  4. Enable replication for the table in the source cluster's HBase shell:
> enable_table_replication 'peTable'
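enable_table_replication sets REPLICATION_SCOPE to 1 on the table's column families (the create statement above used 0). A minimal sketch for checking this from saved describe 'peTable' output follows. It assumes each family is printed on its own line beginning with {NAME =>, and the function name all_families_replicated is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `describe` output on stdin and succeed only if
# at least one column family is listed and every one has REPLICATION_SCOPE => '1'.
all_families_replicated() {
  awk -v q="'" '
    /^[{]NAME => / {
      families++
      if (index($0, "REPLICATION_SCOPE => " q "1" q) == 0) {
        print "not replicated: " $0 > "/dev/stderr"
        bad++
      }
    }
    END {
      if (families == 0 || bad > 0) exit 1
      exit 0
    }
  '
}

# Usage sketch (assumes piping into the shell works on your HBase version):
#   echo "describe 'peTable'" | bin/hbase shell -n > describe.txt
#   all_families_replicated < describe.txt && echo "scope is 1 on all families"
```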
  5. Write data with the HBase PerformanceEvaluation tool:
$ cd $HOME_HBASE
$ bin/hbase pe --table=peTable --nomapred --valueSize=100 randomWrite 1
2023-09-08 19:57:55,256 INFO  [main] hbase.PerformanceEvaluation: RandomWriteTest test run options={"cmdName":"randomWrite","nomapred":true,"filterAll":false,"startRow":0,"size":0.0,"perClientRunRows":1048576,"numClientThreads":1,"totalRows":1048576,"measureAfter":0,"sampleRate":1.0,"traceRate":0.0,"tableName":"peTable","flushCommits":true,"writeToWAL":true,"autoFlush":false,"oneCon":false,"connCount":-1,"useTags":false,"noOfTags":1,"reportLatency":false,"multiGet":0,"multiPut":0,"randomSleep":0,"inMemoryCF":false,"presplitRegions":0,"replicas":1,"compression":"NONE","bloomType":"ROW","blockSize":65536,"blockEncoding":"NONE","valueRandom":false,"valueZipf":false,"valueSize":100,"period":104857,"cycles":1,"columns":1,"families":1,"caching":30,"latencyThreshold":0,"addColumns":true,"inMemoryCompaction":"NONE","asyncPrefetch":false,"cacheBlocks":true,"scanReadType":"DEFAULT","bufferSize":"2097152"}
...
2023-09-08 19:57:58,476 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=104857, last=1048576], latency [mean=19.87, min=0.00, max=328487.00, stdDev=1355.87, 95th=1.00, 99th=8.00]
2023-09-08 19:57:59,679 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=209714, last=1048576], latency [mean=15.34, min=0.00, max=328487.00, stdDev=1026.36, 95th=1.00, 99th=4.00]
...
2023-09-08 19:58:10,520 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=1048570, last=1048576], latency [mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 95th=0.00, 99th=1.00]
2023-09-08 19:58:10,569 INFO  [TestClient-0] hbase.PerformanceEvaluation: Test : RandomWriteTest, Thread : TestClient-0
2023-09-08 19:58:10,577 INFO  [TestClient-0] hbase.PerformanceEvaluation: Latency (us) : mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 50th=0.00, 75th=0.00, 95th=0.00, 99th=1.00, 99.9th=19.00, 99.99th=28853.39, 99.999th=58579.15
2023-09-08 19:58:10,577 INFO  [TestClient-0] hbase.PerformanceEvaluation: Num measures (latency) : 1048575
2023-09-08 19:58:10,584 INFO  [TestClient-0] hbase.PerformanceEvaluation: Mean      = 13.17
Min       = 0.00
Max       = 328487.00
StdDev    = 780.16
50th      = 0.00
75th      = 0.00
95th      = 0.00
99th      = 1.00
99.9th    = 19.00
99.99th   = 28853.39
99.999th  = 58579.15
2023-09-08 19:58:10,584 INFO  [TestClient-0] hbase.PerformanceEvaluation: No valueSize statistics available
2023-09-08 19:58:10,586 INFO  [TestClient-0] hbase.PerformanceEvaluation: Finished class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest in 14286ms at offset 0 for 1048576 rows (9.24 MB/s)
2023-09-08 19:58:10,586 INFO  [TestClient-0] hbase.PerformanceEvaluation: Finished TestClient-0 in 14286ms over 1048576 rows
2023-09-08 19:58:10,586 INFO  [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [14286]
2023-09-08 19:58:10,595 INFO  [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ]	Min: 14286ms	Max: 14286ms	Avg: 14286ms
2023-09-08 19:58:10,595 INFO  [main] hbase.PerformanceEvaluation: [ Avg latency (us)]	13
2023-09-08 19:58:10,596 INFO  [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS]	73399	 row per second
2023-09-08 19:58:10,596 INFO  [main] client.AsyncConnectionImpl: Connection has been closed by main.
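The summary reports about 73399 rows per second. As a cross-check, the same figure falls out of the "Finished class ..." line as rows divided by elapsed seconds: 1048576 / 14.286 s ≈ 73399. The hypothetical pe_rows_per_sec helper below extracts both numbers from a saved log; it only recomputes the ratio and makes no claim about how PerformanceEvaluation itself computes its summary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read PerformanceEvaluation log text on stdin, find the
# "Finished class ... in <ms>ms ... for <rows> rows" line, print rows/second.
pe_rows_per_sec() {
  awk '
    /Finished class .* in [0-9]+ms .* for [0-9]+ rows/ {
      for (i = 1; i <= NF; i++) {
        # "14286ms" + 0 converts to 14286 (leading numeric prefix).
        if ($i == "in" && $(i + 1) ~ /^[0-9]+ms$/) ms = $(i + 1) + 0
        if ($i == "for" && $(i + 2) == "rows") rows = $(i + 1) + 0
      }
    }
    END {
      if (ms <= 0) exit 1
      printf "%.0f\n", rows / (ms / 1000)
    }
  '
}

# Usage sketch:
#   bin/hbase pe --table=peTable --nomapred --valueSize=100 randomWrite 1 2>&1 | tee pe.log
#   pe_rows_per_sec < pe.log
```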

Note: running PerformanceEvaluation without arguments prints its usage help:

$ bin/hbase pe
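The row counts in the next step come out well below the 1048576 rows written. That is expected if randomWrite draws each row key uniformly at random from [0, totalRows), so repeated keys overwrite earlier cells instead of adding rows; this assumption is consistent with every row key shown in the count output being below 1048576. Under it, the expected number of distinct keys after N draws from N keys is N(1 - (1 - 1/N)^N) ≈ N(1 - 1/e). The sketch below evaluates that formula with awk; expected_unique_rows is a hypothetical name.

```bash
#!/usr/bin/env bash
# Expected distinct keys after n uniform random draws (with replacement)
# from n possible keys: n * (1 - (1 - 1/n)^n), rounded to an integer.
expected_unique_rows() {
  awk -v n="$1" 'BEGIN { printf "%.0f\n", n * (1 - exp(n * log(1 - 1 / n))) }'
}

expected_unique_rows 1048576   # prints 662827
```

That estimate is within a few hundred of the 663073 rows both clusters report, so the lower count by itself does not indicate lost writes.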
  6. Count the rows on the source and the peer
    1. In the source cluster's HBase shell:
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563                                                
Current count: 2000, row: 00000000000000000000003160 
...
Current count: 663000, row: 00000000000000000001048457                                              
663073 row(s)
Took 12.9970 seconds
    2. In the peer cluster's HBase shell:
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563                                                
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457                                              
663073 row(s)
Took 7.1883 seconds                                                                                 
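To compare the two totals without reading them by eye, each count output can be saved to a file and its final "N row(s)" line extracted. The helpers rows_from_count and counts_match below are hypothetical, and the capture command in the comment assumes that piping a command into bin/hbase shell -n works on your HBase version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print N from the last "N row(s)" line of `count` output.
rows_from_count() {
  awk '/^[0-9]+ row[(]s[)]/ { n = $1 } END { if (n == "") exit 1; print n }'
}

# Hypothetical helper: compare the totals in two saved `count` outputs
# (source first, peer second); succeeds only if they are equal.
counts_match() {
  local src dst
  src=$(rows_from_count < "$1") || return 1
  dst=$(rows_from_count < "$2") || return 1
  echo "source=$src peer=$dst"
  [ "$src" -eq "$dst" ]
}

# Usage sketch, one capture per cluster:
#   echo "count 'peTable'" | bin/hbase shell -n > src_count.txt   # on macos
#   echo "count 'peTable'" | bin/hbase shell -n > dst_count.txt   # on ubuntu
#   counts_match src_count.txt dst_count.txt
```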
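  7. Verify replication with the VerifyReplication MapReduce job, launched from a terminal on the source cluster (not from the HBase shell). Here the job was submitted to YARN, as the tracking URL in the log shows: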
$ cd $HOME_HBASE
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 'ubt_pe' 'peTable'
2023-09-08 20:14:37,199 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=VerifyReplication connecting to ZooKeeper ensemble=localhost:2181
...
2023-09-08 20:14:44,393 INFO  [main] mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1694172104063_0001/
2023-09-08 20:14:44,394 INFO  [main] mapreduce.Job: Running job: job_1694172104063_0001
2023-09-08 20:14:54,521 INFO  [main] mapreduce.Job: Job job_1694172104063_0001 running in uber mode : false
2023-09-08 20:14:54,524 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2023-09-08 20:20:18,907 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2023-09-08 20:20:19,924 INFO  [main] mapreduce.Job: Job job_1694172104063_0001 completed successfully
2023-09-08 20:20:20,040 INFO  [main] mapreduce.Job: ...
		...uces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=321487
		Total vcore-milliseconds taken by all map tasks=321487
		Total megabyte-milliseconds taken by all map tasks=329202688
	Map-Reduce Framework
		Map input records=663073
		Map output records=0
		Input split bytes=105
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=707
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=114819072
	HBaseCounters
		BYTES_IN_REMOTE_RESULTS=103439388
		BYTES_IN_RESULTS=103439388
		MILLIS_BETWEEN_NEXTS=313921
		NOT_SERVING_REGION_EXCEPTION=0
		REGIONS_SCANNED=1
		REMOTE_RPC_CALLS=60
		REMOTE_RPC_RETRIES=0
		ROWS_FILTERED=17
		ROWS_SCANNED=663073
		RPC_CALLS=60
		RPC_RETRIES=0
	org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
		GOODROWS=663073
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
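For scripted checks, the job's counters can be turned into a pass/fail result. In the sketch below (verify_replication_ok is a hypothetical name), a run passes when GOODROWS is positive and BADROWS is zero. Because Hadoop typically prints only counters that were actually incremented, a missing BADROWS line is read as 0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read VerifyReplication job output on stdin, print the
# GOODROWS/BADROWS counters, succeed only if GOODROWS > 0 and BADROWS == 0.
verify_replication_ok() {
  awk -F= '
    $1 ~ /^[ \t]*GOODROWS$/ { good = $2 + 0 }
    $1 ~ /^[ \t]*BADROWS$/  { bad  = $2 + 0 }
    END {
      printf "GOODROWS=%d BADROWS=%d\n", good, bad
      if (good > 0 && bad == 0) exit 0
      exit 1
    }
  '
}

# Usage sketch:
#   bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 'ubt_pe' 'peTable' 2>&1 | tee verify.log
#   verify_replication_ok < verify.log
```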

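GOODROWS=663073 equals the row count reported on both clusters, and no BADROWS counter appears, so every row of peTable matched between the source and the peer.

Note: the usage help for VerifyReplication can be shown with:
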
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --help