Verifying HBase Cluster Replication

1. Prerequisites

Suppose two HBase pseudo-distributed clusters have both been started with the following settings:

| Relevant parameters in hbase-site.xml | Source | Destination |
| --- | --- | --- |
| hbase.zookeeper.quorum | macos | ubuntu |
| hbase.zookeeper.property.clientPort | 2181 | 2181 |
| zookeeper.znode.parent | /hbase | /hbase |
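A quick sanity check that both pseudo-distributed clusters are really up before wiring replication (a minimal sketch; it assumes a standard JDK so jps is available, and that it is run from $HOME_HBASE on each host):

```bash
# On each host (macos and ubuntu): HMaster, HRegionServer and the bundled
# ZooKeeper (HQuorumPeer) should all be running in pseudo-distributed mode.
$ jps | grep -E 'HMaster|HRegionServer|HQuorumPeer'

# Confirm the shell can reach the cluster configured in hbase-site.xml.
$ echo "status 'simple'" | bin/hbase shell -n
```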
2. Create the table for replication
   1. Start the HBase shell on the source cluster and create the table:
```bash
$ cd $HOME_HBASE
$ bin/hbase shell
> create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
```
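The long attribute list above mostly spells out the defaults of recent HBase releases; note REPLICATION_SCOPE => '0', meaning replication is still off for this column family until the enable_table_replication step below. If the defaults are acceptable, a minimal equivalent create is:

```bash
> create 'peTable', {NAME => 'info0'}
```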
   2. Create exactly the same table on the destination cluster (one way to mirror the schema is sketched below).
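A simple way to keep the two schemas identical is to read the schema back on the source and rerun the same create statement on the destination; a minimal sketch (the destination shell is started on the ubuntu host the same way as above):

```bash
# Source cluster: describe lists every column-family attribute of 'peTable',
# so the definition can be copied verbatim.
> describe 'peTable'

# Destination cluster (ubuntu): start bin/hbase shell and run the identical create.
> create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
```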
3. Add the destination cluster as a peer in the source cluster's HBase shell:
```bash
> add_peer 'ubt_pe', CLUSTER_KEY => "ubuntu:2181:/hbase", TABLE_CFS => { "peTable" => []}
```
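The CLUSTER_KEY is built from the destination's hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort and zookeeper.znode.parent listed in the prerequisites, i.e. ubuntu:2181:/hbase. You can confirm the peer from the same shell (a sketch; show_peer_tableCFs is available in recent HBase 2.x shells):

```bash
# The new peer should be listed and ENABLED.
> list_peers

# Show which tables / column families are replicated to this peer.
> show_peer_tableCFs 'ubt_pe'
```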
4. Enable the table for replication in the source cluster's HBase shell:
```bash
> enable_table_replication 'peTable'
```
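enable_table_replication flips REPLICATION_SCOPE to 1 on the table's column families, which is what actually marks edits for shipping to peers. A quick check, plus the manual equivalent (a sketch; recent versions support online alter, so disabling first is just the conservative route):

```bash
# REPLICATION_SCOPE for 'info0' should now be '1'.
> describe 'peTable'

# Manual equivalent of enable_table_replication:
> disable 'peTable'
> alter 'peTable', {NAME => 'info0', REPLICATION_SCOPE => '1'}
> enable 'peTable'
```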
5. Put data with the HBase PerformanceEvaluation tool:
```bash
$ cd $HOME_HBASE
$ bin/hbase pe --table=peTable --nomapred --valueSize=100 randomWrite 1
2023-09-08 19:57:55,256 INFO  [main] hbase.PerformanceEvaluation: RandomWriteTest test run options={"cmdName":"randomWrite","nomapred":true,"filterAll":false,"startRow":0,"size":0.0,"perClientRunRows":1048576,"numClientThreads":1,"totalRows":1048576,"measureAfter":0,"sampleRate":1.0,"traceRate":0.0,"tableName":"peTable","flushCommits":true,"writeToWAL":true,"autoFlush":false,"oneCon":false,"connCount":-1,"useTags":false,"noOfTags":1,"reportLatency":false,"multiGet":0,"multiPut":0,"randomSleep":0,"inMemoryCF":false,"presplitRegions":0,"replicas":1,"compression":"NONE","bloomType":"ROW","blockSize":65536,"blockEncoding":"NONE","valueRandom":false,"valueZipf":false,"valueSize":100,"period":104857,"cycles":1,"columns":1,"families":1,"caching":30,"latencyThreshold":0,"addColumns":true,"inMemoryCompaction":"NONE","asyncPrefetch":false,"cacheBlocks":true,"scanReadType":"DEFAULT","bufferSize":"2097152"}
...
2023-09-08 19:57:58,476 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=104857, last=1048576], latency [mean=19.87, min=0.00, max=328487.00, stdDev=1355.87, 95th=1.00, 99th=8.00]
2023-09-08 19:57:59,679 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=209714, last=1048576], latency [mean=15.34, min=0.00, max=328487.00, stdDev=1026.36, 95th=1.00, 99th=4.00]
...
2023-09-08 19:58:10,520 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=1048570, last=1048576], latency [mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 95th=0.00, 99th=1.00]
2023-09-08 19:58:10,569 INFO  [TestClient-0] hbase.PerformanceEvaluation: Test : RandomWriteTest, Thread : TestClient-0
2023-09-08 19:58:10,577 INFO  [TestClient-0] hbase.PerformanceEvaluation: Latency (us) : mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 50th=0.00, 75th=0.00, 95th=0.00, 99th=1.00, 99.9th=19.00, 99.99th=28853.39, 99.999th=58579.15
2023-09-08 19:58:10,577 INFO  [TestClient-0] hbase.PerformanceEvaluation: Num measures (latency) : 1048575
2023-09-08 19:58:10,584 INFO  [TestClient-0] hbase.PerformanceEvaluation: Mean      = 13.17
Min       = 0.00
Max       = 328487.00
StdDev    = 780.16
50th      = 0.00
75th      = 0.00
95th      = 0.00
99th      = 1.00
99.9th    = 19.00
99.99th   = 28853.39
99.999th  = 58579.15
2023-09-08 19:58:10,584 INFO  [TestClient-0] hbase.PerformanceEvaluation: No valueSize statistics available
2023-09-08 19:58:10,586 INFO  [TestClient-0] hbase.PerformanceEvaluation: Finished class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest in 14286ms at offset 0 for 1048576 rows (9.24 MB/s)
2023-09-08 19:58:10,586 INFO  [TestClient-0] hbase.PerformanceEvaluation: Finished TestClient-0 in 14286ms over 1048576 rows
2023-09-08 19:58:10,586 INFO  [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [14286]
2023-09-08 19:58:10,595 INFO  [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ]	Min: 14286ms	Max: 14286ms	Avg: 14286ms
2023-09-08 19:58:10,595 INFO  [main] hbase.PerformanceEvaluation: [ Avg latency (us)]	13
2023-09-08 19:58:10,596 INFO  [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS]	73399	 row per second
2023-09-08 19:58:10,596 INFO  [main] client.AsyncConnectionImpl: Connection has been closed by main.
```
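Because randomWrite draws row keys at random from [0, 1048576), some keys collide and the number of distinct rows ends up below the number of writes. If a deterministic row count would make the comparison below easier, sequentialWrite (another standard pe command) writes every key exactly once:

```bash
$ bin/hbase pe --table=peTable --nomapred --valueSize=100 sequentialWrite 1
```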

Note: the usage help for PerformanceEvaluation can be printed with:

```bash
$ bin/hbase pe
```
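Before comparing row counts, it helps to let the source's replication queue drain. The shell's replication status shows the per-regionserver source and sink metrics (for example ageOfLastShippedOp and sizeOfLogQueue):

```bash
# In the source cluster's HBase shell:
> status 'replication'
# Source-side or sink-side view only:
> status 'replication', 'source'
> status 'replication', 'sink'
```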
6. Count rows on the source and the peer
   1. In the source cluster's HBase shell:
```bash
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457
663073 row(s)
Took 12.9970 seconds
```
   2. In the peer cluster's HBase shell:
```bash
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457
663073 row(s)
Took 7.1883 seconds
```

Both clusters report the same count: 663,073 distinct rows out of 1,048,576 random writes (fewer than the number of writes because random keys collide), which is exactly what the replication check needs to see.
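The shell's count streams every row through a single client and gets slow on large tables; the bundled RowCounter MapReduce job is the usual faster alternative and can be run identically on both clusters:

```bash
$ cd $HOME_HBASE
$ bin/hbase rowcounter peTable
# Long form of the same job:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter peTable
```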
7. Verify replication by running the VerifyReplication MapReduce job on the source cluster:
```bash
$ cd $HOME_HBASE
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 'ubt_pe' 'peTable'
2023-09-08 20:14:37,199 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=VerifyReplication connecting to ZooKeeper ensemble=localhost:2181
...
2023-09-08 20:14:44,393 INFO  [main] mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1694172104063_0001/
2023-09-08 20:14:44,394 INFO  [main] mapreduce.Job: Running job: job_1694172104063_0001
2023-09-08 20:14:54,521 INFO  [main] mapreduce.Job: Job job_1694172104063_0001 running in uber mode : false
2023-09-08 20:14:54,524 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2023-09-08 20:20:18,907 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2023-09-08 20:20:19,924 INFO  [main] mapreduce.Job: Job job_1694172104063_0001 completed successfully
2023-09-08 20:20:20,040 INFO  [main] mapreduce.Job: Counters: ...
	Job Counters 
		...
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=321487
		Total vcore-milliseconds taken by all map tasks=321487
		Total megabyte-milliseconds taken by all map tasks=329202688
	Map-Reduce Framework
		Map input records=663073
		Map output records=0
		Input split bytes=105
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=707
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=114819072
	HBaseCounters
		BYTES_IN_REMOTE_RESULTS=103439388
		BYTES_IN_RESULTS=103439388
		MILLIS_BETWEEN_NEXTS=313921
		NOT_SERVING_REGION_EXCEPTION=0
		REGIONS_SCANNED=1
		REMOTE_RPC_CALLS=60
		REMOTE_RPC_RETRIES=0
		ROWS_FILTERED=17
		ROWS_SCANNED=663073
		RPC_CALLS=60
		RPC_RETRIES=0
	org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
		GOODROWS=663073
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
```

Note: the usage help for VerifyReplication can be printed with:

```bash
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --help
```
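In the run above GOODROWS equals the source row count (663,073) and no mismatch counters appear, so the two tables are in sync; on a mismatch the job reports counters such as BADROWS, ONLY_IN_SOURCE_TABLE_ROWS, ONLY_IN_PEER_TABLE_ROWS and CONTENT_DIFFERENT_ROWS. The tool also takes options to narrow the comparison; which flags are available depends on the HBase version, so check --help first. A sketch with hypothetical timestamps:

```bash
# Compare only the 'info0' family inside a time window and log mismatching rows.
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
    --families=info0 --starttime=1694160000000 --endtime=1694250000000 --verbose \
    'ubt_pe' 'peTable'
```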