Verifying HBase Cluster Replication

1. Prerequisites

Suppose two HBase pseudo-distributed clusters have both been started with the following settings:

| Relevant parameters in hbase-site.xml | Source | Destination |
| --- | --- | --- |
| hbase.zookeeper.quorum | macos | ubuntu |
| hbase.zookeeper.property.clientPort | 2181 | 2181 |
| zookeeper.znode.parent | /hbase | /hbase |
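A quick sanity check that both pseudo-distributed clusters are really up before wiring replication (a minimal sketch; it assumes a standard JDK so jps is available, and that it is run from $HOME_HBASE on each host):

```bash
# On each host (macos and ubuntu): HMaster, HRegionServer and the bundled
# ZooKeeper (HQuorumPeer) should all be running in pseudo-distributed mode.
$ jps | grep -E 'HMaster|HRegionServer|HQuorumPeer'

# Confirm the shell can reach the cluster configured in hbase-site.xml.
$ echo "status 'simple'" | bin/hbase shell -n
```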
2. Create the table for replication
   1. Start the HBase shell on the source cluster and create the table:
```bash
$ cd $HOME_HBASE
$ bin/hbase shell
> create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
```
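The long attribute list above mostly spells out the defaults of recent HBase releases; note REPLICATION_SCOPE => '0', meaning replication is still off for this column family until the enable_table_replication step below. If the defaults are acceptable, a minimal equivalent create is:

```bash
> create 'peTable', {NAME => 'info0'}
```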
   2. Create exactly the same table on the destination cluster (one way to mirror the schema is sketched below).
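A simple way to keep the two schemas identical is to read the schema back on the source and rerun the same create statement on the destination; a minimal sketch (the destination shell is started on the ubuntu host the same way as above):

```bash
# Source cluster: describe lists every column-family attribute of 'peTable',
# so the definition can be copied verbatim.
> describe 'peTable'

# Destination cluster (ubuntu): start bin/hbase shell and run the identical create.
> create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
```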
3. Add the destination cluster as a peer in the source cluster's HBase shell:
```bash
> add_peer 'ubt_pe', CLUSTER_KEY => "ubuntu:2181:/hbase", TABLE_CFS => { "peTable" => []}
```
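The CLUSTER_KEY is built from the destination's hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort and zookeeper.znode.parent listed in the prerequisites, i.e. ubuntu:2181:/hbase. You can confirm the peer from the same shell (a sketch; show_peer_tableCFs is available in recent HBase 2.x shells):

```bash
# The new peer should be listed and ENABLED.
> list_peers

# Show which tables / column families are replicated to this peer.
> show_peer_tableCFs 'ubt_pe'
```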
4. Enable the table for replication in the source cluster's HBase shell:
```bash
> enable_table_replication 'peTable'
```
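enable_table_replication flips REPLICATION_SCOPE to 1 on the table's column families, which is what actually marks edits for shipping to peers. A quick check, plus the manual equivalent (a sketch; recent versions support online alter, so disabling first is just the conservative route):

```bash
# REPLICATION_SCOPE for 'info0' should now be '1'.
> describe 'peTable'

# Manual equivalent of enable_table_replication:
> disable 'peTable'
> alter 'peTable', {NAME => 'info0', REPLICATION_SCOPE => '1'}
> enable 'peTable'
```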
5. Put data with the HBase PerformanceEvaluation tool:
```bash
$ cd $HOME_HBASE
$ bin/hbase pe --table=peTable --nomapred --valueSize=100 randomWrite 1
2023-09-08 19:57:55,256 INFO  [main] hbase.PerformanceEvaluation: RandomWriteTest test run options={"cmdName":"randomWrite","nomapred":true,"filterAll":false,"startRow":0,"size":0.0,"perClientRunRows":1048576,"numClientThreads":1,"totalRows":1048576,"measureAfter":0,"sampleRate":1.0,"traceRate":0.0,"tableName":"peTable","flushCommits":true,"writeToWAL":true,"autoFlush":false,"oneCon":false,"connCount":-1,"useTags":false,"noOfTags":1,"reportLatency":false,"multiGet":0,"multiPut":0,"randomSleep":0,"inMemoryCF":false,"presplitRegions":0,"replicas":1,"compression":"NONE","bloomType":"ROW","blockSize":65536,"blockEncoding":"NONE","valueRandom":false,"valueZipf":false,"valueSize":100,"period":104857,"cycles":1,"columns":1,"families":1,"caching":30,"latencyThreshold":0,"addColumns":true,"inMemoryCompaction":"NONE","asyncPrefetch":false,"cacheBlocks":true,"scanReadType":"DEFAULT","bufferSize":"2097152"}
...
2023-09-08 19:57:58,476 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=104857, last=1048576], latency [mean=19.87, min=0.00, max=328487.00, stdDev=1355.87, 95th=1.00, 99th=8.00]
2023-09-08 19:57:59,679 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=209714, last=1048576], latency [mean=15.34, min=0.00, max=328487.00, stdDev=1026.36, 95th=1.00, 99th=4.00]
...
2023-09-08 19:58:10,520 INFO  [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=1048570, last=1048576], latency [mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 95th=0.00, 99th=1.00]
2023-09-08 19:58:10,569 INFO  [TestClient-0] hbase.PerformanceEvaluation: Test : RandomWriteTest, Thread : TestClient-0
2023-09-08 19:58:10,577 INFO  [TestClient-0] hbase.PerformanceEvaluation: Latency (us) : mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 50th=0.00, 75th=0.00, 95th=0.00, 99th=1.00, 99.9th=19.00, 99.99th=28853.39, 99.999th=58579.15
2023-09-08 19:58:10,577 INFO  [TestClient-0] hbase.PerformanceEvaluation: Num measures (latency) : 1048575
2023-09-08 19:58:10,584 INFO  [TestClient-0] hbase.PerformanceEvaluation: Mean      = 13.17
Min       = 0.00
Max       = 328487.00
StdDev    = 780.16
50th      = 0.00
75th      = 0.00
95th      = 0.00
99th      = 1.00
99.9th    = 19.00
99.99th   = 28853.39
99.999th  = 58579.15
2023-09-08 19:58:10,584 INFO  [TestClient-0] hbase.PerformanceEvaluation: No valueSize statistics available
2023-09-08 19:58:10,586 INFO  [TestClient-0] hbase.PerformanceEvaluation: Finished class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest in 14286ms at offset 0 for 1048576 rows (9.24 MB/s)
2023-09-08 19:58:10,586 INFO  [TestClient-0] hbase.PerformanceEvaluation: Finished TestClient-0 in 14286ms over 1048576 rows
2023-09-08 19:58:10,586 INFO  [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [14286]
2023-09-08 19:58:10,595 INFO  [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ]	Min: 14286ms	Max: 14286ms	Avg: 14286ms
2023-09-08 19:58:10,595 INFO  [main] hbase.PerformanceEvaluation: [ Avg latency (us)]	13
2023-09-08 19:58:10,596 INFO  [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS]	73399	 row per second
2023-09-08 19:58:10,596 INFO  [main] client.AsyncConnectionImpl: Connection has been closed by main.
```
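Because randomWrite draws row keys at random from [0, 1048576), some keys collide and the number of distinct rows ends up below the number of writes. If a deterministic row count would make the comparison below easier, sequentialWrite (another standard pe command) writes every key exactly once:

```bash
$ bin/hbase pe --table=peTable --nomapred --valueSize=100 sequentialWrite 1
```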

Note: the usage help for PerformanceEvaluation can be printed with:

```bash
$ bin/hbase pe
```
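Before comparing row counts, it helps to let the source's replication queue drain. The shell's replication status shows the per-regionserver source and sink metrics (for example ageOfLastShippedOp and sizeOfLogQueue):

```bash
# In the source cluster's HBase shell:
> status 'replication'
# Source-side or sink-side view only:
> status 'replication', 'source'
> status 'replication', 'sink'
```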
6. Count rows on the source and the peer
   1. In the source cluster's HBase shell:
```bash
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457
663073 row(s)
Took 12.9970 seconds
```
   2. In the peer cluster's HBase shell:
```bash
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457
663073 row(s)
Took 7.1883 seconds
```

Both clusters report the same count: 663,073 distinct rows out of 1,048,576 random writes (fewer than the number of writes because random keys collide), which is exactly what the replication check needs to see.
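The shell's count streams every row through a single client and gets slow on large tables; the bundled RowCounter MapReduce job is the usual faster alternative and can be run identically on both clusters:

```bash
$ cd $HOME_HBASE
$ bin/hbase rowcounter peTable
# Long form of the same job:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter peTable
```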
7. Verify replication by running the VerifyReplication MapReduce job on the source cluster:
```bash
$ cd $HOME_HBASE
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 'ubt_pe' 'peTable'
2023-09-08 20:14:37,199 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=VerifyReplication connecting to ZooKeeper ensemble=localhost:2181
...
2023-09-08 20:14:44,393 INFO  [main] mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1694172104063_0001/
2023-09-08 20:14:44,394 INFO  [main] mapreduce.Job: Running job: job_1694172104063_0001
2023-09-08 20:14:54,521 INFO  [main] mapreduce.Job: Job job_1694172104063_0001 running in uber mode : false
2023-09-08 20:14:54,524 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2023-09-08 20:20:18,907 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2023-09-08 20:20:19,924 INFO  [main] mapreduce.Job: Job job_1694172104063_0001 completed successfully
2023-09-08 20:20:20,040 INFO  [main] mapreduce.Job: Counters: ...
	Job Counters 
		...
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=321487
		Total vcore-milliseconds taken by all map tasks=321487
		Total megabyte-milliseconds taken by all map tasks=329202688
	Map-Reduce Framework
		Map input records=663073
		Map output records=0
		Input split bytes=105
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=707
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=114819072
	HBaseCounters
		BYTES_IN_REMOTE_RESULTS=103439388
		BYTES_IN_RESULTS=103439388
		MILLIS_BETWEEN_NEXTS=313921
		NOT_SERVING_REGION_EXCEPTION=0
		REGIONS_SCANNED=1
		REMOTE_RPC_CALLS=60
		REMOTE_RPC_RETRIES=0
		ROWS_FILTERED=17
		ROWS_SCANNED=663073
		RPC_CALLS=60
		RPC_RETRIES=0
	org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
		GOODROWS=663073
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
```

Note: the usage help for VerifyReplication can be printed with:

```bash
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --help
```
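In the run above GOODROWS equals the source row count (663,073) and no mismatch counters appear, so the two tables are in sync; on a mismatch the job reports counters such as BADROWS, ONLY_IN_SOURCE_TABLE_ROWS, ONLY_IN_PEER_TABLE_ROWS and CONTENT_DIFFERENT_ROWS. The tool also takes options to narrow the comparison; which flags are available depends on the HBase version, so check --help first. A sketch with hypothetical timestamps:

```bash
# Compare only the 'info0' family inside a time window and log mismatching rows.
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
    --families=info0 --starttime=1694160000000 --endtime=1694250000000 --verbose \
    'ubt_pe' 'peTable'
```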