canal部署

定义

  1. canal组件是一个基于mysql数据库增量日志解析,提供增量数据订阅和消费,支持将增量数据投递到下游消费者(kafka,rocketmq等)或者存储(elasticearch,hbase等)
  2. canal感知到mysql数据变动,然后解析变动数据,将变动数据发送到mq或者同步到其他数据库,等待进一步业务逻辑处理

canal的工作原理

  1. mysql master将数据变更写入到二进制日志binary long,简称binlog
  2. mysql slave将master的bin log拷贝到它的中继日志relay log
  3. mysql slave重放relay log操作,将变更数据同步到最新

mysql binlog日志

介绍

  1. mysql的binlog可以说mysql的最重要的日志,它记录了所有的DDL(表的创建)和DML(数据发生变化)语句,以事件形式记录
  2. mysql默认情况下是不开启binlog,因为记录binlog日志需要消耗时间,官方给出的数据是有1%的性能损耗
    1. 一般来说,在下面两场场景下会开启binlog日志
      1. mysql主从集群部署时,需要将在master端开启binlog,方便将数据同步到slaves中
      2. 数据恢复了,通过使用mysql binlog工具来使数据恢复

binlog的分类

分类 介绍 优点 缺点
STATEMENT 语句级别,记录每一次执行写操作的语句相当于row模式节省了空间,但是可能产生的数据不一致,有疑执行时间不同,导致产生的数据不同 节省空间 可能造成数据不一致
ROW 行级,记录每次操作后每行记录的变化,假如一个update的sql执行结果是1万行statement只存一条,如果是row的话会把这1万行的结果存着 数据绝对一致性,因为不管sql是什么,引用了什么函数,他只记录执行后的效果 占用较大空间
MIXED 是对STATEMENT的升级,如当函数中包含UUID()时,包含AUTO_INCREMENT字段的表被更新时,执行INSERT DELAYED语句时,用UDF时,会按照ROW的方式进行处理 节省空间,同时兼顾了一定的一致性 还有些极个别情况依旧会造成不一致,另外STATEMENT和mixed对于需要对binlog的监控的情况都不方便

canal工作原理

  1. canal将自己伪装为mysql slave,向mysql master发送bump协议
  2. mysql master 收到dump请求,开始推送binary log给slave(canal)
  3. canal接收并解析binlog日志,得到变更的数据,执行后续逻辑

canal应用场景

数据同步

  1. canal可以帮助用户进行多种数据同步操作,如实时同步mysql数据到elasticsearch,redis等数据存储介质中

数据库实时监控

  1. canal可以实时监控mysql的更新操作,对于敏感数据的修改可以及时通知相关人员

数据分析和挖掘

  1. canal可以将mysql增量数据投递到kafka等消息队列中,为数据分析和挖掘提供数据来源

数据库备份

  1. canal可以将mysql主库上的数据增量日志复制到备库上,实现数据库的备份

数据集成

  1. canal可以将多个mysql数据库中的数据进行集成,为数据处理提供更加高效可靠的解决方案

数据库迁移

  1. canal可以协助完成mysql数据库的辨别升级及数据迁移任务

canal中mysql配置

shell 复制代码
vim /etc/mysql/my.cnf
[mysqld]
# 设置主服务器唯一ID
server-id=1
# 启用二进制日志
log-bin=mysql-bin
# 设置不要复制的数据库(可以设置多个)
binlog-ignore-db=mysql
# 设置需要监听的数据库,不配置表示所有数据库均开启
#binlog-do-db=canal-demo
# 设置logbin格式
binlog_format=row
bind-address = 0.0.0.0

canal下载配置文件

shell 复制代码
官网下载
tar -zvxf canal.deployer-1.1.7.tar.gz
cd canal
cd conf
#################################################
#########               common argument         #############
#################################################
# tcp bind ip
canal.ip =
# register ip to zookeeper
canal.register.ip = 192.168.7.129    # canal的ip
canal.port = 11111
canal.metrics.pull.port = 11112
# canal instance user/passwd
# canal.user = canal
# canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458

# canal admin config
#canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
#canal.admin.register.auto = true
#canal.admin.register.cluster =
#canal.admin.register.name =

canal.zkServers =192.168.5.6:2181   # kafka的端口的和ip
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ, pulsarMQ
canal.serverMode = kafka     # 选择同步方式为kafka
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024
## meory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true

## detecing config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false

# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size =  1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60

# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30

# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false

# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

# binlog ddl isolation
canal.instance.get.ddl.isolation = false

# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256

# table meta tsdb info
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360

#################################################
#########               destinations            #############
#################################################
canal.destinations = example,script   # 需要进行同步的任务,在conf/example文件下,和script下
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false

canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = spring
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
#########             MQ Properties      #############
##################################################
# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid=

canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local

canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8

##################################################
#########                    Kafka                   #############
##################################################
kafka.bootstrap.servers = 192.168.5.6:9092   # kafka的IP地址和端口
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0

kafka.kerberos.enable = false
kafka.kerberos.krb5.file = ../conf/kerberos/krb5.conf
kafka.kerberos.jaas.file = ../conf/kerberos/jaas.conf

# sasl demo
# kafka.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required \\n username=\"alice\" \\npassword="alice-secret\";
# kafka.sasl.mechanism = SCRAM-SHA-512
# kafka.security.protocol = SASL_PLAINTEXT

##################################################
#########                   RocketMQ         #############
##################################################
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag =

##################################################
#########                   RabbitMQ         #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =


##################################################
#########                     Pulsar         #############
##################################################
pulsarmq.serverUrl =
pulsarmq.roleToken =
pulsarmq.topicTenantPrefix =

配置任务

shell 复制代码
vim canal/conf/example
vim instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid use true/false
canal.instance.gtidon=false

# position info
canal.instance.master.address=192.168.7.129:3306  # 数据库的ip和端口
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=canal              # 数据库的用户名
canal.instance.dbPassword=docker211102       # 数据库的密码
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=script\\..*    #监控script这个数据库下面的所有的表
# table black regex
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=example             # 将数据发送到example这个主题里面
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,topic2:mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.enableDynamicQueuePartition=false
#canal.mq.partitionsNum=3
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#
# multi stream for polardbx
canal.instance.multi.stream.on=false
#################################################
shell 复制代码
vim canal/conf/script
vim instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid use true/false
canal.instance.gtidon=false

# position info
canal.instance.master.address=192.168.7.129:3306   # 数据库的ip和端口
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=canal                # 数据库的用户名
canal.instance.dbPassword=docker211102         # 数据库的密码
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=canal-demo\\..*   # 监控cancl-demo这个库下面的所有表
# table black regex
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=script          # 将数据发送到script这个主题里面
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,topic2:mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.enableDynamicQueuePartition=false
#canal.mq.partitionsNum=3
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#
# multi stream for polardbx
canal.instance.multi.stream.on=false
#################################################
相关推荐
nbwenren18 小时前
MySQL数据库误删恢复_mysql 数据 误删
数据库·mysql·adb
HUGu RGIN1 天前
MySQL--》如何在MySQL中打造高效优化索引
android·mysql·adb
北冥有鱼被烹2 天前
【微知】rokid glass如何开启无线adb进行APP安装
adb
STER labo3 天前
mysql配置环境变量——(‘mysql‘ 不是内部或外部命令,也不是可运行的程序 或批处理文件解决办法)
数据库·mysql·adb
sjmaysee3 天前
CentOS7安装Mysql5.7(ARM64架构)
adb·架构
AtOR CUES3 天前
MySQL——表操作及查询
android·mysql·adb
mOok ONSC3 天前
mysql9.0windows安装
windows·adb
xxjj998a4 天前
Laravel8.x核心特性详解
数据库·mysql·adb
TeDi TIVE4 天前
Linux下MySQL的简单使用
linux·mysql·adb
TeDi TIVE4 天前
MySQL四种备份表的方式
mysql·adb·oracle