1. Backing up to MinIO with TiDB Operator

Set up MinIO as S3 storage

  1. Initialize MinIO:
minio server $HOME/operator/data --console-address :9090
  2. Set the region to Shanghai (region name shanghai)

Create the TiDB Operator Backup CR

  1. Contents of the Backup CR configuration file, backup-s3.yaml:

apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: backup2s3-dev
  namespace: tidb-admin
  labels:
    user: paul
spec:
  ## Describes the compute resource requirements and limits of Backup.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 512Mi

  ## List of environment variables to set in the container, like v1.Container.Env.
  ## Note that the following builtin env vars will be overwritten by values set here.
  ## - S3_PROVIDER
  ## - S3_ENDPOINT
  ## - AWS_REGION
  ## - AWS_ACL
  ## - AWS_STORAGE_CLASS
  ## - AWS_DEFAULT_REGION
  ## - AWS_ACCESS_KEY_ID
  ## - AWS_SECRET_ACCESS_KEY
  ## - GCS_PROJECT_ID
  ## - GCS_OBJECT_ACL
  ## From is the TidbCluster to be backed up.
  ## It takes higher precedence than spec in BR. If `from` is not set, the cluster in BR will be backed up.
  #from:
    ## Host is the address of the TidbCluster to be backed up, which is the service name of the TidbCluster, such as `basic-tidb`.
    #host: alpha-tidb
    ## Port is the port of the TidbCluster to be backed up.
    #port: 4000
    ## User is the accessing user of the TidbCluster to be backed up.
    #user: root
    ## SecretName is the secret that contains the password of the accessing user of the TidbCluster to be backed up.
    # secretName: sh.helm.release.v1.tidb-operator.v1 
    ## TLSClientSecretName is the name of secret which stores tidb server client certificate.
    ## Defaults to nil.
    # tlsClientSecretName: ""
  backupType: full
  backupMode: snapshot
  ## TikvGCLifeTime specifies the safe gc life time for Backup.
  ## The time limit during which data is retained for each GC, in the format of Go Duration.
  ## When a GC happens, the current time minus this value is the safe point.
  ## Defaults to 72h.
  tikvGCLifeTime: 72h

  s3:
    provider: aws
    secretName: minio-secret
    bucket: tidbuss
    prefix: tidb/s3
    endpoint: http://192.168.1.2:9000

  ## StorageSize is the PV size specified for the backup operation.
  ## This value must be greater than the size of the TidbCluster to be backed up.
  ## Defaults to 100Gi.
  storageSize: "100Gi"


  ## BR configuration.
  ## Ref: https://docs.pingcap.com/tidb/stable/backup-and-restore-tool
  br:
    ## Cluster specifies name of TidbCluster to be backed up.
    cluster: "alpha"
    ## Namespace specifies namespace of TidbCluster to be backed up.
    clusterNamespace: "tidb-admin"
    ## LogLevel is the log level. Defaults to `info`.
    # logLevel: "info"
    ## StatusAddr is the HTTP listening address for the status report service. Defaults to empty.
    # statusAddr: ""
    ## Concurrency is the size of thread pool on each node that execute the backup task.
    ## Defaults to 4.
    concurrency: 4
    ## RateLimit is the rate limit of the backup task, MB/s per node.
    ## If set to 4, the speed limit is 4 MB/s. The speed limit is not set by default.
    # rateLimit: 0
    ## TimeAgo presents back up the data before `timeAgo`, e.g. 1m, 1h. Defaults to empty.
    # timeAgo: 1m
    ## Checksum specifies whether to verify the files after the backup is completed.
    ## Defaults to `true`.
    # checksum: true
    ## CheckRequirements specifies whether to check requirements before backup
    # checkRequirements: true
    ## SendCredToTikv specifies whether the BR process passes its AWS or GCP privileges to the TiKV process.
    ## Defaults to `true`.
    sendCredToTikv: true
    ## OnLine specifies whether online during restore. Defaults to false.
    # onLine: false
    ## Options specifies the extra arguments that BR supports. These options have the highest priority.
    # options: []

  

  ## ToolImage specifies the tool image used in `Backup`, which supports BR and Dumpling images.
  ## For example, `spec.toolImage: pingcap/br:v5.2.0` or `spec.toolImage: pingcap/dumpling:v5.2.0`.
  ## For BR image, if it does not contain tag, Pod will use image 'ToolImage:${TiKV_Version}'.
  toolImage: pingcap/br:v6.5.5

  ## ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images.
  ## If private registry is used, imagePullSecrets may be set.
  ## You can also set this in service account.
  ## Ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # imagePullSecrets:
  # - name: secretName

  ## TableFilter specifies tables that match the table filter rules for BR or Dumpling.
  ## Ref: https://docs.pingcap.com/tidb/stable/table-filter
  ## Defaults to empty.
  # tableFilter: []

  ## Affinity for Backup pod scheduling
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # affinity: {}

  ## UseKMS to decrypt the secrets. Defaults to false.
  useKMS: false

  ## ServiceAccount Specify service account of Backup.
  serviceAccount: "tidb-backup-manager"

  ## CleanPolicy specifies whether to clean backup data when the Backup CR is deleted, if not set, the backup data will be retained.
  ## `Retain` represents that the backup data will be retained when the Backup CR is deleted.
  ## `OnFailure` represents that the backup data will be cleaned only for the failed backups when the Backup CR is deleted.
  ## `Delete` represents that the backup data will be cleaned when the Backup CR is deleted.
  cleanPolicy: Retain
  2. Create the backup:
kubectl -n tidb-admin apply -f  backup-s3.yaml

Troubleshooting backup errors

  1. The error log is as follows:
E1105 10:41:10.821663       8 manager.go:408] Get backup metadata for backup files in s3://tidbuss/tidb/s3 of cluster tidb-admin/backup2s3-dev failed, err: read backup meta from bucket tidbuss and prefix tidb/s3: blob (key "backupmeta") (code=Unknown): BadRequest: Bad Request
	status code: 400, request id: 1794B3FFD6488588, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
I1105 10:41:10.841852       8 backup_status_updater.go:128] Backup: [tidb-admin/backup2s3-dev] updated successfully
error: read backup meta from bucket tidbuss and prefix tidb/s3: blob (key "backupmeta") (code=Unknown): BadRequest: Bad Request
	status code: 400, request id: 1794B3FFD6488588, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e
  2. Investigating the failed TiDB Operator BR backup to MinIO S3
    Analysis of the tidb-operator source code leads to the util.GetBRMetaData function, located in cmd/backup-manager/app/util/util.go.
// Write a unit test case
func TestGetBRMetaData(t *testing.T) {
	ctx := context.Background()
	os.Setenv("AWS_ACCESS_KEY_ID", "tidb")
	os.Setenv("AWS_SECRET_ACCESS_KEY", "Jianxin123")
	provider := v1alpha1.StorageProvider{
		S3: &v1alpha1.S3StorageProvider{
			Provider:   "aws",
			Bucket:     "tidbuss",
			Prefix:     "tidb/s3",
			Endpoint:   "http://192.168.1.2:9000",
			SecretName: "minio-secrete",
		},
	}
	_, err := GetBRMetaData(ctx, provider)
	log.Fatalln(err)

}

// Modify the original function so that debug output exposes the root cause
// GetBRMetaData get backup metadata from cloud storage
func GetBRMetaData(ctx context.Context, provider v1alpha1.StorageProvider) (*kvbackup.BackupMeta, error) {
	s, err := util.NewStorageBackend(provider, &util.StorageCredential{})
	if err != nil {
		return nil, err
	}
	defer s.Close()

	var metaData []byte
	// use exponential backoff, every retry duration is duration * factor ^ (used_step - 1)
	backoff := wait.Backoff{
		Duration: time.Second,
		Steps:    6,
		Factor:   2.0,
		Cap:      time.Minute,
	}
	fmt.Println("bucket", s.GetBucket())
	//	_, err = s.Attributes(ctx, "backupmeta")
	obj, err := s.List(&blob.ListOptions{Prefix: "tidb/s3"}).Next(ctx)
	fmt.Println("xx bucket", err, obj)
	readBackupMeta := func() error {
		exist, err := s.Exists(ctx, "backupmeta")
		if err != nil {
			return err
		}
		fmt.Println("IS existed", exist)
		if !exist {
			return fmt.Errorf("%s not exist", constants.MetaFile)
		}
		metaData, err = s.ReadAll(ctx, constants.MetaFile)
		if err != nil {
			return err
		}
		return nil
	}
	fmt.Println("xxxx", readBackupMeta())

	isRetry := func(err error) bool {
		return !strings.Contains(err.Error(), "not exist")
	}
	err = retry.OnError(backoff, isRetry, readBackupMeta)
	if err != nil {
		return nil, errors.Annotatef(err, "read backup meta from bucket %s and prefix %s", s.GetBucket(), s.GetPrefix())
	}

	backupMeta := &kvbackup.BackupMeta{}
	err = proto.Unmarshal(metaData, backupMeta)
	if err != nil {
		return nil, errors.Annotatef(err, "unmarshal backup meta from bucket %s and prefix %s", s.GetBucket(), s.GetPrefix())
	}
	return backupMeta, nil
}

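In the modified function, the bucket's Exists and Attributes methods returned nothing useful for diagnosing the error, so I switched to the List method to list the backupmeta file stored in MinIO S3. The error it returns points to missing region information: the request carried the wrong region, and MinIO expected 'shanghai'.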

go test -timeout 30s -run ^TestGetBRMetaData$
bucket tidbuss
xx bucket blob (code=Unknown): AuthorizationHeaderMalformed: The authorization header is malformed; the region is wrong; expecting 'shanghai'.
        status code: 400, request id: 1794B4674A2D5A48, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8 <nil>
xxxx blob (key "backupmeta") (code=Unknown): BadRequest: Bad Request
        status code: 400, request id: 1794B4674B073F88, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8

The data in the test case matches the s3 settings in backup-s3.yaml, and the unit test reproduces the failure, which points to the region setting.

  3. Add the region setting to the unit test:
func TestGetBRMetaData(t *testing.T) {
	ctx := context.Background()
	os.Setenv("AWS_ACCESS_KEY_ID", "tidb")
	os.Setenv("AWS_SECRET_ACCESS_KEY", "Jianxin123")
	provider := v1alpha1.StorageProvider{
		S3: &v1alpha1.S3StorageProvider{
			Provider:   "aws",
			Region:     "shanghai",
			Bucket:     "tidbuss",
			Prefix:     "tidb/s3",
			Endpoint:   "http://192.168.1.2:9000",
			SecretName: "minio-secrete",
		},
	}
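	_, err := GetBRMetaData(ctx, provider)
	// Note: log.Fatalln calls os.Exit(1) even when err is nil, so go test
	// reports FAIL although no error was returned.
	log.Fatalln(err)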

}

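Rerunning the test case now returns a nil error, which shows that in this setup the region field in the s3 section of the configuration file must be set and must match the region configured in MinIO. The run still ends with exit status 1 and FAIL only because the test calls log.Fatalln unconditionally, not because GetBRMetaData failed.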

hbu@Pauls-MacBook-Air util % go test -timeout 30s -run ^TestGetBRMetaData$
bucket tidbuss
xx bucket EOF <nil>
IS existed true
xxxx <nil>
IS existed true
2023/11/05 18:52:00 <nil>
exit status 1
FAIL    github.com/pingcap/tidb-operator/cmd/backup-manager/app/util    0.442s

Fixing the problem

  1. Edit the TiDB Operator backup-to-S3 configuration file backup-s3.yaml (the Backup CR named backup2s3-dev) and add the region field to the s3 section:
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: backup2s3-dev
  namespace: tidb-admin
  labels:
    user: paul
spec:
  ## Describes the compute resource requirements and limits of Backup.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 512Mi

  ## List of environment variables to set in the container, like v1.Container.Env.
  ## Note that the following builtin env vars will be overwritten by values set here.
  ## - S3_PROVIDER
  ## - S3_ENDPOINT
  ## - AWS_REGION
  ## - AWS_ACL
  ## - AWS_STORAGE_CLASS
  ## - AWS_DEFAULT_REGION
  ## - AWS_ACCESS_KEY_ID
  ## - AWS_SECRET_ACCESS_KEY
  ## - GCS_PROJECT_ID
  ## - GCS_OBJECT_ACL
  ## From is the TidbCluster to be backed up.
  ## It takes higher precedence than spec in BR. If `from` is not set, the cluster in BR will be backed up.
  #from:
    ## Host is the address of the TidbCluster to be backed up, which is the service name of the TidbCluster, such as `basic-tidb`.
    #host: alpha-tidb
    ## Port is the port of the TidbCluster to be backed up.
    #port: 4000
    ## User is the accessing user of the TidbCluster to be backed up.
    #user: root
    ## SecretName is the secret that contains the password of the accessing user of the TidbCluster to be backed up.
    # secretName: sh.helm.release.v1.tidb-operator.v1 
    ## TLSClientSecretName is the name of secret which stores tidb server client certificate.
    ## Defaults to nil.
    # tlsClientSecretName: ""
  backupType: full
  backupMode: snapshot
  ## TikvGCLifeTime specifies the safe gc life time for Backup.
  ## The time limit during which data is retained for each GC, in the format of Go Duration.
  ## When a GC happens, the current time minus this value is the safe point.
  ## Defaults to 72h.
  tikvGCLifeTime: 72h

  s3:
    provider: aws
    secretName: minio-secret
    region: shanghai
    bucket: tidbuss
    prefix: tidb/s3
    endpoint: http://192.168.1.2:9000

  ## StorageSize is the PV size specified for the backup operation.
  ## This value must be greater than the size of the TidbCluster to be backed up.
  ## Defaults to 100Gi.
  storageSize: "100Gi"


  ## BR configuration.
  ## Ref: https://docs.pingcap.com/tidb/stable/backup-and-restore-tool
  br:
    ## Cluster specifies name of TidbCluster to be backed up.
    cluster: "alpha"
    ## Namespace specifies namespace of TidbCluster to be backed up.
    clusterNamespace: "tidb-admin"
    ## LogLevel is the log level. Defaults to `info`.
    # logLevel: "info"
    ## StatusAddr is the HTTP listening address for the status report service. Defaults to empty.
    # statusAddr: ""
    ## Concurrency is the size of thread pool on each node that execute the backup task.
    ## Defaults to 4.
    concurrency: 4
    ## RateLimit is the rate limit of the backup task, MB/s per node.
    ## If set to 4, the speed limit is 4 MB/s. The speed limit is not set by default.
    # rateLimit: 0
    ## TimeAgo presents back up the data before `timeAgo`, e.g. 1m, 1h. Defaults to empty.
    # timeAgo: 1m
    ## Checksum specifies whether to verify the files after the backup is completed.
    ## Defaults to `true`.
    # checksum: true
    ## CheckRequirements specifies whether to check requirements before backup
    # checkRequirements: true
    ## SendCredToTikv specifies whether the BR process passes its AWS or GCP privileges to the TiKV process.
    ## Defaults to `true`.
    sendCredToTikv: true
    ## OnLine specifies whether online during restore. Defaults to false.
    # onLine: false
    ## Options specifies the extra arguments that BR supports. These options have the highest priority.
    # options: []

  

  ## ToolImage specifies the tool image used in `Backup`, which supports BR and Dumpling images.
  ## For example, `spec.toolImage: pingcap/br:v5.2.0` or `spec.toolImage: pingcap/dumpling:v5.2.0`.
  ## For BR image, if it does not contain tag, Pod will use image 'ToolImage:${TiKV_Version}'.
  toolImage: pingcap/br:v6.5.5

  ## ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images.
  ## If private registry is used, imagePullSecrets may be set.
  ## You can also set this in service account.
  ## Ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # imagePullSecrets:
  # - name: secretName

  ## TableFilter specifies tables that match the table filter rules for BR or Dumpling.
  ## Ref: https://docs.pingcap.com/tidb/stable/table-filter
  ## Defaults to empty.
  # tableFilter: []

  ## Affinity for Backup pod scheduling
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # affinity: {}

  ## UseKMS to decrypt the secrets. Defaults to false.
  useKMS: false

  ## ServiceAccount Specify service account of Backup.
  serviceAccount: "tidb-backup-manager"

  ## CleanPolicy specifies whether to clean backup data when the Backup CR is deleted, if not set, the backup data will be retained.
  ## `Retain` represents that the backup data will be retained when the Backup CR is deleted.
  ## `OnFailure` represents that the backup data will be cleaned only for the failed backups when the Backup CR is deleted.
  ## `Delete` represents that the backup data will be cleaned when the Backup CR is deleted.
  cleanPolicy: Retain
  2. Delete the existing Backup CR:
kubectl -n tidb-admin delete -f backup-s3.yaml
  3. Delete the files under the original backup path from MinIO

Rerun the backup

hbu@Pauls-MacBook-Air backup % kubectl -n tidb-admin apply -f  backup-s3.yaml
backup.pingcap.com/backup2s3-dev created

Check the result

hbu@Pauls-MacBook-Air data % kubectl  -n tidb-admin get pod                          
NAME                                       READY   STATUS      RESTARTS      AGE
alpha-discovery-68588cd598-k5m56           1/1     Running     1 (12d ago)   15d
alpha-pd-0                                 1/1     Running     1 (12d ago)   15d
alpha-tidb-0                               2/2     Running     2 (12d ago)   15d
alpha-tikv-0                               1/1     Running     1 (12d ago)   15d
backup-backup2s3-dev-l5rq4                 0/1     Completed   0             33s
tidb-controller-manager-54694444b9-ncj8z   1/1     Running     6             15d
hbu@Pauls-MacBook-Air data % kubectl  -n tidb-admin get backup
NAME              TYPE   MODE       STATUS     BACKUPPATH             BACKUPSIZE   COMMITTS             LOGTRUNCATEUNTIL   AGE
backup2s3-dev     full   snapshot   Complete   s3://tidbuss/tidb/s3   271 kB       445430282047455233                      42s

With the modified configuration file, TiDB Operator successfully backs up to the S3-compatible storage MinIO.

Summary

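Following the official TiDB Operator documentation, backing up to the S3-compatible storage MinIO is relatively straightforward. Custom development on top of TiDB Operator, however, requires developers to understand the relevant fields in more depth in order to troubleshoot errors effectively.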

In addition, AWS S3 and MinIO are two different products, so how the MinIO region is configured and applied deserves attention during development.
