FastDFS Observability Best Practices

Introduction to FastDFS

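FastDFS (Fast Distributed File System) is an open-source distributed file system known for high reliability, scalability, and performance. It is widely used by internet services for large-scale file storage and sharing.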

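Because FastDFS is a distributed file system, its runtime state is critical to business stability. That state includes the health of storage and tracker nodes, file upload and download performance, and changes in storage capacity. Guance Cloud (观测云), a unified monitoring platform, can help you: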

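  • Collect and visualize FastDFS metrics, logs, and other data in one place.
  • See the overall health, performance trends, and storage usage of your FastDFS cluster at a glance. Use out-of-the-box dashboards or build your own, without switching between systems.
  • Configure alert rules that notify you as soon as FastDFS behaves abnormally, so you can respond quickly.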

Guance Cloud

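Guance Cloud is a full-stack observability product built for IT engineers. It combines infrastructure monitoring, application performance monitoring, and log management to provide real-time observability across the entire technology stack. With it, engineers can trace the end-to-end user experience, follow every function call inside an application, and monitor cloud infrastructure. It can also help detect system security risks quickly.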

Deploy DataKit

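DataKit is an open-source, cross-platform data collection agent developed and maintained by Guance Cloud. It collects and processes data from many sources, such as logs, metrics, and events, for monitoring and troubleshooting. It supports a variety of input and output formats, so it integrates easily with existing monitoring systems.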

Log in to the Guance Cloud console, go to 「Integrations」 - 「DataKit」, and choose an installation method. This example deploys DataKit on a Linux host.

Installation and Configuration

Prerequisites

  • Python >= 3.10
  • Gunicorn == 23.0.0
  • Flask == 3.1.0
  • prometheus_client == 0.21.1

How the Exporter Works

The exporter exposes Prometheus metrics for FastDFS operations. It collects them from the output of FastDFS's built-in `fdfs_monitor` command.
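The article does not show the exporter's source. The sketch below therefore only illustrates the general idea: read `key = value` lines from fdfs_monitor-style text and emit them as Prometheus exposition lines. Several parts are assumptions rather than the exporter's actual code:

* The `SAMPLE` text is an abridged, assumed layout, not captured fdfs_monitor output.
* `parse_storages`, `to_exposition`, and the key subset in `EXPORTED` are hypothetical.
* The metric names and labels are borrowed from the exporter output shown later in this article.

```python
import re

# Abridged, assumed fdfs_monitor-style output (illustrative only; real
# output differs between FastDFS versions and has many more fields).
SAMPLE = """\
Group 1:
group name = group1
\tStorage 1:
\t\tid = storage1
\t\tip_addr = 14.103.17.230  ACTIVE
\t\ttotal_upload_count = 11
\t\tsuccess_upload_count = 11
"""

# Hypothetical subset of fdfs_monitor keys mapped to exporter metric names.
EXPORTED = {
    "total_upload_count": "fastdfs_storage_total_upload_count",
    "success_upload_count": "fastdfs_storage_success_upload_count",
}

# Matches "key = value"; keys may contain spaces (e.g. "group name").
KV = re.compile(r"^\s*([\w ]+?)\s*=\s*(.*?)\s*$")


def parse_storages(text):
    """Return one dict of raw key/value strings per 'Storage N:' block."""
    storages, group, current = [], None, None
    for line in text.splitlines():
        if line.strip().startswith("Storage "):
            current = {"group": group}  # new storage block inside the current group
            storages.append(current)
            continue
        m = KV.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "group name":
            group = value
        elif current is not None:
            current[key] = value
    return storages


def to_exposition(storages):
    """Render the selected keys in Prometheus text exposition format."""
    lines = []
    for s in storages:
        parts = s.get("ip_addr", "").split()  # "14.103.17.230  ACTIVE" -> IP first
        ip = parts[0] if parts else ""
        labels = f'group="{s["group"]}",ip="{ip}",storage="{s.get("id", "")}"'
        for key, metric in EXPORTED.items():
            if key in s:
                lines.append(f"{metric}{{{labels}}} {float(s[key])}")
    return "\n".join(lines)


print(to_exposition(parse_storages(SAMPLE)))
```

The real exporter serves its metrics through prometheus_client and Flask, so the text formatting is handled by the library. Rendering the lines by hand here just makes the key-to-metric mapping visible.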

Download the Package

fastdfs-exporter is a dedicated exporter that converts FastDFS status into Prometheus metrics.

Download the exporter:
wget https://github.com/maxpasserby/fastdfs-exporter/archive/refs/tags/0.1.1.tar.gz

Extract the archive and move the exporter directory to /etc.

From that directory, run the following command to start exporter.main:

python3 -u -m exporter.main

The output shows that the exporter serves Prometheus metrics at http://172.16.0.150:9036/:
Warning: The environment variable [TRACKER_SERVER] is empty, [TRACKER_SERVER] will be set to the default value of [127.0.0.1:22122]
 * Serving Flask app 'main' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on all addresses.
   WARNING: This is a development server. Do not use it in a production deployment.
 * Running on http://172.16.0.150:9036/ (Press CTRL+C to quit)

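As the warning above shows, when the TRACKER_SERVER environment variable is empty, the exporter falls back to 127.0.0.1:22122. Set it to your tracker's address if the tracker runs elsewhere. To keep the exporter running in the background: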

nohup python3 -u -m exporter.main &

Access the Metrics

Query the metrics with curl 127.0.0.1:9036/metrics:

[root@FHJ-TEST-TEMP ~]# curl 127.0.0.1:9036/metrics
# HELP fastdfs_group_count The total number of storage groups in FastDFS
# TYPE fastdfs_group_count gauge
fastdfs_group_count 1.0
# HELP fastdfs_group_storage_server_count The number of active storage servers in a FastDFS storage group
# TYPE fastdfs_group_storage_server_count gauge
fastdfs_group_storage_server_count{group="group1"} 1.0
# HELP fastdfs_group_active_storage_server_count The number of storage servers in a FastDFS storage group
# TYPE fastdfs_group_active_storage_server_count gauge
fastdfs_group_active_storage_server_count{group="group1"} 1.0
# HELP fastdfs_group_disk_total_space_bytes The total disk space of a FastDFS storage group in bytes
# TYPE fastdfs_group_disk_total_space_bytes gauge
fastdfs_group_disk_total_space_bytes{group="group1"} 2.0797456384e+010
# HELP fastdfs_group_disk_free_space_bytes The free disk space of a FastDFS storage group in bytes
# TYPE fastdfs_group_disk_free_space_bytes gauge
fastdfs_group_disk_free_space_bytes{group="group1"} 1.4593032192e+010
# HELP fastdfs_storage_server_info information about a fastdfs storage server(0:OFFLINE  1:ACTIVE  2:INIT  3:DELETED  4:WAIT_SYNC  5:SYNCING  6:ONLINE)
# TYPE fastdfs_storage_server_info gauge
fastdfs_storage_server_info{group="group1",ip="14.103.17.230",storage="storage1",version="6.07"} 1.0
# HELP fastdfs_storage_join_time_seconds the time when a storage server joined the fastdfs cluster in seconds since the epoch
# TYPE fastdfs_storage_join_time_seconds gauge
fastdfs_storage_join_time_seconds{group="group1",ip="14.103.17.230",storage="storage1"} 1.753076473e+09
# HELP fastdfs_storage_up_time_seconds the time when a storage server was last started (or restarted) in seconds since the epoch
# TYPE fastdfs_storage_up_time_seconds gauge
fastdfs_storage_up_time_seconds{group="group1",ip="14.103.17.230",storage="storage1"} 1.753076473e+09
# HELP fastdfs_storage_total_space_bytes the total disk space of a fastdfs storage server in bytes
# TYPE fastdfs_storage_total_space_bytes gauge
fastdfs_storage_total_space_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 2.0797456384e+010
# HELP fastdfs_storage_free_space_bytes the free disk space of a fastdfs storage server in bytes
# TYPE fastdfs_storage_free_space_bytes gauge
fastdfs_storage_free_space_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 1.4593032192e+010
# HELP fastdfs_storage_connection_alloc_count the total number of connections allocated from the pool (since server startup) of a fastdfs storage server
# TYPE fastdfs_storage_connection_alloc_count gauge
fastdfs_storage_connection_alloc_count{group="group1",ip="14.103.17.230",storage="storage1"} 256.0
# HELP fastdfs_storage_connection_current_count the number of currently used connections of a fastdfs storage server
# TYPE fastdfs_storage_connection_current_count gauge
fastdfs_storage_connection_current_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_connection_max_count the maximum number of allowed connections of a fastdfs storage server
# TYPE fastdfs_storage_connection_max_count gauge
fastdfs_storage_connection_max_count{group="group1",ip="14.103.17.230",storage="storage1"} 1.0
# HELP fastdfs_storage_total_upload_count the total number of file upload operations on a fastdfs storage server
# TYPE fastdfs_storage_total_upload_count gauge
fastdfs_storage_total_upload_count{group="group1",ip="14.103.17.230",storage="storage1"} 11.0
# HELP fastdfs_storage_success_upload_count the number of successful file upload operations on a fastdfs storage server
# TYPE fastdfs_storage_success_upload_count gauge
fastdfs_storage_success_upload_count{group="group1",ip="14.103.17.230",storage="storage1"} 11.0
# HELP fastdfs_storage_total_delete_count the total number of file deletion operations on a fastdfs storage server
# TYPE fastdfs_storage_total_delete_count gauge
fastdfs_storage_total_delete_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_delete_count the number of successful file deletion operations on a fastdfs storage server
# TYPE fastdfs_storage_success_delete_count gauge
fastdfs_storage_success_delete_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_download_count the total number of file download operations on a fastdfs storage server
# TYPE fastdfs_storage_total_download_count gauge
fastdfs_storage_total_download_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_download_count the number of successful file download operations on a fastdfs storage server
# TYPE fastdfs_storage_success_download_count gauge
fastdfs_storage_success_download_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_modify_count the total number of file modification operations on a fastdfs storage server
# TYPE fastdfs_storage_total_modify_count gauge
fastdfs_storage_total_modify_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_modify_count the number of successful file modification operations on a fastdfs storage server
# TYPE fastdfs_storage_success_modify_count gauge
fastdfs_storage_success_modify_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_append_count the total number of file append operations on a fastdfs storage server
# TYPE fastdfs_storage_total_append_count gauge
fastdfs_storage_total_append_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_append_count the number of successful file append operations on a fastdfs storage server
# TYPE fastdfs_storage_success_append_count gauge
fastdfs_storage_success_append_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_upload_bytes the total size of files uploaded to a fastdfs storage server in bytes
# TYPE fastdfs_storage_total_upload_bytes gauge
fastdfs_storage_total_upload_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 154.0
# HELP fastdfs_storage_success_upload_bytes the size of successfully uploaded files to a fastdfs storage server in bytes
# TYPE fastdfs_storage_success_upload_bytes gauge
fastdfs_storage_success_upload_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 154.0
# HELP fastdfs_storage_total_download_bytes the total size of files downloaded from a fastdfs storage server in bytes
# TYPE fastdfs_storage_total_download_bytes gauge
fastdfs_storage_total_download_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_download_bytes the size of successfully downloaded files from a fastdfs storage server in bytes
# TYPE fastdfs_storage_success_download_bytes gauge
fastdfs_storage_success_download_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_append_bytes the total size of files appended to a fastdfs storage server in bytes
# TYPE fastdfs_storage_total_append_bytes gauge
fastdfs_storage_total_append_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_append_bytes the size of successfully appended files to a fastdfs storage server in bytes
# TYPE fastdfs_storage_success_append_bytes gauge
fastdfs_storage_success_append_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_modify_bytes the total size of files modified on a fastdfs storage server in bytes
# TYPE fastdfs_storage_total_modify_bytes gauge
fastdfs_storage_total_modify_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_modify_bytes the size of successfully modified files on a fastdfs storage server
# TYPE fastdfs_storage_success_modify_bytes gauge
fastdfs_storage_success_modify_bytes{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_file_open_count the total number of file open operations on a fastdfs storage
# TYPE fastdfs_storage_total_file_open_count gauge
fastdfs_storage_total_file_open_count{group="group1",ip="14.103.17.230",storage="storage1"} 11.0
# HELP fastdfs_storage_success_file_open_count the number of successful file open operations on a fastdfs storage server
# TYPE fastdfs_storage_success_file_open_count gauge
fastdfs_storage_success_file_open_count{group="group1",ip="14.103.17.230",storage="storage1"} 11.0
# HELP fastdfs_storage_total_file_read_count the total number of file read operations on a fastdfs storage server
# TYPE fastdfs_storage_total_file_read_count gauge
fastdfs_storage_total_file_read_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_success_file_read_count the number of successful file read operations on a fastdfs storage server
# TYPE fastdfs_storage_success_file_read_count gauge
fastdfs_storage_success_file_read_count{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_total_file_write_count the total number of file write operations on a fastdfs storage server
# TYPE fastdfs_storage_total_file_write_count gauge
fastdfs_storage_total_file_write_count{group="group1",ip="14.103.17.230",storage="storage1"} 11.0
# HELP fastdfs_storage_success_file_write_count the number of successful file write operations on a fastdfs storage server
# TYPE fastdfs_storage_success_file_write_count gauge
fastdfs_storage_success_file_write_count{group="group1",ip="14.103.17.230",storage="storage1"} 11.0
# HELP fastdfs_storage_last_heart_beat_time the time of the last heartbeat of a fastdfs storage server
# TYPE fastdfs_storage_last_heart_beat_time gauge
fastdfs_storage_last_heart_beat_time{group="group1",ip="14.103.17.230",storage="storage1"} 1.753084637e+09
# HELP fastdfs_storage_last_source_update the time of the last source file update on a fastdfs storage server
# TYPE fastdfs_storage_last_source_update gauge
fastdfs_storage_last_source_update{group="group1",ip="14.103.17.230",storage="storage1"} 1.753077421e+09
# HELP fastdfs_storage_last_sync_update the time of the last synchronization update on a fastdfs storage server
# TYPE fastdfs_storage_last_sync_update gauge
fastdfs_storage_last_sync_update{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
# HELP fastdfs_storage_last_synced_timestamp the timestamp of the last synchronization on a fastdfs storage server
# TYPE fastdfs_storage_last_synced_timestamp gauge
fastdfs_storage_last_synced_timestamp{group="group1",ip="14.103.17.230",storage="storage1"} 0.0
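The same check can be scripted. The sketch below fetches the endpoint queried with curl above and parses the sample lines. It then derives the free-space ratio of each storage group from the two group-level disk gauges.

The parser is deliberately simplified. It skips `#` comment lines and does not handle escaped quotes in label values or trailing timestamps. `fetch_metrics`, `parse_metrics`, and `group_free_ratio` are hypothetical helper names.

```python
import re
import urllib.request

# Sample lines look like: name{label="value",...} 1.0  (labels optional).
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def fetch_metrics(url="http://127.0.0.1:9036/metrics", timeout=5.0):
    """Fetch the exporter's text output (same endpoint as the curl call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Return (name, labels, value) tuples; HELP/TYPE comment lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))  # float() accepts "2.0797456384e+010"
    return samples


def group_free_ratio(samples):
    """Free/total disk space per storage group, from the group-level gauges."""
    total, free = {}, {}
    for name, labels, value in samples:
        if name == "fastdfs_group_disk_total_space_bytes":
            total[labels["group"]] = value
        elif name == "fastdfs_group_disk_free_space_bytes":
            free[labels["group"]] = value
    return {g: free[g] / t for g, t in total.items() if g in free and t > 0}


# Excerpt of the exporter output shown above.
EXCERPT = """\
# TYPE fastdfs_group_disk_total_space_bytes gauge
fastdfs_group_disk_total_space_bytes{group="group1"} 2.0797456384e+010
# TYPE fastdfs_group_disk_free_space_bytes gauge
fastdfs_group_disk_free_space_bytes{group="group1"} 1.4593032192e+010
"""

for group, ratio in group_free_ratio(parse_metrics(EXCERPT)).items():
    print(f"{group}: {ratio:.1%} free")  # group1: 70.2% free
```

Against a running exporter, `group_free_ratio(parse_metrics(fetch_metrics()))` computes the same ratio from live data. For anything more than a quick check, the prometheus_client package provides a complete text-format parser, `prometheus_client.parser.text_string_to_metric_families`.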

Collector Configuration

Create the prom-fastdfs.conf configuration file

In the /usr/local/datakit/conf.d/samples directory, copy prom.conf.sample to prom-fastdfs.conf:

cp prom.conf.sample prom-fastdfs.conf 

Key parameters:

  • urls: the fastdfs-exporter metrics endpoint
  • interval: the collection interval
  • source: the collector alias
[[inputs.prom]]
  ## Exporter URLs.
  urls = ["http://127.0.0.1:9036/metrics"]

  ## Stream Size. 
  ## The source stream segmentation size, (defaults to 1).
  ## 0 source stream undivided. 
  # stream_size = 1

  ## Unix Domain Socket URL. Using socket to request data when not empty.
  uds_path = ""

  ## Ignore URL request errors.
  ignore_req_err = false

  ## Collector alias.
  source = "fastdfs"

Restart DataKit

systemctl restart datakit

Key Metrics

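No. | Metric | Description | Unit
--- | --- | --- | ---
1 | storage_join_time_seconds | Time the storage server joined the cluster (Unix timestamp) | s
2 | storage_up_time_seconds | Time the storage server was last started or restarted (Unix timestamp) | s
3 | storage_total_space_bytes | Total disk space of the storage server | bytes
4 | storage_free_space_bytes | Free disk space of the storage server | bytes
5 | storage_connection_alloc_count | Connections allocated from the pool since startup | count
6 | storage_connection_current_count | Connections currently in use | count
7 | storage_connection_max_count | Maximum number of allowed connections | count
8 | storage_total_delete_count | Total file deletions | count
9 | storage_success_download_count | Successful file downloads | count
10 | storage_total_modify_count | Total file modifications | count
11 | group_storage_server_count | Number of storage servers in the storage group | count
12 | group_active_storage_server_count | Number of active storage servers in the storage group | count
13 | group_disk_total_space_bytes | Total disk space of the storage group | bytes
14 | group_disk_free_space_bytes | Free disk space of the storage group | bytes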
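The names in the table omit the `fastdfs_` prefix that the exporter emits. A plausible explanation is that DataKit's prom collector, by default, uses the part of a metric name before the first underscore as the measurement and the rest as the field name. The sketch below assumes that rule; check the DataKit prom documentation for your version. `split_metric_name` is a hypothetical helper.

```python
def split_metric_name(name):
    """Split an exporter metric name into (measurement, field).

    Assumption: mirrors DataKit's default prom naming, where the text before
    the first underscore becomes the measurement and the rest the field.
    """
    measurement, _, field = name.partition("_")
    return measurement, field


for metric in (
    "fastdfs_storage_join_time_seconds",
    "fastdfs_group_disk_free_space_bytes",
):
    print(split_metric_name(metric))
# ('fastdfs', 'storage_join_time_seconds')
# ('fastdfs', 'group_disk_free_space_bytes')
```

Under this assumption, all fields in the table land in a single `fastdfs` measurement. That is the name to look for when building queries or dashboards.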

Dashboards

Log in to the Guance Cloud console, click 「Scenes」 - 「Create Dashboard」, enter "FastDFS", select the "FastDFS" template, and click "OK" to add the dashboard.

Monitors (Alerts)

Alert when the number of successful FastDFS uploads exceeds 100

Alert when FastDFS free disk space drops below 30%
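Both monitors are configured in the Guance Cloud console. The sketch below restates their thresholds in Python only to make the logic explicit; it is not Guance Cloud's rule syntax. `Snapshot` and `evaluate` are hypothetical names.

The sketch also assumes the upload rule is meant per evaluation window. `fastdfs_storage_success_upload_count` is a running total, even though the exporter declares it a gauge. A fixed threshold on the raw value would stay triggered once it had been crossed.

```python
from dataclasses import dataclass

UPLOAD_THRESHOLD = 100        # "successful uploads exceed 100"
FREE_RATIO_THRESHOLD = 0.30   # "free disk space below 30%"


@dataclass
class Snapshot:
    """Hypothetical container for one scrape of the relevant metrics."""
    success_upload_count: float  # fastdfs_storage_success_upload_count
    disk_free_bytes: float       # fastdfs_group_disk_free_space_bytes
    disk_total_bytes: float      # fastdfs_group_disk_total_space_bytes


def evaluate(prev: Snapshot, curr: Snapshot) -> list[str]:
    """Return alert messages for the window between two scrapes."""
    alerts = []
    # The upload count only grows, so alert on its increase in the window.
    uploads = curr.success_upload_count - prev.success_upload_count
    if uploads < 0:
        # If the value drops, treat it as a counter reset and count from zero.
        uploads = curr.success_upload_count
    if uploads > UPLOAD_THRESHOLD:
        alerts.append(f"{uploads:.0f} successful uploads in window (> {UPLOAD_THRESHOLD})")
    if curr.disk_total_bytes > 0:
        free_ratio = curr.disk_free_bytes / curr.disk_total_bytes
        if free_ratio < FREE_RATIO_THRESHOLD:
            alerts.append(f"free disk space {free_ratio:.1%} (< {FREE_RATIO_THRESHOLD:.0%})")
    return alerts


before = Snapshot(11, 14593032192, 20797456384)   # values from the exporter output above
after = Snapshot(150, 5_000_000_000, 20797456384)  # made-up later scrape
print(evaluate(before, after))
```

The disk rule uses the current ratio only, because free space is a true gauge. Only the upload rule needs two scrapes to compare.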

Summary

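By combining real-time status awareness, intelligent alerting, and data visualization, FastDFS monitoring cuts incident response from hours to minutes. After integrating with Guance Cloud, operations efficiency improved by more than 50%, resource utilization rose by 30%, and the risk of business interruption fell by 90%. For high-concurrency file services such as e-commerce image hosting and video platforms, monitoring becomes more than a safety net: it also informs decisions on performance tuning and cost control.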
