Best Practices for Deploying DataKit in Docker

This article describes how to install and run DataKit in a Docker container.

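Configure and start the DataKit container

Log in to the Guance platform, click "Integrations" - "DataKit" - "Docker", and copy the startup command from step 2. Adjust the startup parameters to match your environment.

The startup command: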

sudo docker run \
    --hostname "$(hostname)" \
    --workdir /usr/local/datakit \
    -v "/etc/conf/dir/conf.d":"/usr/local/datakit/conf.d/host-inputs-conf" \
    -v "/":"/rootfs" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e ENV_DATAWAY="https://openway.guance.com?token=tkn_XXXX" \
    -e ENV_DEFAULT_ENABLED_INPUTS='cpu,disk,diskio,mem,swap,system,net,host_processes,hostobject,container,dk' \
    -e ENV_GLOBAL_HOST_TAGS="tag1=a1,tag2=a2" \
    -e ENV_HTTP_LISTEN="0.0.0.0:9529" \
    -e HOST_PROC="/rootfs/proc" \
    -e HOST_SYS="/rootfs/sys" \
    -e HOST_ETC="/rootfs/etc" \
    -e HOST_VAR="/rootfs/var" \
    -e HOST_RUN="/rootfs/run" \
    -e HOST_DEV="/rootfs/dev" \
    -e HOST_ROOT="/rootfs" \
    --cpus 2 \
    --memory 1g \
    --privileged \
    --publish 9529:9529 \
    --name datakit-docker \
    -d \
    pubrepo.guance.com/datakit/datakit:1.66.2

After the container starts, check that it is running:

docker ps

If datakit-docker is listed with a status beginning with Up, the container started successfully.
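
The same check can be scripted, which is useful in deployment scripts. The following is a minimal sketch that assumes the container name datakit-docker from the startup command; the function name container_running is illustrative and not part of DataKit or Docker.

```shell
#!/bin/sh
# Report whether a named container is currently running.
# Prints "running" or "not running" and returns 0 or 1 accordingly.
container_running() {
    name="$1"
    # .State.Running is "true" only while the container's main process is alive.
    # If the container does not exist, docker inspect fails and state stays empty.
    state="$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null)"
    if [ "$state" = "true" ]; then
        echo "running"
        return 0
    fi
    echo "not running"
    return 1
}
```

A possible use is `container_running datakit-docker || docker logs --tail 50 datakit-docker`, which prints the last log lines only when the container is not up.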

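Startup parameters:

  • --hostname: uses the host machine's hostname as the hostname DataKit runs under. To run several DataKit instances on the same host, add a suffix, for example --hostname "$(hostname)-dk1".
  • --workdir: sets the container's working directory.
  • -v: mounts files and directories from the host:
    • DataKit uses many configuration files. Prepare them on the host and mount the whole directory into the container in one step with -v (inside the container it is the conf.d/host-inputs-conf directory).
    • The host root directory is mounted into DataKit so that it can read host information (such as the files under /proc), which the collectors enabled by default rely on.
    • The docker.sock file is mounted into the DataKit container so that the container collector can gather data. Its location can differ between hosts, so set the path to match your system.
  • -e: environment variables that configure DataKit at runtime. They work the same way as in a DaemonSet deployment.
    • ENV_DATAWAY: paste your token into this variable's value after "token=".
  • --publish: lets external programs send data such as traces to the DataKit container. Here DataKit's HTTP port is mapped to port 9529 on the host; use this port when configuring the address that trace data is sent to.
  • --name: sets the Docker container name; without it, Docker generates a random name.
  • --cpus and --memory: limit this DataKit instance to 2 CPU cores and 1 GiB of memory.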
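
To confirm that the resource limits took effect, they can be read back from the container. The sketch below assumes the container name datakit-docker; docker inspect reports the CPU limit as NanoCpus (billionths of a CPU) and the memory limit in bytes, and the helper names format_limits and show_limits are illustrative.

```shell
#!/bin/sh
# Convert the raw values Docker stores into readable units.
# $1: NanoCpus (billionths of a CPU), $2: memory limit in bytes.
# Integer division is used, so a fractional CPU limit such as 1.5 is truncated.
format_limits() {
    echo "cpus=$(( $1 / 1000000000 )) memory_mib=$(( $2 / 1048576 ))"
}

# Read the limits of the container named in $1 and print them in readable form.
show_limits() {
    raw="$(docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' "$1")" || return 1
    # Intentionally unquoted so the two fields split into two arguments.
    format_limits $raw
}
```

For a container started with --cpus 2 and --memory 1g, `show_limits datakit-docker` prints `cpus=2 memory_mib=1024`.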

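If collector configuration files are prepared under /etc/conf/dir/conf.d on the host, DataKit reads them from conf.d/host-inputs-conf inside the container.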

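Log in to the Guance platform, click "Infrastructure" - "Containers", and check that a container named datakit-docker has been reported. Click it to view the container details.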

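Example scenario

This scenario shows how to use DataKit running in Docker to collect user access (RUM) data from an application.

Enable the RUM collector

In the mounted host directory /etc/conf/dir/conf.d, create a rum directory, then create a rum.conf file inside it with the following content: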

# {"version": "1.66.2", "desc": "do NOT edit this line"}                                                                                 
                                                                                                                                         
[[inputs.rum]]                                                                                                                           
  ## Agent endpoints, registered per version.
  ## Remove an endpoint from the list to stop listening on it.
  ## Default value set as below. DO NOT MODIFY THESE ENDPOINTS if not necessary.                                                         
  endpoints = ["/v1/write/rum"]                                                                                                          
                                                                                                                                         
  ## used to upload rum session replay.                                                                                                  
  session_replay_endpoints = ["/v1/write/rum/replay"]                                                                                    
                                                                                                                                         
  ## specify which metrics should be captured.                                                                                           
  measurements = ["view", "resource", "action", "long_task", "error", "telemetry"]                                                       
                                                                                                                                         
  ## Android command-line-tools HOME                                                                                                     
  android_cmdline_home = "/usr/local/datakit/data/rum/tools/cmdline-tools"                                                               
                                                                                                                                         
  ## proguard HOME                                                                                                                       
  proguard_home = "/usr/local/datakit/data/rum/tools/proguard"                                                                           
                                                                                                                                         
  ## android-ndk HOME                                                                                                                    
  ndk_home = "/usr/local/datakit/data/rum/tools/android-ndk"                                                                             
                                                                                                                                         
  ## atos or atosl bin path                                                                                                              
  ## for macOS datakit use the built-in tool atos default                                                                                
  ## for Linux there are several tools that can partially replace macOS atos,
  ## such as https://github.com/everettjf/atosl-rs                                                                                       
  atos_bin_path = "/usr/local/datakit/data/rum/tools/atosl"                                                                              
                                                                                                                                         
  # Provide a list to resolve CDN of your static resource.                                                                               
  # Below is the Datakit default built-in CDN list, you can uncomment that and change it to your cdn list,                               
  # it's a JSON array like: [{"domain": "CDN domain", "name": "CDN human readable name", "website": "CDN official website"},...],        
  # domain field value can contain '*' as wildcard, for example: "kunlun*.com",
  # it will match "kunluna.com", "kunlunab.com" and "kunlunabc.com" but not "kunlunab.c.com".                                            
  # cdn_map = '''                                                                                                                        
  # [                                                                                                                                    
  #   {"domain":"15cdn.com","name":"some-CDN-name","website":"https://www.15cdn.com"},                                                   
  #   {"domain":"tzcdn.cn","name":"some-CDN-name","website":"https://www.15cdn.com"}                                                     
  # ]                                                                                                                                    
  # '''                                                                                                                                  
                                                                                                                                         
  ## Threads config controls how many goroutines an agent could start to handle HTTP requests.
  ## buffer is the size of the worker channel's job buffer.
  ## threads is the total number of goroutines at running time.
  # [inputs.rum.threads]                                                                                                                 
  #   buffer = 100                                                                                                                       
  #   threads = 8                                                                                                                        
                                                                                                                                         
  ## Storage config sets a local storage space on the hard drive to cache trace data.
  ## path is the local file path used to cache data.                                                                                     
  ## capacity is total space size(MB) used to store data.                                                                                
  # [inputs.rum.storage]                                                                                                                 
  #   path = "./rum_storage"                                                                                                             
  #   capacity = 5120                                                                                                                    
                                                                                                                                         
  ## session_replay config is used to control Session Replay uploading behavior.                                                         
  ## cache_path set the disk directory where temporarily cache session replay data.                                                      
  ## cache_capacity_mb specify the max storage space (in MiB) that session replay cache can use.                                         
  ## clear_cache_on_start set whether we should clear all previous session replay cache on restarting Datakit.                           
  ## upload_workers set the count of session replay uploading workers.                                                                   
  ## send_timeout specify the http timeout when uploading session replay data to dataway.                                                
  ## send_retry_count set the max retry count when sending every session replay request.                                                 
  ## filter_rules sets the filtering rules; session replay data that matches them will be dropped.
  ## All rules are combined with OR, that is to say, data matching any one of them will be dropped.
  # [inputs.rum.session_replay]                                                                                                          
  #   cache_path = "/usr/local/datakit/cache/session_replay"                                                                             
  #   cache_capacity_mb = 20480                                                                                                          
  #   clear_cache_on_start = false                                                                                                       
  #   upload_workers = 16                                                                                                                
  #   send_timeout = "75s"                                                                                                               
  #   send_retry_count = 3                                                                                                               
  #   filter_rules = [                                                                                                                   
  #       "{ service = 'xxx' or version IN [ 'v1', 'v2'] }",                                                                             
  #       "{ app_id = 'yyy' and env = 'production' }"                                                                                    
  #   ]                                       
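
The directory and file can also be created from the shell. The following is a minimal sketch that writes only the uncommented settings from the listing above and leaves every commented option at its default. The function name write_rum_conf is illustrative, and the default path assumes the host directory mounted in the startup command.

```shell
#!/bin/sh
# Create <conf_dir>/rum/rum.conf with the uncommented RUM collector settings.
# $1: host directory mounted into the container (defaults to /etc/conf/dir/conf.d).
write_rum_conf() {
    conf_dir="${1:-/etc/conf/dir/conf.d}"
    mkdir -p "$conf_dir/rum" || return 1
    # The quoted 'EOF' keeps the content literal (no variable expansion).
    cat > "$conf_dir/rum/rum.conf" <<'EOF'
[[inputs.rum]]
  endpoints = ["/v1/write/rum"]
  session_replay_endpoints = ["/v1/write/rum/replay"]
  measurements = ["view", "resource", "action", "long_task", "error", "telemetry"]
  android_cmdline_home = "/usr/local/datakit/data/rum/tools/cmdline-tools"
  proguard_home = "/usr/local/datakit/data/rum/tools/proguard"
  ndk_home = "/usr/local/datakit/data/rum/tools/android-ndk"
  atos_bin_path = "/usr/local/datakit/data/rum/tools/atosl"
EOF
}
```

Calling `write_rum_conf` with no argument writes to /etc/conf/dir/conf.d/rum/rum.conf, which usually requires root privileges; passing another directory is convenient for a dry run.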

Then restart DataKit.

docker restart datakit-docker
docker ps

Enter the container and run datakit monitor to check that the mounted configuration was loaded.

docker exec -it datakit-docker /bin/bash
datakit monitor
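
The mount can also be checked without an interactive shell. This sketch assumes the container name datakit-docker and the in-container path that follows from the -v mapping in the startup command; the function name rum_conf_mounted is illustrative.

```shell
#!/bin/sh
# Return 0 if rum.conf is visible inside the container at the mounted path.
# $1: container name (defaults to datakit-docker).
rum_conf_mounted() {
    docker exec "${1:-datakit-docker}" \
        test -f /usr/local/datakit/conf.d/host-inputs-conf/rum/rum.conf
}
```

For example, `rum_conf_mounted && echo "rum.conf is mounted"` prints the message only when the file exists inside the container.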

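Connect the application

Log in to the Guance console, open "Real User Monitoring", and click "Create Application" in the upper-left corner to create a new application.

Select the Web application type, then select local environment deployment with the NPM integration method.

Fill in the configuration parameters as needed and click Create. The application then appears in the application list.

Next, copy the SDK snippet into your front-end project.

Start the application and visit it; the resulting data is reported to the Guance platform.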

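Results in Guance

Log in to the Guance console, click "Real User Monitoring" - "Application List", and then click the application you created.

Click the Explorer to query the collected user access data.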
