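Best Practices for Deploying DataKit in Docker

This article describes how to install DataKit in Docker.

Configure and start the DataKit container

Log in to the Guance platform, click 「Integrations」 - 「DataKit」 - 「Docker」, then copy the startup command from step 2 and adjust its parameters to match your environment.

Copy the startup command: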

sudo docker run \
    --hostname "$(hostname)" \
    --workdir /usr/local/datakit \
    -v "/etc/conf/dir/conf.d":"/usr/local/datakit/conf.d/host-inputs-conf" \
    -v "/":"/rootfs" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e ENV_DATAWAY="https://openway.guance.com?token=tkn_XXXX" \
    -e ENV_DEFAULT_ENABLED_INPUTS='cpu,disk,diskio,mem,swap,system,net,host_processes,hostobject,container,dk' \
    -e ENV_GLOBAL_HOST_TAGS="tag1=a1,tag2=a2" \
    -e ENV_HTTP_LISTEN="0.0.0.0:9529" \
    -e HOST_PROC="/rootfs/proc" \
    -e HOST_SYS="/rootfs/sys" \
    -e HOST_ETC="/rootfs/etc" \
    -e HOST_VAR="/rootfs/var" \
    -e HOST_RUN="/rootfs/run" \
    -e HOST_DEV="/rootfs/dev" \
    -e HOST_ROOT="/rootfs" \
    --cpus 2 \
    --memory 1g \
    --privileged \
    --publish 9529:9529 \
    --name datakit-docker \
    -d \
    pubrepo.guance.com/datakit/datakit:1.66.2

After the container starts, check whether it is running:

docker ps

If datakit-docker is listed with an Up status, the container started successfully.
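The same check can be scripted instead of read by eye. The following is a minimal sketch, not part of the official instructions: the helper name container_is_up is hypothetical, and it only assumes the container name datakit-docker used in the startup command.

```shell
# Hypothetical helper: succeeds if the given name appears, as an exact
# whole-line match, in a list of container names read from stdin.
container_is_up() {
    grep -Fxq -- "$1"
}

# Usage (requires Docker; prints the names of running containers, one per line):
#   docker ps --format '{{.Names}}' | container_is_up datakit-docker && echo "running"
```

Matching the whole line avoids false positives from containers whose names merely start with datakit-docker.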

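Startup parameters:

  • --hostname: uses the host machine's hostname as the hostname DataKit runs under. To run several DataKit instances on the same host, add a suffix, for example --hostname "$(hostname)-dk1"
  • --workdir: sets the container's working directory
  • -v: mounts host files and directories into the container:
    • DataKit has many configuration files. You can prepare them on the host and mount the whole directory into the container in one go with -v (inside the container it appears as the conf.d/host-inputs-conf directory)
    • The host root directory is mounted into DataKit so that it can read host information (such as the files under /proc), which the collectors enabled by default rely on
    • docker.sock is mounted into the DataKit container so that the container collector can gather data. The location of this file may differ between hosts, so adjust the path to match yours
  • -e: environment variables that configure DataKit at run time. They work the same way as in a DaemonSet deployment
    • ENV_DATAWAY: paste your token into the value of this variable, after token=
  • --publish: lets external programs send Trace and other data to the DataKit container. Here DataKit's HTTP port is mapped to port 9529 on the host; keep this port in mind when you configure the address that trace data is sent to
  • --name: sets the Docker container name; without it, Docker generates a random name
  • --cpus, --memory: limit this DataKit instance to 2 CPU cores and 1 GiB of memory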
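When several DataKit instances run on one host, the --hostname, --name and --publish values must differ per instance. The sketch below shows one way to derive them. The helper name instance_args, the datakit-docker-N naming and the "9529 + N - 1" host-port scheme are all assumptions made for illustration; only the -dkN hostname suffix comes from the parameter notes above.

```shell
# Hypothetical helper: print per-instance flags for the N-th DataKit container.
#   $1 = instance number (1, 2, ...)
#   $2 = base hostname (optional, defaults to the output of `hostname`)
instance_args() {
    n="$1"
    base_host="${2:-$(hostname)}"
    # Assumed scheme: instance 1 -> host port 9529, instance 2 -> 9530, ...
    host_port=$((9529 + n - 1))
    printf '%s\n' "--hostname ${base_host}-dk${n} --name datakit-docker-${n} --publish ${host_port}:9529"
}

# Usage: inspect the flags, then substitute them into the docker run command.
#   instance_args 2
```

The container-side port stays 9529 because ENV_HTTP_LISTEN is unchanged inside each container; only the host-side port needs to be unique.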

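Suppose we have configured the following collectors under the /host/conf/dir directory:

Log in to the Guance platform, click 「Infrastructure」 - 「Containers」, check whether the container named datakit-docker has been reported, and click it to view the container details.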

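Scenario demo

How to use DataKit running in Docker to collect Real User Monitoring (RUM) data from an application.

Enable the RUM collector

In the mounted directory /etc/conf/dir/conf.d, create a rum directory, then create a rum.conf file inside it with the following content: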

# {"version": "1.66.2", "desc": "do NOT edit this line"}                                                                                 
                                                                                                                                         
[[inputs.rum]]                                                                                                                           
  ## profile Agent endpoints register by version respectively.                                                                           
  ## Endpoints can be skipped listen by remove them from the list.                                                                       
  ## Default value set as below. DO NOT MODIFY THESE ENDPOINTS if not necessary.                                                         
  endpoints = ["/v1/write/rum"]                                                                                                          
                                                                                                                                         
  ## used to upload rum session replay.                                                                                                  
  session_replay_endpoints = ["/v1/write/rum/replay"]                                                                                    
                                                                                                                                         
  ## specify which metrics should be captured.                                                                                           
  measurements = ["view", "resource", "action", "long_task", "error", "telemetry"]                                                       
                                                                                                                                         
  ## Android command-line-tools HOME                                                                                                     
  android_cmdline_home = "/usr/local/datakit/data/rum/tools/cmdline-tools"                                                               
                                                                                                                                         
  ## proguard HOME                                                                                                                       
  proguard_home = "/usr/local/datakit/data/rum/tools/proguard"                                                                           
                                                                                                                                         
  ## android-ndk HOME                                                                                                                    
  ndk_home = "/usr/local/datakit/data/rum/tools/android-ndk"                                                                             
                                                                                                                                         
  ## atos or atosl bin path                                                                                                              
  ## for macOS datakit use the built-in tool atos default                                                                                
  ## for Linux there are several tools that can be used to instead of macOS atos partially,                                              
  ## such as https://github.com/everettjf/atosl-rs                                                                                       
  atos_bin_path = "/usr/local/datakit/data/rum/tools/atosl"                                                                              
                                                                                                                                         
  # Provide a list to resolve CDN of your static resource.                                                                               
  # Below is the Datakit default built-in CDN list, you can uncomment that and change it to your cdn list,                               
  # it's a JSON array like: [{"domain": "CDN domain", "name": "CDN human readable name", "website": "CDN official website"},...],        
  # domain field value can contains '*' as wildcard, for example: "kunlun*.com",                                                         
  # it will match "kunluna.com", "kunlunab.com" and "kunlunabc.com" but not "kunlunab.c.com".                                            
  # cdn_map = '''                                                                                                                        
  # [                                                                                                                                    
  #   {"domain":"15cdn.com","name":"some-CDN-name","website":"https://www.15cdn.com"},                                                   
  #   {"domain":"tzcdn.cn","name":"some-CDN-name","website":"https://www.15cdn.com"}                                                     
  # ]                                                                                                                                    
  # '''                                                                                                                                  
                                                                                                                                         
  ## Threads config controls how many goroutines an agent could start to handle HTTP request.
  ## buffer is the size of jobs' buffering of worker channel.
  ## threads is the total number of goroutines at running time.
  # [inputs.rum.threads]                                                                                                                 
  #   buffer = 100                                                                                                                       
  #   threads = 8                                                                                                                        
                                                                                                                                         
  ## Storage config a local storage space in hard drive to cache trace data.
  ## path is the local file path used to cache data.                                                                                     
  ## capacity is total space size(MB) used to store data.                                                                                
  # [inputs.rum.storage]                                                                                                                 
  #   path = "./rum_storage"                                                                                                             
  #   capacity = 5120                                                                                                                    
                                                                                                                                         
  ## session_replay config is used to control Session Replay uploading behavior.                                                         
  ## cache_path set the disk directory where temporarily cache session replay data.                                                      
  ## cache_capacity_mb specify the max storage space (in MiB) that session replay cache can use.                                         
  ## clear_cache_on_start set whether we should clear all previous session replay cache on restarting Datakit.                           
  ## upload_workers set the count of session replay uploading workers.                                                                   
  ## send_timeout specify the http timeout when uploading session replay data to dataway.                                                
  ## send_retry_count set the max retry count when sending every session replay request.                                                 
  ## filter_rules set the filtering rules that matched session replay data will be dropped,
  ## all rules are of relationship OR, that is to say, the data match any one of them will be dropped.
  # [inputs.rum.session_replay]                                                                                                          
  #   cache_path = "/usr/local/datakit/cache/session_replay"                                                                             
  #   cache_capacity_mb = 20480                                                                                                          
  #   clear_cache_on_start = false                                                                                                       
  #   upload_workers = 16                                                                                                                
  #   send_timeout = "75s"                                                                                                               
  #   send_retry_count = 3                                                                                                               
  #   filter_rules = [                                                                                                                   
  #       "{ service = 'xxx' or version IN [ 'v1', 'v2'] }",                                                                             
  #       "{ app_id = 'yyy' and env = 'production' }"                                                                                    
  #   ]                                       
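Creating the directory and file can also be scripted. The sketch below writes a minimal rum.conf that keeps only the uncommented endpoint and measurement settings from the sample above. The helper name write_rum_conf is hypothetical, and whether a trimmed file is enough for your setup is an assumption; use the full sample if you need the Android, CDN or session replay options.

```shell
# Hypothetical helper: create <conf_dir>/rum/rum.conf with a minimal RUM config.
#   $1 = host directory that is mounted to conf.d/host-inputs-conf
write_rum_conf() {
    conf_dir="$1"
    mkdir -p "$conf_dir/rum" || return 1
    # Quoted heredoc delimiter: the content is written literally, no expansion.
    cat > "$conf_dir/rum/rum.conf" <<'EOF'
[[inputs.rum]]
  endpoints = ["/v1/write/rum"]
  session_replay_endpoints = ["/v1/write/rum/replay"]
  measurements = ["view", "resource", "action", "long_task", "error", "telemetry"]
EOF
}

# Usage (run as a user with write access to the directory):
#   write_rum_conf /etc/conf/dir/conf.d
```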

Then restart DataKit.

docker restart datakit-docker
docker ps

Enter the container and check whether the mount succeeded; if it did, the rum collector is listed in the datakit monitor output.

docker exec -it datakit-docker /bin/bash
datakit monitor
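For a non-interactive check, the mounted file can be probed directly. This is a sketch under assumptions: the helper name rum_conf_mounted is hypothetical, the path follows from the -v mapping in the startup command, and it assumes ls is available in the DataKit image.

```shell
# Hypothetical helper: succeeds if rum.conf is visible inside the container.
#   $1 = container name (optional, defaults to datakit-docker)
rum_conf_mounted() {
    container="${1:-datakit-docker}"
    docker exec "$container" \
        ls /usr/local/datakit/conf.d/host-inputs-conf/rum/rum.conf >/dev/null 2>&1
}

# Usage (requires Docker and a running container):
#   rum_conf_mounted && echo "rum.conf is mounted" || echo "rum.conf not found"
```

This only confirms that the file is present in the container; datakit monitor remains the way to confirm that the collector is actually running.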

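Connect the application

Log in to the Guance console, go to 「Real User Monitoring」, and click 「Create Application」 in the upper-left corner to start creating a new application.

Select Web as the application type, then choose local deployment with the NPM integration method.

Fill in the configuration parameters as needed and click Create. The application then appears in the application list.

Next, copy the SDK snippet into your front-end project.

Start the application and visit it; the resulting data is reported to the Guance platform.

Results in Guance

Log in to the Guance console, click 「Real User Monitoring」 - 「Application List」, then click the application you created.

Open the explorer to query the collected user access data.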
