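Deploying the Latest Apache Druid 32.0.0, a Big-Data Time-Series Database, on Win10 with Docker Desktop
Preface
A common scenario in big-data analytics is time-series data. In short, once a data point is produced it is never modified, and as time passes every point in time gets a new state value. The volume of this kind of data is often staggering, for example the raw monitoring data from sensors: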
https://lizhiyong.blog.csdn.net/article/details/114898620
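A single, simple acceleration sensor produces roughly 3.1 billion records a year! If manufacturing sensor data is pushed straight to the edge-computing gateway without first being preprocessed by a PLC or another lower-level controller, the load is enormous even over MQTT!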
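As a rough sanity check on the yearly figure quoted above (about 3.1 billion), the arithmetic can be sketched as follows. The 100 Hz sampling rate is an assumption of this sketch, not something the post states; it is simply a rate that lands on the same order of magnitude.

```python
# Back-of-the-envelope estimate of how many samples one sensor emits per year.
# ASSUMPTION: a fixed 100 Hz sampling rate (not stated in the original post).

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # non-leap year


def samples_per_year(sampling_rate_hz: int) -> int:
    """Return the number of samples emitted in one non-leap year."""
    return sampling_rate_hz * SECONDS_PER_DAY * DAYS_PER_YEAR


if __name__ == "__main__":
    total = samples_per_year(100)
    print(f"{total:,} samples per year")  # 3,153,600,000
```

Multiply that by the number of sensors on a production line and the need for preprocessing or aggregation becomes obvious.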
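Similarly, there is the raw monitoring data of servers, collected by common tools such as Prometheus and Zabbix. With many clusters there are many monitored items, and once you add the monitoring that may be deployed inside containers and virtual machines, the data volume becomes very considerable. Tens of billions of records per hour is nothing unusual!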
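However, storing all of the raw monitoring data is frighteningly expensive, and its information density is extremely low. Most of the time we only care about the complete set of recent hot data, for example to train models online, and having a person inspect thousands of records per second is unrealistic. In practice, simple per-second or per-minute statistics satisfy most analysis scenarios, and cold data older than one day has very little timeliness left.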
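This kind of scenario calls for a database that offers high throughput and pre-aggregation. After load testing Apache Druid, ClickHouse, and Kylin, I chose the first one. Specialized work should be handed to specialized components!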
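To make the pre-aggregation idea concrete, here is a minimal sketch of rolling raw events up into per-minute rows. It only illustrates the concept and is not how Druid implements rollup internally; the function name `rollup_by_minute` and the `(epoch_seconds, sensor_id, value)` tuple layout are hypothetical choices for this example.

```python
# Conceptual sketch of pre-aggregation (rollup): many raw events collapse
# into one row per (minute, sensor). Names and data layout are hypothetical.


def rollup_by_minute(events):
    """Aggregate (epoch_seconds, sensor_id, value) tuples per minute.

    Returns a dict keyed by (minute_start_epoch, sensor_id) whose values
    hold the count, sum, min and max of the raw values in that minute.
    """
    buckets = {}
    for ts, sensor_id, value in events:
        minute_start = ts - ts % 60  # truncate the timestamp to the minute
        key = (minute_start, sensor_id)
        agg = buckets.get(key)
        if agg is None:
            buckets[key] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return buckets


if __name__ == "__main__":
    raw = [(60, "acc-1", 1.0), (61, "acc-1", 3.0), (119, "acc-1", 2.0), (120, "acc-1", 5.0)]
    for key, agg in sorted(rollup_by_minute(raw).items()):
        print(key, agg)
```

Four raw events become two stored rows here; at thousands of events per second the reduction is what keeps storage cost manageable, at the price of losing the individual raw values.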
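Application developers who do not work on the engine internals or on custom forks should mostly focus on the API, the features, and the usage, and should not spend much effort on deployment! I had already set up Docker Desktop earlier: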
https://lizhiyong.blog.csdn.net/article/details/145580868
Today I will set up the latest Apache Druid on Win10 to play with.
Choosing a version
Official site:
http
https://druid.apache.org/
Note that this is not Druid, the database connection pool from Alibaba!

As of 2025-02-13, the latest Apache Druid release is 32.0.0.
Preparing the resources
See the official tutorial:
http
https://druid.apache.org/docs/latest/tutorials/docker
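The official site provides a tutorial that orchestrates the containers with docker-compose.yml. For a real-time component, plenty of memory is a must! Still, starting 8 containers (ZooKeeper + PostgreSQL + 6 Druid services), each using at most 7 GB of memory, is no big deal. Note that the compose file shown below actually defines five Druid services (coordinator, broker, historical, middlemanager, router), so seven containers are started in total.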
http
https://raw.githubusercontent.com/apache/druid/32.0.0/distribution/docker/docker-compose.yml
Downloading from that URL gives this resource file:
yaml
version: "2.2"

volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  router_var: {}
  druid_shared: {}

services:
  postgres:
    container_name: postgres
    image: postgres:latest
    ports:
      - "5432:5432"
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=FoolishPassword
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid

  # Need 3.5 or later for container nodes
  zookeeper:
    container_name: zookeeper
    image: zookeeper:3.5.10
    ports:
      - "2181:2181"
    environment:
      - ZOO_MY_ID=1

  coordinator:
    image: apache/druid:32.0.0
    container_name: coordinator
    volumes:
      - druid_shared:/opt/shared
      - coordinator_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "8081:8081"
    command:
      - coordinator
    env_file:
      - environment

  broker:
    image: apache/druid:32.0.0
    container_name: broker
    volumes:
      - broker_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8082:8082"
    command:
      - broker
    env_file:
      - environment

  historical:
    image: apache/druid:32.0.0
    container_name: historical
    volumes:
      - druid_shared:/opt/shared
      - historical_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8083:8083"
    command:
      - historical
    env_file:
      - environment

  middlemanager:
    image: apache/druid:32.0.0
    container_name: middlemanager
    volumes:
      - druid_shared:/opt/shared
      - middle_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8091:8091"
      - "8100-8105:8100-8105"
    command:
      - middleManager
    env_file:
      - environment

  router:
    image: apache/druid:32.0.0
    container_name: router
    volumes:
      - router_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "3012:8888"  # I changed the host port to 3012 so it does not occupy a useful port
    command:
      - router
    env_file:
      - environment
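
Because the compose file publishes quite a few host ports, it can help to check for conflicts before starting the stack. The following is a minimal sketch under stated assumptions: the `PORT_MAPPINGS` table is copied by hand from the file above, the helper names are hypothetical, and the parser only handles the two-part `HOST:CONTAINER` form used in that file.

```python
import socket

# Host-side port mappings copied by hand from the compose file above.
PORT_MAPPINGS = {
    "postgres": ["5432:5432"],
    "zookeeper": ["2181:2181"],
    "coordinator": ["8081:8081"],
    "broker": ["8082:8082"],
    "historical": ["8083:8083"],
    "middlemanager": ["8091:8091", "8100-8105:8100-8105"],
    "router": ["3012:8888"],
}


def host_ports(mapping):
    """Expand the host side of a 'HOST:CONTAINER' mapping into port numbers."""
    host_part = mapping.split(":")[0]
    if "-" in host_part:  # a range such as "8100-8105"
        start, end = host_part.split("-")
        return list(range(int(start), int(end) + 1))
    return [int(host_part)]


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) != 0


def busy_ports():
    """List (service, port) pairs whose host port is already taken."""
    return [
        (service, port)
        for service, mappings in PORT_MAPPINGS.items()
        for mapping in mappings
        for port in host_ports(mapping)
        if not port_is_free(port)
    ]

# Usage: run `print(busy_ports())` before `docker compose up -d`;
# an empty list means none of the published host ports is in use.
```

A connection probe like this only detects ports that something is actively listening on, so treat an empty result as a good sign rather than a guarantee.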
Another page on the official site covers the configuration:
http
https://druid.apache.org/docs/latest/configuration/
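For a personal sandbox you can leave these runtime settings unchanged for now. Since everything runs in containers, redeploying later is very easy!
We also need: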
http
https://raw.githubusercontent.com/apache/druid/32.0.0/distribution/docker/environment
Save it as the second configuration file, named environment:
yaml
# Java tuning
#DRUID_XMX=1g
#DRUID_XMS=1g
#DRUID_MAXNEWSIZE=250m
#DRUID_NEWSIZE=250m
#DRUID_MAXDIRECTMEMORYSIZE=6172m
DRUID_SINGLE_NODE_CONF=micro-quickstart
druid_emitter_logging_logLevel=debug
druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-multi-stage-query"]
druid_zk_service_host=zookeeper
druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=FoolishPassword
druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB
druid_storage_type=local
druid_storage_storageDirectory=/opt/shared/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs
druid_processing_numThreads=2
druid_processing_numMergeBuffers=2
DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>
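
The lowercase `druid_...` names in this file correspond to ordinary Druid runtime properties. As far as I understand the official image's startup script, it turns each variable that starts with `druid_` into a property by replacing underscores with dots. The sketch below mimics that mapping in simplified form; the function names are hypothetical, and it does not model any escaping rules or the uppercase `DRUID_*` variables, which the image handles separately (JVM sizing, the configuration profile, Log4j).

```python
# Simplified model of how `druid_*` environment variables map to Druid
# runtime properties. ASSUMPTION: plain underscore-to-dot replacement;
# the real startup script may apply additional rules.


def env_to_property(name):
    """Map e.g. 'druid_zk_service_host' to 'druid.zk.service.host'.

    Returns None for variables that do not start with 'druid_'.
    """
    if not name.startswith("druid_"):
        return None
    return name.replace("_", ".")


def parse_environment(lines):
    """Turn KEY=VALUE lines into a {property: value} dict.

    Blank lines, '#' comments and non-'druid_' variables are skipped.
    """
    properties = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may contain '=' themselves
        prop = env_to_property(key)
        if prop is not None:
            properties[prop] = value
    return properties


if __name__ == "__main__":
    sample = [
        "# Java tuning",
        "DRUID_SINGLE_NODE_CONF=micro-quickstart",
        "druid_zk_service_host=zookeeper",
        "druid_storage_type=local",
    ]
    for prop, value in parse_environment(sample).items():
        print(f"{prop}={value}")
```

Reading the file this way makes it easier to look each setting up on the configuration reference page, where properties are listed in their dotted form.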
The deployment files are small, but they contain everything that is needed!
Deployment
cmd
PS C:\Users\zhiyong> cd E:\dockerData\volume\druid1
PS E:\dockerData\volume\druid1> ls

    Directory: E:\dockerData\volume\druid1

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        2025-02-13     23:26           2980 docker-compose.yml
-a----        2025-02-13     23:33           1576 environment

PS E:\dockerData\volume\druid1> docker compose up -d
time="2025-02-13T23:34:39+08:00" level=warning msg="E:\\dockerData\\volume\\druid1\\docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
[+] Running 72/15
✔ router Pulled 230.7s
✔ coordinator Pulled 230.7s
✔ postgres Pulled 181.0s
✔ historical Pulled 230.7s
✔ broker Pulled 230.7s
✔ middlemanager Pulled 230.7s
✔ zookeeper Pulled 85.7s
[+] Running 15/15
✔ Network druid1_default Created 0.1s
✔ Volume "druid1_druid_shared" Created 0.0s
✔ Volume "druid1_historical_var" Created 0.0s
✔ Volume "druid1_middle_var" Created 0.0s
✔ Volume "druid1_router_var" Created 0.0s
✔ Volume "druid1_metadata_data" Created 0.0s
✔ Volume "druid1_coordinator_var" Created 0.0s
✔ Volume "druid1_broker_var" Created 0.0s
✔ Container postgres Started 2.4s
✔ Container zookeeper Started 2.4s
✔ Container coordinator Started 1.6s
✔ Container router Started 2.5s
✔ Container broker Started 2.3s
✔ Container historical Started 2.5s
✔ Container middlemanager Started 2.8s
PS E:\dockerData\volume\druid1>
Once the images have been pulled, the containers come up very quickly.

Well, well... it also exposes the ports of the other components along the way...
So we also get a PostgreSQL and a ZooKeeper **for free**!
Verification
http
http://localhost:3012/unified-console.html#
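Besides opening the console in a browser, the router can be probed from a script. Druid processes expose a `/status/health` endpoint that answers with `true` when the process is up. The sketch below assumes the router is published on host port 3012 as in the compose file above; the helper names are hypothetical.

```python
import http.client
import json
import urllib.request


def health_url(host="localhost", port=3012):
    """Build the health-check URL for a Druid process."""
    return f"http://{host}:{port}/status/health"


def is_healthy(host="localhost", port=3012, timeout=3.0):
    """Return True only if the process answers 200 with a JSON `true` body."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
            return resp.status == 200 and json.loads(body) is True
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error or a non-JSON body:
        # treat all of them as "not healthy (yet)".
        return False

# Usage: call `is_healthy()` in a retry loop after `docker compose up -d`;
# the services may need some time after the containers report "Started".
```

The same check works for the other services by passing their published ports, for example 8081 for the coordinator.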

Very nice: we now have the latest Apache Druid 32.0.0 up and running!
Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/145622903
