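systemd, systemctl, and journalctl Cheat Sheet: Service Management and Log Troubleshooting

Audience: Linux operations engineers, SREs, DevOps
Use cases: a service will not start, restarts repeatedly, or starts slowly; quickly checking logs on the spot (systemd-journald)
Coverage: systemctl (service management), journalctl (log queries), systemd-analyze (boot performance), custom services, timers as a cron replacement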

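1. First, memorize the shortest troubleshooting path

When a service misbehaves: run status first to see its state and most recent log lines → narrow down with journalctl -u → inspect dependencies and the startup chain if needed → only then change configuration or roll back.

# 1) Check the service state (includes a recent log excerpt, the main PID, and the exit code)
systemctl status <service> -l --no-pager

# 2) Fetch the last 200 log lines for the service
journalctl -u <service> -n 200 --no-pager

# 3) Follow new log entries live while the problem is happening
journalctl -u <service> -f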
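
The first two steps above can be bundled into one function for repeated use. This is a minimal sketch, not a standard tool: the name `triage` is hypothetical, and it assumes bash with systemctl and journalctl on the PATH.

```shell
# Hypothetical helper: run the first triage steps for one unit, in order.
triage() {
  local unit=${1:?usage: triage <service>}
  # status exits non-zero for inactive or failed units; that is expected here.
  systemctl status "$unit" -l --no-pager || true
  echo "----- last 200 journal lines for $unit -----"
  journalctl -u "$unit" -n 200 --no-pager
}
# Usage: triage nginx.service
```

The live follow step (journalctl -u <service> -f) is left out on purpose because it blocks until interrupted.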

2. systemctl: service management (90% of day-to-day operations)

1) Start / stop / restart / reload

systemctl start <service>
systemctl stop <service>
systemctl restart <service>

# Reload configuration only (works only if the service supports reload)
systemctl reload <service>

# Reload if the unit supports it, otherwise fall back to a restart
systemctl reload-or-restart <service>

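2) Start at boot: enable / disable

# Enable at boot (creates symlinks for the targets named in the unit's [Install] section)
systemctl enable <service>

# Disable start at boot
systemctl disable <service>

# Enable and start immediately
systemctl enable --now <service>

# Disable and stop immediately
systemctl disable --now <service>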

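3) Check whether a service is really running

# Prints active/inactive/failed, etc. (script-friendly: exit code is 0 only when active)
systemctl is-active <service>

# Prints enabled/disabled/static, etc. (whether it starts at boot)
systemctl is-enabled <service>

# Whether the unit is currently in the failed state (exit code 0 if it is)
systemctl is-failed <service>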
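
Because these subcommands report their answer through the exit code, a script can branch on them directly. The following is a minimal sketch under that assumption; the function name `check_service` and its return codes are made up for illustration.

```shell
# Hypothetical helper: summarize a unit's state in one line for scripts.
check_service() {
  local unit=${1:?usage: check_service <service>}
  if systemctl is-active --quiet "$unit"; then
    echo "OK: $unit is active"
  elif systemctl is-failed --quiet "$unit"; then
    echo "FAILED: $unit is in the failed state"
    return 2
  else
    echo "DOWN: $unit is not active"
    return 1
  fi
}
# Usage: check_service sshd.service || echo "needs attention"
```

The --quiet flag suppresses the printed state so that only the exit code is used.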

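4) Inspect configuration and dependencies (find out why a service will not start or starts slowly)

# Show the unit file contents (including drop-in overrides)
systemctl cat <service>

# Show the unit file path and the drop-in files in effect
systemctl show -p FragmentPath,DropInPaths <service>

# Edit (creates a drop-in override, which is preferable to changing the original file)
systemctl edit <service>

# Dependencies (units this service requires or wants)
systemctl list-dependencies <service>

# Reverse dependencies (units that depend on this service)
systemctl list-dependencies --reverse <service>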
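
For reference, a drop-in of the kind systemctl edit creates is a small file under <unit>.d/. The sketch below writes one by hand; the function name and the optional base directory argument are assumptions made so that it can be tried against a scratch directory first.

```shell
# Hypothetical helper: write a drop-in that replaces ExecStart for a unit.
# base_dir defaults to /etc/systemd/system; pass another directory to experiment safely.
write_execstart_override() {
  local unit=${1:?unit name required}
  local new_cmd=${2:?new ExecStart command required}
  local base_dir=${3:-/etc/systemd/system}
  local dropin_dir="$base_dir/$unit.d"

  mkdir -p "$dropin_dir"
  # The empty "ExecStart=" line clears the command inherited from the original unit;
  # without it, a non-oneshot service would end up with two ExecStart entries and be refused.
  cat > "$dropin_dir/override.conf" <<EOF
[Service]
ExecStart=
ExecStart=$new_cmd
EOF
  echo "$dropin_dir/override.conf"
}
# Afterwards: systemctl daemon-reload && systemctl restart <unit>
```

Keeping changes in a drop-in means a package upgrade can replace the original unit file without losing the local override.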

5) List services (quickly pick out failures)

# List loaded service units (add --all to include inactive ones)
systemctl list-units --type=service

# Only failed units
systemctl --failed

# Only running services
systemctl list-units --type=service --state=running

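3. journalctl: log queries (the core of on-the-spot diagnosis)

journald stores logs in a binary format, and journalctl is the standard way to read them. Compared with digging through individual files under /var/log, it gives you one interface and much stronger filtering.

1) Follow live (the equivalent of tail -f)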

journalctl -f
journalctl -u <service> -f

2) Filter by service (the most common case)

journalctl -u <service>
journalctl -u <service> -n 200 --no-pager
journalctl -u <service> --since "1 hour ago"
journalctl -u <service> --since "2026-02-05 10:00:00" --until "2026-02-05 10:30:00"

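3) Filter by priority (errors and warnings only)

Priorities, from most to least severe: emerg(0) alert(1) crit(2) err(3) warning(4) notice(5) info(6) debug(7)

# err and more severe (includes crit/alert/emerg), current boot only
journalctl -p err -b

# warning and more severe for one service, since midnight
journalctl -u <service> -p warning --since today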
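
The same filters can feed a quick count, for example to compare error volume before and after a change. This is a rough sketch: the name `count_errors` is hypothetical, and it counts output lines, so a multi-line message is counted more than once.

```shell
# Hypothetical helper: count journal lines at priority err or worse
# for one unit during the current boot.
count_errors() {
  local unit=${1:?usage: count_errors <service>}
  # -q drops informational lines, -o cat prints only the message text,
  # and awk reports how many lines came through.
  journalctl -u "$unit" -p err -b -q --no-pager -o cat | awk 'END { print NR }'
}
# Usage: count_errors nginx.service
```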

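4) Filter by boot: track down problems that appear only after a reboot

# List recorded boots (shows boot IDs and time ranges)
journalctl --list-boots

# Current boot
journalctl -b

# Previous boot (-1), the one before that (-2), ...
# Earlier boots are available only if the journal is stored persistently (e.g. /var/log/journal exists)
journalctl -b -1

# Logs for one service during the previous boot
journalctl -b -1 -u <service>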

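5) Kernel log (replaces or complements dmesg)

# Kernel messages only (-k implies -b, so this already means the current boot)
journalctl -k

# Kernel log for the current boot, stated explicitly
journalctl -k -b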

6) Filter by structured fields (more precise matching)

# By PID
journalctl _PID=1234

# By UID (entries logged by that user's processes)
journalctl _UID=1000 --since today

# By executable path (entries logged by that binary)
journalctl /usr/sbin/sshd

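7) Output formats (for copying and for scripts)

# No pager (good for copy/paste and scripts)
journalctl -u <service> --no-pager

# ISO 8601 timestamps
journalctl -u <service> -o short-iso

# Pretty-printed JSON (use -o json for one entry per line when feeding jq or a log collector)
journalctl -u <service> -o json-pretty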

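4. Log maintenance: keeping the journal size under control (a common cause of full disks)

journald can vacuum archived journal files by size, age, or file count. In production, confirm the retention policy first, then clean up.

# Show current disk usage
journalctl --disk-usage

# Remove archived journal files until usage drops below 500 MB
journalctl --vacuum-size=500M

# Remove archived journal files whose entries are all older than 2 weeks
journalctl --vacuum-time=2weeks

# Keep at most 10 archived journal files
journalctl --vacuum-files=10

Long-term policy (configuration): edit /etc/systemd/journald.conf (for example SystemMaxUse=, SystemKeepFree=, MaxRetentionSec=), then restart journald: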

systemctl restart systemd-journald
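
As an alternative to editing journald.conf in place, the same settings can live in a drop-in under journald.conf.d. The sketch below is illustrative only: the function name, the file name retention.conf, and the two values (chosen to mirror the vacuum examples above) are assumptions, not recommendations.

```shell
# Hypothetical helper: write journald retention limits as a drop-in file.
# conf_dir defaults to /etc/systemd/journald.conf.d; the values are examples only.
write_journald_limits() {
  local conf_dir=${1:-/etc/systemd/journald.conf.d}
  mkdir -p "$conf_dir"
  cat > "$conf_dir/retention.conf" <<'EOF'
[Journal]
SystemMaxUse=500M
MaxRetentionSec=2week
EOF
  echo "$conf_dir/retention.conf"
}
# Afterwards (as root): systemctl restart systemd-journald
```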

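5. Advanced: finding out why boot is slow (systemd-analyze)

1) Total boot time

systemd-analyze

2) Which units are slowest (sorted by startup time)

systemd-analyze blame

Note: blame shows which units took a long time to start, but a slow unit is not necessarily on the blocking path. To find what actually holds up the boot, look at critical-chain.

3) The critical chain (which unit blocked which)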

systemd-analyze critical-chain
systemd-analyze critical-chain <target>

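6. Practical template: a custom systemd service (minimal working version)

Example: turn a background program into a standard service that starts at boot, restarts automatically on failure, logs to the journal, and works with status and journalctl out of the box.

1) Create the unit file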

sudo tee /etc/systemd/system/myapp.service >/dev/null <<'EOF'
[Unit]
Description=MyApp Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/myapp --config /etc/myapp/config.yaml
Restart=on-failure
RestartSec=3
# Recommended: set an explicit open-file limit so a descriptor leak cannot drag down the whole host
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

2) Load and start

sudo systemctl daemon-reload
sudo systemctl enable --now myapp
systemctl status myapp -l --no-pager
journalctl -u myapp -n 200 --no-pager
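
With Restart=on-failure in place, a crashing program shows up as a growing restart counter rather than a stopped unit. The sketch below reads that counter; it assumes a systemd version that exposes the NRestarts property, and the function name and default threshold of 5 are arbitrary choices for illustration.

```shell
# Hypothetical helper: warn when a unit has been auto-restarted too often.
restart_loop_check() {
  local unit=${1:?usage: restart_loop_check <service> [threshold]}
  local threshold=${2:-5}
  local restarts
  # NRestarts is the number of automatic restarts systemd has performed for the unit.
  restarts=$(systemctl show -p NRestarts --value "$unit")
  if [ "${restarts:-0}" -ge "$threshold" ]; then
    echo "WARN: $unit restarted $restarts times (threshold $threshold)"
    return 1
  fi
  echo "OK: $unit restarted ${restarts:-0} times"
}
# Usage: restart_loop_check myapp.service 3
```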

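7. Timer vs. cron: observable scheduled jobs with systemd timers

Cron pain points: logs and run status are scattered and hard to observe in one place.
Timer advantages: runs show up in systemctl and journalctl by default, failures are traceable, and a job can depend on conditions such as the network or a mount.

1) Create the job: myjob.service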

sudo tee /etc/systemd/system/myjob.service >/dev/null <<'EOF'
[Unit]
Description=My periodic job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/myjob.sh
EOF

2) Create the timer: myjob.timer (every 15 minutes)

sudo tee /etc/systemd/system/myjob.timer >/dev/null <<'EOF'
[Unit]
Description=Run myjob every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
EOF

3) Enable and observe

sudo systemctl daemon-reload
sudo systemctl enable --now myjob.timer

# List all timers
systemctl list-timers --all

# Show the job's logs
journalctl -u myjob.service -n 200 --no-pager
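
Since the job is an ordinary unit, the outcome of its last run can be read from unit properties instead of parsing logs. A minimal sketch, assuming the Result and ExecMainStatus properties that systemctl show reports for services; the function name is hypothetical.

```shell
# Hypothetical helper: report how the most recent run of a job unit ended.
job_last_result() {
  local unit=${1:-myjob.service}
  local result status
  result=$(systemctl show -p Result --value "$unit")
  status=$(systemctl show -p ExecMainStatus --value "$unit")
  echo "$unit: result=$result exit_status=$status"
  # Succeed only when systemd recorded the last run as successful.
  [ "$result" = "success" ]
}
# Usage: job_last_result myjob.service || echo "last run failed"
```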

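8. Common pitfalls (straight from production)

Unit changes do not take effect: you forgot systemctl daemon-reload.
Enabled but not started at boot: the unit is static, meaning it has no [Install] section (or no WantedBy=) for enable to act on.
Service exits right after starting: Type= does not match the program's behavior (oneshot/forking/simple), for example a program that daemonizes itself under Type=simple, or one that stays in the foreground under Type=forking.
No logs in the journal: the service does not write to stdout/stderr, you lack permission to read the journal, or the field filter is wrong (verify with journalctl -u first).
Misleading slow-boot diagnosis: blame alone is not enough; always cross-check with critical-chain.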