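The OOM (out-of-memory) mechanism kills processes that use a lot of memory and have low priority so that the operating system kernel can keep running. If the OOM killer is turned off, the kernel may crash instead.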
https://manpages.ubuntu.com/manpages/jammy/en/man1/choom.1.html
Linux kernel uses the badness heuristic to select which process gets killed in out of memory conditions.
Two important parameters are involved:
oom_score
Put simply, oom_score = memory used / total memory * 1000. This is the badness score, and the process with the highest score is the one that gets killed.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
oom_score_adj
The adjust score value is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 to +1000. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, -1000, is equivalent to disabling oom killing entirely for that task since it will always report a badness score of 0.
Setting an adjust score value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as scoring against the task.
In short: the smaller the oom_score_adj value, the less likely the process is to be killed, and at -1000 the process is never killed by the OOM killer.
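As a rough illustration of how the two values combine, the sketch below estimates a score from used memory, allowed memory and oom_score_adj. It follows the simplified formula in the text above (used / allowed * 1000, plus the adjustment, clamped to 0..1000) and is not the kernel's exact calculation. The function name badness_estimate is made up for this example.

```shell
# badness_estimate USED_KB ALLOWED_KB [ADJ]
# Simplified model from the text: used/allowed*1000 + oom_score_adj,
# clamped to 0..1000. This is an approximation, not the kernel's arithmetic.
badness_estimate() {
    used_kb=$1
    allowed_kb=$2
    adj=${3:-0}
    # -1000 disables OOM killing for the task, so the score is always 0.
    if [ "$adj" -le -1000 ]; then
        echo 0
        return
    fi
    score=$(( used_kb * 1000 / allowed_kb + adj ))
    if [ "$score" -lt 0 ]; then score=0; fi
    if [ "$score" -gt 1000 ]; then score=1000; fi
    echo "$score"
}

badness_estimate 16384 32768         # half of allowed memory        -> 500
badness_estimate 16384 32768 -500    # same usage, adjusted by -500  -> 0
badness_estimate 32768 32768 -1000   # -1000 always reports          -> 0
```

The value the kernel reports in /proc/PID/oom_score can differ from this estimate, because the real calculation also counts swap and page-table usage.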
The OOM killer exists to keep the operating system kernel running.
https://www.oracle.com/technical-resources/articles/it-infrastructure/dev-oom-killer.html
The Linux kernel allocates memory upon the demand of the applications running on the system. Because many applications allocate their memory up front and often don't utilize the memory allocated, the kernel was designed with the ability to over-commit memory to make memory usage more efficient. This over-commit model allows the kernel to allocate more memory than it actually has physically available. If a process actually utilizes the memory it was allocated, the kernel then provides these resources to the application. When too many applications start utilizing the memory they were allocated, the over-commit model sometimes becomes problematic and the kernel must start killing processes in order to stay operational. The mechanism the kernel uses to recover memory on the system is referred to as the out-of-memory killer or OOM killer for short.
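To check whether the OOM killer is in effect on a server, run sysctl -a |grep panic_on_oom. vm.panic_on_oom=0 (the default) means that the kernel invokes the OOM killer when memory runs out. To turn it off, run vim /etc/sysctl.conf, set vm.panic_on_oom = 1, and then run sysctl -p. Note that with the value 1 the kernel panics on OOM instead of killing a process, so "disabling" the OOM killer this way trades a killed process for a crashed system.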
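A minimal sketch for reading the current setting and printing what it means. The meanings follow the kernel's sysctl documentation for vm.panic_on_oom; the function name describe_panic_on_oom is an assumption made for this example.

```shell
# describe_panic_on_oom VALUE
# Maps a vm.panic_on_oom value to a short description.
describe_panic_on_oom() {
    case "$1" in
        0) echo "OOM killer enabled: the kernel kills a process on OOM" ;;
        1) echo "kernel panics on system-wide OOM" ;;
        2) echo "kernel always panics on OOM" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Read the live setting when /proc is available (Linux only).
if [ -r /proc/sys/vm/panic_on_oom ]; then
    describe_panic_on_oom "$(cat /proc/sys/vm/panic_on_oom)"
fi
```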
An example of PostgreSQL hitting OOM on Ubuntu
Memory and swap on the server itself:
bash
root@PGD001:~# free -m
total used free shared buff/cache available
Mem: 32058 19815 2984 4956 9258 6822
Swap: 4095 12 4083
dmesg shows the following:
bash
root@PGD001:~# dmesg -T |grep postgres
[Wed Nov 15 20:31:32 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-postgresql.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/mountdatadomaindir.service,task=bash,pid=2392236,uid=0
[Wed Nov 15 20:31:33 2023] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900
[Wed Nov 15 20:31:36 2023] oom_reaper: reaped process 2627 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:2224896kB
Note: anon-rss stands for anonymous resident set size.
egrep on the OS log shows that many services were oom-killed:
bash
root@PGD001:~# egrep -i -r 'killed process' /var/log/syslog
Nov 15 20:31:34 PGD001 kernel: [1097200.699832] Out of memory: Killed process 2392264 (centrifydc) total-vm:4228kB, anon-rss:156kB, file-rss:1016kB, shmem-rss:0kB, UID:0 pgtables:48kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097200.788886] Out of memory: Killed process 2392236 (bash) total-vm:7368kB, anon-rss:240kB, file-rss:744kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097200.947275] Out of memory: Killed process 872 (systemd-timesyn) total-vm:89356kB, anon-rss:128kB, file-rss:0kB, shmem-rss:0kB, UID:104 pgtables:72kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097201.246575] Out of memory: Killed process 2392239 (boostfs) total-vm:9584kB, anon-rss:256kB, file-rss:216kB, shmem-rss:0kB, UID:0 pgtables:52kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097201.248859] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900
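The kB figures in these lines are hard to compare at a glance. The sketch below pulls the pid, process name and anon-rss out of each "Killed process" line and prints anon-rss in GiB. It assumes the log line layout shown above; the function name oom_kill_summary is made up for this example.

```shell
# oom_kill_summary: reads kernel log lines on stdin and prints
# "PID NAME ANON_RSS GiB" for every "Killed process" line.
oom_kill_summary() {
    awk '/Killed process/ {
        pid = ""; name = ""; anon = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "process") { pid = $(i + 1); name = $(i + 2) }
            if ($i ~ /^anon-rss:/) {
                anon = $i
                sub(/^anon-rss:/, "", anon)   # drop the field label
                sub(/kB,?$/, "", anon)        # drop the unit and trailing comma
            }
        }
        gsub(/[()]/, "", name)                # "(postgres)" -> "postgres"
        printf "%s %s %.1f GiB\n", pid, name, anon / 1048576
    }'
}

# Demo with the postgres line from the log above.
printf '%s\n' 'Nov 15 20:31:34 PGD001 kernel: [1097201.248859] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900' | oom_kill_summary
# prints: 2627 postgres 23.8 GiB
```

On the server itself the same function could be fed directly, for example with egrep -i -r 'killed process' /var/log/syslog | oom_kill_summary.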
The OS log records the following for postgresql:
root@PGD001:~# vim /var/log/syslog
Nov 15 20:31:34 PGD001 kernel: [1097200.788558] Tasks state (memory values in pages):
Nov 15 20:31:34 PGD001 kernel: [1097200.788559] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Nov 15 20:31:34 PGD001 kernel: [1097200.788606] [ 1033] 115 1033 2169621 34143 536576 453 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788613] [ 1096] 115 1096 18294 356 118784 471 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788616] [ 1097] 115 1097 2169737 1004636 10375168 477 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788619] [ 1098] 115 1098 2169666 21642 315392 484 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788621] [ 1106] 115 1106 2169621 4454 172032 482 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788624] [ 1107] 115 1107 2170049 704 188416 495 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788626] [ 1108] 115 1108 2169647 369 139264 467 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788630] [ 1109] 115 1109 2170018 552 159744 494 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788633] [ 2627] 115 2627 9441691 6798507 65384448 989683 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788636] [2377149] 115 2377149 2171112 6198 315392 468 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788639] [2377150] 115 2377150 2171136 6728 315392 410 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788643] [2382504] 115 2382504 2172165 7311 323584 394 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788649] [2390327] 115 2390327 2172158 7270 323584 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788655] [2392025] 115 2392025 2240168 194775 3178496 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788659] [2392066] 115 2392066 2173236 43043 1736704 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788663] [2392076] 115 2392076 2193878 120763 2428928 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788665] [2392080] 115 2392080 2243484 197961 3190784 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788668] [2392093] 115 2392093 2226648 154194 2289664 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788671] [2392095] 115 2392095 2224712 63962 1990656 373 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788673] [2392110] 115 2392110 2241676 167738 2383872 373 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788676] [2392114] 115 2392114 2173331 41251 1609728 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788678] [2392115] 115 2392115 2170462 6530 417792 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788680] [2392116] 115 2392116 2174010 40685 1613824 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788682] [2392117] 115 2392117 2172912 35250 1581056 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788684] [2392124] 115 2392124 2172228 34549 1527808 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788687] [2392127] 115 2392127 2171267 9574 462848 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788689] [2392128] 115 2392128 2202943 240863 3563520 372 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788691] [2392140] 115 2392140 2193312 118464 2379776 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788695] [2392141] 115 2392141 2193830 121733 2387968 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788698] [2392142] 115 2392142 2193614 118047 2404352 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788701] [2392143] 115 2392143 2193945 118742 2387968 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788703] [2392144] 115 2392144 2173072 25293 753664 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788706] [2392145] 115 2392145 2173311 25541 753664 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788708] [2392150] 115 2392150 2170682 9356 434176 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788711] [2392160] 115 2392160 2171583 13385 655360 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788713] [2392162] 115 2392162 2171217 27180 1327104 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788715] [2392163] 115 2392163 2170622 9180 462848 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788717] [2392171] 115 2392171 2171229 25619 1241088 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788720] [2392173] 115 2392173 2170552 6687 376832 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788722] [2392174] 115 2392174 2170484 6309 385024 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788725] [2392176] 115 2392176 2171478 11652 610304 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788727] [2392177] 115 2392177 2170490 7124 425984 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788731] [2392178] 115 2392178 2170710 11417 561152 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788734] [2392179] 115 2392179 2170753 10507 548864 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788737] [2392180] 115 2392180 2170563 8826 471040 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788740] [2392181] 115 2392181 2170563 8319 471040 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788743] [2392182] 115 2392182 2171492 12312 598016 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788747] [2392184] 115 2392184 2171367 17877 815104 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788750] [2392185] 115 2392185 2171227 9037 430080 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788753] [2392195] 115 2392195 2170402 6457 413696 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788757] [2392197] 115 2392197 2171241 16841 843776 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788761] [2392200] 115 2392200 2174518 71413 1560576 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788764] [2392201] 115 2392201 2171813 16907 835584 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788766] [2392202] 115 2392202 2170446 6224 323584 379 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788769] [2392215] 115 2392215 2171139 13606 651264 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788773] [2392218] 115 2392218 2197655 6652 512000 405 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788777] [2392219] 115 2392219 2197655 6768 512000 405 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788780] [2392233] 115 2392233 2170461 6442 327680 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788786] [2392237] 115 2392237 2170063 977 196608 382 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788793] [2392240] 115 2392240 2170063 913 196608 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788797] [2392241] 115 2392241 2170063 690 155648 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788801] [2392242] 115 2392242 2170063 958 172032 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788804] [2392243] 115 2392243 2170063 1174 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788807] [2392244] 115 2392244 2170063 1117 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788810] [2392245] 115 2392245 2170063 716 155648 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788813] [2392246] 115 2392246 2170063 958 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788816] [2392247] 115 2392247 2170063 909 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788819] [2392249] 115 2392249 2169653 588 118784 406 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788822] [2392250] 115 2392250 2169621 441 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788825] [2392251] 115 2392251 2170063 968 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788828] [2392252] 115 2392252 2169653 403 118784 406 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788831] [2392253] 115 2392253 2169621 536 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788834] [2392254] 115 2392254 2169621 299 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788837] [2392255] 115 2392255 2170063 1316 217088 382 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788841] [2392256] 115 2392256 2170053 974 200704 420 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788845] [2392258] 115 2392258 2169621 367 118784 432 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788847] [2392259] 115 2392259 2169653 557 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788850] [2392260] 115 2392260 2169621 349 118784 433 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788853] [2392261] 115 2392261 2169621 535 118784 435 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788855] [2392262] 115 2392262 2169621 360 118784 433 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788858] [2392263] 115 2392263 2169621 299 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788860] [2392265] 115 2392265 2169621 364 118784 447 -900 postgres
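The memory columns in this table are in pages, not kB. The sketch below sums the rss column for one task name and converts it to GiB. It assumes 4 KiB pages (the usual x86-64 value; getconf PAGE_SIZE shows the real one) and counts columns from the right, because the bracketed pid field splits unevenly. The function name sum_task_rss is made up for this example.

```shell
# sum_task_rss NAME [PAGE_KB]
# Reads "Tasks state" lines on stdin, sums rss (pages) for tasks named NAME,
# and converts the total to GiB. PAGE_KB defaults to 4 (an assumption).
sum_task_rss() {
    awk -v name="$1" -v page_kb="${2:-4}" '
        # Columns from the right: name, oom_score_adj, swapents, pgtables_bytes, rss
        NF >= 5 && $NF == name && $(NF - 1) ~ /^-?[0-9]+$/ {
            pages += $(NF - 4)
            n++
        }
        END { printf "%d tasks, %d pages, %.1f GiB\n", n, pages, pages * page_kb / 1048576 }
    '
}

# Demo with two rows copied from the table above.
sum_task_rss postgres <<'EOF'
Nov 15 20:31:34 PGD001 kernel: [1097200.788616] [ 1097] 115 1097 2169737 1004636 10375168 477 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788633] [ 2627] 115 2627 9441691 6798507 65384448 989683 -900 postgres
EOF
# prints: 2 tasks, 7803143 pages, 29.8 GiB
```

Summing rss across processes counts shared memory (such as shared_buffers) once per process, so the total overstates real usage. For pid 2627 alone, 6798507 pages is about 25.9 GiB, which lines up with the anon-rss, file-rss and shmem-rss values in the kill message.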
After PostgreSQL was oom-killed and restarted
Check the oom_score and oom_score_adj values of the postgresql service:
bash
root@PGD001:~# ps -ef|grep postgres |grep PGDATA
postgres 2393324 1 0 02:01 ? 00:00:27 /usr/lib/postgresql/15/bin/postgres -D /PGDATA
root@PGD001:~# cat /proc/2393324/oom_score_adj
-900
root@PGD001:~# cat /proc/2393324/oom_score
70
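The check above covers only the postmaster. A sketch for listing the same two values for every process with a given command name, reading /proc directly, might look like this. The function name show_oom_scores and the optional second argument (an alternative proc root, handy for testing) are assumptions made for this example.

```shell
# show_oom_scores NAME [PROC_ROOT]
# Prints "PID OOM_SCORE OOM_SCORE_ADJ" for each process whose comm is NAME.
show_oom_scores() {
    proc_root=${2:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/comm" ] || continue
        [ "$(cat "$dir/comm" 2>/dev/null)" = "$1" ] || continue
        printf '%s %s %s\n' "${dir##*/}" \
            "$(cat "$dir/oom_score" 2>/dev/null)" \
            "$(cat "$dir/oom_score_adj" 2>/dev/null)"
    done
}

# Prints nothing if no postgres process is running on this machine.
show_oom_scores postgres
```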
OS-level parameters:
bash
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmmax
kernel.shmmax = 18446744073692774399
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmall
kernel.shmall = 18446744073692774399
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmmni
kernel.shmmni = 4096
root@PGD001:~# cat /etc/security/limits.conf |grep -v "#"
* soft nofile 1024
* hard nofile 2048
* soft nproc 1024
* hard nproc 2048
PostgreSQL database-level parameters:
bash
postgres=# show shared_buffers;
shared_buffers
----------------
8GB
postgres=# show max_connections;
max_connections
-----------------
200
postgres=# show work_mem;
work_mem
----------
4MB
postgres=# show temp_buffers;
temp_buffers
--------------
8MB
postgres=# show maintenance_work_mem;
maintenance_work_mem
----------------------
64MB
postgres=# show autovacuum_work_mem;
autovacuum_work_mem
---------------------
-1
postgres=# show autovacuum_max_workers;
autovacuum_max_workers
------------------------
3
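Notes:
shared_buffers: sets the amount of memory the database server uses for shared memory buffers. On a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of system memory.
work_mem: the amount of memory used by internal sort operations and hash tables before they spill to temporary disk files. The default is four megabytes (4MB).
temp_buffers: the maximum amount of memory used for temporary buffers in each database session. These are session-local buffers used only for access to temporary tables. The default is 8MB.
autovacuum_work_mem: the maximum amount of memory each autovacuum worker process may use. The default of -1 means the value of maintenance_work_mem is used instead. When autovacuum runs, up to autovacuum_max_workers times this amount may be allocated.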
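For a rough sense of scale, the settings above can be combined into a back-of-the-envelope budget. The formula below is an assumption made for illustration, not something PostgreSQL enforces: it charges every connection one work_mem and one temp_buffers, and every autovacuum worker one maintenance_work_mem. A single query can use several multiples of work_mem, so real usage can exceed this figure.

```shell
# pg_memory_budget_mb SHARED_BUFFERS MAX_CONNECTIONS WORK_MEM TEMP_BUFFERS \
#                     MAINTENANCE_WORK_MEM AUTOVACUUM_MAX_WORKERS
# All memory arguments are in MB. Prints the estimated total in MB.
pg_memory_budget_mb() {
    echo $(( $1 + $2 * ($3 + $4) + $6 * $5 ))
}

# Values from the SHOW output above: 8GB, 200, 4MB, 8MB, 64MB, 3
pg_memory_budget_mb 8192 200 4 8 64 3
# prints: 10784
```

That is about 10.5GB on a 32GB host, which is in line with the parameters looking reasonable on paper.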
Memory usage of the currently running programs:
bash
root@PGD001:~# smem -t -r -a | head -20
PID User Command Swap USS PSS RSS
2394448 postgres postgres: 15/main: veeamuser VeeamBackupReporting 172.22.137.89(50228) idle 380 20701320 21192492 21812648
2393326 postgres postgres: 15/main: checkpointer 44 842436 1842987 3047168
3846886 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58846) idle 380 256848 567859 1011720
3846852 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58654) SELECT 380 130864 308700 619768
2392350 root /opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true 3888 292596 292670 294968
3846901 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58926) idle 380 87968 149992 386656
3846903 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58928) idle 380 76084 137657 373052
3846877 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58815) idle 380 26056 126282 387476
2393324 postgres /usr/lib/postgresql/15/bin/postgres -D /PGDATA 44 52700 77518 200092
3846889 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58860) idle 380 25172 57457 205336
633 root /sbin/multipathd -d -s 0 22712 23409 27924
3846902 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58927) idle 380 11204 23353 132416
961 root /usr/lib/snapd/snapd 2132 18836 18876 20848
3846890 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58861) idle 380 8552 18772 117992
3846899 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58914) idle 380 7340 18020 119676
2393327 postgres postgres: 15/main: background writer 80 336 17668 91248
3846898 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58913) idle 380 6092 16874 115916
3846904 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58930) idle 380 7024 15296 104320
3846908 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58957) idle 380 11440 14047 66592
bash
root@PGD001:~# smem -u -p -a
User Count Swap USS PSS RSS
systemd-timesync 1 0.00% 0.00% 0.00% 0.02%
messagebus 1 0.01% 0.00% 0.00% 0.02%
systemd-network 1 0.00% 0.01% 0.01% 0.02%
syslog 1 0.00% 0.01% 0.01% 0.02%
systemd-resolve 1 0.00% 0.02% 0.02% 0.03%
root 21 0.22% 1.19% 1.25% 1.56%
postgres 43 0.34% 68.76% 75.26% 90.34%
bash
root@PGD001:~# top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2394448 postgres 20 0 27.0g 21.0g 2.4g S 0.0 67.1 4100:15 postgres
2393326 postgres 20 0 8678924 2.9g 2.9g S 0.0 9.3 10:32.81 postgres
3855468 postgres 20 0 8807032 846728 818960 S 0.0 2.6 0:00.73 postgres
3855453 postgres 20 0 8877256 587404 492304 S 0.0 1.8 0:07.70 postgres
3855466 postgres 20 0 8778480 383144 287312 S 20.5 1.2 0:02.36 postgres
3855465 postgres 20 0 8775984 380292 288272 S 2.6 1.2 0:01.85 postgres
3855464 postgres 20 0 8774756 376020 287796 S 3.0 1.1 0:01.97 postgres
3855481 postgres 20 0 8703792 371900 345996 S 22.8 1.1 0:01.11 postgres
2392350 root 20 0 1182364 292852 2700 S 0.0 0.9 54:37.50 boostfs
2393324 postgres 20 0 8678484 200092 197304 S 0.0 0.6 24:03.58 postgres
2393327 postgres 20 0 8678648 90992 88152 S 0.0 0.3 0:27.56 postgres
3855473 postgres 20 0 8687624 76032 69008 S 0.7 0.2 0:00.07 postgres
3855483 postgres 20 0 8694496 66440 52784 S 0.0 0.2 0:00.04 postgres
3855463 postgres 20 0 8682872 59528 54192 S 0.3 0.2 0:00.06 postgres
3855482 postgres 20 0 8681836 42672 38304 S 0.0 0.1 0:00.03 postgres
3855471 postgres 20 0 8681720 39628 35356 S 0.0 0.1 0:00.02 postgres
3799836 postgres 20 0 8684528 39412 32052 S 0.0 0.1 0:18.87 postgres
3854094 postgres 20 0 8684520 39336 32008 S 0.0 0.1 0:00.86 postgres
3746080 postgres 20 0 8684644 39300 31820 S 0.0 0.1 0:35.88 postgres
3855475 postgres 20 0 8681652 38492 34096 S 0.0 0.1 0:00.02 postgres
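Notes:
VIRT: the size of the virtual address space used by the process, i.e. the sum of the part already mapped to physical memory and the part not yet mapped. VIRT is short for virtual memory usage. Virtual memory is an imaginary address space; while the program runs, the parts that are actually accessed get mapped to physical memory. A large VIRT only means the program can address a lot of space, not that it occupies a lot of physical memory. Older versions of the top man page give the relation VIRT = SWAP + RES.
RES: the part of the process's virtual address space that is currently mapped to physical memory. To see how much memory a running process occupies, look at RES, not VIRT. RES is short for resident memory usage, the physical memory the process really holds. When people say how much memory a process uses, they normally mean resident memory, not virtual memory.
SHR: short for shared. It is the amount of shared memory the process uses, meaning memory shared by several processes. The memory taken by a dynamic library such as libc.so is shared memory, because many different processes all load and call libc.so.
VSS: Virtual Set Size, the virtual memory the process has requested from the system. Same as VIRT.
RSS: Resident Set Size, the total memory the process actually holds in RAM. Same as RES.
PSS: Proportional Set Size, the process's private memory plus its proportional share of shared memory (a page shared by N processes counts as 1/N for each of them).
USS: Unique Set Size, the physical memory used by this process alone.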
Session information:
bash
postgres=# show idle_session_timeout ;
idle_session_timeout
----------------------
0
postgres=# select count(*) from pg_stat_activity where state='idle';
count
-------
45
postgres=# select pid,usename,datname,client_addr,state from pg_stat_activity;
pid | usename | datname | client_addr | state
---------+-----------+----------------------+---------------+--------
2393349 | | | |
2393351 | postgres | | |
3931348 | postgres | postgres | 172.22.138.94 | idle
3949134 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949093 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949090 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949094 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949135 | veeamuser | VeeamBackup | 172.22.137.89 | idle
2394448 | veeamuser | VeeamBackupReporting | 172.22.137.89 | idle
3854094 | postgres | postgres | 172.22.138.94 | idle
3949083 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949102 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949103 | veeamuser | VeeamBackup | 172.22.137.89 | active
3949127 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949084 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949132 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949133 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949095 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949119 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949085 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949125 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949086 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949117 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949104 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949105 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949096 | veeamuser | VeeamBackupReporting | 172.22.137.89 | idle
3949098 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949136 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949099 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949100 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949101 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949106 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949107 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949137 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949047 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949138 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949108 | veeamuser | VeeamBackup | 172.22.137.89 | idle
3949109 | veeamuser | VeeamBackup | 172.22.137.89 | idle
...
root@PGD001:~# ps -ef|grep postgres
root 2392350 1 0 Nov15 ? 00:59:03 /opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
postgres 2393324 1 0 Nov15 ? 00:25:46 /usr/lib/postgresql/15/bin/postgres -D /PGDATA
postgres 2393325 2393324 0 Nov15 ? 00:00:00 postgres: 15/main: logger
postgres 2393326 2393324 0 Nov15 ? 00:11:52 postgres: 15/main: checkpointer
postgres 2393327 2393324 0 Nov15 ? 00:00:30 postgres: 15/main: background writer
postgres 2393348 2393324 0 Nov15 ? 00:04:30 postgres: 15/main: walwriter
postgres 2393349 2393324 0 Nov15 ? 00:00:54 postgres: 15/main: autovacuum launcher
postgres 2393350 2393324 0 Nov15 ? 00:00:05 postgres: 15/main: archiver last was 0000000100000137000000E1
postgres 2393351 2393324 0 Nov15 ? 00:00:51 postgres: 15/main: logical replication launcher
postgres 2394448 2393324 38 Nov15 ? 3-02:20:42 postgres: 15/main: veeamuser VeeamBackupReporting 172.22.137.89(50228) SELECT
postgres 3854094 2393324 0 Nov22 ? 00:00:30 postgres: 15/main: postgres postgres 172.22.138.94(63531) idle
postgres 3906865 2393324 0 09:40 ? 00:00:12 postgres: 15/main: postgres postgres 172.22.138.94(52002) idle
postgres 3931348 2393324 0 15:17 ? 00:00:04 postgres: 15/main: postgres postgres 172.22.138.94(54154) idle
postgres 3949113 2393324 24 19:06 ? 00:00:15 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51688) idle
postgres 3949122 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51913) idle
postgres 3949123 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51970) idle
postgres 3949179 2393324 27 19:06 ? 00:00:08 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52262) SELECT
postgres 3949182 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52282) idle
postgres 3949184 2393324 1 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52300) idle
postgres 3949185 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52307) idle
postgres 3949186 2393324 2 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52309) idle
postgres 3949187 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52313) idle
postgres 3949190 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52372) idle
postgres 3949191 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52374) idle
postgres 3949192 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52375) idle
postgres 3949194 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52376) idle
postgres 3949196 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52449) idle
postgres 3949197 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52473) idle
postgres 3949198 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52475) idle
postgres 3949199 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52476) idle
postgres 3949201 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52508) idle
postgres 3949202 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52515) idle
postgres 3949205 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52539) idle
postgres 3949206 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52541) idle
postgres 3949207 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52543) idle
postgres 3949208 2393324 27 19:06 ? 00:00:03 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52542) idle
postgres 3949209 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52567) idle
postgres 3949210 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52574) idle
postgres 3949212 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52599) idle
postgres 3949218 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52600) idle
postgres 3949219 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52638) idle
postgres 3949220 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52651) idle
postgres 3949222 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52666) idle
postgres 3949224 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52670) idle
...
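With this many backends, a count per state is easier to read than the raw listing. The sketch below tallies client backends by the state shown at the end of each ps line. It assumes the process title layout seen above (client address followed by the port in parentheses, then the state); the function name count_backend_states is made up for this example.

```shell
# count_backend_states: reads `ps -ef` lines on stdin and prints
# "COUNT STATE" for client backends, e.g. "2 idle".
count_backend_states() {
    awk '/postgres: .*\([0-9]+\) / {
             sub(/^.*\([0-9]+\) /, "")   # keep only the text after "(port) "
             count[$0]++
         }
         END { for (s in count) printf "%d %s\n", count[s], s }' | LC_ALL=C sort
}

# Demo with three lines copied from the listing above.
count_backend_states <<'EOF'
postgres 2393326 2393324 0 Nov15 ? 00:11:52 postgres: 15/main: checkpointer
postgres 2394448 2393324 38 Nov15 ? 3-02:20:42 postgres: 15/main: veeamuser VeeamBackupReporting 172.22.137.89(50228) SELECT
postgres 3949113 2393324 24 19:06 ? 00:00:15 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51688) idle
EOF
```

On the server this could be fed with ps -ef | count_backend_states. Background processes such as the checkpointer have no client port in their title and are skipped.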
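Analysis: on a host with 32GB of physical memory, the postgres process captured at OOM time had reached total-vm:37766764kB (about 36GB of virtual memory) with anon-rss:24965976kB (about 23.8GB resident). The database-level memory parameters all look reasonable, and postgresql's oom_score_adj is very low at -900 (at -1000 the kernel would never oom-kill it). While postgresql is active, the database service takes 70%-90% of the operating system's memory, and at OOM time not only postgres but many other services were oom-killed as well. So the OS-level parameters kernel.shmmax and kernel.shmall are probably not set appropriately. Memory use also stays high even though most sessions are idle, so the idle session timeout idle_session_timeout (currently 0, i.e. disabled) is probably not set appropriately either, and 4GB of swap is too small.
To avoid being oom-killed again, the following measures were taken:
1. Set kernel.shmmax to 17179869184, half of physical memory, and set kernel.shmall to 4194304 = shmmax / page_size (with a 4096-byte page).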
bash
root@PGD001:~# vim /etc/sysctl.conf
kernel.shmmax=17179869184
kernel.shmall=4194304
root@PGD001:~# sysctl -p
root@PGD001:~# sysctl -a |grep kernel.shmmax
kernel.shmmax = 17179869184
root@PGD001:~# sysctl -a |grep kernel.shmall
kernel.shmall = 4194304
root@PGD001:~# sysctl -a |grep kernel.shmmni
kernel.shmmni = 4096
root@PGD001:~# ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1
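The shmall value used above is just shmmax divided by the page size. A small sketch of that arithmetic follows; the function name shmall_pages is made up for this example, and the page size falls back to getconf PAGE_SIZE when it is not given.

```shell
# shmall_pages SHMMAX_BYTES [PAGE_SIZE_BYTES]
# kernel.shmall is expressed in pages, kernel.shmmax in bytes.
shmall_pages() {
    page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( $1 / page_size ))
}

shmall_pages 17179869184 4096
# prints: 4194304
```

4194304 pages of 4096 bytes is 16777216 kB, which matches the "max total shared memory" line in the ipcs -lm output.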
2. Set idle_session_timeout=8h
bash
postgres=# alter system set idle_session_timeout='8h';
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
postgres=# show idle_session_timeout;
idle_session_timeout
----------------------
8h
3. Set swap to 1x physical memory, i.e. 32GB
bash
root@PGD001:~# free -m
total used free shared buff/cache available
Mem: 32058 20278 2195 4967 9583 6349
Swap: 4095 12 4083
root@PGD001:~# swapon -s
Filename Type Size Used Priority
/swap.img file 4194300 13120 -2
root@PGD001:/# cat /etc/fstab |grep swap
/swap.img none swap sw 0 0
root@PGD001:/# ll /swap.img
-rw------- 1 root root 4294967296 Sep 6 2022 /swap.img
root@PGD001:/# fallocate -l 4G /swap1.img
root@PGD001:/# chmod 600 /swap1.img
root@PGD001:/# ll /swap1.img
-rw------- 1 root root 4294967296 Nov 22 22:51 /swap1.img
root@PGD001:/# mkswap /swap1.img
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=85b78962-8bae-48d8-a5c0-e30903b7b8d6
root@PGD001:/# swapon /swap1.img
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20102 2325 4967 9630 6525
Swap: 8191 12 8179
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap.img file 4194300 13120 -2
/swap1.img file 4194300 0 -3
root@PGD001:/# swapoff -v /swap.img
swapoff /swap.img
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap1.img file 4194300 0 -2
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20149 2276 4969 9632 6476
Swap: 4095 0 4095
root@PGD001:/# fallocate -l 32G /swap.img
root@PGD001:/# chmod 600 /swap.img
root@PGD001:/# ll /swap.img
-rw------- 1 root root 34359738368 Nov 22 22:53 /swap.img
root@PGD001:/# mkswap /swap.img
mkswap: /swap.img: warning: wiping old swap signature.
Setting up swapspace version 1, size = 32 GiB (34359734272 bytes)
no label, UUID=9d658937-a89d-472b-aa94-be23e7f8703c
root@PGD001:/# swapon /swap.img
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20272 2149 4969 9637 6354
Swap: 36863 0 36863
root@PGD001:/# swapoff -v /swap1.img
swapoff /swap1.img
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap.img file 33554428 0 -2
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20342 2078 4969 9637 6283
Swap: 32767 0 32767
root@PGD001:/# cat /etc/fstab |grep swap
/swap.img none swap sw 0 0
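The transcript above boils down to a fixed order of commands: bring up a temporary swap file, take the old one offline, recreate it at the new size, then retire the temporary one. The sketch below only prints that plan so it can be reviewed before anything is run as root. The function name swap_resize_plan is made up for this example, and the paths and sizes are the ones used above.

```shell
# swap_resize_plan SWAPFILE TMPFILE NEW_SIZE TMP_SIZE
# Prints (does not execute) the commands in the order used in the transcript.
swap_resize_plan() {
    swapfile=$1; tmpfile=$2; new_size=$3; tmp_size=$4
    cat <<EOF
fallocate -l $tmp_size $tmpfile
chmod 600 $tmpfile
mkswap $tmpfile
swapon $tmpfile
swapoff -v $swapfile
fallocate -l $new_size $swapfile
chmod 600 $swapfile
mkswap $swapfile
swapon $swapfile
swapoff -v $tmpfile
EOF
}

swap_resize_plan /swap.img /swap1.img 32G 4G
```

Because the final swap file keeps the path /swap.img, the existing /etc/fstab entry stays valid.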