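MySQL Memory Monitoring: In-Depth Analysis and Troubleshooting Practice

1. Why MySQL Memory Monitoring Matters

Memory problems are the second most complex class of MySQL failures, after locking problems. Lock problems usually leave explicit wait and deadlock information. Memory problems tend to show up instead as gradual performance degradation, abnormal termination on OOM (Out of Memory), or general instability. A well-built memory monitoring setup lets you narrow down the root cause quickly from several dimensions. The examples below use MySQL 5.7.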

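2. Multi-Dimensional Memory Monitoring Views

2.1 Memory Views in the sys Schema

The sys schema in MySQL 5.7 provides four core memory monitoring views, each breaking down memory allocation along a different dimension. Each view also has an x$ counterpart that returns raw byte values instead of formatted strings:

-- List all available memory monitoring views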
SELECT table_name, table_comment 
FROM information_schema.tables 
WHERE table_schema = 'sys' 
AND table_name LIKE 'memory%'
ORDER BY table_name;

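What each view is for:

View name                            Grouped by     Typical use
memory_by_host_by_current_bytes      Client host    Investigating memory leaks from client connections
memory_by_thread_by_current_bytes    Thread         Analyzing memory use of a specific session
memory_by_user_by_current_bytes      User           Auditing memory consumption per user
memory_global_by_current_bytes       Event type     Analyzing global memory allocation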

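2.2 Case Study: Memory Monitoring by Host

Background: memory usage of a production MySQL instance is growing abnormally, and we need to find out which client hosts hold the most memory.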

-- Memory usage per client host, in detail
SELECT 
    IFNULL(host, 'background') AS client_host,
    ROUND(current_allocated/1024/1024, 2) AS current_alloc_mb,
    current_count_used AS blocks_in_use,
    ROUND(current_avg_alloc/1024, 2) AS avg_alloc_kb,
    ROUND(current_max_alloc/1024/1024, 2) AS max_alloc_mb,
    ROUND(total_allocated/1024/1024, 2) AS total_alloc_mb,
    -- share of all bytes ever allocated that is still held (not a utilization figure)
    ROUND(current_allocated / NULLIF(total_allocated, 0) * 100, 2) AS retained_pct
FROM sys.x$memory_by_host_by_current_bytes
ORDER BY current_allocated DESC
LIMIT 10;

Typical result:

+--------------+------------------+---------------+--------------+--------------+----------------+--------------+
| client_host  | current_alloc_mb | blocks_in_use | avg_alloc_kb | max_alloc_mb | total_alloc_mb | retained_pct |
+--------------+------------------+---------------+--------------+--------------+----------------+--------------+
| 10.186.61.18 |          1877.86 |           105 |     17932.25 |      1877.86 |      125803.03 |         1.49 |
| background   |          1048.50 |           201 |      5216.42 |      1048.50 |        7123.12 |        14.72 |
| 127.0.0.1    |             6.40 |           429 |        15.29 |         6.40 |         745.01 |         0.86 |
| 10.186.61.19 |             0.01 |            31 |         0.45 |         0.01 |           0.07 |        14.29 |
+--------------+------------------+---------------+--------------+--------------+----------------+--------------+

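Analysis

  • Host 10.186.61.18 currently holds about 1.8 GB, but that is only 1.49% of everything it has allocated over time, so most of its allocations are freed again. The memory still held probably belongs to sessions kept open by a connection pool.

  • Background threads hold about 1 GB and retain a larger share (14.72%) of what they have allocated, so the internal components behind them need a closer look.

  • For hosts that hold a lot of memory, continue with a session-level analysis.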
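
The session-level follow-up for one host could look like the sketch below. It assumes memory instruments are enabled, and it uses the example host 10.186.61.18 from the output above purely as a placeholder. The output aliases are illustrative.

```sql
-- Sessions from one client host, ranked by the memory they currently hold.
-- Replace the host literal with the host you are investigating.
SELECT
    t.PROCESSLIST_ID      AS conn_id,
    t.PROCESSLIST_USER    AS account_user,
    t.PROCESSLIST_COMMAND AS command,
    t.PROCESSLIST_TIME    AS time_sec,
    ROUND(SUM(m.CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS current_mb
FROM performance_schema.threads t
JOIN performance_schema.memory_summary_by_thread_by_event_name m
  ON m.THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_HOST = '10.186.61.18'
GROUP BY t.THREAD_ID, t.PROCESSLIST_ID, t.PROCESSLIST_USER,
         t.PROCESSLIST_COMMAND, t.PROCESSLIST_TIME
ORDER BY SUM(m.CURRENT_NUMBER_OF_BYTES_USED) DESC
LIMIT 10;
```

Sessions in the Sleep command state that still hold a lot of memory are the usual sign of pooled connections that keep their buffers between requests.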

3. Performance Schema Memory Monitoring in Detail

3.1 Five Summary Tables, Five Dimensions

Performance Schema provides finer-grained memory monitoring:

-- List the memory summary tables in Performance Schema
SHOW TABLES FROM performance_schema LIKE 'memory%summary%';

Core summary tables:

-- 1. Account and event: analysis by login account (user + host)
SELECT * FROM performance_schema.memory_summary_by_account_by_event_name 
WHERE USER IS NOT NULL 
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC 
LIMIT 5;

-- 2. Host and event: analysis by client host
SELECT * FROM performance_schema.memory_summary_by_host_by_event_name 
WHERE HOST IS NOT NULL 
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC 
LIMIT 5;

-- 3. Thread and event: the most detailed, session-level analysis
SELECT * FROM performance_schema.memory_summary_by_thread_by_event_name 
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC 
LIMIT 5;

-- 4. User and event: analysis by database user
SELECT * FROM performance_schema.memory_summary_by_user_by_event_name 
WHERE USER IS NOT NULL 
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC 
LIMIT 5;

-- 5. Global by event: overall memory allocation analysis
SELECT * FROM performance_schema.memory_summary_global_by_event_name 
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC 
LIMIT 10;

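3.2 Case Study: Thread-Level Memory Monitoring

Background: MySQL memory rises sharply during a business peak, and we need to pin down which SQL statement is responsible.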

-- Top 10 (session, memory event) pairs by current memory use
SELECT 
    t.PROCESSLIST_ID AS conn_id,
    CONCAT(t.PROCESSLIST_USER, '@', t.PROCESSLIST_HOST) AS account,
    t.PROCESSLIST_DB AS db,
    m.EVENT_NAME AS mem_event,
    ROUND(m.CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 4) AS current_mb,
    ROUND(m.HIGH_NUMBER_OF_BYTES_USED/1024/1024, 4) AS high_mb,
    m.COUNT_ALLOC AS alloc_count,
    m.COUNT_FREE AS free_count,
    LEFT(t.PROCESSLIST_INFO, 200) AS statement_head
FROM performance_schema.memory_summary_by_thread_by_event_name m
JOIN performance_schema.threads t ON m.THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_ID IS NOT NULL   -- client sessions only, no background threads
  AND m.CURRENT_NUMBER_OF_BYTES_USED > 0
ORDER BY m.CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 10;
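
When only a per-session total is needed, rather than one row per memory event, the sys schema already aggregates it. A minimal sketch, assuming the sys schema is installed (it ships with 5.7) and that memory instruments are enabled; otherwise current_memory stays at 0:

```sql
-- One row per client session, ranked by total memory currently held.
-- x$session returns raw bytes; sys.session returns formatted strings instead.
SELECT
    conn_id,
    user,
    db,
    command,
    time,
    ROUND(current_memory/1024/1024, 2) AS current_mb,
    LEFT(current_statement, 200)       AS statement_head
FROM sys.x$session
ORDER BY current_memory DESC
LIMIT 10;
```

This gives a quick ranking of sessions. The per-event query above then explains what a suspicious session is spending its memory on.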

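4. Correlated Analysis: Tracing Memory to SQL Statements

4.1 Case Study: Correlated Analysis

Background: the application reports that a complex query runs slowly, and excessive memory allocation is suspected.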

-- Join with the threads table to get the SQL statement
SELECT 
    t.PROCESSLIST_ID AS session_id,
    CONCAT(t.PROCESSLIST_USER, '@', t.PROCESSLIST_HOST) AS account,
    -- classify the memory event by its name prefix
    CASE 
        WHEN m.EVENT_NAME LIKE 'memory/sql/%' THEN 'SQL layer'
        WHEN m.EVENT_NAME LIKE 'memory/innodb/%' THEN 'InnoDB engine'
        WHEN m.EVENT_NAME LIKE 'memory/myisam/%' THEN 'MyISAM engine'
        WHEN m.EVENT_NAME LIKE 'memory/memory/%' THEN 'MEMORY engine'        -- in-memory temporary tables in 5.7
        WHEN m.EVENT_NAME LIKE 'memory/temptable/%' THEN 'TempTable engine'  -- MySQL 8.0 and later only
        ELSE 'other'
    END AS mem_type,
    m.EVENT_NAME AS mem_event,
    ROUND(m.CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 6) AS current_mb,
    ROUND(m.SUM_NUMBER_OF_BYTES_ALLOC/1024/1024, 2) AS total_alloc_mb,
    -- share of allocated bytes that has already been freed
    ROUND((m.SUM_NUMBER_OF_BYTES_FREE / NULLIF(m.SUM_NUMBER_OF_BYTES_ALLOC, 0)) * 100, 2) AS freed_pct,
    t.PROCESSLIST_TIME AS run_time_sec,
    LEFT(t.PROCESSLIST_INFO, 300) AS statement_head
FROM performance_schema.memory_summary_by_thread_by_event_name m
INNER JOIN performance_schema.threads t ON m.THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_ID IS NOT NULL 
  AND t.PROCESSLIST_INFO IS NOT NULL
  AND m.CURRENT_NUMBER_OF_BYTES_USED > 1024*1024  -- only events holding more than 1 MB
ORDER BY m.CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 15;

4.2 Classifying Memory Events

Based on the query results, memory events fall into a few key categories:

-- Memory usage summarized by category
SELECT 
    CASE 
        WHEN EVENT_NAME LIKE 'memory/sql/%' THEN 'SQL layer'
        WHEN EVENT_NAME LIKE 'memory/innodb/%' THEN 'InnoDB engine'
        WHEN EVENT_NAME LIKE 'memory/myisam/%' THEN 'MyISAM engine'
        WHEN EVENT_NAME LIKE 'memory/performance_schema/%' THEN 'Performance Schema'
        WHEN EVENT_NAME LIKE 'memory/memory/%' THEN 'MEMORY engine'        -- in-memory temporary tables in 5.7
        WHEN EVENT_NAME LIKE 'memory/temptable/%' THEN 'TempTable engine'  -- MySQL 8.0 and later only
        ELSE 'other'
    END AS mem_category,
    COUNT(*) AS event_count,
    ROUND(SUM(CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS current_mb,
    ROUND(SUM(HIGH_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS high_mb,
    ROUND(AVG(CURRENT_NUMBER_OF_BYTES_USED/1024), 2) AS avg_kb
FROM performance_schema.memory_summary_global_by_event_name
GROUP BY mem_category
ORDER BY current_mb DESC;

5. Key Memory Events Explained

5.1 Key Memory Components in the SQL Layer

1. main_mem_root - the main per-query memory pool

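/* Summary of the source comment (sql/sql_class.h):
 * Main uses:
 * 1. Conventional queries: allocate the structures stored in main_lex during parsing
 * 2. Conventional queries: allocate runtime data (execution plan, etc.) during execution
 * 3. Prepared queries: allocate runtime data only; the parse tree is reused between executions
 */

Monitoring main_mem_root usage: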

-- main_mem_root usage of every session
SELECT 
    t.PROCESSLIST_ID,
    CONCAT(t.PROCESSLIST_USER, '@', t.PROCESSLIST_HOST) AS user_host,
    ROUND(m.CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 4) AS mem_root_used_MB,
    ROUND(m.HIGH_NUMBER_OF_BYTES_USED/1024/1024, 4) AS mem_root_high_MB,
    t.PROCESSLIST_TIME AS query_time_sec,
    LEFT(t.PROCESSLIST_INFO, 150) AS current_query
FROM performance_schema.memory_summary_by_thread_by_event_name m
JOIN performance_schema.threads t ON m.THREAD_ID = t.THREAD_ID
WHERE m.EVENT_NAME = 'memory/sql/THD::main_mem_root'
  AND t.PROCESSLIST_ID IS NOT NULL
  AND m.CURRENT_NUMBER_OF_BYTES_USED > 10*1024*1024  -- more than 10 MB
ORDER BY m.CURRENT_NUMBER_OF_BYTES_USED DESC;

2. JOIN_CACHE - join buffer memory

-- Join buffer memory used by joins on large tables
SELECT 
    t.PROCESSLIST_ID,
    t.PROCESSLIST_INFO AS query,
    ROUND(m.CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 2) AS join_cache_mb,
    ROUND(m.HIGH_NUMBER_OF_BYTES_USED/1024/1024, 2) AS join_cache_high_mb
FROM performance_schema.memory_summary_by_thread_by_event_name m
JOIN performance_schema.threads t ON m.THREAD_ID = t.THREAD_ID
WHERE m.EVENT_NAME = 'memory/sql/JOIN_CACHE'
  AND m.CURRENT_NUMBER_OF_BYTES_USED > 50*1024*1024  -- more than 50 MB
ORDER BY m.CURRENT_NUMBER_OF_BYTES_USED DESC;
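
JOIN_CACHE figures are easier to judge next to the configured join buffer size, because each buffered join in a statement can allocate its own buffer of up to join_buffer_size. A small sketch for looking up that setting (the output aliases are illustrative):

```sql
-- Configured join buffer size, global default and current session value.
SELECT
    ROUND(@@global.join_buffer_size/1024/1024, 2)  AS global_join_buffer_mb,
    ROUND(@@session.join_buffer_size/1024/1024, 2) AS session_join_buffer_mb;
```

If a session's JOIN_CACHE usage is many times this value, the statement is probably running several buffered joins at once, or the session has raised join_buffer_size for itself.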

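5.2 Storage Engine Memory Components

InnoDB memory components:

-- InnoDB memory outside the buffer pool
-- (the global table has one row per event, so no grouping is needed)
SELECT 
    EVENT_NAME,
    ROUND(CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 2) AS current_mb,
    ROUND(HIGH_NUMBER_OF_BYTES_USED/1024/1024, 2) AS high_mb
FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'memory/innodb/%'
  AND EVENT_NAME <> 'memory/innodb/buf_buf_pool'  -- exclude the buffer pool itself
  AND CURRENT_NUMBER_OF_BYTES_USED > 1024*1024    -- more than 1 MB
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC;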

6. Memory Troubleshooting Workflow

6.1 Routine Monitoring Checklist

-- 1. Global memory health check
SELECT 
    'global memory usage' AS check_item,
    ROUND(SUM(CURRENT_NUMBER_OF_BYTES_USED)/1024/1024/1024, 2) AS current_gb,
    -- sum of per-event peaks: an upper bound, since events peak at different times
    ROUND(SUM(HIGH_NUMBER_OF_BYTES_USED)/1024/1024/1024, 2) AS sum_of_peaks_gb,
    COUNT(DISTINCT EVENT_NAME) AS event_type_count
FROM performance_schema.memory_summary_global_by_event_name;

-- 2. Top 5 users by memory use
SELECT 
    IFNULL(USER, 'system') AS user_name,
    ROUND(SUM(CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS current_mb,
    ROUND(SUM(HIGH_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS sum_of_peaks_mb
FROM performance_schema.memory_summary_by_user_by_event_name
GROUP BY USER
ORDER BY current_mb DESC
LIMIT 5;

-- 3. Leak suspects (many more allocations than frees)
--    A large gap is only a hint: long-lived caches legitimately keep their allocations.
SELECT 
    t.PROCESSLIST_ID,
    CONCAT(t.PROCESSLIST_USER, '@', t.PROCESSLIST_HOST) AS account,
    m.EVENT_NAME,
    m.COUNT_ALLOC,
    m.COUNT_FREE,
    m.COUNT_ALLOC - m.COUNT_FREE AS unfreed_count,
    ROUND(m.CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 4) AS held_mb
FROM performance_schema.memory_summary_by_thread_by_event_name m
JOIN performance_schema.threads t ON m.THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_ID IS NOT NULL
  AND m.COUNT_ALLOC > m.COUNT_FREE
  AND (m.COUNT_ALLOC - m.COUNT_FREE) > 1000  -- more than 1000 allocations not yet freed
ORDER BY (m.COUNT_ALLOC - m.COUNT_FREE) DESC
LIMIT 10;
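
As a cross-check for item 1, the sys schema exposes the same instrumented total as a single row. A minimal sketch, assuming the sys schema is installed:

```sql
-- Total memory currently tracked by Performance Schema, as a formatted string.
SELECT total_allocated FROM sys.memory_global_total;

-- The same total in raw bytes, convenient for alert thresholds.
SELECT total_allocated AS total_bytes FROM sys.x$memory_global_total;
```

This total covers only allocations made through enabled memory instruments, so it normally sits below the resident size the operating system reports for mysqld. A wide and growing gap between the two suggests memory that is not instrumented, or instruments that were enabled only after startup.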

6.2 Emergency Troubleshooting Procedure

Scenario: MySQL memory usage keeps growing and is approaching the OOM threshold.

-- Step 1: quickly find the sessions that hold the most memory
SELECT 
    'emergency' AS scenario,
    t.PROCESSLIST_ID AS session_id,
    CONCAT(t.PROCESSLIST_USER, '@', t.PROCESSLIST_HOST) AS source,
    ROUND(SUM(m.CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS total_mb,
    -- per-event breakdown; the result is cut off at group_concat_max_len
    GROUP_CONCAT(
        CONCAT(
            SUBSTRING_INDEX(m.EVENT_NAME, '/', -1),
            ':',
            ROUND(m.CURRENT_NUMBER_OF_BYTES_USED/1024/1024, 2),
            'MB'
        ) 
        ORDER BY m.CURRENT_NUMBER_OF_BYTES_USED DESC 
        SEPARATOR ' | '
    ) AS breakdown,
    t.PROCESSLIST_TIME AS run_time_sec,
    LEFT(t.PROCESSLIST_INFO, 200) AS current_sql
FROM performance_schema.memory_summary_by_thread_by_event_name m
JOIN performance_schema.threads t ON m.THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_ID IS NOT NULL
  AND m.CURRENT_NUMBER_OF_BYTES_USED > 0
GROUP BY t.PROCESSLIST_ID, t.PROCESSLIST_USER, t.PROCESSLIST_HOST, 
         t.PROCESSLIST_TIME, t.PROCESSLIST_INFO
HAVING total_mb > 100  -- sessions holding more than 100 MB
ORDER BY total_mb DESC
LIMIT 20;

-- Step 2: find events with many unfreed allocations or a large footprint
SELECT 
    EVENT_NAME,
    SUM(COUNT_ALLOC) AS total_allocs,
    SUM(COUNT_FREE) AS total_frees,
    SUM(COUNT_ALLOC) - SUM(COUNT_FREE) AS unfreed_count,
    ROUND(SUM(CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) AS current_mb
FROM performance_schema.memory_summary_global_by_event_name
WHERE COUNT_ALLOC > 0
GROUP BY EVENT_NAME
HAVING unfreed_count > 10000  -- many allocations not yet freed
   OR current_mb > 500        -- or a large amount of memory held
ORDER BY unfreed_count DESC, current_mb DESC
LIMIT 15;

7. Memory Monitoring Configuration Recommendations

7.1 Enabling Full Memory Instrumentation

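-- Check the current memory instrumentation settings
-- (in MySQL 5.7 most memory instruments are disabled by default)
SELECT * FROM performance_schema.setup_instruments 
WHERE NAME LIKE 'memory/%';

-- Enable all memory instruments (assess the performance impact carefully in production).
-- Memory instruments are not timed, so only ENABLED needs to be set.
-- Only allocations made after enabling are counted; to count from startup, set
-- performance-schema-instrument='memory/%=COUNTED' under [mysqld] in the option file.
UPDATE performance_schema.setup_instruments 
SET ENABLED = 'YES'
WHERE NAME LIKE 'memory/%';

-- Or enable only selected instruments.
-- IN() matches exact names only, so the wildcard pattern needs its own LIKE.
-- memory/temptable/% exists from MySQL 8.0; in 5.7, in-memory temporary tables
-- use the MEMORY engine (memory/memory/%).
UPDATE performance_schema.setup_instruments 
SET ENABLED = 'YES'
WHERE NAME IN (
    'memory/sql/THD::main_mem_root',
    'memory/sql/JOIN_CACHE',
    'memory/innodb/mem0mem'
)
   OR NAME LIKE 'memory/temptable/%';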
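
After changing the settings, it helps to confirm how much of each instrument group is actually enabled. The following sketch summarizes setup_instruments by the first two components of the instrument name. The output aliases are illustrative.

```sql
-- Enabled vs. total memory instruments per group (memory/sql, memory/innodb, ...).
SELECT
    SUBSTRING_INDEX(NAME, '/', 2) AS instrument_group,
    SUM(ENABLED = 'YES')          AS enabled_count,
    COUNT(*)                      AS total_count
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'memory/%'
GROUP BY instrument_group
ORDER BY instrument_group;
```

A group whose enabled_count is 0 contributes nothing to the summary tables, so the queries in the earlier sections silently report no memory for it.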

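7.2 Maintaining Memory Statistics

-- Reset memory statistics (use with care: accumulated history is lost)
-- TRUNCATE TABLE performance_schema.memory_summary_global_by_event_name;

-- Save a copy of the important memory statistics.
-- An unqualified * cannot follow another select item, so it is qualified with the table alias.
CREATE TABLE memory_stats_backup AS
SELECT NOW() AS collect_time, m.* 
FROM performance_schema.memory_summary_global_by_event_name m
WHERE m.CURRENT_NUMBER_OF_BYTES_USED > 0;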
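
A one-off copy becomes more useful when it is taken repeatedly and compared over time. The sketch below shows one way to do that. The table mem_snapshot and its columns are hypothetical names chosen for illustration, and the table should live in an ordinary application or DBA schema, not in performance_schema.

```sql
-- Hypothetical snapshot table holding one row per event per collection time.
CREATE TABLE IF NOT EXISTS mem_snapshot (
    collect_time  DATETIME     NOT NULL,
    event_name    VARCHAR(128) NOT NULL,
    current_bytes BIGINT       NOT NULL,
    PRIMARY KEY (collect_time, event_name)
);

-- Take one snapshot. Run this periodically, at most once per second
-- (the primary key rejects two snapshots with the same timestamp).
INSERT INTO mem_snapshot (collect_time, event_name, current_bytes)
SELECT NOW(), EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
FROM performance_schema.memory_summary_global_by_event_name
WHERE CURRENT_NUMBER_OF_BYTES_USED > 0;

-- Growth per event between the oldest and the newest snapshot.
-- Events missing from either snapshot are left out by the inner join.
SELECT
    n.event_name,
    ROUND(o.current_bytes/1024/1024, 2) AS first_mb,
    ROUND(n.current_bytes/1024/1024, 2) AS last_mb,
    ROUND((n.current_bytes - o.current_bytes)/1024/1024, 2) AS growth_mb
FROM mem_snapshot n
JOIN mem_snapshot o ON o.event_name = n.event_name
WHERE n.collect_time = (SELECT MAX(collect_time) FROM mem_snapshot)
  AND o.collect_time = (SELECT MIN(collect_time) FROM mem_snapshot)
ORDER BY n.current_bytes - o.current_bytes DESC
LIMIT 10;
```

Events that grow steadily across many snapshots, without a matching rise in workload, are better leak candidates than events that are merely large.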

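8. Source Code

Memory allocation events fall into two main groups, memory/sql/xxx and memory/innodb/xxx. Here innodb stands for the InnoDB engine; for another engine such as MyISAM the prefix is memory/myisam/xxx. The mem_root events relate to memory management and thread management in the SQL layer, while mem0mem is the InnoDB module that allocates and manages memory. The source file sql/sql_class.h describes main_mem_root as follows: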

 /**
    This memory root is used for two purposes:
    - for conventional queries, to allocate structures stored in main_lex
    during parsing, and allocate runtime data (execution plan, etc.)
    during execution.
    - for prepared queries, only to allocate runtime data. The parsed
    tree itself is reused between executions and thus is stored elsewhere.
  */
  MEM_ROOT main_mem_root;
  Diagnostics_area main_da;
  Diagnostics_area m_parser_da;              /**< cf. get_parser_da() */
  Diagnostics_area m_query_rewrite_plugin_da;
  Diagnostics_area *m_query_rewrite_plugin_da_ptr;

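9. Summary

The memory monitoring facilities in MySQL 5.7 cover the full range from a macro view to a micro view. The aggregated sys views point you in the right direction quickly, and the detailed Performance Schema statistics support deeper analysis. Key points:

  1. Multi-dimensional monitoring: cross-analysis by host, user, thread, and event

  2. Event classification: distinguish the SQL layer, storage engines, temporary tables, and other components

  3. Correlated analysis: join with thread information to trace specific SQL statements

  4. Trend comparison: compare cumulative allocations with current usage to spot anomalies

In production, set up regular memory monitoring and configure threshold alerts in your monitoring system, so that memory problems are caught and handled early, before they affect business stability.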
