Load Clustering and Its Implementation in MATLAB

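I. What Is Load Clustering?

Load clustering processes and analyzes the electricity load data of customers or metering points. This data is usually power as a function of time, i.e., the "load profile". Load clustering automatically partitions the profiles into classes (clusters) according to how similar their consumption patterns are.

Goal: discover the inherent, previously unknown grouping structure in the data, so that like goes with like.
Input: a data matrix of daily, weekly, or yearly load profiles for many customers, with one row per customer and one column per time step.
Output: a cluster label for every customer, plus a typical (representative) load profile for each cluster.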
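To make "similarity" concrete, assume profiles are stored one customer per row. The most common similarity measure is then the Euclidean distance between two rows. The sketch below builds a tiny matrix from two hypothetical curve shapes. It checks that two noisy copies of the same shape are closer to each other than to a different shape. The shapes and the noise level are assumptions chosen only for illustration.

```matlab
% Minimal sketch: load profiles as rows of a matrix, similarity as Euclidean distance.
% The curve shapes and the 0.05 noise level are illustrative assumptions.
rng(0);                                   % reproducible noise
t = 1:24;                                 % hours of the day
gauss = @(t, sigma, c) exp(-(t - c).^2 / (2 * sigma^2));

residentialShape = gauss(t, 2, 8) * 0.7 + gauss(t, 2, 18) * 0.8;   % morning + evening peaks
commercialShape  = gauss(t, 3, 12) * 1.5;                          % single midday peak

% Data matrix: one row per customer, one column per hour
X = [residentialShape + 0.05 * randn(1, 24);   % customer 1
     residentialShape + 0.05 * randn(1, 24);   % customer 2 (same pattern as customer 1)
     commercialShape  + 0.05 * randn(1, 24)];  % customer 3 (different pattern)

dSame = norm(X(1, :) - X(2, :));          % distance between similar customers
dDiff = norm(X(1, :) - X(3, :));          % distance between dissimilar customers
fprintf('same pattern: %.3f, different pattern: %.3f\n', dSame, dDiff);
```

Clustering algorithms such as K-Means group rows so that these within-group distances stay small.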

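II. Why Cluster Loads?

Customer segmentation and targeted marketing: customers can be split into broad types such as "commercial and industrial", "residential", and "agricultural". They can also be split more finely, e.g., "nine-to-five office workers", "night-shift households", and "energy-intensive enterprises". These groups provide a basis for differentiated tariffs and demand-response programs.
Load forecasting: aggregating the customers within one cluster and forecasting their total load is often more accurate than forecasting individual customers or a heterogeneous total load.
Abnormal consumption detection: a customer's load profile may deviate strongly from the typical profile of its cluster. This can indicate electricity theft, equipment faults, or other anomalies.
Grid planning and operation: knowing how each customer class uses electricity, and where and when, helps optimize grid planning, dispatch, and reliability analysis.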
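The anomaly-detection idea can be sketched as a distance-to-typical-profile test. The sketch assumes one cluster of 20 simulated residential customers. It adds one hypothetical customer whose load is shifted upward by a constant. The typical profile is taken as the cluster mean. The flagging rule (median plus three scaled median absolute deviations) is an illustrative choice, not a standard threshold.

```matlab
% Minimal sketch: flag customers far from their cluster's typical profile.
% The data, the constant offset and the threshold rule are illustrative assumptions.
rng(0);
t = 1:24;
gauss = @(t, sigma, c) exp(-(t - c).^2 / (2 * sigma^2));
base = gauss(t, 2, 8) * 0.7 + gauss(t, 2, 18) * 0.8;        % residential-like shape

members = repmat(base, 20, 1) + 0.05 * randn(20, 24);       % 20 normal cluster members
suspect = base + 1.0 + 0.05 * randn(1, 24);                 % hypothetical abnormal member
X = [members; suspect];                                     % rows = customers in one cluster

typical = mean(X, 1);                                       % typical profile = cluster mean
d = sqrt(sum((X - typical).^2, 2));                         % Euclidean distance of each row to it

% Robust threshold: median + 3 * scaled MAD (1.4826 makes MAD comparable to a std for Gaussian data)
madScaled = 1.4826 * median(abs(d - median(d)));
isAnomaly = d > median(d) + 3 * madScaled;
disp(find(isAnomaly)');                                     % indices of flagged customers
```

A robust threshold is used because the anomaly itself inflates the mean and standard deviation of the distances.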

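III. Implementation in MATLAB

Step 1: Generate Simulated Data

First, we create synthetic data with clear patterns to mimic real loads. Several typical customer types are usually simulated:

Residential customers: morning and evening peaks, low consumption at night.
Commercial customers: high consumption during the day, very low at night.
Industrial customers: steady consumption around the clock, possibly with a midday break (the simulation below uses a flat profile).
Off-peak customers: counter-peak behavior, with high consumption at night.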

```matlab
% Generate simulated load data (24 hours, 4 customer classes, 50 samples per class)
numTimePoints = 24;      % 24 hourly values per day
numUsersPerClass = 50;
rng(1);                  % fix the random seed for reproducibility
t = 1:numTimePoints;

% Gaussian bump of width sigma centered at hour c; the same formula as
% gaussmf(t, [sigma c]), but without requiring the Fuzzy Logic Toolbox
gauss = @(t, sigma, c) exp(-(t - c).^2 / (2 * sigma^2));

% 1. Residential pattern: two peaks (morning and evening)
residentBase = gauss(t, 2, 8) * 0.7 + gauss(t, 2, 18) * 0.8;
residentProfiles = repmat(residentBase, numUsersPerClass, 1) + randn(numUsersPerClass, numTimePoints) * 0.1;

% 2. Commercial pattern: one daytime peak
commercialBase = gauss(t, 3, 12) * 1.5;
commercialProfiles = repmat(commercialBase, numUsersPerClass, 1) + randn(numUsersPerClass, numTimePoints) * 0.1;

% 3. Industrial pattern: flat (stable all day)
industrialBase = ones(1, numTimePoints) * 0.9;
industrialProfiles = repmat(industrialBase, numUsersPerClass, 1) + randn(numUsersPerClass, numTimePoints) * 0.15;

% 4. Off-peak pattern: consumption at night
nightBase = gauss(t, 2, 2) * 0.5 + gauss(t, 2, 22) * 0.7;
nightProfiles = repmat(nightBase, numUsersPerClass, 1) + randn(numUsersPerClass, numTimePoints) * 0.1;

% Combine all data
allProfiles = [residentProfiles; commercialProfiles; industrialProfiles; nightProfiles];
trueLabels = repelem((1:4)', numUsersPerClass);   % ground-truth labels, used only for validation

% Plot a few sample curves per class
figure; hold on;
h1 = plot(t, residentProfiles(1:3, :), 'b');
h2 = plot(t, commercialProfiles(1:3, :), 'r');
h3 = plot(t, industrialProfiles(1:3, :), 'g');
h4 = plot(t, nightProfiles(1:3, :), 'm');
title('Sample Load Profiles from Different Classes');
xlabel('Hour'); ylabel('Load');
% Each plot call returns 3 line handles; label one line per class
legend([h1(1), h2(1), h3(1), h4(1)], 'Residential', 'Commercial', 'Industrial', 'Night Owl');
grid on;
```

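Step 2: Data Preprocessing and Feature Engineering

Clustering directly on the raw 24-dimensional load profiles works. Sometimes, however, it is useful to extract features that reduce the dimensionality or highlight key characteristics. Note that standardizing each profile (method A below) removes its absolute level and keeps only its shape. A nearly flat profile, such as the simulated industrial class, is reduced to little more than its noise after z-scoring.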

```matlab
% Method A: use the standardized load profiles directly as features
dataMatrix = zscore(allProfiles, 0, 2);   % z-score each row (each customer) across its 24 hours

% Method B: extract features (e.g., mean, maximum, minimum, peak-valley difference, load factor)
% avgLoad = mean(allProfiles, 2);
% maxLoad = max(allProfiles, [], 2);
% minLoad = min(allProfiles, [], 2);
% peakValleyDiff = maxLoad - minLoad;
% loadFactor = avgLoad ./ maxLoad;          % load factor = average load / peak load
% dataMatrix = [avgLoad, maxLoad, minLoad, peakValleyDiff, loadFactor];
% dataMatrix = zscore(dataMatrix);          % standardize each feature (column)
```
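One alternative to per-profile z-scoring is to divide each profile by its own daily peak. This is sketched here as an option, not as the method used above. Like z-scoring, it removes the customer's size. Unlike z-scoring, a flat profile keeps values close to 1 instead of being reduced to noise. The flat and peaked curves below are hypothetical.

```matlab
% Minimal sketch: compare row-wise z-scoring with peak normalization.
% The two curves are assumptions; zscore requires the Statistics and Machine Learning Toolbox.
rng(0);
t = 1:24;
flatProfile   = 0.9 + 0.15 * randn(1, 24);                 % industrial-like, nearly flat
peakedProfile = 1.5 * exp(-(t - 12).^2 / 18) + 0.1;        % commercial-like, one midday peak
X = [flatProfile; peakedProfile];

Xz    = zscore(X, 0, 2);        % each row: mean 0, std 1 (shape only)
Xpeak = X ./ max(X, [], 2);     % each row divided by its own peak, so every row maximum is 1

% After z-scoring both rows have the same spread; after peak normalization the flat
% row stays high all day while the peaked row drops to about 0.06 away from noon
disp(std(Xz, 0, 2)');
disp(min(Xpeak, [], 2)');
```

The row minimum after peak normalization is simply the ratio of minimum to peak load. For a flat customer this ratio stays high, so the information that the load is flat survives preprocessing.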

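Step 3: Run the Cluster Analysis (K-Means)

K-Means requires the number of clusters K to be specified in advance. Two aids can help choose it. The "elbow method" looks for the K after which the within-cluster sum of squares stops dropping sharply. The silhouette coefficient is the other common option.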

```matlab
% Try several values of K and record the total within-cluster sum of squares (for the elbow method)
maxClusters = 8;
inertia = zeros(maxClusters, 1);
rng(1);   % reproducible k-means initializations
for k = 1:maxClusters
    % 'Replicates' reruns k-means from different starting points and keeps the best
    % result, reducing the risk of a poor local optimum
    [~, ~, sumd] = kmeans(dataMatrix, k, 'Replicates', 10);
    inertia(k) = sum(sumd);
end

% Plot the elbow curve
figure;
plot(1:maxClusters, inertia, '-o');
xlabel('Number of clusters (k)');
ylabel('Within-cluster Sum of Squared Distances (Inertia)');
title('Elbow Method For Optimal k');
grid on;

% Choose K from the plot (here we assume 4)
optimalK = 4;
[idx, centroids] = kmeans(dataMatrix, optimalK, 'Replicates', 10);
```
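The silhouette criterion can also be swept over K instead of reading K off the elbow plot. The sketch below generates its own toy data with three assumed shapes, so it runs on its own. It runs kmeans for K = 2..6 and keeps the K with the highest mean silhouette value. MATLAB's evalclusters function (criterion 'silhouette') offers a packaged version of the same idea.

```matlab
% Minimal sketch: choose K by the mean silhouette value.
% Toy data with three assumed shapes; requires the Statistics and Machine Learning Toolbox.
rng(0);
t = 1:24;
gauss = @(t, sigma, c) exp(-(t - c).^2 / (2 * sigma^2));
shapes = [gauss(t, 2, 7); gauss(t, 3, 13); gauss(t, 2, 21)];   % morning, midday, night peaks
X = repelem(shapes, 30, 1) + 0.05 * randn(90, 24);              % 30 noisy copies of each shape

kList = 2:6;
meanSil = zeros(size(kList));
for n = 1:numel(kList)
    labels = kmeans(X, kList(n), 'Replicates', 10);
    meanSil(n) = mean(silhouette(X, labels));   % average silhouette value over all points
end
[~, best] = max(meanSil);
bestK = kList(best);
fprintf('Mean silhouette per K: %s\n', mat2str(meanSil, 3));
fprintf('Chosen K: %d\n', bestK);
```

Silhouette values lie in [-1, 1], and values near 1 mean points sit well inside their own cluster. K = 1 is excluded because the silhouette is undefined for a single cluster.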

Step 4: Evaluate and Visualize the Results

```matlab
% Plot the cluster centers (typical load patterns, in standardized units)
figure;
nRows = ceil(optimalK / 2);
yLimits = [min(centroids(:)), max(centroids(:))];   % shared y-axis range for all panels
for i = 1:optimalK
    subplot(nRows, 2, i);
    plot(1:24, centroids(i, :), 'LineWidth', 2);
    title(sprintf('Cluster %d Center', i));
    xlabel('Hour'); ylabel('Standardized Load');
    grid on;
    ylim(yLimits);
end
sgtitle('Cluster Centers (Typical Load Patterns)');

% Use t-SNE to project the 24-dimensional data to 2 dimensions for inspection
rng default;   % for reproducibility
Y = tsne(dataMatrix);
figure;
gscatter(Y(:,1), Y(:,2), idx);
title('t-SNE Visualization of Clusters');
xlabel('Dimension 1'); ylabel('Dimension 2');

% Compute silhouette values to assess cluster quality
silhouetteValues = silhouette(dataMatrix, idx);
figure;
silhouette(dataMatrix, idx);   % draw the silhouette plot
title('Silhouette Plot');
xlabel('Silhouette Value'); ylabel('Cluster');
meanSilhouette = mean(silhouetteValues);
fprintf('Average Silhouette Value: %.4f\n', meanSilhouette);
```
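The centers above are in standardized units. Clusters are often easier to read in the original load units, obtained by averaging the raw profiles of each cluster's members. This sketch uses its own small toy data set (two assumed shapes at different load levels) so that it runs on its own. To apply it to the variables from the steps above, substitute dataMatrix, allProfiles and idx for X, rawX and labels.

```matlab
% Minimal sketch: typical profiles in original units = mean raw profile per cluster.
% Toy data are assumptions; requires the Statistics and Machine Learning Toolbox (zscore, kmeans).
rng(0);
t = 1:24;
gauss = @(t, sigma, c) exp(-(t - c).^2 / (2 * sigma^2));
rawX = [repmat(2.0 * gauss(t, 3, 12), 40, 1);            % large midday-peak customers
        repmat(0.5 * gauss(t, 2, 20), 40, 1)];           % small evening-peak customers
rawX = rawX + 0.05 * randn(size(rawX));

X = zscore(rawX, 0, 2);                                  % cluster on shape, as in Step 2
K = 2;
labels = kmeans(X, K, 'Replicates', 10);

typicalRaw = zeros(K, numel(t));
for k = 1:K
    typicalRaw(k, :) = mean(rawX(labels == k, :), 1);    % average raw profile of cluster k
end

figure;
plot(t, typicalRaw, 'LineWidth', 2);
xlabel('Hour'); ylabel('Load (original units)');
legend(arrayfun(@(k) sprintf('Cluster %d', k), 1:K, 'UniformOutput', false));
title('Typical Load Profiles in Original Units');
grid on;
```

Clustering is done on shape, while the reported curves keep each group's actual load level. Large and small customers with the same pattern therefore share a cluster, yet the typical curve still shows their average magnitude.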

Step 5: Analysis and Application

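```matlab
% Number of customers in each cluster
clusterCounts = histcounts(idx, 1:optimalK+1);
disp('Number of users in each cluster:');
disp(clusterCounts);

% Compare the clustering with the true labels (possible only for simulated data)
C = confusionmat(trueLabels, idx)
% Cluster numbers are arbitrary, so a good result looks like a permuted diagonal matrix:
% each row (true class) and each column (cluster) has one dominant entry

% The cluster labels can be written back to a database, or used to select specific customer groups for further analysis
```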
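Cluster numbers are arbitrary, so when ground truth exists it helps to have a single score that ignores the numbering. One common choice is purity. It credits each cluster with its majority true class and counts the fraction of customers that match. The sketch computes purity from a contingency table built with accumarray. The label vectors are hypothetical.

```matlab
% Minimal sketch: cluster purity, which does not depend on how clusters are numbered.
% The label vectors below are hypothetical.
trueLabels = [1 1 1 2 2 2 3 3 3]';
predLabels = [2 2 2 3 3 1 1 1 1]';    % same grouping renumbered, with one customer misplaced

% Contingency table: rows = true classes, columns = clusters (same layout as confusionmat)
C = accumarray([trueLabels, predLabels], 1);

% Each cluster (column) is credited with its majority true class
purity = sum(max(C, [], 1)) / numel(trueLabels);
fprintf('Purity: %.3f\n', purity);
```

Here 8 of the 9 customers fall in their cluster's majority class, giving a purity of 8/9. Purity rises toward 1 as K grows (every customer in its own cluster scores 1). It is therefore only meaningful when comparing clusterings with the same K.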


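IV. Advanced Methods

Other clustering algorithms:
- Hierarchical clustering: does not need K in advance; it builds a dendrogram that can afterwards be cut at whatever level gives a sensible number of clusters.
- DBSCAN: density-based; finds clusters of arbitrary shape and labels low-density points as noise.
- Gaussian mixture models: probabilistic; give soft assignments (each customer belongs to every cluster with some probability).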
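The soft assignment of a Gaussian mixture model can be sketched with fitgmdist and posterior from the Statistics and Machine Learning Toolbox. The sketch makes several assumptions: toy two-shape data, diagonal covariances, and a small regularization value. The last two keep the 24-dimensional fit well conditioned with few samples. All of these settings are illustrative choices.

```matlab
% Minimal sketch: soft cluster membership with a Gaussian mixture model.
% Toy data and model settings are assumptions; requires the Statistics and Machine Learning Toolbox.
rng(0);
t = 1:24;
gauss = @(t, sigma, c) exp(-(t - c).^2 / (2 * sigma^2));
X = [repmat(gauss(t, 3, 12), 40, 1);      % midday-peak customers
     repmat(gauss(t, 2, 20), 40, 1)];     % evening-peak customers
X = X + 0.05 * randn(size(X));

gm = fitgmdist(X, 2, 'CovarianceType', 'diagonal', ...
               'RegularizationValue', 1e-3, 'Replicates', 5);
P = posterior(gm, X);                     % 80-by-2: probability of each customer belonging to each component

[~, hardLabels] = max(P, [], 2);          % hard labels, if needed, are the most probable component
disp(P(1:3, :));                          % membership probabilities of the first few customers
```

Each row of P sums to 1. Customers whose largest probability is well below 1 sit between patterns and are natural candidates for closer inspection.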

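Methods designed for time series:

Dynamic time warping (DTW): finds the best alignment between two time series and measures the distance along it. It is therefore insensitive to local shifts and stretching along the time axis. This suits load profiles whose shapes match but whose peaks occur at slightly different times. The flip side is that DTW also discounts differences in peak timing, which can matter for time-of-use analysis. MATLAB provides the dtw function (Signal Processing Toolbox).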

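```matlab
% Example: hierarchical clustering with a DTW distance
% pdist calls a custom distance function with one observation zi (1-by-n) and a block of
% observations Zj (m-by-n) and expects an m-by-1 vector, so dtw is applied row by row
dtwDist = @(zi, Zj) arrayfun(@(k) dtw(zi, Zj(k, :)), (1:size(Zj, 1))');
pairwiseDist = pdist(allProfiles, dtwDist);   % DTW distance for every pair of customers
tree = linkage(pairwiseDist, 'average');      % average-linkage hierarchical clustering
figure;
dendrogram(tree, 50);                         % dendrogram collapsed to at most 50 leaves
title('Hierarchical Clustering using DTW Distance');
```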
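The claim that DTW tolerates time shifts can be checked on two hypothetical curves whose single peak differs by two hours. The sketch compares dtw (Signal Processing Toolbox) with the sum of absolute hour-by-hour differences. That sum is the cost of the unwarped, diagonal alignment. Because the diagonal is one of the paths DTW searches, the DTW distance can never exceed it.

```matlab
% Minimal sketch: DTW versus an hour-by-hour comparison for a time-shifted peak.
% The two curves are hypothetical; dtw requires the Signal Processing Toolbox.
t = 1:24;
a = exp(-(t - 8).^2 / 8);            % peak at 08:00
b = exp(-(t - 10).^2 / 8);           % same shape, peak at 10:00

dDiagonal = sum(abs(a - b));         % cost of aligning hour i with hour i (no warping)
dDTW = dtw(a, b);                    % cost of the best warped alignment

fprintf('hour-by-hour: %.3f, DTW: %.3f\n', dDiagonal, dDTW);
```

This property cuts both ways. A residential evening peak and a commercial daytime peak can look closer under DTW than they are for tariff purposes. Passing a warping window, dtw(x, y, maxsamp), limits how many samples an alignment may shift away from the diagonal.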