基于三种机器学习方法的磁芯损耗预测模型

大家好,我是带我去滑雪!

磁性元件在电力转换和传输系统中扮演着关键角色,而磁芯损耗是影响其效率的重要指标。随着工业和能源需求的增长,尤其是可再生能源和智能电网的快速发展,对磁性元件的高效、紧凑、低功耗需求日益迫切。磁芯损耗与多种因素有关,包括工作频率、磁通密度、励磁波形、工作温度和材料特性,这些因素的非线性和相互关联性使得损耗预测变得复杂。通过对原始数据进行处理,包括剔除与磁通密度无关的变量,如温度、频率和磁芯损耗,以及检查数据中是否存在缺失值。在确保数据的完整性和相关性之后,采用时域和频域分析方法,结合统计分析、数值计算和信号处理技术,从数据中提取了关键特征,包括最大值、最小值、均值、标准差、偏度、峰度、总谐波失真、小波能级特征和波形因子等17个特征变量。

汇总所有特征变量,并对磁芯材料类型、励磁波形数据等分类变量采用独立热编码将文本转化为数值。分别构建BP神经网络、随机森林回归、轻量梯度提升三种磁芯损耗预测模型,并对比各个模型在验证集上的MSE、MSE、MAE和R方,选择最优的模型对测试集进行预测。研究结果表明,轻量梯度提升模型的预测效果最好。此外为了探究模型的泛化能力,采用k-折交叉验证方法,对模型进行评估。最后,根据模型评估结果给出了相应建议。本文的建模流程如下:

下面开始代码实战:

目录

(一)特征提取

(二)模块导入与数据展示

(三)划分训练集和验证集

(四)绘制变量热力图

(五)三种机器学习模型建立

(六)交叉验证


(一)特征提取

python 复制代码
import pandas as pd
import numpy as np
import scipy.fftpack as fftpack
import pywt
file_path = r"E:\工作\硕士\问题四\附件三测试集特征提取.xlsx"
df = pd.read_excel(file_path, header=None)
flux_density = df.values.flatten()
segment_length = 1024
all_features = []
for i in range(0, len(flux_density), segment_length):
    segment = flux_density[i:i+segment_length]    
    if len(segment) < segment_length: 
        break
    num_samples = len(segment)
    t = np.linspace(0, 1, num_samples)
    max_value = np.max(segment)
    min_value = np.min(segment)
    mean_value = np.mean(segment)
    std_deviation = np.std(segment)
    skewness = np.mean((segment - mean_value)**3) / (std_deviation**3)
    kurtosis = np.mean((segment - mean_value)**4) / (std_deviation**4)
    fft_values = fftpack.fft(segment)
    fft_freqs = fftpack.fftfreq(num_samples, d=t[1] - t[0])
    fft_magnitude = np.abs(fft_values)
    peak_to_peak = max_value - min_value
    dominant_frequency = fft_freqs[np.argmax(fft_magnitude[:len(fft_magnitude)//2])]
    harmonics = np.sqrt(np.sum(fft_magnitude[2:]**2))
    fundamental = fft_magnitude[1] 
    THD = harmonics / fundamental if fundamental != 0 else 0
    time_points = np.arange(len(segment))
    slope = np.polyfit(time_points, segment, 1)[0]
    zero_crossings = np.sum(np.diff(np.sign(segment)) != 0)
    rms_value = np.sqrt(np.mean(segment ** 2))
    mean_abs_value = np.mean(np.abs(segment))
    form_factor = rms_value / mean_abs_value if mean_abs_value != 0 else 0
    wavelet = 'db4'
    coeffs = pywt.wavedec(segment, wavelet, level=4)
    wavelet_coeffs_energy = [np.sum(c**2) for c in coeffs]
    features = {
        '最大值': max_value,
        '最小值': min_value,
        '峰峰值': peak_to_peak,
        '均值': mean_value,
        '标准差': std_deviation,
        '偏度': skewness,
        '峰度': kurtosis,
        '总谐波失真': THD,
        '小波能级1': wavelet_coeffs_energy[0],
        '小波能级2': wavelet_coeffs_energy[1],
        '小波能级3': wavelet_coeffs_energy[2],
        '小波能级4': wavelet_coeffs_energy[3],
        '小波能级5': wavelet_coeffs_energy[4],
        '斜率': slope,
        '过零率': zero_crossings,
        '波形因子 ': form_factor    
    }
    all_features.append(features)
features_df = pd.DataFrame(all_features)
output_file_path = r"E:\工作\硕士\问题四\附件三测试集特征提取结果.xlsx"
features_df.to_excel(output_file_path, index=False)
print(f"特征已提取并保存至 {output_file_path}")

(二)模块导入与数据展示

python 复制代码
import os
import math
import time
import datetime
import random as rn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams ['font.sans-serif'] ='SimHei' 
plt.rcParams ['axes.unicode_minus']=False
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler,StandardScaler
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error,r2_score
import tensorflow as tf
import keras
from keras.layers import Layer
import keras.backend as K
from keras.models import Model, Sequential
from keras.layers import GRU, Dense,Conv1D, MaxPooling1D,GlobalMaxPooling1D,Embedding,Dropout,Flatten,SimpleRNN,LSTM,Bidirectional
import tensorflow as tf
from keras.callbacks import EarlyStopping
from tensorflow.keras  import optimizers


data0 = pd.read_csv(
    r'E:\工作\硕士\科研竞赛\第四问模型训练数据.csv',
    encoding="ANSI",
    parse_dates=['序号']
).set_index('序号')[['温度', '频率', '最大值', '最小值', '峰峰值', '均值', '标准差', '偏度', '峰度', '总谐波失真', '小波能级1', '小波能级2', '小波能级3', '小波能级4', '小波能级5', '磁芯材料类型', '斜率', '过零率', '波形因子', '励磁波形', '磁芯损耗']].sort_index()
data0 = data0.astype(float)
data0.head()

数据展示:

(三)划分训练集和验证集

python 复制代码
from matplotlib.ticker import MaxNLocator
def build_sequences(text, window_size=24):
    x, y = [],[]
    for i in range(len(text) - window_size):
        sequence = text[i:i+window_size]
        target = text[i+window_size]
        x.append(sequence)
        y.append(target)
    return np.array(x), np.array(y)
def get_traintest(data,train_ratio=0.8,window_size=24):
    train_size=int(len(data)*train_ratio)
    train=data[:train_size]
    test=data[train_size-window_size:]
    X_train,y_train=build_sequences(train,window_size=window_size)
    X_test,y_test=build_sequences(test,window_size=window_size)
    return X_train,y_train[:,-1],X_test,y_test[:,-1]
data=data0.to_numpy()
scaler = MinMaxScaler() 
scaler = scaler.fit(data[:,:-1])
X=scaler.transform(data[:,:-1])   
y_scaler = MinMaxScaler() 
y_scaler = y_scaler.fit(data[:,-1].reshape(-1,1))
y=y_scaler.transform(data[:,-1].reshape(-1,1))

train_ratio=0.8   
window_size=5  
X_train,y_train,X_test,y_test=get_traintest(np.c_[X,y],window_size=window_size,train_ratio=train_ratio)
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)
y_test1 = y_scaler.inverse_transform(y_test.reshape(-1,1))
test_size=int(len(data)*(1-train_ratio))
plt.figure(figsize=(10, 5), dpi=256)
plt.plot(data0.index[:-test_size], data0.iloc[:, -1].iloc[:-test_size], label='训练集', color='#FA9905')
plt.plot(data0.index[-test_size:], data0.iloc[:, -1].iloc[-test_size:], label='验证集', color='#FB8498', linestyle='dashed')
ax = plt.gca()
ax.xaxis.set_major_locator(MaxNLocator(10)) 
plt.legend()
plt.ylabel('磁芯损耗(每立方米瓦特)', fontsize=14)
plt.xlabel('序号', fontsize=14)
plt.savefig(r"E:\工作\硕士\科研竞赛\问题四\训练集与验证集比例.png",
            bbox_inches="tight",
            pad_inches=1,
            transparent=True,
            facecolor="w",
            edgecolor='w',
            dpi=300,
            orientation='landscape')

可视化结果展示:

经过统计,共有12400个样本,将其划分为训练集和测试集,其中训练集为数据前80%,共计9920个,测试集为数据的后20%,共计2485个。利用python将磁芯损耗数据可视化,其中黄色实线为训练集数据,粉色虚线为测试集数据。

(四)绘制变量热力图

结合提取的特征变量,本问模型构建选取的特征变量有温度、频率、最大值、最小值、峰 峰值、均值、标准差、偏度、峰度、总谐波失真、小波能级1、小波能级2、小波能级3、小波能级4、小波能级5、磁芯材料类型、斜率、过零率、波形因子、励磁波形共计20个,响应变量为磁芯损耗。使用皮尔逊相关系数计算每个变量之间的相关系数,并绘制热力图。为了探究特征变量对磁芯损耗的影响,绘制了随机森林特征重要性排序。

python 复制代码
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
import datetime
%matplotlib inline
plt.rcParams['font.sans-serif'] = ['KaiTi']
plt.rcParams['axes.unicode_minus'] = False


data0 = pd.read_csv(
    r'E:\工作\硕士\科研竞赛\问题四\第四问模型训练数据.csv',
    encoding="ANSI",
    parse_dates=['序号']
).set_index('序号')[['温度', '频率', '最大值', '最小值', '峰峰值', '均值', '标准差', '偏度', '峰度', '总谐波失真', '小波能级1', '小波能级2', '小波能级3', '小波能级4', '小波能级5', '磁芯材料类型', '斜率', '过零率', '波形因子', '励磁波形', '磁芯损耗']].sort_index()
data = data0.drop(columns=['磁芯损耗'])
y= data0['磁芯损耗']
data.shape,y.shape

corr = plt.subplots(figsize = (18,16),dpi=300)
corr= sns.heatmap(data.assign(Y=y).corr(method='spearman'),annot=True,square=True)
plt.savefig(r"E:\工作\硕士\科研竞赛\问题四\热力图.png",
            bbox_inches="tight",
            pad_inches=1,
            transparent=True,
            facecolor="w",
            edgecolor='w',
            dpi=300,
            orientation='landscape')

输出结果:

(五)三种机器学习模型建立

python 复制代码
from sklearn.model_selection import train_test_split
X_train,X_val,y_train,y_val=train_test_split(data,y,test_size=0.2,random_state=0)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from xgboost.sklearn import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
model5= RandomForestRegressor(n_estimators=500,  max_features=int(X_train.shape[1]/3) , random_state=0)
model7 =  XGBRegressor(objective='reg:squarederror', n_estimators=1000, random_state=0)
model8 = LGBMRegressor(n_estimators=1000,objective='regression',random_state=0)
model10 = MLPRegressor(hidden_layer_sizes=(16,8), random_state=77, max_iter=10000)
model_list=[model1,model2,model3,model4,model5,model6,model7,model8,model9,model10]
model_name=['随机森林','轻量梯度提升','BP神经网络']
for i in range(3):
    model_C=model_list[i]
    name=model_name[i]
    model_C.fit(X_train_s, y_train)
    s=model_C.score(X_val_s, y_val)
    print(name+'方法在验证集的准确率为:'+str(s))

输出结果:

(六)交叉验证

python 复制代码
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.model_selection import KFold
 
def evaluation(y_test, y_predict):
    mae = mean_absolute_error(y_test, y_predict)
    mse = mean_squared_error(y_test, y_predict)
    rmse = np.sqrt(mean_squared_error(y_test, y_predict))
    mape=(abs(y_predict -y_test)/ y_test).mean()
    r_2=r2_score(y_test, y_predict)
    return mae, rmse, mape
def evaluation2(lis):
    array=np.array(lis)
    return array.mean() , array.std()

def cross_val(model=None,X=None,Y=None,K=5,repeated=1):
    df_mean=pd.DataFrame(columns=['R2','MAE','RMSE','MAPE']) 
    df_std=pd.DataFrame(columns=['R2','MAE','RMSE','MAPE'])
    for n in range(repeated):
        print(f'正在进行第{n+1}次重复K折.....随机数种子为{n}\n')
        kf = KFold(n_splits=K, shuffle=True, random_state=n)
        R2=[]
        MAE=[]
        RMSE=[]
        MAPE=[]
        print(f"    开始本次在{K}折数据上的交叉验证.......\n")
        i=1
        for train_index, test_index in kf.split(X):
            print(f'        正在进行第{i}折的计算')
            X_train=X.values[train_index]
            y_train=y.values[train_index]
            X_test=X.values[test_index]
            y_test=y.values[test_index]
            model.fit(X_train,y_train)
            score=model.score(X_test,y_test)
            R2.append(score)
            pred=model.predict(X_test)
            mae, rmse, mape=evaluation(y_test, pred)
            MAE.append(mae)
            RMSE.append(rmse)
            MAPE.append(mape)
            print(f'        第{i}折的拟合优度为:{round(score,4)},MAE为{round(mae,4)},RMSE为{round(rmse,4)},MAPE为{round(mape,4)}')
            i+=1
        print(f'    ---------------------------------------------完成本次的{K}折交叉验证---------------------------------------------------------\n')
        R2_mean,R2_std=evaluation2(R2)
        MAE_mean,MAE_std=evaluation2(MAE)
        RMSE_mean,RMSE_std=evaluation2(RMSE)
        MAPE_mean,MAPE_std=evaluation2(MAPE)
        print(f'第{n+1}次重复K折,本次{K}折交叉验证的总体拟合优度均值为{R2_mean},方差为{R2_std}')
        print(f'                               总体MAE均值为{MAE_mean},方差为{MAE_std}')
        print(f'                               总体RMSE均值为{RMSE_mean},方差为{RMSE_std}')
        print(f'                               总体MAPE均值为{MAPE_mean},方差为{MAPE_std}')
        print("\n====================================================================================================================\n")
        df1=pd.DataFrame(dict(zip(['R2','MAE','RMSE','MAPE'],[R2_mean,MAE_mean,RMSE_mean,MAPE_mean])),index=[n])
        df_mean=pd.concat([df_mean,df1])
        df2=pd.DataFrame(dict(zip(['R2','MAE','RMSE','MAPE'],[R2_std,MAE_std,RMSE_std,MAPE_std])),index=[n])
        df_std=pd.concat([df_std,df2])
    return df_mean,df_std


model = LGBMRegressor(n_estimators=1000,objective='regression',random_state=0)
lgb_crosseval,lgb_crosseval2=cross_val(model=model,X=data,Y=y,K=3,repeated=5)

lgb_crosseval

部分输出结果展示:

为了验证各个模型的泛化能力,本文采用5-折交叉验证方法,评估模型的泛化能力。展示了在5 折交叉验证下三种模型的总体拟合优度均值,而展示了在5折交叉验证下不同模型的总体拟合优度方差。


更多优质内容持续发布中,请移步主页查看。

若有问题可邮箱联系:1736732074@qq.com

博主的WeChat:TCB1736732074

点赞+关注,下次不迷路!

相关推荐
丝斯20114 分钟前
AI学习笔记整理(76)——Python学习5
人工智能·笔记·学习
LaughingZhu5 分钟前
Product Hunt 每日热榜 | 2026-03-22
大数据·数据库·人工智能·经验分享·搜索引擎
醉颜凉8 分钟前
Seal^_^【送书活动第8期】——《ChatGLM3大模型本地化部署、应用开发与微调》
人工智能·职场和发展·送书活动·chatglm3大模型
进击的野人10 分钟前
从AI“说人话”到“说结构话”:Spring AI结构化输出实战解析
人工智能·spring·ai编程
jay神10 分钟前
基于深度学习的车辆识别收费管理系统
人工智能·深度学习·yolo·目标检测·毕业设计
进击的雷神14 分钟前
Trae AI IDE 完全指南:从入门到精通
大数据·ide·人工智能·trae
汀丶人工智能17 分钟前
基于 Milvus 构建企业级 RAG 问答系统:从原理到实践-CSDN博客
人工智能
工边页字17 分钟前
为什么 RAG系统里,Embedding成本往往远低于 LLM成本,但很多公司仍然疯狂优化 Embedding?
前端·人工智能·后端
宇擎智脑科技18 分钟前
A2A 协议规范深度剖析:三层架构、数据模型、操作语义与协议绑定
人工智能·a2a
Mintopia21 分钟前
如何降低 Prompt 对 AI 理解的干扰
人工智能