基于三种机器学习方法的磁芯损耗预测模型

大家好,我是带我去滑雪!

磁性元件在电力转换和传输系统中扮演着关键角色,而磁芯损耗是影响其效率的重要指标。随着工业和能源需求的增长,尤其是可再生能源和智能电网的快速发展,对磁性元件的高效、紧凑、低功耗需求日益迫切。磁芯损耗与多种因素有关,包括工作频率、磁通密度、励磁波形、工作温度和材料特性,这些因素的非线性和相互关联性使得损耗预测变得复杂。通过对原始数据进行处理,包括剔除与磁通密度无关的变量,如温度、频率和磁芯损耗,以及检查数据中是否存在缺失值。在确保数据的完整性和相关性之后,采用时域和频域分析方法,结合统计分析、数值计算和信号处理技术,从数据中提取了关键特征,包括最大值、最小值、均值、标准差、偏度、峰度、总谐波失真、小波能级特征和波形因子等17个特征变量。

汇总所有特征变量,并对磁芯材料类型、励磁波形数据等分类变量采用独立热编码将文本转化为数值。分别构建BP神经网络、随机森林回归、轻量梯度提升三种磁芯损耗预测模型,并对比各个模型在验证集上的MSE、MSE、MAE和R方,选择最优的模型对测试集进行预测。研究结果表明,轻量梯度提升模型的预测效果最好。此外为了探究模型的泛化能力,采用k-折交叉验证方法,对模型进行评估。最后,根据模型评估结果给出了相应建议。本文的建模流程如下:

下面开始代码实战:

目录

(一)特征提取

(二)模块导入与数据展示

(三)划分训练集和验证集

(四)绘制变量热力图

(五)三种机器学习模型建立

(六)交叉验证


(一)特征提取

python 复制代码
import pandas as pd
import numpy as np
import scipy.fftpack as fftpack
import pywt
file_path = r"E:\工作\硕士\问题四\附件三测试集特征提取.xlsx"
df = pd.read_excel(file_path, header=None)
flux_density = df.values.flatten()
segment_length = 1024
all_features = []
for i in range(0, len(flux_density), segment_length):
    segment = flux_density[i:i+segment_length]    
    if len(segment) < segment_length: 
        break
    num_samples = len(segment)
    t = np.linspace(0, 1, num_samples)
    max_value = np.max(segment)
    min_value = np.min(segment)
    mean_value = np.mean(segment)
    std_deviation = np.std(segment)
    skewness = np.mean((segment - mean_value)**3) / (std_deviation**3)
    kurtosis = np.mean((segment - mean_value)**4) / (std_deviation**4)
    fft_values = fftpack.fft(segment)
    fft_freqs = fftpack.fftfreq(num_samples, d=t[1] - t[0])
    fft_magnitude = np.abs(fft_values)
    peak_to_peak = max_value - min_value
    dominant_frequency = fft_freqs[np.argmax(fft_magnitude[:len(fft_magnitude)//2])]
    harmonics = np.sqrt(np.sum(fft_magnitude[2:]**2))
    fundamental = fft_magnitude[1] 
    THD = harmonics / fundamental if fundamental != 0 else 0
    time_points = np.arange(len(segment))
    slope = np.polyfit(time_points, segment, 1)[0]
    zero_crossings = np.sum(np.diff(np.sign(segment)) != 0)
    rms_value = np.sqrt(np.mean(segment ** 2))
    mean_abs_value = np.mean(np.abs(segment))
    form_factor = rms_value / mean_abs_value if mean_abs_value != 0 else 0
    wavelet = 'db4'
    coeffs = pywt.wavedec(segment, wavelet, level=4)
    wavelet_coeffs_energy = [np.sum(c**2) for c in coeffs]
    features = {
        '最大值': max_value,
        '最小值': min_value,
        '峰峰值': peak_to_peak,
        '均值': mean_value,
        '标准差': std_deviation,
        '偏度': skewness,
        '峰度': kurtosis,
        '总谐波失真': THD,
        '小波能级1': wavelet_coeffs_energy[0],
        '小波能级2': wavelet_coeffs_energy[1],
        '小波能级3': wavelet_coeffs_energy[2],
        '小波能级4': wavelet_coeffs_energy[3],
        '小波能级5': wavelet_coeffs_energy[4],
        '斜率': slope,
        '过零率': zero_crossings,
        '波形因子 ': form_factor    
    }
    all_features.append(features)
features_df = pd.DataFrame(all_features)
output_file_path = r"E:\工作\硕士\问题四\附件三测试集特征提取结果.xlsx"
features_df.to_excel(output_file_path, index=False)
print(f"特征已提取并保存至 {output_file_path}")

(二)模块导入与数据展示

python 复制代码
import os
import math
import time
import datetime
import random as rn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams ['font.sans-serif'] ='SimHei' 
plt.rcParams ['axes.unicode_minus']=False
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler,StandardScaler
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error,r2_score
import tensorflow as tf
import keras
from keras.layers import Layer
import keras.backend as K
from keras.models import Model, Sequential
from keras.layers import GRU, Dense,Conv1D, MaxPooling1D,GlobalMaxPooling1D,Embedding,Dropout,Flatten,SimpleRNN,LSTM,Bidirectional
import tensorflow as tf
from keras.callbacks import EarlyStopping
from tensorflow.keras  import optimizers


data0 = pd.read_csv(
    r'E:\工作\硕士\科研竞赛\第四问模型训练数据.csv',
    encoding="ANSI",
    parse_dates=['序号']
).set_index('序号')[['温度', '频率', '最大值', '最小值', '峰峰值', '均值', '标准差', '偏度', '峰度', '总谐波失真', '小波能级1', '小波能级2', '小波能级3', '小波能级4', '小波能级5', '磁芯材料类型', '斜率', '过零率', '波形因子', '励磁波形', '磁芯损耗']].sort_index()
data0 = data0.astype(float)
data0.head()

数据展示:

(三)划分训练集和验证集

python 复制代码
from matplotlib.ticker import MaxNLocator
def build_sequences(text, window_size=24):
    x, y = [],[]
    for i in range(len(text) - window_size):
        sequence = text[i:i+window_size]
        target = text[i+window_size]
        x.append(sequence)
        y.append(target)
    return np.array(x), np.array(y)
def get_traintest(data,train_ratio=0.8,window_size=24):
    train_size=int(len(data)*train_ratio)
    train=data[:train_size]
    test=data[train_size-window_size:]
    X_train,y_train=build_sequences(train,window_size=window_size)
    X_test,y_test=build_sequences(test,window_size=window_size)
    return X_train,y_train[:,-1],X_test,y_test[:,-1]
data=data0.to_numpy()
scaler = MinMaxScaler() 
scaler = scaler.fit(data[:,:-1])
X=scaler.transform(data[:,:-1])   
y_scaler = MinMaxScaler() 
y_scaler = y_scaler.fit(data[:,-1].reshape(-1,1))
y=y_scaler.transform(data[:,-1].reshape(-1,1))

train_ratio=0.8   
window_size=5  
X_train,y_train,X_test,y_test=get_traintest(np.c_[X,y],window_size=window_size,train_ratio=train_ratio)
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)
y_test1 = y_scaler.inverse_transform(y_test.reshape(-1,1))
test_size=int(len(data)*(1-train_ratio))
plt.figure(figsize=(10, 5), dpi=256)
plt.plot(data0.index[:-test_size], data0.iloc[:, -1].iloc[:-test_size], label='训练集', color='#FA9905')
plt.plot(data0.index[-test_size:], data0.iloc[:, -1].iloc[-test_size:], label='验证集', color='#FB8498', linestyle='dashed')
ax = plt.gca()
ax.xaxis.set_major_locator(MaxNLocator(10)) 
plt.legend()
plt.ylabel('磁芯损耗(每立方米瓦特)', fontsize=14)
plt.xlabel('序号', fontsize=14)
plt.savefig(r"E:\工作\硕士\科研竞赛\问题四\训练集与验证集比例.png",
            bbox_inches="tight",
            pad_inches=1,
            transparent=True,
            facecolor="w",
            edgecolor='w',
            dpi=300,
            orientation='landscape')

可视化结果展示:

经过统计,共有12400个样本,将其划分为训练集和测试集,其中训练集为数据前80%,共计9920个,测试集为数据的后20%,共计2485个。利用python将磁芯损耗数据可视化,其中黄色实线为训练集数据,粉色虚线为测试集数据。

(四)绘制变量热力图

结合提取的特征变量,本问模型构建选取的特征变量有温度、频率、最大值、最小值、峰 峰值、均值、标准差、偏度、峰度、总谐波失真、小波能级1、小波能级2、小波能级3、小波能级4、小波能级5、磁芯材料类型、斜率、过零率、波形因子、励磁波形共计20个,响应变量为磁芯损耗。使用皮尔逊相关系数计算每个变量之间的相关系数,并绘制热力图。为了探究特征变量对磁芯损耗的影响,绘制了随机森林特征重要性排序。

python 复制代码
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
import datetime
%matplotlib inline
plt.rcParams['font.sans-serif'] = ['KaiTi']
plt.rcParams['axes.unicode_minus'] = False


data0 = pd.read_csv(
    r'E:\工作\硕士\科研竞赛\问题四\第四问模型训练数据.csv',
    encoding="ANSI",
    parse_dates=['序号']
).set_index('序号')[['温度', '频率', '最大值', '最小值', '峰峰值', '均值', '标准差', '偏度', '峰度', '总谐波失真', '小波能级1', '小波能级2', '小波能级3', '小波能级4', '小波能级5', '磁芯材料类型', '斜率', '过零率', '波形因子', '励磁波形', '磁芯损耗']].sort_index()
data = data0.drop(columns=['磁芯损耗'])
y= data0['磁芯损耗']
data.shape,y.shape

corr = plt.subplots(figsize = (18,16),dpi=300)
corr= sns.heatmap(data.assign(Y=y).corr(method='spearman'),annot=True,square=True)
plt.savefig(r"E:\工作\硕士\科研竞赛\问题四\热力图.png",
            bbox_inches="tight",
            pad_inches=1,
            transparent=True,
            facecolor="w",
            edgecolor='w',
            dpi=300,
            orientation='landscape')

输出结果:

(五)三种机器学习模型建立

python 复制代码
from sklearn.model_selection import train_test_split
X_train,X_val,y_train,y_val=train_test_split(data,y,test_size=0.2,random_state=0)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from xgboost.sklearn import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
model5= RandomForestRegressor(n_estimators=500,  max_features=int(X_train.shape[1]/3) , random_state=0)
model7 =  XGBRegressor(objective='reg:squarederror', n_estimators=1000, random_state=0)
model8 = LGBMRegressor(n_estimators=1000,objective='regression',random_state=0)
model10 = MLPRegressor(hidden_layer_sizes=(16,8), random_state=77, max_iter=10000)
model_list=[model1,model2,model3,model4,model5,model6,model7,model8,model9,model10]
model_name=['随机森林','轻量梯度提升','BP神经网络']
for i in range(3):
    model_C=model_list[i]
    name=model_name[i]
    model_C.fit(X_train_s, y_train)
    s=model_C.score(X_val_s, y_val)
    print(name+'方法在验证集的准确率为:'+str(s))

输出结果:

(六)交叉验证

python 复制代码
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.model_selection import KFold
 
def evaluation(y_test, y_predict):
    mae = mean_absolute_error(y_test, y_predict)
    mse = mean_squared_error(y_test, y_predict)
    rmse = np.sqrt(mean_squared_error(y_test, y_predict))
    mape=(abs(y_predict -y_test)/ y_test).mean()
    r_2=r2_score(y_test, y_predict)
    return mae, rmse, mape
def evaluation2(lis):
    array=np.array(lis)
    return array.mean() , array.std()

def cross_val(model=None,X=None,Y=None,K=5,repeated=1):
    df_mean=pd.DataFrame(columns=['R2','MAE','RMSE','MAPE']) 
    df_std=pd.DataFrame(columns=['R2','MAE','RMSE','MAPE'])
    for n in range(repeated):
        print(f'正在进行第{n+1}次重复K折.....随机数种子为{n}\n')
        kf = KFold(n_splits=K, shuffle=True, random_state=n)
        R2=[]
        MAE=[]
        RMSE=[]
        MAPE=[]
        print(f"    开始本次在{K}折数据上的交叉验证.......\n")
        i=1
        for train_index, test_index in kf.split(X):
            print(f'        正在进行第{i}折的计算')
            X_train=X.values[train_index]
            y_train=y.values[train_index]
            X_test=X.values[test_index]
            y_test=y.values[test_index]
            model.fit(X_train,y_train)
            score=model.score(X_test,y_test)
            R2.append(score)
            pred=model.predict(X_test)
            mae, rmse, mape=evaluation(y_test, pred)
            MAE.append(mae)
            RMSE.append(rmse)
            MAPE.append(mape)
            print(f'        第{i}折的拟合优度为:{round(score,4)},MAE为{round(mae,4)},RMSE为{round(rmse,4)},MAPE为{round(mape,4)}')
            i+=1
        print(f'    ---------------------------------------------完成本次的{K}折交叉验证---------------------------------------------------------\n')
        R2_mean,R2_std=evaluation2(R2)
        MAE_mean,MAE_std=evaluation2(MAE)
        RMSE_mean,RMSE_std=evaluation2(RMSE)
        MAPE_mean,MAPE_std=evaluation2(MAPE)
        print(f'第{n+1}次重复K折,本次{K}折交叉验证的总体拟合优度均值为{R2_mean},方差为{R2_std}')
        print(f'                               总体MAE均值为{MAE_mean},方差为{MAE_std}')
        print(f'                               总体RMSE均值为{RMSE_mean},方差为{RMSE_std}')
        print(f'                               总体MAPE均值为{MAPE_mean},方差为{MAPE_std}')
        print("\n====================================================================================================================\n")
        df1=pd.DataFrame(dict(zip(['R2','MAE','RMSE','MAPE'],[R2_mean,MAE_mean,RMSE_mean,MAPE_mean])),index=[n])
        df_mean=pd.concat([df_mean,df1])
        df2=pd.DataFrame(dict(zip(['R2','MAE','RMSE','MAPE'],[R2_std,MAE_std,RMSE_std,MAPE_std])),index=[n])
        df_std=pd.concat([df_std,df2])
    return df_mean,df_std


model = LGBMRegressor(n_estimators=1000,objective='regression',random_state=0)
lgb_crosseval,lgb_crosseval2=cross_val(model=model,X=data,Y=y,K=3,repeated=5)

lgb_crosseval

部分输出结果展示:

为了验证各个模型的泛化能力,本文采用5-折交叉验证方法,评估模型的泛化能力。展示了在5 折交叉验证下三种模型的总体拟合优度均值,而展示了在5折交叉验证下不同模型的总体拟合优度方差。


更多优质内容持续发布中,请移步主页查看。

若有问题可邮箱联系:1736732074@qq.com

博主的WeChat:TCB1736732074

点赞+关注,下次不迷路!

相关推荐
青春不败 177-3266-052040 分钟前
MATLAB近红外光谱分析技术及实践技术应用
随机森林·机器学习·支持向量机·matlab·卷积神经网络·遗传算法·近红外光谱
白-胖-子5 小时前
深入剖析大模型在文本生成式 AI 产品架构中的核心地位
人工智能·架构
想要成为计算机高手6 小时前
11. isaacsim4.2教程-Transform 树与Odometry
人工智能·机器人·自动驾驶·ros·rviz·isaac sim·仿真环境
NeoFii6 小时前
Day 22: 复习
机器学习
静心问道6 小时前
InstructBLIP:通过指令微调迈向通用视觉-语言模型
人工智能·多模态·ai技术应用
宇称不守恒4.07 小时前
2025暑期—06神经网络-常见网络2
网络·人工智能·神经网络
小楓12017 小时前
醫護行業在未來會被AI淘汰嗎?
人工智能·醫療·護理·職業
数据与人工智能律师8 小时前
数字迷雾中的安全锚点:解码匿名化与假名化的法律边界与商业价值
大数据·网络·人工智能·云计算·区块链
chenchihwen8 小时前
大模型应用班-第2课 DeepSeek使用与提示词工程课程重点 学习ollama 安装 用deepseek-r1:1.5b 分析PDF 内容
人工智能·学习
说私域8 小时前
公域流量向私域流量转化策略研究——基于开源AI智能客服、AI智能名片与S2B2C商城小程序的融合应用
人工智能·小程序