【scikit-learn 1.2版本后】sklearn.datasets中load_boston报错 使用 fetch_openml 函数来加载波士顿房价

ImportError:
load_boston has been removed from scikit-learn since version 1.2.

由于 load_boston 已经在 scikit-learn 1.2 版本中被移除,需要使用 fetch_openml 函数来加载波士顿房价数据集。

python 复制代码
# 导入sklearn数据集模块
from sklearn import datasets
# 导入波士顿房价数据集
data_x, data_y = datasets.fetch_openml(name="boston", version=1, as_frame=True, return_X_y=True, parser="pandas")

这段代码的功能是从 OpenML 数据集库中获取名为"boston"的数据集,并将其加载为 Pandas DataFrame 格式。具体步骤如下:

  1. 调用 datasets.fetch_openml 函数。
  2. 指定数据集名称为 "boston",版本为 1。
  3. 设置 as_frame=True,使数据以 Pandas DataFrame 格式返回。
  4. 设置 return_X_y=True,返回特征数据和目标数据。
  5. 设置 parser="pandas",使用 Pandas 解析器。

以下是完整的报错信息,包含了修改的建议:

ImportError:
load_boston has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as investigated in [1], the authors of this dataset engineered a non-invertible variable "B" assuming that racial self-segregation had a positive impact on house prices [2]. Furthermore the goal of the research that led to the creation of this dataset was to study the impact of air quality but it did not give adequate demonstration of the

validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning.

In this special case, you can fetch the dataset from the original

source::

import pandas as pd
import numpy as np

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

Alternative datasets include the California housing dataset and the Ames housing dataset. You can load the datasets as follows::

from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()

for the California housing dataset and::

from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True)

for the Ames housing dataset.

[1] M Carlisle.

"Racist data destruction?" https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8

[2] Harrison Jr, David, and Daniel L. Rubinfeld. "Hedonic housing prices and the demand for clean air." Journal of environmental economics and management 5.1 (1978): 81-102.
https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air

相关推荐
hence..5 分钟前
Vscode写markdown快速插入python代码
ide·vscode·python
DanielYQ1 小时前
LCR 001 两数相除
开发语言·python·算法
sp_fyf_20241 小时前
【大语言模型】ACL2024论文-18 MINPROMPT:基于图的最小提示数据增强用于少样本问答
人工智能·深度学习·神经网络·目标检测·机器学习·语言模型·自然语言处理
爱喝白开水a1 小时前
Sentence-BERT实现文本匹配【分类目标函数】
人工智能·深度学习·机器学习·自然语言处理·分类·bert·大模型微调
vener_1 小时前
LuckySheet协同编辑后端示例(Django+Channel,Websocket通信)
javascript·后端·python·websocket·django·luckysheet
封步宇AIGC2 小时前
量化交易系统开发-实时行情自动化交易-4.2.3.指数移动平均线实现
人工智能·python·机器学习·数据挖掘
互联网杂货铺2 小时前
自动化测试基础知识总结
自动化测试·软件测试·python·selenium·测试工具·职场和发展·测试用例
小汤猿人类2 小时前
SpringTask
开发语言·python
Mr.谢尔比2 小时前
李宏毅机器学习课程知识点摘要(1-5集)
人工智能·pytorch·深度学习·神经网络·算法·机器学习·计算机视觉
DieYoung_Alive2 小时前
一篇文章了解机器学习
人工智能·机器学习