A Tour of Cool Python Libraries: Third-Party Library Pandas (058)

221. pandas.Series.interpolate method

221-1. Syntax

# 221. pandas.Series.interpolate method
pandas.Series.interpolate(method='linear', *, axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=_NoDefault.no_default, **kwargs)
Fill NaN values using an interpolation method.

Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.

Parameters:
method : str, default 'linear'
Interpolation technique to use. One of:

'linear': Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.

'time': Works on daily and higher resolution data to interpolate given length of interval.

'index', 'values': use the actual numerical values of the index.

'pad': Fill in NaNs using existing values.

'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'polynomial': Passed to scipy.interpolate.interp1d, whereas 'spline' is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both 'polynomial' and 'spline' require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5). Note that, slinear method in Pandas refers to the Scipy first order spline instead of Pandas first order spline.

'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima', 'cubicspline': Wrappers around the SciPy interpolation methods of similar names. See Notes.

'from_derivatives': Refers to scipy.interpolate.BPoly.from_derivatives.

axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along. For Series this parameter is unused and defaults to 0.

limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than 0.

inplace : bool, default False
Update the data in place if possible.

limit_direction : {'forward', 'backward', 'both'}, optional
Consecutive NaNs will be filled in this direction.

If limit is specified:
If 'method' is 'pad' or 'ffill', 'limit_direction' must be 'forward'.

If 'method' is 'backfill' or 'bfill', 'limit_direction' must be 'backward'.

If 'limit' is not specified:
If 'method' is 'backfill' or 'bfill', the default is 'backward'

else the default is 'forward'

raises ValueError if limit_direction is 'forward' or 'both' and method is 'backfill' or 'bfill'.

raises ValueError if limit_direction is 'backward' or 'both' and method is 'pad' or 'ffill'.

limit_area : {None, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this restriction.

None: No fill restriction.

'inside': Only fill NaNs surrounded by valid values (interpolate).

'outside': Only fill NaNs outside valid values (extrapolate).

downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.

Deprecated since version 2.1.0.

``**kwargs`` : optional
Keyword arguments to pass on to the interpolating function.

Returns:
Series or DataFrame or None
Returns the same object type as the caller, interpolated at some or all NaN values or None if inplace=True.

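221-2. Parameters

221-2-1. method **(optional, default 'linear'):** The interpolation method to use. The available methods include:

**'linear':** Linear interpolation that ignores the index and treats the values as equally spaced.
**'time':** Linear interpolation weighted by the time intervals of a datetime index.
**'index':** Linear interpolation that uses the actual numerical values of the index.
**'nearest':** Uses the nearest valid value.
**'zero':** Step (zero-order spline) interpolation.
**'slinear':** First-order spline interpolation.
**'quadratic':** Second-order spline interpolation.
**'cubic':** Third-order spline interpolation.
Other methods such as **'polynomial', 'barycentric', 'krogh', 'piecewise_polynomial'**, and so on. 'polynomial' and 'spline' also require an order argument.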
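
To make the difference between 'linear' and 'index' concrete, here is a minimal sketch on an unevenly spaced integer index. The Series and its values are made up for illustration; both methods run without SciPy.

```python
import numpy as np
import pandas as pd

# Unevenly spaced index: the gap from 1 to 10 is much wider than from 0 to 1
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index and treats the three values as equally spaced
by_position = s.interpolate(method='linear')

# 'index' weights the estimate by the numerical distance between index labels
by_index = s.interpolate(method='index')

print(by_position.tolist())  # [0.0, 5.0, 10.0]
print(by_index.tolist())     # [0.0, 1.0, 10.0]
```

With 'linear' the missing value lands halfway between its neighbors, while with 'index' it stays close to 0 because label 1 is close to label 0.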

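221-2-2. axis **(optional, default 0):** The axis to interpolate along. For a Series this parameter is unused and is always 0 (the index).

221-2-3. limit **(optional, default None):** The maximum number of consecutive NaNs to fill; must be greater than 0. For example, limit=2 fills at most two consecutive NaNs in each gap.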
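
The effect of limit can be sketched with a gap of three consecutive NaNs. The sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

# One gap of three consecutive NaNs between two valid values
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# With limit=1, only the first NaN of the gap is filled (default forward direction)
limited = s.interpolate(method='linear', limit=1)

print(limited.tolist())  # [1.0, 2.0, nan, nan, 5.0]
```

The filled position still receives its linearly interpolated value (2.0); limit only controls how many NaNs in the gap are filled.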

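221-2-4. inplace **(optional, default False):** If True, interpolates the original Series in place and returns None; otherwise returns a new object.

221-2-5. limit_direction **(optional, default None):** The direction in which consecutive NaNs are filled. When left as None, the default is 'forward' unless method is 'backfill' or 'bfill'. The options are:

**'forward':** Fill toward later positions; leading NaNs are left unfilled.
**'backward':** Fill toward earlier positions; trailing NaNs are left unfilled.
**'both':** Fill in both directions.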
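
A small sketch of how the three directions treat leading and trailing NaNs with the default linear method. The sample Series is invented for illustration.

```python
import numpy as np
import pandas as pd

# NaNs at the start, in the middle, and at the end
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.interpolate(limit_direction='forward')
backward = s.interpolate(limit_direction='backward')
both = s.interpolate(limit_direction='both')

print(forward.tolist())   # [nan, 1.0, 2.0, 3.0, 3.0]
print(backward.tolist())  # [1.0, 1.0, 2.0, 3.0, nan]
print(both.tolist())      # [1.0, 1.0, 2.0, 3.0, 3.0]
```

The interior NaN is interpolated to 2.0 in every case; the direction only decides whether the edge NaNs are filled with the nearest valid value.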

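221-2-6. limit_area **(optional, default None):** Restricts which NaNs may be filled. The options are:

**None:** No restriction.
**'inside':** Only fill NaNs that are surrounded by valid values (interpolation).
**'outside':** Only fill NaNs that lie outside the valid values (extrapolation).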
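
The two restrictions can be compared on a Series that has NaNs both between and outside the valid values. This is a sketch with invented data; limit_direction='both' is used in the second call so that the leading NaN is also eligible.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# 'inside': only the NaN between 1.0 and 3.0 is filled
inside = s.interpolate(limit_area='inside')

# 'outside': only the leading and trailing NaNs are filled
outside = s.interpolate(limit_direction='both', limit_area='outside')

print(inside.tolist())   # [nan, 1.0, 2.0, 3.0, nan]
print(outside.tolist())  # [1.0, 1.0, nan, 3.0, 3.0]
```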

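221-2-7. downcast **(optional):** Whether to downcast the result to a smaller dtype where possible; accepts 'infer' or None. This parameter is deprecated since pandas 2.1.0.

221-2-8. ``**kwargs`` **(optional):** Additional keyword arguments passed on to the interpolating function, for example order for the 'polynomial' and 'spline' methods.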

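221-3. Function

Fills missing values (NaN) in a Series by interpolation. Several interpolation methods are available; linear interpolation is the default and the most commonly used.

221-4. Return value

Returns a new Series with some or all NaN values interpolated, or None if inplace=True. Choosing an interpolation method and parameters that suit the data fills the gaps sensibly, which improves the completeness of the data for later analysis.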

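221-5. Notes

Use cases:

221-5-1. Time series data: Time series such as weather or stock market data often contain missing values. Interpolation fills these gaps and keeps the series continuous.

221-5-2. Sensor data: Sensor readings may be missing because of device failures or lost transmissions. Interpolation helps recover these readings so that the analysis is more accurate.

221-5-3. Experimental data: Some measurements may be lost or go unrecorded during an experiment. Interpolation can estimate the missing points to complete the data set.

221-5-4. Data cleaning: Interpolation can fill missing values caused by collection errors or other problems, keeping the data set complete and consistent.

221-5-5. Model training: For machine learning models that need a complete data set, interpolation can fill missing values in the training data so that training does not fail or perform poorly because of them.

221-5-6. Data visualization: Interpolation produces smoother, gap-free charts, which helps when showing trends and patterns.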

221-6. Usage

221-6-1. Data preparation

None

221-6-2. Code example

# 221. pandas.Series.interpolate method
# 221-1. Time series data
import pandas as pd
import numpy as np
# Create time series data
dates = pd.date_range('2024-01-01', periods=10)
data = pd.Series([20, np.nan, 22, np.nan, 23, 24, np.nan, 25, 26, 27], index=dates)
print("Original data:")
print(data)
# Fill missing values with linear interpolation
interpolated_data = data.interpolate(method='linear')
print("Interpolated data:")
print(interpolated_data, end='\n\n')

# 221-2. Sensor data
import pandas as pd
import numpy as np
# Create hourly sensor data
timestamps = pd.date_range('2024-01-01', periods=10, freq='h')
sensor_data = pd.Series([1.2, np.nan, 1.5, np.nan, 1.7, 1.8, np.nan, 2.0, 2.1, 2.2], index=timestamps)
print("Original sensor data:")
print(sensor_data)
# Fill missing values with time-based interpolation
interpolated_sensor_data = sensor_data.interpolate(method='time')
print("Interpolated sensor data:")
print(interpolated_sensor_data, end='\n\n')

# 221-3. Experimental data
import pandas as pd
import numpy as np
# Create experimental data
experiment_data = pd.Series([5.2, np.nan, 5.8, np.nan, 6.3, 6.5, np.nan], index=[1, 2, 3, 4, 5, 6, 7])
print("Original experimental data:")
print(experiment_data)
# Fill missing values with second-order polynomial interpolation (requires SciPy)
interpolated_experiment_data = experiment_data.interpolate(method='polynomial', order=2)
print("Interpolated experimental data:")
print(interpolated_experiment_data, end='\n\n')

# 221-4. Data cleaning
import pandas as pd
import numpy as np
# Create a data set
data_cleaning = pd.Series([10, np.nan, np.nan, 12, 13, np.nan, 15], index=[1, 2, 3, 4, 5, 6, 7])
print("Original data set:")
print(data_cleaning)
# Fill missing values with linear interpolation
cleaned_data = data_cleaning.interpolate(method='linear')
print("Cleaned data set:")
print(cleaned_data, end='\n\n')

# 221-5. Model training
import pandas as pd
import numpy as np
# Create feature data
features = pd.Series([1.1, np.nan, 1.3, np.nan, 1.5, 1.6, np.nan], index=[1, 2, 3, 4, 5, 6, 7])
print("Original feature data:")
print(features)
# Fill missing values with linear interpolation
filled_features = features.interpolate(method='linear')
print("Filled feature data:")
print(filled_features, end='\n\n')

# 221-6. Data visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create data
data_for_plot = pd.Series([10, np.nan, 12, 13, np.nan, 15, 16], index=[1, 2, 3, 4, 5, 6, 7])
# Interpolate
interpolated_data_for_plot = data_for_plot.interpolate(method='linear')
# Plot the original points as markers and the interpolated series as a line
plt.figure(figsize=(10, 6))
plt.plot(data_for_plot, 'o', label='Original data')
plt.plot(interpolated_data_for_plot, '-', label='Interpolated data')
plt.title('Data interpolation example')
plt.xlabel('Index')
plt.ylabel('Value')
plt.legend()
plt.show()

221-6-3. Output

# 221. pandas.Series.interpolate method
# 221-1. Time series data
# Original data:
# 2024-01-01    20.0
# 2024-01-02     NaN
# 2024-01-03    22.0
# 2024-01-04     NaN
# 2024-01-05    23.0
# 2024-01-06    24.0
# 2024-01-07     NaN
# 2024-01-08    25.0
# 2024-01-09    26.0
# 2024-01-10    27.0
# Freq: D, dtype: float64
# Interpolated data:
# 2024-01-01    20.0
# 2024-01-02    21.0
# 2024-01-03    22.0
# 2024-01-04    22.5
# 2024-01-05    23.0
# 2024-01-06    24.0
# 2024-01-07    24.5
# 2024-01-08    25.0
# 2024-01-09    26.0
# 2024-01-10    27.0
# Freq: D, dtype: float64

# 221-2. Sensor data
# Original sensor data:
# 2024-01-01 00:00:00    1.2
# 2024-01-01 01:00:00    NaN
# 2024-01-01 02:00:00    1.5
# 2024-01-01 03:00:00    NaN
# 2024-01-01 04:00:00    1.7
# 2024-01-01 05:00:00    1.8
# 2024-01-01 06:00:00    NaN
# 2024-01-01 07:00:00    2.0
# 2024-01-01 08:00:00    2.1
# 2024-01-01 09:00:00    2.2
# Freq: h, dtype: float64
# Interpolated sensor data:
# 2024-01-01 00:00:00    1.20
# 2024-01-01 01:00:00    1.35
# 2024-01-01 02:00:00    1.50
# 2024-01-01 03:00:00    1.60
# 2024-01-01 04:00:00    1.70
# 2024-01-01 05:00:00    1.80
# 2024-01-01 06:00:00    1.90
# 2024-01-01 07:00:00    2.00
# 2024-01-01 08:00:00    2.10
# 2024-01-01 09:00:00    2.20
# Freq: h, dtype: float64

# 221-3. Experimental data
# Original experimental data:
# 1    5.2
# 2    NaN
# 3    5.8
# 4    NaN
# 5    6.3
# 6    6.5
# 7    NaN
# dtype: float64
# Interpolated experimental data:
# 1    5.200000
# 2    5.511765
# 3    5.800000
# 4    6.064706
# 5    6.300000
# 6    6.500000
# 7         NaN
# dtype: float64

# 221-4. Data cleaning
# Original data set:
# 1    10.0
# 2     NaN
# 3     NaN
# 4    12.0
# 5    13.0
# 6     NaN
# 7    15.0
# dtype: float64
# Cleaned data set:
# 1    10.000000
# 2    10.666667
# 3    11.333333
# 4    12.000000
# 5    13.000000
# 6    14.000000
# 7    15.000000
# dtype: float64

# 221-5. Model training
# Original feature data:
# 1    1.1
# 2    NaN
# 3    1.3
# 4    NaN
# 5    1.5
# 6    1.6
# 7    NaN
# dtype: float64
# Filled feature data:
# 1    1.1
# 2    1.2
# 3    1.3
# 4    1.4
# 5    1.5
# 6    1.6
# 7    1.6
# dtype: float64

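# 221-6. Data visualization
# See Figure 1

Figure 1: Plot of the original data points (markers) and the linearly interpolated series (line).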

222. pandas.Series.isna method

222-1. Syntax

# 222. pandas.Series.isna method
pandas.Series.isna()
Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:
Series
Mask of bool values for each element in Series that indicates whether an element is an NA value.

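222-2. Parameters

None

222-3. Function

Detects missing values (NaN or None) in a Series.

222-4. Return value

Returns a boolean Series in which positions holding missing values are True and all other positions are False.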
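
As a rough illustration of what counts as missing and how the mask is typically used, consider the sketch below. The sample values are invented, and the default setting of use_inf_as_na is assumed.

```python
import numpy as np
import pandas as pd

# Empty strings are ordinary values; None and NaN are missing
values = pd.Series(['', 'text', None, np.nan], dtype=object)
print(values.isna().tolist())   # [False, False, True, True]

# Infinity is not treated as missing by default
numbers = pd.Series([1.0, np.inf, np.nan])
print(numbers.isna().tolist())  # [False, False, True]

# A boolean mask sums to the number of missing entries
missing_count = int(numbers.isna().sum())
print(missing_count)            # 1
```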

222-5. Notes

None

222-6. Usage

222-6-1. Data preparation

None

222-6-2. Code example

# 222. pandas.Series.isna method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, 2, np.nan, 4, None, 6])
print("Original data:")
print(data)
# Detect missing values
na_mask = data.isna()
print("Missing-value mask:")
print(na_mask, end='\n\n')

222-6-3. Output

# 222. pandas.Series.isna method
# Original data:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    NaN
# 5    6.0
# dtype: float64
# Missing-value mask:
# 0    False
# 1    False
# 2     True
# 3    False
# 4     True
# 5    False
# dtype: bool

223. pandas.Series.isnull method

223-1. Syntax

# 223. pandas.Series.isnull method
pandas.Series.isnull()
Series.isnull is an alias for Series.isna.

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:
Series
Mask of bool values for each element in Series that indicates whether an element is an NA value.

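223-2. Parameters

None

223-3. Function

Detects missing values (NaN or None) in a Series.

223-4. Return value

Returns a boolean Series in which positions holding missing values are True and all other positions are False.

223-5. Notes

Behaves identically to the pandas.Series.isna method.

223-6. Usage

223-6-1. Data preparation

None

223-6-2. Code example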

# 223. pandas.Series.isnull method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, 2, np.nan, 4, None, 6])
print("Original data:")
print(data)
# Detect missing values
null_mask = data.isnull()
print("Missing-value mask:")
print(null_mask, end='\n\n')

223-6-3. Output

# 223. pandas.Series.isnull method
# Original data:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    NaN
# 5    6.0
# dtype: float64
# Missing-value mask:
# 0    False
# 1    False
# 2     True
# 3    False
# 4     True
# 5    False
# dtype: bool

224. pandas.Series.notna method

224-1. Syntax

# 224. pandas.Series.notna method
pandas.Series.notna()
Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:
Series
Mask of bool values for each element in Series that indicates whether an element is not an NA value.

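224-2. Parameters

None

224-3. Function

Detects non-missing values (values that are neither NaN nor None) in a Series.

224-4. Return value

Returns a boolean Series in which positions holding non-missing values are True and positions holding missing values are False.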
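
One common use of this mask is boolean indexing to keep only the values that are present. The sketch below, using the same sample values as the example in this section, suggests that this gives the same result as dropna().

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, np.nan, 4, None, 6])

# Keep only the entries where the mask is True
present = data[data.notna()]
print(present.tolist())               # [1.0, 2.0, 4.0, 6.0]

# Same result as dropping the missing values directly
print(present.equals(data.dropna()))  # True

# The mask sums to the number of non-missing entries
present_count = int(data.notna().sum())
print(present_count)                  # 4
```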

224-5. Notes

None

224-6. Usage

224-6-1. Data preparation

None

224-6-2. Code example

# 224. pandas.Series.notna method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, 2, np.nan, 4, None, 6])
print("Original data:")
print(data)
# Detect non-missing values
not_na_mask = data.notna()
print("Non-missing-value mask:")
print(not_na_mask, end='\n\n')

224-6-3. Output

# 224. pandas.Series.notna method
# Original data:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    NaN
# 5    6.0
# dtype: float64
# Non-missing-value mask:
# 0     True
# 1     True
# 2    False
# 3     True
# 4    False
# 5     True
# dtype: bool

225. pandas.Series.notnull method

225-1. Syntax

# 225. pandas.Series.notnull method
pandas.Series.notnull()
Series.notnull is an alias for Series.notna.

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:
Series
Mask of bool values for each element in Series that indicates whether an element is not an NA value.

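225-2. Parameters

None

225-3. Function

Detects non-missing values (values that are neither NaN nor None) in a Series.

225-4. Return value

Returns a boolean Series in which positions holding non-missing values are True and positions holding missing values are False.

225-5. Notes

Behaves identically to the **pandas.Series.notna()** method.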
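
The equivalence can be checked directly. The short sketch below, with the same sample values as the example in this section, compares notnull() with notna() and with the negation of isna().

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, np.nan, 4, None, 6])

# notnull() and notna() produce the same mask
same_as_notna = data.notnull().equals(data.notna())
print(same_as_notna)      # True

# Both are the element-wise negation of isna()
inverse_of_isna = data.notnull().equals(~data.isna())
print(inverse_of_isna)    # True
```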

225-6. Usage

225-6-1. Data preparation

None

225-6-2. Code example

# 225. pandas.Series.notnull method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, 2, np.nan, 4, None, 6])
print("Original data:")
print(data)
# Detect non-missing values
not_null_mask = data.notnull()
print("Non-missing-value mask:")
print(not_null_mask, end='\n\n')

225-6-3. Output

# 225. pandas.Series.notnull method
# Original data:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    NaN
# 5    6.0
# dtype: float64
# Non-missing-value mask:
# 0     True
# 1     True
# 2    False
# 3     True
# 4    False
# 5     True
# dtype: bool