A Tour of Cool Python Libraries - Third-Party Library Pandas (123)

546. pandas.DataFrame.ffill method

546-1. Syntax

```python
# 546. pandas.DataFrame.ffill method
pandas.DataFrame.ffill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=_NoDefault.no_default)
```

Fill NA/NaN values by propagating the last valid observation to next valid.

Parameters:
axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame
Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

inplace : bool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

limit_area : {None, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this restriction.

None: No fill restriction.

'inside': Only fill NaNs surrounded by valid values (interpolate).

'outside': Only fill NaNs outside valid values (extrapolate).

New in version 2.2.0.

downcast : dict, default is None
A dict of item->dtype of what to downcast if possible, or the string 'infer' which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Deprecated since version 2.2.0.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

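546-2. Parameters

546-2-1. axis **(optional, default None):** {0 or 'index', 1 or 'columns'}, the direction in which values are propagated. 0 or 'index' fills down each column (along the row index); 1 or 'columns' fills rightward across each row. For a DataFrame, None is treated as 0.

546-2-2. inplace **(optional, default False):** bool, whether to modify the DataFrame in place. If True, the original DataFrame is filled directly and no new DataFrame is returned; if False, a new DataFrame is returned and the original is left unchanged.

546-2-3. limit **(optional, default None):** int, the maximum number of consecutive NaN values to forward-fill. A gap with more consecutive NaNs than this is only partially filled. Must be greater than 0 if not None.

546-2-4. limit_area **(optional, default None):** {None, 'inside', 'outside'}, restricts which NaNs may be filled. None applies no restriction; 'inside' fills only NaNs surrounded by valid values; 'outside' fills only NaNs outside the valid values (for a forward fill, the NaNs after the last valid value). New in pandas 2.2.0.

546-2-5. downcast **(optional):** a dict of item->dtype, or the string 'infer', specifying dtypes to downcast to where possible (e.g. float64 to int64). Deprecated since pandas 2.2.0.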

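546-3. Functionality

Fills each missing value with the last valid value that precedes it. Missing values are a common problem in data processing and analysis, and forward filling is a simple way to complete the data for later analysis when a value can reasonably be assumed to persist until the next observation.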

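546-4. Return value

A DataFrame (a Series when called on a Series) with the missing values filled. If inplace=True, the return value is None and the original DataFrame is modified directly.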
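A minimal sketch of the two modes on made-up data: with the default inplace=False the original DataFrame keeps its NaNs, while inplace=True fills the original object itself. The printed return value of the in-place call is None under pandas 2.x, as described above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 2.0, np.nan]})

# inplace=False (default): a new, filled DataFrame is returned; df keeps its NaNs
new_df = df.ffill()
print(int(df.isna().sum().sum()))  # 3

# inplace=True: df itself is filled (in pandas 2.x the call returns None)
result = df.ffill(inplace=True)
print(result)
print(df.equals(new_df))  # True
```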

546-5. Notes

None

546-6. Usage

546-6-1. Data preparation

None

546-6-2. Code example

```python
# 546. pandas.DataFrame.ffill method
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 2, np.nan],
    'C': [1, 2, 3]
})
# Forward-fill the missing values
filled_df = df.ffill()
print(filled_df)
```

546-6-3. Output

```python
# 546. pandas.DataFrame.ffill method
#      A    B  C
# 0  1.0  NaN  1
# 1  1.0  2.0  2
# 2  3.0  2.0  3
```

547. pandas.DataFrame.fillna method

547-1. Syntax

```python
# 547. pandas.DataFrame.fillna method
pandas.DataFrame.fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default)
```

Fill NA/NaN values using the specified method.

Parameters:
value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

method : {'backfill', 'bfill', 'ffill', None}, default None
Method to use for filling holes in reindexed Series:

ffill: propagate last valid observation forward to next valid.

backfill / bfill: use next valid observation to fill gap.

Deprecated since version 2.1.0: Use ffill or bfill instead.

axis : {0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame
Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

inplace : bool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

downcast : dict, default is None
A dict of item->dtype of what to downcast if possible, or the string 'infer' which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Deprecated since version 2.2.0.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

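547-2. Parameters

547-2-1. value **(optional, default None):** scalar, dict, Series, or DataFrame, the value used to fill missing entries. A scalar fills every NaN; a dict/Series/DataFrame specifies which value to use for each column, and values not present in it are not filled. The value cannot be a list. If value is None, method must be specified instead.

547-2-2. method **(optional, default None):** {'backfill', 'bfill', 'pad', 'ffill'}, the method used to fill missing values:

**pad or ffill:** forward fill, propagating the last valid value forward.
**backfill or bfill:** backward fill, using the next valid value to fill the gap.

Deprecated since pandas 2.1.0: use DataFrame.ffill() or DataFrame.bfill() instead.

547-2-3. axis **(optional, default None):** {0 or 'index', 1 or 'columns'}, the direction of the fill. 0 or 'index' fills down each column (vertically); 1 or 'columns' fills across each row (horizontally). For a DataFrame, None is treated as 0.

547-2-4. inplace **(optional, default False):** bool, whether to modify the DataFrame in place. If True, the original DataFrame is filled directly and None is returned; if False, a new DataFrame is returned and the original is left unchanged.

547-2-5. limit **(optional, default None):** int. If method is specified, the maximum number of consecutive NaNs to forward/backward fill (longer gaps are only partially filled); if method is not specified, the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

547-2-6. downcast **(optional):** a dict of item->dtype, or the string 'infer', specifying dtypes to downcast the filled data to where possible (e.g. float64 to int64). Deprecated since pandas 2.2.0.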
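A short sketch of two value/limit combinations from the parameter list, on made-up data: a dict that fills columns A and B with different values while leaving C untouched, and a scalar value with limit=1 and no method, which fills at most one NaN per column along the default axis.

```python
import numpy as np
import pandas as pd

# Made-up data with NaNs in every column
df = pd.DataFrame({
    "A": [np.nan, np.nan, 3.0],
    "B": [np.nan, 2.0, np.nan],
    "C": [1.0, np.nan, np.nan],
})

# dict value: column -> fill value; C is not in the dict, so its NaNs remain
by_column = df.fillna(value={"A": 0, "B": -1})

# scalar value with limit and no method: at most 1 NaN filled per column (axis 0)
first_only = df.fillna(value=0, limit=1)

print(by_column)
print(first_only)
```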

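547-3. Functionality

Replaces missing values with a specified value or by a fill method. In data processing, missing values often need a sensible replacement before further analysis and modeling.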
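One common way to apply this, sketched with made-up data and assuming mean imputation is acceptable for the analysis at hand: pass the column means (a Series indexed by column name) as value, so each column is filled with its own mean.

```python
import numpy as np
import pandas as pd

# Made-up data with one NaN per column
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 2.0, 4.0]})

# df.mean() skips NaNs and returns a Series indexed by column name: A -> 2.0, B -> 3.0
col_means = df.mean()

# fillna matches the Series to the columns, so each column gets its own mean
filled = df.fillna(col_means)
print(filled)
```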

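547-4. Return value

A DataFrame (a Series when called on a Series) with the missing values filled. If inplace=True, the return value is None and the original DataFrame is modified directly.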

547-5. Notes

None

547-6. Usage

547-6-1. Data preparation

None

547-6-2. Code example

```python
# 547. pandas.DataFrame.fillna method
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 2, np.nan],
    'C': [1, 2, 3]
})
# Fill with a specified value
filled_df1 = df.fillna(value=0)
# Forward fill; df.fillna(method='ffill') is deprecated since pandas 2.1.0,
# so df.ffill() is used instead (same result)
filled_df2 = df.ffill()
print("Filled with a specified value:")
print(filled_df1)
print("\nForward filled:")
print(filled_df2)
```

547-6-3. Output

```python
# 547. pandas.DataFrame.fillna method
# Filled with a specified value:
#      A    B  C
# 0  1.0  0.0  1
# 1  0.0  2.0  2
# 2  3.0  0.0  3
# 
# Forward filled:
#      A    B  C
# 0  1.0  NaN  1
# 1  1.0  2.0  2
# 2  3.0  2.0  3
```

548. pandas.DataFrame.interpolate method

548-1. Syntax

```python
# 548. pandas.DataFrame.interpolate method
pandas.DataFrame.interpolate(method='linear', *, axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=_NoDefault.no_default, **kwargs)
```

Fill NaN values using an interpolation method.

Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.

Parameters:
method : str, default 'linear'
Interpolation technique to use. One of:

'linear': Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.

'time': Works on daily and higher resolution data to interpolate given length of interval.

'index', 'values': use the actual numerical values of the index.

'pad': Fill in NaNs using existing values.

'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'polynomial': Passed to scipy.interpolate.interp1d, whereas 'spline' is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both 'polynomial' and 'spline' require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5). Note that, slinear method in Pandas refers to the Scipy first order spline instead of Pandas first order spline.

'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima', 'cubicspline': Wrappers around the SciPy interpolation methods of similar names. See Notes.

'from_derivatives': Refers to scipy.interpolate.BPoly.from_derivatives.

axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along. For Series this parameter is unused and defaults to 0.

limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than 0.

inplace : bool, default False
Update the data in place if possible.

limit_direction : {'forward', 'backward', 'both'}, optional
Consecutive NaNs will be filled in this direction.

If limit is specified:
If 'method' is 'pad' or 'ffill', 'limit_direction' must be 'forward'.

If 'method' is 'backfill' or 'bfill', 'limit_direction' must be 'backward'.

If 'limit' is not specified:
If 'method' is 'backfill' or 'bfill', the default is 'backward'

else the default is 'forward'

raises ValueError if limit_direction is 'forward' or 'both' and method is 'backfill' or 'bfill'.

raises ValueError if limit_direction is 'backward' or 'both' and method is 'pad' or 'ffill'.

limit_area : {None, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this restriction.

None: No fill restriction.

'inside': Only fill NaNs surrounded by valid values (interpolate).

'outside': Only fill NaNs outside valid values (extrapolate).

downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.

Deprecated since version 2.1.0.

`**kwargs` : optional
Keyword arguments to pass on to the interpolating function.

Returns:
Series or DataFrame or None
Returns the same object type as the caller, interpolated at some or all NaN values or None if inplace=True.

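548-2. Parameters

548-2-1. method **(optional, default 'linear'):** str, the interpolation method. Common choices include:

**'linear':** linear interpolation (default); ignores the index and treats the values as equally spaced.
**'time':** time-weighted interpolation, for data indexed by timestamps (daily or higher resolution).
**'index':** interpolation based on the actual numeric values of the index.
Other methods such as 'nearest', 'polynomial' and 'spline' are passed to SciPy; 'polynomial' and 'spline' also require an order, e.g. df.interpolate(method='polynomial', order=5).

548-2-2. axis **(optional, default 0):** {0 or 'index', 1 or 'columns'}, the direction of interpolation. 0 or 'index' interpolates down each column; 1 or 'columns' interpolates across each row.

548-2-3. limit **(optional, default None):** int, the maximum number of consecutive NaNs to fill. Must be greater than 0.

548-2-4. inplace **(optional, default False):** bool, whether to modify the DataFrame in place. If True, the interpolation is applied to the original DataFrame where possible and None is returned; if False, a new DataFrame is returned and the original is left unchanged.

548-2-5. limit_direction **(optional, default None):** {'forward', 'backward', 'both'}, the direction in which consecutive NaNs are filled. 'forward' fills NaNs that follow a valid value, 'backward' fills NaNs that precede a valid value, and 'both' fills in both directions. When None, the default is 'backward' for method 'backfill'/'bfill' and 'forward' otherwise, so with the default settings leading NaNs are left unfilled.

548-2-6. limit_area **(optional, default None):** {None, 'inside', 'outside'}, restricts where NaNs may be filled. None applies no restriction; 'inside' fills only NaNs surrounded by valid values (interpolation); 'outside' fills only NaNs outside the valid values (extrapolation).

548-2-7. downcast **(optional):** 'infer' or None, whether to downcast the resulting dtypes where possible. Deprecated since pandas 2.1.0.

548-2-8. `**kwargs` **(optional):** additional keyword arguments passed on to the interpolating function, e.g. order for 'polynomial' and 'spline'.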
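A minimal sketch contrasting limit, limit_direction and limit_area for the default 'linear' method, using a made-up column with a leading NaN, a two-NaN interior gap, and a trailing NaN. Note that 'linear' fills trailing NaNs with the last valid value, and with limit_direction='both' it fills leading NaNs with the first valid value.

```python
import numpy as np
import pandas as pd

# Made-up data: a leading NaN, a two-NaN interior gap, and a trailing NaN
df = pd.DataFrame({"x": [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]})

default = df.interpolate()                     # limit_direction=None acts as 'forward'
both = df.interpolate(limit_direction="both")  # the leading NaN is filled too
one_per_gap = df.interpolate(limit=1)          # at most 1 consecutive NaN filled
inside = df.interpolate(limit_area="inside")   # only NaNs between valid values

for name, result in [("default", default), ("both", both),
                     ("limit=1", one_per_gap), ("inside", inside)]:
    print(name, result["x"].tolist())
```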

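548-3. Functionality

Fills missing values by interpolation, estimating each missing value from the known data points around it. This is useful for time series and other ordered data, because it preserves the continuity of the data.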
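The difference between 'linear' and 'index' only appears when the index is unevenly spaced. A minimal sketch with made-up readings at index positions 0, 1 and 10:

```python
import numpy as np
import pandas as pd

# Made-up readings at irregularly spaced index positions 0, 1 and 10
df = pd.DataFrame({"y": [0.0, np.nan, 10.0]}, index=[0, 1, 10])

# 'linear' ignores the index: the NaN is treated as halfway between its neighbours
by_position = df.interpolate(method="linear")

# 'index' uses the index values: 1 lies one tenth of the way from 0 to 10
by_index = df.interpolate(method="index")

print(by_position["y"].tolist())  # [0.0, 5.0, 10.0]
print(by_index["y"].tolist())     # [0.0, 1.0, 10.0]
```

For a DatetimeIndex, method='time' plays the same role as 'index', weighting by elapsed time.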

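548-4. Return value

An object of the same type as the caller (here a DataFrame) with some or all NaN values interpolated. If inplace=True, the return value is None and the original DataFrame is modified directly.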

548-5. Notes

None

548-6. Usage

548-6-1. Data preparation

None

548-6-2. Code example

```python
# 548. pandas.DataFrame.interpolate method
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan, 5],
    'B': [np.nan, 2, 3, 4, 5]
})
# Fill missing values by linear interpolation
interpolated_df1 = df.interpolate(method='linear')
print("Linear interpolation result:")
print(interpolated_df1)
```

548-6-3. Output

```python
# 548. pandas.DataFrame.interpolate method
# Linear interpolation result:
#      A    B
# 0  1.0  NaN
# 1  2.0  2.0
# 2  3.0  3.0
# 3  4.0  4.0
# 4  5.0  5.0
```

549. pandas.DataFrame.isna method

549-1. Syntax

```python
# 549. pandas.DataFrame.isna method
pandas.DataFrame.isna()
```

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:
DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

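549-2. Parameters

None

549-3. Functionality

Returns a boolean DataFrame with the same shape as the original, indicating for each element whether it is missing: missing values (NaN, as well as None, NaT and pd.NA) are marked True, and all other values are marked False.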
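A minimal sketch, on made-up data, of which values isna treats as missing under the default pandas options: None, NaN, NaT and pd.NA are flagged, while an empty string and np.inf are not.

```python
import numpy as np
import pandas as pd

# Made-up data mixing several kinds of missing and non-missing values
df = pd.DataFrame({
    "obj": ["a", None, ""],                                      # None is NA, "" is not
    "num": [1.0, np.nan, np.inf],                                # NaN is NA, inf is not
    "when": pd.to_datetime(["2024-01-01", None, "2024-01-03"]),  # None becomes NaT
    "nullable": pd.array([1, pd.NA, 3], dtype="Int64"),          # pd.NA is NA
})

mask = df.isna()
print(mask)
```

Only the middle row is flagged in every column.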

549-4. Return value

A boolean DataFrame with the same shape as the original; a cell is True where the corresponding value is missing (e.g. NaN) and False otherwise.

549-5. Notes

None

549-6. Usage

549-6-1. Data preparation

None

549-6-2. Code example

```python
# 549. pandas.DataFrame.isna method
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, np.nan],
    'C': [np.nan, np.nan, 9]
})
# Detect missing values
na_df = df.isna()
print("Missing-value mask:")
print(na_df)
```

549-6-3. Output

```python
# 549. pandas.DataFrame.isna method
# Missing-value mask:
#        A      B      C
# 0  False  False   True
# 1   True  False   True
# 2  False   True  False
```

550. pandas.DataFrame.isnull method

550-1. Syntax

```python
# 550. pandas.DataFrame.isnull method
pandas.DataFrame.isnull()
```

DataFrame.isnull is an alias for DataFrame.isna.

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:
DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

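550-2. Parameters

None

550-3. Functionality

Returns a boolean DataFrame with the same shape as the original, where each cell indicates whether the value at that position is missing: missing values (NaN) are marked True and non-missing values are marked False. Because isnull is an alias of isna, the two behave identically.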
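A small sketch, reusing the made-up data from this section's examples, that checks the alias relationship and shows a common follow-up: summing the boolean mask to count missing values per column.

```python
import numpy as np
import pandas as pd

# Same made-up sample data as the examples in this section
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, np.nan],
    'C': [np.nan, np.nan, 9]
})

# isnull is an alias of isna, so the two masks are identical
same_mask = df.isnull().equals(df.isna())

# Summing a boolean mask counts the True values: missing values per column
missing_per_column = df.isnull().sum()

print(same_mask)                     # True
print(missing_per_column.tolist())   # [1, 1, 2]
```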

550-4. Return value

A boolean DataFrame with the same shape as the original DataFrame.

550-5. Notes

None

550-6. Usage

550-6-1. Data preparation

None

550-6-2. Code example

```python
# 550. pandas.DataFrame.isnull method
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, np.nan],
    'C': [np.nan, np.nan, 9]
})
# Detect missing values
null_df = df.isnull()
print("Missing-value mask:")
print(null_df)
```

550-6-3. Output

```python
# 550. pandas.DataFrame.isnull method
# Missing-value mask:
#        A      B      C
# 0  False  False   True
# 1   True  False   True
# 2  False   True  False
```