A Tour of Cool Python Libraries - Third-Party Library Pandas (131)

# 586. pandas.DataFrame.to_timestamp method
pandas.DataFrame.to_timestamp(freq=None, how='start', axis=0, copy=None)
Cast to DatetimeIndex of timestamps, at beginning of period.

Parameters:
freq : str, default frequency of PeriodIndex
Desired frequency.

how : {'s', 'e', 'start', 'end'}
Convention for converting period to timestamp; start of period vs. end.

axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).

copy : bool, default True
If False then underlying input data is not copied.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

Returns:
DataFrame
The DataFrame has a DatetimeIndex.

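586-2. Parameters

586-2-1. freq **(optional, default None):** a string or DateOffset giving the desired frequency of the new index. If omitted, the frequency of the existing PeriodIndex is used. The PeriodIndex is converted to a DatetimeIndex.

586-2-2. how **(optional, default 'start'):** which end of each period the timestamp is aligned to. The short forms 's' and 'e' are also accepted.

**'start':** align to the beginning of the period.
**'end':** align to the end of the period.

586-2-3. axis **(optional, default 0):** an integer or string selecting which axis to convert.

**0 or 'index':** convert the row index.
**1 or 'columns':** convert the column index.

586-2-4. copy **(optional, default None):** a boolean controlling whether the underlying data is copied. If False, the underlying input data is not copied; this does not modify the original object in place. Once Copy-on-Write is enabled (the default from pandas 3.0), the keyword is ignored.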
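
To make the `how` parameter concrete, here is a minimal sketch that converts the same monthly PeriodIndex with both conventions. The variable names (`at_start`, `at_end`) are only illustrative, and the three-month range is an assumed sample.

```python
import pandas as pd

# Three monthly periods used as the row index
periods = pd.period_range('2024-01', periods=3, freq='M')
df = pd.DataFrame({'values': range(len(periods))}, index=periods)

# how='start' (the default) maps each period to its first instant
at_start = df.to_timestamp(how='start')
# how='end' maps each period to its last instant
at_end = df.to_timestamp(how='end')

print(at_start.index[0])  # first instant of January 2024
print(at_end.index[0])    # last instant of January 2024
```

With `how='end'` the timestamps fall at the very end of the last day of each month rather than at midnight, which matters if the result is later compared against daily data.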

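586-3. Purpose

Converts the index (or columns) of a DataFrame from a PeriodIndex to a DatetimeIndex of timestamps. Series.to_timestamp does the same for a Series.

586-4. Return value

Returns a new DataFrame whose index (or column index, when axis=1) has been converted to a DatetimeIndex.

586-5. Notes

None

586-6. Usage

586-6-1. Data preparation

None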

586-6-2. Code example

# 586. pandas.DataFrame.to_timestamp method
import pandas as pd
# Create a DataFrame with a PeriodIndex
periods = pd.period_range('2024-01', '2024-12', freq='M')
df = pd.DataFrame({'values': range(len(periods))}, index=periods)
# Convert the index to timestamps
df_timestamped = df.to_timestamp()
print(df_timestamped)
# Convert the index using frequency 'D'
df_timestamped_daily = df.to_timestamp(freq='D')
print(df_timestamped_daily)

586-6-3. Output

# 586. pandas.DataFrame.to_timestamp method
#             values
# 2024-01-01       0
# 2024-02-01       1
# 2024-03-01       2
# 2024-04-01       3
# 2024-05-01       4
# 2024-06-01       5
# 2024-07-01       6
# 2024-08-01       7
# 2024-09-01       8
# 2024-10-01       9
# 2024-11-01      10
# 2024-12-01      11
#             values
# 2024-01-01       0
# 2024-02-01       1
# 2024-03-01       2
# 2024-04-01       3
# 2024-05-01       4
# 2024-06-01       5
# 2024-07-01       6
# 2024-08-01       7
# 2024-09-01       8
# 2024-10-01       9
# 2024-11-01      10
# 2024-12-01      11
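
The example above converts the row index only. As a small sketch of the `axis` parameter, the snippet below puts the periods on the columns instead and converts them with `axis=1`. The frame `df_wide` and its values are made up for illustration.

```python
import pandas as pd

# Columns labelled by monthly periods; rows use a plain integer index
cols = pd.period_range('2024-01', periods=2, freq='M')
df_wide = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

# axis=1 converts the column labels instead of the row index
df_wide_ts = df_wide.to_timestamp(axis=1)

print(type(df_wide.columns).__name__)
print(type(df_wide_ts.columns).__name__)
print(list(df_wide_ts.columns))
```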

587. pandas.DataFrame.tz_convert method

587-1. Syntax

# 587. pandas.DataFrame.tz_convert method
pandas.DataFrame.tz_convert(tz, axis=0, level=None, copy=None)
Convert tz-aware axis to target time zone.

Parameters:
tz : str or tzinfo object or None
Target time zone. Passing None will convert to UTC and remove the timezone information.

axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert.

level : int, str, default None
If axis is a MultiIndex, convert a specific level. Otherwise must be None.

copy : bool, default True
Also make a copy of the underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

Returns:
Series/DataFrame
Object with time zone converted axis.

Raises:
TypeError
If the axis is tz-naive.

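587-2. Parameters

587-2-1. tz **(required):** a string, a tzinfo object, or None, giving the target time zone. It can be a time zone name such as 'UTC' or 'Europe/London'. Passing None converts the tz-aware DatetimeIndex to UTC and then removes the time zone information, leaving a tz-naive DatetimeIndex.

587-2-2. axis **(optional, default 0):** an integer or string selecting the axis whose time zone is converted.

**0 or 'index':** convert the DatetimeIndex in the row index.
**1 or 'columns':** convert the DatetimeIndex in the column index.

587-2-3. level **(optional, default None):** an integer, string, or None. When the axis is a MultiIndex and one of its levels is a DatetimeIndex, level selects which level to convert; otherwise it must be None.

587-2-4. copy **(optional, default None):** a boolean controlling whether the underlying data is also copied. False avoids copying the underlying data but does not modify the original object in place. Once Copy-on-Write is enabled (the default from pandas 3.0), the keyword is ignored.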
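
As a rough illustration of the `level` parameter, the sketch below builds a two-level MultiIndex whose first level is tz-aware and converts only that level. The level names `time` and `key` and the sample values are assumptions made for this example.

```python
import pandas as pd

# A two-level index: tz-aware timestamps plus a plain string key
times = pd.DatetimeIndex(['2024-01-01 00:00', '2024-01-01 01:00'], tz='UTC')
index = pd.MultiIndex.from_arrays([times, ['a', 'b']], names=['time', 'key'])
df_multi = pd.DataFrame({'values': [0, 1]}, index=index)

# Convert only the 'time' level; the 'key' level is left untouched
df_multi_ny = df_multi.tz_convert('America/New_York', level='time')

converted = df_multi_ny.index.get_level_values('time')
print(converted)
```

The converted level still refers to the same instants in time; only the displayed wall-clock time and offset change.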

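587-3. Purpose

Converts the time zone of a tz-aware DatetimeIndex from its current time zone to the target time zone. The instants in time stay the same; only their wall-clock representation changes.

587-4. Return value

Returns a new DataFrame or Series whose DatetimeIndex has been converted to the target time zone. The index of the original object is left unchanged.

587-5. Notes

None

587-6. Usage

587-6-1. Data preparation

None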

587-6-2. Code example

# 587. pandas.DataFrame.tz_convert method
import pandas as pd
# Create a DataFrame with a tz-aware DatetimeIndex
rng = pd.date_range('2024-01-01', periods=5, freq='h', tz='UTC')
df = pd.DataFrame({'values': range(len(rng))}, index=rng)
# Convert the time zone to 'Europe/London' (UTC+00:00 in January)
df_london = df.tz_convert('Europe/London')
print(df_london)
# Convert the time zone to 'America/New_York'
df_ny = df.tz_convert('America/New_York')
print(df_ny)

587-6-3. Output

# 587. pandas.DataFrame.tz_convert method
#                            values
# 2024-01-01 00:00:00+00:00       0
# 2024-01-01 01:00:00+00:00       1
# 2024-01-01 02:00:00+00:00       2
# 2024-01-01 03:00:00+00:00       3
# 2024-01-01 04:00:00+00:00       4
#                            values
# 2023-12-31 19:00:00-05:00       0
# 2023-12-31 20:00:00-05:00       1
# 2023-12-31 21:00:00-05:00       2
# 2023-12-31 22:00:00-05:00       3
# 2023-12-31 23:00:00-05:00       4
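
Two documented behaviors are not covered by the example above: passing `tz=None`, and calling the method on a tz-naive index. The following is a minimal sketch of both, using an assumed Asia/Shanghai index and made-up values.

```python
import pandas as pd

# tz-aware index in Asia/Shanghai (UTC+08:00)
idx = pd.DatetimeIndex(['2024-01-01 12:00', '2024-01-01 13:00'], tz='Asia/Shanghai')
df_sh = pd.DataFrame({'values': [0, 1]}, index=idx)

# tz=None converts to UTC first, then drops the time zone information
df_naive_utc = df_sh.tz_convert(None)
print(df_naive_utc.index)

# A tz-naive index cannot be converted; localize it with tz_localize first
df_naive = pd.DataFrame({'values': [0]}, index=pd.DatetimeIndex(['2024-01-01']))
try:
    df_naive.tz_convert('UTC')
except TypeError as exc:
    print(f"TypeError: {exc}")
```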

588. pandas.DataFrame.tz_localize method

588-1. Syntax

# 588. pandas.DataFrame.tz_localize method
pandas.DataFrame.tz_localize(tz, axis=0, level=None, copy=None, ambiguous='raise', nonexistent='raise')
Localize tz-naive index of a Series or DataFrame to target time zone.

This operation localizes the Index. To localize the values in a timezone-naive Series, use Series.dt.tz_localize().

Parameters:
tz : str or tzinfo or None
Time zone to localize. Passing None will remove the time zone information and preserve local time.

axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to localize.

level : int, str, default None
If axis is a MultiIndex, localize a specific level. Otherwise must be None.

copy : bool, default True
Also make a copy of the underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.

'infer' will attempt to infer fall dst-transition hours based on order

bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)

'NaT' will return NaT where there are ambiguous times

'raise' will raise an AmbiguousTimeError if there are ambiguous times.

nonexistent : str, default 'raise'
A nonexistent time does not exist in a particular timezone where clocks moved forward due to DST. Valid values are:

'shift_forward' will shift the nonexistent time forward to the closest existing time

'shift_backward' will shift the nonexistent time backward to the closest existing time

'NaT' will return NaT where there are nonexistent times

timedelta objects will shift nonexistent times by the timedelta

'raise' will raise a NonExistentTimeError if there are nonexistent times.

Returns:
Series/DataFrame
Same type as the input.

Raises:
TypeError
If the TimeSeries is tz-aware and tz is not None.

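588-2. Parameters

588-2-1. tz **(required):** a string, a tzinfo object (for example a pytz time zone), or None, naming the time zone to assign, such as 'UTC', 'Europe/Berlin', or 'Asia/Shanghai'. Passing None removes the time zone information while preserving the local time.

588-2-2. axis **(optional, default 0):** {0 or 'index', 1 or 'columns'}, the axis to localize.

**0 or 'index':** operate on the row index.
**1 or 'columns':** operate on the column index.

588-2-3. level **(optional, default None):** an integer or string. When the axis is a MultiIndex, level selects which DatetimeIndex level to localize; otherwise it must be None.

588-2-4. copy **(optional, default None):** a boolean controlling whether the underlying data is also copied. False avoids copying the underlying data but does not modify the original object in place. Once Copy-on-Write is enabled (the default from pandas 3.0), the keyword is ignored.

588-2-5. ambiguous **(optional, default 'raise'):** {'raise', 'NaT', bool array-like, 'infer'}, how to handle wall-clock times that occur twice, for example when clocks are set back at the end of daylight saving time (DST).

**'raise':** raise an AmbiguousTimeError.
**'NaT':** set ambiguous times to NaT.
**bool array-like:** True marks a DST time and False a non-DST time; the flag only matters for ambiguous times.
**'infer':** infer the DST transition from the order of the timestamps.

588-2-6. nonexistent **(optional, default 'raise'):** {'raise', 'NaT', 'shift_forward', 'shift_backward', timedelta}, how to handle wall-clock times that do not exist, for example those skipped when clocks are set forward at the start of DST.

**'raise':** raise a NonExistentTimeError.
**'NaT':** set nonexistent times to NaT.
**'shift_forward':** shift the nonexistent time forward to the closest existing time.
**'shift_backward':** shift the nonexistent time backward to the closest existing time.
**timedelta:** shift the nonexistent time by the given timedelta.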
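
The `nonexistent` options can be compared side by side on a single skipped time. This sketch assumes the Europe/Berlin spring transition on 2024-03-31, when clocks jump from 02:00 to 03:00; the frame `df_gap` and the result names are illustrative only.

```python
import pandas as pd

# 02:30 on 2024-03-31 does not exist in Europe/Berlin (clocks jump 02:00 -> 03:00)
idx = pd.DatetimeIndex(['2024-03-31 02:30'])
df_gap = pd.DataFrame({'values': [0]}, index=idx)

forward = df_gap.tz_localize('Europe/Berlin', nonexistent='shift_forward')
backward = df_gap.tz_localize('Europe/Berlin', nonexistent='shift_backward')
as_nat = df_gap.tz_localize('Europe/Berlin', nonexistent='NaT')
plus_1h = df_gap.tz_localize('Europe/Berlin', nonexistent=pd.Timedelta('1h'))

print(forward.index[0])   # closest existing time after the gap
print(backward.index[0])  # closest existing time before the gap
print(as_nat.index[0])    # NaT
print(plus_1h.index[0])   # original wall-clock time shifted by one hour
```

Leaving `nonexistent` at its default of 'raise' would instead raise an error for this index, which is often the safer choice when skipped times indicate bad input data.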

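588-3. Purpose

Assigns a time zone to a tz-naive DatetimeIndex on the index or columns of a Series or DataFrame. If the index is already tz-aware and tz is not None, a TypeError is raised. The method localizes the index only; to localize the values of a tz-naive datetime Series, use Series.dt.tz_localize(). It is commonly used when preparing time series data collected across several time zones.

588-4. Return value

Returns a new DataFrame or Series, of the same type as the input, whose index (or column index) has been localized to the given time zone.

588-5. Notes

None

588-6. Usage

588-6-1. Data preparation

None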

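588-6-2. Code example

# 588. pandas.DataFrame.tz_localize method
import pandas as pd
# Create a DataFrame with a tz-naive datetime column
df = pd.DataFrame({
    'dates': pd.date_range('2024-01-01', periods=3, freq='D')
})
# Use the date column as the index (no columns remain, so the frame prints as empty),
# then localize the index to UTC
df = df.set_index('dates')
df = df.tz_localize('UTC')
print(df, end='\n\n')
# Create an hourly series shortly after the autumn DST change
# (in 2024, Europe/Berlin switched back on 2024-10-27, so none of these times is ambiguous)
df = pd.DataFrame({
    'dates': pd.date_range('2024-10-29 01:30', periods=3, freq='h')
})
# ambiguous='infer' only matters for ambiguous times; here it has no effect
df = df.set_index('dates')
df = df.tz_localize('Europe/Berlin', ambiguous='infer')
print(df)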

588-6-3. Output

# 588. pandas.DataFrame.tz_localize method
# Empty DataFrame
# Columns: []
# Index: [2024-01-01 00:00:00+00:00, 2024-01-02 00:00:00+00:00, 2024-01-03 00:00:00+00:00]
# 
# Empty DataFrame
# Columns: []
# Index: [2024-10-29 01:30:00+01:00, 2024-10-29 02:30:00+01:00, 2024-10-29 03:30:00+01:00]
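
The timestamps in the example above fall two days after the 2024 autumn transition in Europe/Berlin, so the `ambiguous` argument never comes into play. As a sketch of what it does when times really repeat, the snippet below uses wall-clock times from 2024-10-27 itself. The index values and variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Wall-clock times around 2024-10-27 in Europe/Berlin, where 02:00-02:59 occurs twice
idx = pd.DatetimeIndex([
    '2024-10-27 01:30', '2024-10-27 02:00', '2024-10-27 02:30',
    '2024-10-27 02:00', '2024-10-27 02:30', '2024-10-27 03:00',
])
df_fall = pd.DataFrame({'values': range(len(idx))}, index=idx)

# 'infer' uses the order: the first 02:xx block is DST, the second is not
inferred = df_fall.tz_localize('Europe/Berlin', ambiguous='infer')
print(inferred)

# A single ambiguous timestamp gives 'infer' nothing to go on; 'NaT' masks it instead
single = pd.DataFrame({'values': [0]}, index=pd.DatetimeIndex(['2024-10-27 02:30']))
masked = single.tz_localize('Europe/Berlin', ambiguous='NaT')
print(masked.index)
```

'infer' relies on the data being in chronological order across the transition, so it is best suited to regularly sampled series.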

589. pandas.DataFrame.attrs property

589-1. Syntax

# 589. pandas.DataFrame.attrs property
property DataFrame.attrs
Dictionary of global attributes of this dataset.

Warning

attrs is experimental and may change without warning.

See also

DataFrame.flags
Global flags applying to this object.

Notes

Many operations that create new datasets will copy attrs. Copies are always deep so that changing attrs will only affect the present dataset. pandas.concat copies attrs only if all input datasets have the same attrs.

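589-2. Parameters

None

589-3. Purpose

Provides a dictionary interface for storing key-value metadata on a DataFrame object.

589-4. Return value

Returns a dict containing all the key-value pairs attached to the DataFrame.

589-5. Notes

None

589-6. Usage

589-6-1. Data preparation

None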

589-6-2. Code example

# 589. pandas.DataFrame.attrs property
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# attrs is empty initially
print(df.attrs)
# Set some metadata
df.attrs['description'] = 'This is a sample DataFrame'
df.attrs['created_by'] = 'Data science team'
df.attrs['creation_date'] = '2024-08-05'
# Inspect attrs after setting the metadata
print(df.attrs)

589-6-3. Output

# 589. pandas.DataFrame.attrs property
# {}
# {'description': 'This is a sample DataFrame', 'created_by': 'Data science team', 'creation_date': '2024-08-05'}
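
The Notes in 589-1 state that many operations copy attrs to the new object. Here is a minimal sketch using `DataFrame.copy()` as one such operation; the keys `description` and `reviewed` are made up for this example, and since attrs is marked experimental the behavior may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df.attrs['description'] = 'sample frame'

# copy() is one of the operations that carries attrs over to the new object
df_copy = df.copy()
print(df_copy.attrs)

# Adding a key on the copy does not touch the original's attrs
df_copy.attrs['reviewed'] = True
print(df.attrs)
print(df_copy.attrs)
```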

590. pandas.Flags class

590-1. Syntax

# 590. pandas.Flags class
class pandas.Flags(obj, *, allows_duplicate_labels)
Flags that apply to pandas objects.
Parameters:
obj : Series or DataFrame
The object these flags are associated with.

allows_duplicate_labels : bool, default True
Whether to allow duplicate labels in this object. By default, duplicate labels are permitted. Setting this to False will cause an errors.DuplicateLabelError to be raised when index (or columns for DataFrame) is not unique, or when any subsequent operation introduces duplicates. See Disallowing Duplicate Labels for more.

Warning

This is an experimental feature. Currently, many methods fail to propagate the allows_duplicate_labels value. In future versions it is expected that every method taking or returning one or more DataFrame or Series objects will propagate allows_duplicate_labels.

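590-2. Parameters

590-2-1. obj **(required):** the pandas object (Series or DataFrame) that the flags are associated with.

590-2-2. allows_duplicate_labels **(required):** a boolean specifying whether the object may have duplicate labels (such as column names or index labels). True allows duplicate labels; False disallows them.

590-3. Purpose

Provides a mechanism for getting and setting flags on an object, which control specific aspects of its behavior. The main flag currently available is allows_duplicate_labels, which governs whether duplicate labels are permitted.

590-4. Return value

Returns a Flags instance holding the object's flags. In practice it is obtained through DataFrame.flags or Series.flags, and attributes such as allows_duplicate_labels can be read and set through it.

590-5. Notes

None

590-6. Usage

590-6-1. Data preparation

None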

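590-6-2. Code example

# 590. pandas.Flags class
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Check the initial flag (allows_duplicate_labels defaults to True)
print(df.flags.allows_duplicate_labels)
# Set the allows_duplicate_labels flag to False
df.flags.allows_duplicate_labels = False
# Verify that the setting took effect
print(df.flags.allows_duplicate_labels)
# Build another DataFrame from a dict literal that repeats the key 'A'.
# Python keeps only the last value for a repeated key, so the result has a
# single column 'A' and contains no duplicate labels.
df_with_duplicates = pd.DataFrame({
    'A': [1, 2, 3],
    'A': [4, 5, 6]  # overwrites the previous 'A'
})
# Duplicate labels are allowed by default
print(df_with_duplicates.flags.allows_duplicate_labels)
# Disallowing duplicate labels succeeds here because the labels are unique
df_with_duplicates.flags.allows_duplicate_labels = False
# Re-creating the DataFrame from the same dict does not raise either, so nothing is printed
try:
    df_with_duplicates = pd.DataFrame({
        'A': [1, 2, 3],
        'A': [4, 5, 6]  # overwrites the previous 'A'
    })
except ValueError as e:
    print(f"Error: {e}")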

590-6-3. Output

# 590. pandas.Flags class
# True
# False
# True
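
No error appears in the output above because a Python dict literal cannot hold two 'A' keys, so the DataFrame never has duplicate column labels. To see the flag reject real duplicates, the labels can be passed as a list, as in this minimal sketch. The frame `df_dup` is a made-up example, and the exact error message is not shown because it may vary between pandas versions.

```python
import pandas as pd

# Passing column labels as a list keeps both 'A' labels,
# unlike a dict literal, where a repeated key is overwritten
df_dup = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['A', 'A'])
print(df_dup.columns.is_unique)

# Disallowing duplicates on an object that already has them raises
try:
    df_dup.flags.allows_duplicate_labels = False
except pd.errors.DuplicateLabelError as exc:
    print(f"DuplicateLabelError: {exc}")

# The failed assignment leaves the flag at its previous value
print(df_dup.flags.allows_duplicate_labels)
```

Because DuplicateLabelError is a subclass of ValueError, an `except ValueError` clause like the one in the example above would also catch it.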