Pandas Tutorial 29: Working with Dates and Times in Pandas

Python | Working with Dates and Times Using Pandas

Time series data is very common when working with data, and Pandas is an extremely useful tool for handling it.

Pandas provides a set of tools for performing the common tasks on date and time data. The examples below walk through them.

Working with Dates in Pandas

The date class in Python's datetime module represents dates in the Gregorian calendar. It takes three integer arguments: year, month, and day.

```python
from datetime import date

# Create a date for 17 September 2000
d = date(2000, 9, 17)
print(d)
print(type(d))
```

Output:

2000-09-17

<class 'datetime.date'>
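The example above uses only the standard library. A minimal sketch of bringing such a date into pandas, assuming a recent pandas version: pd.Timestamp accepts a datetime.date directly, and pd.to_datetime converts a list of dates into a DatetimeIndex.

```python
from datetime import date

import pandas as pd

d = date(2000, 9, 17)

# Convert a single datetime.date into a pandas Timestamp (midnight on that day)
ts = pd.Timestamp(d)
print(ts)

# Convert several dates at once into a DatetimeIndex
idx = pd.to_datetime([date(2000, 9, 17), date(2000, 9, 18)])
print(idx)
```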

Extracting the Year, Month, and Day

Retrieve the year, month, and day components of a Timestamp object.

```python
import pandas as pd

# Create a Timestamp object
timestamp = pd.Timestamp('2023-10-04 15:30:00')

# Extract and print the year
year = timestamp.year
print(year)

# Extract and print the month
month = timestamp.month
print(month)

# Extract and print the day
day = timestamp.day
print(day)
```

Output:

2023

10

4
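Because a pandas Timestamp builds on Python's datetime, it can also be split back into standard-library date and time objects. A minimal sketch using the same example timestamp:

```python
import datetime

import pandas as pd

timestamp = pd.Timestamp('2023-10-04 15:30:00')

# .date() returns a datetime.date, .time() returns a datetime.time
d = timestamp.date()
t = timestamp.time()
print(d, type(d))
print(t, type(t))
```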

Hour, Minute, Weekday, and Quarter

Determine the hour, minute, day of the week, and quarter of a timestamp.

```python
import pandas as pd

# Same Timestamp as in the previous example
timestamp = pd.Timestamp('2023-10-04 15:30:00')

# Extract and print the hour
hour = timestamp.hour
print(hour)

# Extract and print the minute
minute = timestamp.minute
print(minute)

# Extract and print the weekday (Monday = 0, Sunday = 6)
weekday = timestamp.weekday()
print(weekday)

# Extract and print the quarter
quarter = timestamp.quarter
print(quarter)
```

Output:

15
30
2
4
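The weekday value 2 is easy to misread. A short sketch of the numbering conventions: weekday() counts from Monday = 0, isoweekday() from Monday = 1, and day_name() returns the English name directly.

```python
import pandas as pd

timestamp = pd.Timestamp('2023-10-04 15:30:00')

print(timestamp.weekday())     # Monday = 0 ... Sunday = 6
print(timestamp.isoweekday())  # Monday = 1 ... Sunday = 7
print(timestamp.day_name())    # English weekday name
```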

Working with Times in Pandas

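Another class in the datetime module, time, represents a time of day (it returns a datetime.time object, not a datetime). It takes integer arguments for the hour, minute, second, and microsecond, so its resolution goes down to microseconds: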

```python
from datetime import time

# 12 hours, 50 minutes, 12 seconds, 40 microseconds
t = time(12, 50, 12, 40)
print(t)
print(type(t))
```

Output:

12:50:12.000040

<class 'datetime.time'>
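A date and a time of day are often stored separately. A minimal sketch of merging them with the standard-library datetime.combine and then converting the result to a pandas Timestamp:

```python
from datetime import date, datetime, time

import pandas as pd

d = date(2000, 9, 17)
t = time(12, 50, 12, 40)

# Merge the calendar date and the time of day into one datetime
dt = datetime.combine(d, t)
print(dt)

# The combined value converts directly to a pandas Timestamp
ts = pd.Timestamp(dt)
print(ts.microsecond)
```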

Time Periods and Date Offsets

Create custom time periods and date offsets for flexible date arithmetic.

```python
import pandas as pd

timestamp = pd.Timestamp('2023-10-04 15:30:00')

# Create a monthly Period covering October 2023
time_period = pd.Period('2023-10-04', freq='M')

# Extract and print the year, month, and quarter of the period
print(time_period.year)
print(time_period.month)
print(time_period.quarter)

# Create a date offset of 2 years, 3 months, and 10 days
date_offset = pd.DateOffset(years=2, months=3, days=10)

# Add the offset to the Timestamp and print the result
new_timestamp = timestamp + date_offset
print(new_timestamp)
```

Output:

2023

10

4

2026-01-14 15:30:00
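DateOffset follows calendar rules, while Timedelta is a fixed duration. A small sketch contrasting the two on an end-of-month date:

```python
import pandas as pd

start = pd.Timestamp('2023-01-31')

# Calendar-aware: one month after 31 January is clipped to the last day of February
by_offset = start + pd.DateOffset(months=1)
print(by_offset)

# Fixed duration: exactly 30 days later
by_timedelta = start + pd.Timedelta(days=30)
print(by_timedelta)
```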

Handling Time Zones

Time zones play a crucial role in date and time data. Pandas provides several mechanisms for handling them:

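  • **UTC and time zone conversion:** convert between UTC (Coordinated Universal Time) and local time zones.
  • **Time-zone-aware operations:** work with time-zone-aware data so that dates and times are interpreted correctly.
  • **Custom time zone settings:** choose the time zone used for data analysis and visualization.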

```python
import pandas as pd

# Create a Timestamp in a specific time zone
timestamp = pd.Timestamp('2023-10-04 15:30:00', tz='America/New_York')
print('Original Timestamp:', timestamp)

# Convert the Timestamp to UTC
utc_timestamp = timestamp.tz_convert('UTC')
print('UTC Timestamp:', utc_timestamp)

# Convert the UTC timestamp back to the original time zone
# (tz_convert, not tz_localize: the timestamp is already tz-aware)
original_timestamp = utc_timestamp.tz_convert('America/New_York')
print('Original Timestamp (Back to America/New_York):', original_timestamp)

# Create a DatetimeIndex in a specific time zone
datetime_index = pd.DatetimeIndex(['2023-10-04', '2023-10-11', '2023-10-18'],
                                  tz='Asia/Shanghai')
print('Original DatetimeIndex:', datetime_index)

# Convert the DatetimeIndex to UTC
utc_datetime_index = datetime_index.tz_convert('UTC')
print('UTC DatetimeIndex:', utc_datetime_index)

# Convert the UTC DatetimeIndex back to the original time zone
original_datetime_index = utc_datetime_index.tz_convert('Asia/Shanghai')
print('Original DatetimeIndex (Back to Asia/Shanghai):', original_datetime_index)
```

Output:

Original Timestamp: 2023-10-04 15:30:00-04:00
UTC Timestamp: 2023-10-04 19:30:00+00:00
Original Timestamp (Back to America/New_York): 2023-10-04 15:30:00-04:00
Original DatetimeIndex: DatetimeIndex(['2023-10-04 00:00:00+08:00', '2023-10-11 00:00:00+08:00',
               '2023-10-18 00:00:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq=None)
UTC DatetimeIndex: DatetimeIndex(['2023-10-03 16:00:00+00:00', '2023-10-10 16:00:00+00:00',
               '2023-10-17 16:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)
Original DatetimeIndex (Back to Asia/Shanghai): DatetimeIndex(['2023-10-04 00:00:00+08:00', '2023-10-11 00:00:00+08:00',
               '2023-10-18 00:00:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq=None)
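The difference between tz_localize and tz_convert is a common source of errors. A minimal sketch: tz_localize attaches a zone to a naive timestamp without changing the wall-clock time, tz_convert moves an aware timestamp to another zone, and calling tz_localize on an already aware timestamp raises a TypeError.

```python
import pandas as pd

# A naive timestamp has no time zone attached
naive = pd.Timestamp('2023-10-04 15:30:00')
print(naive.tz)

# tz_localize: same wall-clock time, now tied to New York
ny = naive.tz_localize('America/New_York')

# tz_convert: same instant, wall-clock time adjusted to UTC
utc = ny.tz_convert('UTC')
print(ny)
print(utc)
print(ny == utc)  # both refer to the same instant

# Localizing an aware timestamp again is rejected
try:
    ny.tz_localize('UTC')
    localize_failed = False
except TypeError as exc:
    localize_failed = True
    print('TypeError:', exc)
```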

Working with Dates and Times in Pandas

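Pandas provides convenient ways to extract specific date and time components, both from individual Timestamp objects and from whole datetime columns. The following steps walk through them.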

**Step 1:** Create a range of dates

```python
import pandas as pd

# Create a DatetimeIndex of 10 hourly timestamps
# ('h' is the hourly alias; the older 'H' is deprecated since pandas 2.2)
data = pd.date_range('1/1/2011', periods=10, freq='h')
data
```

Output:

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00', '2011-01-01 05:00:00',
               '2011-01-01 06:00:00', '2011-01-01 07:00:00',
               '2011-01-01 08:00:00', '2011-01-01 09:00:00'],
              dtype='datetime64[ns]', freq='h')
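Instead of a number of periods, date_range can also take an end point. A short sketch, assuming pandas' default behaviour of including both endpoints:

```python
import pandas as pd

# Every day from 1 to 5 January 2011 (both endpoints included)
daily = pd.date_range(start='2011-01-01', end='2011-01-05', freq='D')
print(daily)

# Every 6 hours over one day: 00:00, 06:00, 12:00, 18:00 and the next midnight
every_6h = pd.date_range(start='2011-01-01', end='2011-01-02', freq='6h')
print(every_6h)
```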

**Step 2:** Get the current date and time and display its basic features

```python
import pandas as pd

# Get the current date and time
# (pd.datetime was removed in pandas 2.0; use pd.Timestamp.now() instead)
x = pd.Timestamp.now()
x.month, x.year
```

Output (the values depend on when the code is run):

(9, 2018)

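Date-time features fall into two categories: points in time within a period (for example, the hour of the day or the month of the year), and time elapsed since a particular reference point. Both kinds of feature are useful for uncovering patterns in the data.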
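A minimal sketch of both categories on a small hourly column. The reference point (the first row) and the column name hours_since_start are hypothetical choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2011-01-01', periods=4, freq='12h')})

# Category 1: a point within a period (the hour of the day)
df['hour'] = df['date'].dt.hour

# Category 2: time elapsed since a reference point (hypothetical: the first row)
reference = df['date'].iloc[0]
df['hours_since_start'] = (df['date'] - reference).dt.total_seconds() / 3600

print(df)
```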

**Step 3:** Split a date into features

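  • pandas.Series.dt.year returns the year of the datetime.
  • pandas.Series.dt.month returns the month of the datetime.
  • pandas.Series.dt.day returns the day of the month.
  • pandas.Series.dt.hour returns the hour of the datetime.
  • pandas.Series.dt.minute returns the minute of the datetime.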

Break the date and time into separate features:

```python
import pandas as pd

# Create a DataFrame with a column of 72 hourly timestamps
rng = pd.DataFrame()
rng['date'] = pd.date_range('1/1/2011', periods=72, freq='h')

# Create features for year, month, day, hour, and minute
rng['year'] = rng['date'].dt.year
rng['month'] = rng['date'].dt.month
rng['day'] = rng['date'].dt.day
rng['hour'] = rng['date'].dt.hour
rng['minute'] = rng['date'].dt.minute

# Show the first three rows with the new features
rng.head(3)
```

Output:

date  year  month  day  hour  minute
0 2011-01-01 00:00:00  2011      1    1     0       0
1 2011-01-01 01:00:00  2011      1    1     1       0
2 2011-01-01 02:00:00  2011      1    1     2       0
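The same .dt accessor exposes further calendar features. A short sketch adding the day of the week, its name, and the quarter to the same hourly column (1 January 2011 was a Saturday):

```python
import pandas as pd

rng = pd.DataFrame({'date': pd.date_range('1/1/2011', periods=72, freq='h')})

rng['dayofweek'] = rng['date'].dt.dayofweek   # Monday = 0, Sunday = 6
rng['day_name'] = rng['date'].dt.day_name()   # English weekday name
rng['quarter'] = rng['date'].dt.quarter       # 1 to 4

print(rng.head(3))
```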

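**Step 4:** To get the current time, use Timestamp.now(). You can then convert the Timestamp to a standard datetime object or access the year, month, or day directly.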

```python
import pandas as pd

# Get the current date and time as a Timestamp
# (pandas.tslib no longer exists; Timestamp is available as pd.Timestamp)
t = pd.Timestamp.now()
t
```

Output (the value depends on when the code is run):

Timestamp('2018-09-18 17:18:49.101496')

```python
# Convert the Timestamp to a standard-library datetime
# (Timestamp.to_datetime() was removed; use to_pydatetime())
t.to_pydatetime()
```

Output:

datetime.datetime(2018, 9, 18, 17, 18, 49, 101496)

**Step 5:** Access specific components of the timestamp, such as the year, month, day, hour, minute, and second, for further analysis.

```python
# Directly access and print the features
print(t.year)
print(t.month)
print(t.day)
print(t.hour)
print(t.minute)
print(t.second)
```

Output (for the timestamp shown in Step 4):

2018
9
18
17
18
49
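Timestamps can also be formatted as strings, for example in dd-mm-yy form, with strftime; the .dt accessor offers the same method for a whole column. A short sketch using a fixed timestamp (taken from the Step 4 output) so the result is reproducible:

```python
import pandas as pd

# Fixed timestamp so the output does not depend on the current time
t = pd.Timestamp('2018-09-18 17:18:49.101496')

day_first = t.strftime('%d-%m-%y')
iso_like = t.strftime('%Y-%m-%d %H:%M')
print(day_first)
print(iso_like)

# The same formatting applied to a whole column
s = pd.Series(pd.date_range('2011-01-01', periods=2, freq='D'))
formatted = s.dt.strftime('%d-%m-%y').tolist()
print(formatted)
```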

Exploring the History of UFO Sightings

Let's apply these techniques to a real dataset, uforeports.

```python
import pandas as pd

url = 'http://bit.ly/uforeports'

# Read the CSV file
df = pd.read_csv(url)
df.head()
```

Output:

City Colors Reported Shape Reported State             Time
0                Ithaca             NaN       TRIANGLE    NY   6/1/1930 22:00
1           Willingboro             NaN          OTHER    NJ  6/30/1930 20:00
2               Holyoke             NaN           OVAL    CO  2/15/1931 14:00
3               Abilene             NaN           DISK    KS   6/1/1931 13:00
4  New York Worlds Fair             NaN          LIGHT    NY  4/18/1933 19:00
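As an alternative to converting the column afterwards, read_csv can parse it while loading via parse_dates. A minimal sketch that runs offline on a few rows copied from the output above (the inline CSV text is only a stand-in for the real file):

```python
import io

import pandas as pd

# A few rows copied from the output above, so the example runs without network access
csv_text = """City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,6/1/1930 22:00
Willingboro,,OTHER,NJ,6/30/1930 20:00
Holyoke,,OVAL,CO,2/15/1931 14:00
"""

# parse_dates converts the Time column to datetime while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Time'])
print(df.dtypes)
```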

Convert the Time column of the DataFrame to datetime format:

```python
# Convert the Time column to datetime format
df['Time'] = pd.to_datetime(df.Time)
df.head()
```

Output:

City Colors Reported Shape Reported State  \
0                Ithaca             NaN       TRIANGLE    NY   
1           Willingboro             NaN          OTHER    NJ   
2               Holyoke             NaN           OVAL    CO   
3               Abilene             NaN           DISK    KS   
4  New York Worlds Fair             NaN          LIGHT    NY   
                 Time  
0 1930-06-01 22:00:00  
1 1930-06-30 20:00:00  
2 1931-02-15 14:00:00  
3 1931-06-01 13:00:00  
4 1933-04-18 19:00:00  
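When the layout of the strings is known, passing an explicit format removes any month/day ambiguity, and errors='coerce' turns unparseable entries into NaT instead of raising. A short sketch on sample strings in the same layout as the Time column:

```python
import pandas as pd

times = pd.Series(['6/1/1930 22:00', '6/30/1930 20:00', '2/15/1931 14:00'])

# Explicit month/day/year hour:minute format
parsed = pd.to_datetime(times, format='%m/%d/%Y %H:%M')
print(parsed)

# Invalid entries become NaT with errors='coerce'
mixed = pd.Series(['6/1/1930 22:00', 'not a date'])
coerced = pd.to_datetime(mixed, format='%m/%d/%Y %H:%M', errors='coerce')
print(coerced.isna().tolist())
```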

Display the data type of each column in the DataFrame:

```python
# Show the data type of each column
df.dtypes
```

Output:

City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object

Extract the hour from the datetime column:

```python
# Get the hour from the time data
df.Time.dt.hour.head()
```

Output:

0    22
1    20
2    14
3    13
4    19
Name: Time, dtype: int64

Retrieve the name of the day of the week for each date in the column:

```python
# Get the weekday name of each date
# (dt.weekday_name was removed in pandas 1.0; use dt.day_name())
df.Time.dt.day_name().head()
```

Output:

0     Sunday
1     Monday
2     Sunday
3     Monday
4    Tuesday
Name: Time, dtype: object
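Weekday names become more informative when counted. A minimal sketch using the five timestamps shown in the outputs above as a stand-in for the full column; on the real data, df.Time.dt.day_name().value_counts() works the same way:

```python
import pandas as pd

# The first five sighting times from the outputs above
times = pd.Series(pd.to_datetime([
    '1930-06-01 22:00', '1930-06-30 20:00', '1931-02-15 14:00',
    '1931-06-01 13:00', '1933-04-18 19:00',
]))

# Count sightings per weekday name
counts = times.dt.day_name().value_counts()
print(counts)
```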

Retrieve the ordinal day of the year (1 to 366) for each date in the column:

```python
# Get the ordinal day of the year
df.Time.dt.dayofyear.head()
```

Output:

0    152
1    181
2     46
3    152
4    108
Name: Time, dtype: int64

Create a visualization to explore how often UFOs are sighted at each hour of the day:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Convert the 'Time' column to datetime format
df['Time'] = pd.to_datetime(df.Time)

# Extract the hour of the day from the 'Time' column
df['Hour'] = df['Time'].dt.hour

# Create a histogram to visualize UFO sightings by hour
plt.figure(figsize=(10, 6))
plt.hist(df['Hour'], bins=24, range=(0, 24), edgecolor='black', alpha=0.7)
plt.xlabel('Hour of the Day')
plt.ylabel('Number of UFO Sightings')
plt.title('UFO Sightings by Hour of the Day')
plt.xticks(range(0, 25))
plt.grid(True)
plt.show()
```

Output: a histogram titled "UFO Sightings by Hour of the Day" (figure not reproduced here).

Conclusion

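Handling date and time data is a fundamental skill for data analysts and data scientists. Pandas offers a comprehensive set of tools for working with date and time information, enabling in-depth analysis of time-related data. By mastering these techniques, you can extract valuable insights from time series data and make well-informed decisions across many domains.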
