Study notes on pandas, a third-party Python module

pandas is a third-party module for Python.

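Pandas is an open-source third-party Python library built on top of NumPy, and its plotting helpers rely on Matplotlib. NumPy, Matplotlib and Pandas are often called the "big three" of Python data analysis. Pandas has become a standard high-level tool for data analysis in Python, and its stated goal is to be the most powerful and flexible open-source data analysis and manipulation tool available in any language. These notes are an introduction to pandas and walk through its basic usage.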

Installation

pandas is usually used together with the numpy module.


pip install numpy
pip install pandas
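
To confirm that both packages installed correctly, a quick check like the one below can be run. It is only a minimal sketch: it imports the two modules and prints their version strings, which will differ from one environment to another. The variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Read the installed versions; the values depend on the local environment.
numpy_version = np.__version__
pandas_version = pd.__version__

print("numpy", numpy_version)
print("pandas", pandas_version)
```

If either import fails with ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.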

Usage notes


import pandas as pd
import numpy as np
from pandas import DataFrame


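# df.T     returns the transposed DataFrame (rows and columns swapped)
# df.reset_index()     resets the row index to the default 0, 1, 2, 3, ...; the old row index is kept and becomes the first column
# df.reset_index(drop=True)     resets the row index to the default 0, 1, 2, 3, ...; the old row index is discarded
# df.values     returns the DataFrame's data as a NumPy array
# df.values.tolist()     converts that NumPy array into a nested (two-dimensional) list; tolist() is a method of the array, not of the np module
# df.columns    returns the column index object of the DataFrame
# df.index    returns the row index object of the DataFrame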


# dfobj = DataFrame().T.reset_index().values.tolist()



df = pd.DataFrame([('bird', 389.0),
                   ('bird', 24.0),
                   ('mammal', 80.5),
                   ('mammal', np.nan)],
                  index=['falcon', 'parrot', 'lion', 'monkey'],
                  columns=('class', 'max_speed'))

print(df)
#          class  max_speed
# falcon    bird      389.0
# parrot    bird       24.0
# lion    mammal       80.5
# monkey  mammal        NaN



df = pd.DataFrame([('bird', 389.0),
                   ('bird', 24.0),
                   ('mammal', 80.5),
                   ('mammal', np.nan)])

print(df)
#         0      1
# 0    bird  389.0
# 1    bird   24.0
# 2  mammal   80.5
# 3  mammal    NaN


print(df.T)
#       0     1       2       3
# 0  bird  bird  mammal  mammal
# 1   389    24    80.5     NaN



print(df.reset_index())
#    index       0      1
# 0      0    bird  389.0
# 1      1    bird   24.0
# 2      2  mammal   80.5
# 3      3  mammal    NaN



print(df.reset_index(drop=True))
#         0      1
# 0    bird  389.0
# 1    bird   24.0
# 2  mammal   80.5
# 3  mammal    NaN


print(df.values)
# [['bird' 389.0]
#  ['bird' 24.0]
#  ['mammal' 80.5]
#  ['mammal' nan]]



print(df.values.tolist())
# [['bird', 389.0], ['bird', 24.0], ['mammal', 80.5], ['mammal', nan]]




print(df.columns)
# RangeIndex(start=0, stop=2, step=1)


print(df.columns.tolist())
# [0, 1]


print(df.index)
# RangeIndex(start=0, stop=4, step=1)


print(df.index.tolist())
# [0, 1, 2, 3]


df = pd.DataFrame([(123, 389.0),
                   (432, 24.0),
                   (34, 80.5),
                   (54, 87)])
print(df / 1000)
#        0       1
# 0  0.123  0.3890
# 1  0.432  0.0240
# 2  0.034  0.0805
# 3  0.054  0.0870
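
The reset_index() examples above use a DataFrame with the default integer index, so the effect of keeping the old index is easy to miss. As a small sketch, the snippet below rebuilds the labeled animal DataFrame from the start of these notes and shows the old row labels moving into a column named index before the data is converted to a nested list. The variable names (animals, flat, dropped, rows) are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Same data as the first example, with row labels and column names.
animals = pd.DataFrame(
    [('bird', 389.0), ('bird', 24.0), ('mammal', 80.5), ('mammal', np.nan)],
    index=['falcon', 'parrot', 'lion', 'monkey'],
    columns=['class', 'max_speed'],
)

# reset_index() keeps the old labels as a new first column called "index".
flat = animals.reset_index()
print(flat.columns.tolist())
# ['index', 'class', 'max_speed']

# drop=True discards the old labels instead of keeping them.
dropped = animals.reset_index(drop=True)
print(dropped.index.tolist())
# [0, 1, 2, 3]

# .values gives a NumPy array; its tolist() method returns nested lists.
rows = flat.values.tolist()
print(rows[0])
# ['falcon', 'bird', 389.0]
```

Because the row labels end up inside each inner list, this chain is a convenient way to export a labeled DataFrame to plain Python lists without losing the labels.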