Pandas index操作索引
索引(index)是 Pandas 的重要工具,通过索引可以从 DataFame 中选择特定的行数和列数,这种选择数据的方式称为"子集选择"。
在 Pandas 中,索引值也被称为标签(label),它在 Jupyter 笔记本中以粗体字进行显示。索引可以加快数据访问的速度,它就好比数据的书签,通过它可以实现数据的快速查找。
创建索引
通过示例对 index 索引做进一步讲解。下面创建一个带有 index 索引的数据,并使用 read_csv() 这些读取数据:
python
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print(df)
输出结果:
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
0 7369 SMITH CLERK 7902.0 1980-12-17 800 NaN 20
1 7499 ALLEN SALESMAN 7698.0 1981-02-20 1600 300.0 30
2 7521 WARD SALESMAN 7698.0 1981-02-22 1250 500.0 30
3 7566 JONES MANAGER 7839.0 1981-04-02 2975 NaN 20
4 7654 MARTIN SALESMAN 7698.0 1981-09-28 1250 1400.0 30
5 7698 BLAKE MANAGER 7839.0 1981-05-01 2850 NaN 30
6 7782 CLARK MANAGER 7839.0 1981-06-09 2450 NaN 10
7 7788 SCOTT ANALYST 7566.0 1987-04-19 3000 NaN 20
8 7839 KING PRESIDENT NaN 1981-11-17 5000 NaN 10
9 7844 TURNER SALESMAN 7698.0 1981-09-08 1500 0.0 30
10 7876 ADAMS CLERK 7788.0 1987-05-23 1100 NaN 20
11 7900 JAMES CLERK 7698.0 1981-12-03 950 NaN 30
12 7902 FORD ANALYST 7566.0 1981-12-03 3000 NaN 20
13 7934 MILLER CLERK 7782.0 1982-01-23 1300 NaN 10
通过列索引(标签)读取多列数据。
python
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print('读取EMPNO、ENAME、JOB、MGR:\n',df[['EMPNO','ENAME','JOB','MGR']])
# 或使用loc函数
#print('读取EMPNO、ENAME、JOB、MGR:\n',df.loc[:,['EMPNO','ENAME','JOB','MGR']])
输出结果:
读取EMPNO、ENAME、JOB、MGR:
EMPNO ENAME JOB MGR
0 7369 SMITH CLERK 7902.0
1 7499 ALLEN SALESMAN 7698.0
2 7521 WARD SALESMAN 7698.0
3 7566 JONES MANAGER 7839.0
4 7654 MARTIN SALESMAN 7698.0
5 7698 BLAKE MANAGER 7839.0
6 7782 CLARK MANAGER 7839.0
7 7788 SCOTT ANALYST 7566.0
8 7839 KING PRESIDENT NaN
9 7844 TURNER SALESMAN 7698.0
10 7876 ADAMS CLERK 7788.0
11 7900 JAMES CLERK 7698.0
12 7902 FORD ANALYST 7566.0
13 7934 MILLER CLERK 7782.0
再看一组简单的示例:
python
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx",index_col='EMPNO')
print('读取ENAME、JOB、MGR:\n',df[['ENAME','JOB','MGR']])
输出结果:
读取EMPNO、ENAME、JOB、MGR:
ENAME JOB MGR
EMPNO
7369 SMITH CLERK 7902.0
7499 ALLEN SALESMAN 7698.0
7521 WARD SALESMAN 7698.0
7566 JONES MANAGER 7839.0
7654 MARTIN SALESMAN 7698.0
7698 BLAKE MANAGER 7839.0
7782 CLARK MANAGER 7839.0
7788 SCOTT ANALYST 7566.0
7839 KING PRESIDENT NaN
7844 TURNER SALESMAN 7698.0
7876 ADAMS CLERK 7788.0
7900 JAMES CLERK 7698.0
7902 FORD ANALYST 7566.0
7934 MILLER CLERK 7782.0
设置索引
set_index() 将已存在的列标签设置为 DataFrame 行索引。除了可以添加索引外,也可以替换已经存在的索引。比如您也可以把 Series 或者一个 DataFrme 设置成另一个 DataFrame 的索引。示例如下:
python
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print('读取ENAME、JOB、MGR:\n',df[['EMPNO','ENAME','JOB','MGR']].set_index(['EMPNO']))
输出结果:
读取ENAME、JOB、MGR:
ENAME JOB MGR
EMPNO
7369 SMITH CLERK 7902.0
7499 ALLEN SALESMAN 7698.0
7521 WARD SALESMAN 7698.0
7566 JONES MANAGER 7839.0
7654 MARTIN SALESMAN 7698.0
7698 BLAKE MANAGER 7839.0
7782 CLARK MANAGER 7839.0
7788 SCOTT ANALYST 7566.0
7839 KING PRESIDENT NaN
7844 TURNER SALESMAN 7698.0
7876 ADAMS CLERK 7788.0
7900 JAMES CLERK 7698.0
7902 FORD ANALYST 7566.0
7934 MILLER CLERK 7782.0
重置索引
您可以使用 reset_index() 来恢复初始行索引,示例如下:
python
import pandas as pd
df = pd.read_excel("C:\\Users\\qwy\\Desktop\\data\\empdata.xlsx")
print('读取ENAME、JOB、MGR:\n',df[['EMPNO','ENAME','JOB','MGR']].set_index(['EMPNO']))
print('重置后的索引:\n',df[['EMPNO','ENAME','JOB','MGR']].reset_index())
输出结果:
读取ENAME、JOB、MGR:
ENAME JOB MGR
EMPNO
7369 SMITH CLERK 7902.0
7499 ALLEN SALESMAN 7698.0
7521 WARD SALESMAN 7698.0
7566 JONES MANAGER 7839.0
7654 MARTIN SALESMAN 7698.0
7698 BLAKE MANAGER 7839.0
7782 CLARK MANAGER 7839.0
7788 SCOTT ANALYST 7566.0
7839 KING PRESIDENT NaN
7844 TURNER SALESMAN 7698.0
7876 ADAMS CLERK 7788.0
7900 JAMES CLERK 7698.0
7902 FORD ANALYST 7566.0
7934 MILLER CLERK 7782.0
重置后的索引:
index EMPNO ENAME JOB MGR
0 0 7369 SMITH CLERK 7902.0
1 1 7499 ALLEN SALESMAN 7698.0
2 2 7521 WARD SALESMAN 7698.0
3 3 7566 JONES MANAGER 7839.0
4 4 7654 MARTIN SALESMAN 7698.0
5 5 7698 BLAKE MANAGER 7839.0
6 6 7782 CLARK MANAGER 7839.0
7 7 7788 SCOTT ANALYST 7566.0
8 8 7839 KING PRESIDENT NaN
9 9 7844 TURNER SALESMAN 7698.0
10 10 7876 ADAMS CLERK 7788.0
11 11 7900 JAMES CLERK 7698.0
12 12 7902 FORD ANALYST 7566.0
13 13 7934 MILLER CLERK 7782.0