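- When working with pandas, you sometimes need to convert pandas data into a Python list. pandas provides two methods for this: tolist() and to_list().
- For practical purposes, tolist() and to_list() in pandas can be treated as identical in usage.
- This post also covers the tolist() method in numpy, whose main feature is that it works on arrays of any dimension.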
1. tolist()
pandas.api.extensions.ExtensionArray.tolist()
Return a list of the values.
These are each a scalar type, which is a Python scalar (for str, int, float) or
a pandas scalar (for Timestamp/Timedelta/Interval/Period)
python
>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]
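This is the official documentation's description and example for tolist(). Two things follow from it:
- The entry quoted here is documented on pandas.api.extensions.ExtensionArray, the base class for pandas extension arrays. Series and Index provide a tolist() method with the same description.
- The method returns a list whose elements are either Python scalars or pandas scalars. (In earlier versions, the elements of the returned list were numpy types or pandas types.)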
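To make the "Python scalar or pandas scalar" distinction concrete, here is a minimal sketch. The sample values and variable names are illustrative assumptions, not taken from the documentation.

```python
import pandas as pd

# Numeric and string data come back as built-in Python scalars.
int_values = pd.Series([1, 2, 3]).tolist()
str_values = pd.Series(["a", "b"]).tolist()

# Datetime-like data has no built-in Python equivalent,
# so the list holds pandas scalars instead.
ts_values = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"])).tolist()
td_values = pd.Series(pd.to_timedelta(["1 day", "2 days"])).tolist()

print(type(int_values[0]))  # built-in int
print(type(str_values[0]))  # built-in str
print(type(ts_values[0]))   # pandas Timestamp
print(type(td_values[0]))   # pandas Timedelta
```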
The examples below show how to use tolist().
python
import numpy as np
import pandas as pd

# Sample DataFrame used throughout the examples
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["x", "y", "z"]
)
python
   A  B
x  1  4
y  2  5
z  3  6
1.1. Cannot be called directly on a DataFrame
python
df.tolist()
# AttributeError: 'DataFrame' object has no attribute 'tolist'
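Since a DataFrame has no tolist(), a common workaround is to go through numpy or a dict first. The sketch below is one possible approach, not the only one; the variable names rows and columns are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["x", "y", "z"],
)

# Convert to a 2-D numpy array first, then to a nested list (one inner list per row).
rows = df.to_numpy().tolist()

# Or build one list per column, keyed by column name.
columns = df.to_dict(orient="list")

print(rows)     # [[1, 4], [2, 5], [3, 6]]
print(columns)  # {'A': [1, 2, 3], 'B': [4, 5, 6]}
```

Note that both results drop the row index labels.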
1.2. On the index and columns attributes
python
index_tolist = df.index.tolist()
print(index_tolist)
print(type(index_tolist))
print(type(index_tolist[0]))
# ['x', 'y', 'z']
# <class 'list'>
# <class 'str'>
python
columns_tolist = df.columns.tolist()
print(columns_tolist)
print(type(columns_tolist))
print(type(columns_tolist[0]))
# ['A', 'B']
# <class 'list'>
# <class 'str'>
1.3. On row and column data
python
row_tolist = df.iloc[0].tolist()
print(row_tolist)
print(type(row_tolist))
print(type(row_tolist[0]))
# [1, 4]
# <class 'list'>
# <class 'int'>
python
col_tolist = df["A"].tolist()
print(col_tolist)
print(type(col_tolist))
print(type(col_tolist[0]))
# [1, 2, 3]
# <class 'list'>
# <class 'int'>
Because a single row or column of a DataFrame is a Series, this also shows how tolist() is used on a Series.
1.4. On a MultiIndex
python
index_df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)
mul_index = pd.MultiIndex.from_frame(index_df)
mul_df = pd.DataFrame(np.random.randn(4, 3), index=mul_index)
python
                     0         1         2
first second
bar   one    -0.625643  0.533483  0.066657
      two    -1.759180  1.116185  0.264087
foo   one    -0.773947 -1.649559  1.865090
      two     1.200301 -3.090575 -1.464554
python
mul_index_tolist = mul_df.index.tolist()
print(mul_index_tolist)
print(type(mul_index_tolist))
print(type(mul_index_tolist[0]))
print(type(mul_index_tolist[0][0]))
# [('bar', 'one'), ('bar', 'two'), ('foo', 'one'), ('foo', 'two')]
# <class 'list'>
# <class 'tuple'>
# <class 'str'>
2. to_list()
pandas.Index.to_list()
pandas.Series.to_list()
Return a list of the values.
These are each a scalar type, which is a Python scalar (for str, int, float) or
a pandas scalar (for Timestamp/Timedelta/Interval/Period)
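As the official documentation shows, the descriptions of to_list() and tolist() are identical. The only difference is where they are documented: the tolist() entry quoted earlier belongs to ExtensionArray, while to_list() is documented as a method of Index and Series. On Index and Series, both names are available and behave the same way.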
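A quick way to convince yourself that the two names behave the same on Series and Index is to compare their results directly. This is a minimal sketch with illustrative sample data:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "z"])

# Both spellings should produce equal lists for the values...
same_values = s.tolist() == s.to_list()
# ...and for the index.
same_index = s.index.tolist() == s.index.to_list()

print(same_values, same_index)  # True True
```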
2.1. Cannot be called directly on a DataFrame
python
df.to_list()
# AttributeError: 'DataFrame' object has no attribute 'to_list'
2.2. On the index and columns attributes
python
index_to_list = df.index.to_list()
print(index_to_list)
print(type(index_to_list))
print(type(index_to_list[0]))
# ['x', 'y', 'z']
# <class 'list'>
# <class 'str'>
python
columns_to_list = df.columns.to_list()
print(columns_to_list)
print(type(columns_to_list))
print(type(columns_to_list[0]))
# ['A', 'B']
# <class 'list'>
# <class 'str'>
2.3. On row and column data
python
row_to_list = df.iloc[0].to_list()
print(row_to_list)
print(type(row_to_list))
print(type(row_to_list[0]))
# [1, 4]
# <class 'list'>
# <class 'int'>
python
col_to_list = df["A"].to_list()
print(col_to_list)
print(type(col_to_list))
print(type(col_to_list[0]))
# [1, 2, 3]
# <class 'list'>
# <class 'int'>
Because a single row or column of a DataFrame is a Series, this also shows how to_list() is used on a Series.
2.4. On a MultiIndex
python
index_df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)
mul_index = pd.MultiIndex.from_frame(index_df)
mul_df = pd.DataFrame(np.random.randn(4, 3), index=mul_index)
python
                     0         1         2
first second
bar   one    -0.625643  0.533483  0.066657
      two    -1.759180  1.116185  0.264087
foo   one    -0.773947 -1.649559  1.865090
      two     1.200301 -3.090575 -1.464554
python
mul_index_to_list = mul_df.index.to_list()
print(mul_index_to_list)
print(type(mul_index_to_list))
print(type(mul_index_to_list[0]))
print(type(mul_index_to_list[0][0]))
# [('bar', 'one'), ('bar', 'two'), ('foo', 'one'), ('foo', 'two')]
# <class 'list'>
# <class 'tuple'>
# <class 'str'>
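A MultiIndex converts to a list of tuples, which is not always the shape you want. If you need one flat list per level instead, the following sketch shows two possible ways to get it. The index is rebuilt with from_tuples here only to keep the example self-contained.

```python
import pandas as pd

mul_index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("foo", "one"), ("foo", "two")],
    names=["first", "second"],
)

# Option 1: pull out each level by name, then convert it to a list.
first_level = mul_index.get_level_values("first").to_list()
second_level = mul_index.get_level_values("second").to_list()

# Option 2: transpose the list of tuples with zip.
first_zip, second_zip = (list(level) for level in zip(*mul_index.to_list()))

print(first_level)   # ['bar', 'bar', 'foo', 'foo']
print(second_level)  # ['one', 'two', 'one', 'two']
```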
3. tolist() in numpy
numpy.ndarray.tolist()
Return the array as an a.ndim-levels deep nested list of Python scalars.
Return a copy of the array data as a (nested) Python list.
Data items are converted to the nearest compatible builtin Python type, via the item function.
If a.ndim is 0, then since the depth of the nested list is 0, it will not be a list at all, but a simple Python scalar.
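The documentation for tolist() in numpy emphasizes two points:
- The elements of the resulting list are all built-in Python types.
- It can convert a numpy.ndarray with 0, 1, 2, or more dimensions. This is something tolist() and to_list() in pandas do not offer, since Series and Index are one-dimensional.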
python
a = np.uint32([1, 2])
a_list = list(a)
a_list # [1, 2]
type(a_list[0]) # <class 'numpy.uint32'>
a_tolist = a.tolist()
a_tolist # [1, 2]
type(a_tolist[0]) # <class 'int'>
python
a = np.array([[1, 2], [3, 4]])
list(a) # [array([1, 2]), array([3, 4])]
a.tolist() # [[1, 2], [3, 4]]
python
a = np.array(1)
# list(a)
# Traceback (most recent call last):
# ...
# TypeError: iteration over a 0-d array
a.tolist() # 1
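The "a.ndim-levels deep" wording in the documentation can be checked directly. The sketch below uses a hypothetical helper, nesting_depth, written just for this illustration (it is not part of numpy), and assumes non-empty arrays.

```python
import numpy as np


def nesting_depth(obj):
    """Count how many list levels wrap the first leaf element."""
    depth = 0
    while isinstance(obj, list):
        depth += 1
        obj = obj[0]
    return depth


shapes = [(), (3,), (2, 3), (2, 3, 4)]
depths = []
for shape in shapes:
    arr = np.zeros(shape)
    depths.append(nesting_depth(arr.tolist()))
    # The nesting depth of the result matches the number of dimensions.
    print(arr.ndim, depths[-1])

# The leaf elements are built-in Python floats, not numpy.float64.
leaf = np.zeros((2, 3)).tolist()[0][0]
print(type(leaf))
```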