Python Data Analysis: numpy-tutorial-student

NumPy Basics

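Introduction to NumPy

NumPy is a library for the Python language.
NumPy mainly supports array and matrix operations and computation.
NumPy is very efficient; its core is written in C.
pandas, which we cover in lesson three, is also a library built on top of NumPy.
Today's popular machine learning frameworks (TensorFlow, PyTorch, and so on) have syntax that is close to NumPy's.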

Arrays

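It all depends on the array's number of dimensions. Personally I keep it simple: I treat a 1-D array as a vector, a 2-D array as a matrix, a 3-D array as a "3-D matrix", and so on.

You can call np.array to initialize an array from a list: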

python 复制代码

import numpy as np

a = np.array([1, 2, 3])
print(a)

csharp 复制代码

[1 2 3]

python 复制代码

print(type(a))

arduino 复制代码

<class 'numpy.ndarray'>

python 复制代码

type([1,2,3])

复制代码

list

python 复制代码

a[2]

复制代码

3

python 复制代码

a[0] = 5

python 复制代码

print(a)

csharp 复制代码

[5 2 3]

python 复制代码

b = np.array([[1,2,3], [2,3,4]])
print(b)

lua 复制代码

[[1 2 3]
 [2 3 4]]

python 复制代码

print(type(b))

arduino 复制代码

<class 'numpy.ndarray'>

python 复制代码

print(b.shape)

scss 复制代码

(2, 3)

python 复制代码

print(b[0,2])

复制代码

3

ndarray = n dimensional array
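
To make the "n dimensional" part concrete, here is a small sketch that prints the number of dimensions and the shape of a 1-D, 2-D and 3-D array. The names vec, mat and cube are made up for this illustration.

```python
import numpy as np

vec = np.array([1, 2, 3])                                  # 1-D: a vector
mat = np.array([[1, 2, 3], [2, 3, 4]])                     # 2-D: a matrix
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])      # 3-D: two 2x2 matrices

for name, arr in [("vec", vec), ("mat", mat), ("cube", cube)]:
    # ndim is the number of axes; shape is the length along each axis
    print(name, arr.ndim, arr.shape)
```

The length of the shape tuple is always equal to ndim.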

There are also some built-in functions for creating arrays:

python 复制代码

a = np.zeros((2,3))
print(a)

lua 复制代码

[[ 0.  0.  0.]
 [ 0.  0.  0.]]

python 复制代码

b = np.ones((1,2))
print(b)

lua 复制代码

[[ 1.  1.]]

python 复制代码

c = np.full((2,2), 8)
print(c)

lua 复制代码

[[8 8]
 [8 8]]

python 复制代码

d = np.eye(3)
print(d)

lua 复制代码

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

python 复制代码

e = np.random.random((3,2))
print(e)

lua 复制代码

[[ 0.34222445  0.254503  ]
 [ 0.07192044  0.39303621]
 [ 0.64905403  0.77977616]]

python 复制代码

f = np.empty((2,3,4))
print(f)

lua 复制代码

[[[  8.09661985e-312   7.21335843e-322   0.00000000e+000   0.00000000e+000]
  [  1.29061414e-306   1.16095484e-028   5.28595592e-085   1.01445441e+242]
  [  2.93623251e-062   1.75300433e+243   4.25136003e-096   1.04716878e-142]]

 [[  4.54813897e-144   5.83001600e+199   7.48468178e+251   5.04621362e+180]
  [  7.49779533e+247   3.88625532e+285   2.02647441e+267   1.16317829e-028]
  [  8.76739488e+252   1.14011198e+243   5.54175224e+257   8.34402698e-308]]]

python 复制代码

print(f.shape)

scss 复制代码

(2, 3, 4)

python 复制代码

g = np.arange(15)
print(g)

css 复制代码

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

python 复制代码

print(g.shape)

scss 复制代码

(15,)

python 复制代码

type((2,3))

arduino 复制代码

tuple

Arrays can have different data types

python 复制代码

arr = np.array([1,2,3])
print(arr.dtype)

go 复制代码

int32

python 复制代码

arr = np.array([1,2,3], dtype=np.float64)
print(arr.dtype)

go 复制代码

float64

python 复制代码

arr

scss 复制代码

array([ 1.,  2.,  3.])

python 复制代码

arr = np.array([1,2,3], dtype=np.int64)
print(arr.dtype)

go 复制代码

int64

You can specify the data type when creating an array; if you don't, NumPy picks a suitable type automatically
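
As a rough illustration of that automatic choice, the sketch below builds three arrays without passing dtype. One caveat: the default integer width depends on the platform and NumPy version (the outputs in this tutorial show int32, many other setups give int64), so the check only asks for "some integer type".

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
floats = np.array([1.0, 2.0])     # all floats -> float64
mixed = np.array([1, 2.5, 3])     # a single float promotes the whole array

print(ints.dtype, floats.dtype, mixed.dtype)
print(np.issubdtype(ints.dtype, np.integer))
```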

Use astype to copy an array and convert its data type

python 复制代码

int_arr = np.array([1,2,3,4,5])
print(int_arr, int_arr.dtype)

csharp 复制代码

[1 2 3 4 5] int32

python 复制代码

float_arr = int_arr.astype(np.float64)
print(float_arr.dtype, float_arr)

css 复制代码

float64 [ 1.  2.  3.  4.  5.]

When astype converts float to int, the fractional part is discarded (values are truncated toward zero)

python 复制代码

float_arr = np.array([3.5,2.3,4.8,-2.2])
print(float_arr)

css 复制代码

[ 3.5  2.3  4.8 -2.2]

python 复制代码

int_arr = float_arr.astype(np.int64)
print(int_arr, int_arr.dtype)

css 复制代码

[ 3  2  4 -2] int64
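
If truncation is not what you want, one option is to round or floor first and convert afterwards. A minimal sketch comparing the three behaviours on the same data (the variable names are just for illustration):

```python
import numpy as np

float_arr = np.array([3.5, 2.3, 4.8, -2.2])

truncated = float_arr.astype(np.int64)             # drops the fraction (toward zero)
rounded = np.round(float_arr).astype(np.int64)     # round to nearest, then convert
floored = np.floor(float_arr).astype(np.int64)     # always round down, then convert

print(truncated)
print(rounded)
print(floored)
```

Note how -2.2 becomes -2 when truncated but -3 when floored.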

Use astype to convert an array of strings to numbers; if the conversion fails, an exception is raised.

python 复制代码

# np.string_ was removed in NumPy 2.0; np.bytes_ is the same type under its current name
str_arr = np.array(['1.24', '2.2', '5.8', 'asas'], dtype=np.bytes_)
str_arr

php 复制代码

array([b'1.24', b'2.2', b'5.8', b'asas'], 
      dtype='|S4')

python 复制代码

# np.float was removed in NumPy 1.24; use np.float64 (or the builtin float)
float_arr = str_arr.astype(np.float64)
print(float_arr)

sql 复制代码

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-61-1d45899cf6ad> in <module>()
----> 1 float_arr = str_arr.astype(np.float64)
      2 print(float_arr)

ValueError: could not convert string to float: 'asas'

astype can also take another array's dtype as its argument

python 复制代码

int_arr = np.arange(10)
float_arr = np.array([2.3,4.6,9.8])
print(float_arr.dtype, int_arr.dtype)

go 复制代码

float64 int32

python 复制代码

int_arr.astype(dtype=float_arr.dtype)

scss 复制代码

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

Array indexing

NumPy offers quite a few ways to get at values.

python 复制代码

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)
print(a.shape)

lua 复制代码

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
(3, 4)

You can slice just like a list (and with a multi-dimensional array you can slice along several dimensions at once):

python 复制代码

b = a[0:2,2:4].copy()
print(b)

lua 复制代码

[[3 4]
 [7 8]]

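A plain slice such as a[0:2, 2:4] is a view onto a's data, so writing into it also changes the original array (I wouldn't really recommend assigning that way, but it does work). Here b was created with .copy(), so it owns its data and a stays untouched: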

python 复制代码

b[0,0] = 111111
print(b)
print(a)

lua 复制代码

[[111111      4]
 [     7      8]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
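
To see the difference between a view and a copy side by side, here is a small sketch. The names view and copy are illustrative, and np.shares_memory is used only to confirm which one shares data with a.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[0:2, 2:4]            # plain slice: shares memory with a
copy = a[0:2, 2:4].copy()     # explicit copy: independent data

copy[0, 0] = 111111           # does not touch a
view[0, 0] = -1               # writes through to a[0, 2]

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```

Use .copy() whenever you want to modify a slice without side effects on the source array.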

Recall that a is the 3x4 2-D array (matrix) created above.

So go ahead and pick out whatever values you want:

python 复制代码

row_r1 = a[1,:]
print(row_r1, row_r1.shape)

scss 复制代码

[5 6 7 8] (4,)

python 复制代码

row_r2 = a[1:2, :]
print(row_r2, row_r2.shape)

lua 复制代码

[[5 6 7 8]] (1, 4)

python 复制代码

row_r3 = a[[1], :]
print(row_r3, row_r3.shape)

lua 复制代码

[[5 6 7 8]] (1, 4)

python 复制代码

[1:2]

arduino 复制代码

  File "<ipython-input-92-0f96395e9759>", line 1
    [1:2]
      ^
SyntaxError: invalid syntax

Slicing along the second dimension works the same way:

python 复制代码

a

lua 复制代码

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

python 复制代码

col_r1 = a[:, 1]
print(col_r1, col_r1.shape)

scss 复制代码

[ 2  6 10] (3,)

python 复制代码

col_r2 = a[:, 1:2]
print(col_r2, col_r2.shape)

lua 复制代码

[[ 2]
 [ 6]
 [10]] (3, 1)

The next one is more advanced: it lets you pick and combine values more freely, but read it carefully:

python 复制代码

a = np.array([[1,2], [3, 4], [5, 6]])
print(a)

lua 复制代码

[[1 2]
 [3 4]
 [5 6]]

python 复制代码

print(a[[0,1,2], [0,1,0]])
print(a[[0,1,2], [0,1,0]].shape)

scss 复制代码

[1 4 5]
(3,)

python 复制代码

print(np.array([a[0,0], a[1,1], a[2,0]]))

csharp 复制代码

[1 4 5]

python 复制代码

print(a)
print(a[[0,0], [1,0]])
print(np.array([a[0,0], a[0,1]]))

css 复制代码

[[1 2]
 [3 4]
 [5 6]]
[2 1]
[1 2]

Let's practice once more

First create a 2-D array

python 复制代码

a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a)

lua 复制代码

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Build a vector of indices

python 复制代码

b = np.array([0, 2,0,1])

Can you see what the following does?

python 复制代码

a[np.arange(4), b]

scss 复制代码

array([ 1,  6,  7, 11])
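
If it is not obvious yet: row i is paired with column b[i]. The sketch below spells the same selection out with an explicit loop (by_hand is an illustrative name) so the two results can be compared.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
b = np.array([0, 2, 0, 1])    # one column index per row

picked = a[np.arange(4), b]                                 # integer array indexing
by_hand = np.array([a[i, b[i]] for i in range(len(b))])     # the same thing, spelled out

print(picked)
print(by_hand)
```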

Since we can pick these elements out, of course we can also operate on them

python 复制代码

a[np.arange(4), b] += 10
print(a)

lua 复制代码

[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]

One of the more fashionable ways to pick values is with a condition (and it is really handy):

python 复制代码

a = np.array([[1,2], [3, 4], [5, 6]])
print(a)

lua 复制代码

[[1 2]
 [3 4]
 [5 6]]

python 复制代码

bool_index = (a > 2)
print(bool_index)

python 复制代码

[[False False]
 [ True  True]
 [ True  True]]

Use that boolean array as an index to pick out the elements that satisfy the condition

python 复制代码

print(a[bool_index].shape)

scss 复制代码

(4,)

Actually, this can be done in one line, right?

python 复制代码

print(a[a>2])

csharp 复制代码

[3 4 5 6]
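
Boolean masks can also be combined and used on the left-hand side of an assignment. A minimal sketch (the array c and the thresholds are chosen only for illustration):

```python
import numpy as np

c = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
selected = c[(c > 2) & (c % 2 == 0)]
print(selected)

c[c > 4] = 0      # overwrite every element greater than 4 in place
print(c)
```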

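Really, there are many more details and other ways of indexing; have a look at the official documentation.

Let's summarize together with the slicing styles below (matching colors mark the values that get selected):

[Figure: summary of slicing and indexing styles]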

Datatypes

We can use dtype to inspect the type of the elements in a NumPy array:

python 复制代码

x = np.array([1, 2])  # NumPy infers the type itself when building the array
y = np.array([1.0, 2.0])
z = np.array([1, 2], dtype=np.int64)

python 复制代码

print(x.dtype, y.dtype, z.dtype)

go 复制代码

int32 float64 int64

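For more, read the documentation.

Math

The operations below are the ones you will use over and over in scientific computing. Element-wise operations look like this: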

python 复制代码

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

python 复制代码

print(x)
print(y)

lua 复制代码

[[ 1.  2.]
 [ 3.  4.]]
[[ 5.  6.]
 [ 7.  8.]]

python 复制代码

print(x.shape)
print(y.shape)

scss 复制代码

(2, 2)
(2, 2)

There are two ways to write an element-wise sum

python 复制代码

x+y

lua 复制代码

array([[  6.,   8.],
       [ 10.,  12.]])

python 复制代码

np.add(x,y)

lua 复制代码

array([[  6.,   8.],
       [ 10.,  12.]])

Element-wise difference

python 复制代码

x-y

lua 复制代码

array([[-4., -4.],
       [-4., -4.]])

python 复制代码

np.subtract(x,y)

lua 复制代码

array([[-4., -4.],
       [-4., -4.]])

Element-wise product

python 复制代码

x*y

lua 复制代码

array([[  5.,  12.],
       [ 21.,  32.]])

python 复制代码

np.multiply(x,y)

lua 复制代码

array([[  5.,  12.],
       [ 21.,  32.]])

Element-wise division

python 复制代码

x/y

lua 复制代码

array([[ 0.2       ,  0.33333333],
       [ 0.42857143,  0.5       ]])

python 复制代码

np.divide(x, y)

lua 复制代码

array([[ 0.2       ,  0.33333333],
       [ 0.42857143,  0.5       ]])

python 复制代码

np.sqrt(x)

lua 复制代码

array([[ 1.        ,  1.41421356],
       [ 1.73205081,  2.        ]])

That was the element-wise square root!

So what if I want a real matrix multiplication? Don't worry, just write it like this:

matrix multiplication

python 复制代码

v = np.array([9,10])
w = np.array([10,11])
print(v.shape)

scss 复制代码

(2,)

Inner product of two vectors

python 复制代码

v.dot(w)

复制代码

200

python 复制代码

np.dot(v,w)

复制代码

200

Matrix multiplication

python 复制代码

x = np.array([[1,2], [3,4]])
y = np.array([[5,6], [7,8]])
print(x)
print()
print(y)

lua 复制代码

[[1 2]
 [3 4]]

[[5 6]
 [7 8]]

python 复制代码

v

scss 复制代码

array([ 9, 10])

python 复制代码

print(x.dot(v))

csharp 复制代码

[29 67]

python 复制代码

np.dot(x, v)

scss 复制代码

array([29, 67])

python 复制代码

x.dot(y)

lua 复制代码

array([[19, 22],
       [43, 50]])

python 复制代码

np.dot(x,y)

lua 复制代码

array([[19, 22],
       [43, 50]])
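
On Python 3.5 and later, the @ operator is another way to write these products. A short sketch with the same x, y and v as above; note that * remains element-wise.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])

print(x @ y)    # matrix product, same result as x.dot(y) for 2-D arrays
print(x @ v)    # matrix-vector product
print(x * y)    # careful: * is still the element-wise product
```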

Transposing works just like in the math formulas, plain and simple

python 复制代码

x

lua 复制代码

array([[1, 2],
       [3, 4]])

python 复制代码

x.T

lua 复制代码

array([[1, 3],
       [2, 4]])

One thing to note: transposing a 1-D vector gives back the same vector

python 复制代码

v.shape

scss 复制代码

(2,)

python 复制代码

v.T.shape

scss 复制代码

(2,)

A 2-D array is different

python 复制代码

w = np.array([[1,2,3]])
print(w, w.shape)

lua 复制代码

[[1 2 3]] (1, 3)

python 复制代码

print(w.T)

lua 复制代码

[[1]
 [2]
 [3]]
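
So if you need a genuine column or row vector from a 1-D array, you have to add a dimension first. A minimal sketch of two common ways to do that (col and row are illustrative names):

```python
import numpy as np

v = np.array([9, 10])

col = v.reshape(-1, 1)      # shape (2, 1); -1 lets NumPy work out that length
row = v[np.newaxis, :]      # shape (1, 2); np.newaxis inserts an axis of length 1

print(col.shape, row.shape)
print(col.T.shape)          # now the transpose really changes the shape
```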

Using the transpose in a dot product

python 复制代码

arr = np.random.randn(6,3)
arr

css 复制代码

array([[-0.14026475, -0.03781998, -0.27386714],
       [-0.72247734, -1.02956336, -0.25832996],
       [-0.12185501,  1.52968734, -1.02428656],
       [-0.46462044, -0.67144009,  0.14399562],
       [ 0.39160306,  0.26127618,  0.01423764],
       [ 1.27525835,  0.61498609,  0.45733376]])

python 复制代码

print(arr.T.dot(arr))

lua 复制代码

[[ 2.55200533  1.76128844  0.87175679]
 [ 1.76128844  4.29867937 -1.10222404]
 [ 0.87175679 -1.10222404  1.42099216]]

python 复制代码

print(np.dot(arr,arr))

erlang 复制代码

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-158-8f8e98f47ed1> in <module>()
----> 1 print(np.dot(arr,arr))


ValueError: shapes (6,3) and (6,3) not aligned: 3 (dim 1) != 6 (dim 0)

Higher-dimensional tensors can be transposed too

python 复制代码

arr = np.arange(16).reshape(2,2,4)
print(arr, arr.shape)

lua 复制代码

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]] (2, 2, 4)

python 复制代码

print(arr.transpose((1,0,2)))

lua 复制代码

[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[ 4  5  6  7]
  [12 13 14 15]]]

python 复制代码

print(arr.transpose((0,2,1)))

lua 复制代码

[[[ 0  4]
  [ 1  5]
  [ 2  6]
  [ 3  7]]

 [[ 8 12]
  [ 9 13]
  [10 14]
  [11 15]]]

python 复制代码

print(arr.transpose((2,1,0)))

lua 复制代码

[[[ 0  8]
  [ 4 12]]

 [[ 1  9]
  [ 5 13]]

 [[ 2 10]
  [ 6 14]]

 [[ 3 11]
  [ 7 15]]]

python 复制代码

print(arr.swapaxes(1,2))

lua 复制代码

[[[ 0  4]
  [ 1  5]
  [ 2  6]
  [ 3  7]]

 [[ 8 12]
  [ 9 13]
  [10 14]
  [11 15]]]

python 复制代码

x = np.arange(24).reshape(2,3,4)
y = np.arange(8).reshape(4,2)
print(x)
print()
print(y)

lua 复制代码

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

[[0 1]
 [2 3]
 [4 5]
 [6 7]]

python 复制代码

print(np.matmul(x,y).shape)

scss 复制代码

(2, 3, 2)

python 复制代码

print(np.dot(x,y).shape)

scss 复制代码

(2, 3, 2)

python 复制代码

x = np.arange(24).reshape(2,3,4)
y = np.arange(16).reshape(2,4,2)
print(x.dot(y).shape)

scss 复制代码

(2, 3, 2, 2)

python 复制代码

np.matmul(x,y).shape

scss 复制代码

(2, 3, 2)
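
The two shapes differ because np.dot and np.matmul treat the leading dimensions differently once the inputs have more than two dimensions. As I read the NumPy documentation, np.dot pairs every matrix in x with every matrix in y, while np.matmul treats the leading axis as a batch and multiplies matching pairs only. A small check of that reading, using the same x and y:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
y = np.arange(16).reshape(2, 4, 2)

d = np.dot(x, y)       # shape (2, 3, 2, 2): d[i, :, j, :] equals x[i] @ y[j]
m = np.matmul(x, y)    # shape (2, 3, 2):    m[i] equals x[i] @ y[i]

print(d.shape, m.shape)
print(np.array_equal(m[0], d[0, :, 0, :]))
print(np.array_equal(m[1], d[1, :, 1, :]))
```

For batches of matrices, np.matmul (or @) is usually the one you want.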

Guess which operation on matrix elements you will use most in scientific computing? Right, summation, and sum does the job:

python 复制代码

x= np.array([[1,2], [3,4]])
print(x)

lua 复制代码

[[1 2]
 [3 4]]

python 复制代码

print(np.sum(x))
print(x.sum())

复制代码

10
10

python 复制代码

print(np.sum(x, axis=0))

csharp 复制代码

[4 6]

python 复制代码

print(np.sum(x, axis=1))

csharp 复制代码

[3 7]

python 复制代码

print(np.mean(x))
print(np.mean(x, axis=0))
print(np.mean(x, axis=1))

css 复制代码

2.5
[ 2.  3.]
[ 1.5  3.5]

Other operations you might think of, such as the sum, the mean, the cumulative sum and the cumulative product, are all available in NumPy

python 复制代码

print(x.cumsum(axis=0))
print(x.cumsum(axis=1))

lua 复制代码

[[1 2]
 [4 6]]
[[1 3]
 [3 7]]

python 复制代码

print(x.cumprod(axis=0))
print(x.cumprod(axis=1))

lua 复制代码

[[1 2]
 [3 8]]
[[ 1  2]
 [ 3 12]]

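Those are the most basic operations; for more, you will probably need to check the documentation.

Beyond basic arithmetic, we often need other operations as well, such as reshaping, transposing and reordering arrays:

Sorting a 1-D array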

python 复制代码

arr = np.random.randn(8) * 10
print(arr)

css 复制代码

[-15.78997677 -12.10676988   0.51148087  -9.8265767    4.83704543  -2.74087509  11.02576343   3.51866316]

python 复制代码

arr.sort()
print(arr)

css 复制代码

[-15.78997677 -12.10676988  -9.8265767   -2.74087509   0.51148087   3.51866316   4.83704543  11.02576343]

A 2-D array can also be sorted along a chosen axis

python 复制代码

arr = np.random.randn(5,3) * 10
print(arr)

css 复制代码

[[ -1.64638996  -1.2114224    2.15896971]
 [  6.00536939  -7.71788794   6.54291646]
 [  2.30925965   4.72675272  -0.2759866 ]
 [  2.92760387 -11.45580554  -7.84082136]
 [ -5.01085861  -0.96178148  -5.77203831]]

python 复制代码

arr.sort(1)
print(arr)

css 复制代码

[[ -1.64638996  -1.2114224    2.15896971]
 [ -7.71788794   6.00536939   6.54291646]
 [ -0.2759866    2.30925965   4.72675272]
 [-11.45580554  -7.84082136   2.92760387]
 [ -5.77203831  -5.01085861  -0.96178148]]

Here is a small case study: find the value that sits at the 5% position after sorting

python 复制代码

large_arr = np.random.randn(1000)
large_arr.sort()
print(large_arr[int(0.05 * len(large_arr))])

diff 复制代码

-1.71556330464
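
NumPy also has np.percentile for this kind of question. The sketch below compares it with the sort-and-index approach; the fixed seed is my own addition so the run is repeatable. The two numbers are close but usually not identical, because np.percentile interpolates between neighbouring sorted values by default.

```python
import numpy as np

rng = np.random.RandomState(0)       # fixed seed, an assumption for repeatability
large_arr = rng.randn(1000)

sorted_arr = np.sort(large_arr)      # np.sort returns a sorted copy
by_index = sorted_arr[int(0.05 * len(sorted_arr))]
by_percentile = np.percentile(large_arr, 5)

print(by_index, by_percentile)
```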

Broadcasting

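I haven't settled on the best Chinese word for this one, so for now let's just call it "传播" (roughly, "propagation"):

What is it for? Picture this: you want to combine a small array with a large one, and you would like the small array to be reused across the matching blocks of the large one. That is exactly what broadcasting is for.

Here is the task: add a vector element-wise to every row of x, producing y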

python 复制代码

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print(x)
print()
print(v)

css 复制代码

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

[1 0 1]

python 复制代码

y = np.empty_like(x)
print(y)

lua 复制代码

[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]

The brute-force way is to add row by row with a for loop

python 复制代码

for i in range(4):
    y[i,:] = x[i,:] + v
    
print(y)

lua 复制代码

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]

This approach certainly works, but it is not efficient: if x has a great many rows, the Python loop becomes slow:

Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

Thanks to broadcasting, the loop above collapses into a single addition

python 复制代码

print(x.shape, v.shape)
x + v

lua 复制代码

(4, 3) (3,)





array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

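When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimension. Two dimensions are compatible, and the result is broadcast, when:

they are equal, or
one of them is 1 (that dimension is then stretched, conceptually by copying, until the shapes match)

For example, for an addition: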

python 复制代码

Image (3d array):  256 x 256 x 3
Scale (1d array):              3
Result (3d array): 256 x 256 x 3

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5
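
The shape rule can be written down in a few lines of plain Python. The function below is a hypothetical helper for illustration only (it is not part of NumPy); it walks both shapes from the right and applies the "equal, or one of them is 1" test to each pair.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper written for this tutorial; not a NumPy function.
    """
    result = []
    # Walk both shapes from the trailing dimension; missing dims count as 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            # A dimension of length 1 is stretched to match the other one
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError("shapes %s and %s do not broadcast" % (shape_a, shape_b))
    return tuple(reversed(result))


print(broadcast_shape((256, 256, 3), (3,)))
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))
print(broadcast_shape((5, 4), (1,)))
print(broadcast_shape((15, 3, 5), (15, 1, 5)))
```

Calling it with shapes like (3,) and (4,) raises ValueError, which mirrors the error NumPy reports for incompatible operands.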

Here are some broadcasting examples:

Let's work through this use of broadcasting

First reshape v into a 3x1 array (matrix); after that it can be broadcast-added to w:

python 复制代码

v = np.array([1,2,3])
w = np.array([4,5])
print(v.shape, w.shape)

scss 复制代码

(3,) (2,)

python 复制代码

v = v.reshape(3,1)
print(v.shape)

scss 复制代码

(3, 1)

python 复制代码

v + w

lua 复制代码

array([[5, 6],
       [6, 7],
       [7, 8]])

And what if we want to add a vector to every row of a matrix?

python 复制代码

x = np.array([[1,2,3], [4,5,6]])
v = np.array([1,2,3])
print(x + v)

lua 复制代码

[[2 4 6]
 [5 7 9]]

python 复制代码

x = np.array([[1,2,3], [4,5,6]]) # shape 2x3
w = np.array([4,5]) # shape (2,)

python 复制代码

(x.T + w).T

lua 复制代码

array([[ 5,  6,  7],
       [ 9, 10, 11]])

That was needlessly complicated; we can simply do this instead

python 复制代码

x + np.reshape(w, (2,1))

lua 复制代码

array([[ 5,  6,  7],
       [ 9, 10, 11]])

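Broadcasting of course works with any element-wise operation

To sum up broadcasting, see the figure below:

[Figure: diagram summarizing broadcasting]

Logical operations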

python 复制代码

x_arr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y_arr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

python 复制代码

cond = np.array([True, False, True, True, False])

python 复制代码

print(np.where(cond, x_arr, y_arr))

css 复制代码

[ 1.1  2.2  1.3  1.4  2.5]

python 复制代码

arr = np.random.randn(4,4)
print(arr)

lua 复制代码

[[ 0.94534017  0.04955986  0.20184702  0.40272432]
 [-0.20872753 -0.04952711  0.0059752  -1.9356753 ]
 [ 1.20042485 -0.92834374  1.57810979 -0.96378859]
 [-2.09492281 -0.38678213 -0.67656147  1.45629059]]

python 复制代码

arr > 0

python 复制代码

array([[ True,  True,  True,  True],
       [False, False,  True, False],
       [ True, False,  True, False],
       [False, False, False,  True]], dtype=bool)

python 复制代码

print(np.where(arr > 0, 1,-1))

lua 复制代码

[[ 1  1  1  1]
 [-1 -1  1 -1]
 [ 1 -1  1 -1]
 [-1 -1 -1  1]]

python 复制代码

print(np.where(arr > 0, 1,arr))

lua 复制代码

[[ 1.          1.          1.          1.        ]
 [-0.20872753 -0.04952711  1.         -1.9356753 ]
 [ 1.         -0.92834374  1.         -0.96378859]
 [-2.09492281 -0.38678213 -0.67656147  1.        ]]
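
np.where can also be called with the condition alone, in which case it returns the indices where the condition holds instead of building a new array. A minimal sketch on a small hand-written array (the values are made up for illustration):

```python
import numpy as np

arr = np.array([[0.9, -0.2], [-1.5, 0.4]])

rows, cols = np.where(arr > 0)    # condition only: index arrays, one per axis
print(rows, cols)
print(arr[rows, cols])            # the same elements as arr[arr > 0]
```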

Some more advanced ndarray handling

Using reshape to change the shape of a tensor

NumPy makes it easy to turn a 1-D array into a 2-D or 3-D array.

python 复制代码

arr = np.arange(8)
print(arr.shape)

scss 复制代码

(8,)

python 复制代码

arr.reshape(2,4)

lua 复制代码

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

python 复制代码

arr.reshape(2,2,2)

lua 复制代码

array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

python 复制代码

arr = np.arange(15)
print(arr.reshape(5,3).shape)

scss 复制代码

(5, 3)

python 复制代码

print(arr.reshape(5,-1).shape)

scss 复制代码

(5, 3)

If we write -1 for one of the dimensions, NumPy works out the right length for us

You can also take the shape from another ndarray and reshape to it

python 复制代码

other_arr = np.ones((3,5))
print(other_arr.shape) # tuple

scss 复制代码

(3, 5)

python 复制代码

arr.reshape(other_arr.shape)

lua 复制代码

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

A multi-dimensional array can be flattened with ravel

python 复制代码

arr = arr.reshape(other_arr.shape)
print(arr)

lua 复制代码

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

python 复制代码

arr.ravel().shape

scss 复制代码

(15,)
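
ravel has a close cousin, flatten. The difference that matters in practice, as far as I know, is that ravel returns a view when the memory layout allows it, whereas flatten always returns a copy. A small sketch that checks this on a contiguous array (flat_view and flat_copy are illustrative names):

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

flat_view = arr.ravel()      # a view here, because arr is contiguous in memory
flat_copy = arr.flatten()    # always a copy

flat_view[0] = 100           # shows up in arr because the data is shared
flat_copy[1] = -1            # arr is unaffected

print(arr[0, :2])
print(np.shares_memory(arr, flat_view), np.shares_memory(arr, flat_copy))
```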

Concatenating two 2-D arrays

python 复制代码

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(arr1, "\n\n", arr2)

lua 复制代码

[[1 2 3]
 [4 5 6]] 

 [[ 7  8  9]
 [10 11 12]]

python 复制代码

np.concatenate([arr1, arr2], axis=0)

lua 复制代码

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

python 复制代码

np.concatenate([arr1, arr2], axis=1)

lua 复制代码

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

Stacking: think of stacking plates... It is just another way to express concatenation, as a vertical stack or a horizontal stack

python 复制代码

np.vstack((arr1, arr2)) # vertical

lua 复制代码

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

python 复制代码

np.hstack((arr1, arr2)) # horizontal

lua 复制代码

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

Splitting an array

python 复制代码

arr = np.random.rand(5,5)
print(arr)

css 复制代码

[[ 0.80642843  0.43253953  0.24511404  0.21328645  0.50991311]
 [ 0.19373378  0.72169396  0.05192132  0.1746048   0.69771988]
 [ 0.59689743  0.82253158  0.03346062  0.9002945   0.03960687]
 [ 0.06061257  0.27390675  0.19740262  0.76815388  0.02035703]
 [ 0.58031701  0.63341072  0.75286027  0.82066801  0.24301514]]

python 复制代码

first, second, third = np.split(arr, [1,3], axis=0)
print(first, '\n\n', second, '\n\n', third)

lua 复制代码

[[ 0.80642843  0.43253953  0.24511404  0.21328645  0.50991311]] 

 [[ 0.19373378  0.72169396  0.05192132  0.1746048   0.69771988]
 [ 0.59689743  0.82253158  0.03346062  0.9002945   0.03960687]] 

 [[ 0.06061257  0.27390675  0.19740262  0.76815388  0.02035703]
 [ 0.58031701  0.63341072  0.75286027  0.82066801  0.24301514]]

Stacking helpers

python 复制代码

arr = np.arange(6)
arr1 = arr.reshape((3, 2))
arr2 = np.random.randn(3, 2)
print(arr1)
print(arr2)

lua 复制代码

[[0 1]
 [2 3]
 [4 5]]
[[-0.60392123 -0.1769936 ]
 [ 0.46523138  0.71963034]
 [-0.51733042  1.50108329]]

r_ stacks along rows

python 复制代码

print(np.r_[arr1, arr2])
print()

css 复制代码

[[ 0.          1.        ]
 [ 2.          3.        ]
 [ 4.          5.        ]
 [-0.60392123 -0.1769936 ]
 [ 0.46523138  0.71963034]
 [-0.51733042  1.50108329]]

c_ stacks along columns

python 复制代码

print(np.c_[np.r_[arr1, arr2], arr])
print()

css 复制代码

[[ 0.          1.          0.        ]
 [ 2.          3.          1.        ]
 [ 4.          5.          2.        ]
 [-0.60392123 -0.1769936   3.        ]
 [ 0.46523138  0.71963034  4.        ]
 [-0.51733042  1.50108329  5.        ]]

Slices are turned directly into arrays

python 复制代码

np.c_[1:6, -5:0]

css 复制代码

array([[ 1, -5],
       [ 2, -4],
       [ 3, -3],
       [ 4, -2],
       [ 5, -1]])

Using repeat to repeat elements

python 复制代码

arr = np.arange(3)
print(arr)

csharp 复制代码

[0 1 2]

python 复制代码

print(arr.repeat(3))

csharp 复制代码

[0 0 0 1 1 1 2 2 2]

python 复制代码

print(arr.repeat([2,3,5]))

csharp 复制代码

[0 0 1 1 1 2 2 2 2 2]

That was element-by-element repetition

Repeating along a specified axis

python 复制代码

arr = np.random.rand(2,2)
print(arr)

lua 复制代码

[[ 0.9682022   0.99265567]
 [ 0.62174828  0.12614083]]

python 复制代码

print(arr.repeat(2, axis=0))

lua 复制代码

[[ 0.9682022   0.99265567]
 [ 0.9682022   0.99265567]
 [ 0.62174828  0.12614083]
 [ 0.62174828  0.12614083]]

python 复制代码

print(arr.repeat(2, axis=1))

lua 复制代码

[[ 0.9682022   0.9682022   0.99265567  0.99265567]
 [ 0.62174828  0.62174828  0.12614083  0.12614083]]

Tile: think of laying tiles (see the numpy tile documentation)

python 复制代码

print(arr)
print()
print(np.tile(arr, 2))

lua 复制代码

[[ 0.9682022   0.99265567]
 [ 0.62174828  0.12614083]]

[[ 0.9682022   0.99265567  0.9682022   0.99265567]
 [ 0.62174828  0.12614083  0.62174828  0.12614083]]

python 复制代码

print(np.tile(arr, (2,3)))

lua 复制代码

[[ 0.9682022   0.99265567  0.9682022   0.99265567  0.9682022   0.99265567]
 [ 0.62174828  0.12614083  0.62174828  0.12614083  0.62174828  0.12614083]
 [ 0.9682022   0.99265567  0.9682022   0.99265567  0.9682022   0.99265567]
 [ 0.62174828  0.12614083  0.62174828  0.12614083  0.62174828  0.12614083]]

File input and output with NumPy

Reading a CSV file into an array

python 复制代码

arr = np.loadtxt('array_ex.txt', delimiter=',')
print(arr)

css 复制代码

[[ 0.580052  0.18673   1.040717  1.134411]
 [ 0.194163 -0.636917 -0.938659  0.124094]
 [-0.12641   0.268607 -0.695724  0.047428]
 [-1.484413  0.004176 -0.744203  0.005487]
 [ 2.302869  0.200131  1.670238 -1.88109 ]
 [-0.19323   1.047233  0.482803  0.960334]]
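
The file array_ex.txt is not included here, so as a self-contained sketch the example below first writes a small comma-separated file with np.savetxt and then reads it back with np.loadtxt. The file name demo.csv and the temporary directory are assumptions made for the illustration.

```python
import os
import tempfile

import numpy as np

data = np.array([[0.5, 1.25], [-2.0, 3.75]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.csv")      # hypothetical file name
    np.savetxt(path, data, delimiter=",")         # write comma-separated text
    loaded = np.loadtxt(path, delimiter=",")      # read it back as a float array

print(loaded)
```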

Saving and loading array files

python 复制代码

arr = np.arange(50).reshape(2,5,5)
print(arr)

css 复制代码

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 49]]]

python 复制代码

np.save('some_array', arr)

python 复制代码

arr2 = np.load('some_array.npy')
print(arr2)

css 复制代码

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 49]]]

python 复制代码

arr3 = np.arange(15).reshape(3,5)
np.savez("array_archive.npz", arr=arr, b=arr2, c=arr3)

python 复制代码

arch = np.load('array_archive.npz')
print(arch['arr'])

css 复制代码

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 49]]]

python 复制代码

arch['b']

css 复制代码

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]]])

python 复制代码

arch['c']

lua 复制代码

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

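Several arrays can be stored together in one .npz archive. Note that np.savez stores them uncompressed; np.savez_compressed is the variant that actually compresses the data.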
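
For completeness, here is a minimal sketch of saving several arrays into one compressed archive with np.savez_compressed and reading one of them back. The archive name and the temporary directory are assumptions for the example; the keyword names become the keys inside the archive.

```python
import os
import tempfile

import numpy as np

arr = np.arange(50).reshape(2, 5, 5)
arr3 = np.arange(15).reshape(3, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_archive.npz")    # hypothetical file name
    np.savez_compressed(path, arr=arr, c=arr3)          # same keywords as np.savez
    with np.load(path) as arch:                         # closes the file on exit
        names = sorted(arch.files)                      # keys stored in the archive
        restored = arch["c"]

print(names)
print(restored.shape)
```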

In-class mini project

Write a softmax with NumPy

What is softmax?

Compute the exponential of every element
Sum each row
Divide every row by its sum

python 复制代码

m = np.random.rand(10,10) * 10 + 1000
print(m)

yaml 复制代码

[[ 1008.64304012  1001.25079229  1006.81896868  1005.89015258  1008.8915297
   1001.84923866  1005.53509734  1005.34075305  1008.93404709
   1006.94897664]
 [ 1003.24267825  1003.72710741  1000.28354398  1000.32012105  1004.3690361
   1007.18390602  1002.49741606  1005.83510332  1009.19678396
   1002.32098566]
 [ 1002.32824002  1006.2813999   1009.27645662  1002.57259159
   1006.30743627  1000.35201323  1003.94430099  1008.79056869
   1007.40485841  1006.38239542]
 [ 1007.06228714  1006.01325352  1007.96901864  1002.34269542
   1000.75563221  1005.26357317  1006.14861174  1005.68119044
   1000.69006453  1007.21834125]
 [ 1004.15770428  1003.0554848   1005.55619032  1003.04000025
   1005.54338468  1002.23952638  1008.86317857  1006.96983789
   1005.84232318  1009.28833837]
 [ 1008.47151667  1006.30354927  1006.69274016  1004.12418543
   1007.17550972  1004.31758292  1007.27760499  1007.45250445
   1000.02943239  1002.25886446]
 [ 1000.63764781  1003.39894276  1008.26298759  1001.89295012
   1007.85388369  1004.67565255  1004.58872708  1003.24488815
   1000.39528914  1007.20964465]
 [ 1005.21815308  1007.42651355  1006.32407717  1003.0096329   1005.03545902
   1008.85925437  1009.57634418  1003.74546024  1003.40512867  1004.4437606 ]
 [ 1001.78786625  1008.73282377  1003.98906267  1008.17533941
   1002.79957584  1000.89332666  1007.64343999  1003.88248211
   1005.75517566  1008.27556001]
 [ 1002.05916059  1007.25663392  1009.48655775  1009.56831564
   1008.28488062  1004.92593854  1008.0468565   1007.53278621
   1001.94935121  1007.01473574]]

python 复制代码

# np.exp(m)
m_row_max = m.max(axis=1).reshape(10,1)
print(m_row_max, m_row_max.shape)

scss 复制代码

[[ 1008.93404709]
 [ 1009.19678396]
 [ 1009.27645662]
 [ 1007.96901864]
 [ 1009.28833837]
 [ 1008.47151667]
 [ 1008.26298759]
 [ 1009.57634418]
 [ 1008.73282377]
 [ 1009.56831564]] (10, 1)

python 复制代码

m = m - m_row_max
print(m)

css 复制代码

[[-0.29100696 -7.6832548  -2.11507841 -3.04389451 -0.04251738 -7.08480843  -3.39894975 -3.59329403  0.         -1.98507045]
 [-5.95410571 -5.46967655 -8.91323998 -8.87666291 -4.82774786 -2.01287794  -6.6993679  -3.36168065  0.         -6.8757983 ]
 [-6.9482166  -2.99505673  0.         -6.70386503 -2.96902035 -8.9244434  -5.33215563 -0.48588793 -1.87159821 -2.8940612 ]
 [-0.9067315  -1.95576512  0.         -5.62632322 -7.21338643 -2.70544547  -1.82040689 -2.2878282  -7.27895411 -0.75067739]
 [-5.13063409 -6.23285357 -3.73214805 -6.24833812 -3.74495369 -7.04881199  -0.4251598  -2.31850048 -3.44601518  0.        ]
 [ 0.         -2.16796739 -1.77877651 -4.34733123 -1.29600694 -4.15393374  -1.19391168 -1.01901222 -8.44208427 -6.21265221]
 [-7.62533977 -4.86404483  0.         -6.37003747 -0.4091039  -3.58733504  -3.67426051 -5.01809944 -7.86769844 -1.05334294]
 [-4.3581911  -2.14983063 -3.25226701 -6.56671127 -4.54088515 -0.71708981   0.         -5.83088393 -6.1712155  -5.13258358]
 [-6.94495753  0.         -4.7437611  -0.55748437 -5.93324793 -7.83949711  -1.08938379 -4.85034166 -2.97764811 -0.45726376]
 [-7.50915505 -2.31168172 -0.08175789  0.         -1.28343501 -4.6423771  -1.52145914 -2.03552943 -7.61896443 -2.5535799 ]]

python 复制代码

m_exp = np.exp(m)
print(m_exp, m_exp.shape)

scss 复制代码

[[  7.47510474e-01   4.60473707e-04   1.20623832e-01   4.76489585e-02    9.58373807e-01   8.37735258e-04   3.34083387e-02   2.75075701e-02    1.00000000e+00   1.37370936e-01]
 [  2.59516363e-03   4.21259451e-03   1.34595041e-04   1.39609277e-04    8.00452828e-03   1.33603617e-01   1.23169021e-03   3.46769303e-02    1.00000000e+00   1.03247308e-03]
 [  9.60346309e-04   5.00337887e-02   1.00000000e+00   1.22616357e-03    5.13535942e-02   1.33095533e-04   4.83363924e-03   6.15150742e-01    1.53877536e-01   5.53509641e-02]
 [  4.03842028e-01   1.41456204e-01   1.00000000e+00   3.60179403e-03    7.36658284e-04   6.68405420e-02   1.61959837e-01   1.01486632e-01    6.89906753e-04   4.72046684e-01]
 [  5.91281003e-03   1.96383995e-03   2.39413532e-02   1.93366498e-03    2.36367236e-02   8.68440058e-04   6.53665320e-01   9.84210597e-02    3.18723893e-02   1.00000000e+00]
 [  1.00000000e+00   1.14409931e-01   1.68844601e-01   1.29413038e-02    2.73622203e-01   1.57025251e-02   3.03033573e-01   3.60951306e-01    2.15600312e-04   2.00391561e-03]
 [  4.87929429e-04   7.71919784e-03   1.00000000e+00   1.71209508e-03    6.64245216e-01   2.76719769e-02   2.53681582e-02   6.61709096e-03    3.82914649e-04   3.48769883e-01]
 [  1.28015234e-02   1.16503889e-01   3.86864061e-02   1.40641513e-03    1.06639631e-02   4.88170860e-01   1.00000000e+00   2.93548106e-03    2.08869564e-03   5.90129429e-03]
 [  9.63481254e-04   1.00000000e+00   8.70584098e-03   5.72647825e-01    2.64986143e-03   3.93867063e-04   3.36423738e-01   7.82570334e-03    5.09124334e-02   6.33013352e-01]
 [  5.48043962e-04   9.90944624e-02   9.21495038e-01   1.00000000e+00    2.77083877e-01   9.63476759e-03   2.18392989e-01   1.30611314e-01    4.91050084e-04   7.78026413e-02]] (10, 10)

python 复制代码

m_exp_row_sum = m_exp.sum(axis=1).reshape(10,1)
print(m_exp_row_sum, m_exp_row_sum.shape)

scss 复制代码

[[ 3.07374213]
 [ 1.1856312 ]
 [ 1.93291987]
 [ 2.35266029]
 [ 1.8422156 ]
 [ 2.25172496]
 [ 2.08297446]
 [ 1.67915853]
 [ 2.6135361 ]
 [ 2.73515418]] (10, 1)

python 复制代码

m_softmax = m_exp / m_exp_row_sum
print(m_softmax)

css 复制代码

[[  2.43192319e-01   1.49808829e-04   3.92433154e-02   1.55019376e-02    3.11793823e-01   2.72545719e-04   1.08689465e-02   8.94921207e-03    3.25336336e-01   4.46917571e-02]
 [  2.18884559e-03   3.55303952e-03   1.13521845e-04   1.17751015e-04    6.75128005e-03   1.12685645e-01   1.03884767e-03   2.92476533e-02    8.43432594e-01   8.70821452e-04]
 [  4.96837103e-04   2.58850817e-02   5.17352020e-01   6.34358200e-04    2.65678857e-02   6.88572427e-05   2.50069303e-03   3.18249479e-01    7.96088542e-02   2.86359331e-02]
 [  1.71653353e-01   6.01260646e-02   4.25050742e-01   1.53094522e-03    3.13117150e-04   2.84106220e-02   6.88411489e-02   4.31369681e-02    2.93245377e-04   2.00643793e-01]
 [  3.20961891e-03   1.06602069e-03   1.29959562e-02   1.04964098e-03    1.28305957e-02   4.71410652e-04   3.54825635e-01   5.34253752e-02    1.73011179e-02   5.42824629e-01]
 [  4.44103973e-01   5.08099049e-02   7.49845582e-02   5.74728445e-03    1.21516707e-01   6.97355379e-03   1.34578414e-01   1.60299909e-01    9.57489550e-05   8.89946885e-04]
 [  2.34246477e-04   3.70585333e-03   4.80082698e-01   8.21947224e-04    3.18892636e-01   1.32848374e-02   1.21788138e-02   3.17675088e-03    1.83830698e-04   1.67438386e-01]
 [  7.62377296e-03   6.93823048e-02   2.30391624e-02   8.37571382e-04    6.35077805e-03   2.90723509e-01   5.95536385e-01   1.74818578e-03    1.24389425e-03   3.51443547e-03]
 [  3.68650448e-04   3.82623373e-01   3.33105824e-03   2.19108442e-01    1.01389892e-03   1.50702744e-04   1.28723586e-01   2.99429701e-03    1.94802870e-02   2.42205704e-01]
 [  2.00370409e-04   3.62299365e-02   3.36907895e-01   3.65610102e-01    1.01304664e-01   3.52256836e-03   7.98466830e-02   4.77528160e-02    1.79532871e-04   2.84454316e-02]]

python 复制代码

print(m_softmax.sum(axis=1))

css 复制代码

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
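
The steps above can be folded into one small function. This is only a sketch under the same assumptions as the walkthrough (a 2-D input, softmax taken over each row); the name softmax_rows is made up, and keepdims=True stands in for the reshape(10,1) calls so it works for any number of rows.

```python
import numpy as np


def softmax_rows(m):
    """Row-wise softmax of a 2-D array (hypothetical helper name)."""
    shifted = m - m.max(axis=1, keepdims=True)     # subtract each row's max to avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)    # each row now sums to 1


m = np.random.rand(10, 10) * 10 + 1000
probs = softmax_rows(m)
print(probs.sum(axis=1))
```

Subtracting the row maximum does not change the result mathematically, but it keeps np.exp from overflowing on inputs around 1000.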

For more NumPy details and usage, see the official NumPy guide
