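NumPy Basics
Introduction to NumPy
- NumPy is a library for the Python language (numpy)
- NumPy mainly supports matrix (array) operations and computation
- NumPy is very efficient; its core is written in C
- pandas, which we cover in lesson 3, is also a library built on top of NumPy
- Today's popular machine learning frameworks (for example TensorFlow, PyTorch, and so on) have syntax quite close to NumPy's
Contents
- Introduction to arrays and array construction (ndarray)
- Array indexing and assignment
- Math operations
- Broadcasting
- Logical operations
- Advanced array operations
- File input and output
- In-class mini project: writing a softmax with NumPy
In Python you load a package with import, right? So we import the numpy package: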
python
import numpy as np
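Arrays
How you think of an array depends on its number of dimensions. I keep it simple: I usually treat a 1-D array as a vector, a 2-D array as a matrix, a 3-D array as a 3-D tensor, and so on.
You can call np.array to initialize an array from a list: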
python
a = np.array([1, 2, 3])
print(a)
csharp
[1 2 3]
python
print(type(a))
arduino
<class 'numpy.ndarray'>
python
type([1,2,3])
list
python
a[2]
3
python
a[0] = 5
python
print(a)
csharp
[5 2 3]
python
b = np.array([[1,2,3], [2,3,4]])
print(b)
lua
[[1 2 3]
[2 3 4]]
python
print(type(b))
arduino
<class 'numpy.ndarray'>
python
print(b.shape)
scss
(2, 3)
python
print(b[0,2])
3
ndarray = n dimensional array
There are also built-in functions for creating arrays:
python
a = np.zeros((2,3))
print(a)
lua
[[ 0. 0. 0.]
[ 0. 0. 0.]]
python
b = np.ones((1,2))
print(b)
lua
[[ 1. 1.]]
python
c = np.full((2,2), 8)
print(c)
lua
[[8 8]
[8 8]]
python
d = np.eye(3)
print(d)
lua
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
python
e = np.random.random((3,2))
print(e)
lua
[[ 0.34222445 0.254503 ]
[ 0.07192044 0.39303621]
[ 0.64905403 0.77977616]]
python
f = np.empty((2,3,4))
print(f)
lua
[[[ 8.09661985e-312 7.21335843e-322 0.00000000e+000 0.00000000e+000]
[ 1.29061414e-306 1.16095484e-028 5.28595592e-085 1.01445441e+242]
[ 2.93623251e-062 1.75300433e+243 4.25136003e-096 1.04716878e-142]]
[[ 4.54813897e-144 5.83001600e+199 7.48468178e+251 5.04621362e+180]
[ 7.49779533e+247 3.88625532e+285 2.02647441e+267 1.16317829e-028]
[ 8.76739488e+252 1.14011198e+243 5.54175224e+257 8.34402698e-308]]]
python
print(f.shape)
scss
(2, 3, 4)
python
g = np.arange(15)
print(g)
css
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
python
print(g.shape)
scss
(15,)
python
type((2,3))
arduino
tuple
Arrays can have different data types
python
arr = np.array([1,2,3])
print(arr.dtype)
go
int32
python
arr = np.array([1,2,3], dtype=np.float64)
print(arr.dtype)
go
float64
python
arr
scss
array([ 1., 2., 3.])
python
arr = np.array([1,2,3], dtype=np.int64)
print(arr.dtype)
go
int64
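You can specify the data type when creating an array; if you don't, NumPy picks a suitable type automatically.
Use astype to copy an array and convert its data type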
python
int_arr = np.array([1,2,3,4,5])
print(int_arr, int_arr.dtype)
csharp
[1 2 3 4 5] int32
python
float_arr = int_arr.astype(np.float64)
print(float_arr.dtype, float_arr)
css
float64 [ 1. 2. 3. 4. 5.]
When astype converts float to int, the fractional part is discarded (values are truncated toward zero)
python
float_arr = np.array([3.5,2.3,4.8,-2.2])
print(float_arr)
css
[ 3.5 2.3 4.8 -2.2]
python
int_arr = float_arr.astype(np.int64)
print(int_arr, int_arr.dtype)
css
[ 3 2 4 -2] int64
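The truncation above is easy to confuse with rounding. The following is a minimal sketch, not part of the original notes, that contrasts astype with np.floor and np.rint on the same values; the variable names are only illustrative.

```python
import numpy as np

vals = np.array([3.5, 2.3, 4.8, -2.2])

# astype drops the fractional part, i.e. it truncates toward zero.
truncated = vals.astype(np.int64)
# np.floor rounds toward negative infinity, so -2.2 becomes -3.
floored = np.floor(vals).astype(np.int64)
# np.rint rounds to the nearest integer (ties go to the even neighbor).
nearest = np.rint(vals).astype(np.int64)

print(truncated)
print(floored)
print(nearest)
```

If you need rounding rather than truncation, round explicitly before converting.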
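Use astype to convert an array of strings to numbers; if the conversion fails, an exception is raised.
python
str_arr = np.array(['1.24', '2.2', '5.8', 'asas'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the same type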
str_arr
php
array([b'1.24', b'2.2', b'5.8', b'asas'],
dtype='|S4')
python
float_arr = str_arr.astype(dtype=np.float64)  # the np.float alias was removed in NumPy 1.24
print(float_arr)
sql
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-61-1d45899cf6ad> in <module>()
----> 1 float_arr = str_arr.astype(dtype=np.float64)
2 print(float_arr)
ValueError: could not convert string to float: 'asas'
astype also accepts another array's dtype as its argument
python
int_arr = np.arange(10)
float_arr = np.array([2.3,4.6,9.8])
print(float_arr.dtype, int_arr.dtype)
go
float64 int32
python
int_arr.astype(dtype=float_arr.dtype)
scss
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
Array indexing: reading and assigning values
NumPy provides quite a few ways to index into an array.
python
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)
print(a.shape)
lua
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
(3, 4)
You can slice like a list (a multi-dimensional array can be sliced along several dimensions at once):
python
b = a[0:2,2:4].copy()
print(b)
lua
[[3 4]
[7 8]]
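A plain slice is a view onto the original array, so modifying the slice would also modify the original; that is allowed but not a recommended way to assign. Here we called .copy(), so b is an independent array and changing it leaves a untouched.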
python
b[0,0] = 111111
print(b)
print(a)
lua
[[111111 4]
[ 7 8]]
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
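To make the view-versus-copy distinction concrete, here is a small sketch (the names view and copy are illustrative, not from the original notes). It assumes only that basic slicing returns a view, which np.shares_memory lets us confirm.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[0:2, 2:4]          # basic slicing returns a view onto a
copy = a[0:2, 2:4].copy()   # .copy() makes an independent array

copy[0, 0] = 111111         # does not touch a
view[1, 1] = -8             # writes through to a[1, 3]

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```

Only the write through view shows up in a, which is why .copy() is the safe choice when you want to modify a slice independently.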
With the 3x4 two-dimensional array (matrix) created above, go ahead and pick out whatever values you want:
python
row_r1 = a[1,:]
print(row_r1, row_r1.shape)
scss
[5 6 7 8] (4,)
python
row_r2 = a[1:2, :]
print(row_r2, row_r2.shape)
lua
[[5 6 7 8]] (1, 4)
python
row_r3 = a[[1], :]
print(row_r3, row_r3.shape)
lua
[[5 6 7 8]] (1, 4)
python
[1:2]  # a bare slice outside indexing brackets is not valid syntax
arduino
File "<ipython-input-92-0f96395e9759>", line 1
[1:2]
^
SyntaxError: invalid syntax
Slicing along the second dimension works the same way:
python
a
lua
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
python
col_r1 = a[:, 1]
print(col_r1, col_r1.shape)
scss
[ 2 6 10] (3,)
python
col_r2 = a[:, 1:2]
print(col_r2, col_r2.shape)
lua
[[ 2]
[ 6]
[10]] (3, 1)
The next one is more advanced: you can pick and combine elements more freely, but read it carefully:
python
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)
lua
[[1 2]
[3 4]
[5 6]]
python
print(a[[0,1,2], [0,1,0]])
print(a[[0,1,2], [0,1,0]].shape)
scss
[1 4 5]
(3,)
python
print(np.array([a[0,0], a[1,1], a[2,0]]))
csharp
[1 4 5]
python
print(a)
print(a[[0,0], [1,0]])
print(np.array([a[0,0], a[0,1]]))
css
[[1 2]
[3 4]
[5 6]]
[2 1]
[1 2]
Let's practice once more
First create a 2-D array
python
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a)
lua
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Build a vector of column indices
python
b = np.array([0, 2,0,1])
Can you see what the following does?
python
a[np.arange(4), b]
scss
array([ 1, 6, 7, 11])
Since we can select these elements, we can of course operate on them too
python
a[np.arange(4), b] += 10
print(a)
lua
[[11 2 3]
[ 4 5 16]
[17 8 9]
[10 21 12]]
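The a[np.arange(n), b] pattern picks one element per row. As a hedged illustration of where it tends to show up, the sketch below uses made-up scores and labels arrays (both hypothetical, not from the original notes) to pick one score per row and to build a one-hot matrix with the same index pair.

```python
import numpy as np

# Hypothetical per-row scores and one chosen column index per row.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])
labels = np.array([1, 0, 2])

rows = np.arange(scores.shape[0])
picked = scores[rows, labels]      # element (i, labels[i]) from each row i

one_hot = np.zeros_like(scores)
one_hot[rows, labels] = 1.0        # assignment through the same index pair

print(picked)
print(one_hot)
```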
A rather fashionable (and very handy) way to select elements is to index with a condition:
python
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)
lua
[[1 2]
[3 4]
[5 6]]
python
bool_index = (a > 2)
print(bool_index)
python
[[False False]
[ True True]
[ True True]]
Using that boolean array as an index selects the elements that satisfy the condition
python
print(a[bool_index].shape)
scss
(4,)
It can also be done in a single line, right?
python
print(a[a>2])
csharp
[3 4 5 6]
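Boolean masks can also be combined and used for assignment. The following is a minimal sketch under the same setup; the mask condition is an arbitrary example chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & and | (not `and`/`or`), each wrapped in parentheses.
mask = (a > 2) & (a % 2 == 0)
selected = a[mask]      # 1-D array of the matching elements

b = a.copy()
b[mask] = 0             # assign to every position where the mask is True

print(selected)
print(b)
```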
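Honestly, there are many more details and other ways to index; have a look at the official documentation.
To summarize, the figure below illustrates these slicing patterns (the colored cells are the elements selected):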
Datatypes
We can use dtype to inspect the element type of a NumPy array:
python
x = np.array([1, 2]) # NumPy infers the dtype itself when building the array
y = np.array([1.0, 2.0])
z = np.array([1, 2], dtype=np.int64)
python
print(x.dtype, y.dtype, z.dtype)
go
int32 float64 int64
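See the documentation for more.
Math operations
These are the operations you will use over and over in scientific computing. Element-wise operations, for example, look like this: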
python
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
python
print(x)
print(y)
lua
[[ 1. 2.]
[ 3. 4.]]
[[ 5. 6.]
[ 7. 8.]]
python
print(x.shape)
print(y.shape)
scss
(2, 2)
(2, 2)
Element-wise addition can be written in two ways
python
x+y
lua
array([[ 6., 8.],
[ 10., 12.]])
python
np.add(x,y)
lua
array([[ 6., 8.],
[ 10., 12.]])
Element-wise subtraction
python
x-y
lua
array([[-4., -4.],
[-4., -4.]])
python
np.subtract(x,y)
lua
array([[-4., -4.],
[-4., -4.]])
Element-wise multiplication
python
x*y
lua
array([[ 5., 12.],
[ 21., 32.]])
python
np.multiply(x,y)
lua
array([[ 5., 12.],
[ 21., 32.]])
Element-wise division
python
x/y
lua
array([[ 0.2 , 0.33333333],
[ 0.42857143, 0.5 ]])
python
np.divide(x, y)
lua
array([[ 0.2 , 0.33333333],
[ 0.42857143, 0.5 ]])
python
np.sqrt(x)
lua
array([[ 1. , 1.41421356],
[ 1.73205081, 2. ]])
The call above takes the square root element-wise.
So what if I want matrix multiplication? Don't worry, just write it like this:
python
v = np.array([9,10])
w = np.array([10,11])
print(v.shape)
scss
(2,)
Inner product of two vectors
python
v.dot(w)
200
python
np.dot(v,w)
200
Matrix multiplication
python
x = np.array([[1,2], [3,4]])
y = np.array([[5,6], [7,8]])
print(x)
print()
print(y)
lua
[[1 2]
[3 4]]
[[5 6]
[7 8]]
python
v
scss
array([ 9, 10])
python
print(x.dot(v))
csharp
[29 67]
python
np.dot(x, v)
scss
array([29, 67])
python
x.dot(y)
lua
array([[19, 22],
[43, 50]])
python
np.dot(x,y)
lua
array([[19, 22],
[43, 50]])
Transposition works just like in the math notation, plain and simple
python
x
lua
array([[1, 2],
[3, 4]])
python
x.T
lua
array([[1, 3],
[2, 4]])
Note that transposing a 1-D vector gives back the same vector
python
v.shape
scss
(2,)
python
v.T.shape
scss
(2,)
A 2-D array is different
python
w = np.array([[1,2,3]])
print(w, w.shape)
lua
[[1 2 3]] (1, 3)
python
print(w.T)
lua
[[1]
[2]
[3]]
Using the transpose in a dot product
python
arr = np.random.randn(6,3)
arr
css
array([[-0.14026475, -0.03781998, -0.27386714],
[-0.72247734, -1.02956336, -0.25832996],
[-0.12185501, 1.52968734, -1.02428656],
[-0.46462044, -0.67144009, 0.14399562],
[ 0.39160306, 0.26127618, 0.01423764],
[ 1.27525835, 0.61498609, 0.45733376]])
python
print(arr.T.dot(arr))
lua
[[ 2.55200533 1.76128844 0.87175679]
[ 1.76128844 4.29867937 -1.10222404]
[ 0.87175679 -1.10222404 1.42099216]]
python
print(np.dot(arr,arr))
erlang
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-158-8f8e98f47ed1> in <module>()
----> 1 print(np.dot(arr,arr))
ValueError: shapes (6,3) and (6,3) not aligned: 3 (dim 1) != 6 (dim 0)
Higher-dimensional tensors can be transposed too
python
arr = np.arange(16).reshape(2,2,4)
print(arr, arr.shape)
lua
[[[ 0 1 2 3]
[ 4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]] (2, 2, 4)
python
print(arr.transpose((1,0,2)))
lua
[[[ 0 1 2 3]
[ 8 9 10 11]]
[[ 4 5 6 7]
[12 13 14 15]]]
python
print(arr.transpose((0,2,1)))
lua
[[[ 0 4]
[ 1 5]
[ 2 6]
[ 3 7]]
[[ 8 12]
[ 9 13]
[10 14]
[11 15]]]
python
print(arr.transpose((2,1,0)))
lua
[[[ 0 8]
[ 4 12]]
[[ 1 9]
[ 5 13]]
[[ 2 10]
[ 6 14]]
[[ 3 11]
[ 7 15]]]
python
print(arr.swapaxes(1,2))
lua
[[[ 0 4]
[ 1 5]
[ 2 6]
[ 3 7]]
[[ 8 12]
[ 9 13]
[10 14]
[11 15]]]
python
x = np.arange(24).reshape(2,3,4)
y = np.arange(8).reshape(4,2)
print(x)
print()
print(y)
lua
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
[[0 1]
[2 3]
[4 5]
[6 7]]
python
print(np.matmul(x,y).shape)
scss
(2, 3, 2)
python
print(np.dot(x,y).shape)
scss
(2, 3, 2)
python
x = np.arange(24).reshape(2,3,4)
y = np.arange(16).reshape(2,4,2)
print(x.dot(y).shape)
scss
(2, 3, 2, 2)
python
np.matmul(x,y).shape
scss
(2, 3, 2)
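The shapes above differ because np.matmul treats the leading dimension as a batch, while np.dot pairs every 2-D slice of x with every 2-D slice of y. The sketch below is an illustration, not part of the original notes; it checks both readings using the @ operator (available since Python 3.5), which is equivalent to np.matmul.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
y = np.arange(16).reshape(2, 4, 2)

# matmul / @ multiplies x[i] with y[i] for each batch index i.
batched = x @ y
manual = np.stack([x[i] @ y[i] for i in range(x.shape[0])])

# dot sums over the last axis of x and the second-to-last axis of y:
# dotted[i, j, k, m] = sum(x[i, j, :] * y[k, :, m])
dotted = np.dot(x, y)

print(batched.shape, dotted.shape)
print(np.array_equal(batched, manual))
print(np.array_equal(dotted[1, :, 1, :], batched[1]))
```

For batches of matrices, matmul or @ is usually the one you want; the "diagonal" blocks of the np.dot result contain the same products.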
Guess which operation over array elements you will use most often in scientific computing? Right: summation, which sum takes care of:
python
x= np.array([[1,2], [3,4]])
print(x)
lua
[[1 2]
[3 4]]
python
print(np.sum(x))
print(x.sum())
10
10
python
print(np.sum(x, axis=0))
csharp
[4 6]
python
print(np.sum(x, axis=1))
csharp
[3 7]
python
print(np.mean(x))
print(np.mean(x, axis=0))
print(np.mean(x, axis=1))
css
2.5
[ 2. 3.]
[ 1.5 3.5]
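A reduction along an axis removes that axis from the result. As a small sketch (variable names are illustrative), the keepdims=True argument keeps the reduced axis with length 1, which makes the result line up with the original array for later element-wise operations.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

col_sums = x.sum(axis=0)                    # collapses the rows: shape (2,)
row_sums = x.sum(axis=1)                    # collapses the columns: shape (2,)
row_sums_2d = x.sum(axis=1, keepdims=True)  # keeps the axis: shape (2, 1)

# With the (2, 1) shape, each row is divided by its own sum.
row_fractions = x / row_sums_2d

print(col_sums, row_sums, row_sums_2d.shape)
print(row_fractions)
```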
Other operations you might think of, such as the sum, the mean, the cumulative sum, and the cumulative product, are all available in NumPy
python
print(x.cumsum(axis=0))
print(x.cumsum(axis=1))
lua
[[1 2]
[4 6]]
[[1 3]
[3 7]]
python
print(x.cumprod(axis=0))
print(x.cumprod(axis=1))
lua
[[1 2]
[3 8]]
[[ 1 2]
[ 3 12]]
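Those are the most basic operations; for more, consult the documentation.
Beyond the basic operations, we often need things like reshaping, transposing, and reordering arrays:
Sorting a 1-D array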
python
arr = np.random.randn(8) * 10
print(arr)
css
[-15.78997677 -12.10676988 0.51148087 -9.8265767 4.83704543 -2.74087509 11.02576343 3.51866316]
python
arr.sort()
print(arr)
css
[-15.78997677 -12.10676988 -9.8265767 -2.74087509 0.51148087 3.51866316 4.83704543 11.02576343]
A 2-D array can also be sorted along a chosen axis
python
arr = np.random.randn(5,3) * 10
print(arr)
css
[[ -1.64638996 -1.2114224 2.15896971]
[ 6.00536939 -7.71788794 6.54291646]
[ 2.30925965 4.72675272 -0.2759866 ]
[ 2.92760387 -11.45580554 -7.84082136]
[ -5.01085861 -0.96178148 -5.77203831]]
python
arr.sort(1)
print(arr)
css
[[ -1.64638996 -1.2114224 2.15896971]
[ -7.71788794 6.00536939 6.54291646]
[ -0.2759866 2.30925965 4.72675272]
[-11.45580554 -7.84082136 2.92760387]
[ -5.77203831 -5.01085861 -0.96178148]]
As a small case study, let's find the number that sits at the 5% position after sorting
python
large_arr = np.random.randn(1000)
large_arr.sort()
print(large_arr[int(0.05 * len(large_arr))])
diff
-1.71556330464
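As a cross-check on the sort-and-index approach, the sketch below compares it with np.percentile. It is only an illustration: the seed is arbitrary, and np.percentile interpolates linearly between neighbors by default, so the two values are close but not necessarily identical.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, for reproducibility only
large_arr = rng.standard_normal(1000)

# np.sort returns a sorted copy; arr.sort() would sort in place instead.
sorted_arr = np.sort(large_arr)
by_index = sorted_arr[int(0.05 * len(sorted_arr))]

# np.percentile takes the percentage (0-100), not a fraction.
by_percentile = np.percentile(large_arr, 5)

print(by_index, by_percentile)
```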
Broadcasting
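I haven't found a perfectly fitting Chinese term for this; a literal rendering would be "propagation", and we'll go with that for now:
What is it for? Imagine you want to combine a small array with a large one, and you want the small array to be applied repeatedly to the matching blocks of the large array. That is when you need broadcasting.
Say we want to add a vector element-wise to every row of x, producing y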
python
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print(x)
print()
print(v)
css
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[1 0 1]
python
y = np.empty_like(x)
print(y)
lua
[[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]]
The brute-force way is to add row by row with a for loop
python
for i in range(4):
y[i,:] = x[i,:] + v
print(y)
lua
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
This certainly works, but it is inefficient: if x has a very large number of rows, the loop will be slow.
NumPy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:
Thanks to broadcasting, the operation above collapses into a single addition
python
print(x.shape, v.shape)
x + v
lua
(4, 3) (3,)
array([[ 2, 2, 4],
[ 5, 5, 7],
[ 8, 8, 10],
[11, 11, 13]])
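When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimension. Two dimensions are compatible, and the result is broadcast, when:
- they are equal, or
- one of them is 1 (that dimension is then stretched, conceptually by copying, until the shapes match)
For example, for addition: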
python
Image  (3d array): 256 x 256 x 3
Scale  (1d array):             3
Result (3d array): 256 x 256 x 3

A      (4d array): 8 x 1 x 6 x 1
B      (3d array):     7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5

A      (2d array): 5 x 4
B      (1d array):     1
Result (2d array): 5 x 4

A      (3d array): 15 x 3 x 5
B      (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
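The compatibility rule can be written down in a few lines. The function below is a hypothetical helper written for illustration (broadcast_shape is not a NumPy function); it walks both shapes from the trailing dimension and is checked against what NumPy actually produces for the shape pairs listed above.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da == db or da == 1 or db == 1:
            # The dimension of size 1 is the one that gets stretched.
            result.append(db if da == 1 else da)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

pairs = [((256, 256, 3), (3,)),
         ((8, 1, 6, 1), (7, 1, 5)),
         ((5, 4), (1,)),
         ((15, 3, 5), (15, 1, 5))]
for sa, sb in pairs:
    # Compare the hand-written rule with the shape NumPy actually produces.
    actual = (np.empty(sa) + np.empty(sb)).shape
    print(sa, sb, broadcast_shape(sa, sb), actual)
```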
Here are some broadcasting examples.
Let's work through this use of broadcasting:
first reshape v into a 3x1 array (matrix); it can then be broadcast and added to w:
python
v = np.array([1,2,3])
w = np.array([4,5])
print(v.shape, w.shape)
scss
(3,) (2,)
python
v = v.reshape(3,1)
print(v.shape)
scss
(3, 1)
python
v + w
lua
array([[5, 6],
[6, 7],
[7, 8]])
What if we want to add a vector to every row of a matrix?
python
x = np.array([[1,2,3], [4,5,6]])
v = np.array([1,2,3])
print(x + v)
lua
[[2 4 6]
[5 7 9]]
python
x = np.array([[1,2,3], [4,5,6]]) # shape (2, 3)
w = np.array([4,5]) # shape (2,)
python
(x.T + w).T
lua
array([[ 5, 6, 7],
[ 9, 10, 11]])
That operation is more complicated than it needs to be; we can simply do this instead
python
x + np.reshape(w, (2,1))
lua
array([[ 5, 6, 7],
[ 9, 10, 11]])
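Adding a length-1 axis by indexing is a common alternative to reshape. The following sketch (an illustration, not from the original notes) shows that w[:, np.newaxis] and w[:, None] give the same (2, 1) column as np.reshape(w, (2, 1)), without having to spell out the length.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = np.array([4, 5])                   # shape (2,)

via_reshape = x + np.reshape(w, (2, 1))
via_newaxis = x + w[:, np.newaxis]     # np.newaxis inserts an axis of length 1
via_none = x + w[:, None]              # np.newaxis is an alias for None

print(w[:, np.newaxis].shape)
print(via_newaxis)
```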
Broadcasting of course works for any element-wise operation
To summarize broadcasting, take a look at the figure below:
Logical operations
python
x_arr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y_arr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
python
cond = np.array([True, False, True, True, False])
python
print(np.where(cond, x_arr, y_arr))
css
[ 1.1 2.2 1.3 1.4 2.5]
python
arr = np.random.randn(4,4)
print(arr)
lua
[[ 0.94534017 0.04955986 0.20184702 0.40272432]
[-0.20872753 -0.04952711 0.0059752 -1.9356753 ]
[ 1.20042485 -0.92834374 1.57810979 -0.96378859]
[-2.09492281 -0.38678213 -0.67656147 1.45629059]]
python
arr > 0
python
array([[ True, True, True, True],
[False, False, True, False],
[ True, False, True, False],
[False, False, False, True]], dtype=bool)
python
print(np.where(arr > 0, 1,-1))
lua
[[ 1 1 1 1]
[-1 -1 1 -1]
[ 1 -1 1 -1]
[-1 -1 -1 1]]
python
print(np.where(arr > 0, 1,arr))
lua
[[ 1. 1. 1. 1. ]
[-0.20872753 -0.04952711 1. -1.9356753 ]
[ 1. -0.92834374 1. -0.96378859]
[-2.09492281 -0.38678213 -0.67656147 1. ]]
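np.where has a second form worth knowing. The sketch below uses a small made-up array (illustrative values only) to show that the three-argument form matches a boolean-mask assignment, and that calling np.where with just a condition returns the indices where it is True.

```python
import numpy as np

arr = np.array([[0.5, -1.0], [-0.2, 2.0]])

signs = np.where(arr > 0, 1, -1)     # pick 1 or -1 element by element
capped = np.where(arr > 0, 1, arr)   # replace positives, keep the rest

# The same result through boolean-mask assignment on a copy.
capped_mask = arr.copy()
capped_mask[arr > 0] = 1

# With only a condition, np.where returns one index array per dimension.
rows, cols = np.where(arr > 0)

print(signs)
print(capped)
print(rows, cols)
```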
Some more advanced ndarray manipulation
Use reshape to change a tensor's shape
NumPy makes it easy to turn a 1-D array into a 2-D or 3-D array.
python
arr = np.arange(8)
print(arr.shape)
scss
(8,)
python
arr.reshape(2,4)
lua
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
python
arr.reshape(2,2,2)
lua
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
python
arr = np.arange(15)
print(arr.reshape(5,3).shape)
scss
(5, 3)
python
print(arr.reshape(5,-1).shape)
scss
(5, 3)
If we write -1 for one dimension, NumPy infers the correct size for that dimension automatically
We can also take the shape from another ndarray and reshape to it
python
other_arr = np.ones((3,5))
print(other_arr.shape) # tuple
scss
(3, 5)
python
arr.reshape(other_arr.shape)
lua
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
Higher-dimensional arrays can be flattened with ravel
python
arr = arr.reshape(other_arr.shape)
print(arr)
lua
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
python
arr.ravel().shape
scss
(15,)
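One detail about flattening that often matters in practice: ravel returns a view when it can, while flatten always returns a copy. A minimal sketch, assuming a contiguous array as in the example above:

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

flat_view = arr.ravel()     # a view here, because arr is contiguous
flat_copy = arr.flatten()   # always an independent copy

flat_view[0] = 100          # writes through to arr[0, 0]
flat_copy[1] = -1           # leaves arr untouched

print(arr[0, :2])
print(np.shares_memory(arr, flat_view), np.shares_memory(arr, flat_copy))
```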
Concatenating two 2-D arrays
python
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(arr1, "\n\n", arr2)
lua
[[1 2 3]
[4 5 6]]
[[ 7 8 9]
[10 11 12]]
python
np.concatenate([arr1, arr2], axis=0)
lua
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
python
np.concatenate([arr1, arr2], axis=1)
lua
array([[ 1, 2, 3, 7, 8, 9],
[ 4, 5, 6, 10, 11, 12]])
Stacking, as in stacking plates, is another way to express concatenation: vertical stack and horizontal stack
python
np.vstack((arr1, arr2)) # vertical
lua
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
python
np.hstack((arr1, arr2)) # horizontal
lua
array([[ 1, 2, 3, 7, 8, 9],
[ 4, 5, 6, 10, 11, 12]])
Splitting an array
python
arr = np.random.rand(5,5)
print(arr)
css
[[ 0.80642843 0.43253953 0.24511404 0.21328645 0.50991311]
[ 0.19373378 0.72169396 0.05192132 0.1746048 0.69771988]
[ 0.59689743 0.82253158 0.03346062 0.9002945 0.03960687]
[ 0.06061257 0.27390675 0.19740262 0.76815388 0.02035703]
[ 0.58031701 0.63341072 0.75286027 0.82066801 0.24301514]]
python
first, second, third = np.split(arr, [1,3], axis=0)
print(first, '\n\n', second, '\n\n', third)
lua
[[ 0.80642843 0.43253953 0.24511404 0.21328645 0.50991311]]
[[ 0.19373378 0.72169396 0.05192132 0.1746048 0.69771988]
[ 0.59689743 0.82253158 0.03346062 0.9002945 0.03960687]]
[[ 0.06061257 0.27390675 0.19740262 0.76815388 0.02035703]
[ 0.58031701 0.63341072 0.75286027 0.82066801 0.24301514]]
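The list passed to np.split gives the positions at which to cut, so [1, 3] yields rows 0:1, 1:3, and 3:. The sketch below (with a small integer array chosen for readability) checks this against plain slicing and also shows np.array_split, which accepts a number of pieces even when the array does not divide evenly.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# Cut before row 1 and before row 3.
first, second, third = np.split(arr, [1, 3], axis=0)

# np.array_split allows uneven pieces: 5 rows into 2 parts gives 3 and 2 rows.
halves = np.array_split(arr, 2, axis=0)

print(first.shape, second.shape, third.shape)
print([h.shape for h in halves])
```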
Stacking helpers
python
arr = np.arange(6)
arr1 = arr.reshape((3, 2))
arr2 = np.random.randn(3, 2)
print(arr1)
print(arr2)
lua
[[0 1]
[2 3]
[4 5]]
[[-0.60392123 -0.1769936 ]
[ 0.46523138 0.71963034]
[-0.51733042 1.50108329]]
r_ stacks along rows
python
print(np.r_[arr1, arr2])
print()
css
[[ 0. 1. ]
[ 2. 3. ]
[ 4. 5. ]
[-0.60392123 -0.1769936 ]
[ 0.46523138 0.71963034]
[-0.51733042 1.50108329]]
c_ stacks along columns
python
print(np.c_[np.r_[arr1, arr2], arr])
print()
css
[[ 0. 1. 0. ]
[ 2. 3. 1. ]
[ 4. 5. 2. ]
[-0.60392123 -0.1769936 3. ]
[ 0.46523138 0.71963034 4. ]
[-0.51733042 1.50108329 5. ]]
Slices are turned directly into arrays
python
np.c_[1:6, -5:0]
css
array([[ 1, -5],
[ 2, -4],
[ 3, -3],
[ 4, -2],
[ 5, -1]])
Use repeat to repeat elements
python
arr = np.arange(3)
print(arr)
csharp
[0 1 2]
python
print(arr.repeat(3))
csharp
[0 0 0 1 1 1 2 2 2]
python
print(arr.repeat([2,3,5]))
csharp
[0 0 1 1 1 2 2 2 2 2]
The calls above repeat each element individually.
You can also repeat along a specified axis
python
arr = np.random.rand(2,2)
print(arr)
lua
[[ 0.9682022 0.99265567]
[ 0.62174828 0.12614083]]
python
print(arr.repeat(2, axis=0))
lua
[[ 0.9682022 0.99265567]
[ 0.9682022 0.99265567]
[ 0.62174828 0.12614083]
[ 0.62174828 0.12614083]]
python
print(arr.repeat(2, axis=1))
lua
[[ 0.9682022 0.9682022 0.99265567 0.99265567]
[ 0.62174828 0.62174828 0.12614083 0.12614083]]
Tile: think of laying tiles (numpy tile)
python
print(arr)
print()
print(np.tile(arr, 2))
lua
[[ 0.9682022 0.99265567]
[ 0.62174828 0.12614083]]
[[ 0.9682022 0.99265567 0.9682022 0.99265567]
[ 0.62174828 0.12614083 0.62174828 0.12614083]]
python
print(np.tile(arr, (2,3)))
lua
[[ 0.9682022 0.99265567 0.9682022 0.99265567 0.9682022 0.99265567]
[ 0.62174828 0.12614083 0.62174828 0.12614083 0.62174828 0.12614083]
[ 0.9682022 0.99265567 0.9682022 0.99265567 0.9682022 0.99265567]
[ 0.62174828 0.12614083 0.62174828 0.12614083 0.62174828 0.12614083]]
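repeat and tile are easy to mix up. The short sketch below uses a small integer array (chosen only so the pattern is easy to read) to put them side by side: repeat duplicates each element in place, while tile duplicates the whole array as a block.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

repeated = np.repeat(arr, 2, axis=1)   # each column is duplicated next to itself
tiled = np.tile(arr, (1, 2))           # the whole block is laid out twice, side by side

print(repeated)
print(tiled)
```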
File input and output with NumPy
Reading a CSV file into an array
python
arr = np.loadtxt('array_ex.txt', delimiter=',')
print(arr)
css
[[ 0.580052 0.18673 1.040717 1.134411]
[ 0.194163 -0.636917 -0.938659 0.124094]
[-0.12641 0.268607 -0.695724 0.047428]
[-1.484413 0.004176 -0.744203 0.005487]
[ 2.302869 0.200131 1.670238 -1.88109 ]
[-0.19323 1.047233 0.482803 0.960334]]
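If you don't have array_ex.txt at hand, you can try the same call on in-memory text. This is a sketch under that assumption: csv_text is made-up data standing in for the file, and io.StringIO is used because np.loadtxt and np.savetxt accept file-like objects as well as file names.

```python
import io
import numpy as np

# Made-up comma-separated data standing in for a file such as array_ex.txt.
csv_text = "0.5,1.5,2.5\n3.5,4.5,5.5\n"

arr = np.loadtxt(io.StringIO(csv_text), delimiter=',')

# Write the array back out as text and read it again (a round trip).
out = io.StringIO()
np.savetxt(out, arr, delimiter=',', fmt='%.3f')
arr_again = np.loadtxt(io.StringIO(out.getvalue()), delimiter=',')

print(arr)
print(out.getvalue())
```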
Reading and writing array files
python
arr = np.arange(50).reshape(2,5,5)
print(arr)
css
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]]
python
np.save('some_array', arr)
python
arr2 = np.load('some_array.npy')
print(arr2)
css
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]]
python
arr3 = np.arange(15).reshape(3,5)
np.savez("array_archive.npz", arr=arr, b=arr2, c=arr3)
python
arch = np.load('array_archive.npz')
print(arch['arr'])
css
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]]
python
arch['b']
css
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]],
[[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]]])
python
arch['c']
lua
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
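Multiple arrays can be stored together in a single .npz archive (np.savez stores them uncompressed; np.savez_compressed compresses them)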
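As a self-contained variant of the archive example, the sketch below writes to a temporary directory so it leaves no files behind. The file name demo_archive.npz and the array names are placeholders chosen for illustration; np.savez_compressed is used here as the compressed counterpart of np.savez.

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_archive.npz")   # placeholder file name
    np.savez_compressed(path, a=a, b=b)            # keyword names become the keys

    # np.load on an .npz file returns a lazy, dict-like object; close it when done.
    with np.load(path) as arch:
        names = sorted(arch.files)
        a_loaded = arch["a"]
        b_loaded = arch["b"]

print(names)
print(a_loaded)
```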
In-class mini project
Write a softmax with NumPy
- Compute the exponential
- Sum along each row
- Divide each row by its sum
python
m = np.random.rand(10,10) * 10 + 1000
print(m)
yaml
[[ 1008.64304012 1001.25079229 1006.81896868 1005.89015258 1008.8915297
1001.84923866 1005.53509734 1005.34075305 1008.93404709
1006.94897664]
[ 1003.24267825 1003.72710741 1000.28354398 1000.32012105 1004.3690361
1007.18390602 1002.49741606 1005.83510332 1009.19678396
1002.32098566]
[ 1002.32824002 1006.2813999 1009.27645662 1002.57259159
1006.30743627 1000.35201323 1003.94430099 1008.79056869
1007.40485841 1006.38239542]
[ 1007.06228714 1006.01325352 1007.96901864 1002.34269542
1000.75563221 1005.26357317 1006.14861174 1005.68119044
1000.69006453 1007.21834125]
[ 1004.15770428 1003.0554848 1005.55619032 1003.04000025
1005.54338468 1002.23952638 1008.86317857 1006.96983789
1005.84232318 1009.28833837]
[ 1008.47151667 1006.30354927 1006.69274016 1004.12418543
1007.17550972 1004.31758292 1007.27760499 1007.45250445
1000.02943239 1002.25886446]
[ 1000.63764781 1003.39894276 1008.26298759 1001.89295012
1007.85388369 1004.67565255 1004.58872708 1003.24488815
1000.39528914 1007.20964465]
[ 1005.21815308 1007.42651355 1006.32407717 1003.0096329 1005.03545902
1008.85925437 1009.57634418 1003.74546024 1003.40512867 1004.4437606 ]
[ 1001.78786625 1008.73282377 1003.98906267 1008.17533941
1002.79957584 1000.89332666 1007.64343999 1003.88248211
1005.75517566 1008.27556001]
[ 1002.05916059 1007.25663392 1009.48655775 1009.56831564
1008.28488062 1004.92593854 1008.0468565 1007.53278621
1001.94935121 1007.01473574]]
python
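# Calling np.exp(m) directly would overflow (exp(1000) exceeds the float64 range), so first subtract each row's max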
m_row_max = m.max(axis=1).reshape(10,1)
print(m_row_max, m_row_max.shape)
scss
[[ 1008.93404709]
[ 1009.19678396]
[ 1009.27645662]
[ 1007.96901864]
[ 1009.28833837]
[ 1008.47151667]
[ 1008.26298759]
[ 1009.57634418]
[ 1008.73282377]
[ 1009.56831564]] (10, 1)
python
m = m - m_row_max
print(m)
css
[[-0.29100696 -7.6832548 -2.11507841 -3.04389451 -0.04251738 -7.08480843 -3.39894975 -3.59329403 0. -1.98507045]
[-5.95410571 -5.46967655 -8.91323998 -8.87666291 -4.82774786 -2.01287794 -6.6993679 -3.36168065 0. -6.8757983 ]
[-6.9482166 -2.99505673 0. -6.70386503 -2.96902035 -8.9244434 -5.33215563 -0.48588793 -1.87159821 -2.8940612 ]
[-0.9067315 -1.95576512 0. -5.62632322 -7.21338643 -2.70544547 -1.82040689 -2.2878282 -7.27895411 -0.75067739]
[-5.13063409 -6.23285357 -3.73214805 -6.24833812 -3.74495369 -7.04881199 -0.4251598 -2.31850048 -3.44601518 0. ]
[ 0. -2.16796739 -1.77877651 -4.34733123 -1.29600694 -4.15393374 -1.19391168 -1.01901222 -8.44208427 -6.21265221]
[-7.62533977 -4.86404483 0. -6.37003747 -0.4091039 -3.58733504 -3.67426051 -5.01809944 -7.86769844 -1.05334294]
[-4.3581911 -2.14983063 -3.25226701 -6.56671127 -4.54088515 -0.71708981 0. -5.83088393 -6.1712155 -5.13258358]
[-6.94495753 0. -4.7437611 -0.55748437 -5.93324793 -7.83949711 -1.08938379 -4.85034166 -2.97764811 -0.45726376]
[-7.50915505 -2.31168172 -0.08175789 0. -1.28343501 -4.6423771 -1.52145914 -2.03552943 -7.61896443 -2.5535799 ]]
python
m_exp = np.exp(m)
print(m_exp, m_exp.shape)
scss
[[ 7.47510474e-01 4.60473707e-04 1.20623832e-01 4.76489585e-02 9.58373807e-01 8.37735258e-04 3.34083387e-02 2.75075701e-02 1.00000000e+00 1.37370936e-01]
[ 2.59516363e-03 4.21259451e-03 1.34595041e-04 1.39609277e-04 8.00452828e-03 1.33603617e-01 1.23169021e-03 3.46769303e-02 1.00000000e+00 1.03247308e-03]
[ 9.60346309e-04 5.00337887e-02 1.00000000e+00 1.22616357e-03 5.13535942e-02 1.33095533e-04 4.83363924e-03 6.15150742e-01 1.53877536e-01 5.53509641e-02]
[ 4.03842028e-01 1.41456204e-01 1.00000000e+00 3.60179403e-03 7.36658284e-04 6.68405420e-02 1.61959837e-01 1.01486632e-01 6.89906753e-04 4.72046684e-01]
[ 5.91281003e-03 1.96383995e-03 2.39413532e-02 1.93366498e-03 2.36367236e-02 8.68440058e-04 6.53665320e-01 9.84210597e-02 3.18723893e-02 1.00000000e+00]
[ 1.00000000e+00 1.14409931e-01 1.68844601e-01 1.29413038e-02 2.73622203e-01 1.57025251e-02 3.03033573e-01 3.60951306e-01 2.15600312e-04 2.00391561e-03]
[ 4.87929429e-04 7.71919784e-03 1.00000000e+00 1.71209508e-03 6.64245216e-01 2.76719769e-02 2.53681582e-02 6.61709096e-03 3.82914649e-04 3.48769883e-01]
[ 1.28015234e-02 1.16503889e-01 3.86864061e-02 1.40641513e-03 1.06639631e-02 4.88170860e-01 1.00000000e+00 2.93548106e-03 2.08869564e-03 5.90129429e-03]
[ 9.63481254e-04 1.00000000e+00 8.70584098e-03 5.72647825e-01 2.64986143e-03 3.93867063e-04 3.36423738e-01 7.82570334e-03 5.09124334e-02 6.33013352e-01]
[ 5.48043962e-04 9.90944624e-02 9.21495038e-01 1.00000000e+00 2.77083877e-01 9.63476759e-03 2.18392989e-01 1.30611314e-01 4.91050084e-04 7.78026413e-02]] (10, 10)
python
m_exp_row_sum = m_exp.sum(axis=1).reshape(10,1)
print(m_exp_row_sum, m_exp_row_sum.shape)
scss
[[ 3.07374213]
[ 1.1856312 ]
[ 1.93291987]
[ 2.35266029]
[ 1.8422156 ]
[ 2.25172496]
[ 2.08297446]
[ 1.67915853]
[ 2.6135361 ]
[ 2.73515418]] (10, 1)
python
m_softmax = m_exp / m_exp_row_sum
print(m_softmax)
css
[[ 2.43192319e-01 1.49808829e-04 3.92433154e-02 1.55019376e-02 3.11793823e-01 2.72545719e-04 1.08689465e-02 8.94921207e-03 3.25336336e-01 4.46917571e-02]
[ 2.18884559e-03 3.55303952e-03 1.13521845e-04 1.17751015e-04 6.75128005e-03 1.12685645e-01 1.03884767e-03 2.92476533e-02 8.43432594e-01 8.70821452e-04]
[ 4.96837103e-04 2.58850817e-02 5.17352020e-01 6.34358200e-04 2.65678857e-02 6.88572427e-05 2.50069303e-03 3.18249479e-01 7.96088542e-02 2.86359331e-02]
[ 1.71653353e-01 6.01260646e-02 4.25050742e-01 1.53094522e-03 3.13117150e-04 2.84106220e-02 6.88411489e-02 4.31369681e-02 2.93245377e-04 2.00643793e-01]
[ 3.20961891e-03 1.06602069e-03 1.29959562e-02 1.04964098e-03 1.28305957e-02 4.71410652e-04 3.54825635e-01 5.34253752e-02 1.73011179e-02 5.42824629e-01]
[ 4.44103973e-01 5.08099049e-02 7.49845582e-02 5.74728445e-03 1.21516707e-01 6.97355379e-03 1.34578414e-01 1.60299909e-01 9.57489550e-05 8.89946885e-04]
[ 2.34246477e-04 3.70585333e-03 4.80082698e-01 8.21947224e-04 3.18892636e-01 1.32848374e-02 1.21788138e-02 3.17675088e-03 1.83830698e-04 1.67438386e-01]
[ 7.62377296e-03 6.93823048e-02 2.30391624e-02 8.37571382e-04 6.35077805e-03 2.90723509e-01 5.95536385e-01 1.74818578e-03 1.24389425e-03 3.51443547e-03]
[ 3.68650448e-04 3.82623373e-01 3.33105824e-03 2.19108442e-01 1.01389892e-03 1.50702744e-04 1.28723586e-01 2.99429701e-03 1.94802870e-02 2.42205704e-01]
[ 2.00370409e-04 3.62299365e-02 3.36907895e-01 3.65610102e-01 1.01304664e-01 3.52256836e-03 7.98466830e-02 4.77528160e-02 1.79532871e-04 2.84454316e-02]]
python
print(m_softmax.sum(axis=1))
css
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
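The steps above can be collected into one function. This is a minimal sketch rather than a reference implementation: softmax_rows is a name made up here, and it assumes a 2-D input with one sample per row. Subtracting the row maximum does not change the result, because the common factor exp(-max) cancels between numerator and denominator; it only keeps np.exp from overflowing.

```python
import numpy as np

def softmax_rows(m):
    """Row-wise softmax of a 2-D array, computed in a numerically stable way."""
    m = np.asarray(m, dtype=np.float64)
    # Shift each row so its largest entry is 0; exp then stays within [0, 1].
    shifted = m - m.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    # keepdims=True keeps shape (n, 1) so the division broadcasts across each row.
    return exp / exp.sum(axis=1, keepdims=True)

m = np.array([[1000.0, 1001.0, 1002.0],
              [1.0, 2.0, 3.0]])
probs = softmax_rows(m)

print(probs)
print(probs.sum(axis=1))
```

Both rows differ only by a constant offset, so they end up with the same probabilities, which illustrates the shift invariance the max-subtraction relies on.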
For more NumPy details and usage, see the official NumPy guide.