Machine Learning 04: NumPy

1. Introduction to NumPy

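NumPy (Numerical Python) is an open-source Python library for scientific computing, built for fast processing of arrays of any dimension.

NumPy supports the common array and matrix operations. For the same numerical task, NumPy code is much more concise than plain Python.

NumPy handles multidimensional arrays through the ndarray object, a fast and flexible container for large datasets.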

2. An ndarray example

# ndarray example
import numpy as np

score = np.array([[1, 2, 3], [2, 3, 4]])

score

Result

array([[1, 2, 3],
       [2, 3, 4]])

3. Efficiency comparison: ndarray vs. the native Python list

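# Efficiency comparison between ndarray and the native Python list
# Machine learning is dominated by large-scale numerical computation; an ndarray stores data more compactly and moves it in and out faster than a nested list
# 1. Memory layout: an ndarray keeps elements of a single type in one contiguous block of memory, whereas a list stores references to separate Python objects
# 2. Vectorized operations: a whole-array operation runs in compiled loops instead of a Python-level loop; some routines (for example BLAS-backed linear algebra) may also use several cores
# 3. Much faster than pure Python code: the NumPy core is written in C and releases the GIL (global interpreter lock) in many operations, so array operations are not limited by the speed of the Python interpreter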

import random
import time
import numpy as np

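a = []
for i in range(100000000):
    a.append(random.random())  # append 100 million random floats to list a; random.random() returns a float in [0.0, 1.0)

%time sum1 = sum(a)  # IPython/Jupyter magic command: times the statement that follows it (summing list a) and prints the CPU and wall time once it finishes

b = np.array(a)  # convert the native Python list a into a NumPy array and assign it to b
%time sum2 = np.sum(b)

# CPU times: time the CPU actually spent working
#   user: time spent in user mode
#   sys: time spent in kernel mode
#   total: user + sys
# Wall time: the elapsed time you actually waited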

Result

CPU times: user 326 ms, sys: 847 ms, total: 1.17 s
Wall time: 1.64 s
CPU times: user 44.2 ms, sys: 73.4 ms, total: 118 ms
Wall time: 119 ms
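
The %time magic used above only works inside IPython or Jupyter. As a rough sketch of the same comparison for a plain Python script, the version below uses time.perf_counter from the standard library. The array size n is an assumption chosen so the example runs quickly; the absolute timings will differ from machine to machine.

```python
import random
import time

import numpy as np

n = 1_000_000  # assumed size, far smaller than the 100 million used in the text

a = [random.random() for _ in range(n)]  # native Python list
b = np.array(a)                          # the same data as an ndarray

# Time the built-in sum over the list
start = time.perf_counter()
sum1 = sum(a)
list_seconds = time.perf_counter() - start

# Time np.sum over the ndarray
start = time.perf_counter()
sum2 = np.sum(b)
array_seconds = time.perf_counter() - start

print(f"list sum:    {list_seconds:.6f} s")
print(f"ndarray sum: {array_seconds:.6f} s")
print("results agree:", np.isclose(sum1, sum2))
```

time.perf_counter measures wall-clock time only, so it corresponds to the "Wall time" line of the %time output rather than to the CPU times.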

4. Introduction to ndarray

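# ndarray attributes

score.shape     # tuple of array dimensions, e.g. (2, 3) means 2 rows and 3 columns
score.ndim      # number of dimensions; 2 means two-dimensional
score.size      # total number of elements in the array
score.itemsize  # size of one array element in bytes; 8 here because the elements are int64 (the default integer type on most platforms)
score.dtype     # data type of the array elements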

# ndarray shapes

a = np.array([1, 2, 3, 4])  # count the brackets at either end: one bracket means a 1-D array, two mean a 2-D array, and so on
b = np.array([[1, 2], [2, 3]])
c = np.array([[[1, 2], [2, 3]], [[2, 3], [4, 5]]])
a.shape  # (4,)
b.shape  # (2, 2)
c.shape  # (2, 2, 2)

# ndarray data types

type(score.dtype)  # output: numpy.dtypes.Int64DType

b = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)  # set the element type explicitly
b

Result

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
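
To tie the attributes together, here is a small sketch that shows how dtype drives itemsize and the total memory footprint. The dtype is fixed to int64 explicitly so the numbers do not depend on the platform default; nbytes is an ndarray attribute not used in the text above.

```python
import numpy as np

# dtype is set explicitly so the sizes below hold on every platform
score = np.array([[1, 2, 3], [2, 3, 4]], dtype=np.int64)
score32 = score.astype(np.float32)  # same values stored as 4-byte floats

print(score.shape, score.ndim, score.size)  # (2, 3) 2 6
print(score.itemsize, score.nbytes)         # 8 48
print(score32.dtype, score32.itemsize, score32.nbytes)  # float32 4 24
```

nbytes is simply size * itemsize, which is why halving the element width halves the memory used.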

5. Basic array operations

5-1. Ways to create arrays

5-1-1. Arrays of zeros and ones

# create arrays of zeros and ones

ones = np.ones([4,8])
ones

Result

array([[1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.]])

np.zeros_like(ones)  # zeros with the same shape and dtype as ones

Result

array([[0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])
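
The same family of constructors covers a few more cases. The sketch below uses np.zeros, np.ones_like and np.full, which do not appear in the text above but follow the same pattern; the shapes and the fill value 7 are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))                    # float64 by default
int_ones = np.ones((2, 3), dtype=np.int32)  # the dtype can be set explicitly
like = np.ones_like(zeros)                  # same shape and dtype as zeros
sevens = np.full((2, 3), 7)                 # every element set to 7

print(zeros.dtype, int_ones.dtype, like.dtype)  # float64 int32 float64
print(sevens)
```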

5-1-2. Creating arrays from existing arrays

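a = np.array([[1, 2, 3], [4, 5, 6]])
a
a1 = np.array(a)    # copy: a1 gets its own data
a1
a2 = np.asarray(a)  # no copy: a is already an ndarray, so a2 refers to the same array
a2
a[0, 0] = 1000      # modify the original
a
a1                  # unchanged
a2                  # reflects the change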

Result

array([[1, 2, 3],
       [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

array([[1000,    2,    3],
       [   4,    5,    6]])

array([[1, 2, 3],
       [4, 5, 6]])

array([[1000,    2,    3],
       [   4,    5,    6]])
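
One way to check the difference directly, rather than by modifying an element, is to ask NumPy whether two arrays share memory. This is a small sketch using np.shares_memory and the identity operator; the variable a3 is added here only to show that np.asarray still has to build a new array when the input is a list.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
a1 = np.array(a)            # copies the data
a2 = np.asarray(a)          # returns a itself, because a is already an ndarray
a3 = np.asarray([1, 2, 3])  # a list input is converted into a new ndarray

print(a1 is a, np.shares_memory(a1, a))  # False False
print(a2 is a, np.shares_memory(a2, a))  # True True
print(type(a3).__name__)                 # ndarray
```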

5-1-3. Arrays over a fixed range

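np.linspace(0, 100, 11)  # 11 evenly spaced samples from 0 to 100, endpoint included

np.arange(10, 100, 2)    # fixed step of 2, from 10 up to but not including 100

np.logspace(0, 2, 3)     # powers of 10: 3 samples from 10**0 to 10**2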

Result

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
       44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76,
       78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])

array([  1.,  10., 100.])
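
The three functions differ mainly in how the endpoint and the spacing are specified. The sketch below illustrates this; the endpoint=False call is an extra variation that the text above does not use.

```python
import numpy as np

lin = np.linspace(0, 100, 11)
step = lin[1] - lin[0]                            # spacing implied by the sample count
no_end = np.linspace(0, 100, 10, endpoint=False)  # 10 samples, stop value excluded

ar = np.arange(10, 100, 2)                        # the stop value 100 is never included

log = np.logspace(0, 2, 3)                        # base 10 by default
same = 10.0 ** np.linspace(0, 2, 3)               # what logspace computes

print(step, no_end[-1])        # 10.0 90.0
print(ar[-1], len(ar))         # 98 45
print(np.allclose(log, same))  # True
```

In short, linspace fixes the number of samples, arange fixes the step, and logspace is linspace applied to the exponent.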

5-1-4. Random arrays

5-1-4-1. Normal distribution

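import matplotlib.pyplot as plt

x1 = np.random.normal(1.75, 1, 100000000)  # mean 1.75, standard deviation 1, 100 million samples
x1

# 1. Create the figure
plt.figure(figsize=(20, 8), dpi=100)

# 2. Draw the histogram with 1000 bins
plt.hist(x1, 1000)

# 3. Show the figure
plt.show()

stock_change = np.random.normal(0, 1, [4, 5])  # 4x5 array of standard normal samples
stock_change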

Result

array([2.14726132, 2.34220066, 1.24955806, ..., 0.27842733, 0.90682495,
       1.75303785])

array([[-0.5142373 , -0.32586912,  0.39132714,  1.02290317, -0.33438889],
       [-0.08775205,  1.99655647,  0.24488145, -1.25742494, -0.35522986],
       [ 0.85280747,  1.87762957,  0.68582294,  1.05605474, -1.6015672 ],
       [-0.53759709,  1.62663522, -0.2319302 , -0.27205088, -0.49244907]])

5-1-4-2. Uniform distribution

x2 = np.random.uniform(-1, 1, 100000000)  # 100 million samples drawn uniformly from [-1, 1)

x2

# 1. Create the figure
plt.figure(figsize=(20, 8), dpi=100)

# 2. Draw the histogram with 1000 bins
plt.hist(x2, 1000)

# 3. Show the figure
plt.show()

Result

array([-0.2977719 , -0.10336683, -0.77723208, ...,  0.85248355,
       -0.52078581,  0.79849061])
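
The random values above change on every run. As a sketch of how to make such experiments reproducible, the example below uses np.random.default_rng with a fixed seed. This is a different entry point from the np.random.normal and np.random.uniform calls in the text, and the seed 0 and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility

# Same distributions as above, with a much smaller sample
normal_samples = rng.normal(loc=1.75, scale=1.0, size=100_000)
uniform_samples = rng.uniform(low=-1.0, high=1.0, size=100_000)

# The sample statistics should sit close to the distribution parameters
print(normal_samples.mean(), normal_samples.std())
print(uniform_samples.min(), uniform_samples.max())
```

Running the script twice with the same seed produces the same arrays, which makes results easier to compare and debug.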

5-2. Array indexing and slicing

stock_change

stock_change[0, 0:3]  # row 0, columns 0 to 2

a1 = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]])

a1

a1[1, 0, 0]  # block 1, row 0, column 0

Result

array([[-0.5142373 , -0.32586912,  0.39132714,  1.02290317, -0.33438889],
       [-0.08775205,  1.99655647,  0.24488145, -1.25742494, -0.35522986],
       [ 0.85280747,  1.87762957,  0.68582294,  1.05605474, -1.6015672 ],
       [-0.53759709,  1.62663522, -0.2319302 , -0.27205088, -0.49244907]])


array([-0.5142373 , -0.32586912,  0.39132714])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[12,  3, 34],
        [ 5,  6,  7]]])

12
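
A few more indexing patterns on the same three-dimensional array, as a short sketch. Negative indices and the Ellipsis (...) are standard NumPy indexing features that the example above does not use.

```python
import numpy as np

a1 = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]])

print(a1[1, 0, 0])       # 12: block 1, row 0, column 0
print(a1[0, 1, :])       # [4 5 6]: all of row 1 in block 0
print(a1[:, :, -1])      # last column of every row in every block
print(a1[..., 0].shape)  # (2, 2): ... stands for all leading axes
```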

5-3. Changing shape

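stock_change.shape

stock_change

stock_change.reshape([5, 4])   # returns a reshaped array; stock_change itself is not modified

stock_change.reshape([-1, 2])  # -1 lets NumPy infer that dimension: 20 elements / 2 columns = 10 rows

# stock_change.reshape([3, -1])  # raises ValueError: 20 elements cannot be split evenly into 3 rows

stock_change                   # still has shape (4, 5)

stock_change.resize([10, 2])   # changes the shape of stock_change in place

stock_change

stock_change.T                 # transpose: rows and columns are swapped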

Result

(4, 5)

array([[-0.5142373 , -0.32586912,  0.39132714,  1.02290317, -0.33438889],
       [-0.08775205,  1.99655647,  0.24488145, -1.25742494, -0.35522986],
       [ 0.85280747,  1.87762957,  0.68582294,  1.05605474, -1.6015672 ],
       [-0.53759709,  1.62663522, -0.2319302 , -0.27205088, -0.49244907]])

array([[-0.5142373 , -0.32586912,  0.39132714,  1.02290317],
       [-0.33438889, -0.08775205,  1.99655647,  0.24488145],
       [-1.25742494, -0.35522986,  0.85280747,  1.87762957],
       [ 0.68582294,  1.05605474, -1.6015672 , -0.53759709],
       [ 1.62663522, -0.2319302 , -0.27205088, -0.49244907]])

array([[-0.5142373 , -0.32586912],
       [ 0.39132714,  1.02290317],
       [-0.33438889, -0.08775205],
       [ 1.99655647,  0.24488145],
       [-1.25742494, -0.35522986],
       [ 0.85280747,  1.87762957],
       [ 0.68582294,  1.05605474],
       [-1.6015672 , -0.53759709],
       [ 1.62663522, -0.2319302 ],
       [-0.27205088, -0.49244907]])

array([[-0.5142373 , -0.32586912,  0.39132714,  1.02290317, -0.33438889],
       [-0.08775205,  1.99655647,  0.24488145, -1.25742494, -0.35522986],
       [ 0.85280747,  1.87762957,  0.68582294,  1.05605474, -1.6015672 ],
       [-0.53759709,  1.62663522, -0.2319302 , -0.27205088, -0.49244907]])

array([[-0.5142373 , -0.32586912],
       [ 0.39132714,  1.02290317],
       [-0.33438889, -0.08775205],
       [ 1.99655647,  0.24488145],
       [-1.25742494, -0.35522986],
       [ 0.85280747,  1.87762957],
       [ 0.68582294,  1.05605474],
       [-1.6015672 , -0.53759709],
       [ 1.62663522, -0.2319302 ],
       [-0.27205088, -0.49244907]])

array([[-0.5142373 ,  0.39132714, -0.33438889,  1.99655647, -1.25742494,
         0.85280747,  0.68582294, -1.6015672 ,  1.62663522, -0.27205088],
       [-0.32586912,  1.02290317, -0.08775205,  0.24488145, -0.35522986,
         1.87762957,  1.05605474, -0.53759709, -0.2319302 , -0.49244907]])
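
The behaviour of reshape, the -1 placeholder and T can be checked on a small deterministic array. This is a minimal sketch; np.arange(20) stands in for the random stock_change data so the results are predictable.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)  # stand-in for the 4x5 stock_change array

y = x.reshape(-1, 2)             # -1 is inferred as 10
print(y.shape)                   # (10, 2)
print(x.shape)                   # (4, 5): reshape did not modify x
print(np.shares_memory(x, y))    # True: y is a view on the same data here

try:
    x.reshape(3, -1)             # 20 is not divisible by 3
except ValueError as exc:
    print("reshape failed:", exc)

print(x.T.shape)                 # (5, 4)
print(x.T[1, 0] == x[0, 1])      # True: transposing swaps the two indices
```

Because the reshaped array is a view in this case, writing into y would also change x; call .copy() when an independent array is needed.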

5-4. Changing the data type

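stock_change.astype(np.int64)  # convert to int64; the fractional part is truncated toward zero

a1 = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]])

a1

a1.tobytes()  # raw bytes of the array; use this instead of tostring(), which has been deprecated since NumPy 1.19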

Result

array([[ 0,  0],
       [ 0,  1],
       [ 0,  0],
       [ 1,  0],
       [-1,  0],
       [ 0,  1],
       [ 0,  1],
       [-1,  0],
       [ 1,  0],
       [ 0,  0]])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[12,  3, 34],
        [ 5,  6,  7]]])


b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00"\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00'
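
The byte string by itself carries neither the dtype nor the shape, so both have to be supplied again to rebuild the array. The sketch below shows the round trip with np.frombuffer, and also confirms how astype truncates floats; the small floats array is made up for illustration.

```python
import numpy as np

a1 = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]], dtype=np.int64)

raw = a1.tobytes()
print(len(raw))  # 96: 12 elements * 8 bytes each

# dtype and shape are not stored in the bytes, so pass them back in
restored = np.frombuffer(raw, dtype=a1.dtype).reshape(a1.shape)
print(np.array_equal(restored, a1))  # True

# astype to an integer type truncates toward zero rather than rounding
floats = np.array([-1.9, -0.5, 0.5, 1.9])
print(floats.astype(np.int64))  # [-1  0  0  1]
```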

5-5. Removing duplicates

a = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])

a

np.unique(a)  # flattens the array and returns the sorted unique values

Result

array([[1, 2, 3, 4],
       [2, 3, 4, 5]])

array([1, 2, 3, 4, 5])
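
np.unique can also report how often each value occurs. A short sketch using its return_counts parameter, which the example above does not use:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])

# values are sorted; counts[i] is the number of times values[i] appears in a
values, counts = np.unique(a, return_counts=True)

print(values)  # [1 2 3 4 5]
print(counts)  # [1 2 2 2 1]
```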

6. ndarray operations

6-1. Logical operations

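score = np.random.randint(40, 100, (10, 5))  # 10x5 array of random integers in [40, 100)
score
test_score = score[6:, 0:5]  # last 4 rows; a slice is a view, so it shares data with score
test_score
test_score > 60  # element-wise comparison returns a boolean array
test_score[test_score > 60] = 1  # boolean indexing: set every element above 60 to 1 (this also changes score)
test_score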

Result

array([[89, 56, 51, 79, 91],
       [66, 95, 57, 50, 58],
       [44, 88, 79, 83, 74],
       [76, 71, 44, 65, 44],
       [96, 45, 45, 91, 76],
       [53, 52, 54, 90, 70],
       [97, 50, 73, 48, 50],
       [90, 46, 60, 71, 44],
       [94, 40, 52, 41, 73],
       [96, 95, 68, 69, 47]])

array([[97, 50, 73, 48, 50],
       [90, 46, 60, 71, 44],
       [94, 40, 52, 41, 73],
       [96, 95, 68, 69, 47]])

array([[ True, False,  True, False, False],
       [ True, False, False,  True, False],
       [ True, False, False, False,  True],
       [ True,  True,  True,  True, False]])

array([[ 1, 50,  1, 48, 50],
       [ 1, 46, 60,  1, 44],
       [ 1, 40, 52, 41,  1],
       [ 1,  1,  1,  1, 47]])
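
Because a slice is a view, the assignment through test_score also rewrites the corresponding rows of score, which is visible in the output of the next section. As a sketch of how to avoid that side effect, take a copy before applying the mask. The small score array here is made up for illustration.

```python
import numpy as np

score = np.array([[89, 56, 51],
                  [97, 50, 73],
                  [90, 46, 60]])

# Masking a copy leaves the original untouched
test_copy = score[1:, :].copy()
test_copy[test_copy > 60] = 1
print(score[1:, :].tolist())  # [[97, 50, 73], [90, 46, 60]]

# Masking the slice itself writes through to score
test_view = score[1:, :]
test_view[test_view > 60] = 1
print(score.tolist())         # [[89, 56, 51], [1, 50, 1], [1, 46, 60]]
```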

6-2. General test functions: np.all and np.any

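np.all(score[0:2, :] > 60)  # True only if every element in the first two rows is above 60

score

np.any(score[0:2, :] > 90)  # True if at least one element in the first two rows is above 90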

Result

False

array([[89, 56, 51, 79, 91],
       [66, 95, 57, 50, 58],
       [44, 88, 79, 83, 74],
       [76, 71, 44, 65, 44],
       [96, 45, 45, 91, 76],
       [53, 52, 54, 90, 70],
       [ 1, 50,  1, 48, 50],
       [ 1, 46, 60,  1, 44],
       [ 1, 40, 52, 41,  1],
       [ 1,  1,  1,  1, 47]])

True
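
Both functions also accept an axis argument, which answers the question per row or per column instead of for the whole array. A small sketch using the first two rows of the score array shown above:

```python
import numpy as np

score = np.array([[89, 56, 51, 79, 91],
                  [66, 95, 57, 50, 58]])

print(np.all(score > 60, axis=1))  # [False False]: no row is entirely above 60
print(np.any(score > 90, axis=1))  # [ True  True]: each row has a value above 90
print(np.all(score > 50, axis=0))  # [ True  True  True False  True]: checked per column
```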

6-3. np.where (ternary operator)

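temp = score[:4, :4]  # first 4 rows and first 4 columns

temp

np.where(temp > 60, 1, 0)  # 1 where the value is above 60, otherwise 0

np.where(np.logical_and(temp > 60, temp < 90), 1, 0)  # 1 where 60 < value < 90

np.where(np.logical_or(temp > 90, temp < 60), 1, 0)  # 1 where value > 90 or value < 60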

Result

array([[89, 56, 51, 79],
       [66, 95, 57, 50],
       [44, 88, 79, 83],
       [76, 71, 44, 65]])

array([[1, 0, 0, 1],
       [1, 1, 0, 0],
       [0, 1, 1, 1],
       [1, 1, 0, 1]])

array([[1, 0, 0, 1],
       [1, 0, 0, 0],
       [0, 1, 1, 1],
       [1, 1, 0, 1]])


array([[0, 1, 1, 0],
       [0, 1, 1, 1],
       [1, 0, 0, 0],
       [0, 0, 1, 0]])
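
The combined conditions can also be written with the element-wise operators & and |, and np.where calls can be nested to pick between more than two outcomes. The sketch below uses the first two rows of temp; the labels "high", "mid" and "low" are made up for illustration.

```python
import numpy as np

temp = np.array([[89, 56, 51, 79],
                 [66, 95, 57, 50]])

with_func = np.where(np.logical_and(temp > 60, temp < 90), 1, 0)
with_op = np.where((temp > 60) & (temp < 90), 1, 0)  # parentheses are required around each comparison
print(np.array_equal(with_func, with_op))            # True

# Nested np.where: three outcomes instead of two
grade = np.where(temp > 90, "high", np.where(temp < 60, "low", "mid"))
print(grade)
```

The parentheses matter because & and | bind more tightly than the comparison operators in Python.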

6-4. Statistical operations

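temp = score[:4, :]  # first 4 rows, all columns

temp

np.max(temp)             # maximum of the whole array

np.mean(temp)            # mean of the whole array

np.max(temp, axis=0)     # maximum of each column

np.max(temp, axis=1)     # maximum of each row

np.argmax(temp)          # index of the maximum in the flattened array

np.argmax(temp, axis=0)  # row index of the maximum in each column

np.argmin(temp, axis=0)  # row index of the minimum in each column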

Result

array([[89, 56, 51, 79, 91],
       [66, 95, 57, 50, 58],
       [44, 88, 79, 83, 74],
       [76, 71, 44, 65, 44]])

95

68.0

array([89, 95, 79, 83, 91])

array([91, 95, 88, 76])

6

array([0, 1, 2, 2, 0])

array([2, 0, 3, 1, 3])
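
Without an axis, np.argmax returns a position in the flattened array (6 above), which is not directly usable as a row and column. A sketch of converting it back with np.unravel_index, using the same temp values:

```python
import numpy as np

temp = np.array([[89, 56, 51, 79, 91],
                 [66, 95, 57, 50, 58],
                 [44, 88, 79, 83, 74],
                 [76, 71, 44, 65, 44]])

flat = np.argmax(temp)                         # position in the flattened array
row, col = np.unravel_index(flat, temp.shape)  # convert back to (row, column)

print(flat, row, col)        # 6 1 1
print(temp[row, col])        # 95
print(np.mean(temp, axis=1)) # mean of each row
```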

6-5. Operations between arrays

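a = np.array([[1, 2, 3], [3, 4, 5]])

a

a + 3  # the scalar is applied to every element

a / 2  # true division, so the result is a float array

a = [1, 2, 3]
a * 3  # for contrast: multiplying a Python list repeats it instead of scaling its elements

arr1 = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]])
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])

# arr1 + arr2  # raises ValueError: shapes (2, 6) and (2, 4) cannot be broadcast together
arr1 = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]])
arr2 = np.array([[1], [3]])

arr1.shape

arr2.shape

arr2 + arr1  # broadcasting: the single column of arr2 is stretched across the 6 columns of arr1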

Result

array([[1, 2, 3],
       [3, 4, 5]])

array([[4, 5, 6],
       [6, 7, 8]])

array([[0.5, 1. , 1.5],
       [1.5, 2. , 2.5]])

[1, 2, 3, 1, 2, 3, 1, 2, 3]

(2, 6)

(2, 1)

array([[2, 3, 4, 3, 2, 5],
       [8, 9, 4, 5, 6, 4]])
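
Two shapes are compatible for broadcasting when, compared from the last axis backwards, each pair of dimensions is either equal or one of them is 1. The sketch below checks this with np.broadcast_shapes (assumed to be available, which requires NumPy 1.20 or later) and makes the stretching visible with np.broadcast_to; the names col and bad are introduced here for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]])  # shape (2, 6)
col = np.array([[1], [3]])                                  # shape (2, 1)
bad = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])                # shape (2, 4)

# (2, 6) with (2, 1): the size-1 axis is stretched to 6
print(np.broadcast_shapes(arr1.shape, col.shape))  # (2, 6)

# What col looks like after stretching, without copying the data
stretched = np.broadcast_to(col, arr1.shape)
print(stretched)

# (2, 6) with (2, 4): 6 != 4 and neither is 1, so this fails
try:
    arr1 + bad
except ValueError as exc:
    print("cannot broadcast:", exc)
```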

6-6. Matrix operations

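a = np.array([[80, 86],
              [82, 80],
              [85, 78],
              [90, 90],
              [86, 82],
              [82, 90],
              [78, 80],
              [92, 94]])

a

b = np.array([[0.7], [0.3]])

b

np.dot(a, b)     # (8, 2) times (2, 1) gives (8, 1)

np.matmul(a, b)  # same result as np.dot for 2-D arrays

c = 10

np.dot(c, a)     # with a scalar operand, np.dot multiplies every element

# np.matmul(c, a)  # raises ValueError: matmul does not support multiplying a matrix by a scalar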

Result

array([[80, 86],
       [82, 80],
       [85, 78],
       [90, 90],
       [86, 82],
       [82, 90],
       [78, 80],
       [92, 94]])

array([[0.7],
       [0.3]])

array([[81.8],
       [81.4],
       [82.9],
       [90. ],
       [84.8],
       [84.4],
       [78.6],
       [92.6]])

array([[81.8],
       [81.4],
       [82.9],
       [90. ],
       [84.8],
       [84.4],
       [78.6],
       [92.6]])

array([[800, 860],
       [820, 800],
       [850, 780],
       [900, 900],
       [860, 820],
       [820, 900],
       [780, 800],
       [920, 940]])
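
In everyday code the @ operator is a common shorthand for np.matmul, and scalar scaling is usually written with plain *. A minimal sketch using the first three rows of a from above:

```python
import numpy as np

a = np.array([[80, 86],
              [82, 80],
              [85, 78]])
b = np.array([[0.7], [0.3]])

final = a @ b    # the @ operator calls matmul
print(final.shape)                      # (3, 1)
print(np.allclose(final, np.dot(a, b))) # True

scaled = 10 * a  # scalar multiplication uses *, not matmul
print(np.array_equal(scaled, np.dot(10, a)))  # True

try:
    np.matmul(10, a)  # matmul rejects scalar operands
except ValueError as exc:
    print("matmul failed:", exc)
```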