Distributed Representations of Words via Neural-Network Inference

WeChat official account: 尤而小屋

Author: Peter

Editor: Peter

Hi everyone, I'm Peter.

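1 Distributed representations of words: the inference-based approach

Count-based methods represent a word by how often other words appear around it: build a co-occurrence matrix, apply SVD to reduce its dimensionality, and take the resulting dense vectors as the distributed representations of the words. This does not scale to large corpora, because the co-occurrence matrix has one row and one column per vocabulary word and SVD on an n×n matrix costs O(n^3).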
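To make the count-based pipeline concrete, here is a minimal sketch on the toy sentence used later in this post. The helper name `co_occurrence_matrix`, the window size of 1, and the choice to keep 2 dimensions are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def co_occurrence_matrix(corpus, vocab_size, window_size=1):
    """Count how often each pair of word IDs appears within window_size of each other."""
    C = np.zeros((vocab_size, vocab_size), dtype=np.int32)
    for idx, word_id in enumerate(corpus):
        for offset in range(1, window_size + 1):
            left, right = idx - offset, idx + offset
            if left >= 0:
                C[word_id, corpus[left]] += 1
            if right < len(corpus):
                C[word_id, corpus[right]] += 1
    return C

# Word IDs for "you say goodbye and i say hello ."
corpus = np.array([0, 1, 2, 3, 4, 1, 5, 6])
C = co_occurrence_matrix(corpus, vocab_size=7)

# SVD of the full V x V matrix; keep the first 2 columns of U as dense word vectors
U, S, Vt = np.linalg.svd(C.astype(np.float64))
word_vecs = U[:, :2]

print(C[0])             # co-occurrence counts for "you"
print(word_vecs.shape)
```

The whole matrix has to exist before SVD can run, which is why this route gets expensive as the vocabulary grows.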

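Inference-based methods use a neural network and usually learn from mini-batches of data, whereas count-based methods process the statistics of the whole corpus in one pass.

Inference-based methods introduce a model, such as a neural network, that takes context as input and outputs the probability of each word that could appear.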

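2 How a neural network handles words

A neural network takes words in one-hot form: the position of the word is 1 and every other position is 0.

A word can therefore be written in three ways: as text, as a word ID, or as a one-hot vector.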
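The three forms can be put side by side in a few lines. This is only a sketch: the vocabulary is borrowed from the preprocessing example in section 4.2, and `to_one_hot` is a hypothetical helper name.

```python
import numpy as np

# Vocabulary borrowed from the example sentence used later in this post
word_to_id = {"you": 0, "say": 1, "goodbye": 2, "and": 3, "i": 4, "hello": 5, ".": 6}

def to_one_hot(word, word_to_id):
    """Return a 1 x V row vector with a single 1 at the word's ID."""
    vec = np.zeros((1, len(word_to_id)), dtype=np.int32)
    vec[0, word_to_id[word]] = 1
    return vec

text = "goodbye"                         # text form
word_id = word_to_id[text]               # word-ID form
one_hot = to_one_hot(text, word_to_id)   # one-hot form

print(text, word_id, one_hot)
```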

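Transformation by a fully connected layer of neurons:

A fully connected layer links every node to every node in the next layer with arrows. Each arrow carries a weight parameter, and the weighted sum of the input-layer neurons gives each hidden-layer neuron. Bias is ignored here.

Without bias, a fully connected layer just computes a matrix product, which is what a MatMul layer does.

2.1 Implementing the fully connected layer with np.dot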

In [1]:

import numpy as np

In [2]:

# Input
c = np.array([[1, 0, 0, 0, 0, 0, 0]])  # one-hot representation
# Weights
W = np.random.randn(7, 3)

# Hidden-layer nodes: matrix product
h = np.dot(c, W)
h

Out[2]:

array([[-0.24025078, -0.95342417,  0.08198498]])

In [3]:

print(c)
print("------------------------")
print(W)
print("------------------------")
print(h)

[[1 0 0 0 0 0 0]]
------------------------
[[-0.24025078 -0.95342417  0.08198498]
 [ 1.03823676 -0.03157699 -1.43020582]
 [-1.25773892 -0.7873342  -0.52444877]
 [-2.64360095  0.14705658 -1.03074045]
 [ 1.6621703  -1.88693396 -0.13974575]
 [-0.15429704  0.79857634  0.03640158]
 [ 1.15015805  0.57186146 -0.42180272]]
------------------------
[[-0.24025078 -0.95342417  0.08198498]]

In [4]:

c.shape

Out[4]:

(1, 7)

In [5]:

W.shape

Out[5]:

(7, 3)

The product of c and W has shape (1, 3):

In [6]:

h.shape

Out[6]:

(1, 3)

Because c is one-hot, multiplying c by W simply extracts the corresponding row of the weight matrix (here, row 0 of W).
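The row-extraction claim is easy to check. The sketch below uses a fixed seed only so that it is repeatable; the variable names `h_matmul` and `h_lookup` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
W = rng.standard_normal((7, 3))

word_id = 0
c = np.zeros((1, 7))
c[0, word_id] = 1

h_matmul = np.dot(c, W)   # full matrix product
h_lookup = W[word_id]     # plain row lookup, no multiplication

print(np.allclose(h_matmul[0], h_lookup))
```

For a large vocabulary the lookup does far less work than the product, since every row but one is multiplied by zero.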

2.2 Implementing it with a MatMul layer

In [7]:

class MatMul:
    def __init__(self, W):
        self.params = [W]  # learnable parameters; here only the weight W
        self.grads = [np.zeros_like(W)]  # gradients are stored in grads
        self.x = None

    # Forward pass
    def forward(self, x):
        W, = self.params    # parameters
        out = np.dot(x, W)  # output
        self.x = x
        return out

    # Backward pass
    def backward(self, dout):
        W, = self.params
        dx = np.dot(dout, W.T)
        dW = np.dot(self.x.T, dout)
        # grads[0][...] uses an ellipsis: the NumPy array keeps its memory address and only its elements are overwritten
        # grads[0] = dW rebinds the list entry to dW (shallow copy); grads[0][...] = dW copies the values in place (deep copy)
        self.grads[0][...] = dW  # store the weight gradient in grads; each element of grads is a NumPy array
        return dx
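The difference between `grads[0][...] = dW` and `grads[0] = dW` matters when other code holds a reference to the gradient array. A small sketch, with made-up variable names, of what each assignment does to such a reference:

```python
import numpy as np

grads = [np.zeros((2, 2))]
held_elsewhere = grads[0]          # e.g. another object keeping a reference to the gradient array

grads[0][...] = np.ones((2, 2))    # in-place overwrite: same array object, new values
same_after_ellipsis = held_elsewhere is grads[0]
seen_values = held_elsewhere.copy()

grads[0] = np.full((2, 2), 5.0)    # rebinding: the list entry now points at a different array
same_after_rebind = held_elsewhere is grads[0]

print(same_after_ellipsis, same_after_rebind)
```

With the ellipsis form, the outside reference sees the new values; after rebinding it is left holding the old array.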

In [8]:

# Input
# c = np.array([[1,0,0,0,0,0,0]])  # one-hot representation
# Weights
# W = np.random.randn(7,3)

layer = MatMul(W)
h = layer.forward(c)

h

Out[8]:

array([[-0.24025078, -0.95342417,  0.08198498]])

The result is identical to calling np.dot directly.

3 Implementing word2vec

To implement word2vec, we use the continuous bag-of-words (CBOW) model as the neural network.

word2vec has two main neural-network models:

CBOW model
skip-gram model

3.1 The CBOW model

CBOW is a neural network that predicts a target word from its context. The target is the word in the middle; the words around it are the context.

Input to the CBOW model: the context

Architecture of the CBOW network:

The model has two input layers, one per context word
Both input layers are mapped to the hidden layer by the same weight $W_{in}$
The hidden layer is mapped to the output layer by the weight $W_{out}$
The hidden-layer neurons are the mean of the values each input layer produces after its fully connected transformation. If the first input layer is transformed into $h_1$ and the second into $h_2$, the hidden layer is $\frac{h_1+h_2}{2}$

The fully connected weight $W_{in}$ is a $7 \times 3$ matrix, and each of its rows holds the distributed representation of one word.

The CBOW model from the layer point of view:

The two input-side MatMul layers map through $W_{in}$ to the hidden layer (taking the mean)
The hidden layer maps through $W_{out}$ to the output layer

3.2 Implementing the CBOW model

In [9]:

import numpy as np

In [10]:

# One-hot representations of the sample context words
c1 = np.array([[1, 0, 0, 0, 0, 0, 0]])
c2 = np.array([[0, 0, 1, 0, 0, 0, 0]])

Initialize the weights:

In [11]:

W_in = np.random.randn(7, 3)
W_out = np.random.randn(3, 7)

Create the layers:

In [12]:

in_layer1 = MatMul(W_in)  # both input layers share the weight W_in
in_layer2 = MatMul(W_in)

out_layer = MatMul(W_out)

Forward pass:

In [13]:

# Hidden-layer values from the two input layers
h1 = in_layer1.forward(c1)
h2 = in_layer2.forward(c2)
h = 0.5 * (h1 + h2)

# Output layer
s = out_layer.forward(h)
s

Out[13]:

array([[-2.31846731,  0.4739228 ,  0.80760338,  1.11129182, -0.28630727,
         0.1856385 , -0.98685292]])
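Because the layers have no bias or nonlinearity, averaging after the MatMul gives the same hidden vector as averaging the one-hot inputs first, and both equal the mean of the two selected rows of W_in. The sketch below checks this with a freshly drawn W_in (a fixed seed is assumed only for repeatability), so its numbers differ from the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.standard_normal((7, 3))

c1 = np.array([[1, 0, 0, 0, 0, 0, 0]])
c2 = np.array([[0, 0, 1, 0, 0, 0, 0]])

h_avg_after = 0.5 * (np.dot(c1, W_in) + np.dot(c2, W_in))  # average the two hidden vectors
h_avg_before = np.dot(0.5 * (c1 + c2), W_in)               # average the inputs first
h_rows = 0.5 * (W_in[0] + W_in[2])                          # mean of rows 0 and 2

print(np.allclose(h_avg_after, h_avg_before), np.allclose(h_avg_after[0], h_rows))
```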

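3.3 Training the CBOW model

In the example above, "you" and "goodbye" are the context and serve as the model's input; the word the network should predict, its output, is "say".

Training the CBOW model means repeatedly adjusting the weights so that the prediction becomes more accurate.

The model is a neural network for multi-class classification:

A Softmax function converts the scores into probabilities
The cross-entropy error between those probabilities and the labels is used as the loss for learning

The Softmax layer and the Cross Entropy Error layer are implemented together as a single Softmax with Loss layer.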
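The two steps can be traced on a single sample. The scores below are made up for illustration and do not come from the notebook; the target ID 1 corresponds to "say" in the running example.

```python
import numpy as np

# Hypothetical scores for a 7-word vocabulary (made up for illustration)
scores = np.array([0.5, 2.0, -1.0, 0.0, 0.3, -0.5, 0.1])
target_id = 1  # ID of the correct word, "say" in the running example

shifted = scores - scores.max()                   # subtract the max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()   # softmax: scores -> probabilities
loss = -np.log(probs[target_id])                  # cross-entropy against a one-hot label

print(probs.round(3), probs.sum(), loss)
```

The loss shrinks toward 0 as the probability assigned to the correct word approaches 1, which is what training pushes the weights to achieve.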

3.4 word2vec weights and distributed representations

There are two weights:

The input-side fully connected weight $W_{in}$: each row is the distributed representation of one word
The output-side fully connected weight $W_{out}$: it also stores vectors that encode word meaning, arranged by column

The usual choice: use only the input-side weight $W_{in}$ as the final distributed representation of the words.
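In terms of indexing, the two candidate vectors for a word sit in different places. A minimal sketch with randomly drawn weights of the same shapes as above (`vec_in` and `vec_out` are names made up here):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 7, 3
W_in = rng.standard_normal((vocab_size, hidden_size))
W_out = rng.standard_normal((hidden_size, vocab_size))

word_id = 1  # "say" in the running example
vec_in = W_in[word_id]       # input-side vector: a row of W_in
vec_out = W_out[:, word_id]  # output-side vector: a column of W_out

print(vec_in.shape, vec_out.shape)
```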

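4 Preparing the training data

4.1 Contexts and targets

Inputs and outputs of the word2vec model:

Input: the contexts
Output: the middle word surrounded by the context, i.e. the target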

4.2 Text preprocessing

In [14]:

def preprocess(text):
    text = text.lower()
    text = text.replace(".", " .")
    words = text.split(" ")  # split on spaces

    word_to_id = {}
    id_to_word = {}

    for word in words:
        if word not in word_to_id:
            new_id = len(word_to_id)  # the current dictionary size becomes the new ID
            word_to_id[word] = new_id
            id_to_word[new_id] = word

    # Convert the word list into an array of word IDs
    corpus = np.array([word_to_id[w] for w in words])

    return corpus, word_to_id, id_to_word  # corpus, word-to-ID dict, ID-to-word dict

In [15]:

text = "You say goodbye and I say hello."
corpus, word_to_id, id_to_word = preprocess(text)

corpus

Out[15]:

array([0, 1, 2, 3, 4, 1, 5, 6])

In [16]:

vocab_size = len(word_to_id)
vocab_size

Out[16]:

7

In [17]:

id_to_word

Out[17]:

{0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}

4.3 Generating contexts and targets (create_contexts_target)

Next, build contexts and target from the word-ID array corpus, using a function that takes corpus and returns both.

contexts[0] holds the context of target 0, contexts[1] holds the context of target 1, and so on;
target[0] holds target word 0, target[1] holds target word 1

Define the function that generates contexts and targets:

In [18]:

def create_contexts_target(corpus, window_size=1):
    """
    corpus: array of word IDs
    window_size: number of context words on each side of the target; default 1
    """
    target = corpus[window_size: -window_size]  # target words: every word that has window_size neighbors on both sides

    contexts = []  # contexts

    for idx in range(window_size, len(corpus) - window_size):   # outer loop: each target position, skipping the first and last window_size words
        cs = []  # context of the current target
        for t in range(-window_size, window_size + 1):  # inner loop: every offset inside the window
            if t == 0:  # skip the target word itself
                continue
            cs.append(corpus[idx + t])
        contexts.append(cs)

    return np.array(contexts), np.array(target)

In [19]:

contexts, target = create_contexts_target(corpus, window_size=1)
contexts

Out[19]:

array([[0, 2],
       [1, 3],
       [2, 4],
       [3, 1],
       [4, 5],
       [1, 6]])

In [20]:

target

Out[20]:

array([1, 2, 3, 4, 1, 5])
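The same arrays can be built without explicit loops. The sketch below is an alternative, not the notebook's method: it assumes NumPy 1.20 or later for `sliding_window_view`, and the function name `create_contexts_target_vectorized` is made up here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_contexts_target_vectorized(corpus, window_size=1):
    """Build contexts and targets from overlapping windows over the corpus."""
    windows = sliding_window_view(corpus, 2 * window_size + 1)  # one row per target position
    target = windows[:, window_size].copy()                     # middle column of each window
    contexts = np.delete(windows, window_size, axis=1)          # every column except the middle
    return contexts, target

corpus = np.array([0, 1, 2, 3, 4, 1, 5, 6])
contexts, target = create_contexts_target_vectorized(corpus, window_size=1)
print(contexts)
print(target)
```

On this corpus it reproduces the contexts and target arrays shown above.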

4.4 Converting word IDs to one-hot form (convert_one_hot)

The elements of the contexts and targets obtained above are still word IDs. They have to be converted to one-hot form before they can be fed to the CBOW model:

In [21]:

corpus

Out[21]:

array([0, 1, 2, 3, 4, 1, 5, 6])

In [22]:

corpus.shape

Out[22]:

(8,)

In [23]:

corpus.ndim

Out[23]:

1

In [24]:

vocab_size

Out[24]:

7

In [25]:

def convert_one_hot(corpus, vocab_size):
    """
    Convert arrays of word IDs (contexts or targets) to one-hot form
    corpus: array of word IDs, 1-D or 2-D
    vocab_size: vocabulary size
    Returns a 2-D or 3-D NumPy array
    """
    N = corpus.shape[0]  # number of samples

    if corpus.ndim == 1:  # 1-D array of IDs
        one_hot = np.zeros((N, vocab_size), dtype=np.int32)  # start from all zeros
        for idx, word_id in enumerate(corpus):  # loop over the word IDs
            one_hot[idx, word_id] = 1

    elif corpus.ndim == 2:  # 2-D array: one row of context IDs per sample
        C = corpus.shape[1]
        one_hot = np.zeros((N, C, vocab_size), dtype=np.int32)

        for idx_0, word_ids in enumerate(corpus):
            for idx_1, word_id in enumerate(word_ids):
                one_hot[idx_0, idx_1, word_id] = 1

    return one_hot

Convert the contexts and the targets to one-hot form:

In [26]:

contexts  # before conversion

Out[26]:

array([[0, 2],
       [1, 3],
       [2, 4],
       [3, 1],
       [4, 5],
       [1, 6]])

In [27]:

contexts = convert_one_hot(contexts, vocab_size)
contexts  # after conversion

Out[27]:

array([[[1, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0]],

       [[0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0]],

       [[0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0]],

       [[0, 0, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0]],

       [[0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1]]])

In [28]:

target = convert_one_hot(target, vocab_size)
target

Out[28]:

array([[0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0]])
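A shorter route to the same arrays is to index the rows of an identity matrix with the ID array, which handles the 1-D and 2-D cases with one expression. This is an alternative sketch, not the notebook's implementation; `convert_one_hot_eye` is a made-up name.

```python
import numpy as np

def convert_one_hot_eye(ids, vocab_size):
    """Index the rows of an identity matrix with the ID array."""
    return np.eye(vocab_size, dtype=np.int32)[ids]

target_ids = np.array([1, 2, 3, 4, 1, 5])
context_ids = np.array([[0, 2], [1, 3], [2, 4], [3, 1], [4, 5], [1, 6]])

target_one_hot = convert_one_hot_eye(target_ids, 7)
contexts_one_hot = convert_one_hot_eye(context_ids, 7)

print(target_one_hot.shape)    # (6, 7)
print(contexts_one_hot.shape)  # (6, 2, 7)
```

The identity matrix costs vocab_size squared entries, so for a large vocabulary the loop version or a sparse representation may be the better trade-off.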

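5 Implementing the CBOW model

The network to implement is the CBOW architecture described in section 3.1.

To implement the SimpleCBOW class, two classes are needed first: MatMul and SoftmaxWithLoss.

5.1 Cross-entropy error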

In [30]:

def cross_entropy_error(y, t):
    """
    Cross-entropy error
    """
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # If the labels are one-hot vectors, convert them to the indices of the correct classes
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]

    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

5.2 The SoftmaxWithLoss class

SoftmaxWithLoss needs the softmax and cross_entropy_error functions to be implemented first; the two are then combined into the SoftmaxWithLoss class:

In [31]:

def softmax(x):
    """
    Softmax function
    """
    if x.ndim == 2:
        x = x - x.max(axis=1, keepdims=True)
        x = np.exp(x)
        x /= x.sum(axis=1, keepdims=True)
    elif x.ndim == 1:
        x = x - np.max(x)
        x = np.exp(x) / np.sum(np.exp(x))

    return x


class SoftmaxWithLoss:
    def __init__(self):
        self.params, self.grads = [], []
        self.y = None  # output of softmax
        self.t = None  # supervised labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)  # uses the softmax function defined above

        # If the labels are one-hot, convert them to class indices
        if self.t.size == self.y.size:
            self.t = self.t.argmax(axis=1)

        loss = cross_entropy_error(self.y, self.t)
        return loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]

        dx = self.y.copy()
        dx[np.arange(batch_size), self.t] -= 1
        dx *= dout
        dx = dx / batch_size

        return dx
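The backward pass above amounts to (y - t) / batch_size, with t read as one-hot. A quick way to gain confidence in that formula is a numerical gradient check. The sketch below is self-contained, so it re-implements the loss in a helper called `softmax_loss` (a made-up name) and uses random scores rather than model outputs.

```python
import numpy as np

def softmax_loss(x, t):
    """Mean cross-entropy of softmax(x) against integer labels t; also returns softmax(x)."""
    x = x - x.max(axis=1, keepdims=True)
    y = np.exp(x) / np.exp(x).sum(axis=1, keepdims=True)
    return -np.log(y[np.arange(len(t)), t]).mean(), y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7))   # scores for a batch of 3 samples
t = np.array([1, 2, 5])           # correct word IDs

# Analytic gradient, same formula as SoftmaxWithLoss.backward with dout=1
_, y = softmax_loss(x, t)
analytic = y.copy()
analytic[np.arange(3), t] -= 1
analytic /= 3

# Central-difference numerical gradient
numeric = np.zeros_like(x)
eps = 1e-5
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i, j] += eps
        x_minus[i, j] -= eps
        numeric[i, j] = (softmax_loss(x_plus, t)[0] - softmax_loss(x_minus, t)[0]) / (2 * eps)

print(np.abs(analytic - numeric).max())  # should be tiny
```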

5.3 The SimpleCBOW class

In [32]:

class SimpleCBOW():
    def __init__(self, vocab_size, hidden_size):
        """
        vocab_size: vocabulary size
        hidden_size: number of hidden-layer neurons
        """
        V, H = vocab_size, hidden_size
        # Initialize the weights with small random values
        W_in = 0.01 * np.random.randn(V, H).astype("f")  # cast to 32-bit float
        W_out = 0.01 * np.random.randn(H, V).astype("f")

        # Create the layers: two input-side layers, one output-side layer, one SoftmaxWithLoss layer
        self.in_layer0 = MatMul(W_in)  # MatMul layers sharing W_in
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()

        # Collect all weights and gradients into lists
        layers = [self.in_layer0, self.in_layer1, self.out_layer]

        self.params, self.grads = [], []

        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        # Expose the distributed representations of the words as a member variable
        self.word_vecs = W_in

    def forward(self, contexts, target):
        """
        Forward pass

        contexts, target: one-hot contexts and target words (inputs)
        Returns the loss
        """
        h0 = self.in_layer0.forward(contexts[:, 0])
        h1 = self.in_layer1.forward(contexts[:, 1])

        h = (h0 + h1) * 0.5

        score = self.out_layer.forward(h)
        loss = self.loss_layer.forward(score, target)
        return loss

    def backward(self, dout=1):
        """
        Backward pass: propagate gradients in the direction opposite to the forward pass
        """
        ds = self.loss_layer.backward(dout)
        da = self.out_layer.backward(ds)
        da *= 0.5

        self.in_layer1.backward(da)
        self.in_layer0.backward(da)
        return None

5.4 The Adam class

Define the optimizer:

In [33]:

class Adam:
    """
    Adam optimizer
    """
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None
        
    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = [], []
            for param in params:
                self.m.append(np.zeros_like(param))
                self.v.append(np.zeros_like(param))
        
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for i in range(len(params)):
            self.m[i] += (1 - self.beta1) * (grads[i] - self.m[i])
            self.v[i] += (1 - self.beta2) * (grads[i]**2 - self.v[i])
            
            params[i] -= lr_t * self.m[i] / (np.sqrt(self.v[i]) + 1e-7)
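To see the update rule at work outside the model, the sketch below applies the same arithmetic to the one-dimensional function f(x) = x^2. The function `adam_step`, the learning rate 0.1, the starting point and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999):
    """One update in the same form as the Adam class above; the arrays are modified in place."""
    lr_t = lr * np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)  # bias-corrected step size
    m += (1 - beta1) * (grad - m)        # moving average of the gradient
    v += (1 - beta2) * (grad ** 2 - v)   # moving average of the squared gradient
    param -= lr_t * m / (np.sqrt(v) + 1e-7)

x = np.array([5.0])                      # start away from the minimum of f(x) = x^2
m, v = np.zeros_like(x), np.zeros_like(x)
x_start = x.copy()

for t in range(1, 201):
    grad = 2 * x                         # derivative of x^2
    adam_step(x, grad, m, v, t)

print(x_start, x)
```

Early on each step moves x by roughly lr regardless of the gradient's size, because the update divides the averaged gradient by the root of the averaged squared gradient.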

5.5 Removing duplicate parameters

In [34]:

# Remove duplicate parameters

def remove_duplicate(params, grads):
    '''
    Merge weights that appear more than once in the parameter list into one entry,
    and add up the gradients that belong to that weight
    '''
    params, grads = params[:], grads[:]  # copies of the lists

    while True:
        find_flg = False
        L = len(params)

        for i in range(0, L - 1):
            for j in range(i + 1, L):
                # The same weight object is shared
                if params[i] is params[j]:
                    grads[i] += grads[j]  # accumulate the gradient
                    find_flg = True
                    params.pop(j)
                    grads.pop(j)
                # The weight is shared as a transposed matrix (weight tying)
                elif params[i].ndim == 2 and params[j].ndim == 2 and \
                     params[i].T.shape == params[j].shape and np.all(params[i].T == params[j]):
                    grads[i] += grads[j].T
                    find_flg = True
                    params.pop(j)
                    grads.pop(j)

                if find_flg:
                    break
            if find_flg:
                break

        if not find_flg:
            break

    return params, grads
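The reason for this step is that in_layer0 and in_layer1 hold the same W_in object, so the parameter list contains it twice and its total gradient is the sum of the two layers' gradients. The sketch below is a simplified illustration of the identity-sharing case only (it ignores the transposed weight-tying branch), using made-up gradient values.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.standard_normal((7, 3))
W_out = rng.standard_normal((3, 7))

# Two layers hold the *same* W_in object, as in_layer0 and in_layer1 do
params = [W_in, W_in, W_out]
grads = [np.full((7, 3), 0.1), np.full((7, 3), 0.2), np.full((3, 7), 0.3)]

merged_params, merged_grads = [], []
for p, g in zip(params, grads):
    for k, seen in enumerate(merged_params):
        if seen is p:                      # identity check, not value equality
            merged_grads[k] = merged_grads[k] + g
            break
    else:                                  # no earlier entry is the same object
        merged_params.append(p)
        merged_grads.append(g.copy())

print(len(merged_params), merged_grads[0][0, 0])
```

Without the merge, the optimizer would update W_in twice per step, once with each partial gradient.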

5.6 The Trainer class

Implementation of the Trainer class, which runs model training:

In [35]:

# coding: utf-8

import numpy as np
import time
import matplotlib.pyplot as plt


class Trainer:
    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.loss_list = []
        self.eval_interval = None
        self.current_epoch = 0

    def fit(self, x, t, max_epoch=10, batch_size=32, max_grad=None, eval_interval=20):
        data_size = len(x)
        max_iters = data_size // batch_size
        self.eval_interval = eval_interval
        model, optimizer = self.model, self.optimizer
        total_loss = 0
        loss_count = 0

        start_time = time.time()
        for epoch in range(max_epoch):
            # Shuffle the data
            idx = np.random.permutation(np.arange(data_size))
            x = x[idx]
            t = t[idx]

            for iters in range(max_iters):
                batch_x = x[iters*batch_size:(iters+1)*batch_size]
                batch_t = t[iters*batch_size:(iters+1)*batch_size]

                # Compute the gradients and update the parameters
                loss = model.forward(batch_x, batch_t)
                model.backward()
                params, grads = remove_duplicate(model.params, model.grads)  # merge shared weights into a single entry
                if max_grad is not None:
                    clip_grads(grads, max_grad)
                optimizer.update(params, grads)
                total_loss += loss
                loss_count += 1

                # Evaluate
                if (eval_interval is not None) and (iters % eval_interval) == 0:
                    avg_loss = total_loss / loss_count
                    elapsed_time = time.time() - start_time
                    print('| epoch %d |  iter %d / %d | time %d[s] | loss %.2f'
                          % (self.current_epoch + 1, iters + 1, max_iters, elapsed_time, avg_loss))
                    self.loss_list.append(float(avg_loss))
                    total_loss, loss_count = 0, 0

            self.current_epoch += 1

    def plot(self, ylim=None):
        x = np.arange(len(self.loss_list))
        if ylim is not None:
            plt.ylim(*ylim)
        plt.plot(x, self.loss_list, label='train')
        plt.xlabel('iterations (x' + str(self.eval_interval) + ')')
        plt.ylabel('loss')
        plt.show()
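Trainer.fit calls clip_grads when max_grad is set, but the function is not defined anywhere in this post. The training run below leaves max_grad at None, so the call is never reached. If it were needed, a plausible definition is global-norm gradient clipping; the sketch below is an assumption about what the missing helper does, not code taken from the notebook.

```python
import numpy as np

def clip_grads(grads, max_norm):
    """Rescale all gradients in place so their combined L2 norm is at most max_norm (assumed behavior)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    rate = max_norm / (total_norm + 1e-6)
    if rate < 1:
        for g in grads:
            g *= rate

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]  # combined norm is 5
clip_grads(grads, max_norm=1.0)
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
print(clipped_norm)
```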

6 Training the model

In [36]:

# Hyperparameters
window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

Prepare the training data:

In [37]:

text = "You say goodbye and I say hello."
corpus, word_to_id, id_to_word = preprocess(text)
corpus

Out[37]:

array([0, 1, 2, 3, 4, 1, 5, 6])

In [38]:

vocab_size = len(word_to_id)
vocab_size

Out[38]:

7

In [39]:

contexts, target = create_contexts_target(corpus, window_size)

In [40]:

# Convert word IDs to one-hot form
contexts = convert_one_hot(contexts, vocab_size)
target = convert_one_hot(target, vocab_size)

In [41]:

print(target.shape)
print(contexts.shape)
print("--------------")

(6, 7)
(6, 2, 7)
--------------

Build the model and train it:

In [42]:

model = SimpleCBOW(vocab_size, hidden_size)

In [43]:

optimizer = Adam()  # optimizer

In [44]:

trainer = Trainer(model, optimizer)  # instantiate the trainer

In [45]:

trainer.fit(contexts, target, max_epoch, batch_size)

trainer.plot()

The plot shows the loss decreasing steadily as training proceeds.

In [46]:

# The weight of the input-side MatMul layers is stored in the member variable word_vecs
word_vecs = model.word_vecs

for word_id, word in id_to_word.items():
    print(word_id, word, word_vecs[word_id])

0 you [ 1.1080161  1.1736006  1.1718642 -1.09501    1.3160094]
1 say [-1.235726   -0.53013486  0.04562146  1.2181959  -1.2348078 ]
2 goodbye [ 0.90616506  0.66335976  0.77228236 -0.9329069   0.5517408 ]
3 and [-1.0277987 -1.6456901 -1.5887768  1.034229  -1.0579547]
4 i [ 0.91229594  0.677518    0.77916825 -0.95334023  0.535506  ]
5 hello [ 1.0837361  1.1706287  1.1803834 -1.1276909  1.3076633]
6 . [-1.0578176  1.6483775  1.366552   1.0791936 -1.0280125]

In [47]:

word_vecs  # each row of word_vecs holds the distributed representation of the word with that ID

Out[47]:

array([[ 1.1080161 ,  1.1736006 ,  1.1718642 , -1.09501   ,  1.3160094 ],
       [-1.235726  , -0.53013486,  0.04562146,  1.2181959 , -1.2348078 ],
       [ 0.90616506,  0.66335976,  0.77228236, -0.9329069 ,  0.5517408 ],
       [-1.0277987 , -1.6456901 , -1.5887768 ,  1.034229  , -1.0579547 ],
       [ 0.91229594,  0.677518  ,  0.77916825, -0.95334023,  0.535506  ],
       [ 1.0837361 ,  1.1706287 ,  1.1803834 , -1.1276909 ,  1.3076633 ],
       [-1.0578176 ,  1.6483775 ,  1.366552  ,  1.0791936 , -1.0280125 ]],
      dtype=float32)

These are the resulting dense word vectors, that is, the distributed representations of the words.
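One common way to use such vectors is to compare them with cosine similarity. The sketch below copies three of the printed vectors (rounded to four decimals) and compares them; `cos_similarity` is a helper written for this example. The values come from one particular run with random initialization, and with a corpus this small the similarities say little about meaning, so treat it as a demonstration of the mechanics only.

```python
import numpy as np

# Vectors copied from the printed output above (rounded to 4 decimals)
word_vecs = {
    "you":   np.array([ 1.1080,  1.1736,  1.1719, -1.0950,  1.3160]),
    "hello": np.array([ 1.0837,  1.1706,  1.1804, -1.1277,  1.3077]),
    "and":   np.array([-1.0278, -1.6457, -1.5888,  1.0342, -1.0580]),
}

def cos_similarity(x, y, eps=1e-8):
    """Cosine of the angle between x and y; eps guards against division by zero."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)

sim_you_hello = cos_similarity(word_vecs["you"], word_vecs["hello"])
sim_you_and = cos_similarity(word_vecs["you"], word_vecs["and"])
print(sim_you_hello, sim_you_and)
```

In this run "you" and "hello" point in almost the same direction, while "you" and "and" point in roughly opposite directions.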