[Graph Neural Networks: PubMed]

PubMed

1. Imports:

python
import time

import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
import matplotlib.pyplot as plt
import torch.nn as nn
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

2. Load the dataset

python
dataset = Planetoid(root='data/Planetoid', name='PubMed', transform=NormalizeFeatures())

Inspect the dataset

python
print(f'Dataset: {dataset}')
print('==================')
print(f'Number of graphs:{len(dataset)}')
print(f'Number of features:{dataset.num_features}')
print(f'Number of classes:{dataset.num_classes}')

data = dataset[0]

print()
# Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], train_mask=[19717], val_mask=[19717], test_mask=[19717])
print(data)
print('=====================================================')
print(f'Number of nodes:{data.num_nodes}')
print(f'Number of edges:{data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops:{data.has_self_loops()}')
print(f'Is undirected:{data.is_undirected()}')

# train_mask is a boolean tensor: True marks a node that belongs to the training set.
# data.train_mask.sum() therefore counts the training nodes (likewise for val_mask and test_mask).
print(int(data.train_mask.sum()))
print(int(data.val_mask.sum()))
print(int(data.test_mask.sum()))
print('===========================')

Output:

Dataset: PubMed()
==================
Number of graphs:1
Number of features:500
Number of classes:3

Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], train_mask=[19717], val_mask=[19717], test_mask=[19717])
=====================================================
Number of nodes:19717
Number of edges:88648
Average node degree: 4.50
Number of training nodes: 60
Training node label rate: 0.00
Has isolated nodes: False
Has self-loops:False
Is undirected:True
60
500
1000
===========================
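The label rate prints as 0.00 only because of rounding: 60 training nodes out of 19717 is roughly 0.3%. A minimal sketch, using the counts reported in the output above (the variable names are illustrative and not part of the original script), that shows the same ratio as a percentage:

```python
# Counts copied from the dataset summary printed above.
num_nodes = 19717
num_edges = 88648
num_train = 60

label_rate = num_train / num_nodes
avg_degree = num_edges / num_nodes

# Two decimals hide the ratio entirely; a percentage keeps it visible.
print(f'Training node label rate: {label_rate:.2f}')   # 0.00
print(f'Training node label rate: {label_rate:.2%}')   # 0.30%
print(f'Average node degree: {avg_degree:.2f}')        # 4.50
```

Switching the format spec to `.2%` (or `.4f`) in the inspection script would make the very small share of labeled nodes visible at a glance.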

3. Count the samples per class and plot them

python
# Count the number of samples in each class.
# torch.unique with return_counts=True deduplicates the labels in data.y (shape [19717])
# and also returns how many times each distinct label occurs.
# Example: data.y = torch.tensor([0, 1, 0, 2, 1, 2, 2, 0])
# gives    unique_labels = tensor([0, 1, 2])
# and      counts        = tensor([3, 2, 3])
unique_labels, counts = torch.unique(data.y, return_counts=True)

# Print the number of samples in each class
for label, count in zip(unique_labels, counts):
    print(f'Class {label.item()}: {count.item()} samples')

# Visualize the class distribution as a bar chart
plt.bar(unique_labels.numpy(), counts.numpy())
plt.xlabel('labels')
plt.ylabel('counts')

# Add the count above each bar
for label, count in zip(unique_labels, counts):
    plt.text(label.item(), count.item(), str(count.item()), ha='center', va='bottom')
# plt.show()
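The toy label vector mentioned in the comments can be run on its own to check how `torch.unique` behaves. This is a small sketch on made-up labels, not on the real PubMed labels; the cross-check with `torch.bincount` is an addition that assumes the labels are non-negative integers.

```python
import torch

# Toy label vector from the comment above (not the real PubMed labels).
y = torch.tensor([0, 1, 0, 2, 1, 2, 2, 0])

# Sorted distinct labels plus the number of occurrences of each.
unique_labels, counts = torch.unique(y, return_counts=True)

for label, count in zip(unique_labels.tolist(), counts.tolist()):
    print(f'Class {label}: {count} samples')

# torch.bincount gives the same counts when labels are non-negative integers
# and every label from 0 to the maximum occurs at least once.
assert torch.equal(counts, torch.bincount(y))
```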

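4. Define the model:

python
class GCN(nn.Module):
    def __init__(self, output_channels=3):
        super().__init__()
        # Two graph convolution layers: 500 input features -> 32 hidden -> 3 classes
        self.conv1 = GCNConv(500, 32)
        self.conv2 = GCNConv(32, output_channels)
        self.dp = nn.Dropout(0.5)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.dp(x)  # dropout is only active in training mode
        x = self.conv2(x, edge_index)
        return x  # raw class scores (logits), one row per node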
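To make the role of a graph convolution layer more concrete, here is a minimal dense sketch of the GCN propagation rule, H' = D^(-1/2) (A + I) D^(-1/2) X W, in plain PyTorch. It assumes an undirected graph without existing self-loops, omits the bias term, and uses a hypothetical helper named `dense_gcn_layer`. The real GCNConv works on the sparse edge_index and has more options, so treat this as an illustration and not as its implementation.

```python
import torch

def dense_gcn_layer(x, edge_index, weight):
    """Hypothetical helper: one GCN propagation step with a dense adjacency matrix."""
    num_nodes = x.size(0)
    adj = torch.zeros(num_nodes, num_nodes)
    adj[edge_index[0], edge_index[1]] = 1.0    # A
    adj = adj + torch.eye(num_nodes)            # A + I (add self-loops)
    deg = adj.sum(dim=1)                        # node degrees including the self-loop
    d_inv_sqrt = deg.pow(-0.5)
    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
    return norm_adj @ x @ weight

# Tiny undirected path graph 0 - 1 - 2, each edge stored in both directions.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.eye(3)             # one-hot node features
weight = torch.ones(3, 2)    # fixed weights so the result is easy to inspect
out = dense_gcn_layer(x, edge_index, weight)
print(out.shape)             # torch.Size([3, 2])
```

Each output row is a degree-weighted average over a node and its neighbors, which is why stacking two such layers lets a node see information from two hops away.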

5. Define the training and test functions

python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(reduction='mean').to(device)
data = data.to(device)

def train():
    model.train()  # training mode (dropout enabled)
    optimizer.zero_grad()
    train_outputs = model(data.x, data.edge_index)
    # The loss is computed on the training nodes only
    loss = criterion(train_outputs[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()  # no gradients are needed for evaluation
def test():
    model.eval()  # evaluation mode (dropout disabled)
    outputs = model(data.x, data.edge_index)
    # Predicted class = index of the largest score in each row
    preds = outputs.argmax(dim=1)
    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        correct = preds[mask] == data.y[mask]
        accs.append(int(correct.sum()) / int(mask.sum()))
    return accs

A quick aside:

How should the test function above be read?

Take a small example:

data = {
    'x': torch.randn(10, 5),  # 10 nodes, 5 features each
    'edge_index': torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                                [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]]),  # 10 edges forming a ring
    'y': torch.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]),  # labels of the 10 nodes
    'train_mask': torch.tensor([True, True, True, True, False, False, False, False, False, False]),  # first 4 nodes: training set
    'val_mask': torch.tensor([False, False, False, False, True, True, True, False, False, False]),  # middle 3 nodes: validation set
    'test_mask': torch.tensor([False, False, False, False, False, False, False, True, True, True]),  # last 3 nodes: test set
}

Note that train_mask, val_mask, and test_mask all have the same shape, one entry per node. They differ only in which positions are True.

Suppose outputs = model(data.x, data.edge_index) returns:

torch.tensor([
    [0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4],
    [0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]
])  # shape (10, 2): one row of class probabilities per node

After preds = outputs.argmax(dim=1) we get (for the tie [0.5, 0.5], argmax returns the first index, 0):

preds = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]

For the training set (mask = train_mask):

preds = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
train_mask = tensor([True, True, True, True, False, False, False, False, False, False])
data.y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

The first two lines together select the first 4 predictions: 1, 0, 1, 1
The last two lines together select the first 4 labels: 0, 1, 0, 1

preds[mask] keeps the predictions for the training nodes, and data['y'][mask] keeps the matching training labels:

preds[mask] = [1, 0, 1, 1]    data['y'][mask] = [0, 1, 0, 1]

correct = preds[mask] == data.y[mask] then gives:

correct = [False, False, False, True]  (that is, 0, 0, 0, 1)

So correct.sum() = 1 and mask.sum() = 4.

Finally: accs.append(1 / 4)

The validation and test sets are then handled the same way in turn.
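The walkthrough above can be checked by running it. The sketch below rebuilds the toy labels, masks, and outputs as plain tensors (hypothetical values from the example, not real model output) and applies the same masking logic as test().

```python
import torch

# Toy tensors from the walkthrough above (hypothetical values, not model output).
y = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
train_mask = torch.tensor([True] * 4 + [False] * 6)
val_mask = torch.tensor([False] * 4 + [True] * 3 + [False] * 3)
test_mask = torch.tensor([False] * 7 + [True] * 3)

outputs = torch.tensor([
    [0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4],
    [0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6],
])

# Predicted class per node = column index of the largest value in each row.
preds = outputs.argmax(dim=1)

accs = []
for mask in [train_mask, val_mask, test_mask]:
    correct = preds[mask] == y[mask]   # boolean tensor, one entry per selected node
    accs.append(int(correct.sum()) / int(mask.sum()))

print(preds[train_mask].tolist())  # [1, 0, 1, 1]
print(accs[0])                     # 0.25
```

Boolean indexing keeps only the positions where the mask is True, so the same preds tensor yields three different accuracies without ever copying or splitting the graph.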

6. Start training:

python
start_time = time.time()
for epoch in range(101):
    loss = train()
    train_acc, val_acc, test_acc = test()
    if epoch % 10 == 0:
        print('Epoch#{:03d},Loss:{:.4f},Train_Accuracy:{:.4f},Val_Accuracy:{:.4f},Test_Accuracy:{:.4f}'.format(epoch, loss, train_acc, val_acc, test_acc))

end_time = time.time()
elapsed_time = end_time - start_time
print(f'Elapsed_time:{elapsed_time}  seconds')

Output:

Epoch#000,Loss:1.0993,Train_Accuracy:0.6333,Val_Accuracy:0.6020,Test_Accuracy:0.6020
Epoch#010,Loss:0.9639,Train_Accuracy:0.9167,Val_Accuracy:0.7160,Test_Accuracy:0.7180
Epoch#020,Loss:0.7399,Train_Accuracy:0.9500,Val_Accuracy:0.7520,Test_Accuracy:0.7430
Epoch#030,Loss:0.4626,Train_Accuracy:0.9500,Val_Accuracy:0.7600,Test_Accuracy:0.7520
Epoch#040,Loss:0.3323,Train_Accuracy:0.9500,Val_Accuracy:0.7660,Test_Accuracy:0.7640
Epoch#050,Loss:0.2175,Train_Accuracy:0.9833,Val_Accuracy:0.7820,Test_Accuracy:0.7710
Epoch#060,Loss:0.1686,Train_Accuracy:1.0000,Val_Accuracy:0.7840,Test_Accuracy:0.7740
Epoch#070,Loss:0.1014,Train_Accuracy:1.0000,Val_Accuracy:0.7820,Test_Accuracy:0.7730
Epoch#080,Loss:0.0863,Train_Accuracy:1.0000,Val_Accuracy:0.7880,Test_Accuracy:0.7810
Epoch#090,Loss:0.0832,Train_Accuracy:1.0000,Val_Accuracy:0.7760,Test_Accuracy:0.7780
Epoch#100,Loss:0.0750,Train_Accuracy:1.0000,Val_Accuracy:0.7800,Test_Accuracy:0.7840
Elapsed_time:1.9149479866027832 seconds
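The log reports validation and test accuracy side by side, and it is tempting to quote the highest test number. A common convention, which is an addition here and not something the original script does, is to pick the epoch with the best validation accuracy and report the test accuracy at that epoch. A minimal sketch over the logged values above follows; only every tenth epoch was printed, so the true best epoch may lie in between.

```python
# (epoch, val_acc, test_acc) copied from the training log above.
log = [
    (0, 0.6020, 0.6020), (10, 0.7160, 0.7180), (20, 0.7520, 0.7430),
    (30, 0.7600, 0.7520), (40, 0.7660, 0.7640), (50, 0.7820, 0.7710),
    (60, 0.7840, 0.7740), (70, 0.7820, 0.7730), (80, 0.7880, 0.7810),
    (90, 0.7760, 0.7780), (100, 0.7800, 0.7840),
]

# Select the logged epoch with the highest validation accuracy.
best_epoch, best_val, test_at_best = max(log, key=lambda row: row[1])
print(f'Best val accuracy {best_val:.4f} at epoch {best_epoch}, test accuracy {test_at_best:.4f}')
# Best val accuracy 0.7880 at epoch 80, test accuracy 0.7810
```

The same log also shows training accuracy reaching 1.0000 by epoch 60 while validation accuracy stays near 0.78, which is the expected gap when only 60 nodes carry labels.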
