Graph Neural Networks: PubMed

1. Imports:

```python
import time

import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
import matplotlib.pyplot as plt
import torch.nn as nn
from torch_geometric.nn import GCNConv
import torch.nn.functional as F
```

2. Load the dataset

```python
dataset = Planetoid(root='data/Planetoid', name='PubMed', transform=NormalizeFeatures())
```

Inspect the dataset

```python
print(f'Dataset: {dataset}')
print('==================')
print(f'Number of graphs:{len(dataset)}')
print(f'Number of features:{dataset.num_features}')
print(f'Number of classes:{dataset.num_classes}')

data = dataset[0]

print()
# Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], train_mask=[19717], val_mask=[19717], test_mask=[19717])
print(data)
print('=====================================================')
print(f'Number of nodes:{data.num_nodes}')
print(f'Number of edges:{data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops:{data.has_self_loops()}')
print(f'Is undirected:{data.is_undirected()}')

# train_mask holds True/False entries; True marks a training node.
# data.train_mask.sum() therefore counts how many nodes belong to the training set.
print(int(data.train_mask.sum()))
print(int(data.val_mask.sum()))
print(int(data.test_mask.sum()))
print('===========================')
```

Output:

```
Dataset: PubMed()
Number of graphs:1
Number of features:500
Number of classes:3
Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], train_mask=[19717], val_mask=[19717], test_mask=[19717])
Number of nodes:19717
Number of edges:88648
Average node degree: 4.50
Number of training nodes: 60
Training node label rate: 0.00
Has isolated nodes: False
Has self-loops:False
Is undirected:True
60
500
1000
```
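One detail worth noting in the statistics above: PyG stores an undirected graph as two directed edges per link, so `num_edges` counts every link twice, and the average degree printed above is simply 88648 / 19717 ≈ 4.50. A minimal sketch on a hypothetical 3-node triangle:

```python
import torch

# A hypothetical undirected triangle on 3 nodes: each link appears twice
# in edge_index (once per direction), just as in the PubMed data object.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 0],
                           [1, 0, 2, 1, 0, 2]])
num_nodes = 3
num_edges = edge_index.size(1)  # 6 directed edges = 3 undirected links
print(num_edges / num_nodes)    # average degree: 2.0
```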

3. Count the classes and plot them

```python
# Count the number of samples in each class.
# For PubMed, y=[19717]; torch.unique finds the distinct labels in y
# (i.e. deduplicates it) and, with return_counts=True, also returns how
# often each distinct label appears.
# Example: data.y = torch.tensor([0, 1, 0, 2, 1, 2, 2, 0])
# gives  unique_labels = tensor([0, 1, 2])
# and    counts        = tensor([3, 2, 3])
unique_labels, counts = torch.unique(data.y, return_counts=True)

# Print the size of each class
for label, count in zip(unique_labels, counts):
    print(f'class {label.item()}: {count.item()} samples')

# Visualize as a bar chart
plt.bar(unique_labels.numpy(), counts.numpy())
plt.xlabel('labels')
plt.ylabel('counts')

# Add a numeric label above each bar
for label, count in zip(unique_labels, counts):
    plt.text(label.item(), count.item(), str(count.item()), ha='center', va='bottom')
# plt.show()
```
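The `torch.unique` toy example from the comments above can be checked directly:

```python
import torch

# Deduplicate the labels and count how often each one occurs;
# torch.unique returns the distinct values in sorted order.
y = torch.tensor([0, 1, 0, 2, 1, 2, 2, 0])
unique_labels, counts = torch.unique(y, return_counts=True)
print(unique_labels)  # tensor([0, 1, 2])
print(counts)         # tensor([3, 2, 3])
```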

4. Define the model:

```python
class GCN(nn.Module):
    def __init__(self, output_channels=3):
        super(GCN, self).__init__()

        self.conv1 = GCNConv(500, 32)
        self.conv2 = GCNConv(32, output_channels)

        self.dp = nn.Dropout(0.5)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.dp(x)
        x = self.conv2(x, edge_index)
        return x
```
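For intuition, each GCNConv layer implements (up to the learned bias) the propagation rule X' = D̂^(-1/2) Â D̂^(-1/2) X W from Kipf & Welling, where Â = A + I adds self-loops. A dense pure-torch sketch of one such layer on a hypothetical 4-node ring graph:

```python
import torch

# Dense sketch of one GCN layer: X' = D^{-1/2} (A + I) D^{-1/2} X W
# Hypothetical 4-node ring, 3 input features, 2 output channels.
torch.manual_seed(0)
A = torch.tensor([[0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
A_hat = A + torch.eye(4)                  # add self-loops
deg = A_hat.sum(dim=1)                    # node degrees (all 3 here)
D_inv_sqrt = torch.diag(deg.pow(-0.5))    # D^{-1/2}
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
X = torch.randn(4, 3)
W = torch.randn(3, 2)
out = A_norm @ X @ W                      # one GCN layer (no bias)
print(out.shape)  # torch.Size([4, 2])
```

The sparse message-passing implementation inside `GCNConv` computes the same quantity without materializing the dense adjacency matrix.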

5. Define the training and test functions

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(reduction='mean').to(device)
data = data.to(device)

def train():
    model.train()  # set training mode
    optimizer.zero_grad()
    train_outputs = model(data.x, data.edge_index)
    loss = criterion(train_outputs[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss

def test():
    model.eval()  # set evaluation mode
    outputs = model(data.x, data.edge_index)
    # Predicted class = argmax over the class dimension
    preds = outputs.argmax(dim=1)
    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        correct = preds[mask] == data.y[mask]
        accs.append(int(correct.sum()) / int(mask.sum()))
    return accs
```

A small aside: how should we understand `test()` above?

Take a toy example:

```python
data = {
    'x': torch.randn(10, 5),  # 10 nodes, 5 features each
    'edge_index': torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                                [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]]),  # 10 edges forming a ring
    'y': torch.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]),  # labels for the 10 nodes
    'train_mask': torch.tensor([True, True, True, True, False, False, False, False, False, False]),  # first 4 nodes: training set
    'val_mask': torch.tensor([False, False, False, False, True, True, True, False, False, False]),   # middle 3 nodes: validation set
    'test_mask': torch.tensor([False, False, False, False, False, False, False, True, True, True]),  # last 3 nodes: test set
}
# Note that train_mask, val_mask and test_mask all have the same shape;
# they differ only in where the True entries sit.
```

Suppose `outputs = model(data.x, data.edge_index)` returns the following (10, 2) tensor of per-node class scores:

```python
outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7],
                        [0.6, 0.4], [0.5, 0.5], [0.9, 0.1], [0.2, 0.8],
                        [0.7, 0.3], [0.4, 0.6]])
```

After `preds = outputs.argmax(dim=1)` we get

preds = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]

For the training mask:

preds      = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]

train_mask = tensor([True, True, True, True, False, False, False, False, False, False])

data.y     = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

`preds[mask]` selects the entries of `preds` that belong to the training set, and `data['y'][mask]` likewise selects the corresponding training labels: the mask picks out the first 4 predictions [1, 0, 1, 1] and the first 4 labels [0, 1, 0, 1].

preds[mask] = [1, 0, 1, 1]    data['y'][mask] = [0, 1, 0, 1]

Then `correct = preds[mask] == data.y[mask]` gives

correct = [0, 0, 0, 1]

so `correct.sum() = 1` while `mask.sum() = 4`, and the function runs `accs.append(1/4)`.

The validation and test sets are then handled the same way in turn.
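The worked example can be run directly; a self-contained sketch of the argmax-plus-mask accuracy computation:

```python
import torch

# Toy version of test(): argmax predictions, boolean-mask selection,
# and accuracy on the masked subset.
outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7],
                        [0.6, 0.4], [0.5, 0.5], [0.9, 0.1], [0.2, 0.8],
                        [0.7, 0.3], [0.4, 0.6]])
y = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
train_mask = torch.tensor([True] * 4 + [False] * 6)

preds = outputs.argmax(dim=1)                  # [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
correct = preds[train_mask] == y[train_mask]   # [False, False, False, True]
acc = int(correct.sum()) / int(train_mask.sum())
print(acc)  # 0.25
```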

6. Train:

```python
start_time = time.time()
for epoch in range(101):
    loss = train()
    train_acc, val_acc, test_acc = test()
    if epoch % 10 == 0:
        print('Epoch#{:03d},Loss:{:.4f},Train_Accuracy:{:.4f},Val_Accuracy:{:.4f},Test_Accuracy:{:.4f}'.format(
            epoch, loss, train_acc, val_acc, test_acc))

end_time = time.time()
elapsed_time = end_time - start_time
print(f'Elapsed_time:{elapsed_time} seconds')
```

Output:

```
Epoch#000,Loss:1.0993,Train_Accuracy:0.6333,Val_Accuracy:0.6020,Test_Accuracy:0.6020
Epoch#010,Loss:0.9639,Train_Accuracy:0.9167,Val_Accuracy:0.7160,Test_Accuracy:0.7180
Epoch#020,Loss:0.7399,Train_Accuracy:0.9500,Val_Accuracy:0.7520,Test_Accuracy:0.7430
Epoch#030,Loss:0.4626,Train_Accuracy:0.9500,Val_Accuracy:0.7600,Test_Accuracy:0.7520
Epoch#040,Loss:0.3323,Train_Accuracy:0.9500,Val_Accuracy:0.7660,Test_Accuracy:0.7640
Epoch#050,Loss:0.2175,Train_Accuracy:0.9833,Val_Accuracy:0.7820,Test_Accuracy:0.7710
Epoch#060,Loss:0.1686,Train_Accuracy:1.0000,Val_Accuracy:0.7840,Test_Accuracy:0.7740
Epoch#070,Loss:0.1014,Train_Accuracy:1.0000,Val_Accuracy:0.7820,Test_Accuracy:0.7730
Epoch#080,Loss:0.0863,Train_Accuracy:1.0000,Val_Accuracy:0.7880,Test_Accuracy:0.7810
Epoch#090,Loss:0.0832,Train_Accuracy:1.0000,Val_Accuracy:0.7760,Test_Accuracy:0.7780
Epoch#100,Loss:0.0750,Train_Accuracy:1.0000,Val_Accuracy:0.7800,Test_Accuracy:0.7840
Elapsed_time:1.9149479866027832 seconds
```
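As an aside, `time.perf_counter()` is usually a better stdlib choice than `time.time()` for measuring an elapsed interval, since it uses the highest-resolution monotonic clock available; a minimal sketch:

```python
import time

# perf_counter() is monotonic and high-resolution, so differences
# between two calls reliably measure elapsed wall-clock time.
start = time.perf_counter()
total = sum(range(1_000_000))  # some work to time
elapsed = time.perf_counter() - start
print(f'Elapsed_time:{elapsed:.6f} seconds')
```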
