PubMed
- 1. Imports
- 2. Loading the dataset
- 3. Inspecting the labels and plotting the distribution
- 4. Defining the model
- 5. Defining the training and test functions
- 6. Training
1. Imports:

```python
import time
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
import matplotlib.pyplot as plt
import torch.nn as nn
from torch_geometric.nn import GCNConv
import torch.nn.functional as F
```
2. Loading the dataset

```python
dataset = Planetoid(root='data/Planetoid', name='PubMed', transform=NormalizeFeatures())
```
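As a rough sketch of what the `NormalizeFeatures` transform does (in plain `torch`, ignoring the zero-row guard the real transform has): it row-normalizes each node's feature vector so its entries sum to 1.

```python
import torch

# Hypothetical toy feature matrix: 3 nodes, 4 features each
x = torch.tensor([[1.0, 1.0, 2.0, 0.0],
                  [0.0, 0.0, 5.0, 5.0],
                  [4.0, 0.0, 0.0, 0.0]])

# Row-normalize: divide each row by its sum
x_norm = x / x.sum(dim=-1, keepdim=True)

print(x_norm.sum(dim=-1))  # each row now sums to 1.0
```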
Inspecting the dataset:

```python
print(f'Dataset: {dataset}')
print('==================')
print(f'Number of graphs:{len(dataset)}')
print(f'Number of features:{dataset.num_features}')
print(f'Number of classes:{dataset.num_classes}')
data = dataset[0]
print()
# Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], train_mask=[19717], val_mask=[19717], test_mask=[19717])
print(data)
print('=====================================================')
print(f'Number of nodes:{data.num_nodes}')
print(f'Number of edges:{data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops:{data.has_self_loops()}')
print(f'Is undirected:{data.is_undirected()}')
# train_mask holds True/False entries; True marks a training node.
# data.train_mask.sum() therefore counts the nodes in the training set.
print(int(data.train_mask.sum()))
print(int(data.val_mask.sum()))
print(int(data.test_mask.sum()))
print('===========================')
```
Output:

```
Dataset: PubMed()
==================
Number of graphs:1
Number of features:500
Number of classes:3

Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], train_mask=[19717], val_mask=[19717], test_mask=[19717])
=====================================================
Number of nodes:19717
Number of edges:88648
Average node degree: 4.50
Number of training nodes: 60
Training node label rate: 0.00
Has isolated nodes: False
Has self-loops:False
Is undirected:True
60
500
1000
===========================
```
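The three masks all have one entry per node (19717 here) and partition the nodes into 60 training, 500 validation, and 1000 test nodes. A minimal sketch with a toy 10-node graph shows how summing a boolean mask counts its `True` entries:

```python
import torch

num_nodes = 10
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[:4] = True   # first 4 nodes: training set
val_mask[4:7] = True    # next 3 nodes: validation set
test_mask[7:] = True    # last 3 nodes: test set

# The splits are disjoint; sum() over a bool tensor counts the True entries
print(int(train_mask.sum()), int(val_mask.sum()), int(test_mask.sum()))  # 4 3 3
```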
3. Inspecting the labels and plotting the distribution

```python
# Count the number of samples per class.
# For PubMed, y has shape [19717]; torch.unique with return_counts=True
# returns the unique (deduplicated) labels in y together with how often
# each one occurs.
# Example: data.y = torch.tensor([0, 1, 0, 2, 1, 2, 2, 0])
#   unique_labels = tensor([0, 1, 2])
#   counts        = tensor([3, 2, 3])
unique_labels, counts = torch.unique(data.y, return_counts=True)
# Print the number of samples per class
for label, count in zip(unique_labels, counts):
    print(f'Class {label.item()}: {count.item()} samples')
# Visualize with a bar chart
plt.bar(unique_labels.numpy(), counts.numpy())
plt.xlabel('labels')
plt.ylabel('counts')
# Add a count label above each bar
for label, count in zip(unique_labels, counts):
    plt.text(label.item(), count.item(), str(count.item()), ha='center', va='bottom')
# plt.show()
```
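The `torch.unique(..., return_counts=True)` call can be verified directly on the toy tensor from the comment above:

```python
import torch

y = torch.tensor([0, 1, 0, 2, 1, 2, 2, 0])
unique_labels, counts = torch.unique(y, return_counts=True)
print(unique_labels)  # tensor([0, 1, 2])
print(counts)         # tensor([3, 2, 3])
```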
4. Defining the model:

```python
class GCN(nn.Module):
    def __init__(self, output_channels=3):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(500, 32)
        self.conv2 = GCNConv(32, output_channels)
        self.dp = nn.Dropout(0.5)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.dp(x)
        x = self.conv2(x, edge_index)
        return x
```
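Under the hood, each `GCNConv` layer computes roughly `D^{-1/2} (A + I) D^{-1/2} X W` (normalized neighborhood aggregation followed by a linear map). A minimal dense sketch in plain `torch` — the toy graph and weights are illustrative, not PyG's actual sparse implementation:

```python
import torch

# Toy undirected graph: 3 nodes, edges 0-1 and 1-2 (dense adjacency)
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
X = torch.randn(3, 5)   # 5 input features per node
W = torch.randn(5, 2)   # weight matrix mapping 5 features -> 2

A_hat = A + torch.eye(3)                 # add self-loops
deg = A_hat.sum(dim=1)                   # degrees including self-loops
D_inv_sqrt = torch.diag(deg.pow(-0.5))   # D^{-1/2}
out = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W
print(out.shape)  # torch.Size([3, 2])
```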
5. Defining the training and test functions

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(reduction='mean').to(device)
data = data.to(device)

def train():
    model.train()  # switch to training mode
    optimizer.zero_grad()
    train_outputs = model(data.x, data.edge_index)
    # Only the labeled training nodes contribute to the loss
    loss = criterion(train_outputs[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss

@torch.no_grad()
def test():
    model.eval()  # switch to evaluation mode
    outputs = model(data.x, data.edge_index)
    # Predicted class = index of the largest logit
    preds = outputs.argmax(dim=1)
    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        correct = preds[mask] == data.y[mask]
        accs.append(int(correct.sum()) / int(mask.sum()))
    return accs
```
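The masking in `train()` restricts the loss to the labeled training nodes only. A small check with toy logits and labels (plain `torch`, values chosen for illustration) confirms that `CrossEntropyLoss` over the masked rows is identical to computing it on those rows selected by hand:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction='mean')

logits = torch.tensor([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 0.0]])
y = torch.tensor([0, 1, 0, 1])
mask = torch.tensor([True, True, False, False])  # only the first two nodes are labeled

masked_loss = criterion(logits[mask], y[mask])
manual_loss = criterion(logits[:2], y[:2])  # same rows, selected explicitly
print(torch.allclose(masked_loss, manual_loss))  # True
```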
A quick intuition:

How should we read the test() function above? Take a toy example:

```python
data = {
    'x': torch.randn(10, 5),  # 10 nodes, 5 features each
    'edge_index': torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                                [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]]),  # 10 edges forming a ring
    'y': torch.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]),  # labels of the 10 nodes
    'train_mask': torch.tensor([True, True, True, True, False, False, False, False, False, False]),  # first 4 nodes: training set
    'val_mask': torch.tensor([False, False, False, False, True, True, True, False, False, False]),  # next 3 nodes: validation set
    'test_mask': torch.tensor([False, False, False, False, False, False, False, True, True, True]),  # last 3 nodes: test set
}
```

Note the shapes of train_mask, val_mask and test_mask: all three have one entry per node, which is why they are the same size, and the splits differ only in where the True/False values sit.

Suppose outputs = model(data.x, data.edge_index) returns:

```python
torch.tensor([
    [0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4],
    [0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]
])  # a (10, 2) tensor of per-node class scores
```

After preds = outputs.argmax(dim=1), we get:

preds = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]

For the training set:

preds      = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
train_mask = [True, True, True, True, False, False, False, False, False, False]
data.y     = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

preds[mask] selects the predictions for the training nodes (the first 4 entries of preds, [1, 0, 1, 1]), and data.y[mask] likewise selects the matching training labels (the first 4 entries of y, [0, 1, 0, 1]):

preds[mask] = [1, 0, 1, 1]    data.y[mask] = [0, 1, 0, 1]

Then correct = preds[mask] == data.y[mask] gives:

correct = [0, 0, 0, 1]

so correct.sum() = 1 while mask.sum() = 4, and accs.append(1/4) records a training accuracy of 0.25. The validation and test sets are then handled the same way.
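The walk-through above can be run directly as a self-contained snippet:

```python
import torch

preds = torch.tensor([1, 0, 1, 1, 0, 0, 0, 1, 0, 1])
y     = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
train_mask = torch.tensor([True, True, True, True,
                           False, False, False, False, False, False])

correct = preds[train_mask] == y[train_mask]  # tensor([False, False, False, True])
acc = int(correct.sum()) / int(train_mask.sum())
print(acc)  # 0.25
```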
6. Training:

```python
start_time = time.time()
for epoch in range(101):
    loss = train()
    train_acc, val_acc, test_acc = test()
    if epoch % 10 == 0:
        print('Epoch#{:03d},Loss:{:.4f},Train_Accuracy:{:.4f},Val_Accuracy:{:.4f},Test_Accuracy:{:.4f}'.format(epoch, loss, train_acc, val_acc, test_acc))
end_time = time.time()
elapsed_time = end_time - start_time
print(f'Elapsed_time:{elapsed_time} seconds')
```
Output:

```
Epoch#000,Loss:1.0993,Train_Accuracy:0.6333,Val_Accuracy:0.6020,Test_Accuracy:0.6020
Epoch#010,Loss:0.9639,Train_Accuracy:0.9167,Val_Accuracy:0.7160,Test_Accuracy:0.7180
Epoch#020,Loss:0.7399,Train_Accuracy:0.9500,Val_Accuracy:0.7520,Test_Accuracy:0.7430
Epoch#030,Loss:0.4626,Train_Accuracy:0.9500,Val_Accuracy:0.7600,Test_Accuracy:0.7520
Epoch#040,Loss:0.3323,Train_Accuracy:0.9500,Val_Accuracy:0.7660,Test_Accuracy:0.7640
Epoch#050,Loss:0.2175,Train_Accuracy:0.9833,Val_Accuracy:0.7820,Test_Accuracy:0.7710
Epoch#060,Loss:0.1686,Train_Accuracy:1.0000,Val_Accuracy:0.7840,Test_Accuracy:0.7740
Epoch#070,Loss:0.1014,Train_Accuracy:1.0000,Val_Accuracy:0.7820,Test_Accuracy:0.7730
Epoch#080,Loss:0.0863,Train_Accuracy:1.0000,Val_Accuracy:0.7880,Test_Accuracy:0.7810
Epoch#090,Loss:0.0832,Train_Accuracy:1.0000,Val_Accuracy:0.7760,Test_Accuracy:0.7780
Epoch#100,Loss:0.0750,Train_Accuracy:1.0000,Val_Accuracy:0.7800,Test_Accuracy:0.7840
Elapsed_time:1.9149479866027832 seconds
```