Loading Graph Datasets

This post is based on the official documentation:

https://pytorch-geometric.readthedocs.io/en/latest/modules/loader.html

The torch_geometric.loader module provides a number of different ways to load graph datasets.

This post focuses on the difference between DenseDataLoader and DataLoader.

Difference between DenseDataLoader and DataLoader in PyTorch Geometric

1. DenseDataLoader:

  • Purpose: Specifically designed for loading batches of dense graph data where all graph attributes have the same shape.

  • Stacking: Stacks all graph attributes in a new dimension, which means that all graph data needs to be dense (i.e., have the same shape).

  • Use Case: Ideal for situations where the adjacency matrices and feature matrices of graphs in the dataset are of consistent size and can be stacked without any padding or truncation.

  • Implementation: Uses a custom collate_fn that stacks all attributes of the graph data objects along a new dimension, which is only suitable for dense graph data (a sketch of this function follows after this list).

    class DenseDataLoader(torch.utils.data.DataLoader):
        ...
        def __init__(self, dataset: Union[Dataset, List[Data]],
                     batch_size: int = 1, shuffle: bool = False, **kwargs):
            # Drop any user-supplied collate function: the module-level
            # `collate_fn`, which stacks every attribute along a new batch
            # dimension, is always used instead.
            kwargs.pop('collate_fn', None)
            super().__init__(dataset, batch_size=batch_size, shuffle=shuffle,
                             collate_fn=collate_fn, **kwargs)

2. DataLoader:

  • Purpose: General-purpose data loader for PyTorch Geometric datasets. It can handle both homogeneous and heterogeneous graph data.

  • Flexibility: Can handle varying graph sizes and structures by merging data objects into mini-batches. Suitable for heterogeneous data where graph attributes may differ in shape and size.

  • Collate Function: Uses a custom collate class, Collater, which can handle different types of data elements (e.g., BaseData, Tensor, etc.) and manages the complexity of heterogeneous graph data.

  • Use Case: Ideal for most graph data scenarios, especially when graphs vary in size and shape, and when working with both homogeneous and heterogeneous data.

    class DataLoader(torch.utils.data.DataLoader):
        ...
        def __init__(
            self,
            dataset: Union[Dataset, Sequence[BaseData], DatasetAdapter],
            batch_size: int = 1,
            shuffle: bool = False,
            follow_batch: Optional[List[str]] = None,
            exclude_keys: Optional[List[str]] = None,
            **kwargs,
        ):
            kwargs.pop('collate_fn', None)  # Drop any user-supplied collate function; Collater is always used
            super().__init__(
                dataset,
                batch_size,
                shuffle,
                collate_fn=Collater(dataset, follow_batch, exclude_keys),
                **kwargs,
            )
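
As promised under item 1, here is a minimal sketch of the module-level collate_fn that DenseDataLoader plugs in. It is essentially a thin wrapper around PyTorch's default_collate; the sketch is simplified from the actual PyG source, which may differ slightly between versions:

    from typing import List

    from torch.utils.data.dataloader import default_collate

    from torch_geometric.data import Batch, Data


    def collate_fn(data_list: List[Data]) -> Batch:
        # All graphs are assumed to share the same attribute shapes, so every
        # attribute (x, adj, mask, y, ...) can simply be stacked along a new
        # leading batch dimension by default_collate.
        batch = Batch()
        for key in data_list[0].keys():  # note: `keys` is a property, not a method, in older PyG
            batch[key] = default_collate([data[key] for data in data_list])
        return batch

Because everything is stacked, every attribute of the resulting batch has a leading batch_size dimension, which is what PyG's dense layers (e.g., dense_diff_pool) expect.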

Key Differences

  1. Data Shape Consistency:

    • DenseDataLoader: Requires all graph attributes to have the same shape.
    • DataLoader: Can handle variable graph sizes and shapes.
  2. Batching Mechanism:

    • DenseDataLoader: Stacks all attributes into a new dimension, suitable for dense data.
    • DataLoader: Uses the Collater class to handle complex data batching, suitable for heterogeneous and variable-sized graph data (see the sketch after this list).
  3. Use Cases:

    • DenseDataLoader: Best for datasets with consistent graph sizes and shapes.
    • DataLoader: Best for general-purpose graph data loading, especially with varying graph structures.
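
To make the second point concrete: for a list of homogeneous Data objects, the Collater used by DataLoader ultimately merges them via Batch.from_data_list. The helper below, simple_collate, is a hypothetical stand-in used purely for illustration, not PyG's actual Collater class:

    from typing import List

    from torch_geometric.data import Batch, Data


    def simple_collate(data_list: List[Data]) -> Batch:
        # Batch.from_data_list concatenates node features and edge indices
        # across graphs, offsets edge_index so the merged graph stays valid,
        # and records the node-to-graph assignment in batch.batch.
        return Batch.from_data_list(data_list)

This is why DataLoader has no trouble with graphs of different sizes: nothing is stacked; everything is concatenated into one large, disconnected graph.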

Practical Example of Each Loader

DenseDataLoader Example:

import torch_geometric.transforms as T
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DenseDataLoader

# ENZYMES graphs have varying numbers of nodes, so they must first be padded
# to a common size. T.ToDense(max_nodes) converts each graph to a dense
# adjacency matrix `adj` (plus a node `mask`) padded to `max_nodes` nodes,
# and the pre_filter drops any graph larger than that cap.
max_nodes = 126  # upper bound on graph size; larger graphs are filtered out
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES',
                    transform=T.ToDense(max_nodes),
                    pre_filter=lambda data: data.num_nodes <= max_nodes)

# Create a DenseDataLoader
loader = DenseDataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    print(batch)
    break
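
With the ToDense transform applied as above, every attribute in the batch is stacked along a new leading dimension: roughly, batch.x has shape [32, max_nodes, num_node_features], batch.adj has shape [32, max_nodes, max_nodes], and batch.mask marks which of the padded node slots correspond to real nodes (the exact set of attributes depends on the dataset and transform).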

DataLoader Example:

from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

# Load the same dataset; DataLoader copes with graphs of varying sizes without padding
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')

# Create a DataLoader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    print(batch)
    break
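
Here the printed batch is a single large, disconnected graph rather than a stack: batch.x has shape [total_nodes_in_batch, num_node_features], batch.edge_index has shape [2, total_edges_in_batch], batch.batch maps each node to the graph it came from, and batch.num_graphs is 32 for all but possibly the last mini-batch.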

Summary

  • Use DenseDataLoader when all graphs in the dataset have the same size and shape, or have been padded to a common size (e.g., via T.ToDense).
  • Use DataLoader for more flexible and general-purpose graph data loading, especially when dealing with variable graph structures.