Importing the Dgraph example dataset

Data preparation

Download location: github.com/hypermodein...

Download 1million.rdf.gz and 1million.schema.

Clearing old data

Since this is for learning purposes, wipe all previous data first:
curl --location 'localhost:8080/alter' \
--header 'Content-Type: application/json' \
--data '{"drop_all": true}'
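For scripting, the same drop-all call can be assembled with Python's standard library. A minimal sketch; the request is only constructed here, since actually sending it requires a running alpha at localhost:8080:

```python
import json
import urllib.request

def make_drop_all_request(host="localhost:8080"):
    """Build the POST /alter request that wipes all data and schema."""
    payload = json.dumps({"drop_all": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/alter",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_drop_all_request()
# To actually send it: urllib.request.urlopen(req)
```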

Data import

Download the Dgraph source code and build it.

Live import

dgraph live -f 1million.rdf.gz --schema 1million.schema --alpha localhost:9080 --zero localhost:5080

Offline (bulk) import

Run:

.\dgraph.exe bulk -f .\1million.rdf.gz --schema 1million.schema --zero localhost:5080

Result

I0704 11:11:44.465439    8408 init.go:68] 

Dgraph version   : dev
Dgraph codename  :
Dgraph SHA-256   : 3634731c41fb274ea640f9477985f0e79a1ddfd5ca28a7707b6694c1e7000c7c
Commit SHA-1     :
Commit timestamp :
Branch           :
Go version       : go1.24.4
jemalloc enabled : false

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed under the Apache Public License 2.0.
© Hypermode Inc.


Encrypted input: false; Encrypted output: false
{
        "DataFiles": ".\\1million.rdf.gz",
        "DataFormat": "",
        "SchemaFile": "1million.schema",
        "GqlSchemaFile": "",
        "OutDir": "./out",
        "ReplaceOutDir": false,
        "TmpDir": "tmp",
        "NumGoroutines": 4,
        "MapBufSize": 2147483648,
        "PartitionBufSize": 4194304,
        "SkipMapPhase": false,
        "CleanupTmp": true,
        "NumReducers": 1,
        "Version": false,
        "StoreXids": false,
        "ZeroAddr": "localhost:5080",
        "ConnStr": "",
        "HttpAddr": "localhost:8080",
        "IgnoreErrors": false,
        "CustomTokenizers": "",
        "NewUids": false,
        "ClientDir": "",
        "Encrypted": false,
        "EncryptedOut": false,
        "MapShards": 1,
        "ReduceShards": 1,
        "Namespace": 18446744073709551615,
        "EncryptionKey": null,
        "Badger": {
                "Dir": "",
                "ValueDir": "",
                "SyncWrites": false,
                "NumVersionsToKeep": 1,
                "ReadOnly": false,
                "Logger": {},
                "Compression": 1,
                "InMemory": false,
                "MetricsEnabled": true,
                "NumGoroutines": 8,
                "MemTableSize": 67108864,
                "BaseTableSize": 2097152,
                "BaseLevelSize": 10485760,
                "LevelSizeMultiplier": 10,
                "TableSizeMultiplier": 2,
                "MaxLevels": 7,
                "VLogPercentile": 0,
                "ValueThreshold": 1048576,
                "NumMemtables": 5,
                "BlockSize": 4096,
                "BloomFalsePositive": 0.01,
                "BlockCacheSize": 20132659,
                "IndexCacheSize": 46976204,
                "NumLevelZeroTables": 5,
                "NumLevelZeroTablesStall": 15,
                "ValueLogFileSize": 1073741823,
                "ValueLogMaxEntries": 1000000,
                "NumCompactors": 4,
                "CompactL0OnClose": false,
                "LmaxCompaction": false,
                "ZSTDCompressionLevel": 0,
                "VerifyValueChecksum": false,
                "EncryptionKey": "",
                "EncryptionKeyRotationDuration": 864000000000000,
                "BypassLockGuard": false,
                "ChecksumVerificationMode": 0,
                "DetectConflicts": true,
                "NamespaceOffset": -1,
                "ExternalMagicVersion": 0
        }
}
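A few of the buffer sizes in the dump above, converted to human-readable units (illustrative arithmetic only):

```python
# Selected sizes from the bulk loader config dump, in bytes.
MAP_BUF_SIZE = 2147483648        # "MapBufSize": map-phase buffer
MEM_TABLE_SIZE = 67108864        # "MemTableSize": Badger memtable
PARTITION_BUF_SIZE = 4194304     # "PartitionBufSize": partition buffer

GIB = 1024 ** 3
MIB = 1024 ** 2

print(MAP_BUF_SIZE // GIB, "GiB")       # map buffer: 2 GiB
print(MEM_TABLE_SIZE // MIB, "MiB")     # memtable: 64 MiB
print(PARTITION_BUF_SIZE // MIB, "MiB") # partition buffer: 4 MiB
```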

The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Nonfatal error: max open file limit could not be detected: Cannot detect max open files on this platform

Connecting to zero at localhost:5080
Using Go memory
Processing file (1 out of 1): .\1million.rdf.gz
[11:11:45+0800] MAP 01s nquad_count:651.3k err_count:0.000 nquad_speed:507.9k/sec edge_count:3.380M edge_speed:2.635M/sec jemalloc: 0 B 
[11:11:46+0800] MAP 02s nquad_count:1.042M err_count:0.000 nquad_speed:456.4k/sec edge_count:4.719M edge_speed:2.068M/sec jemalloc: 0 B 
Shard tmp\map_output\000 -> Reduce tmp\shards\shard_0\000
badger 2025/07/04 11:11:46 INFO: All 0 tables opened in 0s
badger 2025/07/04 11:11:46 INFO: Discard stats nextEmptySlot: 0
badger 2025/07/04 11:11:46 INFO: Set nextTxnTs to 0
badger 2025/07/04 11:11:46 INFO: All 0 tables opened in 0s
badger 2025/07/04 11:11:46 INFO: Discard stats nextEmptySlot: 0
badger 2025/07/04 11:11:46 INFO: Set nextTxnTs to 0
badger 2025/07/04 11:11:46 INFO: DropAll called. Blocking writes...
badger 2025/07/04 11:11:46 INFO: Writes flushed. Stopping compactions now...
badger 2025/07/04 11:11:46 INFO: Deleted 0 SSTables. Now deleting value logs...
badger 2025/07/04 11:11:46 INFO: Value logs deleted. Creating value log file: 1
badger 2025/07/04 11:11:46 INFO: Deleted 1 value log files. DropAll done.
Num Encoders: 4
Final Histogram of buffer sizes: 
 -- Histogram:
Min value: 241739340
Max value: 241739340
Count: 1
50p: 65536.00
75p: 65536.00
90p: 65536.00
[134217728, 268435456) 1 100.00% 100.00%
 --

[11:11:47+0800] REDUCE 03s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 230. jemalloc: 0 B 
[11:11:48+0800] REDUCE 04s 55.45% edge_count:2.617M edge_speed:2.617M/sec plist_count:354.1k plist_speed:354.1k/sec. Num Encoding MBs: 230. jemalloc: 0 B 
[11:11:49+0800] REDUCE 05s 100.00% edge_count:4.719M edge_speed:2.360M/sec plist_count:1.179M plist_speed:589.6k/sec. Num Encoding MBs: 230. jemalloc: 0 B 
Finishing stream id: 1
Finishing stream id: 2
Finishing stream id: 3
badger 2025/07/04 11:11:50 INFO: Table created: 2 at level: 6 for stream: 2. Size: 604 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 3 at level: 6 for stream: 3. Size: 473 KiB
Finishing stream id: 4
badger 2025/07/04 11:11:50 INFO: Table created: 4 at level: 6 for stream: 4. Size: 2.0 MiB
badger 2025/07/04 11:11:50 INFO: Table created: 1 at level: 6 for stream: 1. Size: 40 MiB
Finishing stream id: 5
Finishing stream id: 6
badger 2025/07/04 11:11:50 INFO: Table created: 6 at level: 6 for stream: 6. Size: 355 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 5 at level: 6 for stream: 5. Size: 3.5 MiB
Finishing stream id: 7
badger 2025/07/04 11:11:50 INFO: Table created: 7 at level: 6 for stream: 7. Size: 2.5 MiB
Finishing stream id: 8
Finishing stream id: 9
badger 2025/07/04 11:11:50 INFO: Table created: 9 at level: 6 for stream: 9. Size: 258 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 8 at level: 6 for stream: 8. Size: 3.1 MiB
Writing count index for "0-starring" rev=false
Writing count index for "0-genre" rev=false
Writing count index for "0-director.film" rev=false
Writing count index for "0-actor.film" rev=false
Writing split lists back to the main DB now
badger 2025/07/04 11:11:50 INFO: Number of ranges found: 2
badger 2025/07/04 11:11:50 INFO: Sent range 0 for iteration: [, 040000000000000000000b6467726170682e747970650202506572736f6e0000000000000001ffffffffffffd875) of size: 0 B
badger 2025/07/04 11:11:50 INFO: copying split keys to main DB Streaming about 0 B of uncompressed data (0 B on disk)
badger 2025/07/04 11:11:50 INFO: Sent range 1 for iteration: [040000000000000000000b6467726170682e747970650202506572736f6e0000000000000001ffffffffffffd875, ) of size: 0 B
badger 2025/07/04 11:11:50 INFO: copying split keys to main DB Sent data of size 1.5 MiB
badger 2025/07/04 11:11:50 INFO: Table created: 15 at level: 6 for stream: 12. Size: 84 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 14 at level: 6 for stream: 17. Size: 199 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 11 at level: 6 for stream: 11. Size: 83 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 10 at level: 6 for stream: 13. Size: 218 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 16 at level: 6 for stream: 14. Size: 97 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 13 at level: 6 for stream: 16. Size: 598 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 12 at level: 6 for stream: 10. Size: 2.3 MiB
badger 2025/07/04 11:11:50 INFO: Resuming writes
badger 2025/07/04 11:11:50 INFO: Lifetime L0 stalled for: 0s
badger 2025/07/04 11:11:50 INFO:
Level 0 [ ]: NumTables: 01. Size: 1000 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [ ]: NumTables: 16. Size: 56 MiB of 56 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 4.0 MiB
Level Done
badger 2025/07/04 11:11:50 INFO: Lifetime L0 stalled for: 0s
badger 2025/07/04 11:11:50 INFO:
Level 0 [ ]: NumTables: 01. Size: 1.5 MiB of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done
[11:11:50+0800] REDUCE 06s 100.00% edge_count:4.719M edge_speed:1.735M/sec plist_count:1.179M plist_speed:433.4k/sec. Num Encoding MBs: 0. jemalloc: 0 B
Total: 06s

This output indicates the data was imported successfully.

Loading the data

dgraph bulk does not write into a running alpha node; instead it builds the data directories offline. The built data is not placed in the default data directory but in the --out directory specified at bulk time (default ./out). I put it under the {$workspace}/build directory, so this import writes the data into the ./build/out/0/p directory.
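A small helper sketching how the data paths in this setup are derived: the `p` postings directory comes from the bulk output, and the `w` write-ahead-log directory is simply the sibling path passed to alpha (path layout assumed from this run, with a single reduce shard):

```python
import posixpath

def data_dirs(out_dir, reduce_shard=0):
    """Return the postings (p) and write-ahead-log (w) directories
    for a given bulk output directory and reduce shard index."""
    base = posixpath.join(out_dir, str(reduce_shard))
    return posixpath.join(base, "p"), posixpath.join(base, "w")

p_dir, w_dir = data_dirs("./build/out")
print(p_dir)  # ./build/out/0/p
print(w_dir)  # ./build/out/0/w
```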

So when configuring the alpha startup parameters, the arguments also need to be adjusted accordingly:

alpha --trace "jaeger=http://localhost:14268; ratio=0.99;" --security "whitelist=0.0.0.0/0;" --postings ./build/out/0/p --wal ./build/out/0/w

Querying the data

e.g.

curl --location 'http://localhost:8080/query' \
--header 'Content-Type: application/dql' \
--data '
{
  actors(func: has(starring), first: 10) {
    uid
    name
    starring {
      name
    }
  }
}
'
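The /query endpoint responds with the standard envelope `{"data": ..., "extensions": ...}`. A sketch of pulling the results out of it; the sample payload below is illustrative only, not real output from this dataset:

```python
import json

# Illustrative sample of a /query response body
# (a real response also carries an "extensions" block with metrics).
raw = """
{
  "data": {
    "actors": [
      {"uid": "0x1", "name": "Example Film", "starring": [{"name": "Performance A"}]},
      {"uid": "0x2", "name": "Another Film"}
    ]
  }
}
"""

resp = json.loads(raw)
for node in resp["data"]["actors"]:
    # "starring" is simply absent when a node has no such edge
    starring = [s.get("name") for s in node.get("starring", [])]
    print(node["uid"], node.get("name"), starring)
```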