Dgraph example data import

Data preparation

Download from: github.com/hypermodein...

Download 1million.rdf.gz and 1million.schema.

Data cleanup

Since this is for learning purposes, drop all of the existing data first:

bash
curl --location 'localhost:8080/alter' \
--header 'Content-Type: application/json' \
--data '{"drop_all": true}'

Data import

Download the Dgraph source code and build it.
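
A rough sketch of the build, assuming a recent Go toolchain; the repository URL is left as a placeholder and the make target may differ between Dgraph versions:

bash
# clone the Dgraph source (substitute the actual repository URL)
git clone <dgraph-repo-url>
cd dgraph
# build and install the dgraph binary (target name may vary by version)
make install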

Live import

bash
dgraph live -f 1million.rdf.gz --schema 1million.schema --alpha localhost:9080 --zero localhost:5080
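
The live loader streams mutations into a running cluster, so a zero and an alpha must already be up before running the command above. A minimal local setup sketch, assuming the default ports:

bash
# terminal 1: start zero (the cluster coordinator) on its default port 5080
dgraph zero --my=localhost:5080
# terminal 2: start an alpha and point it at zero (gRPC on 9080, HTTP on 8080 by default)
dgraph alpha --my=localhost:7080 --zero=localhost:5080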

Bulk (offline) import

Run:

cmd
.\live.exe bulk -f .\1million.rdf.gz --schema 1million.schema --zero localhost:5080

Result

result
I0704 11:11:44.465439    8408 init.go:68] 

Dgraph version   : dev
Dgraph codename  :
Dgraph SHA-256   : 3634731c41fb274ea640f9477985f0e79a1ddfd5ca28a7707b6694c1e7000c7c
Commit SHA-1     :
Commit timestamp :
Branch           :
Go version       : go1.24.4
jemalloc enabled : false

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed under the Apache Public License 2.0.
© Hypermode Inc.


Encrypted input: false; Encrypted output: false
{
        "DataFiles": ".\\1million.rdf.gz",
        "DataFormat": "",
        "SchemaFile": "1million.schema",
        "GqlSchemaFile": "",
        "OutDir": "./out",
        "ReplaceOutDir": false,
        "TmpDir": "tmp",
        "NumGoroutines": 4,
        "MapBufSize": 2147483648,
        "PartitionBufSize": 4194304,
        "SkipMapPhase": false,
        "CleanupTmp": true,
        "NumReducers": 1,
        "Version": false,
        "StoreXids": false,
        "ZeroAddr": "localhost:5080",
        "ConnStr": "",
        "HttpAddr": "localhost:8080",
        "IgnoreErrors": false,
        "CustomTokenizers": "",
        "NewUids": false,
        "ClientDir": "",
        "Encrypted": false,
        "EncryptedOut": false,
        "MapShards": 1,
        "ReduceShards": 1,
        "Namespace": 18446744073709551615,
        "EncryptionKey": null,
        "Badger": {
                "Dir": "",
                "ValueDir": "",
                "SyncWrites": false,
                "NumVersionsToKeep": 1,
                "ReadOnly": false,
                "Logger": {},
                "Compression": 1,
                "InMemory": false,
                "MetricsEnabled": true,
                "NumGoroutines": 8,
                "MemTableSize": 67108864,
                "BaseTableSize": 2097152,
                "BaseLevelSize": 10485760,
                "LevelSizeMultiplier": 10,
                "TableSizeMultiplier": 2,
                "MaxLevels": 7,
                "VLogPercentile": 0,
                "ValueThreshold": 1048576,
                "NumMemtables": 5,
                "BlockSize": 4096,
                "BloomFalsePositive": 0.01,
                "BlockCacheSize": 20132659,
                "IndexCacheSize": 46976204,
                "NumLevelZeroTables": 5,
                "NumLevelZeroTablesStall": 15,
                "ValueLogFileSize": 1073741823,
                "ValueLogMaxEntries": 1000000,
                "NumCompactors": 4,
                "CompactL0OnClose": false,
                "LmaxCompaction": false,
                "ZSTDCompressionLevel": 0,
                "VerifyValueChecksum": false,
                "EncryptionKey": "",
                "EncryptionKeyRotationDuration": 864000000000000,
                "BypassLockGuard": false,
                "ChecksumVerificationMode": 0,
                "DetectConflicts": true,
                "NamespaceOffset": -1,
                "ExternalMagicVersion": 0
        }
}

The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Nonfatal error: max open file limit could not be detected: Cannot detect max open files on this platform

Connecting to zero at localhost:5080
Using Go memory
Processing file (1 out of 1): .\1million.rdf.gz
[11:11:45+0800] MAP 01s nquad_count:651.3k err_count:0.000 nquad_speed:507.9k/sec edge_count:3.380M edge_speed:2.635M/sec jemalloc: 0 B 
[11:11:46+0800] MAP 02s nquad_count:1.042M err_count:0.000 nquad_speed:456.4k/sec edge_count:4.719M edge_speed:2.068M/sec jemalloc: 0 B 
Shard tmp\map_output\000 -> Reduce tmp\shards\shard_0\000
badger 2025/07/04 11:11:46 INFO: All 0 tables opened in 0s
badger 2025/07/04 11:11:46 INFO: Discard stats nextEmptySlot: 0
badger 2025/07/04 11:11:46 INFO: Set nextTxnTs to 0
badger 2025/07/04 11:11:46 INFO: All 0 tables opened in 0s
badger 2025/07/04 11:11:46 INFO: Discard stats nextEmptySlot: 0
badger 2025/07/04 11:11:46 INFO: Set nextTxnTs to 0
badger 2025/07/04 11:11:46 INFO: DropAll called. Blocking writes...
badger 2025/07/04 11:11:46 INFO: Writes flushed. Stopping compactions now...
badger 2025/07/04 11:11:46 INFO: Deleted 0 SSTables. Now deleting value logs...
badger 2025/07/04 11:11:46 INFO: Value logs deleted. Creating value log file: 1
badger 2025/07/04 11:11:46 INFO: Deleted 1 value log files. DropAll done.
Num Encoders: 4
Final Histogram of buffer sizes: 
 -- Histogram:
Min value: 241739340
Max value: 241739340
Count: 1
50p: 65536.00
75p: 65536.00
90p: 65536.00
[134217728, 268435456) 1 100.00% 100.00%
 --

[11:11:47+0800] REDUCE 03s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 230. jemalloc: 0 B 
[11:11:48+0800] REDUCE 04s 55.45% edge_count:2.617M edge_speed:2.617M/sec plist_count:354.1k plist_speed:354.1k/sec. Num Encoding MBs: 230. jemalloc: 0 B 
[11:11:49+0800] REDUCE 05s 100.00% edge_count:4.719M edge_speed:2.360M/sec plist_count:1.179M plist_speed:589.6k/sec. Num Encoding MBs: 230. jemalloc: 0 B 
Finishing stream id: 1
Finishing stream id: 2
Finishing stream id: 3
badger 2025/07/04 11:11:50 INFO: Table created: 2 at level: 6 for stream: 2. Size: 604 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 3 at level: 6 for stream: 3. Size: 473 KiB
Finishing stream id: 4
badger 2025/07/04 11:11:50 INFO: Table created: 4 at level: 6 for stream: 4. Size: 2.0 MiB
badger 2025/07/04 11:11:50 INFO: Table created: 1 at level: 6 for stream: 1. Size: 40 MiB
Finishing stream id: 5
Finishing stream id: 6
badger 2025/07/04 11:11:50 INFO: Table created: 6 at level: 6 for stream: 6. Size: 355 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 5 at level: 6 for stream: 5. Size: 3.5 MiB
Finishing stream id: 7
badger 2025/07/04 11:11:50 INFO: Table created: 7 at level: 6 for stream: 7. Size: 2.5 MiB
Finishing stream id: 8
Finishing stream id: 9
badger 2025/07/04 11:11:50 INFO: Table created: 9 at level: 6 for stream: 9. Size: 258 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 8 at level: 6 for stream: 8. Size: 3.1 MiB
Writing count index for "0-starring" rev=false
Writing count index for "0-genre" rev=false
Writing count index for "0-director.film" rev=false
Writing count index for "0-actor.film" rev=false
Writing split lists back to the main DB now
badger 2025/07/04 11:11:50 INFO: Number of ranges found: 2
badger 2025/07/04 11:11:50 INFO: Sent range 0 for iteration: [, 040000000000000000000b6467726170682e747970650202506572736f6e0000000000000001ffffffffffffd875) of size: 0 B
badger 2025/07/04 11:11:50 INFO: copying split keys to main DB Streaming about 0 B of uncompressed data (0 B on disk)
badger 2025/07/04 11:11:50 INFO: Sent range 1 for iteration: [040000000000000000000b6467726170682e747970650202506572736f6e0000000000000001ffffffffffffd875, ) of size: 0 B
badger 2025/07/04 11:11:50 INFO: copying split keys to main DB Sent data of size 1.5 MiB
badger 2025/07/04 11:11:50 INFO: Table created: 15 at level: 6 for stream: 12. Size: 84 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 14 at level: 6 for stream: 17. Size: 199 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 11 at level: 6 for stream: 11. Size: 83 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 10 at level: 6 for stream: 13. Size: 218 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 16 at level: 6 for stream: 14. Size: 97 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 13 at level: 6 for stream: 16. Size: 598 KiB
badger 2025/07/04 11:11:50 INFO: Table created: 12 at level: 6 for stream: 10. Size: 2.3 MiB
badger 2025/07/04 11:11:50 INFO: Resuming writes
badger 2025/07/04 11:11:50 INFO: Lifetime L0 stalled for: 0s
badger 2025/07/04 11:11:50 INFO:
Level 0 [ ]: NumTables: 01. Size: 1000 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [ ]: NumTables: 16. Size: 56 MiB of 56 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 4.0 MiB
Level Done
badger 2025/07/04 11:11:50 INFO: Lifetime L0 stalled for: 0s
badger 2025/07/04 11:11:50 INFO:
Level 0 [ ]: NumTables: 01. Size: 1.5 MiB of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done
[11:11:50+0800] REDUCE 06s 100.00% edge_count:4.719M edge_speed:1.735M/sec plist_count:1.179M plist_speed:433.4k/sec. Num Encoding MBs: 0. jemalloc: 0 B
Total: 06s

The output above shows that the import completed successfully.

Data loading

dgraph bulk does not write into a running alpha node; instead, it builds the data directories offline. The built data does not go into the default data directory, but into the --out directory specified when running bulk (./out by default). In my case I ran it under the {$workspace}/build directory, so this import writes the data to ./build/out/0/p.
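
Assuming a single reduce shard (ReduceShards: 1 in the output above), the layout under the output directory looks roughly like this; the w directory referenced in the alpha flags below is created by alpha itself for its write-ahead log:

build/
  out/
    0/
      p/    (postings built by dgraph bulk; alpha serves directly from this directory)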

So when configuring the alpha startup parameters, the arguments also need to be adjusted accordingly:

bash
alpha --trace "jaeger=http://localhost:14268; ratio=0.99;" --security "whitelist=0.0.0.0/0;" --postings ./build/out/0/p --wal ./build/out/0/w
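
Once alpha is up with these arguments, a quick way to confirm it is serving the bulk-loaded data is to hit its health endpoint (the exact response fields vary by version):

bash
curl localhost:8080/health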

Data query

e.g.

bash
curl --location 'http://localhost:8080/query' \
--header 'Content-Type: application/dql' \
--data '
{
  actors(func: has(starring), first: 10) {
    uid
    name
    starring {
      name
    }
  }
}
'
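
As a further sketch, the same endpoint also accepts filtered queries; genre and starring are predicates from the 1million schema, as seen in the count-index lines of the bulk output above:

bash
curl --location 'http://localhost:8080/query' \
--header 'Content-Type: application/dql' \
--data '
{
  films(func: has(genre), first: 5) @filter(has(starring)) {
    uid
    genre {
      uid
    }
  }
}
'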