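Original lab: MIT 6.824 Lab 2 Raft
Chinese translation: MIT 6.824 Lab 2 translation
Raft paper: In Search of an Understandable Consensus Algorithm
Raft paper translation: (zhuanlan.zhihu.com/p/524885008)
Introduction
Lab 2C is about implementing state persistence, so that a server that crashes can recover the state it had before the crash.
The implementation still follows Figure 2 of the Raft paper. If you run into a particularly strange bug, reread the paper first and check whether your understanding differs from what it describes.
The persist function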
go
func (rf *Raft) persist() {
	// Your code here (2C).
	// Example:
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	// The encoding order here must match the decoding order in readPersist.
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.logs)
	e.Encode(rf.lastIncludedIndex)
	e.Encode(rf.lastIncludedTerm)
	raftstate := w.Bytes()
	rf.persister.Save(raftstate, rf.snapshot)
}
The readPersist function
go
func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) <= 1 { // bootstrap without any state?
		return
	}
	// Your code here (2C).
	// Example:
	r := bytes.NewBuffer(data)
	d := labgob.NewDecoder(r)
	var currentTerm int
	var votedFor int
	var logs []LogEntry
	var lastIncludedIndex int
	var lastIncludedTerm int
	// Decode in the same order that persist() encoded the fields.
	if d.Decode(&currentTerm) != nil ||
		d.Decode(&votedFor) != nil ||
		d.Decode(&logs) != nil ||
		d.Decode(&lastIncludedIndex) != nil ||
		d.Decode(&lastIncludedTerm) != nil {
		DPrintf("Server %v readPersist failed\n", rf.me)
	} else {
		rf.currentTerm = currentTerm
		rf.votedFor = votedFor
		rf.logs = logs
		rf.lastIncludedIndex = lastIncludedIndex
		rf.lastIncludedTerm = lastIncludedTerm
		// Entries up to lastIncludedIndex are already covered by the snapshot.
		rf.commitIndex = lastIncludedIndex
		rf.lastApplied = lastIncludedIndex
		DPrintf("Server %v readPersist succeeded\n", rf.me)
	}
}
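The two functions above only work if the fields are decoded in exactly the order they were encoded. The following is a minimal sketch of that round trip. It assumes the standard library's encoding/gob in place of labgob, and the helper names encodeState and decodeState as well as the string-typed Command field are hypothetical simplifications, not part of the lab code.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// LogEntry is a simplified entry; Command is a string here to keep the sketch small.
type LogEntry struct {
	Command string
	Term    int
}

// encodeState (hypothetical helper) writes the three values in a fixed order
// and stops at the first encoding error.
func encodeState(currentTerm, votedFor int, logs []LogEntry) ([]byte, error) {
	w := new(bytes.Buffer)
	e := gob.NewEncoder(w)
	for _, v := range []interface{}{currentTerm, votedFor, logs} {
		if err := e.Encode(v); err != nil {
			return nil, err
		}
	}
	return w.Bytes(), nil
}

// decodeState (hypothetical helper) reads the values back in the same order.
func decodeState(data []byte) (currentTerm, votedFor int, logs []LogEntry, err error) {
	d := gob.NewDecoder(bytes.NewBuffer(data))
	if err = d.Decode(&currentTerm); err != nil {
		return
	}
	if err = d.Decode(&votedFor); err != nil {
		return
	}
	err = d.Decode(&logs)
	return
}

func main() {
	entries := []LogEntry{{Command: "x=1", Term: 2}, {Command: "y=2", Term: 3}}
	data, err := encodeState(3, 1, entries)
	if err != nil {
		panic(err)
	}
	term, vote, logs, err := decodeState(data)
	fmt.Println(term, vote, logs, err)
}
```

Unlike the lab skeleton, this sketch checks the error returned by every Encode call, which can make a silently failing persist easier to spot.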
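When to persist?
According to Figure 2, three pieces of state must be persisted: votedFor, currentTerm, and log[]. The simplest brute-force approach is therefore to persist immediately whenever any one of them changes.
For example, in the Start function: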
go
func (rf *Raft) Start(command interface{}) (int, int, bool) {
	// Your code here (2B).
	// Start may be called concurrently, so hold the lock to avoid races.
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if rf.state != Leader {
		return -1, -1, false
	}
	rf.logs = append(rf.logs, LogEntry{
		Command: command,
		Term:    rf.currentTerm,
	})
	rf.persist()
	rf.nextBeatTime = time.Now()
	DPrintf("new command %v\n", command)
	return rf.ToVirtualIndex(len(rf.logs) - 1), rf.currentTerm, true
}
That said, it is better to persist once, at the end of the RPC handler that made the changes.
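One way to persist once per handler is a deferred function guarded by a dirty flag. The sketch below is an illustration under assumptions, not the lab's code: the node type, its persistCalls counter, and handleAppend are hypothetical stand-ins for the Raft struct, rf.persist(), and an AppendEntries handler.

```go
package main

import (
	"fmt"
	"sync"
)

// LogEntry mirrors the entry type used in this post.
type LogEntry struct {
	Command interface{}
	Term    int
}

// node is a hypothetical, stripped-down stand-in for the Raft struct.
type node struct {
	mu           sync.Mutex
	currentTerm  int
	votedFor     int
	logs         []LogEntry
	persistCalls int // counts writes to stable storage, for illustration only
}

// persist stands in for rf.persist(); here it only counts calls.
func (n *node) persist() { n.persistCalls++ }

// handleAppend may change several persistent fields, but persists at most
// once, just before it returns and while the lock is still held.
func (n *node) handleAppend(term int, entries []LogEntry) bool {
	n.mu.Lock()
	defer n.mu.Unlock()

	dirty := false
	// Deferred calls run in LIFO order, so this runs before Unlock.
	defer func() {
		if dirty {
			n.persist()
		}
	}()

	if term < n.currentTerm {
		return false // stale request: nothing changed, nothing to persist
	}
	if term > n.currentTerm {
		n.currentTerm = term
		n.votedFor = -1
		dirty = true
	}
	if len(entries) > 0 {
		n.logs = append(n.logs, entries...)
		dirty = true
	}
	return true
}

func main() {
	n := &node{votedFor: -1}
	n.handleAppend(2, []LogEntry{{Command: "a", Term: 2}, {Command: "b", Term: 2}})
	n.handleAppend(1, []LogEntry{{Command: "c", Term: 1}}) // stale, rejected
	fmt.Println("persist calls:", n.persistCalls)
}
```

Because the deferred function runs before the handler returns, the state reaches stable storage before the reply goes back to the caller, which is the ordering Figure 2 requires.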
When to read the persisted state?
Read it at startup.
go
func Make(peers []*labrpc.ClientEnd, me int,
	persister *Persister, applyCh chan ApplyMsg) *Raft {
	//...
	rf.readPersist(persister.ReadRaftState())
	//...
}
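About the tests
The code for 2C is not very hard to write.
The hard part is passing the tests. They require fast log backtracking, and they throw out-of-order RPCs, repeated crashes and re-elections, duplicate log commits, and similar situations at the implementation.
Note: as soon as an RPC turns out to be stale, reject it without processing it any further.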
go
if args.Term < rf.currentTerm {
	DPrintf("Old Leader %v ", args.LeaderId)
	reply.Success = false
	reply.Term = rf.currentTerm
	rf.mu.Unlock()
	return
}
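The fast backtracking that the tests rely on is not shown in this post. A common way to do it is the XTerm/XIndex/XLen hint described in the 6.824 course material, sketched below under assumptions: the conflictHint type and both function names are hypothetical, logs are reduced to a slice of terms with a sentinel at index 0, and snapshot offsets (ToVirtualIndex) are ignored.

```go
package main

import "fmt"

// conflictHint is a hypothetical extension of the AppendEntries reply.
type conflictHint struct {
	XTerm  int // term of the conflicting entry, or -1 if the follower's log is too short
	XIndex int // first index at which the follower stores XTerm
	XLen   int // length of the follower's log
}

// followerHint checks the consistency condition on the follower.
// terms[i] is the term of entry i; terms[0] is a sentinel.
func followerHint(terms []int, prevLogIndex, prevLogTerm int) (bool, conflictHint) {
	h := conflictHint{XLen: len(terms)}
	if prevLogIndex >= len(terms) {
		h.XTerm = -1 // no entry at prevLogIndex at all
		return false, h
	}
	if terms[prevLogIndex] == prevLogTerm {
		return true, conflictHint{}
	}
	h.XTerm = terms[prevLogIndex]
	i := prevLogIndex
	for i > 1 && terms[i-1] == h.XTerm {
		i-- // walk back to the first entry of the conflicting term
	}
	h.XIndex = i
	return false, h
}

// leaderNextIndex picks the next index to try, skipping a whole term at once.
func leaderNextIndex(leaderTerms []int, h conflictHint) int {
	if h.XTerm == -1 {
		return h.XLen // follower's log is too short: retry from its end
	}
	for i := len(leaderTerms) - 1; i > 0; i-- {
		if leaderTerms[i] == h.XTerm {
			return i + 1 // leader has XTerm: resume just past its last entry of that term
		}
	}
	return h.XIndex // leader lacks XTerm: overwrite the follower's whole term
}

func main() {
	leader := []int{0, 4, 6, 6, 6}
	follower := []int{0, 4, 5, 5}
	ok, h := followerHint(follower, 3, 6)
	fmt.Println(ok, h, leaderNextIndex(leader, h))
}
```

With this hint the leader needs roughly one rejected AppendEntries per conflicting term rather than one per conflicting entry, which matters in the Figure 8 (unreliable) test where logs diverge widely.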
Figure 8 as described in the paper

Results
sql
Test (2C): basic persistence ...
... Passed -- 3.6 3 80 20090 6
Test (2C): more persistence ...
... Passed -- 15.6 5 956 198434 16
Test (2C): partitioned leader and one follower crash, leader restarts ...
... Passed -- 1.4 3 34 8632 4
Test (2C): Figure 8 ...
... Passed -- 30.3 5 1092 228602 56
Test (2C): unreliable agreement ...
... Passed -- 1.5 5 564 189862 246
Test (2C): Figure 8 (unreliable) ...
... Passed -- 32.8 5 9592 31040584 138
Test (2C): churn ...
... Passed -- 16.2 5 9668 26051487 2236
Test (2C): unreliable churn ...
... Passed -- 16.2 5 4084 4285599 995
PASS
ok 6.5840/raft 117.658s