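The core idea comes down to three things:
1 Split the file into chunks on the front end
2 Reassemble the chunks on the back end
3 Verify with MD5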
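Introduction to SparkMD5
MD5 is used here to detect whether a file was altered or corrupted along the way. A small file can be passed in whole and its MD5 returned directly, but for a large file (over 2 GB) hashing the whole file in one go takes a long time, so the MD5 is computed incrementally, chunk by chunk, which saves time and avoids holding the entire file in memory. Also note that MD5 is a one-way hash function: it cannot be reversed.
Let's go through the example from the official repository.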
javascript
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>incremental md5</title>
<style>
body{text-align:center;font:13px Tahoma}
form{margin:9vh auto}
pre{background:#ffd;border:1px solid orange;padding:1em;margin:0 auto;display:none;text-align:left;line-height:1.25}
</style>
</head>
<body>
<h1>incremental md5 demo</h1>
<h3>with <a target="_blank" href="//github.com/satazor/SparkMD5">SparkMD5</a></h3>
<form method="POST" enctype="multipart/form-data" onsubmit="return false;" ><input id=file type=file placeholder="select a file" /></form>
<pre id=log></pre>
<script src="//cdn.rawgit.com/satazor/SparkMD5/master/spark-md5.min.js"></script>
<script>
var log=document.getElementById("log");
document.getElementById("file").addEventListener("change", function() {
var blobSlice = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice,
file = this.files[0],
chunkSize = 2097152, // read in chunks of 2MB
chunks = Math.ceil(file.size / chunkSize), // total number of chunks
currentChunk = 0,
spark = new SparkMD5.ArrayBuffer(), // create the incremental MD5 instance
frOnload = function(e){
log.innerHTML+="\nread chunk number "+parseInt(currentChunk+1)+" of "+chunks;
spark.append(e.target.result); // append array buffer
currentChunk++;
if (currentChunk < chunks)
loadNext();
else
log.innerHTML+="\nfinished loading :)\n\ncomputed hash:\n"+spark.end()+"\n\nyou can select another file now!\n"; // finish and output the hash
},
frOnerror = function () {
log.innerHTML+="\noops, something went wrong.";
};
function loadNext() {
var fileReader = new FileReader();
fileReader.onload = frOnload;
fileReader.onerror = frOnerror;
var start = currentChunk * chunkSize,
end = ((start + chunkSize) >= file.size) ? file.size : start + chunkSize;
fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
};
log.style.display="inline-block";
log.innerHTML="file name: "+file.name+" ("+file.size.toString().replace(/\B(?=(?:\d{3})+(?!\d))/g, ',')+" bytes)\n";
loadNext();
});
</script>
</body>
</html>
The core is these three calls:
spark = new SparkMD5.ArrayBuffer(),
spark.append()
spark.end()
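As a rough sketch, the three calls can be wrapped in one promise-based helper. This is not the official example: it assumes a modern environment where `Blob.prototype.arrayBuffer()` is available (instead of `FileReader`), and the name `hashInChunks` and the `hasher` parameter are made up for illustration. Any object with `append(arrayBuffer)` and `end()` works, for example `new SparkMD5.ArrayBuffer()`.

```javascript
// Hypothetical helper: feed a Blob/File to an incremental hasher chunk by chunk.
// `hasher` is assumed to expose append(arrayBuffer) and end(),
// like an instance created with `new SparkMD5.ArrayBuffer()`.
async function hashInChunks(blob, chunkSize, hasher) {
  const chunks = Math.ceil(blob.size / chunkSize);
  for (let i = 0; i < chunks; i++) {
    const start = i * chunkSize;
    const end = Math.min(start + chunkSize, blob.size);
    // Only one chunk is held in memory at a time.
    hasher.append(await blob.slice(start, end).arrayBuffer());
  }
  return hasher.end(); // hex digest of everything appended so far
}

// Browser usage (assumes spark-md5 is loaded as SparkMD5):
// const md5 = await hashInChunks(file, 2 * 1024 * 1024, new SparkMD5.ArrayBuffer());
```

Passing the hasher in keeps the helper independent of how the library is loaded.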
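Front-end implementation
How should the front end split the file? Here is the core code. Some experienced developers recommend keeping chunkSize at 5 MB or less. Compute the start and end byte offsets of each chunk, then use
File.prototype.slice to cut it out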
javascript
const start = currentChunk * chunkSize;
const end = start + chunkSize >= file.size ? file.size : start + chunkSize;
const blobSlice = File.prototype.slice;
const blob = blobSlice.call(file, start, end);
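To make the offset arithmetic concrete, here is a small sketch that lists every chunk's byte range up front. The function name `getChunkRanges` and the shape of the returned objects are only illustrative.

```javascript
// Hypothetical helper: byte ranges [start, end) for every chunk of a file.
function getChunkRanges(fileSize, chunkSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // The last chunk is clamped to the end of the file.
    const end = Math.min(start + chunkSize, fileSize);
    ranges.push({ index: ranges.length, start, end });
  }
  return ranges;
}

// A 5 MB file with 2 MB chunks gives three ranges; the last one is 1 MB long.
console.log(getChunkRanges(5 * 1024 * 1024, 2 * 1024 * 1024));
```

Each range can then be passed to `file.slice(start, end)` to obtain the blob for that chunk.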
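Next, compute the MD5 of each chunk's blob and send the chunk together with its MD5 to the back end. Once all chunks have been sent, send a merge request.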
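The upload flow could look roughly like the sketch below. Everything about the wire format is an assumption here: the `/upload` and `/merge` paths, the form field names, and the `hashChunk` and `send` callbacks are placeholders, so adjust them to whatever the real back end expects.

```javascript
// Hypothetical sketch: upload chunks one by one, then ask the server to merge.
// hashChunk(blob) -> Promise<string> and send(url, formData) -> Promise are injected
// so the transport and the MD5 library can be swapped freely.
async function uploadInChunks(file, fileName, chunkSize, hashChunk, send) {
  const total = Math.ceil(file.size / chunkSize);
  for (let index = 0; index < total; index++) {
    const start = index * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    const blob = file.slice(start, end);

    const form = new FormData();
    form.append("index", String(index)); // chunk index
    form.append("md5", await hashChunk(blob)); // MD5 of this chunk
    form.append("fileName", fileName);
    form.append("file", blob);
    await send("/upload", form); // assumed upload endpoint
  }
  const merge = new FormData();
  merge.append("fileName", fileName);
  merge.append("total", String(total));
  await send("/merge", merge); // assumed merge endpoint
}

// Browser usage (assumes spark-md5 is loaded as SparkMD5):
// await uploadInChunks(file, file.name, 2 * 1024 * 1024,
//   async (blob) => SparkMD5.ArrayBuffer.hash(await blob.arrayBuffer()),
//   (url, body) => fetch(url, { method: "POST", body }));
```

Chunks are sent sequentially to keep the sketch short; uploading a few in parallel is a possible refinement.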
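Back-end implementation
Upload endpoint
The request includes the chunk index, the chunk's MD5, and the file name
If the chunk file does not exist yet, store it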
javascript
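import * as fse from "fs-extra";
import { Readable } from "stream";
// chunkPath is where this chunk is stored; file is the uploaded multipart file object
const outStream = fse.createWriteStream(chunkPath);
const inStream = Readable.from(file.stream ?? file.buffer); // the uploaded chunk (blob) data
inStream.pipe(outStream);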
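A slightly fuller sketch of the upload handler's storage step is shown below. To stay self-contained it uses Node's built-in `fs`, `path` and `crypto` modules rather than `fs-extra`, and it assumes the chunk has already been read into a `Buffer`. The directory layout (`<fileName>.chunks/<index>`) and the function name `saveChunk` are invented for illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// Hypothetical helper: verify a chunk's MD5 and store it only if it is not there yet.
// Resolves to true when the chunk was written, false when it already existed.
async function saveChunk(uploadDir, fileName, index, md5, buffer) {
  const actual = crypto.createHash("md5").update(buffer).digest("hex");
  if (actual !== md5) {
    throw new Error(`chunk ${index} failed the MD5 check`);
  }
  // basename() guards against path segments smuggled in through the file name.
  const chunkDir = path.join(uploadDir, `${path.basename(fileName)}.chunks`);
  await fs.promises.mkdir(chunkDir, { recursive: true });
  const chunkPath = path.join(chunkDir, String(index));
  try {
    // The "wx" flag makes the write fail with EEXIST if the file already exists.
    await fs.promises.writeFile(chunkPath, buffer, { flag: "wx" });
    return true;
  } catch (err) {
    if (err.code === "EEXIST") return false;
    throw err;
  }
}
```

Relying on the `wx` flag avoids a separate existence check followed by a write, which could race when the same chunk is uploaded twice at once.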
merge
Merge the chunk files into the final file
javascript
// inside a loop over the chunks, in index order
const buffer = await fse.readFile(chunkPath);
await fse.appendFile(finalPath, buffer);
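For larger chunks, a streaming variation of the same loop avoids reading each chunk fully into memory. The sketch below uses Node's built-in `fs` and `stream/promises` rather than `fs-extra`, and assumes the chunks are stored as files named by their index (`0`, `1`, `2`, ...) inside `chunkDir`. The function name `mergeChunks` is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const { pipeline } = require("node:stream/promises");

// Hypothetical helper: append chunk files 0..total-1 to finalPath, in order.
async function mergeChunks(chunkDir, total, finalPath) {
  // Start from a clean file so a retried merge does not append twice.
  await fs.promises.rm(finalPath, { force: true });
  for (let index = 0; index < total; index++) {
    const chunkPath = path.join(chunkDir, String(index));
    // pipeline() rejects if a chunk is missing, which aborts the merge.
    await pipeline(
      fs.createReadStream(chunkPath),
      fs.createWriteStream(finalPath, { flags: "a" })
    );
  }
}
```

The chunks are appended strictly one after another, because the order of the bytes in the final file depends on it.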
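Remaining issues
1 If the page is refreshed or the network drops during an upload, resumable upload needs to be considered: the back end has to store chunk information and return the indexes of the chunks it already has, and the front end then resumes by requesting only the remaining chunks
2 Check for files that already exist on the server, so they do not need to be uploaded again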
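Both points can be sketched as one small planning step on the front end. The shape of the server's status response (`fileExists`, `uploadedIndexes`) is purely an assumption for illustration, as is the function name `planUpload`.

```javascript
// Hypothetical helper: decide which chunk indexes still have to be uploaded.
// `status` is an assumed server response, e.g. { fileExists: false, uploadedIndexes: [0, 2] }.
function planUpload(totalChunks, status) {
  // Point 2: the server already has the complete file, nothing to send.
  if (status.fileExists) return [];
  // Point 1: skip chunks the server reports as already stored.
  const done = new Set(status.uploadedIndexes || []);
  const todo = [];
  for (let index = 0; index < totalChunks; index++) {
    if (!done.has(index)) todo.push(index);
  }
  return todo;
}

// With 4 chunks and chunks 0 and 2 already on the server, 1 and 3 remain.
console.log(planUpload(4, { fileExists: false, uploadedIndexes: [0, 2] }));
```

The front end would ask for this status before uploading, for example keyed by the whole file's MD5, and then upload only the returned indexes.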