Introduction
What are hiring teams actually screening for these days? What counts as a "highlight"?
You think of what others can't think of!
You build what others can't build!!
You build what others can build, only better!!!
Architecture diagram (spoiler)
Terms you need to know
- Slicing: split a large file into multiple consecutive small chunks to reduce the load of any single transfer.
- Content hash: generate a unique fingerprint of the file with an algorithm (e.g. SHA-256), used for instant upload and integrity checks.
- Instant (rapid) upload: when an identical file already exists on the server, nothing is re-uploaded; the upload is marked complete immediately.
- Resumable upload: after an interruption, the transfer continues from the unfinished chunks instead of starting over.
- Chunked upload: upload the sliced chunks to the server concurrently, with retry on failure.
- Merging chunks: the server stitches all chunks back together in order into the complete file and verifies its integrity.
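These terms fit together into a single pipeline. Below is a minimal sketch of the overall client-side flow, wiring them up in order. The helpers (`speedGenerateHash`, `createChunk`, `uploadChunks`, `mergeChunks`) are the ones implemented later in this article; the shape of the `speedApi` response is simplified here and should be treated as an assumption.

```js
// A minimal sketch of the end-to-end client flow (simplified; the real hook maps
// the speedApi response through a skipUpload hook before using it).
async function uploadLargeFile(file, options) {
  // 1. Content hash: the file's fingerprint, used for instant upload and resume
  const fileHash = await speedGenerateHash(file)
  // 2. Slicing: split the file into fixed-size chunks
  const chunkList = createChunk(file, options.chunkSize)
  // 3. Instant-upload check: ask the server what it already has
  const { skipUpload, uploaded } = await options.speedApi({ fileHash, file })
  if (skipUpload) return // the whole file is already on the server
  // 4. Chunked upload with resume: only send the chunks the server is missing
  await uploadChunks(chunkList, uploaded || [])
  // 5. Merge: the server stitches the chunks back into one file
  await mergeChunks()
}
```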
Background - Situation
If you run into any of the following scenarios, you probably need to build large-file upload logic:
- Network drive / cloud storage systems
- Video platforms
- Medical imaging storage
- HD game resource packs
- Design files
Task breakdown - Task
- Slicing
- File content hash
- Instant upload handling
- Concurrent upload
- Resumable upload
- Upload progress
- Merging chunks
- ※ Performance optimization ※
Action
With the architecture diagram in mind, let's walk through the implementation code.
Slicing
```js
/**
* Split a file into chunks
* @param file the file to slice
* @param chunkSize size of each chunk in bytes
* @returns the list of chunks
*/
const createChunk = (file: any, chunkSize: number) => {
const fileChunkList = []
let cur = 0
let index = 0
while (cur < file.size) {
const chunk = file.slice(cur, cur + chunkSize)
fileChunkList.push({
chunk,
fileHash: uploader.fileHash,
chunkNumber: index + 1
})
cur += chunkSize
index++
}
totalChunks = fileChunkList?.length || 1
return fileChunkList
}
```
Hash calculation
```javascript
import SparkMD5 from 'spark-md5'
/**
* Compute a hash from the file content
* @param file the file to hash
*/
function generateHash(file): Promise<string> {
// TODO: generate the hash in a Web Worker
// A Web Worker shortens the hash time somewhat and avoids blocking the UI (no more blank screen), but it is not the best solution; more on that later.
return new Promise(resolve => {
console.time('load file buffer')
const fileReader = new FileReader()
fileReader.readAsArrayBuffer(file)
fileReader.onload = function (e) {
console.timeEnd('load file buffer')
console.time('generate hash')
let fileMd5 = SparkMD5.ArrayBuffer.hash(e.target.result)
console.timeEnd('generate hash')
resolve(fileMd5)
}
})
}
```
Instant upload + resumable upload
```js
// Instant upload - covers skipping the whole file as well as skipping chunks that already exist (resume)
const speedResponse = await options.speedApi({
fileHash: uploader.fileHash,
file: uploader.file
})
// the skipUpload hook maps the raw response into { skipUpload, uploaded }
const res = options.hooks?.skipUpload?.({ speedResponse })
if (res[options.fields.skipUpload] && !options.disableSpeedUpload) {
return Promise.resolve()
}
```
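For reference, this logic assumes the instant-upload endpoint responds with a skip flag plus the list of chunk numbers it already has. The exact payload below is an illustration, not a fixed contract; the field names can be remapped through `options.fields`.

```js
// An assumed speedApi response shape (illustrative only):
const exampleSpeedResponse = {
  data: {
    // true -> the server already has the whole file: mark the upload complete immediately
    skipUpload: false,
    // chunk numbers already stored on the server -> resume by skipping them
    uploaded: [1, 2, 3]
  }
}
```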
Chunked upload
```js
const uploadChunks = async (chunkList, uploaded) => {
const list = chunkList
// skip chunks that were already uploaded; also filter out any chunk whose hash no longer matches
.filter(({ fileHash, chunkNumber }) => fileHash === uploader.fileHash && !uploaded.includes(chunkNumber))
.map(({ chunk, chunkNumber }) => {
const formData = new FormData()
return {
formData,
chunkNumber,
chunk,
chunkList
}
})
const requestList = list.map(({ formData, chunkNumber, chunk, chunkList }) =>
options
.uploadApi({
formData,
file: uploader.file,
chunk,
chunkSize: options.chunkSize,
fileHash: uploader.fileHash,
chunkNumber,
totalChunks
})
)
await Promise.all(requestList).then(() => {
mergeChunks()
})
}
```
Merging chunks
This is just a single server-side API call, so I won't dwell on it here (the backend optimization section below covers how the merge should be implemented).
File hash optimization - sampled hashing
To fix the long hash time and the blank-screen problem with large files, the hashing scheme was optimized.
How sampled hashing works
Split the file into segments (note: not the same thing as the upload chunks above), sample the first 1 KB of each segment, then merge the samples and hash the result. This dramatically shortens the hash time. For a more reliable fingerprint, use a larger number of segments.
The core idea of sampled hashing
Trade space for time + shrink the amount of content being hashed
Sampled hashing, illustrated:
The implementation:
```js
function speedGenerateHash(file): Promise<string> {
const opts = {
// number of segments - note: for files under 10 MB, chunkNum is automatically set to 1
chunkNum: 10,
// sample size taken from the start of each segment
sampleSize: 1024
}
return new Promise(resolve => {
console.time('generate hash time')
// ① small-file shortcut - for a small file, chunkNum becomes 1 and the whole file is hashed
let sampleChunkList = filterSmallFile(file, opts)
// ②③ split into segments && sample each one (only when the small-file shortcut did not apply)
if (!sampleChunkList.length) {
sampleChunkList = getSampleChunk(file, opts)
}
// ④ merge the sampled data
const sampleMerged = new Blob(sampleChunkList, { type: file.type })
// ⑤ generate the hash
doGenerateHash(sampleMerged, resolve)
})
}
// small-file shortcut: files of 10 MB or less are hashed in full
function filterSmallFile(file, opts) {
if (file.size <= 10 * 1024 * 1024) {
opts.chunkNum = 1
return [file]
}
return []
}
// split into segments && sample the start of each segment
function getSampleChunk(file, opts) {
const sampleChunkList = []
const chunkSize = Math.floor(file.size / opts.chunkNum)
let cur = 0
new Array(opts.chunkNum).fill(1).forEach(() => {
const sampleChunk = file.slice(cur, cur + opts.sampleSize)
sampleChunkList.push(sampleChunk)
cur += chunkSize
})
return sampleChunkList
}
// generate the hash from the merged samples
function doGenerateHash(file, resolve) {
const fileReader = new FileReader()
fileReader.readAsArrayBuffer(file)
fileReader.onload = function (e) {
let fileMd5 = SparkMD5.ArrayBuffer.hash(e.target.result)
console.timeEnd('generate hash time')
resolve(fileMd5)
}
}
```
Hash optimization results - quantified
Before: hashing a 1 GB file took about 10,000 ms
After: hashing a 1 GB file takes about 8 ms
Why not a Web Worker?
A Web Worker avoids blocking the UI (so the browser no longer freezes) and uses a separate thread to cut the hash time (around 2,000 ms for a 1 GB file), but it comes with the issues below; a minimal sketch of the Worker approach follows the list.
- Memory-leak risk: resources not released properly inside the Worker keep memory usage climbing.
- Messaging overhead: frequently passing large payloads between the main thread and the Worker adds latency.
- Compatibility and new features: mainstream browsers support Workers, but newer APIs may still behave differently across them.
- Harder debugging: tooling for Worker threads remains weaker than for the main thread.
- Lifecycle management: careless creation and destruction of Workers wastes resources.
- Security restrictions: Workers cannot access the DOM, so some operations need a proxy, which can cause problems.
- Error handling: an uncaught exception inside a Worker can bring the feature down.
- Race conditions: Workers normally do not share memory, but synchronization issues can appear once SharedArrayBuffer is used.
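For completeness, here is a minimal sketch of the Worker approach that was evaluated and then dropped. The worker file name, its location, and the message protocol are assumptions; spark-md5 is loaded inside the worker via `importScripts`.

```js
// hash.worker.js (assumed file name) - hash a File/Blob off the main thread
// and post the hex digest back. The spark-md5 path is also an assumption.
importScripts('./spark-md5.min.js')
self.onmessage = async e => {
  const buffer = await e.data.file.arrayBuffer()
  self.postMessage(SparkMD5.ArrayBuffer.hash(buffer))
}

// main thread: wrap the worker in a promise
function generateHashInWorker(file) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('hash.worker.js')
    worker.onmessage = e => {
      resolve(e.data)
      worker.terminate() // release the thread to avoid the lifecycle issues above
    }
    worker.onerror = reject
    worker.postMessage({ file })
  })
}
```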
Backend optimization - stream the file operations
Someone stored each chunk as a file, but when assembling the final file they read every chunk fully into memory before writing it back out. A single 1 GB file OOM'd the service... painful.
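Below is a hedged sketch of the streaming alternative. It assumes a Node.js backend and a `chunks/<fileHash>/<chunkNumber>` layout on disk; both are assumptions, since the original service is not shown here. Chunks are piped into the target file one at a time, so memory usage stays flat regardless of file size.

```js
const fs = require('fs')
const path = require('path')

// Merge chunks by streaming them one by one into the target file.
async function mergeChunksStreaming(chunkDir, targetPath, totalChunks) {
  const output = fs.createWriteStream(targetPath)
  for (let i = 1; i <= totalChunks; i++) {
    const chunkPath = path.join(chunkDir, String(i))
    await new Promise((resolve, reject) => {
      const input = fs.createReadStream(chunkPath)
      // keep the write stream open for the next chunk
      input.pipe(output, { end: false })
      input.on('end', resolve)
      input.on('error', reject)
    })
  }
  output.end()
}
```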
Backend optimization - periodically clean up orphaned chunks
Not everyone waits for a large upload to finish; failed uploads are often simply abandoned, and over time those orphaned chunks fill up the server disk.
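A sketch of what a scheduled cleanup might look like, again assuming Node.js and the same `chunks/<fileHash>/` directory layout (the root path and the 24-hour threshold are assumptions). Any chunk directory that has not been touched for a day is treated as an abandoned upload and removed.

```js
const fs = require('fs/promises')
const path = require('path')

const CHUNK_ROOT = '/data/upload-chunks' // assumed location of the chunk directories
const MAX_AGE_MS = 24 * 60 * 60 * 1000

async function cleanStaleChunks() {
  const dirs = await fs.readdir(CHUNK_ROOT)
  for (const dir of dirs) {
    const fullPath = path.join(CHUNK_ROOT, dir)
    const stat = await fs.stat(fullPath)
    // untouched for more than a day -> assume the upload was abandoned
    if (Date.now() - stat.mtimeMs > MAX_AGE_MS) {
      await fs.rm(fullPath, { recursive: true, force: true })
    }
  }
}

// run once a day; a real service would more likely use cron or a job scheduler
setInterval(cleanStaleChunks, MAX_AGE_MS)
```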
Results
- Supported business delivery
- Produced a reusable large-file upload component and hooks, now used by N business lines
- Helped the backend resolve service-availability issues and the full-disk risk
- Shared the large-file upload approach within xx, reaching XXX people
- Produced a methodology / SOP document for handling large-file uploads
Appendix: the reusable component, hooks, and hash utilities
Component
```vue
<template>
<el-upload ref="upload" class="large-file-upload" :http-request="fileChange" action="" :show-file-list="false" multiple>
<template v-slot:trigger>
<slot>
<el-button type="primary"><uploadSvg />Upload file</el-button>
</slot>
</template>
</el-upload>
</template>
<script setup lang="ts">
import { inject } from 'vue'
import { filetransferUploadFile, speedUpload } from '@/api/file'
import uploadSvg from '../svg/upload.svg'
import { useLargeFileUpload } from '@/hooks/large-file-upload/index'
import type { Ref } from 'vue'
import type { IFileBaseInfo } from '../../../../types/file'
const props = defineProps({
extraParams: Object
})
const emit = defineEmits<{
uploadSuccess: []
}>()
const baseInfo = inject<Ref<IFileBaseInfo>>('baseInfo')
const { fileChange, status } = useLargeFileUpload({
// chunk size
chunkSize: 50 * 1024 * 1024,
// instant-upload check API
speedApi: ({ fileHash, file }) => {
return speedUpload({
identifier: fileHash,
fileName: file?.name,
filePath: baseInfo.value.filePath,
projectId: baseInfo.value.projectId,
systemId: baseInfo.value.systemId,
fileCategoryId: baseInfo.value.fileCategoryId,
...(props.extraParams || {})
})
},
// chunk upload API
uploadApi: ({ formData, file, chunk, fileHash, chunkNumber, totalChunks, chunkSize }) => {
return filetransferUploadFile(
chunk,
{
identifier: fileHash,
chunkNumber,
chunkSize,
fileName: file?.name,
filePath: baseInfo.value.filePath,
projectId: baseInfo.value.projectId,
systemId: baseInfo.value.systemId,
totalChunks: totalChunks,
totalSize: file?.size,
fileCategoryId: baseInfo.value.fileCategoryId,
...(props.extraParams || {})
},
formData
)
},
// chunk merge API - merging can also be driven through the uploadApi call
mergeApi: () => Promise.resolve(),
// hooks
hooks: {
// per-chunk upload hook; status is 'success' or 'fail'
uploadChunk: ({ chunkNumber, status }) => {
console.log('chunk upload', chunkNumber, status)
},
// chunk merge hook
mergeChunk: () => {
console.log('merge chunk')
emit('uploadSuccess')
},
// callback for a successful instant-upload check
skipUpload: ({ speedResponse }): Record<string, any> => {
const skipUpload = speedResponse.data?.skipUpload
const uploaded = speedResponse.data?.uploaded || []
if (speedResponse?.data?.skipUpload) {
emit('uploadSuccess')
}
return {
skipUpload,
uploaded
}
}
}
})
</script>
<style lang="scss" scoped>
.large-file-upload {
:deep() .el-upload.el-upload--text {
width: 100%;
justify-content: left;
}
}
</style>
```
Vue3 Hooks
```typescript
import { ref } from 'vue'
import { speedGenerateHash } from '@/utils/hash'
import type { UploadStatus, IOptions } from './types'
import type { Ref } from 'vue'
export function useLargeFileUpload(userOptions: IOptions) {
// uploadApi must be a bare request without its own catch; otherwise a failed chunk would be treated as a success
const defaultOptions: IOptions = {
disableSpeedUpload: false,
chunkSize: 50 * 1024 * 1024,
fields: {
skipUpload: 'skipUpload',
uploaded: 'uploaded'
},
speedApi: () => Promise.resolve(),
uploadApi: () => Promise.resolve(),
mergeApi: () => Promise.resolve(),
hooks: {
uploadChunk: () => {},
mergeChunk: () => {},
skipUpload: () => {}
}
}
const options: IOptions = Object.assign({}, defaultOptions, userOptions)
const status: Ref<UploadStatus> = ref('ready')
const uploader = {
file: null,
fileHash: ''
}
let totalChunks = 1
let requestList = []
const resetData = () => {
requestList.forEach(xhr => xhr?.abort?.())
requestList = []
}
const isFileExit = file => {
if (file) return true
console.error('No file found, please check!')
return false
}
const mergeChunks = () => {
const params = {
fileName: uploader.file?.name,
fileHash: uploader.fileHash,
totalChunks: totalChunks
}
options
.mergeApi(params)
.then(() => {
status.value = 'success'
options.hooks?.mergeChunk?.({
status: 'success',
...params
})
})
.catch(() => {
status.value = 'fail'
options.hooks?.mergeChunk?.({
status: 'fail',
...params
})
})
}
/**
* Split a file into chunks
* @param file the file to slice
* @param chunkSize size of each chunk in bytes
* @returns the list of chunks
*/
const createChunk = (file: any, chunkSize: number) => {
const fileChunkList = []
let cur = 0
let index = 0
while (cur < file.size) {
const chunk = file.slice(cur, cur + chunkSize)
fileChunkList.push({
chunk,
fileHash: uploader.fileHash,
chunkNumber: index + 1
})
cur += chunkSize
index++
}
totalChunks = fileChunkList?.length || 1
return fileChunkList
}
const uploadChunks = async (chunkList, uploaded) => {
const list = chunkList
// skip chunks that were already uploaded; also filter out hash mismatches (rare in practice)
.filter(({ fileHash, chunkNumber }) => fileHash === uploader.fileHash && !uploaded.includes(chunkNumber))
.map(({ chunk, chunkNumber }) => {
const formData = new FormData()
return {
formData,
chunkNumber,
chunk,
chunkList
}
})
const requestList = list.map(({ formData, chunkNumber, chunk, chunkList }) =>
options
.uploadApi({
formData,
file: uploader.file,
chunk,
chunkSize: options.chunkSize,
fileHash: uploader.fileHash,
chunkNumber,
totalChunks
})
.then(() => {
options.hooks?.uploadChunk?.({
status: 'success',
chunk,
chunkNumber,
chunkList
})
})
.catch(err => {
console.log(err)
options.hooks?.uploadChunk?.({
status: 'fail',
chunk,
chunkNumber,
chunkList,
err
})
throw new Error('Failed to upload chunk', { cause: err })
})
)
await Promise.all(requestList).then(() => {
mergeChunks()
})
}
const uploadInit = async () => {
if (!isFileExit(uploader.file)) return
status.value = 'uploading'
uploader.fileHash = await speedGenerateHash(uploader.file)
const chunkList = createChunk(uploader.file, options.chunkSize)
// instant upload - covers skipping the whole file as well as skipping chunks that already exist (resume)
const speedResponse = await options.speedApi({
fileHash: uploader.fileHash,
file: uploader.file
})
const res = options.hooks?.skipUpload?.({
speedResponse
})
if (res[options.fields.skipUpload] && !options.disableSpeedUpload) {
return Promise.resolve()
}
// upload the chunks
await uploadChunks(chunkList, res[options.fields.uploaded] || [])
}
const fileChange = async e => {
// by default e is the upload component's request callback payload; the function also accepts a raw File when called directly
const file = e?.file || e
if (!isFileExit(file)) return
resetData()
uploader.file = file
await uploadInit()
}
return {
fileChange,
status
}
}
```
Hash utilities
```javascript
import SparkMD5 from 'spark-md5'
// legacy hash implementation - kept for comparison
export function generateHash(file): Promise<string> {
// TODO: generate the hash in a Web Worker
return new Promise(resolve => {
console.time('load file buffer')
const fileReader = new FileReader()
fileReader.readAsArrayBuffer(file)
fileReader.onload = function (e) {
console.timeEnd('load file buffer')
console.time('generate hash')
let fileMd5 = SparkMD5.ArrayBuffer.hash(e.target.result)
console.timeEnd('generate hash')
resolve(fileMd5)
}
})
}
// Overall flow (trading space for time): split into segments and sample -> merge the samples -> hash the merged samples
export function speedGenerateHash(file): Promise<string> {
const opts = {
// number of segments - note: for files under 10 MB, chunkNum is automatically set to 1
chunkNum: 10,
// sample size taken from the start of each segment
sampleSize: 1024
}
return new Promise(resolve => {
console.time('generate hash time')
// ① small-file shortcut - for a small file, chunkNum becomes 1 and the whole file is hashed
let sampleChunkList = filterSmallFile(file, opts)
// ②③ split into segments && sample each one (only when the small-file shortcut did not apply)
if (!sampleChunkList.length) {
sampleChunkList = getSampleChunk(file, opts)
}
// ④ merge the sampled data
const sampleMerged = new Blob(sampleChunkList, { type: file.type })
// ⑤ generate the hash
doGenerateHash(sampleMerged, resolve)
})
}
function doGenerateHash(file, resolve) {
const fileReader = new FileReader()
fileReader.readAsArrayBuffer(file)
fileReader.onload = function (e) {
let fileMd5 = SparkMD5.ArrayBuffer.hash(e.target.result)
console.timeEnd('generate hash time')
resolve(fileMd5)
}
}
function getSampleChunk(file, opts) {
const sampleChunkList = []
const chunkSize = Math.floor(file.size / opts.chunkNum)
let cur = 0
new Array(opts.chunkNum).fill(1).forEach(() => {
const sampleChunk = file.slice(cur, cur + opts.sampleSize)
sampleChunkList.push(sampleChunk)
cur += chunkSize
})
return sampleChunkList
}
function filterSmallFile(file, opts) {
if (file.size <= 10 * 1024 * 1024) {
opts.chunkNum = 1
return [file]
}
return []
}
export function getFinnalyHash(sampleChunkList, opts) {
let sampleChunksMd5 = []
return new Promise(resolve => {
sampleChunkList.forEach(sampleChunk => {
const fileReader = new FileReader()
fileReader.readAsArrayBuffer(sampleChunk)
fileReader.onload = function (e) {
let fileMd5 = SparkMD5.ArrayBuffer.hash(e.target.result)
sampleChunksMd5.push(fileMd5)
// once every sample hash is ready -> resolve so the final hash can be generated
if (sampleChunksMd5.length === opts.chunkNum) {
resolve(sampleChunksMd5)
}
}
})
})
}
export function mergeHash(hashList, resolve) {
const encoder = new TextEncoder()
// use TextEncoder to encode the joined string into a Uint8Array
const uint8Array = encoder.encode(hashList.join(''))
const hash = SparkMD5.ArrayBuffer.hash(uint8Array.buffer)
resolve(hash)
}
```
### Running out of steam, so the rest goes on the TODO list
1. Progress bar - a must-have for large-file uploads
2. Concurrency limiting for chunk uploads
3. Retry on failure
4. Documentation for the large-file upload hooks
5. ...
A small sketch covering items 2 and 3 follows this list.
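As a possible direction for items 2 and 3 (not wired into the hook above, and the helper name is hypothetical), chunk uploads could run through a small worker pool with per-chunk retries:

```js
// Run upload tasks with limited concurrency and simple per-task retries.
// Each task is a () => Promise, e.g. () => options.uploadApi({ ... }).
async function uploadWithLimit(tasks, limit = 4, retries = 3) {
  const queue = [...tasks]
  const runTask = async task => {
    for (let attempt = 1; attempt <= retries; attempt++) {
      try {
        return await task()
      } catch (err) {
        if (attempt === retries) throw err // give up after the last attempt
      }
    }
  }
  // spin up at most `limit` workers that drain the queue
  const workers = new Array(Math.min(limit, queue.length)).fill(null).map(async () => {
    while (queue.length) {
      await runTask(queue.shift())
    }
  })
  await Promise.all(workers)
}
```

Progress (item 1) falls out of the same structure: count finished tasks against `tasks.length` inside `runTask` and expose that ratio as a reactive value.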
The quantified results interviewers love
- Hash efficiency improved: 1 GB file, 10,000 ms -> 8 ms
- Abstracted a large-file upload component and hooks, used by N business lines
- Backend OOM rate: 99% -> 0%
- Page-freeze problem fully resolved: 100% -> 0%
The personal insight interviewers love to ask about
See the sections above on hash optimization, dropping the Web Worker approach, streaming merges, and periodic chunk cleanup. These are your highlights and, at the same time, evidence of your own thinking; close by sharing your own reflections on them.
Like, follow, and save
Follow, like, and bookmark, and I'll walk you step by step toward becoming a front-end expert and weathering the hiring winter~