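Basic concepts

In Node.js, a stream is an abstraction for data that flows in chunks. It is especially useful for large data sets or for data that must be processed as it arrives. Because only a small part of the data is handled at a time, the whole data set never has to be loaded into memory at once, which makes streams both efficient and memory friendly.
Node.js has four main stream types: Readable, Writable, Duplex (both readable and writable), and Transform (a special kind of Duplex that modifies or transforms the data as it passes from the writable side to the readable side).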
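The relationship between these types can be checked directly. The sketch below uses PassThrough, a built-in Transform that forwards data unchanged, purely as a convenient instance to inspect; the variable names are illustrative.

```javascript
const { Readable, Duplex, Transform, PassThrough } = require('node:stream');

// PassThrough is the simplest built-in Transform: it forwards chunks unchanged.
const passThrough = new PassThrough();

// A Transform is a Duplex, and a Duplex inherits from Readable.
const relations = {
  isTransform: passThrough instanceof Transform,
  isDuplex: passThrough instanceof Duplex,
  isReadable: passThrough instanceof Readable,
};

console.log(relations);
```

All three checks should report true, which matches the description of Transform as a specialised Duplex.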

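Readable streams

A Readable stream reads data from a source such as a file, a network connection, or another data provider. The data arrives in chunks, which can be handled one by one by listening for the data event.

Events:
- data: emitted whenever a chunk of data is handed to the consumer.
- end: emitted when there is no more data to read.
- error: emitted when an error occurs while reading.
- close: emitted when the stream and its underlying resource (such as a file descriptor) have been closed.
- readable: emitted when data is available to be read from the stream, and once more when the end of the stream has been reached (it may fire many times).
Methods:
- read([size]): pulls data out of the internal buffer. Without size, it returns everything currently buffered; it returns null when no data is available.
- pipe(destination[, options]): sends the data of the readable stream to a writable stream.
- pause(): stops a flowing stream from emitting data events.
- resume(): switches the stream back to flowing mode so that data events resume.
- isPaused(): returns a boolean indicating whether the stream is paused.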

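A simple Readable example

const fs = require('node:fs');

const readableStream = fs.createReadStream('./test.txt');

// Attaching a 'data' listener switches the stream to flowing mode:
// chunks are read from the source and delivered to the callback automatically.
readableStream.on('data', (chunk) => {
  console.log(chunk.toString());
});

readableStream.on('end', () => {
  console.log('read complete');
});

// Alternatively (use one approach or the other, not both),
// listen for 'readable' and pull the chunks with read()
readableStream.on('readable', () => {
  let chunk;
  while ((chunk = readableStream.read()) !== null) {
    console.log(chunk.toString());
  }
});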
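The pause(), resume(), and isPaused() methods listed above can be observed on an in-memory stream. This is a minimal sketch: Readable.from() is used only as a convenient source, and the variable names are illustrative.

```javascript
const { Readable } = require('node:stream');

// Readable.from() builds a Readable from any iterable; used here as an in-memory source.
const source = Readable.from(['a', 'b', 'c']);

const states = [];
states.push(source.isPaused()); // nothing has paused the stream yet
source.pause();
states.push(source.isPaused()); // explicitly paused
source.resume();
states.push(source.isPaused()); // flowing again

console.log(states); // [ false, true, false ]
```

Note that a stream that has never been consumed is not reported as paused; isPaused() only becomes true after an explicit pause().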

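Writable streams

A Writable stream writes data to a destination such as a file, a network connection, or another data consumer. Data is written with write(), and the drain, finish, and error events report what happens during writing.

Events:
- drain: emitted when it is safe to write again after a call to write() returned false.
- finish: emitted after end() has been called and all data has been flushed to the underlying system.
- close: emitted when the stream and its underlying resource have been closed.
- error: emitted when an error occurs while writing.
- pipe: emitted on the writable stream when a readable stream's pipe() method is called with it as the destination.
Methods:
- write(chunk[, encoding][, callback]): writes data to the stream. A return value of false means the caller should wait for the drain event before writing more.
- end([chunk][, encoding][, callback]): signals that no more data will be written. An optional final chunk can be written at the same time.
- setDefaultEncoding(encoding): sets the default character encoding.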

A simple Writable example

const fs = require('node:fs');

const writableStream = fs.createWriteStream('./test2.txt');
writableStream.write('Hello, world!');
writableStream.write('Hello, world!');
writableStream.end('end');


// Copy a file with pipe()
const readStream = fs.createReadStream('./text.txt');
const writeStream = fs.createWriteStream('./text2.txt');

readStream.pipe(writeStream);
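pipe() takes care of backpressure automatically. When write() is called by hand, its return value and the drain event have to be respected explicitly. The sketch below shows one way to do that; writeAll is a hypothetical helper, and the sink with a 4-byte highWaterMark is an artificial destination chosen so that backpressure appears immediately.

```javascript
const { Writable } = require('node:stream');

// Hypothetical helper: writes every chunk, waiting whenever write() returns false.
function writeAll(writable, chunks, onDone) {
  let index = 0;
  let drainWaits = 0;

  function writeNext() {
    while (index < chunks.length) {
      const canContinue = writable.write(chunks[index++]);
      if (!canContinue) {
        // The internal buffer reached highWaterMark: wait for 'drain'.
        drainWaits++;
        writable.once('drain', writeNext);
        return;
      }
    }
    writable.end(() => onDone(drainWaits));
  }

  writeNext();
}

// A tiny in-memory sink with a 4-byte highWaterMark.
const received = [];
const sink = new Writable({
  highWaterMark: 4,
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    setImmediate(callback); // simulate a slow destination
  },
});

writeAll(sink, ['hello', 'node', 'streams'], (drainWaits) => {
  console.log(received.join(' '), `(waited for drain ${drainWaits} times)`);
});
```

The loop stops as soon as write() reports that the buffer is full and picks up again from the drain handler, so memory use stays bounded no matter how slow the destination is.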

Duplex streams

A Duplex stream is both readable and writable. A TCP socket is one example.

A simple Duplex example

The client below assumes that a server is already listening on port 8124.

const net = require('node:net');
const client = net.createConnection({ port: 8124 }, () => {
  // 'connect' listener.
  console.log('connected to server!');
  // Write data
  client.write('world!\r\n');
});

// Read data
client.on('data', (data) => {
  console.log(data.toString());
  client.end();
});
client.on('end', () => {
  console.log('disconnected from server');
});

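Transform streams

A Transform stream is a special kind of Duplex stream whose output is computed from its input: data written to the writable side is transformed and then becomes available on the readable side. For example, zlib.createGzip() returns a transform stream that compresses the data written to it, so reading from it yields gzip-compressed data (zlib.createGunzip() does the reverse).

A simple Transform example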

const { createGzip } = require('node:zlib');
const { pipeline } = require('node:stream');
const {
  createReadStream,
  createWriteStream,
} = require('node:fs');

const gzip = createGzip();
const source = createReadStream('input.txt');
const destination = createWriteStream('input.txt.gz');

pipeline(source, gzip, destination, (err) => {
  if (err) {
    console.error('An error occurred:', err);
    process.exitCode = 1;
  }
});
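Besides the built-in transforms, a custom one can be written by extending Transform and implementing _transform. The following is a minimal sketch; the UpperCase class is a hypothetical example and not part of Node.js.

```javascript
const { Transform } = require('node:stream');

// Hypothetical example transform: upper-cases every chunk that passes through.
class UpperCase extends Transform {
  _transform(chunk, encoding, callback) {
    // First argument is an error (none here), second is the transformed chunk.
    callback(null, chunk.toString().toUpperCase());
  }
}

const upper = new UpperCase();
upper.write('hello, ');
upper.write('stream');
upper.end();

// The transformed data is now buffered on the readable side.
const output = upper.read().toString();
console.log(output); // HELLO, STREAM
```

In real code such a transform would usually sit in the middle of a pipeline() call, exactly where gzip sits in the example above.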

The core principle of the pipe() method

function pipe(src, dest) {
  src.on('data', (chunk) => {
    const ret = dest.write(chunk);
    if (!ret) {
      src.pause()
    }
  })

  dest.on('drain', () => {
    src.resume()
  })

  src.on('end', () => {
    dest.end()
  })
}

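Listen for the data event on src and write each chunk to dest. When write() returns false (the internal buffer of dest has reached its highWaterMark), pause src.
Listen for the drain event on dest, which signals that dest can accept data again, and resume src so that it continues reading.
Listen for the end event on src, which signals that all data has been read, and call dest.end() to finish the writable stream.
This simplified version leaves out error handling. The real pipe() does not close the destination when the source fails, which is why stream.pipeline() is generally preferred when errors must be propagated and streams cleaned up.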
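When errors matter, the promise-based pipeline() from node:stream/promises reports failures from any stream in the chain. The sketch below uses in-memory streams only; createSink, copy, and failingSource are hypothetical names introduced for illustration.

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// In-memory sink that records what it receives (stands in for a file or socket).
function createSink(chunks) {
  return new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk.toString());
      callback();
    },
  });
}

// Returns a summary instead of throwing, so both outcomes are easy to inspect.
async function copy(source, chunks) {
  try {
    await pipeline(source, createSink(chunks));
    return { ok: true };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// A source that fails partway through.
async function* failingSource() {
  yield 'first chunk';
  throw new Error('source failed');
}

const copied = [];
const results = Promise.all([
  copy(Readable.from(['a', 'b', 'c']), copied),
  copy(Readable.from(failingSource()), []),
]);

results.then(([good, bad]) => {
  console.log(copied.join(''), good, bad);
});
```

The first copy completes normally, while the second one rejects with the error thrown by the source, something the hand-written pipe function above would silently miss.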

Custom readable streams

A custom readable stream extends the Readable class and implements the _read method.

const { Readable } = require('stream');

class MyReadable extends Readable {
  constructor(options) {
    super(options);

    // Assume we have an array of data to push from
    this.data = ['Hello, ', 'World!', 'This is a custom readable stream.'];
    this.index = 0;
  }

  _read(size) {
    if (this.index < this.data.length) {
      // Push the next chunk into the internal buffer of the readable stream
      this.push(Buffer.from(this.data[this.index++]));
    } else {
      // No data left: pushing null signals the end of the stream
      this.push(null);
    }
  }
}

const myReadable = new MyReadable();
myReadable.on('data', (chunk) => {
  console.log(chunk.toString());
});
myReadable.on('end', () => {
  console.log('end');
});

Custom writable streams

A custom writable stream extends the Writable class and implements the _write method.

const { Writable } = require('stream');

class MyWritable extends Writable {
  constructor(options) {
    super(options);
    this.data = [];
  }

  _write(chunk, encoding, callback) {
    this.data.push(chunk.toString());
    console.log(`Received ${chunk.length} bytes of data.`);
    callback(); // Tell Node.js this chunk is done and more data can be written
  }

  _final(callback) {
    const fullData = this.data.join('');
    console.log('Full data received:', fullData);
    callback(); // Tell Node.js the final cleanup is complete
  }
}

// Create an instance of the writable stream
const myWritable = new MyWritable();

// 'finish' is emitted after end() has been called and all data has been processed
myWritable.on('finish', () => {
  console.log('Stream has finished.');
});

// Write data
myWritable.write('Hello, ');
myWritable.write('World!');

// End the stream
myWritable.end();

Streams in practice: parsing form (multipart/form-data) data

Format of a multipart/form-data POST request

POST /test HTTP/1.1
Host: foo.example
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="field1"

value1
--boundary
Content-Disposition: form-data; name="field2"; filename="example.txt"

value2
--boundary--

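As the format above shows, the data of each field is separated by --boundary. The boundary string is generated by the HTTP client, so different clients may use different boundaries, but it can always be read from the Content-Type request header, as in the example above:

Content-Type: multipart/form-data;boundary="boundary"

Here is an example from the Chrome browser: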

POST /test HTTP/1.1
Host: foo.example
Content-Type: multipart/form-data;boundary="----WebKitFormBoundaryOqn3A15LMyB0YGYh"

------WebKitFormBoundaryOqn3A15LMyB0YGYh
Content-Disposition: form-data; name="username"

test1
------WebKitFormBoundaryOqn3A15LMyB0YGYh
Content-Disposition: form-data; name="username"

test2
------WebKitFormBoundaryOqn3A15LMyB0YGYh
Content-Disposition: form-data; name="file"; filename="package.json"
Content-Type: application/json


------WebKitFormBoundaryOqn3A15LMyB0YGYh--

To make parsing easier, here is the multipart/form-data format again with the line terminators written out:

--boundary\r\n
Content-Disposition: form-data; name="field1"\r\n
\r\n
value1\r\n
--boundary\r\n
Content-Disposition: form-data; name="field2"; filename="example.txt"\r\n
[Content-Type: ...\r\n]
\r\n
value2\r\n
--boundary--\r\n

Here \r\n is the CRLF line terminator (written out explicitly for clarity), and the content in [] is optional.
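To make the byte layout concrete, the sketch below serializes plain text fields into this format. buildMultipartBody is a hypothetical helper written for illustration; it does not handle files or escaping.

```javascript
const CRLF = '\r\n';

// Hypothetical helper: serializes simple text fields into a multipart/form-data body.
function buildMultipartBody(boundary, fields) {
  const parts = fields.map(
    ({ name, value }) =>
      `--${boundary}${CRLF}` +
      `Content-Disposition: form-data; name="${name}"${CRLF}` +
      CRLF +
      `${value}${CRLF}`
  );
  // The closing delimiter is the boundary followed by two extra dashes.
  return Buffer.from(parts.join('') + `--${boundary}--${CRLF}`);
}

const body = buildMultipartBody('boundary', [
  { name: 'field1', value: 'value1' },
  { name: 'field2', value: 'value2' },
]);

// JSON.stringify makes the \r\n sequences visible in the output.
console.log(JSON.stringify(body.toString()));
```

A body built this way is also handy as test input for a parser, because the exact position of every delimiter is known.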

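Parsing multipart/form-data

multipart/form-data can carry both text fields and files, and files may be large. The body therefore needs to be parsed incrementally as a stream rather than loaded into memory in full and parsed in one go; otherwise a large upload can exhaust memory and crash the process.

A custom writable stream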

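const { Writable } = require('stream');
class BodyParser extends Writable {
  constructor(options) {
    super(options)
    // Internal working buffer
    this._buffer = Buffer.alloc(this.writableHighWaterMark);
    // Read the request body by piping it into this parser
    options.req.pipe(this)
  }
  
  _write(chunk, encoding, callback) {
    // The parsing logic is added in the following sections
    callback();
  }
}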

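this.writableHighWaterMark is used as the size of the parser's working buffer. Its default depends on the Node.js version (64 KiB since Node.js 22, 16 KiB in earlier releases), and it can be changed through options.highWaterMark.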
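The link between the constructor option and the property can be verified with a short sketch. The noop write function and the variable names are illustrative only.

```javascript
const { Writable } = require('node:stream');

const noop = (chunk, encoding, callback) => callback();

// highWaterMark passed to the constructor is exposed as writableHighWaterMark.
const small = new Writable({ highWaterMark: 150, write: noop });
const byDefault = new Writable({ write: noop });

// A working buffer sized the same way as in the parser above.
const buffer = Buffer.alloc(small.writableHighWaterMark);

console.log(buffer.length); // 150
console.log(byDefault.writableHighWaterMark); // version-dependent default
```

Sizing the buffer from writableHighWaterMark means a caller can tune memory use per parser simply by passing a different highWaterMark.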

Reading the boundary

Define _setBoundary to read the boundary from req.headers.

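const { Writable } = require('stream');
// trimQuotation comes from the same ./utils module that the full source
// imports; judging by its name and usage, it strips surrounding quotes.
const { trimQuotation } = require('./utils');

class BodyParser extends Writable {
  constructor(options) {
    super(options)
    this._setBoundary(options.req);
  }
  
  _write(chunk, encoding, callback) {
    // The parsing logic is added in the following sections
    callback();
  }
  _setBoundary(req) {
    // Read the boundary from req.headers
    const contentType = req.headers['content-type'];
    if (!contentType) {
      throw new Error('Content-Type is required');
    }
    const contentTypeInfo = this._parseContentType(contentType);
    if (!contentTypeInfo.boundary) {
      throw new Error('boundary is required');
    }
    this._boundary = contentTypeInfo.boundary;
  }
  
  _parseContentType(contentType) {
    if (!contentType) {
      throw new Error('Content-Type value is required');
    }
    // Parameters are separated by ';' with optional whitespace after it
    const contentTypeValueItems = contentType.split(/;\s*/);
    const contentTypeType = contentTypeValueItems[0];
    const contentTypeInfo = {
      charset: 'utf-8',
      value: contentTypeType,
    };
    contentTypeValueItems.slice(1).forEach((item) => {
      // Split at the first '=' only, because a boundary may itself contain '='
      const separatorIndex = item.indexOf('=');
      if (separatorIndex === -1) return;
      const key = item.slice(0, separatorIndex);
      const value = item.slice(separatorIndex + 1);
      contentTypeInfo[trimQuotation(key)] = trimQuotation(value);
    });
    return contentTypeInfo;
  }
}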
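The article does not show the ./utils module that provides trimQuotation and uuid. The following is only a guess at what it might contain, inferred from how the two helpers are used: trimQuotation is assumed to strip one pair of surrounding double quotes, and uuid is assumed to return a unique string.

```javascript
const { randomUUID } = require('node:crypto');

// Assumed behavior: remove outer whitespace and one pair of surrounding double quotes.
function trimQuotation(text) {
  const trimmed = String(text).trim();
  if (trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"')) {
    return trimmed.slice(1, -1);
  }
  return trimmed;
}

// Assumed behavior: return a unique string identifier.
function uuid() {
  return randomUUID();
}

module.exports = { uuid, trimQuotation };
```

With these definitions, trimQuotation('"boundary"') yields boundary, and unquoted values such as name pass through unchanged.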

Defining the parsing methods

The format shows that the data is delimited by --boundary, \r\n, and the Content-Disposition header line, so the parser needs to split the data on these three markers.

We can define the methods _readBoundary, _readDelimiter, and _readContentDisposition.
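As a standalone illustration of the header step, the sketch below parses a single Content-Disposition line into its parameters. parseContentDisposition is a hypothetical helper, separate from the class method developed in this article, and it assumes that quoted values contain no semicolons.

```javascript
// Hypothetical standalone helper: parses one Content-Disposition header line
// of a multipart/form-data part into its type and parameters.
function parseContentDisposition(line) {
  const colonIndex = line.indexOf(':');
  const headerName = line.slice(0, colonIndex).trim().toLowerCase();
  if (colonIndex === -1 || headerName !== 'content-disposition') {
    throw new Error('Not a Content-Disposition header');
  }

  const [type, ...params] = line.slice(colonIndex + 1).trim().split(/;\s*/);
  const result = { type };
  for (const param of params) {
    const equalsIndex = param.indexOf('=');
    if (equalsIndex === -1) continue;
    const key = param.slice(0, equalsIndex).trim();
    // Strip the surrounding double quotes, if any.
    const value = param.slice(equalsIndex + 1).trim().replace(/^"(.*)"$/, '$1');
    result[key] = value;
  }
  return result;
}

const parsed = parseContentDisposition(
  'Content-Disposition: form-data; name="field2"; filename="example.txt"'
);
console.log(parsed);
```

For the line above, the result holds the type form-data, the name field2, and the filename example.txt; the presence of filename is what distinguishes a file part from a text field.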

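Before parsing, consider one problem:

A delimiter such as --boundary can be located with this._buffer.indexOf.
The working buffer this._buffer has a limited size (writableHighWaterMark bytes, 64 KiB by default on recent Node.js versions), so even when it is full the delimiter may not be found, as in the following situation:

The buffer is too small and the --boundary delimiter has only been partially written into it, so this._buffer.indexOf('--boundary') returns -1.

The parser should then wait for more data before it can find the next delimiter, but the buffer has no room left to accept any. The solution relies on the fact that, when the next delimiter cannot be found, part of the buffer must belong to the value of the form field that precedes it. That part can be taken out and appended to the parsed field value (several pieces may need to be concatenated), and the remaining bytes are moved to the start of the buffer. To guarantee that the extracted part never contains a piece of the delimiter, its length is the number of bytes written minus the length of the delimiter.

For example, if 19 bytes have been written and the delimiter is 10 bytes long, 9 bytes are taken out.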
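The idea can be tried out in isolation. The sketch below is not the article's fixed-size buffer: for brevity it uses a growing Buffer built with Buffer.concat, but it applies the same rule of holding back the last delimiter-length bytes. createDelimiterScanner and its callbacks are hypothetical names.

```javascript
// Hypothetical helper: feeds chunks through a sliding window and reports
// field data and delimiter hits, even when a delimiter spans two chunks.
function createDelimiterScanner(delimiterText, onData, onDelimiter) {
  const delimiter = Buffer.from(delimiterText);
  let pending = Buffer.alloc(0); // bytes not yet classified

  return function feed(chunk) {
    pending = Buffer.concat([pending, chunk]);

    let index;
    while ((index = pending.indexOf(delimiter)) !== -1) {
      if (index > 0) onData(pending.subarray(0, index));
      onDelimiter();
      pending = pending.subarray(index + delimiter.length);
    }

    // No complete delimiter left. Everything except the last
    // delimiter.length bytes is certainly data, so it can be released.
    const safeLength = pending.length - delimiter.length;
    if (safeLength > 0) {
      onData(pending.subarray(0, safeLength));
      pending = pending.subarray(safeLength);
    }
  };
}

const events = [];
const feed = createDelimiterScanner(
  '--boundary',
  (data) => events.push(`data:${data.toString()}`),
  () => events.push('delimiter')
);

// 19 bytes: 12 bytes of data followed by the first 7 bytes of the delimiter.
feed(Buffer.from('abcdefghijkl--bound'));
// The rest of the delimiter arrives in the next chunk.
feed(Buffer.from('ary'));

console.log(events);
```

After the first chunk, 19 - 10 = 9 bytes (abcdefghi) are released as data and 10 bytes are held back. Once the second chunk arrives, the remaining jkl is released and the delimiter is reported.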

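Two pointers track the state of the buffer: _p records how many bytes have been written into it, and _readIndex records how many of them have been read.
The parsing process is roughly as follows: each incoming chunk is copied into the buffer after the unread bytes have been moved to its start. The parser then runs the read steps in order (_readBoundary, _readDelimiter, _readContentDisposition, _readDelimiter) and repeats the cycle for every part. When a step cannot complete, parsing stops and waits for more data.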

Source code

const { Writable } = require('stream');
const { Buffer } = require('buffer');
const path = require('path');
const { uuid, trimQuotation } = require('./utils');
const fs = require('fs');
const http = require('http');

const DELIMITER = Buffer.from('\r\n');
const FILE_STREAM = Symbol('fileStream');
class FormParser extends Writable {
  constructor(options) {
    super(options);
    this._rawFormData = {};
    this.formData = {};
    this._options = options;
    this._p = 0;
    this._readIndex = 0;
    this._buffer = Buffer.alloc(this.writableHighWaterMark);
    this._readTypeIndex = 0;
    this._readTypes = [
      '_readBoundary',
      '_readDelimiter',
      '_readContentDisposition',
      '_readDelimiter',
    ];
    this._boundary = 'boundary';
    // The Content-Disposition object of the part currently being read
    this._contentDispositionObj = null;
    this._setBoundary(options.req);
    this._options.req.pipe(this);
  }
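  _write(chunk, _, callback) {
    let writeLength = 0;
    // Copy the chunk into the buffer piece by piece, parsing after every copy
    while (writeLength < chunk.length) {
      this._fillToLeft();
      const copyLength = chunk.copy(
        this._buffer,
        this._p,
        writeLength
      );
      if (copyLength === 0) {
        // The buffer is full and the parser cannot consume anything from it,
        // for example when a header line is longer than the buffer.
        callback(new Error('Parser buffer is too small for this request'));
        return;
      }
      this._p += copyLength;
      writeLength += copyLength;
      try {
        this._readFormData();
      } catch (e) {
        callback(e);
        return;
      }
    }
    callback();
  }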

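  _final(callback) {
    this._readBoundary(true);
    // Close any file streams that were opened for uploaded files
    for (const fieldValues of Object.values(this._rawFormData)) {
      for (const fieldValue of fieldValues) {
        if (fieldValue.isFile) fieldValue.data.end();
      }
    }
    this._normalizeFormData();

    console.log(this.formData);
    callback(); // Tell Node.js the final cleanup is complete
  }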

  // Read a \r\n line terminator
  _readDelimiter() {
    const findIndex = this._getBuffer().indexOf(DELIMITER);
    if (findIndex !== 0) {
      return false;
    }
    this._readIndex += DELIMITER.byteLength;
    return true;
  }

  // Read a boundary delimiter
  _readBoundary(end = false) {
    let boundary = '--' + this._boundary;
    if (end) {
      boundary = '--';
    }
    const boundaryBuf = Buffer.from(boundary);

    const findIndex = this._getBuffer().indexOf(boundaryBuf);
    if (findIndex === -1) {
      // Not found yet: release the bytes that cannot be part of the delimiter.
      // The \r\n in front of the boundary is held back as well, so it never
      // leaks into the field value.
      const endIndex =
        this._getLastIndex() - boundaryBuf.byteLength - DELIMITER.byteLength;
      if (endIndex > this._readIndex) {
        this._collectionFormData(this._readIndex, endIndex);
        this._readIndex = endIndex;
      }
      return false;
    }

    // Finding the boundary means the previous part is complete:
    // store the rest of its data (without the trailing \r\n)
    if (this._contentDispositionObj) {
      this._collectionFormData(
        this._readIndex,
        this._readIndex + findIndex - DELIMITER.byteLength
      );
      this._contentDispositionObj = null;
    }
    this._readIndex += findIndex + boundaryBuf.byteLength;
    return true;
  }

  // Read the Content-Disposition header (and the Content-Type header for files)
  _readContentDisposition() {
    if (this._getBuffer().indexOf('Content-Disposition') === -1) {
      return false;
    }
    const findIndex = this._getBuffer().indexOf(DELIMITER);
    if (findIndex === -1) {
      return false;
    }
    const contentDisposition = this._buffer
      .subarray(this._readIndex, this._readIndex + findIndex)
      .toString();
    const contentDispositionValue = contentDisposition.split(': ')[1];
    if (!contentDispositionValue) {
      throw new Error('Content-Disposition value is required');
    }
    const contentDispositionValueItems = contentDispositionValue.split('; ');
    if (contentDispositionValueItems[0] !== 'form-data') {
      throw new Error('Content-Disposition value is not form-data');
    }
    const contentDispositionObj = {
      id: uuid(),
    };
    contentDispositionValueItems.slice(1).forEach((item) => {
      const [key, value] = item.split('=');
      contentDispositionObj[trimQuotation(key)] = trimQuotation(value);
    });

    // Advance the read pointer past the header line
    const readIndex = this._readIndex;
    this._readIndex += findIndex + DELIMITER.byteLength;

    if (contentDispositionObj.filename) {
      // Read the Content-Type header that follows for file parts
      const findContentTypeIndex = this._getBuffer().indexOf('Content-Type');
      if (findContentTypeIndex === -1) {
        this._readIndex = readIndex;
        return false;
      }
      const contentTypeEndIndex = this._getBuffer().indexOf(DELIMITER);
      if (contentTypeEndIndex === -1) {
        this._readIndex = readIndex;
        return false;
      }
      const contentType = this._buffer
        .subarray(this._readIndex, this._readIndex + contentTypeEndIndex)
        .toString();
      const contentTypeInfo = this._parseContentType(
        contentType.split(': ')[1]
      );
      contentDispositionObj['contentType'] = contentTypeInfo.value;
      contentDispositionObj['contentTypeInfo'] = contentTypeInfo;
      this._readIndex += contentTypeEndIndex + DELIMITER.byteLength;

      const filename = path.join(
        this._options.uploadDir,
        uuid() + path.extname(contentDispositionObj['filename'])
      );
      // getFile, if provided, must synchronously return a Writable stream,
      // because the parser calls write() on it right away.
      const fileStream = this._options.getFile
        ? this._options.getFile(contentDispositionObj)
        : fs.createWriteStream(filename);
      contentDispositionObj[FILE_STREAM] = fileStream;
      contentDispositionObj['filename'] = filename;
    }

    this._contentDispositionObj = contentDispositionObj;

    return true;
  }

  _getBuffer() {
    return this._buffer.subarray(this._readIndex, this._getLastIndex());
  }

  _setBoundary(req) {
    // Read the boundary from req.headers
    const contentType = req.headers['content-type'];
    if (!contentType) {
      throw new Error('Content-Type is required');
    }
    const contentTypeInfo = this._parseContentType(contentType);
    if (!contentTypeInfo.boundary) {
      throw new Error('boundary is required');
    }
    this._boundary = contentTypeInfo.boundary;
  }

  _getLastIndex() {
    return this._p;
  }

  // Collect data for the form field currently being read
  _collectionFormData(startIndex, endIndex) {
    if (this._contentDispositionObj) {
      const data = Buffer.from(this._buffer.subarray(startIndex, endIndex));
      let fieldValues = this._rawFormData[this._contentDispositionObj.name];
      if (!fieldValues) {
        fieldValues = [];
        this._rawFormData[this._contentDispositionObj.name] = fieldValues;
      }

      // If an entry already exists, append to it: file data is written to the file stream, text is concatenated
      let fieldValue = fieldValues.find(
        (item) => item.id === this._contentDispositionObj.id
      );
      if (fieldValue) {
        if (fieldValue.isFile) {
          fieldValue.data.write(data);
        } else {
          fieldValue.data = Buffer.concat([fieldValue.data, data]);
        }
      } else {
        if (this._contentDispositionObj[FILE_STREAM]) {
          this._contentDispositionObj[FILE_STREAM].write(data);
        }
        fieldValues.push({
          id: this._contentDispositionObj.id,
          data: this._contentDispositionObj[FILE_STREAM] || data,
          contentTypeInfo: this._contentDispositionObj.contentTypeInfo,
          filename: this._contentDispositionObj.filename,
          isFile: !!this._contentDispositionObj[FILE_STREAM],
        });
      }
    }
  }

  _normalizeFormData() {
    // console.log(this._rawFormData)
    for (const key in this._rawFormData) {
      const fieldValues = this._rawFormData[key];
      this.formData[key] = [];
      fieldValues.forEach((fieldValue) => {
        // console.log(fieldValue)
        if (fieldValue.contentTypeInfo) {
          const { value } = fieldValue.contentTypeInfo;
          if (fieldValue.isFile) {
            this.formData[key].push({
              filename: fieldValue.filename,
              contentType: fieldValue.contentTypeInfo.value,
              file: fieldValue.data,
            });
          } else {
            switch (value) {
              case 'application/json':
                this.formData[key].push(JSON.parse(fieldValue.data.toString()));
                break;
              case 'text/plain':
              default:
                // Anything else is kept as a plain string
                this.formData[key].push(fieldValue.data.toString());
            }
          }
        } else {
          this.formData[key].push(fieldValue.data.toString());
        }
      });
    }
  }

  getFormData(key) {
    if (key) {
      return this.formData[key].length === 1
        ? this.formData[key][0]
        : this.formData[key];
    }
    return this.formData;
  }

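  _parseContentType(contentType) {
    if (!contentType) {
      throw new Error('Content-Type value is required');
    }
    // Parameters are separated by ';' with optional whitespace after it
    const contentTypeValueItems = contentType.split(/;\s*/);
    const contentTypeType = contentTypeValueItems[0];
    const contentTypeInfo = {
      charset: 'utf-8',
      value: contentTypeType,
    };
    contentTypeValueItems.slice(1).forEach((item) => {
      // Split at the first '=' only, because a boundary may itself contain '='
      const separatorIndex = item.indexOf('=');
      if (separatorIndex === -1) return;
      const key = item.slice(0, separatorIndex);
      const value = item.slice(separatorIndex + 1);
      contentTypeInfo[trimQuotation(key)] = trimQuotation(value);
    });
    return contentTypeInfo;
  }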

  // Move the unread bytes to the start of the buffer to make room for new data
  _fillToLeft() {
    if (this._readIndex === 0) return;
    // Only the range [_readIndex, _p) holds valid unread data
    this._buffer.copyWithin(0, this._readIndex, this._p);
    this._p -= this._readIndex;
    this._readIndex = 0;
  }

  _readFormData() {
    while (this._readTypeIndex < this._readTypes.length) {
      if (this._readIndex >= this._getLastIndex()) return;
      const readType = this._readTypes[this._readTypeIndex];
      const isRead = this[readType]();
      if (!isRead) {
        return;
      }
      this._readTypeIndex++;
    }
    this._readTypeIndex = 0;
  }
}

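const uploadDir = path.resolve(__dirname, 'uploads');
// Make sure the upload directory exists before any file part is written
fs.mkdirSync(uploadDir, { recursive: true });

const server = http.createServer((req, res) => {
  if (req.method === 'POST') {
    const formParser = new FormParser({
      // A small buffer makes it easy to exercise the case where a delimiter
      // does not fit into the buffer in one piece
      highWaterMark: 150,
      req,
      uploadDir,
    });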
    formParser.on('finish', () => {
      res.setHeader('Content-Type', 'application/json');
      res.end(JSON.stringify(formParser.getFormData()));
    });
  } else {
    res.setHeader('Content-Type', 'text/html');
    fs.createReadStream(path.resolve(__dirname, 'index.html')).pipe(res);
  }
});

server.listen(3000, () => {
  console.log('server is running on 3000');
});

index.html


<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>
</head>
<body>
  <form action="/" method="post" enctype="multipart/form-data">
    <div>userName: <input name="username" /></div>
    <div>userName2: <input name="username" /></div>
    <div>file: <input name="file" type="file"/></div>
    <div><input type="submit" value="Submit" /></div>
  </form>
</body>
</html>
