Node.js Multithreaded PromisePool: An Implementation with the Async Library

What's Async

Async is a utility module which provides straightforward, powerful functions for working with asynchronous JavaScript. Although originally designed for use with Node.js and installable via `npm i async`, it can also be used directly in the browser. An ESM/MJS version is included in the main async package and should automatically be used with compatible bundlers such as Webpack and Rollup.

Async - NpmJS

async - npm (npmjs.com): https://www.npmjs.com/package/async

Async - Document

https://caolan.github.io/async/

※ Async Multithreading in Practice ※

Create a file named threadPool.js and add the following code:

const async = require('async');

// Create a "thread pool": a queue whose worker handles at most 5 tasks concurrently
const threadPool = async.queue((task, callback) => {
  // Simulate a time-consuming operation
  setTimeout(() => {
    console.log('Task completed:', task);
    callback();
  }, 1000);
}, 5);

// Add tasks to the pool; each per-task callback fires when that task finishes
for (let i = 0; i < 10; i++) {
  threadPool.push(i, (err) => {
    if (err) {
      console.error('Error:', err);
    } else {
      console.log('Task finished:', i);
    }
  });
}

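In this example we create a pool of 5 workers and then push 10 tasks onto it. The pool runs the tasks concurrently, but never more than 5 at a time. When a task completes, the next queued task is automatically handed to the idle worker. Note that `async.queue` limits concurrency on Node's single event-loop thread; its workers are concurrency slots, not operating-system threads.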
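To make the scheduling rule concrete without depending on the library, here is a minimal plain-JavaScript sketch of the same idea: a fixed number of workers that each pull the next task as soon as they become idle. The helper `runPool`, the `tasks` array, and the `peak` counter are hypothetical names introduced for illustration; this is not how Async implements `async.queue`.

```javascript
// Hypothetical helper (not part of Async): run task functions with at most
// `limit` of them in flight at any moment.
function runPool(taskFns, limit) {
  const results = new Array(taskFns.length);
  let next = 0;   // index of the next task to start
  let active = 0; // number of tasks currently running
  let peak = 0;   // highest concurrency observed, for demonstration only

  // Each worker keeps taking the next unstarted task until none are left.
  async function worker() {
    while (next < taskFns.length) {
      const index = next++;
      active++;
      peak = Math.max(peak, active);
      try {
        results[index] = await taskFns[index]();
      } finally {
        active--;
      }
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, taskFns.length) },
    () => worker()
  );
  return Promise.all(workers).then(() => ({ results, peak }));
}

// Ten simulated tasks, each resolving with its own id after 20 ms.
const tasks = Array.from({ length: 10 }, (_, i) => () =>
  new Promise((resolve) => setTimeout(() => resolve(i), 20))
);

const poolRun = runPool(tasks, 5);
poolRun.then(({ results, peak }) => {
  console.log('results:', results, 'peak concurrency:', peak);
});
```

Results are stored by task index, so they come back in submission order even though tasks finish at different times. In this sketch a rejected task rejects the whole run, which is a simplification compared with the per-task callbacks shown above.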

Some Useful Official Examples

Quick Examples

async.map(['file1','file2','file3'], fs.stat, function(err, results) {
    // results is now an array of stats for each file
});

async.filter(['file1','file2','file3'], function(filePath, callback) {
  fs.access(filePath, function(err) {
    callback(null, !err)
  });
}, function(err, results) {
    // results now equals an array of the existing files
});

async.parallel([
    function(callback) { ... },
    function(callback) { ... }
], function(err, results) {
    // optional callback
});

async.series([
    function(callback) { ... },
    function(callback) { ... }
]);

There are many more functions available so take a look at the docs below for a full list. This module aims to be comprehensive, so if you feel anything is missing please create a GitHub issue for it.

Common Pitfalls (StackOverflow)

Synchronous iteration functions

If you get an error like `RangeError: Maximum call stack size exceeded` or other stack overflow issues when using async, you are likely using a synchronous iteratee. By synchronous we mean a function that calls its callback on the same tick in the JavaScript event loop, without doing any I/O or using any timers. Calling many callbacks iteratively will quickly overflow the stack. If you run into this issue, just defer your callback with async.setImmediate to start a new call stack on the next tick of the event loop.

This can also arise by accident if you callback early in certain cases:

async.eachSeries(hugeArray, function iteratee(item, callback) {
    if (inCache(item)) {
        callback(null, cache[item]); // if many items are cached, you'll overflow
    } else {
        doSomeIO(item, callback);
    }
}, function done() {
    //...
});

Just change it to:

async.eachSeries(hugeArray, function iteratee(item, callback) {
    if (inCache(item)) {
        async.setImmediate(function() {
            callback(null, cache[item]);
        });
    } else {
        doSomeIO(item, callback);
        //...
    }
});

Async does not guard against synchronous iteratees for performance reasons. If you are still running into stack overflows, you can defer as suggested above, or wrap functions with async.ensureAsync. Functions that are asynchronous by their nature do not have this problem and don't need the extra callback deferral.
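The overflow can be reproduced without the library. The sketch below uses a hypothetical, simplified series iterator named `eachSeriesNaive` (a stand-in, not Async's actual code) and Node's built-in `setImmediate` in place of `async.setImmediate`. It first runs a synchronous iteratee, then a deferred one over the same array.

```javascript
// Hypothetical, simplified stand-in for a series iterator (not Async's code).
function eachSeriesNaive(items, iteratee, done) {
  let index = 0;
  function step(err) {
    if (err || index >= items.length) return done(err || null);
    iteratee(items[index++], step);
  }
  step();
}

const hugeArray = new Array(100000).fill(1);

// 1) Synchronous iteratee: each callback nests deeper on the same call stack.
let overflowed = false;
try {
  eachSeriesNaive(hugeArray, (item, callback) => callback(), () => {});
} catch (e) {
  overflowed = e instanceof RangeError;
}
console.log('synchronous iteratee overflowed:', overflowed);

// 2) Deferred iteratee: setImmediate starts a fresh call stack for every item.
let processed = 0;
const deferredRun = new Promise((resolve, reject) => {
  eachSeriesNaive(
    hugeArray,
    (item, callback) => {
      processed++;
      setImmediate(callback);
    },
    (err) => (err ? reject(err) : resolve(processed))
  );
});
deferredRun.then((count) => console.log('deferred run processed:', count));
```

The deferred version trades a little throughput for a bounded stack, which is why deferral is only worth adding to iteratees that can actually call back synchronously.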

If JavaScript's event loop is still a bit nebulous, check out this article or this talk for more detailed information about how it works.

Multiple callbacks

Make sure to always return when calling a callback early, otherwise you will cause multiple callbacks and unpredictable behavior in many cases.

async.waterfall([
    function(callback) {
        getSomething(options, function (err, result) {
            if (err) {
                callback(new Error("failed getting something:" + err.message));
                // we should return here
            }
            // since we did not return, this callback still will be called and
            // `processData` will be called twice
            callback(null, result);
        });
    },
    processData
], done)

It is always good practice to return callback(err, result) whenever a callback call is not the last statement of a function.
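A small plain-JavaScript sketch can show the difference the `return` makes. Here `getSomething` is a hypothetical stand-in that always fails, and `countCalls` is a hypothetical helper that counts how often the callback it passes in gets invoked; neither comes from Async.

```javascript
// Hypothetical stand-in for an asynchronous operation that always fails.
function getSomething(options, cb) {
  setImmediate(() => cb(new Error('boom')));
}

// Buggy version: no return after reporting the error.
function withoutReturn(callback) {
  getSomething({}, (err, result) => {
    if (err) {
      callback(new Error('failed getting something:' + err.message));
      // execution falls through to the next line
    }
    callback(null, result);
  });
}

// Fixed version: return the callback call so nothing else runs afterwards.
function withReturn(callback) {
  getSomething({}, (err, result) => {
    if (err) {
      return callback(new Error('failed getting something:' + err.message));
    }
    callback(null, result);
  });
}

// Hypothetical helper: count how many times `fn` invokes its callback.
function countCalls(fn) {
  return new Promise((resolve) => {
    let calls = 0;
    fn(() => {
      calls++;
    });
    // Wait long enough for every pending invocation to have happened.
    setTimeout(() => resolve(calls), 20);
  });
}

const callCounts = Promise.all([
  countCalls(withoutReturn),
  countCalls(withReturn),
]);
callCounts.then(([buggy, fixed]) => console.log({ buggy, fixed }));
```

With the failing stand-in, the buggy version invokes its callback twice and the fixed version once.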

Using ES2017 `async` functions

Async accepts async functions wherever we accept a Node-style callback function. However, we do not pass them a callback, and instead use the return value and handle any promise rejections or errors thrown.

async.mapLimit(files, 10, async file => { // <- no callback!
    const text = await util.promisify(fs.readFile)(dir + file, 'utf8')
    const body = JSON.parse(text) // <- a parse error here will be caught automatically
    if (!(await checkValidity(body))) {
        throw new Error(`${file} has invalid contents`) // <- this error will also be caught
    }
    return body // <- return a value!
}, (err, contents) => {
    if (err) throw err
    console.log(contents)
})

We can only detect native async functions, not transpiled versions (e.g. with Babel). Otherwise, you can wrap async functions in async.asyncify().
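To illustrate why transpiled functions are a problem, the sketch below compares a native `async` function with an ordinary function that merely returns a promise, which is roughly what a transpiler emits. Checking `Symbol.toStringTag` is an assumption about how such detection can work, not a quote of Async's source. `looksLikeNativeAsync` and `toCallbackStyle` are hypothetical helpers; the latter only mimics the idea behind `async.asyncify()`.

```javascript
const nativeAsync = async (x) => x * 2;

// Roughly what a transpiled async function looks like from the outside:
// an ordinary function that happens to return a promise.
const promiseReturning = function (x) {
  return Promise.resolve(x * 2);
};

// Hypothetical check: native async functions carry the tag 'AsyncFunction'.
function looksLikeNativeAsync(fn) {
  return fn[Symbol.toStringTag] === 'AsyncFunction';
}

// Hypothetical wrapper in the spirit of asyncify: adapt a promise-returning
// function to the Node-style (err, value) callback convention.
function toCallbackStyle(promiseFn) {
  return function (...args) {
    const callback = args.pop();
    Promise.resolve()
      .then(() => promiseFn(...args))
      .then(
        (value) => callback(null, value),
        (err) => callback(err)
      );
  };
}

console.log(looksLikeNativeAsync(nativeAsync));
console.log(looksLikeNativeAsync(promiseReturning));

const wrapped = toCallbackStyle(promiseReturning);
const wrappedResult = new Promise((resolve, reject) => {
  wrapped(21, (err, value) => (err ? reject(err) : resolve(value)));
});
wrappedResult.then((value) => console.log('wrapped result:', value));
```

Because the second function cannot be recognized from the outside, it has to be wrapped explicitly before it can be used where a callback-style function is expected.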

Binding a context to an iteratee

This section is really about bind, not about Async. If you are wondering how to make Async execute your iteratees in a given context, or are confused as to why a method of another library isn't working as an iteratee, study this example:

// Here is a simple object with an (unnecessarily roundabout) squaring method
var AsyncSquaringLibrary = {
    squareExponent: 2,
    square: function(number, callback){
        var result = Math.pow(number, this.squareExponent);
        setTimeout(function(){
            callback(null, result);
        }, 200);
    }
};

async.map([1, 2, 3], AsyncSquaringLibrary.square, function(err, result) {
    // result is [NaN, NaN, NaN]
    // This fails because the `this.squareExponent` expression in the square
    // function is not evaluated in the context of AsyncSquaringLibrary, and is
    // therefore undefined.
});

async.map([1, 2, 3], AsyncSquaringLibrary.square.bind(AsyncSquaringLibrary), function(err, result) {
    // result is [1, 4, 9]
    // With the help of bind we can attach a context to the iteratee before
    // passing it to Async. Now the square function will be executed in its
    // 'home' AsyncSquaringLibrary context and the value of `this.squareExponent`
    // will be as expected.
});

Subtle Memory Leaks

There are cases where you might want to exit early from async flow, when calling an Async method inside another async function:

function myFunction (args, outerCallback) {
    async.waterfall([
        //...
        function (arg, next) {
            if (someImportantCondition()) {
                return outerCallback(null)
            }
        },
        function (arg, next) {/*...*/}
    ], function done (err) {
        //...
    })
}

Something happened in a waterfall where you want to skip the rest of the execution, so you call an outer callback. However, Async will still wait for that inner next callback to be called, leaving some closure scope allocated.

As of version 3.0, you can call any Async callback with false as the error argument, and the rest of the execution of the Async method will be stopped or ignored.

        function (arg, next) {
            if (someImportantCondition()) {
                outerCallback(null)
                return next(false) // ← signal that you called an outer callback
            }
        },
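The effect of `next(false)` can be sketched with a tiny waterfall written for illustration. `miniWaterfall` below is a hypothetical, much-simplified stand-in, not Async's implementation; it only models the rule that a `false` error argument stops the flow without invoking the final callback.

```javascript
// Hypothetical, simplified waterfall (not Async's code): each step receives
// the previous step's results plus a `next` callback.
function miniWaterfall(steps, done) {
  let index = 0;
  function next(err, ...args) {
    if (err === false) return; // stop silently; the final callback is skipped
    if (err || index === steps.length) return done(err || null, ...args);
    steps[index++](...args, next);
  }
  next(null);
}

const trace = [];

function myFunction(shouldExitEarly, outerCallback) {
  miniWaterfall(
    [
      (next) => {
        trace.push('step1');
        next(null, 1);
      },
      (value, next) => {
        trace.push('step2');
        if (shouldExitEarly) {
          outerCallback(null);
          return next(false); // signal that the outer callback was called
        }
        next(null, value + 1);
      },
      (value, next) => {
        trace.push('step3');
        next(null, value);
      },
    ],
    (err, result) => {
      trace.push('done');
      outerCallback(err, result);
    }
  );
}

myFunction(true, () => trace.push('outer'));
console.log(trace);
```

In this run the trace contains `step1`, `step2`, and `outer` only: the third step and the final callback never execute, so nothing is left waiting on `next`.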

Mutating collections while processing them

If you pass an array to a collection method (such as each, mapLimit, or filterSeries), and then attempt to push, pop, or splice additional items on to the array, this could lead to unexpected or undefined behavior. Async will iterate until the original length of the array is met, and the indexes of items pop()ed or splice()d could already have been processed. Therefore, it is not recommended to modify the array after Async has begun iterating over it. If you do need to push, pop, or splice, use a queue instead.
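The reason a queue copes with this is that it is designed to accept new work while earlier work is still being processed. The sketch below shows that idea with a hypothetical synchronous helper named `drainQueue`; it is a plain-JavaScript illustration, not the Async queue API.

```javascript
// Hypothetical helper: process items until the pending list is empty, letting
// the handler enqueue follow-up items while processing is under way.
function drainQueue(initialItems, handle) {
  const pending = [...initialItems]; // private copy; the caller's array is untouched
  const seen = [];
  while (pending.length > 0) {
    const item = pending.shift();
    seen.push(item);
    handle(item, (more) => pending.push(more));
  }
  return seen;
}

// Each number below 4 schedules one follow-up item (n + 2).
const visited = drainQueue([1, 2], (n, push) => {
  if (n < 4) push(n + 2);
});
console.log(visited);
```

Items added during processing are picked up in order because the loop checks the pending list itself rather than a length captured before iteration began.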
