Async map with limited parallelism in Node.js
Array.map() is a very useful function but, unfortunately, it only works with synchronous functions. A simple workaround for using async map functions is Promise.all() or its more tolerant sibling, Promise.allSettled().
It works like this: .map() converts each array item into a promise, so we end up with an array of promises to resolve. There are two ways of resolving them:
- Promise.all() throws if the map function throws (MDN)
- Promise.allSettled() runs the map function for the whole array even if some calls throw (MDN)
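Here's a minimal sketch of that pattern (the `square` function is just a stand-in for a real async map function):

```javascript
// .map() turns each item into a promise; Promise.all() waits for all of them.
const square = async (n) => n * n;
const numbers = [1, 2, 3];

Promise.all(numbers.map(square)).then((results) => {
  console.log(results); // → [ 1, 4, 9 ]
});
```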
The output of Promise.allSettled() is an array of objects, one per input item, each telling whether the execution failed or not and looking like this:
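A minimal illustration of that shape (the rejection here is deliberate):

```javascript
Promise.allSettled([
  Promise.resolve(42),
  Promise.reject(new Error('boom')),
]).then((results) => {
  console.log(results[0]); // → { status: 'fulfilled', value: 42 }
  console.log(results[1].status, results[1].reason.message); // → rejected boom
});
```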
There’s a catch though: unlike a “normal” .map(), the map functions will not execute serially. They all start at once, which may:
- Consume a lot of memory, as each map function holds on to all its variables for as long as it is running. If you’re running on Lambda, for example, this may easily crash your runtime (or force you to pay for a beefier execution environment)
- Hit rate limits, if the map function calls an API on every invocation
It would be nice if we could somehow limit those parallel runs. One option is the eachLimit() function from the popular async module. Another is Bluebird.map() with the concurrency option set to a lower value. But what if we don’t want to import a dependency for such a simple use case? Let’s experiment and learn something.
Limit parallel calls
Let’s define a hypothetical problem first. We have 100 URLs that we want to fetch but we don’t want to have more than 10 parallel calls at a time. We’re going to hit Google because they can usually handle this kind of load with ease!
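A sketch of that setup, assuming Node 18+ where fetch() is built in (the query strings are arbitrary, just a way to make the 100 URLs distinct):

```javascript
// 100 distinct URLs to fetch.
const urls = Array.from(
  { length: 100 },
  (_, i) => `https://www.google.com/search?q=${i}`
);

// The async map function: fetch a URL and return its body as text.
async function fetchContents(url) {
  const response = await fetch(url);
  return response.text();
}
```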
Now let’s put these together and write a program that takes those 100 URLs, maps them to their contents, and prints the results:
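A sketch of the driver, assuming the urls array and fetchContents() from above, plus the mapAllSettled() function we’re about to write:

```javascript
async function main() {
  // At most 10 fetches in flight at a time.
  const results = await mapAllSettled(urls, fetchContents, 10);

  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      console.log(`${i}: ${result.value.length} bytes`);
    } else {
      console.log(`${i}: failed with ${result.reason}`);
    }
  });
}

// Call main() once mapAllSettled() is defined.
```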
Now we need to write the mapAllSettled() function, which is basically Promise.allSettled(array.map(asyncMapFn)) but with a limit. Its signature looks like this: async function mapAllSettled(array, mapFn, limit).
But let’s step back a bit and see what the execution will look like. For the sake of simplicity let’s say we have 10 URLs. If we were to fetch all of them at once, we would have something like this:
But if we were to have four fetches at the same time, it’d look like this:
As soon as one fetch is done, we’ll proceed with the next one. At each time we have four ongoing fetches. Let’s rearrange the runtime into four lanes which will be executed by some “workers”:
The workers all “consume” the same array but “insert” the result in the right position in the resulting array in a way that the mapped value for URL number seven ends up in position seven of the resulting array.
Here’s where generators come in handy. We can define a generator that takes an array and yields what the map function expects:
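A minimal sketch (the name arrayGenerator is my choice here, not prescribed by anything):

```javascript
// Yields [index, item] pairs so each worker knows which slot of the
// result array its mapped value belongs to.
function* arrayGenerator(array) {
  for (let index = 0; index < array.length; index++) {
    const currentValue = array[index];
    yield [index, currentValue];
  }
}
```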
To keep the output format consistent with Promise.allSettled(), we can run the map function in a try..catch block and emit the result in an object with a similar format:
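A sketch of mapItem(), mirroring the { status, value } / { status, reason } shape that Promise.allSettled() produces:

```javascript
async function mapItem(mapFn, currentValue) {
  try {
    // Happy path: the map function resolved.
    const value = await mapFn(currentValue);
    return { status: 'fulfilled', value };
  } catch (reason) {
    // The map function threw or rejected.
    return { status: 'rejected', reason };
  }
}
```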
Each worker uses the generator function to fetch the index and a reference to the array item, then calls mapItem() to run the async map function:
I’ve added some console.time() / console.timeEnd() calls to make the output more understandable but, basically, this function has two lines of code:
- a for..of loop that consumes data from the generator
- a mapItem() call that runs the user-specified mapFn() and returns its result in an object with the same format as Promise.allSettled()’s
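Putting that together, a worker might look like this (assuming the mapItem() helper from the previous section; the timer labels are just for the demo output):

```javascript
async function worker(id, gen, mapFn, result) {
  console.time(`Worker ${id}`);
  // The generator is shared, so each worker pulls the next unclaimed item.
  for (const [index, currentValue] of gen) {
    console.time(`Worker ${id} --- index ${index}`);
    result[index] = await mapItem(mapFn, currentValue);
    console.timeEnd(`Worker ${id} --- index ${index}`);
  }
  console.timeEnd(`Worker ${id}`);
}
```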
Now let’s write mapAllSettled(), which creates those workers, waits for them to finish, and returns the results:
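A sketch of how it might look, assuming the generator and worker helpers described above (arrayGenerator() and worker() are the names used in the earlier sketches):

```javascript
async function mapAllSettled(arr, mapFn, limit = arr.length) {
  const result = [];

  // Nothing to process for an empty array.
  if (arr.length === 0) {
    return result;
  }

  // One shared generator: workers pull from it cooperatively.
  const gen = arrayGenerator(arr);

  // No point in having more workers than array elements.
  limit = Math.min(limit, arr.length);

  const workers = [];
  for (let i = 0; i < limit; i++) {
    workers.push(worker(i, gen, mapFn, result));
  }

  await Promise.all(workers);
  return result;
}
```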
The key here is to share the generator (gen) between the workers. Obviously, there’s no point in processing an empty array, so we get that edge case out of the way early. There’s also no point in having more workers than array elements, so we make sure that limit is at most equal to the array length before spawning the workers. limit defaults to the array length, which makes mapAllSettled() behave exactly like Promise.allSettled() because all the map functions run in parallel. But the whole point of this function is to give users control to set it to a lower number.
The full code is on GitHub if you want to play with it (MIT license).