Async map with limited parallelism in Node.js
Array.map() is a very useful function but, unfortunately, it only works as expected with synchronous functions. A simple workaround for using async map functions is to use Promise.all() or its more tolerant brother Promise.allSettled().
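As a minimal sketch of this workaround, assuming a hypothetical async function called double() that stands in for any real async map function:

```javascript
// Hypothetical async map function: resolves with the doubled value after a short delay.
const double = (n) =>
  new Promise((resolve) => setTimeout(() => resolve(n * 2), 10));

// .map() returns an array of pending promises...
const promises = [1, 2, 3].map(double);

// ...and Promise.all() waits for all of them, preserving the input order.
const doubled = Promise.all(promises);
doubled.then((values) => console.log(values)); // prints [ 2, 4, 6 ]
```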
It works like this: .map() converts each array item to a promise, so we end up with an array of promises to resolve. There are two ways of doing this:
- Promise.all() rejects as soon as one of the promises rejects, that is, when the map function throws for any item (MDN)
- Promise.allSettled() waits for every promise to settle, even if the map function throws for some items (MDN)
Therefore, the output of .allSettled() is an array of objects that tell whether each execution failed or not. Each object looks like { status: 'fulfilled', value } on success or { status: 'rejected', reason } on failure.
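To make the result shape concrete, here is a small sketch. The halve() function is a made-up map function that fails for odd numbers; only Promise.allSettled() itself is standard.

```javascript
// Hypothetical map function: rejects for odd numbers, resolves for even ones.
const halve = async (n) => {
  if (n % 2 !== 0) throw new Error(`${n} is odd`);
  return n / 2;
};

const settled = Promise.allSettled([4, 5].map(halve));
settled.then((results) => {
  // results[0] is { status: 'fulfilled', value: 2 }
  // results[1] is { status: 'rejected', reason: <Error: 5 is odd> }
  for (const r of results) {
    console.log(r.status, r.status === 'fulfilled' ? r.value : r.reason.message);
  }
});
```

Note that the promise returned by Promise.allSettled() itself never rejects; failures only show up inside the individual result objects.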
There’s a catch though: unlike a “normal” .map(), the map functions will not execute serially. The async map functions will all be running at the same time. JavaScript is normally single-threaded, but the resources allocated for each function (like memory and ports) will still be occupied until its promise is resolved or rejected. For huge arrays, this means we are going to run a huge number of map functions at the same time. This can potentially:
- Consume a lot of memory, as each map function holds on to all its variables for as long as it is running. If you’re running in a lambda function, for example, it may easily crash your runtime (or you’ll have to pay extra to bump to a beefier execution runtime)
- Hit rate limits if the map function calls an API for each item
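A rough way to see this effect is to count how many map functions are in flight at once. The sketch below uses a simulated delay instead of real I/O, and all names in it are illustrative.

```javascript
// Counts how many map functions are in flight at once (illustrative only).
let running = 0;
let peak = 0;

const slowMap = async (item) => {
  running += 1;
  peak = Math.max(peak, running);
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated I/O
  running -= 1;
  return item;
};

const items = Array.from({ length: 1000 }, (_, i) => i);
const allDone = Promise.all(items.map(slowMap)).then(() => {
  console.log(`peak concurrency: ${peak}`); // prints "peak concurrency: 1000"
  return peak;
});
```

Every call runs synchronously up to its first await, so all 1000 map functions have started before the first one finishes.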
It would be nice if we could somehow limit those parallel runs. One option is to use the eachLimit() function from the popular async module. Another option is Bluebird.map() with the concurrency option set to a lower value. But what if we don’t want to import a dependency for such a simple use case? Let’s experiment and learn something.
Limit parallel calls
Right off the bat, we’re going to use generators. I know it’s a feature of JavaScript that many developers (including myself) don’t use often, but hear me out: for this case, it’ll reduce our memory usage and create cleaner code.
Example
Let’s define a hypothetical problem first. We have 100 URLs that we want to fetch but we don’t want to have more than 10 parallel calls at a time. We’re going to hit Google because they can usually handle this kind of load with ease!
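One possible way to build such a list is shown below. The exact URL pattern is an assumption made for illustration; any 100 URLs would do.

```javascript
// Hypothetical URL list: 100 Google search URLs that differ only in the query.
const urls = Array.from(
  { length: 100 },
  (_, i) => `https://www.google.com/search?q=${i + 1}`
);

console.log(urls.length); // prints 100
```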
Next, we want a program that takes those 100 URLs, maps them to their contents, and prints the results. For that we need to write the mapAllSettled() function, which is basically like Promise.allSettled(array.map(asyncMapFn)) but with a limit. Its signature looks like this: async function mapAllSettled(array, mapFn, limit).
But let’s step back a bit and see what the execution will look like. For the sake of simplicity let’s say we have 10 URLs. If we were to fetch all of them at once, all 10 fetches would be in flight together.
But if we were to allow only four fetches at the same time, the first four would start right away and the remaining six would wait their turn.
As soon as one fetch is done, we’ll proceed with the next one. At any given time we have at most four ongoing fetches. Let’s rearrange the runtime into four lanes which will be executed by some “workers”.
The workers all “consume” the same array but “insert” the result in the right position in the resulting array in a way that the mapped value for URL number seven ends up in position seven of the resulting array.
Here’s where the generators come in handy. We can define a generator that takes an array and yields what the map function expects: the current item, its index, and a reference to the array.
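A minimal sketch of such a generator follows. The name arrayGenerator is an assumption; what matters is that it yields the same three arguments that Array.prototype.map() passes to its callback.

```javascript
// Yields [currentItem, index, array], mirroring the arguments of a .map() callback.
function* arrayGenerator(array) {
  for (let index = 0; index < array.length; index++) {
    const currentItem = array[index];
    yield [currentItem, index, array];
  }
}

for (const [item, index] of arrayGenerator(['a', 'b', 'c'])) {
  console.log(index, item); // prints "0 a", then "1 b", then "2 c"
}
```

Because the generator is lazy, it only holds its current position in the array and produces the next item when a consumer asks for it.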
To keep the output format consistent with Promise.allSettled(), we can run the map functions in a try..catch block and emit the result in an object with a similar format.
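That wrapper could be sketched as follows. The parameter order of mapItem() is an assumption, since only its name appears in the text.

```javascript
// Runs mapFn and reports the outcome in the same shape Promise.allSettled() uses.
async function mapItem(mapFn, currentItem, index, array) {
  try {
    return {
      status: 'fulfilled',
      value: await mapFn(currentItem, index, array),
    };
  } catch (reason) {
    return { status: 'rejected', reason };
  }
}

mapItem(async (n) => n * 2, 21, 0, [21]).then((outcome) => {
  console.log(outcome); // prints { status: 'fulfilled', value: 42 }
});
```

The await inside the try block matters: without it, a rejected promise would escape the catch and the caller would see a rejection instead of a result object.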
Each worker uses the generator function to fetch the currentItem, index and a reference to the array, then calls mapItem() to run the async mapFn().
I’ve added some console.time() and console.timeEnd() calls to make the output more understandable but, basically, this function has two lines of code:
- The for..of loop consumes data from the generator
- mapItem() calls the user-specified mapFn() and returns its result in an object that has the same format as Promise.allSettled()
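The sketch below illustrates a worker under a few assumptions: the function and parameter names are made up, the timing calls are replaced with a plain log line, and the try..catch that mapItem() would perform is inlined to keep the example short.

```javascript
// Stand-in generator (hypothetical name) yielding [currentItem, index, array].
function* arrayGenerator(array) {
  for (let index = 0; index < array.length; index++) {
    yield [array[index], index, array];
  }
}

// Each worker pulls the next unprocessed item from the shared generator,
// awaits the map function, and stores the outcome at the item's own index.
async function worker(id, gen, mapFn, result) {
  for (const [currentItem, index, array] of gen) {
    try {
      const value = await mapFn(currentItem, index, array);
      result[index] = { status: 'fulfilled', value };
    } catch (reason) {
      result[index] = { status: 'rejected', reason };
    }
    console.log(`worker ${id} finished index ${index}`);
  }
}

// Two workers share ONE generator, so every item is processed exactly once.
const sharedGen = arrayGenerator(['a', 'b', 'c', 'd', 'e']);
const workerResult = [];
const upperCase = async (s) => s.toUpperCase();
const workersDone = Promise.all([
  worker(1, sharedGen, upperCase, workerResult),
  worker(2, sharedGen, upperCase, workerResult),
]).then(() => workerResult.map((r) => r.value));
```

Since each worker only asks the generator for a new item after its previous map function has settled, the number of workers is also the maximum number of map functions in flight.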
Now let’s write mapAllSettled(), which basically creates those workers, waits for them to finish, and then returns the results.
The key here is to share the generator (gen) between the workers. Obviously, there is no point in processing if the array is empty, so we get that edge case out of the way first. Also, there’s no point in having more workers than there are array elements, so we make sure that limit is at most equal to the array length.
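Putting the pieces together, a complete sketch might look like this. Only the mapAllSettled(array, mapFn, limit) signature comes from the text above; the helper names, the limit validation, the fakeFetch() function and the example.com URLs are assumptions added to make the example runnable without network access.

```javascript
function* arrayGenerator(array) {
  for (let index = 0; index < array.length; index++) {
    yield [array[index], index, array];
  }
}

async function mapItem(mapFn, currentItem, index, array) {
  try {
    return { status: 'fulfilled', value: await mapFn(currentItem, index, array) };
  } catch (reason) {
    return { status: 'rejected', reason };
  }
}

async function worker(gen, mapFn, result) {
  for (const [currentItem, index, array] of gen) {
    result[index] = await mapItem(mapFn, currentItem, index, array);
  }
}

async function mapAllSettled(array, mapFn, limit = array.length) {
  const result = [];
  // Edge case: nothing to process.
  if (array.length === 0) return result;
  // Added assumption: reject limits that would start no workers at all.
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  // One generator shared by all workers, so each item is handed out once.
  const gen = arrayGenerator(array);
  // No point in having more workers than array elements.
  const workerCount = Math.min(limit, array.length);
  const workers = Array.from({ length: workerCount }, () =>
    worker(gen, mapFn, result)
  );
  await Promise.all(workers);
  return result;
}

// Usage with a simulated fetch (no network): 10 items, at most 4 in flight.
let inFlight = 0;
let maxInFlight = 0;
const fakeFetch = async (url) => {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight -= 1;
  if (url.endsWith('/7')) throw new Error('simulated failure');
  return `contents of ${url}`;
};
const demoUrls = Array.from({ length: 10 }, (_, i) => `https://example.com/${i}`);
const demo = mapAllSettled(demoUrls, fakeFetch, 4);
demo.then((results) => console.log(results.map((r) => r.status)));
```

The workers never reject because mapItem() catches every error, so Promise.all() is safe to use for waiting on them.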
Conclusion
The limit defaults to the array length, which makes mapAllSettled() behave exactly like Promise.allSettled() because all the map functions will run in parallel. But the whole point of this function is to give users control to set that to a lower number.
The full code is on GitHub if you want to play with it (MIT license).