Issues with BullMQ Worker Not Running in Vercel Production Environment

I’m working on a project where I’m using Vercel to handle API routes that enqueue jobs with BullMQ and Redis. Everything works perfectly in development, but I’m running into an issue in production. The API routes make calls to OpenAI, so each response takes over a minute.

Here’s my setup:

  • Vercel: I’m using Vercel to deploy my Next.js API routes. The routes include one for starting a job (/api/startArticleGeneration) and another for checking the status of a job.
  • BullMQ & Redis: I’m using BullMQ to manage job queues, with Redis as the backend. The job creation and status checking work fine, but the actual job processing doesn’t seem to happen in production after deploying to Vercel.
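The setup above can be sketched as a pair of handlers. This is a minimal, hypothetical model: an in-memory `Map` stands in for BullMQ and Redis so the example runs on its own, and the function names are illustrative, not taken from the original project. It shows why the symptom looks the way it does: enqueueing returns 202 immediately, and the job only leaves the `waiting` state if some Worker process picks it up.

```typescript
// Hypothetical sketch of the enqueue/status contract described above.
// An in-memory Map stands in for BullMQ + Redis so the example is self-contained.

type JobState = "waiting" | "active" | "completed" | "failed"; // subset of BullMQ job states

interface ApiResult {
  status: number;
  body: Record<string, unknown>;
}

const jobs = new Map<string, JobState>(); // stand-in for the Redis-backed queue
let nextId = 1;

// Plays the role of /api/startArticleGeneration: enqueue, then return 202 at once.
function startArticleGeneration(topic: string): ApiResult {
  const id = String(nextId++);
  jobs.set(id, "waiting"); // only a running Worker would move the job past "waiting"
  return { status: 202, body: { jobId: id, topic } };
}

// Plays the role of the status route: report whatever state the job is in.
function getJobStatus(id: string): ApiResult {
  const state = jobs.get(id);
  if (state === undefined) {
    return { status: 404, body: { error: "unknown job" } };
  }
  return { status: 200, body: { jobId: id, state } };
}
```

In the real setup, BullMQ's `queue.add(...)` would replace `jobs.set`, and a BullMQ `Worker` running in some process would move the job forward. If no such process is alive, the status route keeps reporting `waiting`.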

The problem:

When I trigger the job creation API, it responds with a 202 Accepted status, but the job doesn’t seem to be processed after that. The same setup works perfectly in my local development environment, where the jobs are processed as expected.

I suspect the issue is related to Vercel’s serverless environment, where long-running processes aren’t supported, but I’m not entirely sure. Is there some way to handle this on Vercel, or should I redeploy elsewhere?

I would expect to see a timeout error in the runtime logs if the function is exceeding the time limit. You could increase the max duration for your serverless functions and see if that helps. The current maximum is 60s for Hobby and 300s for Pro teams.
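To check whether running the work inline is even possible, compare the expected runtime against the limits quoted above. The sketch below hard-codes those figures (they may have changed since this reply) and adds a safety margin whose 10-second default is an arbitrary assumption; `fitsInOneInvocation` is a hypothetical helper.

```typescript
// Plan limits as quoted in the reply above; verify against current Vercel docs.
type Plan = "hobby" | "pro";

const MAX_DURATION_SECONDS: Record<Plan, number> = {
  hobby: 60,
  pro: 300,
};

// Returns true if the expected runtime plus a safety margin (an assumed
// allowance for cold starts and network jitter) fits under the plan's ceiling.
function fitsInOneInvocation(
  expectedSeconds: number,
  plan: Plan,
  marginSeconds = 10,
): boolean {
  return expectedSeconds + marginSeconds <= MAX_DURATION_SECONDS[plan];
}
```

With the roughly 90-second runtime reported later in the thread, `fitsInOneInvocation(90, "hobby")` is `false` and `fitsInOneInvocation(90, "pro")` is `true`. Where it fits, a Next.js App Router route file can raise its own limit with the route segment config `export const maxDuration = 300;`.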

You mentioned that this may be related to the serverless environment, and that seems plausible. Vercel Functions stop execution after a response is fulfilled, and any async work is terminated. This is definitely different from traditional server environments where asynchronous work can run indefinitely.

For a job to continue processing, you would need to keep the Function invocation running, meaning you can’t send a response until the job is finished. The `waitUntil` primitive helps support these use cases. Just be aware that extending the execution of a Function also incurs more usage, which may eventually lead to increased costs.
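The difference between fire-and-forget work and work that extends the invocation can be modelled in a few lines. This is a toy model, not Vercel’s runtime: `ModelInvocation`, `generateArticle`, and `handleRequest` are hypothetical, and the model’s `waitUntil` only mimics the idea behind the real `waitUntil` exported by the `@vercel/functions` package. In the model, work still pending after the response is sent never records its result unless it was registered first.

```typescript
// Toy model only: NOT Vercel's runtime and NOT the real waitUntil from
// @vercel/functions. It mimics one rule: once the response is sent, pending
// work is abandoned unless it was registered to extend the invocation.

const delay = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class ModelInvocation {
  private extended: Promise<unknown>[] = [];
  private frozen = false;

  // Register work that should keep the modelled instance alive after responding.
  waitUntil(p: Promise<unknown>): void {
    this.extended.push(p);
  }

  // Called after the response is sent: wait only for registered work
  // (ignoring its failures), then freeze the modelled instance.
  async finish(): Promise<void> {
    await Promise.all(this.extended.map((p) => p.catch(() => undefined)));
    this.frozen = true;
  }

  get isFrozen(): boolean {
    return this.frozen;
  }
}

// Hypothetical slow job standing in for the OpenAI call. Its result is
// discarded if the modelled instance froze before it finished.
async function generateArticle(inv: ModelInvocation, result: { state: string }): Promise<void> {
  await delay(20);
  if (!inv.isFrozen) result.state = "completed";
}

// Responds right away (like the 202 route); optionally registers the job via waitUntil.
async function handleRequest(useWaitUntil: boolean): Promise<string> {
  const inv = new ModelInvocation();
  const result = { state: "waiting" };
  const work = generateArticle(inv, result); // started but not awaited
  if (useWaitUntil) inv.waitUntil(work);
  await inv.finish(); // the response has been sent by this point
  await delay(50); // give abandoned work time to (not) record its result
  return result.state;
}
```

Here `handleRequest(false)` resolves to `"waiting"` and `handleRequest(true)` to `"completed"`. One assumption carried over from the reply: registered work still consumes function time, so a job that outlasts the plan’s maximum duration would presumably be cut off anyway.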

It’s not timing out because I’m returning a response saying the job has started as soon as it is queued. There are no more logs after the first log. The API takes over 1.5 minutes to complete, which is why I am trying to run the tasks in the background.

Is there any other way to handle a situation like this without deploying elsewhere?

Streaming could be an option. You can learn more about that in the Streaming Functions documentation.
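A streaming response keeps the invocation alive while the slow call runs, because the function has not finished responding until the stream closes. The sketch below uses only the Web-standard `ReadableStream`, `TextEncoder`, and `Response`, so it runs on Node 18+ and has the same shape as a Next.js App Router route handler; `slowGeneration` and `streamArticle` are hypothetical. As far as I know, streaming does not lift the function’s maximum duration, so a 1.5-minute call still needs a limit above 90 seconds.

```typescript
// Hypothetical stand-in for the OpenAI request (the real one takes over a minute).
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function slowGeneration(topic: string): Promise<string> {
  await sleep(30);
  return `Article about ${topic}`;
}

// Returns a streamed Response: the first bytes go out immediately, and the
// invocation stays open until the slow work finishes and the stream closes.
function streamArticle(topic: string): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        controller.enqueue(encoder.encode("started\n")); // early progress signal
        const article = await slowGeneration(topic);
        controller.enqueue(encoder.encode(`${article}\n`));
        controller.close(); // response is complete only at this point
      } catch (err) {
        controller.error(err); // surface failures to the reader
      }
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

With this approach the client reads the body incrementally (for example via `response.body.getReader()`) instead of polling a status route.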
