# Serverless Worker autoscaling

> **💡 Tip:**
> SUPPORT, STABILITY, and DEPENDENCY INFO
>
> Serverless Workers are in [Pre-release](/evaluate/development-production-features/release-stages#pre-release) and available to select Temporal Cloud customers.
> To request access during Pre-release, create a [support ticket](/cloud/support#support-ticket) or contact your account team.
> APIs are experimental and may be subject to backwards-incompatible changes.
> [Sign up for updates](https://temporal.io/pages/serverless-workers-updates) to be notified when Serverless Workers reach Public Preview.

The [Worker Controller Instance (WCI)](/serverless-workers#worker-controller-instance) autoscales Serverless Workers
using two signals: sync match failure and Task Queue backlog. The autoscaling algorithm differs by compute provider
because of differences in cold start latency, invocation duration limits, and provider APIs.

## Scaling signals

Both compute providers use the same two signals to drive scaling decisions.

### Sync match failure

When a Task is submitted, the [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route
it directly to an available Worker. If no Worker is available, the sync match fails, and the Matching Service pushes a
signal to the WCI. Because the Matching Service pushes match failures as they happen, instead of the WCI polling on a
timer, the WCI can react to a capacity shortfall without waiting for a polling interval.

### Task Queue backlog

The WCI monitors Task Queue metadata to detect a backlog: pending Tasks on the queue without enough Workers to process
them. When it detects a backlog, the WCI scales up.

## AWS Lambda

The Lambda algorithm is event-driven and reactive. Sync match failure is the primary control signal, and the Task Queue
backlog helps size each scale-out.

When the WCI needs more capacity, it calls the Lambda `Invoke` API (the operation authorized by the
`lambda:InvokeFunction` permission) to start new Workers. Each call is a discrete action ("invoke N more functions"),
not a target state. The WCI does not manage a fleet of instances.

### Scale-out

On sync match failure, the WCI invokes new Lambda functions. Because a Lambda cold start takes from under a second to a
few seconds, purely reactive control does not let a meaningful backlog build up before new capacity is ready. The WCI
can scale from zero with low latency.
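
The scale-out step can be sketched as follows. This is a minimal illustration, not the WCI's implementation: the function names, the `tasks_per_invocation` and `max_burst` parameters, and the sizing rule are all assumptions made for the example, and `invoke` stands in for whatever call starts one Lambda invocation.

```python
import math
from typing import Callable


def invocations_to_start(sync_match_failed: bool, backlog: int,
                         tasks_per_invocation: int, max_burst: int) -> int:
    """Hypothetical sizing rule: how many new invocations to start."""
    if not sync_match_failed:
        return 0  # sync match failure is the primary trigger
    # At least one invocation for the failed match; the backlog sizes the rest.
    wanted = max(1, math.ceil(backlog / tasks_per_invocation))
    return min(wanted, max_burst)  # assumed cap on a single burst


def scale_out(n: int, invoke: Callable[[], None]) -> None:
    """Issue n discrete invocations; no target state is tracked."""
    for _ in range(n):
        invoke()
```

The sketch returns a count of new invocations rather than a desired fleet size, which mirrors the "invoke N more functions" model described above.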

### Scale-in

Scale-in is automatic. Each Lambda invocation runs until the Worker has finished processing available Tasks or
approaches the 15-minute execution time limit, then shuts down. There is no drain logic or stabilization window. The WCI
does not need to actively remove capacity.

### Instance model

Each invocation is independent. The Worker starts, creates a fresh client connection, processes multiple Tasks until near
the execution time limit, and then shuts down gracefully. There is no shared state across invocations.
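
The lifetime of a single invocation can be pictured as a loop with a deadline. The sketch below is illustrative only: `run_invocation`, `poll_task`, and the 30-second shutdown margin are assumptions for the example, not names or values from the Serverless Worker runtime.

```python
import time
from typing import Callable, Optional

EXECUTION_LIMIT_S = 15 * 60  # Lambda's 15-minute execution time limit
SHUTDOWN_MARGIN_S = 30       # assumed safety margin, not a documented value


def run_invocation(poll_task: Callable[[], Optional[Callable[[], None]]],
                   now: Callable[[], float] = time.monotonic,
                   limit_s: float = EXECUTION_LIMIT_S,
                   margin_s: float = SHUTDOWN_MARGIN_S) -> int:
    """Process Tasks until none remain or the limit is near; return the count."""
    started = now()
    processed = 0
    while now() - started < limit_s - margin_s:
        task = poll_task()
        if task is None:
            break  # no Tasks available: exit so the capacity disappears
        task()
        processed += 1
    return processed  # the caller shuts down; nothing is carried over
```

Because the loop keeps no state outside its local variables, each invocation starts fresh, which matches the independence described above.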

## GCP Cloud Run

The Cloud Run algorithm is a hybrid rate-plus-backlog controller, extended with a latency-first fast path that reacts
to sync match failures.

Unlike Lambda, the WCI outputs a target state ("there should be _c_ instances") rather than discrete invocations. The
WCI adjusts Cloud Run's instance count through the Cloud Run admin API.

### Scale-out

The algorithm uses four layers to determine the desired instance count:

1. **Feedforward base capacity.** The WCI estimates the required fleet size from the Task arrival rate, divided by per-instance throughput at the target utilization. Feedforward sizing is critical because Cloud Run cold start is approximately 10-30 seconds. If the WCI waited for a backlog to signal under-provisioning, new capacity would still be 10-30 seconds away.
2. **Backlog-drain correction.** If a backlog exists, the WCI adds instances to drain it within the target queue wait time.
3. **Warm-reserve headroom.** The WCI maintains extra capacity above the feedforward estimate to absorb sync match failures without triggering cold starts.
4. **Sync match fast-path.** On any sync match failure, the WCI immediately re-evaluates and scales out if the current fleet is undersized. This event-triggered path bypasses the regular control interval.

The final desired count is the maximum of the regular control-interval calculation and the event-driven fast-path
calculation, clamped to the configured minimum and maximum instance counts, and quantized to the scaling granularity.
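
The layered calculation can be approximated with a small function. This is a sketch under stated assumptions rather than the WCI's actual formula: the function and parameter names are hypothetical, the first three layers are assumed to add together, and quantization is assumed to round up before clamping.

```python
import math


def desired_instances(arrival_rate: float, per_instance_throughput: float,
                      utilization_target: float, backlog: int,
                      queue_wait_target_s: float, warm_reserve: int,
                      c_min: int, c_max: int, granularity: int = 1,
                      fast_path_count: int = 0) -> int:
    """Hypothetical desired-count calculation for one evaluation."""
    # Throughput one instance delivers at the target utilization (Tasks/s).
    effective = per_instance_throughput * utilization_target
    # Layer 1: feedforward base capacity from the arrival rate.
    feedforward = arrival_rate / effective
    # Layer 2: extra instances to drain the backlog within the wait target.
    drain = backlog / (effective * queue_wait_target_s)
    # Layer 3: warm-reserve headroom (assumed to be additive).
    interval_count = feedforward + drain + warm_reserve
    # Layer 4: take the larger of the interval and fast-path results.
    raw = max(interval_count, fast_path_count)
    # Round up to the scaling granularity, then clamp to the bounds.
    quantized = math.ceil(raw / granularity) * granularity
    return max(c_min, min(c_max, quantized))
```

For example, with 30 Tasks/s arriving, 8 Tasks/s per instance at a 0.75 utilization target, a backlog of 36 Tasks, a 3-second wait target, and one warm-reserve instance, the sketch yields 5 + 2 + 1 = 8 instances.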

### Scale-in

Scale-in is conservative to avoid oscillation:

- **Scale-down stabilization window.** After a scale-down decision, the WCI waits for a stabilization window (300 seconds by default) before removing instances. If load increases during this window, the scale-down is canceled.
- **Hold after scale-out.** After scaling out, the WCI holds the new capacity for a minimum period before considering scale-in.
- **Drain logic.** When removing instances, the WCI drains them over a configurable horizon, allowing in-flight Tasks to complete before the instance is terminated.
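
The stabilization window can be modeled as a small gate in front of the desired count. The class below is a simplified sketch: `ScaleInGate` and its fields are hypothetical names, and it covers only the stabilization window, not the hold after scale-out or the drain horizon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleInGate:
    """Hypothetical gate that delays scale-in by a stabilization window."""
    stabilization_s: float = 300.0
    pending_since: Optional[float] = None  # when the scale-down was first seen

    def decide(self, current: int, desired: int, now: float) -> int:
        """Return the instance count to apply at time `now` (in seconds)."""
        if desired >= current:
            # Load held or rose: cancel any pending scale-down.
            self.pending_since = None
            return desired
        if self.pending_since is None:
            self.pending_since = now  # start the stabilization window
        if now - self.pending_since >= self.stabilization_s:
            self.pending_since = None
            return desired  # window elapsed: apply the scale-down
        return current  # still inside the window: keep capacity
```

Scale-out passes through immediately, while scale-in takes effect only after the desired count has stayed below the current count for the whole window.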

### Minimum instances

Setting `c_min >= 1` keeps at least one instance warm at all times. With constant traffic, this behaves like an
always-on Worker with elastic scale-up and scale-down. Setting `c_min = 0` enables full scale-to-zero but means the
first Task after an idle period incurs a cold start.

### Tuning parameters

The following parameters control Cloud Run autoscaling behavior. The listed values are starting points for latency-first operation.

| Parameter                  | Starting value              | Description                                                                                 |
| -------------------------- | --------------------------- | ------------------------------------------------------------------------------------------- |
| Control interval           | 15s                         | How often the WCI re-evaluates the desired instance count.                                  |
| Utilization target         | 0.70-0.80                   | Target per-instance utilization for feedforward sizing.                                      |
| Queue wait target          | 3-5s                        | Target time a Task should wait in the queue before being picked up.                         |
| Drain horizon              | 30-60s                      | How long the WCI allows for in-flight Tasks to complete when removing an instance.          |
| Event cooldown             | max(5s, 0.25 x scale-up latency) | Minimum time between event-triggered scale-out evaluations.                            |
| Scale-down stabilization   | 300s                        | How long the WCI waits after a scale-down decision before removing instances.               |
| Hold after scale-out       | max(60s, 2 x scale-up latency)   | Minimum time to hold new capacity before considering scale-in.                         |
| Min instances              | >= 1 for latency-first      | Minimum instance count. Set to 0 for full scale-to-zero.                                    |
| Scaling granularity        | 1                           | Minimum step size for scaling changes.                                                      |
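
The two derived rows in the table depend on the measured scale-up latency. The sketch below shows how they could be computed. The class and field names are hypothetical and do not correspond to a real configuration schema, and where the table gives a range the default picks a midpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudRunScalingParams:
    """Hypothetical container for the starting values in the table."""
    scale_up_latency_s: float                  # measured cold start, for example 10-30s
    control_interval_s: float = 15.0
    utilization_target: float = 0.75           # table range: 0.70-0.80
    queue_wait_target_s: float = 4.0           # table range: 3-5s
    drain_horizon_s: float = 45.0              # table range: 30-60s
    scale_down_stabilization_s: float = 300.0
    min_instances: int = 1                     # 0 enables full scale-to-zero
    scaling_granularity: int = 1

    @property
    def event_cooldown_s(self) -> float:
        # max(5s, 0.25 x scale-up latency)
        return max(5.0, 0.25 * self.scale_up_latency_s)

    @property
    def hold_after_scale_out_s(self) -> float:
        # max(60s, 2 x scale-up latency)
        return max(60.0, 2.0 * self.scale_up_latency_s)
```

With a 30-second scale-up latency, the event cooldown works out to 7.5 seconds and the hold after scale-out stays at the 60-second floor. With a 40-second latency, they rise to 10 and 80 seconds.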
