Cold Start Reduction

How we reduced our serverless cold start times from 3 seconds to under 200ms by optimizing Lambda functions.

The Cold Start Problem

AWS Lambda functions experience “cold starts” when they haven’t been invoked recently. The Lambda service must:

  1. Provision a new execution environment
  2. Download your deployment package
  3. Initialize the runtime
  4. Run your initialization code

This can take several seconds, creating poor user experiences.
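Step 4 is the part you control directly. Everything outside the handler runs once per execution environment, during the cold start; the handler body runs on every invocation. A minimal sketch to make the split concrete (using the SDK v3 client that ships with the nodejs18.x runtime):

// Runs once, during the cold start (step 4 above)
const { S3Client } = require('@aws-sdk/client-s3') // module load happens at init
const s3 = new S3Client({}) // client construction happens at init

exports.handler = async (event) => {
  // Runs on every invocation, warm or cold
  return { statusCode: 200, body: 'ok' }
}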

Our Starting Point

Initial metrics:

  • Cold start: 3200ms
  • Warm start: 45ms
  • Deployment size: 48MB
  • Dependencies: 127 packages

Clearly unacceptable for user-facing APIs.

Optimization Strategy

1. Reduce Package Size

Our biggest win came from shrinking the deployment package.

Before: 48MB Bundle

{
  "dependencies": {
    "aws-sdk": "^2.1000.0",
    "lodash": "^4.17.21",
    "moment": "^2.29.4",
    "axios": "^1.4.0",
    "express": "^4.18.2",
    "body-parser": "^1.20.2",
    "cors": "^2.8.5",
    "uuid": "^9.0.0",
    "dotenv": "^16.0.3",
    "joi": "^17.9.2",
    "jsonwebtoken": "^9.0.0",
    "bcrypt": "^5.1.0"
  }
}

After: 8MB Bundle

{
  "dependencies": {
    "uuid": "^9.0.0",
    "jsonwebtoken": "^9.0.0"
  }
}

Changes made:

  • ❌ Removed aws-sdk (the runtime ships an AWS SDK: v2 on nodejs16.x and earlier, v3 on nodejs18.x and later)
  • ❌ Removed lodash (replaced with native JS methods; see the sketch after this list)
  • ❌ Removed moment (replaced with native Date)
  • ❌ Removed express (used Lambda proxy integration)
  • ❌ Removed axios (used fetch or aws-sdk clients)
  • ✅ Kept only essential dependencies

Result: 48MB → 8MB (83% reduction)
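The lodash and moment swaps were mostly mechanical. A sketch of the kind of replacements involved (these helpers are illustrative, not our production code):

// lodash's _.groupBy as a plain reduce
const groupBy = (items, key) =>
  items.reduce((acc, item) => {
    (acc[item[key]] ??= []).push(item)
    return acc
  }, {})

// moment().format('YYYY-MM-DD') with the native Date API
const isoDay = (date = new Date()) => date.toISOString().slice(0, 10)

console.log(groupBy([{ type: 'a' }, { type: 'b' }, { type: 'a' }], 'type'))
// { a: [ ..., ... ], b: [ ... ] }
console.log(isoDay()) // e.g. '2024-06-01'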

2. Use Lambda Layers

Extract common dependencies into Lambda layers:

# Create layer structure
mkdir -p layer/nodejs/node_modules

# Install dependencies
cd layer/nodejs
npm install uuid jsonwebtoken --production

# Create layer
cd ..
zip -r layer.zip nodejs

# Deploy layer
aws lambda publish-layer-version \
  --layer-name common-deps \
  --zip-file fileb://layer.zip \
  --compatible-runtimes nodejs18.x

Result: Deployment package reduced to 2MB
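Functions then require these modules exactly as before: Lambda mounts layer contents under /opt, and for Node.js layers /opt/nodejs/node_modules is on the module resolution path. Attach the published layer's ARN to the function and the deployment package no longer needs to carry the dependency:

// uuid resolves from the layer, not from the function's own package
const { v4: uuidv4 } = require('uuid')

exports.handler = async () => ({
  statusCode: 200,
  body: uuidv4()
})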

3. Lazy Load Dependencies

Don’t import everything at the top:

Before: Eager Loading

// ALL modules loaded on cold start
const jwt = require('jsonwebtoken')
const bcrypt = require('bcrypt')
const { S3Client } = require('@aws-sdk/client-s3')
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb')
const { SNSClient } = require('@aws-sdk/client-sns')

exports.handler = async (event) => {
  // Only use jwt for this request
  const token = jwt.sign({ userId: 123 }, process.env.JWT_SECRET)
  return { statusCode: 200, body: token }
}

After: Lazy Loading

exports.handler = async (event) => {
  // Load only what's needed
  const jwt = require('jsonwebtoken')
  const token = jwt.sign({ userId: 123 }, process.env.JWT_SECRET)
  return { statusCode: 200, body: token }
}

Better approach - Cache imports:

let jwt
let s3Client

exports.handler = async (event) => {
  // Load once per container
  if (!jwt) {
    jwt = require('jsonwebtoken')
  }

  if (!s3Client && event.needsS3) {
    const { S3Client } = require('@aws-sdk/client-s3')
    s3Client = new S3Client({})
  }

  // Use cached imports...
}

Result: Cold start initialization time reduced by 400ms
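A complementary option on recent Node.js runtimes is an ES module handler with top-level await, which moves unavoidable async setup into the init phase so the first request never waits on it. A sketch, assuming the handler file is named handler.mjs:

// handler.mjs - ES module handlers may use top-level await during init
import { S3Client, ListBucketsCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({})
// This await completes during init, before the first invocation is served
const buckets = await s3.send(new ListBucketsCommand({}))

export const handler = async () => ({
  statusCode: 200,
  body: JSON.stringify((buckets.Buckets ?? []).map(b => b.Name))
})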

4. Enable SnapStart

AWS Lambda SnapStart takes a snapshot after initialization:

# serverless.yml
functions:
  api:
    handler: handler.main
    runtime: java11 # SnapStart works with Java
    snapStart: true

For Node.js (no SnapStart yet), use provisioned concurrency:

functions:
  api:
    handler: handler.main
    provisionedConcurrency: 2 # Keep 2 warm instances

Result: Eliminates cold starts for ~$10/month
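To confirm which path served a request, Lambda exposes the AWS_LAMBDA_INITIALIZATION_TYPE environment variable, set to "provisioned-concurrency" or "on-demand":

exports.handler = async () => ({
  statusCode: 200,
  // "provisioned-concurrency" when served by a pre-initialized instance
  body: process.env.AWS_LAMBDA_INITIALIZATION_TYPE
})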

5. Connection Pooling

Don’t create new DB connections per invocation:

Before: New Connection Every Time

const { Client } = require('pg')

exports.handler = async (event) => {
  const client = new Client({
    host: process.env.DB_HOST,
    database: process.env.DB_NAME,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD
  })

  await client.connect() // 200-500ms!

  const result = await client.query('SELECT * FROM users WHERE id = $1', [event.userId])

  await client.end()

  return { statusCode: 200, body: JSON.stringify(result.rows) }
}

After: Reuse Connection

const { Client } = require('pg')

let client

async function getClient() {
  if (!client) {
    client = new Client({
      host: process.env.DB_HOST,
      database: process.env.DB_NAME,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD
    })
    await client.connect()
  }
  return client
}

exports.handler = async (event) => {
  const db = await getClient()

  const result = await db.query(
    'SELECT * FROM users WHERE id = $1',
    [event.userId]
  )

  return { statusCode: 200, body: JSON.stringify(result.rows) }
}

Even better - Use RDS Proxy:

const { Client } = require('pg')

const client = new Client({
  host: process.env.RDS_PROXY_ENDPOINT, // RDS Proxy handles pooling
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD
})

// Kick off the connection during init; await it before the first query
const connected = client.connect()

exports.handler = async (event) => {
  await connected
  const result = await client.query(
    'SELECT * FROM users WHERE id = $1',
    [event.userId]
  )

  return { statusCode: 200, body: JSON.stringify(result.rows) }
}

Result: The connection handshake happens once, during init; every invocation after that reuses the open connection instead of paying the 200-500ms connect cost.

6. Optimize AWS SDK v3

Use modular AWS SDK v3 instead of v2:

Before: AWS SDK v2 (59MB)

const AWS = require('aws-sdk')
const s3 = new AWS.S3()
const dynamodb = new AWS.DynamoDB.DocumentClient()

exports.handler = async (event) => {
  const obj = await s3.getObject({
    Bucket: 'my-bucket',
    Key: 'file.txt'
  }).promise()

  return { statusCode: 200, body: obj.Body.toString() }
}

After: AWS SDK v3 (modular)

const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3')

// Create client outside handler (reused across invocations)
const s3Client = new S3Client({})

exports.handler = async (event) => {
  const response = await s3Client.send(new GetObjectCommand({
    Bucket: 'my-bucket',
    Key: 'file.txt'
  }))

  const body = await streamToString(response.Body)
  return { statusCode: 200, body }
}

function streamToString(stream) {
  return new Promise((resolve, reject) => {
    const chunks = []
    stream.on('data', chunk => chunks.push(chunk))
    stream.on('error', reject)
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')))
  })
}

Result: Only the clients you actually import get bundled, keeping the package small. (Recent SDK v3 releases also expose response.Body.transformToString(), which can replace the streamToString helper above.)

7. Use Native Fetch Instead of Axios

Node.js 18+ has native fetch:

Before: Axios (316KB)

const axios = require('axios')

exports.handler = async (event) => {
  const response = await axios.get('https://api.example.com/data')
  return { statusCode: 200, body: JSON.stringify(response.data) }
}

After: Native Fetch (0KB)

exports.handler = async (event) => {
  const response = await fetch('https://api.example.com/data')
  // Unlike axios, fetch does not throw on HTTP error statuses
  if (!response.ok) {
    return { statusCode: response.status, body: 'Upstream request failed' }
  }
  const data = await response.json()
  return { statusCode: 200, body: JSON.stringify(data) }
}

Result: 316KB saved, one less dependency

8. Bundle with esbuild

Tree-shake unused code:

// build.js
const esbuild = require('esbuild')

esbuild.build({
  entryPoints: ['src/handler.js'],
  bundle: true,
  platform: 'node',
  target: 'node18',
  outfile: 'dist/handler.js',
  external: ['@aws-sdk/*'], // SDK v3 ships with the nodejs18.x runtime, so don't bundle it
  minify: true,
  sourcemap: true
}).catch(() => process.exit(1))

Before bundling: 8MB. After bundling: 1.2MB.

9. Use Arm64 Architecture

Graviton2 (Arm) processors are cheaper and, for many workloads, faster:

# serverless.yml
functions:
  api:
    handler: handler.main
    architecture: arm64 # Instead of x86_64

Result: ~20% faster execution for our workload and 20% lower per-GB-second pricing

10. Warm-Up Strategy

Keep functions warm with scheduled pings:

functions:
  api:
    handler: handler.main
    events:
      - http:
          path: /api
          method: get
      - schedule:
          rate: rate(5 minutes)
          input:
            warmup: true
And short-circuit warmup events in the handler:

exports.handler = async (event) => {
  // Ignore warmup events
  if (event.warmup) {
    return { statusCode: 200, body: 'warmup' }
  }

  // Real logic...
}

Cost: ~$1/month per function. Benefit: far fewer cold starts during business hours (note that a scheduled ping keeps only one instance warm, so concurrent spikes can still hit cold ones).

Results

Metric          Before    After     Improvement
Cold start      3200ms    180ms     94% faster
Package size    48MB      1.2MB     97% smaller
Memory usage    512MB     256MB     50% less
Monthly cost    $87       $24       72% cheaper

Advanced: Container Image Lambdas

For very large functions, use container images:

FROM public.ecr.aws/lambda/nodejs:18

# Copy package files
COPY package*.json ./
RUN npm ci --production

# Copy source code
COPY src/ ./src/

# Set handler
CMD ["src/handler.main"]

Build and deploy:

# Build image
docker build -t my-lambda .

# Push to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com

docker tag my-lambda:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/my-lambda:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/my-lambda:latest

# Deploy Lambda
aws lambda create-function \
  --function-name my-lambda \
  --package-type Image \
  --code ImageUri=123456789.dkr.ecr.us-east-1.amazonaws.com/my-lambda:latest \
  --role arn:aws:iam::123456789:role/lambda-execution

Benefits:

  • Up to 10GB image size
  • Use any dependencies (including native binaries)
  • Better for ML models or complex apps

Drawbacks:

  • Slower cold starts (500ms - 2s)
  • More complex deployment

Monitoring Cold Starts

Track cold starts with CloudWatch:

// True only for the first invocation in this execution environment
let coldStart = true

exports.handler = async (event) => {
  console.log(JSON.stringify({
    coldStart,
    requestId: event.requestContext?.requestId
  }))
  coldStart = false

  // Your logic...
}

Query in CloudWatch Logs Insights (filtering on the parsed JSON field, so warm-start log lines aren't counted):

fields @timestamp, @message
| filter coldStart = 1
| stats count(*) as coldStarts by bin(5m)
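To chart cold starts as a proper CloudWatch metric rather than a log query, you can log in the Embedded Metric Format. A sketch (the MyApp namespace and ColdStart metric name are our own choices):

// Emit a ColdStart count via CloudWatch Embedded Metric Format
let firstInvocation = true

exports.handler = async (event) => {
  console.log(JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [{
        Namespace: 'MyApp', // hypothetical namespace
        Dimensions: [['FunctionName']],
        Metrics: [{ Name: 'ColdStart', Unit: 'Count' }]
      }]
    },
    FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
    ColdStart: firstInvocation ? 1 : 0
  }))
  firstInvocation = false

  // Your logic...
}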

Cost Analysis

Provisioned Concurrency vs Cold Starts

Cold starts (no provisioned concurrency):

  • 1M requests/month
  • 10% cold starts = 100K cold starts
  • Avg cold start: 200ms
  • User experience: Poor for 10% of requests
  • Cost: $20/month

Provisioned concurrency (2 instances):

  • 1M requests/month
  • 0% cold starts
  • User experience: Consistent
  • Cost: $30/month ($10 extra)

Verdict: Worth it for user-facing APIs

Best Practices Summary

  1. Keep it small - Minimize dependencies and bundle size
  2. Use layers - Share common code across functions
  3. Lazy load - Import only what you need, when you need it
  4. Reuse connections - DB, HTTP, AWS SDK clients
  5. Use AWS SDK v3 - Modular imports reduce size
  6. Bundle with esbuild - Tree-shake unused code
  7. Use Arm64 - 20% faster and cheaper
  8. Consider provisioned concurrency - For critical APIs
  9. Monitor cold starts - Track and optimize
  10. Test in production - Staging traffic won't reproduce real cold-start patterns

When to Accept Cold Starts

Not every function needs optimization:

  • Background jobs - Users don’t wait for these
  • Infrequent tasks - < 1 request/hour
  • Internal tools - Developers are patient
  • Webhooks - Can retry on timeout

Save optimization effort for user-facing APIs.

Alternative Approaches

Use API Gateway HTTP APIs

Lower per-request latency and cost than REST APIs (the Lambda cold start itself is unchanged):

functions:
  api:
    handler: handler.main
    events:
      - httpApi: # Instead of 'http'
          path: /api
          method: get

Use Lambda@Edge

Run at CloudFront edge locations, closer to your users (Lambda@Edge still has cold starts, but the lower network latency offsets some of the wait):

exports.handler = async (event) => {
  const request = event.Records[0].cf.request

  // Modify request or generate response
  return {
    status: '200',
    body: 'Hello from edge!'
  }
}

Consider Fargate

For long-running processes:

# serverless.yml
service: my-service

plugins:
  - serverless-fargate

custom:
  fargate:
    clusterName: my-cluster
    containerInsights: true

functions:
  api:
    type: fargate
    image: 123456789.dkr.ecr.us-east-1.amazonaws.com/my-api:latest
    memory: 512
    cpu: 256

Benefits:

  • No cold starts
  • Long execution time (no 15min limit)
  • More memory options

Drawbacks:

  • More expensive
  • Always running (even with no traffic)

Lessons Learned

  1. Size matters most - Biggest impact on cold starts
  2. SDK v3 is essential - Don’t use v2 anymore
  3. Layers are underutilized - Share code across functions
  4. Provisioned concurrency works - But costs money
  5. Measure everything - Can’t optimize what you don’t measure

Tools and Resources