Building Serverless Applications with AWS Lambda: Production Guide - NextGenBeing

Complete Guide to Building a Serverless Application with AWS Lambda

Learn how we built a production serverless application handling 10M+ requests per month using AWS Lambda, API Gateway, and DynamoDB—including the mistakes that cost us $4,000 in our first week.

Aaron Vasquez

May 20, 2026

Last November, our team made the decision to rebuild our analytics dashboard as a serverless application on AWS Lambda. We were processing about 2 million API requests per month on a traditional EC2-based stack, and our infrastructure costs were climbing faster than our revenue. My CTO, Marcus, had been pushing for serverless for months, arguing we were paying for idle capacity 70% of the time. I was skeptical—I'd heard the horror stories about cold starts and vendor lock-in.

Three months later, we're handling 10 million requests per month, our infrastructure costs dropped by 60%, and I'm writing this guide because I wish someone had written it for me. But here's the thing: our first week in production was a disaster. We racked up $4,000 in unexpected Lambda costs because I made assumptions about how pricing worked. We had API endpoints timing out because I didn't understand execution context reuse. And we nearly lost a major client when our database connection pooling strategy fell apart under load.

This isn't going to be another tutorial that shows you how to deploy a "Hello World" Lambda function. You can find that in the AWS docs. Instead, I'm going to walk you through building a real production serverless application—the kind that handles actual user traffic, integrates with multiple AWS services, and needs to stay up when things go wrong. I'll show you the code we're actually running, the mistakes we made, and the hard-won lessons that aren't in any documentation.

Why We Chose Serverless (And Why You Might Not)

Before I dive into implementation details, let me be honest about the decision-making process. Serverless wasn't an obvious choice for us, and it might not be right for your use case either.

Our application is an analytics API that receives event data from client applications, processes it, stores it in DynamoDB, and serves aggregated reports through REST endpoints. Traffic is spiky—we get hit hard during business hours (9 AM to 6 PM EST) and see almost nothing at night. Our traditional EC2 setup meant we were paying for t3.large instances 24/7 to handle peak load, even though they sat at 5% CPU utilization for 16 hours a day.

I ran the numbers. Our monthly EC2 costs were around $850 (three t3.large instances for redundancy, plus load balancer). Our actual compute needs during peak hours suggested we needed maybe 2-3 hours of full capacity daily. Lambda's pricing model—pay per request and execution time—meant we'd only pay for what we actually used.
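
To make that comparison concrete, here is a rough back-of-the-envelope estimate in code. The two rates are assumptions based on published us-east-1 on-demand pricing for x86 functions and ignore the free tier, so check current pricing for your region. The 200 ms duration and 512 MB memory size are illustrative inputs, not measurements from this stage of the project.

```javascript
// Rough monthly Lambda cost estimate.
// ASSUMPTION: rates are us-east-1 x86 on-demand prices and may have changed.
const PRICE_PER_MILLION_REQUESTS = 0.20;  // USD per 1M requests
const PRICE_PER_GB_SECOND = 0.0000166667; // USD per GB-second of compute

function estimateMonthlyLambdaCost({ requests, avgDurationMs, memoryMb }) {
  // Compute is billed as (seconds of execution) x (GB of configured memory).
  const gbSeconds = requests * (avgDurationMs / 1000) * (memoryMb / 1024);
  const requestCost = (requests / 1000000) * PRICE_PER_MILLION_REQUESTS;
  const computeCost = gbSeconds * PRICE_PER_GB_SECOND;
  return { gbSeconds, requestCost, computeCost, total: requestCost + computeCost };
}

// Illustrative inputs: 2 million requests per month at 200 ms and 512 MB.
const estimate = estimateMonthlyLambdaCost({
  requests: 2000000,
  avgDurationMs: 200,
  memoryMb: 512,
});
console.log(`Estimated monthly cost: $${estimate.total.toFixed(2)}`);
```

With these inputs the estimate lands in the low single digits of dollars. The estimate scales linearly with duration and memory, which is why a slow or over-provisioned function can change the picture quickly.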

But here's what I didn't consider initially: Lambda has limitations that can bite you hard. Each function execution is limited to 15 minutes maximum. Memory ranges from 128 MB to 10 GB. You can't maintain persistent connections between invocations (at least not reliably). And if your application needs sub-10ms response times consistently, cold starts will kill you.

We spent two weeks prototyping before committing. I built a proof-of-concept version of our event ingestion endpoint and load-tested it with Apache Bench. Here's what that looked like:

ab -n 10000 -c 100 -p event.json -T application/json \
  https://api.example.com/events

Output from the initial test:

Concurrency Level:      100
Time taken for tests:   45.234 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2890000 bytes
Requests per second:    221.08 [#/sec] (mean)
Time per request:       452.340 [ms] (mean)
Time per request:       4.523 [ms] (mean, across all concurrent requests)

Those numbers were acceptable for our use case. Most importantly, failed requests were zero. But I noticed something concerning in CloudWatch metrics—about 15% of requests were taking 800-1200ms, while the rest were under 200ms. That was my first encounter with cold starts, and I didn't fully understand what I was seeing yet.

My colleague Sarah, who'd worked with Lambda at her previous company, warned me: "Cold starts are going to be your biggest pain point. You need a strategy from day one." She was right, but I didn't fully grasp it until we hit production.

The Architecture We Built (And How It Evolved)

Let me show you what our serverless architecture looks like now, after three months of iteration. This isn't what we started with—I'll explain the evolution as we go.

Our current setup consists of:

  • API Gateway (REST API) as our entry point
  • Six Lambda functions (three for ingestion, three for reporting)
  • DynamoDB for data storage (two tables: events and aggregations)
  • S3 for storing raw event data as backup
  • CloudWatch for logging and monitoring
  • EventBridge for scheduled aggregation jobs
  • SQS for async event processing when we need guaranteed delivery

The request flow for our event ingestion endpoint looks like this:

  1. Client POSTs event data to API Gateway endpoint
  2. API Gateway triggers the ingest-event Lambda function
  3. Lambda validates the event, writes to DynamoDB, and sends raw data to S3
  4. Lambda returns a 202 Accepted response to the client
  5. DynamoDB Streams trigger the process-event Lambda for async processing
  6. Processed data goes into our aggregations table
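
Steps 5 and 6 can be sketched roughly as follows. This is a minimal illustration, not the production process-event function. It assumes the stream is configured to include new item images and that each item carries an eventType string attribute. The writeAggregates callback is hypothetical and stands in for the write to the aggregations table.

```javascript
// Count newly inserted events per eventType from a DynamoDB Streams batch.
function countNewEventsByType(streamEvent) {
  const counts = {};
  for (const record of streamEvent.Records || []) {
    // Only new items matter here; skip MODIFY and REMOVE records
    // (REMOVE also covers TTL expirations).
    if (record.eventName !== 'INSERT') continue;
    const image = record.dynamodb && record.dynamodb.NewImage;
    // Stream images use DynamoDB's typed attribute format, e.g. { S: 'click' }.
    const eventType = image && image.eventType && image.eventType.S;
    if (!eventType) continue;
    counts[eventType] = (counts[eventType] || 0) + 1;
  }
  return counts;
}

// HYPOTHETICAL: writeAggregates(counts) persists the counts somewhere.
function createProcessEventHandler(writeAggregates) {
  return async (streamEvent) => {
    const counts = countNewEventsByType(streamEvent);
    if (Object.keys(counts).length > 0) {
      await writeAggregates(counts);
    }
    return counts;
  };
}
```

Injecting the write function keeps the counting logic testable without any AWS dependency, for example `exports.handler = createProcessEventHandler(saveCounts)` where saveCounts is your own persistence code.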

Here's the critical lesson I learned: keep your Lambda functions small and focused. My first version tried to do everything in one function—validation, storage, processing, and aggregation. It was a 500-line monolith that took 3-4 seconds to execute and cost us a fortune. Breaking it into smaller functions reduced our average execution time to 200ms and cut costs by 70%.

Let me show you what the ingestion function looks like now:

// ingest-event/index.js
// Note: this sample uses AWS SDK v2 ('aws-sdk'). Node.js 18+ Lambda runtimes
// only bundle SDK v3, so v2 has to be packaged with the function there.
const AWS = require('aws-sdk');

// Initialize clients outside the handler so warm invocations reuse them
const dynamodb = new AWS.DynamoDB.DocumentClient();
const s3 = new AWS.S3();

const TABLE_NAME = process.env.EVENTS_TABLE;
const BUCKET_NAME = process.env.RAW_EVENTS_BUCKET;
const TTL_SECONDS = 90 * 24 * 60 * 60; // 90 days

exports.handler = async (event) => {
  const startTime = Date.now();

  try {
    // Parse the incoming event; malformed JSON is a client error, not a 500
    let body;
    try {
      body = JSON.parse(event.body);
    } catch (parseError) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: 'Request body must be valid JSON' })
      };
    }

    const validationError = validateEvent(body);

    if (validationError) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: validationError })
      };
    }

    // Generate unique ID and timestamp
    const eventId = generateEventId();
    const timestamp = Date.now();

    // Prepare DynamoDB item
    const item = {
      eventId,
      timestamp,
      userId: body.userId,
      eventType: body.eventType,
      properties: body.properties,
      // DynamoDB TTL expects epoch seconds, while Date.now() returns milliseconds
      ttl: Math.floor(timestamp / 1000) + TTL_SECONDS
    };

    // Write to DynamoDB and S3 in parallel
    await Promise.all([
      dynamodb.put({
        TableName: TABLE_NAME,
        Item: item
      }).promise(),

      s3.putObject({
        Bucket: BUCKET_NAME,
        Key: `events/${new Date(timestamp).toISOString().split('T')[0]}/${eventId}.json`,
        Body: JSON.stringify(body),
        ContentType: 'application/json'
      }).promise()
    ]);

    const duration = Date.now() - startTime;
    console.log(`Event processed in ${duration}ms`);

    return {
      statusCode: 202,
      body: JSON.stringify({
        eventId,
        message: 'Event accepted for processing'
      })
    };

  } catch (error) {
    console.error('Error processing event:', error);

    return {
      statusCode: 500,
      body: JSON.stringify({
        error: 'Internal server error',
        // requestContext is missing on non-API Gateway invocations
        requestId: event.requestContext?.requestId
      })
    };
  }
};

// Returns an error message, or null when the event is valid
function validateEvent(body) {
  if (!body || typeof body !== 'object') return 'body must be a JSON object';
  if (!body.userId) return 'userId is required';
  if (!body.eventType) return 'eventType is required';
  if (!body.properties || typeof body.properties !== 'object' || Array.isArray(body.properties)) {
    return 'properties must be an object';
  }
  return null;
}

// Timestamp plus a 9-character random suffix; not cryptographically unique
function generateEventId() {
  return `evt_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}

When I deploy this function and test it, here's what the CloudWatch logs look like:

START RequestId: a1b2c3d4-e5f6-7890-abcd-ef1234567890 Version: $LATEST
Event processed in 145ms
END RequestId: a1b2c3d4-e5f6-7890-abcd-ef1234567890
REPORT RequestId: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Duration: 147.82 ms  Billed Duration: 148 ms  Memory Size: 512 MB  Max Memory Used: 78 MB
Init Duration: 312.45 ms

Notice that Init Duration? That's the cold start—312ms to initialize the execution environment. For warm invocations (when Lambda reuses an existing environment), that Init Duration disappears and we're down to just the 148ms execution time.
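
If you export these logs, the same distinction can be made programmatically. The sketch below is one possible approach rather than anything from our codebase: it assumes each REPORT entry arrives as a single line of text with the labels shown above, and the function names are made up for illustration.

```javascript
// Patterns for the metrics in a Lambda REPORT log line.
// The lookbehind keeps "Duration" from matching inside "Billed Duration"
// or "Init Duration".
const REPORT_PATTERNS = {
  durationMs: /(?<!Billed |Init )Duration: ([\d.]+) ms/,
  billedDurationMs: /Billed Duration: ([\d.]+) ms/,
  memorySizeMb: /Memory Size: (\d+) MB/,
  maxMemoryUsedMb: /Max Memory Used: (\d+) MB/,
  initDurationMs: /Init Duration: ([\d.]+) ms/,
};

// Returns the parsed metrics; fields missing from the line are null.
function parseReportLine(line) {
  const metrics = {};
  for (const [name, pattern] of Object.entries(REPORT_PATTERNS)) {
    const match = line.match(pattern);
    metrics[name] = match ? Number(match[1]) : null;
  }
  // Init Duration only appears when a new execution environment was created.
  metrics.coldStart = metrics.initDurationMs !== null;
  return metrics;
}

// Fraction (0 to 1) of REPORT lines that were cold starts.
function coldStartShare(reportLines) {
  if (reportLines.length === 0) return 0;
  const coldCount = reportLines.map(parseReportLine).filter((m) => m.coldStart).length;
  return coldCount / reportLines.length;
}
```

Tracking the cold start share over time is more useful than looking at individual log entries, because it shows whether a mitigation is actually moving the number.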

The Cold Start Problem (And Three Solutions That Actually Work)

Cold starts nearly killed our production launch. Here's what happened: we deployed on a Friday afternoon (mistake #1—never deploy on Friday). Traffic was low initially, so everything looked fine. Monday morning at 9 AM, we got slammed with 500 requests in the first minute. About 30% of those requests timed out.

I pulled up CloudWatch Logs Insights and ran this query to see what was going on:

fields @timestamp, @duration, @initDuration
| filter @type = "REPORT"
| stats avg(@duration), avg(@initDuration), max(@duration), max(@initDuration) by bin(5m)

The results were brutal:

Average Duration: 156ms
Average Init Duration: 1,247ms (when present)
Max Duration: 2,341ms
Max Init Duration: 3,892ms
Percentage of requests with cold starts: 28%

Nearly a third of our requests were experiencing cold starts over a second long. Our API Gateway timeout was set to 3 seconds, so we were barely squeaking by. But some requests were hitting that 3.8-second initialization and timing out completely.

I tried three different approaches to solve this, and I'll tell you which ones actually worked in production.

Solution 1: Provisioned Concurrency (Expensive But Effective)

Provisioned Concurrency keeps Lambda execution environments warm and ready to respond immediately. You specify how many concurrent executions to keep initialized, and AWS maintains that pool for you.

I configured it through the AWS Console initially, then moved to Infrastructure as Code with the AWS CDK:

// cdk/lambda-stack.js
// CDK v2 imports. This fragment runs inside a Stack constructor, where
// eventsTable and rawEventsBucket are already defined.
const { Duration } = require('aws-cdk-lib');
const lambda = require('aws-cdk-lib/aws-lambda');

const ingestFunction = new lambda.Function(this, 'IngestEvent', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda/ingest-event'),
  memorySize: 512,
  timeout: Duration.seconds(5),
  environment: {
    EVENTS_TABLE: eventsTable.tableName,
    RAW_EVENTS_BUCKET: rawEventsBucket.bucketName
  }
});

// Provisioned concurrency is configured on a version or alias, not on $LATEST
const version = ingestFunction.currentVersion;
const alias = new lambda.Alias(this, 'IngestEventAlias', {
  aliasName: 'production',
  version: version,
  provisionedConcurrentExecutions: 5
});

The impact was immediate. Cold start percentage dropped from 28% to less than 1%. But here's the catch: Provisioned Concurrency is expensive. At $0.015 per GB-hour, keeping 5 environments provisioned at 512 MB each (2.5 GB in total) costs about $27 per month just to keep functions warm, on top of the actual execution costs.

For our high-traffic ingestion endpoint, it was worth it. For our reporting endpoints that get hit maybe 100 times per day? Absolutely not worth it.
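
The arithmetic behind that monthly figure is simple enough to script when you are comparing configurations. This is a sketch that reuses the $0.015 per GB-hour rate quoted above and assumes an average month of 730 hours. Treat both as assumptions and check current pricing for your region.

```javascript
// ASSUMPTION: rate as quoted in the article; verify against current pricing.
const PROVISIONED_PRICE_PER_GB_HOUR = 0.015; // USD
const HOURS_PER_MONTH = 730;                 // average month

// Cost of keeping `instances` execution environments provisioned all month.
function provisionedConcurrencyMonthlyCost({ instances, memoryMb }) {
  const provisionedGb = instances * (memoryMb / 1024);
  return provisionedGb * PROVISIONED_PRICE_PER_GB_HOUR * HOURS_PER_MONTH;
}

const ingestCost = provisionedConcurrencyMonthlyCost({ instances: 5, memoryMb: 512 });
console.log(`Provisioned concurrency: about $${Math.round(ingestCost)} per month`);
```

Because the cost is fixed regardless of traffic, dividing it by the expected number of monthly requests gives a quick per-request figure for deciding which endpoints justify it.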

Solution 2: Scheduled Warming (Clever But Fragile)

My second approach was to use EventBridge to ping our Lambda functions every 5 minutes, keeping them warm without paying for Provisioned Concurrency. I set up a scheduled rule:

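// cdk/warming-stack.js
// CDK v2 imports. This fragment runs inside a Stack constructor and assumes
// ingestFunction is in scope.
const { Duration } = require('aws-cdk-lib');
const events = require('aws-cdk-lib/aws-events');
const targets = require('aws-cdk-lib/aws-events-targets');

const warmingRule = new events.Rule(this, 'WarmingRule', {
  schedule: events.Schedule.rate(Duration.minutes(5)),
  description: 'Keep Lambda functions warm'
});

// Send a fixed payload so the handler can recognize warming pings
warmingRule.addTarget(new targets.LambdaFunction(ingestFunction, {
  event: events.RuleTargetInput.fromObject({
    source: 'warming',
    action: 'ping'
  })
}));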

In my Lambda function, I added logic to detect and handle warming requests:

exports.handler = async (event) => {
  // Handle warming pings
  if (event.source === 'warming' && event.action === 'ping') {
    console.log('Warming ping received');
    return { statusCode: 200, body: 'warm' };
  }
  
  // Regular processing logic continues...
};

This reduced cold starts to about 8%, which was acceptable for our lower-traffic endpoints. But it's fragile—if traffic suddenly spikes and you need more than one concurrent execution, those additional invocations will still cold start. And you're paying for those warming invocations (though they're cheap since they do almost nothing).
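
If several functions share the same warming rule, the ping check can be pulled into a small wrapper instead of being repeated in every handler. This is a sketch of that idea, not code from our repository. The names isWarmingPing and withWarmup are made up, and the payload shape matches the rule input shown earlier.

```javascript
// True only for the fixed payload sent by the warming rule.
function isWarmingPing(event) {
  return Boolean(event) && event.source === 'warming' && event.action === 'ping';
}

// Wraps a handler so warming pings return early without running business logic.
function withWarmup(handler) {
  return async (event, context) => {
    if (isWarmingPing(event)) {
      console.log('Warming ping received');
      return { statusCode: 200, body: 'warm' };
    }
    return handler(event, context);
  };
}

// Usage (hypothetical): exports.handler = withWarmup(async (event) => { /* ... */ });
```

The wrapper does not change the underlying limitation: each ping keeps only one execution environment warm, so a burst of concurrent requests will still see cold starts.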

Solution 3: Optimize Initialization (The Real Solution)

The approach that actually solved our cold start problem long-term was optimizing what happens during initialization. I profiled our Lambda function and discovered we were doing a ton of unnecessary work during cold starts.

Here's what I found:

  • Loading the entire AWS SDK (all services) added 400ms
  • Establishing database connections during initialization added 300ms
  • Loading a large JSON configuration file added 150ms

I refactored to only load what we needed:

// Before: loading the entire AWS SDK v2 at initialization
// const AWS = require('aws-sdk');
// const dynamodb = new AWS.DynamoDB.DocumentClient();
// const s3 = new AWS.S3();

// After: loading only the specific SDK v3 clients we use
const { DynamoDB } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocument } = require('@aws-sdk/lib-dynamodb');
const { S3 } = require('@aws-sdk/client-s3');

// Initialize clients lazily; module-level variables survive across warm invocations
let dynamoClient;
let s3Client;

function getDynamoClient() {
  if (!dynamoClient) {
    const client = new DynamoDB({});
    dynamoClient = DynamoDBDocument.from(client);
  }
  return dynamoClient;
}

function getS3Client() {
  if (!s3Client) {
    s3Client = new S3({});
  }
  return s3Client;
}

exports.handler = async (event) => {
  // The first invocation creates the clients; later ones reuse them
  const dynamo = getDynamoClient();
  const s3 = getS3Client();

  // Rest of handler logic...
};

I also moved configuration loading to happen lazily on first request:

let config;

function getConfig() {
  if (!config) {
    config = JSON.parse(process.env.CONFIG_JSON);
  }
  return config;
}
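
The client getters and getConfig all follow the same memoization pattern, so it can be factored into one helper. The sketch below is a generic illustration. The lazy helper is not part of any AWS library, and the fallback to an empty object when CONFIG_JSON is unset is an assumption made so the example runs anywhere.

```javascript
// Runs `factory` on the first call and reuses the result afterwards.
// In Lambda, the cached value lives as long as the execution environment.
function lazy(factory) {
  let initialized = false;
  let value;
  return () => {
    if (!initialized) {
      // If factory throws, nothing is cached and the next call retries.
      value = factory();
      initialized = true;
    }
    return value;
  };
}

// HYPOTHETICAL usage mirroring the config loader above.
const getLazyConfig = lazy(() => JSON.parse(process.env.CONFIG_JSON || '{}'));
```

A handler would call getLazyConfig() wherever it needs the configuration, so the parsing cost is paid on the first request that uses it rather than during initialization.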

These optimizations reduced our cold start time from 1,247ms average to 412ms average—a 67% improvement. Combined with scheduled warming for our medium-traffic endpoints and Provisioned Concurrency for our highest-traffic endpoint, we got cold starts under control.

Author: Aaron Vasquez

Covers DevOps practices, CI/CD pipelines, Kubernetes, and platform engineering. Contributing author at NextGenBeing.
