Code Quality at Scale: Production Lessons from 50M Requests/Day - NextGenBeing

Code Quality in Production: Hard Lessons from Scaling to 50 Million Requests

Real strategies from debugging production disasters, refactoring legacy systems, and maintaining code quality under pressure at scale

Data Science
Admin
May 1, 2026

Last September, our API started throwing timeout errors at 3 AM. We'd just crossed 10 million requests per day, and suddenly, everything that worked fine at 5 million was falling apart. The culprit? A seemingly innocent method that had grown to 450 lines over 18 months, with nested conditionals seven levels deep. My coworker Jake and I spent the next 48 hours untangling it, and that experience fundamentally changed how I think about code quality.

Here's the thing nobody tells you: code quality isn't about following rules from a book. It's about survival at scale. It's about whether you can debug a production issue at 2 AM without wanting to quit your job. It's about whether the new developer can ship a feature in two days or two weeks.

I've been writing production code for eight years now, across three startups and one enterprise company. I've seen codebases that made me want to cry and ones that made me jealous. I've written code I'm proud of and code I'm deeply ashamed of. And through all of it, I've learned that code quality is less about perfection and more about making deliberate trade-offs that match your constraints.

This post is everything I wish someone had told me five years ago, before I learned it the expensive way. I'm going to share the specific techniques that saved our team when we scaled from 1 million to 50 million requests per day, the refactoring patterns that actually worked in production, and the quality practices that survived contact with real deadlines and real business pressure.

The Real Cost of Poor Code Quality (And Why It's Not What You Think)

Most articles about code quality talk about "technical debt" like it's some abstract concept. Let me make it concrete with numbers from our actual systems.

In Q2 2024, we tracked the time our team spent on different types of work. We had six engineers, and here's what we discovered:

  • Feature development: 35% of our time (about 14 hours per week per engineer)
  • Bug fixes: 28% of our time (11 hours per week)
  • Understanding existing code: 22% of our time (9 hours per week)
  • Refactoring to add features: 15% of our time (6 hours per week)

That "understanding existing code" number shocked me. We were spending more than a full working day per week per engineer just figuring out what the hell our own code did. That's not counting the time spent in code review trying to understand pull requests, or the time spent in meetings explaining how systems worked.

When we dug deeper, we found that 80% of that time was spent on 20% of our codebase. And that 20% had specific characteristics:

  1. Functions longer than 100 lines (average time to understand: 15 minutes)
  2. Classes with more than 10 dependencies (average time to understand: 25 minutes)
  3. Code with no tests (average time to understand: 20 minutes, plus fear)
  4. Clever abstractions that saved 5 lines (average time to understand: 30 minutes, plus anger)

The worst offender was our payment processing service. It was a single 2,400-line file that handled everything from credit card validation to webhook processing to refund logic. Every time we needed to add a payment method, it took a full week. Not because the integration was hard, but because we had to understand this monster first.

Here's what that cost us in real money: Our CTO Sarah calculated that if we could reduce "understanding time" by just 50%, we'd effectively add 1.5 engineers worth of capacity. At our salaries, that's roughly $180,000 per year in opportunity cost. That's the real cost of poor code quality.
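As a rough cross-check using only the tracked numbers above (six engineers, 22% of their time spent understanding code), halving that time frees about two-thirds of an engineer. Sarah's figure is larger, presumably because it also counts the code review and meeting time our tracking left out; that reading is an assumption, and so is the 40-hour week used in this sketch.

```php
<?php
// Engineer-equivalents freed when a share of everyone's time shrinks by a given fraction.
function freedCapacity(int $engineers, float $share, float $reduction): float
{
    return $engineers * $share * $reduction;
}

$engineers = 6;              // from the tracking above
$understandingShare = 0.22;  // from the tracking above
$freed = freedCapacity($engineers, $understandingShare, 0.5);
$hoursPerWeek = $freed * 40; // assumes a 40-hour week

printf("Engineer-equivalents freed: %.2f\n", $freed);   // prints 0.66
printf("Hours per week freed: %.1f\n", $hoursPerWeek);  // prints 26.4
```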

The Function That Broke Everything (And What I Learned From It)

Let me show you the actual function that caused our 3 AM incident. I've simplified it slightly, but this is essentially what we had:

public function processOrder($orderId, $userId, $items, $shippingAddress, 
                             $billingAddress, $paymentMethod, $promoCode = null) 
{
    // Validate user
    $user = User::find($userId);
    if (!$user) {
        throw new Exception('User not found');
    }
    if ($user->status !== 'active') {
        throw new Exception('User account is not active');
    }
    if ($user->email_verified_at === null) {
        throw new Exception('Email not verified');
    }
    
    // Validate items
    $total = 0;
    foreach ($items as $item) {
        $product = Product::find($item['product_id']);
        if (!$product) {
            throw new Exception('Product not found: ' . $item['product_id']);
        }
        if ($product->stock < $item['quantity']) {
            throw new Exception('Insufficient stock for: ' . $product->name);
        }
        $total += $product->price * $item['quantity'];
    }
    
    // Apply promo code
    if ($promoCode) {
        $promo = PromoCode::where('code', $promoCode)->first();
        if ($promo && $promo->expires_at > now() && $promo->uses < $promo->max_uses) {
            if ($promo->type === 'percentage') {
                $total = $total * (1 - $promo->value / 100);
            } else if ($promo->type === 'fixed') {
                $total = $total - $promo->value;
            }
            $promo->increment('uses');
        }
    }
    
    // Calculate shipping
    $shippingCost = 0;
    if ($shippingAddress['country'] === 'US') {
        if ($total > 100) {
            $shippingCost = 0;
        } else {
            $shippingCost = 9.99;
        }
    } else {
        $shippingCost = 29.99;
    }
    $total += $shippingCost;
    
    // Process payment
    try {
        if ($paymentMethod['type'] === 'credit_card') {
            $charge = Stripe::charges()->create([
                'amount' => $total * 100,
                'currency' => 'usd',
                'source' => $paymentMethod['token'],
            ]);
        } else if ($paymentMethod['type'] === 'paypal') {
            // PayPal processing logic (another 50 lines)
        }
    } catch (StripeException $e) {
        Log::error('Payment failed: ' . $e->getMessage());
        throw new Exception('Payment processing failed');
    }
    
    // Create order
    $order = Order::create([
        'user_id' => $userId,
        'total' => $total,
        'status' => 'pending',
        'shipping_address' => json_encode($shippingAddress),
        'billing_address' => json_encode($billingAddress),
    ]);
    
    // Create order items
    foreach ($items as $item) {
        OrderItem::create([
            'order_id' => $order->id,
            'product_id' => $item['product_id'],
            'quantity' => $item['quantity'],
            'price' => Product::find($item['product_id'])->price,
        ]);
        
        // Update inventory
        $product = Product::find($item['product_id']);
        $product->decrement('stock', $item['quantity']);
    }
    
    // Send confirmation email
    Mail::to($user->email)->send(new OrderConfirmation($order));
    
    // Update user stats
    $user->increment('total_orders');
    $user->last_order_at = now();
    $user->save();
    
    return $order;
}

This function is about 100 lines in my simplified version. The real one was 450 lines. And here's what was wrong with it:

The N+1 Problem: See those Product::find() calls inside the foreach loops? Every item in the order triggers three separate lookups: one to validate it, one to read the price, and one to decrement stock. At 10 million requests per day, we were hitting the database 50 million extra times per day. Each query took about 5ms, which adds up to roughly 69 hours of database time per day on unnecessary queries.
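The usual fix is to load every product for the order in one query and then look items up in memory. Here's a minimal sketch of the difference. `ProductTable` is a hypothetical in-memory stand-in that counts queries, not our real model; in Eloquent the batched call would be something like `Product::whereIn('id', $ids)->get()->keyBy('id')`. Prices are in cents, which is also my assumption.

```php
<?php
// Hypothetical in-memory stand-in for the products table that counts queries.
final class ProductTable
{
    public int $queries = 0;

    // $rows is keyed by product id.
    public function __construct(private array $rows) {}

    // One query per call, like Product::find($id).
    public function find(int $id): ?array
    {
        $this->queries++;
        return $this->rows[$id] ?? null;
    }

    // One query for many ids; returns rows keyed by product id.
    public function findMany(array $ids): array
    {
        $this->queries++;
        return array_intersect_key($this->rows, array_flip($ids));
    }
}

$rows = [
    1 => ['name' => 'Mug',    'price' => 1000, 'stock' => 5],
    2 => ['name' => 'Poster', 'price' => 2500, 'stock' => 3],
    3 => ['name' => 'Pen',    'price' => 500,  'stock' => 9],
];
$items = [
    ['product_id' => 1, 'quantity' => 2],
    ['product_id' => 2, 'quantity' => 1],
    ['product_id' => 3, 'quantity' => 4],
];

// Per-item lookups, mirroring the three Product::find() calls in the original.
$naive = new ProductTable($rows);
$naiveTotal = 0;
foreach ($items as $item) {
    $product = $naive->find($item['product_id']); // validation and pricing
    $naiveTotal += $product['price'] * $item['quantity'];
    $naive->find($item['product_id']);            // price for the order item
    $naive->find($item['product_id']);            // stock decrement
}

// Batched: one query up front, then in-memory lookups.
$batched = new ProductTable($rows);
$products = $batched->findMany(array_column($items, 'product_id'));
$batchedTotal = 0;
foreach ($items as $item) {
    $batchedTotal += $products[$item['product_id']]['price'] * $item['quantity'];
}

printf("naive: %d queries, batched: %d query\n", $naive->queries, $batched->queries);
```

Both versions compute the same total; the batched one does it with a single query no matter how many items are in the order.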

The Transaction Problem: Notice there's no database transaction? When the email sending failed (which happened about 0.1% of the time), we'd already charged the customer and decremented inventory. We had about 200 orders per day in this broken state.
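One way to avoid that broken state is to make the database writes all-or-nothing and send the email only after they commit, so a mail failure can't leave half an order behind. The sketch below simulates this with a hypothetical `FakeDatabase` that snapshots and restores plain arrays; it stands in for a real transaction (`DB::transaction` in Laravel) and is not our production code.

```php
<?php
// Hypothetical in-memory "database" whose transaction() restores state on failure.
final class FakeDatabase
{
    public array $orders = [];
    public array $stock = [10 => 5]; // product id => units in stock

    public function transaction(callable $work): mixed
    {
        $snapshot = [$this->orders, $this->stock];
        try {
            return $work($this);
        } catch (Throwable $e) {
            [$this->orders, $this->stock] = $snapshot; // roll back both writes
            throw $e;
        }
    }
}

function placeOrder(FakeDatabase $db, int $productId, int $quantity, callable $sendEmail): array
{
    // Database writes succeed or fail together.
    $order = $db->transaction(function (FakeDatabase $db) use ($productId, $quantity) {
        $order = ['id' => count($db->orders) + 1, 'product_id' => $productId, 'quantity' => $quantity];
        $db->orders[] = $order;
        $db->stock[$productId] -= $quantity;
        if ($db->stock[$productId] < 0) {
            throw new RuntimeException('Insufficient stock');
        }
        return $order;
    });

    // The email runs after the commit; a failure is recorded, not propagated.
    try {
        $sendEmail($order);
        $order['email_sent'] = true;
    } catch (Throwable $e) {
        $order['email_sent'] = false; // a real system would queue a retry here
    }

    return $order;
}

$db = new FakeDatabase();

// Email fails after commit: the order and the stock change are kept.
$order = placeOrder($db, 10, 2, function (array $order): void {
    throw new RuntimeException('SMTP unavailable');
});

// Failure inside the transaction: both writes are rolled back.
$rolledBack = false;
try {
    placeOrder($db, 10, 99, function (array $order): void {});
} catch (RuntimeException $e) {
    $rolledBack = true;
}
```

The card charge is deliberately left out of this sketch. A database rollback can't undo a call to an external payment API, so that step needs its own handling, such as a refund on failure.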

The Testing Problem: How do you test this? You need to mock Stripe, mock the mail system, seed the database with users and products, and somehow verify that inventory decremented correctly. Our test for this function was 300 lines and took 8 seconds to run.

The Readability Problem: When I first looked at this code at 3 AM, it took me 20 minutes to understand what it did. And I wrote half of it.

Here's what we did to fix it. This took us three weeks of careful refactoring, but the results were worth it:

// OrderService.php
class OrderService
{
    public function __construct(
        private UserValidator $userValidator,
        private InventoryService $inventoryService,
        private PricingService $pricingService,
        private PaymentGateway $paymentGateway,
        private OrderRepository $orderRepository,
        private NotificationService $notificationService
    ) {}
    
    public function createOrder(CreateOrderRequest $request): Order
    {
        return DB::transaction(function () use ($request) {
            $this->userValidator->validateForCheckout($request->userId);
            
            $items = $this->inventoryService->reserveItems($request->items);
            
            $pricing = $this->pricingService->calculateTotal(
                $items,
                $request->shippingAddress,
                $request->promoCode
            );
            
            $payment = $this->paymentGateway->charge(
                $pricing->total,
                $request->paymentMethod
            );
            
            $order = $this->orderRepository->create(
                $request->userId,
                $items,
                $pricing,
                $payment
            );
            
            $this->notificationService->sendOrderConfirmation($order);
            
            return $order;
        });
    }
}

The difference is night and day. Each service class is 50-100 lines and does one thing. The transaction keeps the database writes consistent (the card charge and the email are external calls, so a rollback doesn't undo them on its own). The dependencies are explicit and mockable. And the main method reads like a story: validate user, reserve items, calculate pricing, charge payment, create order, send notification.
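To give a feel for what one of those small classes looks like, here's a minimal sketch of a pricing service built from the promo and shipping rules in the original function. It isn't our actual PricingService: the signature is simplified, `PricingResult` is a hypothetical value object, amounts are integer cents rather than floats, and capping a fixed discount at the subtotal is my addition.

```php
<?php
// Hypothetical value object holding the pricing breakdown, in cents.
final class PricingResult
{
    public function __construct(
        public int $subtotal,
        public int $discount,
        public int $shipping,
        public int $total
    ) {}
}

final class PricingService
{
    // $items: list of ['price' => cents, 'quantity' => int]
    // $promo: ['type' => 'percentage'|'fixed', 'value' => int] or null
    public function calculateTotal(array $items, string $country, ?array $promo = null): PricingResult
    {
        $subtotal = 0;
        foreach ($items as $item) {
            $subtotal += $item['price'] * $item['quantity'];
        }

        $discount = 0;
        if ($promo !== null) {
            if ($promo['type'] === 'percentage') {
                $discount = intdiv($subtotal * $promo['value'], 100);
            } elseif ($promo['type'] === 'fixed') {
                // Assumption: a fixed discount never pushes the total below zero.
                $discount = min($promo['value'], $subtotal);
            }
        }
        $discounted = $subtotal - $discount;

        // Shipping rules from the original function: US orders over $100 ship
        // free, other US orders cost $9.99, everything else costs $29.99.
        if ($country === 'US') {
            $shipping = $discounted > 10000 ? 0 : 999;
        } else {
            $shipping = 2999;
        }

        return new PricingResult($subtotal, $discount, $shipping, $discounted + $shipping);
    }
}

$pricing = new PricingService();

// Two $60 items in the US with 10% off: free shipping applies.
$domestic = $pricing->calculateTotal(
    [['price' => 6000, 'quantity' => 2]],
    'US',
    ['type' => 'percentage', 'value' => 10]
);

// One $25 item shipped abroad with $5 off.
$international = $pricing->calculateTotal(
    [['price' => 2500, 'quantity' => 1]],
    'CA',
    ['type' => 'fixed', 'value' => 500]
);
```

Because the class has no database, Stripe, or mail dependency, a test for it needs no mocks and no seeded data.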

Here are the actual performance improvements we measured:

Before refactoring:

  • Average response time: 850ms
  • Database queries per request: 47
  • Memory usage: 12MB per request
  • Test execution time: 8.2 seconds
  • Time to add new payment method: 5 days

After refactoring:

  • Average response time: 180ms (79% improvement)
  • Database queries per request: 8 (83% reduction)
  • Memory usage: 4MB per request (67% reduction)
  • Test execution time: 1.4 seconds (83% improvement)
  • Time to add new payment method: 4 hours (90% reduction, counting five 8-hour working days)

But the most important metric was this: when we had another payment gateway issue two months later, I fixed it in 15 minutes instead of 2 hours. That's the real value of code quality.

The Single Responsibility Principle (And Why Most Explanations Miss the Point)

Every article about code quality mentions the Single Responsibility Principle. Most of them explain it terribly. They say things like "a class should only have one reason to change" which sounds profound but doesn't help you write actual code.

Here's how I actually think about it: A piece of code should be about one thing, and when you read it, you shouldn't be surprised by what it does.

Let me show you what I mean with a real example from our codebase. This was our original UserController:

class UserController extends Controller
{
    public function update(Request $request, $id)
    {
        $user = User::findOrFail($id);
        
        // Update profile
        $user->name = $request->name;
        $user->email = $request->email;
        $user->bio = $request->bio;
        
        // Handle avatar upload
        if ($request->hasFile('avatar')) {
            $path = $request->file('avatar')->store('avatars', 's3');
            $user->avatar_url = Storage::disk('s3')->url($path);
            
            // Delete old avatar
            if ($user->getOriginal('avatar_url')) {
                $oldPath = parse_url($user->getOriginal('avatar_url'), PHP_URL_PATH);
                Storage::disk('s3')->delete($oldPath);
            }
        }
        
        // Update email subscription preferences
        if ($request->has('email_notifications')) {
            $user->email_notifications = $request->email_notifications;
            
            // Sync with Mailchimp
            $mailchimp = new \MailchimpMarketing\ApiClient();
            $mailchimp->setConfig([
                'apiKey' => config('services.mailchimp.key'),
                'server' => config('services.mailchimp.server'),
            ]);
            
            try {
                $mailchimp->lists->setListMember(
                    config('services.mailchimp.list_id'),
                    md5(strtolower($user->email)),
                    [
                        'email_address' => $user->email,
                        'status_if_new' => 'subscribed',
                        'merge_fields' => [
                            'FNAME' => $user->first_name,
                            'LNAME' => $user->last_name,
                        ],
                    ]
                );
            } catch (Exception $e) {
                Log::error('Mailchimp sync failed: ' . $e->getMessage());
            }
        }
        
        // Update Stripe customer
        if ($request->has('payment_method')) {
            try {
                \Stripe\Stripe::setApiKey(config('services.stripe.secret'));
                
                if (!$user->stripe_customer_id) {
                    $customer = \Stripe\Customer::create([
                        'email' => $user->email,
                        'name' => $user->name,
                        'payment_method' => $request->payment_method,
                        'invoice_settings' => [
                            'default_payment_method' => $request->payment_method,
                        ],
                    ]);
                    $user->stripe_customer_id = $customer->id;
                } else {
                    \Stripe\Customer::update($user->stripe_customer_id, [
                        'payment_method' => $request->payment_method,
                        'invoice_settings' => [
                            'default_payment_method' => $request->payment_method,
                        ],
                    ]);
                }
            } catch (\Stripe\Exception\ApiErrorException $e) {
                return response()->json(['error' => 'Payment method update failed'], 422);
            }
        }
        
        $user->save();
        
        // Log activity
        Activity::create([
            'user_id' => $user->id,
            'type' => 'profile_updated',
            'metadata' => $request->all(),
        ]);
        
        return response()->json($user);
    }
}

This controller does seven different things:

  1. Updates user profile data
  2. Handles file uploads to S3
  3. Manages email subscriptions with Mailchimp
  4. Updates Stripe customer records
  5. Saves the user model
  6. Logs activity
  7. Returns a JSON response

When I needed to debug why Stripe updates were failing, I had to read through about 50 lines of code about avatars and email subscriptions first. When I needed to change how we handled avatar uploads, I had to worry about breaking payment processing. This is what "multiple responsibilities" actually means in practice.

Here's what we refactored it to:

class UserController extends Controller
{
    public function __construct(
        private UserService $userService,
        private AvatarService $avatarService,
        private EmailSubscriptionService $emailService,
        private PaymentMethodService $paymentService
    ) {}
    
    public function update(UpdateUserRequest $request, User $user): JsonResponse
    {
        DB::transaction(function () use ($request, $user) {
            $this->userService->updateProfile($user, $request->validated());
            
            if ($request->hasFile('avatar')) {
                $this->avatarService->update($user, $request->file('avatar'));
            }
            
            if ($request->has('email_notifications')) {
                $this->emailService->updatePreferences($user, $request->email_notifications);
            }
            
            if ($request->has('payment_method')) {
                $this->paymentService->updateMethod($user, $request->payment_method);
            }
        });
        
        return response()->json($user->fresh());
    }
}

Now when the Stripe integration fails, I open PaymentMethodService and I know exactly where to look. When we need to switch from S3 to Cloudflare R2, I only touch AvatarService.
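Here's a minimal sketch of why the storage swap stays contained. It isn't our real AvatarService: the `FileStorage` interface, the `InMemoryStorage` fake, the example URLs, and the plain-array user are all hypothetical stand-ins so the example runs without Laravel or a cloud account.

```php
<?php
// Hypothetical storage abstraction; a real one would wrap an S3 or R2 client.
interface FileStorage
{
    public function put(string $path, string $contents): string; // returns the public URL
    public function delete(string $path): void;
}

// In-memory fake used here in place of a real storage backend.
final class InMemoryStorage implements FileStorage
{
    public array $files = [];

    public function __construct(private string $baseUrl) {}

    public function put(string $path, string $contents): string
    {
        $this->files[$path] = $contents;
        return $this->baseUrl . '/' . $path;
    }

    public function delete(string $path): void
    {
        unset($this->files[$path]);
    }
}

final class AvatarService
{
    public function __construct(private FileStorage $storage) {}

    // Stores the new avatar, removes the previous one, returns the updated user.
    public function update(array $user, string $filename, string $contents): array
    {
        $oldPath = $user['avatar_path'] ?? null;
        $path = 'avatars/' . $filename;

        $user['avatar_url'] = $this->storage->put($path, $contents);
        $user['avatar_path'] = $path;

        if ($oldPath !== null && $oldPath !== $path) {
            $this->storage->delete($oldPath);
        }

        return $user;
    }
}

$storage = new InMemoryStorage('https://s3.example.test');
$service = new AvatarService($storage);

$user = ['id' => 1, 'avatar_path' => null];
$user = $service->update($user, 'first.png', 'AAA');
$user = $service->update($user, 'second.png', 'BBB');

// Switching providers means constructing the service with a different storage.
$r2Service = new AvatarService(new InMemoryStorage('https://r2.example.test'));
$r2User = $r2Service->update(['id' => 2, 'avatar_path' => null], 'me.png', 'CCC');
```

The controller and the other services never see which backend is in use, which is what keeps a provider change to one class.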
