Building a SaaS with LangChain: Architecture and Scaling

Building a Software-as-a-Service (SaaS) application with LangChain raises challenges that typical web applications do not. Beyond user authentication and data storage, you also have to manage AI model costs, rate limiting, tenant isolation, and billing based on token usage. This guide walks through a LangChain SaaS architecture designed to scale from your first customer to 10,000 and beyond.

Architecture Overview
Multi-Tenant Design Patterns
API Gateway and Rate Limiting
Billing Integration with Stripe
Usage Tracking and Quotas
Tenant Isolation Strategies
Scaling from 0 to 10k Customers
Complete Example Application
Deployment and Operations
Lessons Learned and Best Practices

Architecture Overview

A production LangChain SaaS spans several layers, each with its own scaling concerns. Here’s the high-level architecture this guide builds on:

graph TB
    subgraph "Client Layer"
        WEB[Web App]
        API[API Clients]
        SDK[SDKs]
    end

    subgraph "API Gateway"
        GW[Kong/AWS API Gateway]
        AUTH[Auth Service]
        RATE[Rate Limiter]
    end

    subgraph "Application Layer"
        APP1[App Server 1]
        APP2[App Server 2]
        APP3[App Server N]
        QUEUE[Job Queue]
    end

    subgraph "AI Layer"
        LC[LangChain Service]
        CACHE[Vector Cache]
        EMB[Embeddings Service]
    end

    subgraph "Data Layer"
        PG[(PostgreSQL)]
        REDIS[(Redis)]
        S3[(S3/Object Storage)]
        VECTOR[(Vector DB)]
    end

    subgraph "Monitoring"
        LOG[Logging]
        METRIC[Metrics]
        TRACE[Tracing]
    end

    WEB --> GW
    API --> GW
    SDK --> GW

    GW --> AUTH
    GW --> RATE
    GW --> APP1
    GW --> APP2
    GW --> APP3

    APP1 --> LC
    APP2 --> LC
    APP3 --> LC

    APP1 --> QUEUE
    APP2 --> QUEUE
    APP3 --> QUEUE

    LC --> CACHE
    LC --> EMB
    LC --> VECTOR

    APP1 --> PG
    APP2 --> PG
    APP3 --> PG

    APP1 --> REDIS
    APP2 --> REDIS
    APP3 --> REDIS

    LC --> S3

    APP1 --> LOG
    APP2 --> LOG
    APP3 --> LOG
    LC --> METRIC

Key Components

API Gateway: Central entry point handling authentication, rate limiting, and request routing
Application Servers: Stateless Node.js/Python servers running your business logic
LangChain Service: Dedicated service layer for AI operations
Data Stores: PostgreSQL for relational data, Redis for caching, S3 for documents, Vector DB for embeddings
Monitoring Stack: Comprehensive logging, metrics, and distributed tracing

Multi-Tenant Design Patterns

Multi-tenancy is crucial for SaaS applications. With LangChain, you need to consider both data isolation and AI resource isolation.

Database-Level Isolation

// @filename: index.ts
// Database schema with tenant isolation
interface TenantSchema {
  id: string
  name: string
  plan: 'starter' | 'professional' | 'enterprise'
  settings: {
    maxTokensPerMonth: number
    maxConcurrentRequests: number
    allowedModels: string[]
    customPrompts: boolean
    dataRetentionDays: number
  }
  createdAt: Date
  updatedAt: Date
}

interface UserSchema {
  id: string
  tenantId: string // Foreign key to tenant
  email: string
  role: 'admin' | 'user' | 'viewer'
  apiKeys: ApiKey[]
}

interface ConversationSchema {
  id: string
  tenantId: string // Ensures data isolation
  userId: string
  messages: Message[]
  tokenUsage: {
    promptTokens: number
    completionTokens: number
    totalCost: number
  }
  metadata: Record<string, any>
  createdAt: Date
}
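
A `tenantId` column only isolates data if every query filters on it. One way to enforce that is to route reads and writes through a repository bound to a single tenant. The following is a minimal in-memory sketch of the idea; `ConversationRow` and `TenantScopedConversations` are hypothetical names, and a real implementation would issue SQL with a `WHERE tenant_id = $1` clause instead of filtering an array.

```typescript
// Hypothetical sketch: a repository bound to one tenant at construction time.
interface ConversationRow {
  id: string
  tenantId: string
  title: string
}

class TenantScopedConversations {
  private readonly tenantId: string
  private readonly table: ConversationRow[]

  constructor(tenantId: string, table: ConversationRow[]) {
    this.tenantId = tenantId
    this.table = table
  }

  // Every read filters on the bound tenant, so callers cannot forget the filter.
  findAll(): ConversationRow[] {
    return this.table.filter((row) => row.tenantId === this.tenantId)
  }

  findById(id: string): ConversationRow | undefined {
    return this.findAll().find((row) => row.id === id)
  }

  // Writes are stamped with the bound tenant; callers never supply a tenantId.
  insert(row: { id: string; title: string }): ConversationRow {
    const stored: ConversationRow = { ...row, tenantId: this.tenantId }
    this.table.push(stored)
    return stored
  }
}

const sharedTable: ConversationRow[] = [
  { id: 'c1', tenantId: 'acme', title: 'Onboarding' },
  { id: 'c2', tenantId: 'globex', title: 'Support' },
]
const acmeConversations = new TenantScopedConversations('acme', sharedTable)
const created = acmeConversations.insert({ id: 'c3', title: 'Billing' })
```

Here `acmeConversations.findById('c2')` returns `undefined` even though the row exists, because it belongs to another tenant.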

Application-Level Tenant Context

// @filename: index.ts
// Middleware for tenant context injection
export class TenantContextMiddleware {
  async use(req: Request, res: Response, next: NextFunction) {
    try {
      // Extract tenant from JWT or API key
      const tenantId = await this.extractTenantId(req);

      if (!tenantId) {
        return res.status(401).json({ error: 'Invalid tenant context' });
      }

      // Load tenant configuration
      const tenant = await this.tenantService.getTenant(tenantId);
      if (!tenant) {
        return res.status(401).json({ error: 'Invalid tenant context' });
      }

      // Inject tenant context
      req.context = {
        tenantId: tenant.id,
        tenant: tenant,
        limits: {
          maxTokens: tenant.settings.maxTokensPerMonth,
          remainingTokens: await this.getRemainingTokens(tenant.id),
          concurrentRequests: tenant.settings.maxConcurrentRequests
        }
      };

      next();
    } catch (error) {
      res.status(500).json({ error: 'Failed to establish tenant context' });
    }
  }

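  private async extractTenantId(req: Request): Promise<string | null> {
    // Check API key header (header values are typed string | string[] | undefined)
    const apiKey = req.headers['x-api-key'];
    if (typeof apiKey === 'string' && apiKey.length > 0) {
      return await this.tenantService.getTenantIdFromApiKey(apiKey);
    }

    // Check JWT bearer token
    const [scheme, token] = req.headers.authorization?.split(' ') ?? [];
    if (scheme === 'Bearer' && token) {
      try {
        const decoded = jwt.verify(token, process.env.JWT_SECRET!);
        // jwt.verify returns a plain string for non-JSON payloads
        return typeof decoded === 'string' ? null : decoded.tenantId ?? null;
      } catch {
        // Invalid or expired token: treat as unauthenticated (401), not a server error
        return null;
      }
    }

    return null;
  }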
}
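
The credential parsing in `extractTenantId` can be isolated into a pure function, which makes it easy to test without an HTTP server. The sketch below is an assumption about how that might look; `TenantCredential` and `extractCredential` are hypothetical names, and the lookup and JWT verification steps are deliberately left out.

```typescript
// Hypothetical helper: normalizes the two credential sources used by the middleware.
type TenantCredential =
  | { kind: 'apiKey'; value: string }
  | { kind: 'bearer'; value: string }
  | null

type HeaderValue = string | string[] | undefined

function extractCredential(
  headers: Record<string, HeaderValue>
): TenantCredential {
  // Prefer the API key header; ignore non-string or empty values.
  const apiKey = headers['x-api-key']
  if (typeof apiKey === 'string' && apiKey.trim() !== '') {
    return { kind: 'apiKey', value: apiKey.trim() }
  }

  // Fall back to "Authorization: Bearer <token>".
  const authorization = headers['authorization']
  if (typeof authorization === 'string') {
    const parts = authorization.split(' ')
    if (parts.length === 2 && parts[0].toLowerCase() === 'bearer' && parts[1] !== '') {
      return { kind: 'bearer', value: parts[1] }
    }
  }

  return null
}

const fromApiKey = extractCredential({ 'x-api-key': 'key_123' })
const fromBearer = extractCredential({ authorization: 'Bearer abc.def.ghi' })
const fromBasic = extractCredential({ authorization: 'Basic dXNlcjpwYXNz' })
```

Schemes other than `Bearer` yield `null`, which the middleware can map to a 401 response.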

LangChain Tenant Isolation

// @filename: index.ts
// Tenant-aware LangChain service
export class TenantLangChainService {
  private chains: Map<string, ConversationalRetrievalQAChain> = new Map()

  async getChain(tenantId: string): Promise<ConversationalRetrievalQAChain> {
    // Reuse the cached chain for this tenant if one exists
    const cached = this.chains.get(tenantId)
    if (cached) {
      return cached
    }

    // Load tenant-specific configuration
    const config = await this.loadTenantConfig(tenantId)

    // Create tenant-specific LLM instance
    const llm = new ChatOpenAI({
      openAIApiKey: config.apiKey || process.env.OPENAI_API_KEY,
      modelName: config.model || 'gpt-3.5-turbo',
      // Use ?? so an explicit temperature of 0 is respected
      temperature: config.temperature ?? 0.7,
      maxTokens: config.maxTokens ?? 1000,
      callbacks: [
        new TokenUsageCallback(tenantId),
        new TenantRateLimitCallback(tenantId),
      ],
    })

    // Create tenant-specific vector store
    const vectorStore = await this.createTenantVectorStore(tenantId)

    // Create conversation memory. Because the chain is cached per tenant,
    // this memory is shared by every conversation in that tenant; see
    // TenantMemoryManager for per-conversation memory.
    const memory = new BufferMemory({
      memoryKey: 'chat_history',
      returnMessages: true,
      inputKey: 'question',
      outputKey: 'text', // The chain's answer key; needed when source documents are returned
    })

    // The chain is built with the fromLLM factory and takes a retriever
    const chain = ConversationalRetrievalQAChain.fromLLM(
      llm,
      vectorStore.asRetriever(),
      {
        memory,
        returnSourceDocuments: true,
      }
    )

    this.chains.set(tenantId, chain)
    return chain
  }

  private async createTenantVectorStore(
    tenantId: string
  ): Promise<VectorStore> {
    // Create isolated vector store namespace.
    // PineconeStore takes the embeddings instance first, then its config.
    return new PineconeStore(
      new OpenAIEmbeddings({
        openAIApiKey: process.env.OPENAI_API_KEY,
      }),
      {
        pineconeIndex: this.pineconeIndex,
        namespace: `tenant_${tenantId}`,
        textKey: 'text',
      }
    )
  }
}
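
The `chains` map in `TenantLangChainService` gains one entry per tenant and never releases them, so memory grows with the tenant count. A bounded least-recently-used cache is one possible remedy. The sketch below assumes a hypothetical `LruCache` class; it relies on the fact that a JavaScript `Map` iterates in insertion order.

```typescript
// Hypothetical bounded cache: Map insertion order doubles as recency order.
class LruCache<V> {
  private readonly entries = new Map<string, V>()
  private readonly maxSize: number

  constructor(maxSize: number) {
    this.maxSize = maxSize
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key)
    if (value === undefined) return undefined
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: string, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }

  get size(): number {
    return this.entries.size
  }
}

// Strings stand in for chain instances to keep the example self-contained.
const chainCache = new LruCache<string>(2)
chainCache.set('tenant-a', 'chain-a')
chainCache.set('tenant-b', 'chain-b')
chainCache.get('tenant-a') // tenant-a becomes most recently used
chainCache.set('tenant-c', 'chain-c') // evicts tenant-b
```

An evicted tenant simply pays the chain construction cost again on its next request, so the cache size trades memory for occasional rebuilds.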

API Gateway and Rate Limiting

A robust API gateway is essential for managing multi-tenant traffic and enforcing limits.

Rate Limiting Strategy

// @filename: index.ts
// Rate limiting configuration per tenant tier
const rateLimitConfig = {
  starter: {
    windowMs: 60 * 1000, // 1 minute
    maxRequests: 10,
    maxTokensPerMinute: 10000,
    maxConcurrent: 2,
  },
  professional: {
    windowMs: 60 * 1000,
    maxRequests: 60,
    maxTokensPerMinute: 50000,
    maxConcurrent: 5,
  },
  enterprise: {
    windowMs: 60 * 1000,
    maxRequests: 600,
    maxTokensPerMinute: 500000,
    maxConcurrent: 20,
  },
}

// Redis-based rate limiter
export class TenantRateLimiter {
  constructor(private redis: Redis) {}

  async checkLimit(
    tenantId: string,
    type: 'request' | 'token',
    amount: number = 1
  ): Promise<RateLimitResult> {
    const tenant = await this.getTenant(tenantId)
    const config = rateLimitConfig[tenant.plan]

    const key = `rate_limit:${tenantId}:${type}`
    const window = config.windowMs
    const limit =
      type === 'request' ? config.maxRequests : config.maxTokensPerMinute

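    // Sliding window implementation.
    // Note: these Redis calls are not atomic; under heavy concurrency, wrap
    // them in MULTI or a Lua script to avoid over-admitting requests.
    const now = Date.now()
    const windowStart = now - window

    // Remove entries that have aged out of the window
    await this.redis.zremrangebyscore(key, '-inf', windowStart)

    // Sum the amounts recorded in the window. Members are stored as
    // `${timestamp}:${amount}:${nonce}`, so ZCARD alone would count
    // entries rather than tokens.
    const members = await this.redis.zrange(key, 0, -1)
    const currentUsage = members.reduce(
      (sum, member) => sum + Number(member.split(':')[1]),
      0
    )

    if (currentUsage + amount > limit) {
      return {
        allowed: false,
        limit,
        remaining: Math.max(0, limit - currentUsage),
        resetAt: new Date(now + window),
      }
    }

    // Add new entry; the random suffix keeps same-millisecond entries distinct
    const nonce = Math.random().toString(36).slice(2)
    await this.redis.zadd(key, now, `${now}:${amount}:${nonce}`)
    await this.redis.expire(key, Math.ceil(window / 1000))

    return {
      allowed: true,
      limit,
      remaining: limit - currentUsage - amount,
      resetAt: new Date(now + window),
    }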
  }

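  async checkConcurrent(tenantId: string): Promise<boolean> {
    const tenant = await this.getTenant(tenantId)
    const config = rateLimitConfig[tenant.plan]

    const key = `concurrent:${tenantId}`

    // Increment first, then check. INCR is atomic, so two simultaneous
    // requests cannot both slip under the limit as they could with GET-then-INCR.
    const current = await this.redis.incr(key)
    await this.redis.expire(key, 300) // 5 minute expiry as a safety net for leaked slots

    if (current > config.maxConcurrent) {
      await this.redis.decr(key) // Roll back the slot we just claimed
      return false
    }

    return true
  }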

  async releaseConcurrent(tenantId: string): Promise<void> {
    const key = `concurrent:${tenantId}`
    await this.redis.decr(key)
  }
}
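
The sliding-window logic is easier to reason about without Redis in the way. Below is a minimal in-memory sketch of the same technique, assuming a hypothetical `SlidingWindowLimiter` class with an injected clock so that its behavior is deterministic. It is meant for illustration and unit tests, not as a replacement for the distributed limiter.

```typescript
// Hypothetical in-memory counterpart of the Redis limiter.
interface WindowEntry {
  at: number // timestamp in milliseconds
  amount: number // requests or tokens consumed
}

class SlidingWindowLimiter {
  private readonly entries = new Map<string, WindowEntry[]>()
  private readonly windowMs: number
  private readonly limit: number

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs
    this.limit = limit
  }

  // `now` is passed in so callers (and tests) control the clock.
  tryConsume(key: string, amount: number, now: number): boolean {
    const windowStart = now - this.windowMs
    // Drop entries that have aged out of the window.
    const live = (this.entries.get(key) ?? []).filter(
      (entry) => entry.at > windowStart
    )
    // Sum amounts, not entries, so token budgets are enforced correctly.
    const used = live.reduce((sum, entry) => sum + entry.amount, 0)

    if (used + amount > this.limit) {
      this.entries.set(key, live)
      return false
    }

    live.push({ at: now, amount })
    this.entries.set(key, live)
    return true
  }
}

// Starter tier from rateLimitConfig: 10,000 tokens per 60-second window.
const tokenLimiter = new SlidingWindowLimiter(60_000, 10_000)
const firstCall = tokenLimiter.tryConsume('tenant-a:token', 6_000, 0)
const secondCall = tokenLimiter.tryConsume('tenant-a:token', 6_000, 1_000)
const thirdCall = tokenLimiter.tryConsume('tenant-a:token', 6_000, 61_000)
```

The second call is rejected because it would bring the window total to 12,000 tokens; the third succeeds because the first entry has left the window by then.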

API Gateway Implementation

// @filename: index.ts
// Express middleware for API gateway
export class ApiGateway {
  constructor(
    private rateLimiter: TenantRateLimiter,
    private usageTracker: UsageTracker
  ) {}

  async handleRequest(req: Request, res: Response, next: NextFunction) {
    const tenantId = req.context.tenantId

    // Check request rate limit
    const requestLimit = await this.rateLimiter.checkLimit(tenantId, 'request')
    if (!requestLimit.allowed) {
      return res.status(429).json({
        error: 'Rate limit exceeded',
        retryAfter: requestLimit.resetAt,
      })
    }

    // Check concurrent request limit
    const canProceed = await this.rateLimiter.checkConcurrent(tenantId)
    if (!canProceed) {
      return res.status(429).json({
        error: 'Concurrent request limit exceeded',
      })
    }

    // Track request
    const requestId = uuidv4()
    req.context.requestId = requestId

    // Set rate limit headers
    res.setHeader('X-RateLimit-Limit', requestLimit.limit)
    res.setHeader('X-RateLimit-Remaining', requestLimit.remaining)
    res.setHeader('X-RateLimit-Reset', requestLimit.resetAt.toISOString())

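    // Handle response completion. 'close' fires both after a normal finish and
    // when the client disconnects early, so the concurrency slot is always released.
    res.on('close', async () => {
      await this.rateLimiter.releaseConcurrent(tenantId)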

      // Track usage if LangChain was used
      if (req.context.tokenUsage) {
        await this.usageTracker.trackUsage({
          tenantId,
          requestId,
          tokens: req.context.tokenUsage,
          cost: req.context.cost,
          timestamp: new Date(),
        })
      }
    })

    next()
  }
}
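
HTTP clients and SDKs commonly look for a `Retry-After` header on 429 responses, which takes either a delay in seconds or an HTTP date. The sketch below shows one way to derive the rate limit headers from a limiter result; `LimitResult` and `rateLimitHeaders` are hypothetical names, and using epoch seconds for `X-RateLimit-Reset` is a design assumption, not something the gateway above does.

```typescript
// Hypothetical helper that turns a limiter result into response headers.
interface LimitResult {
  limit: number
  remaining: number
  resetAt: Date
}

function rateLimitHeaders(
  limitResult: LimitResult,
  now: Date
): Record<string, string> {
  const msUntilReset = limitResult.resetAt.getTime() - now.getTime()
  // Round up so clients never retry a fraction of a second too early.
  const retryAfterSeconds = Math.max(0, Math.ceil(msUntilReset / 1000))

  return {
    'X-RateLimit-Limit': String(limitResult.limit),
    'X-RateLimit-Remaining': String(limitResult.remaining),
    // Epoch seconds are simple for clients to compare against their own clock.
    'X-RateLimit-Reset': String(Math.floor(limitResult.resetAt.getTime() / 1000)),
    'Retry-After': String(retryAfterSeconds),
  }
}

const exampleHeaders = rateLimitHeaders(
  { limit: 60, remaining: 0, resetAt: new Date(30_500) },
  new Date(0)
)
```

In an Express handler, each entry of the returned object can be passed to `res.setHeader` before sending the 429 response.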

Billing Integration with Stripe

Integrating billing requires careful tracking of usage and flexible pricing models.

Stripe Setup and Price Models

// @filename: index.ts
// Stripe product and price configuration
export const stripePricing = {
  products: {
    starter: 'prod_starter123',
    professional: 'prod_prof456',
    enterprise: 'prod_ent789',
  },
  prices: {
    starter: {
      monthly: 'price_starter_monthly',
      usage: {
        tokens: 'price_starter_tokens', // $0.01 per 1k tokens after included
        documents: 'price_starter_docs', // $0.10 per document after included
      },
    },
    professional: {
      monthly: 'price_prof_monthly',
      usage: {
        tokens: 'price_prof_tokens', // $0.008 per 1k tokens
        documents: 'price_prof_docs', // $0.08 per document
      },
    },
    enterprise: {
      monthly: 'price_ent_monthly',
      usage: {
        tokens: 'price_ent_tokens', // $0.006 per 1k tokens
        documents: 'price_ent_docs', // $0.06 per document
      },
    },
  },
}

// Billing service implementation
export class BillingService {
  private stripe: Stripe

  constructor() {
    this.stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
      apiVersion: '2023-10-16',
    })
  }

  async createCustomer(tenant: TenantSchema): Promise<string> {
    const customer = await this.stripe.customers.create({
      name: tenant.name,
      email: tenant.billingEmail,
      metadata: {
        tenantId: tenant.id,
        plan: tenant.plan,
      },
    })

    return customer.id
  }

  async createSubscription(
    tenantId: string,
    plan: TenantSchema['plan']
  ): Promise<Stripe.Subscription> {
    const tenant = await this.getTenant(tenantId)

    // Create subscription with the base plan plus usage-based items
    const subscription = await this.stripe.subscriptions.create({
      customer: tenant.stripeCustomerId,
      items: [
        {
          price: stripePricing.prices[plan].monthly,
        },
        {
          // Assumes the usage prices are configured as metered in Stripe.
          // Metered items take no quantity; usage is reported later.
          price: stripePricing.prices[plan].usage.tokens,
        },
        {
          price: stripePricing.prices[plan].usage.documents,
        },
      ],
      metadata: {
        tenantId,
      },
    })

    return subscription
  }

  async reportUsage(tenantId: string, usage: UsageReport): Promise<void> {
    const tenant = await this.getTenant(tenantId)
    const subscription = await this.getActiveSubscription(
      tenant.stripeCustomerId
    )

    // Find usage-based subscription items
    const tokenItem = subscription.items.data.find(
      (item) => item.price.id === stripePricing.prices[tenant.plan].usage.tokens
    )

    const docItem = subscription.items.data.find(
      (item) =>
        item.price.id === stripePricing.prices[tenant.plan].usage.documents
    )

    // Report token usage
    if (tokenItem && usage.tokens > 0) {
      await this.stripe.subscriptionItems.createUsageRecord(tokenItem.id, {
        quantity: Math.ceil(usage.tokens / 1000), // Billed per 1k tokens
        timestamp: Math.floor(usage.timestamp.getTime() / 1000),
        action: 'increment',
      })
    }

    // Report document usage
    if (docItem && usage.documents > 0) {
      await this.stripe.subscriptionItems.createUsageRecord(docItem.id, {
        quantity: usage.documents,
        timestamp: Math.floor(usage.timestamp.getTime() / 1000),
        action: 'increment',
      })
    }
  }
}
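
To make the overage pricing concrete, here is a small worked calculation. The per-1k rates mirror the comments in `stripePricing`; the included token allowances are assumed values chosen only for illustration, and `tokenOverageMicros` is a hypothetical helper. Amounts are kept in integer micro-dollars to avoid floating-point drift.

```typescript
type Plan = 'starter' | 'professional' | 'enterprise'

interface OverageRate {
  includedTokens: number // assumed allowance, not taken from the pricing above
  microsPerThousand: number // price per 1k tokens in millionths of a dollar
}

const overageRates: Record<Plan, OverageRate> = {
  starter: { includedTokens: 100_000, microsPerThousand: 10_000 }, // $0.01 per 1k
  professional: { includedTokens: 1_000_000, microsPerThousand: 8_000 }, // $0.008 per 1k
  enterprise: { includedTokens: 10_000_000, microsPerThousand: 6_000 }, // $0.006 per 1k
}

function tokenOverageMicros(plan: Plan, tokensUsed: number): number {
  const { includedTokens, microsPerThousand } = overageRates[plan]
  const billableTokens = Math.max(0, tokensUsed - includedTokens)
  // Round up to whole 1k blocks, matching the Math.ceil in reportUsage.
  const billableUnits = Math.ceil(billableTokens / 1000)
  return billableUnits * microsPerThousand
}

function formatUsd(micros: number): string {
  return `$${(micros / 1_000_000).toFixed(2)}`
}

// 250,500 tokens on starter: 150,500 billable, rounded up to 151 blocks.
const starterOverage = tokenOverageMicros('starter', 250_500)
const starterOverageUsd = formatUsd(starterOverage)
```

Note that rounding up on every report inflates the bill when usage is reported per request: a thousand 10-token requests would be billed as 1,000 blocks instead of 10. Aggregating usage over an interval before rounding avoids that.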

Webhook Handling

// @filename: index.ts
// Stripe webhook handler
export class StripeWebhookHandler {
  async handleWebhook(req: Request, res: Response) {
    const sig = req.headers['stripe-signature'] as string
    let event: Stripe.Event

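    try {
      // req.body must be the raw request body (for example, a route mounted
      // with express.raw({ type: 'application/json' })). Signature
      // verification fails on a body already parsed by express.json().
      event = this.stripe.webhooks.constructEvent(
        req.body,
        sig,
        process.env.STRIPE_WEBHOOK_SECRET!
      )
    } catch (err) {
      const message = err instanceof Error ? err.message : 'Unknown error'
      return res.status(400).send(`Webhook Error: ${message}`)
    }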

    switch (event.type) {
      case 'customer.subscription.created':
      case 'customer.subscription.updated':
        await this.handleSubscriptionChange(
          event.data.object as Stripe.Subscription
        )
        break

      case 'customer.subscription.deleted':
        await this.handleSubscriptionCancellation(
          event.data.object as Stripe.Subscription
        )
        break

      case 'invoice.payment_succeeded':
        await this.handlePaymentSuccess(event.data.object as Stripe.Invoice)
        break

      case 'invoice.payment_failed':
        await this.handlePaymentFailure(event.data.object as Stripe.Invoice)
        break
    }

    res.json({ received: true })
  }

  private async handleSubscriptionChange(subscription: Stripe.Subscription) {
    const tenantId = subscription.metadata.tenantId

    // Update tenant plan based on subscription
    const plan = this.extractPlanFromSubscription(subscription)
    await this.tenantService.updatePlan(tenantId, plan)

    // Update limits
    await this.updateTenantLimits(tenantId, plan)
  }

  private async handlePaymentFailure(invoice: Stripe.Invoice) {
    const tenantId = invoice.subscription_details?.metadata?.tenantId

    if (tenantId) {
      // Suspend tenant after grace period
      await this.tenantService.scheduleSuspension(tenantId, 7) // 7 day grace period

      // Send notification
      await this.notificationService.sendPaymentFailureNotification(tenantId)
    }
  }
}
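
Stripe can deliver the same webhook event more than once, so handlers such as `handlePaymentFailure` should be idempotent. One common approach is to record processed event IDs and skip repeats. The sketch below uses a hypothetical in-memory `IdempotentProcessor`; a production version would persist the IDs (for example in PostgreSQL or Redis) so that deduplication survives restarts and works across servers.

```typescript
// Hypothetical in-memory dedupe keyed on the event ID.
interface WebhookEvent {
  id: string
  type: string
}

class IdempotentProcessor {
  private readonly processedIds = new Set<string>()

  // Returns true if the handler ran, false if the event was a duplicate.
  process(
    incoming: WebhookEvent,
    handler: (incoming: WebhookEvent) => void
  ): boolean {
    if (this.processedIds.has(incoming.id)) {
      return false
    }
    handler(incoming)
    // Record the ID only after the handler succeeds, so a failed
    // delivery is processed again when it is retried.
    this.processedIds.add(incoming.id)
    return true
  }
}

const processor = new IdempotentProcessor()
const handledIds: string[] = []
const paymentFailed: WebhookEvent = { id: 'evt_1', type: 'invoice.payment_failed' }

const firstDelivery = processor.process(paymentFailed, (e) => {
  handledIds.push(e.id)
})
const redelivery = processor.process(paymentFailed, (e) => {
  handledIds.push(e.id)
})
```

The second delivery returns `false` without running the handler, so the tenant is not notified twice.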

Usage Tracking and Quotas

Accurate usage tracking is critical for billing and enforcing quotas.

Token Usage Tracking

// @filename: index.ts
// Comprehensive usage tracking system
export class UsageTracker {
  constructor(
    private db: Database,
    private redis: Redis,
    private billing: BillingService
  ) {}

  async trackTokenUsage(params: {
    tenantId: string
    userId: string
    requestId: string
    promptTokens: number
    completionTokens: number
    model: string
    cost: number
  }): Promise<void> {
    const timestamp = new Date()

    // Store detailed usage record
    await this.db.usage.create({
      ...params,
      totalTokens: params.promptTokens + params.completionTokens,
      timestamp,
    })

    // Update real-time counters in Redis
    const dailyKey = `usage:${params.tenantId}:daily:${this.getDateKey()}`
    const monthlyKey = `usage:${params.tenantId}:monthly:${this.getMonthKey()}`

    const pipeline = this.redis.pipeline()

    // Increment counters
    pipeline.hincrby(
      dailyKey,
      'tokens',
      params.promptTokens + params.completionTokens
    )
    pipeline.hincrby(dailyKey, 'requests', 1)
    pipeline.hincrbyfloat(dailyKey, 'cost', params.cost)

    pipeline.hincrby(
      monthlyKey,
      'tokens',
      params.promptTokens + params.completionTokens
    )
    pipeline.hincrby(monthlyKey, 'requests', 1)
    pipeline.hincrbyfloat(monthlyKey, 'cost', params.cost)

    // Set expiry
    pipeline.expire(dailyKey, 60 * 60 * 24 * 7) // 7 days
    pipeline.expire(monthlyKey, 60 * 60 * 24 * 35) // 35 days

    await pipeline.exec()

    // Check quotas
    await this.checkAndEnforceQuotas(params.tenantId)
  }

  async checkAndEnforceQuotas(tenantId: string): Promise<QuotaStatus> {
    const tenant = await this.getTenant(tenantId)
    const monthlyUsage = await this.getMonthlyUsage(tenantId)

    const quotaStatus: QuotaStatus = {
      tokensUsed: monthlyUsage.tokens,
      tokensLimit: tenant.settings.maxTokensPerMonth,
      tokensRemaining: Math.max(
        0,
        tenant.settings.maxTokensPerMonth - monthlyUsage.tokens
      ),
      percentUsed:
        (monthlyUsage.tokens / tenant.settings.maxTokensPerMonth) * 100,
      willExceedAt: this.predictExceedance(
        monthlyUsage,
        tenant.settings.maxTokensPerMonth
      ),
    }

    // Send alerts at thresholds
    if (quotaStatus.percentUsed >= 80 && !tenant.alerts.sent80) {
      await this.sendQuotaAlert(tenantId, 80)
    }

    if (quotaStatus.percentUsed >= 90 && !tenant.alerts.sent90) {
      await this.sendQuotaAlert(tenantId, 90)
    }

    if (quotaStatus.percentUsed >= 100) {
      await this.enforceQuotaLimit(tenantId)
    }

    return quotaStatus
  }

  private async enforceQuotaLimit(tenantId: string) {
    // Set quota exceeded flag
    await this.redis.set(`quota_exceeded:${tenantId}`, '1', 'EX', 3600)

    // Notify tenant
    await this.notificationService.sendQuotaExceededNotification(tenantId)

    // Log event
    await this.auditLog.log({
      tenantId,
      event: 'quota_exceeded',
      timestamp: new Date(),
    })
  }
}
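
`checkAndEnforceQuotas` calls a `predictExceedance` helper that is not shown. One simple way to implement it is a linear projection that assumes usage continues at the month-to-date average rate. The sketch below is an assumption, not the original implementation, and its signature takes an explicit `now` for testability, which differs from the call above.

```typescript
// Hypothetical linear projection of when a monthly token quota runs out.
// Returns null if the quota is not projected to be exceeded this month.
function predictExceedance(
  tokensUsed: number,
  tokensLimit: number,
  now: Date
): Date | null {
  const monthStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1)
  const nextMonthStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1)
  const elapsedMs = now.getTime() - monthStart

  // Without usage or elapsed time there is no rate to extrapolate from.
  if (tokensUsed <= 0 || elapsedMs <= 0) return null
  // Already at or over the limit.
  if (tokensUsed >= tokensLimit) return now

  // At a constant rate, reaching the limit takes elapsed * (limit / used).
  const exceedAt = monthStart + elapsedMs * (tokensLimit / tokensUsed)
  return exceedAt < nextMonthStart ? new Date(exceedAt) : null
}

// 10 days into January with half the quota used: projected to run out on day 20.
const projected = predictExceedance(
  500_000,
  1_000_000,
  new Date('2024-01-11T00:00:00Z')
)

// Only 10% used after 10 days: the quota outlasts the month.
const notProjected = predictExceedance(
  100_000,
  1_000_000,
  new Date('2024-01-11T00:00:00Z')
)
```

A linear model ignores weekly cycles and growth trends, so treat the result as a rough early warning and not a forecast.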

Document and Storage Tracking

// @filename: index.ts
// Document usage tracking
export class DocumentTracker {
  async trackDocument(params: {
    tenantId: string
    documentId: string
    size: number
    type: string
    operation: 'upload' | 'process' | 'delete'
  }): Promise<void> {
    // Record document operation
    await this.db.documentOperations.create(params)

    // Update storage metrics
    if (params.operation === 'upload') {
      await this.redis.hincrby(
        `storage:${params.tenantId}`,
        'totalBytes',
        params.size
      )
      await this.redis.hincrby(`storage:${params.tenantId}`, 'documentCount', 1)
    } else if (params.operation === 'delete') {
      await this.redis.hincrby(
        `storage:${params.tenantId}`,
        'totalBytes',
        -params.size
      )
      await this.redis.hincrby(
        `storage:${params.tenantId}`,
        'documentCount',
        -1
      )
    }

    // Check storage quotas
    await this.checkStorageQuotas(params.tenantId)
  }

  async getStorageMetrics(tenantId: string): Promise<StorageMetrics> {
    const data = await this.redis.hgetall(`storage:${tenantId}`)
    const totalBytes = parseInt(data.totalBytes || '0', 10)
    const documentCount = parseInt(data.documentCount || '0', 10)

    return {
      totalBytes,
      documentCount,
      // Guard on the parsed number: the string '0' is truthy and would
      // otherwise lead to a division by zero
      averageSize: documentCount > 0 ? totalBytes / documentCount : 0,
    }
  }
}

Tenant Isolation Strategies

Ensuring complete isolation between tenants is crucial for security and compliance.

Vector Store Isolation

// @filename: index.ts
// Isolated vector stores per tenant
export class TenantVectorStoreManager {
  private vectorStores: Map<string, VectorStore> = new Map()

  async getVectorStore(tenantId: string): Promise<VectorStore> {
    if (this.vectorStores.has(tenantId)) {
      return this.vectorStores.get(tenantId)!
    }

    // Create isolated namespace in Pinecone
    const vectorStore = await PineconeStore.fromExistingIndex(
      new OpenAIEmbeddings({
        openAIApiKey: process.env.OPENAI_API_KEY,
      }),
      {
        pineconeIndex: this.pineconeIndex,
        namespace: `tenant_${tenantId}`, // Isolated namespace
        filter: { tenantId }, // Additional filter for safety
      }
    )

    this.vectorStores.set(tenantId, vectorStore)
    return vectorStore
  }

  async addDocuments(tenantId: string, documents: Document[]): Promise<void> {
    const vectorStore = await this.getVectorStore(tenantId)

    // Add tenant metadata to all documents
    const taggedDocuments = documents.map((doc) => ({
      ...doc,
      metadata: {
        ...doc.metadata,
        tenantId,
        indexedAt: new Date().toISOString(),
      },
    }))

    await vectorStore.addDocuments(taggedDocuments)

    // Track document count
    await this.documentTracker.trackDocuments({
      tenantId,
      count: documents.length,
      operation: 'index',
    })
  }

  async search(
    tenantId: string,
    query: string,
    k: number = 4
  ): Promise<Document[]> {
    const vectorStore = await this.getVectorStore(tenantId)

    // Search with tenant filter
    const results = await vectorStore.similaritySearch(
      query,
      k,
      { tenantId } // Ensure tenant isolation
    )

    return results
  }
}
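
Namespace isolation is only as strong as the tenant ID that goes into the namespace string. If tenant IDs can ever come from user input, validating them before interpolation guards against malformed or colliding namespaces. The following sketch assumes a hypothetical `tenantNamespace` helper; the allowed character set is an assumption and should match however your tenant IDs are actually generated.

```typescript
// Assumed format: lowercase letters, digits, and hyphens, 1 to 63 characters.
const TENANT_ID_PATTERN = /^[a-z0-9][a-z0-9-]{0,62}$/

// Hypothetical helper: derives the vector-store namespace from a validated ID.
function tenantNamespace(tenantId: string): string {
  if (!TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`Invalid tenant ID: ${JSON.stringify(tenantId)}`)
  }
  return `tenant_${tenantId}`
}

function isValidTenantId(tenantId: string): boolean {
  return TENANT_ID_PATTERN.test(tenantId)
}

const acmeNamespace = tenantNamespace('acme-42')
```

Calling the helper in `getVectorStore` and `createTenantVectorStore` keeps the namespace format defined in a single place.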

Memory and Cache Isolation

// @filename: index.ts
// Tenant-isolated memory management
export class TenantMemoryManager {
  private memories: Map<string, BaseMemory> = new Map()

  getMemoryKey(tenantId: string, conversationId: string): string {
    return `${tenantId}:${conversationId}`
  }

  async getMemory(
    tenantId: string,
    conversationId: string
  ): Promise<BufferMemory> {
    const key = this.getMemoryKey(tenantId, conversationId)

    if (this.memories.has(key)) {
      return this.memories.get(key) as BufferMemory
    }

    // Load memory from Redis with tenant isolation.
    // RedisChatMessageHistory stores messages under the session ID itself,
    // so the tenant prefix is built into the session ID.
    const memory = new BufferMemory({
      returnMessages: true,
      memoryKey: 'chat_history',
      chatHistory: new RedisChatMessageHistory({
        sessionId: `memory:${key}`, // memory:<tenantId>:<conversationId>
        client: this.redis,
      }),
    })

    this.memories.set(key, memory)
    return memory
  }

  async clearMemory(tenantId: string, conversationId: string): Promise<void> {
    const key = this.getMemoryKey(tenantId, conversationId)

    // Clear from cache
    this.memories.delete(key)

    // Clear from Redis
    await this.redis.del(`memory:${tenantId}:${conversationId}`)
  }

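  async clearAllTenantMemories(tenantId: string): Promise<void> {
    // Find all memories for tenant. KEYS walks the entire keyspace and blocks
    // Redis while it runs; prefer SCAN for large production datasets.
    const keys = await this.redis.keys(`memory:${tenantId}:*`)

    if (keys.length > 0) {
      await this.redis.del(...keys)
    }

    // Clear from local cache. Match on `${tenantId}:` so that clearing
    // tenant "abc" does not also clear entries for tenant "abcd".
    for (const key of this.memories.keys()) {
      if (key.startsWith(`${tenantId}:`)) {
        this.memories.delete(key)
      }
    }
  }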
}

Scaling from 0 to 10k Customers

Scaling a LangChain SaaS requires careful planning at each growth stage.

Stage 1: 0-100 Customers (MVP)

// @filename: server.js
// Simple architecture for early stage
export class MVPArchitecture {
  // Single server setup
  async initialize() {
    const app = express()

    // Basic middleware
    app.use(express.json())
    app.use(cors())

    // Simple in-memory rate limiting
    const rateLimiter = rateLimit({
      windowMs: 15 * 60 * 1000, // 15 minutes
      max: 100, // limit each IP to 100 requests per windowMs
    })

    app.use('/api/', rateLimiter)

    // Single LangChain instance
    const llm = new ChatOpenAI({
      modelName: 'gpt-3.5-turbo',
      temperature: 0.7,
    })

    // Single vector store
    const vectorStore = await HNSWLib.fromTexts(
      [''],
      [{}],
      new OpenAIEmbeddings()
    )

    // Basic API endpoints
    app.post('/api/chat', async (req, res) => {
      try {
        const { message, tenantId } = req.body

        // Simple tenant check
        const tenant = await db.tenants.findUnique({ where: { id: tenantId } })
        if (!tenant) {
          return res.status(404).json({ error: 'Tenant not found' })
        }

        // Process with LangChain. generate() returns an LLMResult, which
        // carries token usage in llmOutput; call() returns only the message
        const result = await llm.generate([[new HumanMessage(message)]])
        const tokenUsage = result.llmOutput?.tokenUsage

        // Track usage
        await db.usage.create({
          data: {
            tenantId,
            tokens: tokenUsage?.totalTokens || 0,
            cost: calculateCost(tokenUsage),
          },
        })

        res.json({ response: result.generations[0][0].text })
      } catch (error) {
        res.status(500).json({ error: error.message })
      }
    })

    app.listen(3000)
  }
}
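
The MVP endpoint relies on a `calculateCost` helper that is not defined above. A minimal sketch follows, assuming token usage arrives with separate prompt and completion counts. The prices in `pricePerThousand` are placeholders for illustration only and are not real provider pricing; substitute the current rates for the model you use.

```typescript
interface TokenUsage {
  promptTokens?: number
  completionTokens?: number
  totalTokens?: number
}

// Placeholder prices in USD per 1,000 tokens. NOT real provider pricing.
const pricePerThousand = { prompt: 0.001, completion: 0.002 }

// Hypothetical implementation of the calculateCost helper used by the endpoint.
function calculateCost(usage: TokenUsage | undefined): number {
  if (!usage) return 0

  const promptCost =
    ((usage.promptTokens ?? 0) / 1000) * pricePerThousand.prompt
  const completionCost =
    ((usage.completionTokens ?? 0) / 1000) * pricePerThousand.completion

  // Round to 6 decimal places to keep floating-point noise out of stored values.
  return Math.round((promptCost + completionCost) * 1e6) / 1e6
}

const exampleCost = calculateCost({ promptTokens: 1000, completionTokens: 500 })
```

Prompt and completion tokens are priced separately because providers typically charge different rates for input and output.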

Stage 2: 100-1000 Customers (Growth)

// @filename: app.js
// Architecture for growth stage
export class GrowthArchitecture {
  async initialize() {
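    // Fork one worker process per CPU core; the cluster module
    // distributes incoming connections across the workers
    const cluster = require('cluster')
    const numCPUs = require('os').cpus().length

    if (cluster.isPrimary) { // isMaster is deprecated since Node.js 16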
      // Fork workers
      for (let i = 0; i < numCPUs; i++) {
        cluster.fork()
      }

      cluster.on('exit', (worker, code, signal) => {
        console.log(`Worker ${worker.process.pid} died`)
        cluster.fork() // Replace dead workers
      })
    } else {
      // Worker process
      const app = express()

      // Redis for distributed rate limiting
      const redisClient = new Redis({
        host: process.env.REDIS_HOST,
        port: process.env.REDIS_PORT,
        password: process.env.REDIS_PASSWORD,
      })

      // Distributed rate limiter
      const rateLimiter = new RateLimiterRedis({
        storeClient: redisClient,
        keyPrefix: 'rl',
        points: 100,
        duration: 900, // 15 minutes
      })

      // Connection pooling for databases
      const pgPool = new Pool({
        connectionString: process.env.DATABASE_URL,
        max: 20,
        idleTimeoutMillis: 30000,
        connectionTimeoutMillis: 2000,
      })

      // Shared vector store with connection pooling
      const pinecone = new PineconeClient()
      await pinecone.init({
        apiKey: process.env.PINECONE_API_KEY,
        environment: process.env.PINECONE_ENV,
      })

      // Queue for heavy operations
      const bullQueue = new Bull('langchain-jobs', {
        redis: {
          host: process.env.REDIS_HOST,
          port: process.env.REDIS_PORT,
          password: process.env.REDIS_PASSWORD,
        },
      })

      // Process jobs in background
      bullQueue.process(async (job) => {
        const { type, data } = job.data

        switch (type) {
          case 'process_documents':
            await processDocuments(data)
            break
          case 'generate_embeddings':
            await generateEmbeddings(data)
            break
        }
      })

      app.listen(3000)
    }
  }
}

Stage 3: 1000-10k Customers (Scale)

// @filename: index.ts
// Enterprise-grade architecture
export class EnterpriseArchitecture {
  async initialize() {
    // Kubernetes deployment configuration
    const k8sDeployment = {
      apiVersion: 'apps/v1',
      kind: 'Deployment',
      metadata: {
        name: 'langchain-saas-api',
        labels: {
          app: 'langchain-saas',
        },
      },
      spec: {
        replicas: 10, // Start with 10 replicas
        selector: {
          matchLabels: {
            app: 'langchain-saas',
          },
        },
        template: {
          metadata: {
            labels: {
              app: 'langchain-saas',
            },
          },
          spec: {
            containers: [
              {
                name: 'api',
                image: 'langchain-saas:latest',
                ports: [
                  {
                    containerPort: 3000,
                  },
                ],
                resources: {
                  requests: {
                    memory: '2Gi',
                    cpu: '1000m',
                  },
                  limits: {
                    memory: '4Gi',
                    cpu: '2000m',
                  },
                },
                env: [
                  {
                    name: 'NODE_ENV',
                    value: 'production',
                  },
                  {
                    name: 'DATABASE_URL',
                    valueFrom: {
                      secretKeyRef: {
                        name: 'database-secret',
                        key: 'url',
                      },
                    },
                  },
                ],
              },
            ],
          },
        },
      },
    }

    // Horizontal Pod Autoscaler
    const hpa = {
      apiVersion: 'autoscaling/v2',
      kind: 'HorizontalPodAutoscaler',
      metadata: {
        name: 'langchain-saas-hpa',
      },
      spec: {
        scaleTargetRef: {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          name: 'langchain-saas-api',
        },
        minReplicas: 10,
        maxReplicas: 100,
        metrics: [
          {
            type: 'Resource',
            resource: {
              name: 'cpu',
              target: {
                type: 'Utilization',
                averageUtilization: 70,
              },
            },
          },
          {
            type: 'Resource',
            resource: {
              name: 'memory',
              target: {
                type: 'Utilization',
                averageUtilization: 80,
              },
            },
          },
        ],
      },
    }

    // Multi-region database setup
    const databaseConfig = {
      primary: {
        host: 'db-primary.us-east-1.rds.amazonaws.com',
        database: 'langchain_saas',
        max: 100,
        idleTimeoutMillis: 30000,
      },
      replicas: [
        {
          host: 'db-replica-1.us-west-2.rds.amazonaws.com',
          database: 'langchain_saas',
          max: 50,
          idleTimeoutMillis: 30000,
        },
        {
          host: 'db-replica-2.eu-west-1.rds.amazonaws.com',
          database: 'langchain_saas',
          max: 50,
          idleTimeoutMillis: 30000,
        },
      ],
    }

    // Global CDN for static assets
    const cdnConfig = {
      provider: 'cloudflare',
      zones: ['us', 'eu', 'asia'],
      caching: {
        'api/embeddings': 3600, // 1 hour
        'api/documents': 86400, // 24 hours
      },
    }
  }
}
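
It helps to know how the Horizontal Pod Autoscaler turns the utilization targets above into a replica count. For each metric it proposes `ceil(currentReplicas * currentValue / targetValue)`, takes the largest proposal, and clamps the result to the configured bounds. The sketch below reproduces that arithmetic; `desiredReplicas` is a hypothetical helper, and it omits details of the real controller such as the tolerance band and stabilization windows.

```typescript
interface MetricReading {
  current: number // observed average utilization, in percent
  target: number // configured averageUtilization, in percent
}

// Simplified sketch of the HPA scaling calculation.
function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number
): number {
  // Each metric proposes its own replica count.
  const proposals = metrics.map((metric) =>
    Math.ceil((currentReplicas * metric.current) / metric.target)
  )
  // The largest proposal wins; with no metrics, fall back to the minimum.
  const largest = proposals.length > 0 ? Math.max(...proposals) : minReplicas
  return Math.min(maxReplicas, Math.max(minReplicas, largest))
}

// 10 replicas at 90% CPU (target 70%) and 60% memory (target 80%).
const scaledUp = desiredReplicas(
  10,
  [
    { current: 90, target: 70 }, // proposes ceil(12.86) = 13
    { current: 60, target: 80 }, // proposes ceil(7.5) = 8
  ],
  10,
  100
)
```

With the configuration above, CPU drives the decision here and the deployment scales from 10 to 13 replicas.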

Complete Example Application

Here’s a complete example of a production-ready LangChain SaaS application:

// @filename: server.ts
// Main application entry point
import express from 'express'
import { ChatOpenAI } from 'langchain/chat_models/openai'
import { ConversationalRetrievalQAChain } from 'langchain/chains'
import { PineconeStore } from 'langchain/vectorstores/pinecone'
import { PineconeClient } from '@pinecone-database/pinecone'
import { OpenAIEmbeddings } from 'langchain/embeddings/openai'
import { Document } from 'langchain/document'
import Bull from 'bull'
import Stripe from 'stripe'
import { createClient } from 'redis'
import { Pool } from 'pg'

// Initialize services
const app = express()
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)
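const redis = createClient({ url: process.env.REDIS_URL })
// node-redis v4+ clients must be connected explicitly before use
redis.connect().catch((err) => console.error('Redis connection error:', err))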
const pgPool = new Pool({ connectionString: process.env.DATABASE_URL })
const jobQueue = new Bull('langchain-jobs', process.env.REDIS_URL!)

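// Middleware
// Skip JSON parsing for the Stripe webhook, whose signature check needs the raw body
const jsonParser = express.json()
app.use((req, res, next) =>
  req.path === '/webhook/stripe' ? next() : jsonParser(req, res, next)
)
app.use(express.urlencoded({ extended: true }))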

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    version: process.env.APP_VERSION,
    timestamp: new Date().toISOString(),
  })
})

// Main chat endpoint
app.post('/api/v1/chat', async (req, res) => {
  const tenantId = req.headers['x-tenant-id'] as string
  const apiKey = req.headers['x-api-key'] as string

  try {
    // Validate API key and get tenant
    const tenant = await validateApiKey(apiKey, tenantId)
    if (!tenant) {
      return res.status(401).json({ error: 'Invalid API key' })
    }

    // Check rate limits
    const rateLimitOk = await checkRateLimit(tenant.id)
    if (!rateLimitOk) {
      return res.status(429).json({ error: 'Rate limit exceeded' })
    }

    // Check token quota
    const quotaOk = await checkTokenQuota(tenant.id)
    if (!quotaOk) {
      return res.status(402).json({ error: 'Token quota exceeded' })
    }

    // Get or create conversation chain
    const chain = await getConversationChain(tenant.id)

    // Process the chat request
    const { question, conversationId } = req.body
    const startTime = Date.now()

    // Get conversation memory
    const memory = await getConversationMemory(tenant.id, conversationId)

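    // Collect token usage through a per-request callback: the chain result
    // only carries text and sourceDocuments, and the chain may call the LLM
    // more than once (condensing the question, then answering it)
    const tokenUsage = { promptTokens: 0, completionTokens: 0, totalTokens: 0 }

    // Execute chain
    const response = await chain.call({ question, chat_history: memory }, [
      {
        handleLLMEnd: (output) => {
          const usage = output.llmOutput?.tokenUsage
          tokenUsage.promptTokens += usage?.promptTokens ?? 0
          tokenUsage.completionTokens += usage?.completionTokens ?? 0
          tokenUsage.totalTokens += usage?.totalTokens ?? 0
        },
      },
    ])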

    // Track usage
    await trackUsage({
      tenantId: tenant.id,
      conversationId,
      tokenUsage,
      duration: Date.now() - startTime,
      timestamp: new Date(),
    })

    // Update conversation memory
    await updateConversationMemory(tenant.id, conversationId, {
      question,
      answer: response.text,
    })

    // Return response
    res.json({
      answer: response.text,
      sources: response.sourceDocuments,
      usage: {
        promptTokens: tokenUsage.promptTokens,
        completionTokens: tokenUsage.completionTokens,
        totalTokens: tokenUsage.totalTokens,
      },
      conversationId,
    })
  } catch (error) {
    console.error('Chat error:', error)
    res.status(500).json({
      error: 'Internal server error',
      message:
        process.env.NODE_ENV === 'development' && error instanceof Error
          ? error.message
          : undefined,
    })
  }
})

// Document upload endpoint
app.post('/api/v1/documents', async (req, res) => {
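  const tenantId = req.headers['x-tenant-id'] as string
  const apiKey = req.headers['x-api-key'] as string

  try {
    // Validate API key and get tenant (the tenant ID header alone proves nothing)
    const tenant = await validateApiKey(apiKey, tenantId)
    if (!tenant) {
      return res.status(401).json({ error: 'Invalid API key' })
    }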

    // Check document quota
    const quotaOk = await checkDocumentQuota(tenant.id)
    if (!quotaOk) {
      return res.status(402).json({ error: 'Document quota exceeded' })
    }

    // Queue document processing job
    const job = await jobQueue.add('process_document', {
      tenantId: tenant.id,
      documentUrl: req.body.documentUrl,
      metadata: req.body.metadata,
    })

    res.json({
      jobId: job.id,
      status: 'processing',
      message: 'Document queued for processing',
    })
  } catch (error) {
    console.error('Document upload error:', error)
    res.status(500).json({ error: 'Failed to upload document' })
  }
})

// Usage analytics endpoint
app.get('/api/v1/usage', async (req, res) => {
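  const tenantId = req.headers['x-tenant-id'] as string
  const apiKey = req.headers['x-api-key'] as string
  const { startDate, endDate } = req.query

  try {
    // Only an authenticated tenant may read its own usage
    const tenant = await validateApiKey(apiKey, tenantId)
    if (!tenant) {
      return res.status(401).json({ error: 'Invalid API key' })
    }

    // Reject missing or unparseable dates instead of querying with Invalid Date
    const start = new Date(String(startDate))
    const end = new Date(String(endDate))
    if (isNaN(start.getTime()) || isNaN(end.getTime())) {
      return res.status(400).json({ error: 'Invalid startDate or endDate' })
    }

    const usage = await getUsageAnalytics(tenant.id, {
      startDate: start,
      endDate: end,
    })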

    res.json({
      period: {
        start: startDate,
        end: endDate,
      },
      tokens: {
        total: usage.totalTokens,
        prompt: usage.promptTokens,
        completion: usage.completionTokens,
      },
      requests: usage.requestCount,
      documents: usage.documentCount,
      cost: usage.totalCost,
      dailyBreakdown: usage.dailyBreakdown,
    })
  } catch (error) {
    console.error('Usage analytics error:', error)
    res.status(500).json({ error: 'Failed to fetch usage data' })
  }
})

// Billing webhook
app.post(
  '/webhook/stripe',
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    const sig = req.headers['stripe-signature'] as string

    try {
      const event = stripe.webhooks.constructEvent(
        req.body,
        sig,
        process.env.STRIPE_WEBHOOK_SECRET!
      )

      await handleStripeWebhook(event)
      res.json({ received: true })
    } catch (error) {
      console.error('Stripe webhook error:', error)
      res.status(400).json({ error: 'Webhook error' })
    }
  }
)

// Job processing
jobQueue.process('process_document', async (job) => {
  const { tenantId, documentUrl, metadata } = job.data

  try {
    // Download document
    const document = await downloadDocument(documentUrl)

    // Split into chunks
    const chunks = await splitDocument(document)

    // Get tenant vector store (it generates embeddings when documents are added)
    const vectorStore = await getTenantVectorStore(tenantId)

    // Add documents with tenant metadata
    const documents = chunks.map(
      (chunk) =>
        new Document({
          pageContent: chunk.text,
          metadata: {
            ...metadata,
            tenantId,
            source: documentUrl,
            chunkIndex: chunk.index,
            processedAt: new Date().toISOString(),
          },
        })
    )

    await vectorStore.addDocuments(documents)

    // Update document count
    await incrementDocumentCount(tenantId, documents.length)

    // Track completion
    await job.progress(100)
  } catch (error) {
    console.error('Document processing error:', error)
    throw error
  }
})

// Helper functions
async function getConversationChain(tenantId: string) {
  const tenant = await getTenant(tenantId)

  const llm = new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: tenant.settings.model || 'gpt-3.5-turbo',
    // Use ?? so an explicit temperature of 0 is not replaced by the default
    temperature: tenant.settings.temperature ?? 0.7,
    maxTokens: tenant.settings.maxTokens ?? 1000,
    callbacks: [
      {
        handleLLMEnd: async (output) => {
          // Track token usage in real-time
          await trackTokenUsage(tenantId, output.llmOutput?.tokenUsage)
        },
      },
    ],
  })

  const vectorStore = await getTenantVectorStore(tenantId)

  const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorStore.asRetriever(),
    {
      returnSourceDocuments: true,
      qaChainOptions: {
        type: 'stuff',
        prompt: tenant.settings.customPrompt || undefined,
      },
    }
  )

  return chain
}

async function getTenantVectorStore(tenantId: string) {
  const pinecone = new PineconeClient()
  await pinecone.init({
    apiKey: process.env.PINECONE_API_KEY!,
    environment: process.env.PINECONE_ENV!,
  })

  const index = pinecone.Index(process.env.PINECONE_INDEX!)

  return await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    }),
    {
      pineconeIndex: index,
      namespace: `tenant_${tenantId}`,
      filter: { tenantId },
    }
  )
}

// Start server
const PORT = process.env.PORT || 3000
const server = app.listen(PORT, () => {
  console.log(`LangChain SaaS API running on port ${PORT}`)
})

// Graceful shutdown
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, shutting down gracefully')

  // Stop accepting new connections and let in-flight requests finish
  await new Promise<void>((resolve) => server.close(() => resolve()))

  // Close job queue
  await jobQueue.close()

  // Close database connections
  await pgPool.end()

  // Close Redis connection
  await redis.quit()

  process.exit(0)
})
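
The server above awaits helpers such as `checkRateLimit` without showing them. As one possible shape, here is a minimal fixed-window limiter. Treat it as a sketch under assumptions: the class name, the 60-requests-per-minute limit, and the in-process `Map` are all hypothetical, and with several app servers behind the gateway the counters would more likely live in Redis so that every instance sees the same totals.

```typescript
// Fixed-window rate limiter kept in process memory. All names here are
// hypothetical; a multi-server deployment would keep the counters in Redis.
class FixedWindowRateLimiter {
  private windows = new Map<string, { windowStart: number; count: number }>()
  private limit: number
  private windowMs: number
  private now: () => number

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit
    this.windowMs = windowMs
    this.now = now // injectable clock, which makes the limiter easy to test
  }

  // Returns true if the request is allowed, and counts it against the window.
  check(tenantId: string): boolean {
    const t = this.now()
    const entry = this.windows.get(tenantId)
    if (!entry || t - entry.windowStart >= this.windowMs) {
      // First request for this tenant, or the previous window has expired
      this.windows.set(tenantId, { windowStart: t, count: 1 })
      return true
    }
    if (entry.count >= this.limit) return false
    entry.count += 1
    return true
  }
}

// Assumed default: 60 requests per tenant per minute
const limiter = new FixedWindowRateLimiter(60, 60000)

// Async signature so it can stand in for the helper awaited in the chat endpoint
async function checkRateLimit(tenantId: string): Promise<boolean> {
  return limiter.check(tenantId)
}
```

A fixed window is the simplest option but allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state per tenant.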

Deployment and Operations

For production deployment, consider these critical aspects:

Docker Configuration

# @filename: Dockerfile
# Multi-stage build for optimized image
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./
COPY tsconfig.json ./

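# Install all dependencies (the TypeScript build needs devDependencies)
RUN npm ci

# Copy source code
COPY src ./src

# Build TypeScript, then drop devDependencies from node_modules
RUN npm run build && npm prune --omit=dev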

# Production stage
FROM node:18-alpine

WORKDIR /app

# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init

# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001

# Copy built application
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./

# Switch to non-root user
USER nodejs

# Expose port
EXPOSE 3000

# Use dumb-init to handle signals
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-saas-api
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: langchain-saas-api
  template:
    metadata:
      labels:
        app: langchain-saas-api
    spec:
      containers:
        - name: api
          image: langchain-saas:latest
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: NODE_ENV
              value: 'production'
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: '1Gi'
              cpu: '500m'
            limits:
              memory: '2Gi'
              cpu: '1000m'
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-saas-api
  namespace: production
spec:
  selector:
    app: langchain-saas-api
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langchain-saas-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langchain-saas-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Lessons Learned and Best Practices

After building and scaling multiple LangChain SaaS applications, here are key lessons learned:

1. Cost Management

Monitor token usage religiously: Set up alerts for unusual spikes
Implement smart caching: Cache embeddings and common queries
Use appropriate models: Don’t use GPT-4 when GPT-3.5 suffices
Batch operations: Process multiple requests together when possible
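
As an illustration of the caching point, the sketch below wraps an embeddings call in a small in-memory LRU cache so that repeated texts are embedded only once. The `EmbedFn` signature, the `EmbeddingCache` class, and the entry limit are assumptions made for the example; in a multi-server deployment you would probably back this with Redis instead of a per-process `Map`.

```typescript
// Caches embedding promises by model and input text. `EmbedFn` stands in for
// whatever embeddings call the application uses (hypothetical signature).
type EmbedFn = (text: string) => Promise<number[]>

class EmbeddingCache {
  private entries = new Map<string, Promise<number[]>>()
  private embed: EmbedFn
  private model: string
  private maxEntries: number

  constructor(embed: EmbedFn, model: string, maxEntries: number = 10000) {
    this.embed = embed
    this.model = model
    this.maxEntries = maxEntries
  }

  get(text: string): Promise<number[]> {
    // Include the model name: vectors from different models are not interchangeable
    const key = `${this.model}:${text}`
    const hit = this.entries.get(key)
    if (hit) {
      // Re-insert to mark as most recently used (Map keeps insertion order)
      this.entries.delete(key)
      this.entries.set(key, hit)
      return hit
    }
    const pending = this.embed(text)
    this.entries.set(key, pending)
    // Do not keep failed lookups in the cache
    pending.catch(() => {
      if (this.entries.get(key) === pending) this.entries.delete(key)
    })
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order)
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
    return pending
  }
}
```

Caching the promise, not the resolved vector, means concurrent requests for the same text share a single upstream call.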

2. Performance Optimization

Vector store partitioning: Split large indices by date or category
Connection pooling: Maintain pools for all external services
Async processing: Use queues for non-real-time operations
Edge caching: Cache static responses at CDN level

3. Security Best Practices

API key rotation: Implement automatic key rotation
Tenant data encryption: Encrypt sensitive data at rest
Audit logging: Log all data access and modifications
Input validation: Sanitize all user inputs before processing
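
To make the input validation point concrete, here is a minimal sketch of a validator for the chat endpoint's request body. The 4,000-character limit and the conversation ID pattern are assumptions chosen for the example, not values from this guide; pick limits that fit your model's context window and your ID scheme.

```typescript
interface ChatRequest {
  question: string
  conversationId: string
}

type ParseResult =
  | { ok: true; value: ChatRequest }
  | { ok: false; error: string }

// Assumed limit; tune it to your model's context window
const MAX_QUESTION_LENGTH = 4000

// Validates an untrusted request body before it reaches the chain.
function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Request body must be a JSON object' }
  }
  const { question, conversationId } = body as Record<string, unknown>
  if (typeof question !== 'string' || question.trim().length === 0) {
    return { ok: false, error: 'question must be a non-empty string' }
  }
  if (question.length > MAX_QUESTION_LENGTH) {
    return {
      ok: false,
      error: `question must be at most ${MAX_QUESTION_LENGTH} characters`,
    }
  }
  if (
    typeof conversationId !== 'string' ||
    !/^[A-Za-z0-9_-]{1,64}$/.test(conversationId)
  ) {
    return {
      ok: false,
      error: 'conversationId must be 1-64 characters of [A-Za-z0-9_-]',
    }
  }
  return { ok: true, value: { question: question.trim(), conversationId } }
}
```

In the chat handler, a failed parse would translate into a 400 response before any rate-limit or quota budget is spent on the request.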

4. Operational Excellence

Comprehensive monitoring: Track latency, error rates, token usage, and queue depth per tenant
Automated testing: Test tenant isolation regularly
Disaster recovery: Regular backups and recovery drills
Documentation: Keep runbooks updated
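
One way to test tenant isolation regularly is to run retrieval queries as one tenant and check that every returned chunk carries that tenant's ID, relying on the `tenantId` metadata written by the document-processing job. The helper below is a hypothetical sketch of that check; `RetrievedDoc` and `findIsolationLeaks` are names invented for the example.

```typescript
// Shape of a retrieved chunk, mirroring the metadata written by the
// document-processing job (tenantId is stored on every chunk).
interface RetrievedDoc {
  pageContent: string
  metadata: { tenantId?: string; [key: string]: unknown }
}

// Returns every document that does not belong to the expected tenant.
// An empty result means no leak was observed for this query.
function findIsolationLeaks(
  docs: RetrievedDoc[],
  expectedTenantId: string
): RetrievedDoc[] {
  return docs.filter((doc) => doc.metadata.tenantId !== expectedTenantId)
}

// Example: results returned for a query issued as tenant "acme"
const sampleResults: RetrievedDoc[] = [
  { pageContent: 'Acme pricing', metadata: { tenantId: 'acme' } },
  { pageContent: 'Globex roadmap', metadata: { tenantId: 'globex' } },
]
const sampleLeaks = findIsolationLeaks(sampleResults, 'acme')
console.log(sampleLeaks.length) // 1: the "globex" chunk should never appear here
```

A scheduled job could seed a canary document under one tenant, query for it as another, and alert whenever the leak list is non-empty.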

5. Customer Success

Usage dashboards: Provide real-time usage visibility
Cost predictability: Offer usage alerts and projections
API documentation: Maintain excellent API docs with examples
Support integration: Build debugging tools for support team
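
For usage alerts and projections, even a simple linear extrapolation gives customers a useful early warning. The sketch below assumes usage accrues evenly through the month and that you alert at 80% of quota; both the function names and the threshold are illustrative choices, not part of any billing API.

```typescript
// Linear month-end projection from month-to-date token usage.
function projectMonthEndTokens(
  tokensUsedSoFar: number,
  dayOfMonth: number, // 1-based day, counting today as a full day
  daysInMonth: number
): number {
  if (dayOfMonth < 1 || dayOfMonth > daysInMonth) {
    throw new RangeError('dayOfMonth must be between 1 and daysInMonth')
  }
  // Average daily usage so far, extended over the whole month
  return Math.round((tokensUsedSoFar / dayOfMonth) * daysInMonth)
}

// True when projected usage reaches the given fraction of the quota.
function shouldAlert(
  projectedTokens: number,
  quota: number,
  threshold: number = 0.8 // assumed default: alert at 80% of quota
): boolean {
  return projectedTokens >= quota * threshold
}

const projected = projectMonthEndTokens(300000, 10, 30)
console.log(projected) // 900000
console.log(shouldAlert(projected, 1000000)) // true: projected to use 90% of quota
```

Real usage is rarely linear, so a production version might weight recent days more heavily, but the linear form is easy to explain on a usage dashboard.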

Conclusion

Building a production-ready LangChain SaaS requires careful attention to architecture, scaling, and operational concerns. By following the patterns and practices outlined in this guide, you can build a system that scales efficiently from your first customer to thousands while maintaining reliability, security, and cost-effectiveness.

Remember that every SaaS is unique – adapt these patterns to your specific use case and requirements. Start simple, measure everything, and iterate based on real customer needs.

For a complete deployment guide with infrastructure as code, monitoring setup, and CI/CD pipelines, check out our LangChain SaaS Deployment Guide.

Happy building!