Building a SaaS with LangChain: Architecture and Scaling
Building a Software-as-a-Service (SaaS) application with LangChain presents unique challenges beyond typical web applications. You’re not just dealing with user authentication and data storage – you’re managing AI model costs, rate limiting, tenant isolation, and complex billing based on token usage. This guide walks through building a production-ready LangChain SaaS architecture that can scale from your first customer to 10,000 and beyond.
Table of Contents
- Architecture Overview
- Multi-Tenant Design Patterns
- API Gateway and Rate Limiting
- Billing Integration with Stripe
- Usage Tracking and Quotas
- Tenant Isolation Strategies
- Scaling from 0 to 10k Customers
- Complete Example Application
- Deployment and Operations
- Lessons Learned and Best Practices
Architecture Overview
A production LangChain SaaS spans several layers: clients, an API gateway, stateless application servers, a dedicated AI layer, data stores, and monitoring. The diagram below shows how they connect:
graph TB
subgraph "Client Layer"
WEB[Web App]
API[API Clients]
SDK[SDKs]
end
subgraph "API Gateway"
GW[Kong/AWS API Gateway]
AUTH[Auth Service]
RATE[Rate Limiter]
end
subgraph "Application Layer"
APP1[App Server 1]
APP2[App Server 2]
APP3[App Server N]
QUEUE[Job Queue]
end
subgraph "AI Layer"
LC[LangChain Service]
CACHE[Vector Cache]
EMB[Embeddings Service]
end
subgraph "Data Layer"
PG[(PostgreSQL)]
REDIS[(Redis)]
S3[(S3/Object Storage)]
VECTOR[(Vector DB)]
end
subgraph "Monitoring"
LOG[Logging]
METRIC[Metrics]
TRACE[Tracing]
end
WEB --> GW
API --> GW
SDK --> GW
GW --> AUTH
GW --> RATE
GW --> APP1
GW --> APP2
GW --> APP3
APP1 --> LC
APP2 --> LC
APP3 --> LC
APP1 --> QUEUE
APP2 --> QUEUE
APP3 --> QUEUE
LC --> CACHE
LC --> EMB
LC --> VECTOR
APP1 --> PG
APP2 --> PG
APP3 --> PG
APP1 --> REDIS
APP2 --> REDIS
APP3 --> REDIS
LC --> S3
APP1 --> LOG
APP2 --> LOG
APP3 --> LOG
LC --> METRIC
Key Components
- API Gateway: Central entry point handling authentication, rate limiting, and request routing
- Application Servers: Stateless Node.js/Python servers running your business logic
- LangChain Service: Dedicated service layer for AI operations
- Data Stores: PostgreSQL for relational data, Redis for caching, S3 for documents, Vector DB for embeddings
- Monitoring Stack: Comprehensive logging, metrics, and distributed tracing
Multi-Tenant Design Patterns
Multi-tenancy is crucial for SaaS applications. With LangChain, you need to consider both data isolation and AI resource isolation.
Database-Level Isolation
// @filename: index.ts
// Database schema with tenant isolation
interface TenantSchema {
id: string
name: string
plan: 'starter' | 'professional' | 'enterprise'
settings: {
maxTokensPerMonth: number
maxConcurrentRequests: number
allowedModels: string[]
customPrompts: boolean
dataRetentionDays: number
}
createdAt: Date
updatedAt: Date
}
interface UserSchema {
id: string
tenantId: string // Foreign key to tenant
email: string
role: 'admin' | 'user' | 'viewer'
apiKeys: ApiKey[]
}
interface ConversationSchema {
id: string
tenantId: string // Ensures data isolation
userId: string
messages: Message[]
tokenUsage: {
promptTokens: number
completionTokens: number
totalCost: number
}
metadata: Record<string, any>
createdAt: Date
}
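A `tenantId` column only isolates data if every query actually filters on it. The following is a minimal sketch of a wrapper that applies the filter automatically, using an in-memory array in place of a real database. `TenantScopedStore`, `TenantOwned`, and the sample `Conversation` type are hypothetical names for illustration, not part of LangChain or any ORM.

```typescript
// Hypothetical in-memory store; a real implementation would wrap an ORM or query builder.
interface TenantOwned {
  id: string
  tenantId: string
}

class TenantScopedStore<T extends TenantOwned> {
  constructor(private rows: T[], private tenantId: string) {}

  // Every read is filtered by the bound tenant, so callers cannot forget the filter.
  findAll(): T[] {
    return this.rows.filter((row) => row.tenantId === this.tenantId)
  }

  findById(id: string): T | undefined {
    return this.findAll().find((row) => row.id === id)
  }

  // Writes are stamped with the bound tenant, overriding any tenantId the caller supplied.
  insert(row: T): T {
    const stamped: T = { ...row, tenantId: this.tenantId }
    this.rows.push(stamped)
    return stamped
  }
}

// Usage: two tenants share one table but each sees only its own rows.
interface Conversation extends TenantOwned {
  title: string
}
const table: Conversation[] = []
const acme = new TenantScopedStore<Conversation>(table, 'acme')
const globex = new TenantScopedStore<Conversation>(table, 'globex')
// The spoofed tenantId below is overwritten with 'acme'.
acme.insert({ id: 'c1', tenantId: 'globex', title: 'Onboarding' })
globex.insert({ id: 'c2', tenantId: 'globex', title: 'Support' })
```

Binding the tenant once per request (for example from the tenant context middleware) keeps the isolation rule in one place instead of repeating it in every query.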
Application-Level Tenant Context
// @filename: index.ts
// Middleware for tenant context injection
export class TenantContextMiddleware {
async use(req: Request, res: Response, next: NextFunction) {
try {
// Extract tenant from JWT or API key
const tenantId = await this.extractTenantId(req);
if (!tenantId) {
return res.status(401).json({ error: 'Invalid tenant context' });
}
// Load tenant configuration
const tenant = await this.tenantService.getTenant(tenantId);
// Inject tenant context
req.context = {
tenantId: tenant.id,
tenant: tenant,
limits: {
maxTokens: tenant.settings.maxTokensPerMonth,
remainingTokens: await this.getRemainingTokens(tenant.id),
concurrentRequests: tenant.settings.maxConcurrentRequests
}
};
next();
} catch (error) {
res.status(500).json({ error: 'Failed to establish tenant context' });
}
}
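private async extractTenantId(req: Request): Promise<string | null> {
// Check API key header (header values can be string, string[], or undefined)
const apiKey = req.headers['x-api-key'];
if (typeof apiKey === 'string') {
return await this.tenantService.getTenantIdFromApiKey(apiKey);
}
// Check JWT token
const token = req.headers.authorization?.split(' ')[1];
if (token) {
try {
const decoded = jwt.verify(token, process.env.JWT_SECRET!);
return typeof decoded === 'string' ? null : decoded.tenantId ?? null;
} catch {
// Invalid or expired token: treat as unauthenticated (401), not a server error
return null;
}
}
return null;
}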
}
LangChain Tenant Isolation
// @filename: index.ts
// Tenant-aware LangChain service
export class TenantLangChainService {
private chains: Map<string, ConversationalRetrievalQAChain> = new Map()
async getChain(tenantId: string): Promise<ConversationalRetrievalQAChain> {
// Check if chain exists for tenant
if (this.chains.has(tenantId)) {
return this.chains.get(tenantId)!
}
// Load tenant-specific configuration
const config = await this.loadTenantConfig(tenantId)
// Create tenant-specific LLM instance
const llm = new ChatOpenAI({
openAIApiKey: config.apiKey || process.env.OPENAI_API_KEY,
modelName: config.model || 'gpt-3.5-turbo',
// Use ?? so an explicit temperature of 0 is not replaced by the default
temperature: config.temperature ?? 0.7,
maxTokens: config.maxTokens ?? 1000,
callbacks: [
new TokenUsageCallback(tenantId),
new TenantRateLimitCallback(tenantId),
],
})
// Create tenant-specific vector store
const vectorStore = await this.createTenantVectorStore(tenantId)
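// Create conversation chain with memory.
// Note: the chain is cached per tenant, so this memory is shared by every
// conversation in the tenant; key memory by conversation ID for per-user history
const memory = new BufferMemory({
memoryKey: 'chat_history',
returnMessages: true,
inputKey: 'question',
outputKey: 'text', // The chain returns its answer under the "text" key
})
// Chains are built with the fromLLM factory, which takes a retriever
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(),
{ memory, returnSourceDocuments: true }
)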
this.chains.set(tenantId, chain)
return chain
}
private async createTenantVectorStore(
tenantId: string
): Promise<VectorStore> {
// Create isolated vector store namespace.
// PineconeStore takes the embeddings instance first, then the store options
return new PineconeStore(
new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
}),
{
pineconeIndex: this.pineconeIndex,
namespace: `tenant_${tenantId}`,
textKey: 'text',
}
)
}
}
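The `chains` map above gains one entry per tenant and never evicts, so memory grows with the customer count. One common mitigation is a bounded least-recently-used cache. The sketch below is an assumption about how that could look; `LruCache` is a hypothetical helper, and the cached value is a stand-in object rather than a real chain.

```typescript
// Minimal LRU cache built on Map, which preserves insertion order.
class LruCache<K, V> {
  private entries = new Map<K, V>()

  constructor(private readonly maxEntries: number) {}

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: K, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K
      this.entries.delete(oldest)
    }
  }

  get size(): number {
    return this.entries.size
  }
}

// Usage: a cache that holds at most two tenants' chains.
const chainCache = new LruCache<string, { tenantId: string }>(2)
chainCache.set('t1', { tenantId: 't1' })
chainCache.set('t2', { tenantId: 't2' })
chainCache.get('t1') // t1 becomes the most recently used entry
chainCache.set('t3', { tenantId: 't3' }) // evicts t2
```

An evicted tenant simply has its chain rebuilt on the next request, trading a little latency for a predictable memory ceiling.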
API Gateway and Rate Limiting
A robust API gateway is essential for managing multi-tenant traffic and enforcing limits.
Rate Limiting Strategy
// @filename: index.ts
// Rate limiting configuration per tenant tier
const rateLimitConfig = {
starter: {
windowMs: 60 * 1000, // 1 minute
maxRequests: 10,
maxTokensPerMinute: 10000,
maxConcurrent: 2,
},
professional: {
windowMs: 60 * 1000,
maxRequests: 60,
maxTokensPerMinute: 50000,
maxConcurrent: 5,
},
enterprise: {
windowMs: 60 * 1000,
maxRequests: 600,
maxTokensPerMinute: 500000,
maxConcurrent: 20,
},
}
// Redis-based rate limiter
export class TenantRateLimiter {
constructor(private redis: Redis) {}
async checkLimit(
tenantId: string,
type: 'request' | 'token',
amount: number = 1
): Promise<RateLimitResult> {
const tenant = await this.getTenant(tenantId)
const config = rateLimitConfig[tenant.plan]
const key = `rate_limit:${tenantId}:${type}`
const window = config.windowMs
const limit =
type === 'request' ? config.maxRequests : config.maxTokensPerMinute
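// Sliding window implementation
const now = Date.now()
const windowStart = now - window
// Remove old entries
await this.redis.zremrangebyscore(key, '-inf', windowStart)
// Sum the amounts encoded in each member ("timestamp:amount:nonce").
// ZCARD would only count entries, which undercounts token usage when amount > 1
const members = await this.redis.zrange(key, 0, -1)
const currentUsage = members.reduce(
(sum, member) => sum + Number(member.split(':')[1]),
0
)
if (currentUsage + amount > limit) {
return {
allowed: false,
limit,
remaining: Math.max(0, limit - currentUsage),
resetAt: new Date(now + window),
}
}
// Add new entry; the random suffix keeps members unique within the same millisecond.
// Note: check-then-add is not atomic. Use a Lua script or MULTI if strict limits matter
const nonce = Math.random().toString(36).slice(2)
await this.redis.zadd(key, now, `${now}:${amount}:${nonce}`)
await this.redis.expire(key, Math.ceil(window / 1000))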
return {
allowed: true,
limit,
remaining: limit - currentUsage - amount,
resetAt: new Date(now + window),
}
}
async checkConcurrent(tenantId: string): Promise<boolean> {
const tenant = await this.getTenant(tenantId)
const config = rateLimitConfig[tenant.plan]
const key = `concurrent:${tenantId}`
// Increment first, then compare: INCR is atomic, so two simultaneous
// requests cannot both pass a stale GET check
const current = await this.redis.incr(key)
await this.redis.expire(key, 300) // 5 minute expiry
if (current > config.maxConcurrent) {
await this.redis.decr(key)
return false
}
return true
}
async releaseConcurrent(tenantId: string): Promise<void> {
const key = `concurrent:${tenantId}`
await this.redis.decr(key)
}
}
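To make the sliding-window logic easier to reason about without Redis, here is a minimal in-memory sketch of the same idea. It sums the `amount` of each entry, so it works for token budgets as well as request counts. `SlidingWindowLimiter` and `tryConsume` are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic.

```typescript
interface WindowEntry {
  at: number // Timestamp in milliseconds
  amount: number // Units consumed (1 per request, or a token count)
}

class SlidingWindowLimiter {
  private entries: WindowEntry[] = []

  constructor(private limit: number, private windowMs: number) {}

  tryConsume(amount: number, now: number): { allowed: boolean; remaining: number } {
    // Drop entries at or before the start of the window.
    const windowStart = now - this.windowMs
    this.entries = this.entries.filter((entry) => entry.at > windowStart)

    // Usage is the sum of amounts, not the number of entries.
    const used = this.entries.reduce((sum, entry) => sum + entry.amount, 0)
    if (used + amount > this.limit) {
      return { allowed: false, remaining: Math.max(0, this.limit - used) }
    }

    this.entries.push({ at: now, amount })
    return { allowed: true, remaining: this.limit - used - amount }
  }
}

// Usage: a 10,000 tokens-per-minute budget, matching the starter tier above.
const tokenLimiter = new SlidingWindowLimiter(10000, 60000)
const first = tokenLimiter.tryConsume(6000, 0) // allowed, 4000 remaining
const second = tokenLimiter.tryConsume(5000, 30000) // rejected: 6000 + 5000 > 10000
const third = tokenLimiter.tryConsume(5000, 60001) // allowed: the first entry has expired
```

This version lives in a single process, so it only suits a one-server deployment; the Redis-backed limiter is what lets several app servers share one budget.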
API Gateway Implementation
// @filename: index.ts
// Express middleware for API gateway
export class ApiGateway {
constructor(
private rateLimiter: TenantRateLimiter,
private usageTracker: UsageTracker
) {}
async handleRequest(req: Request, res: Response, next: NextFunction) {
const tenantId = req.context.tenantId
// Check request rate limit
const requestLimit = await this.rateLimiter.checkLimit(tenantId, 'request')
if (!requestLimit.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: requestLimit.resetAt,
})
}
// Check concurrent request limit
const canProceed = await this.rateLimiter.checkConcurrent(tenantId)
if (!canProceed) {
return res.status(429).json({
error: 'Concurrent request limit exceeded',
})
}
// Track request
const requestId = uuidv4()
req.context.requestId = requestId
// Set rate limit headers
res.setHeader('X-RateLimit-Limit', requestLimit.limit)
res.setHeader('X-RateLimit-Remaining', requestLimit.remaining)
res.setHeader('X-RateLimit-Reset', requestLimit.resetAt.toISOString())
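// Handle response completion. 'close' also fires when the client disconnects
// early, so the concurrency slot is released even for aborted requests
res.on('close', async () => {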
await this.rateLimiter.releaseConcurrent(tenantId)
// Track usage if LangChain was used
if (req.context.tokenUsage) {
await this.usageTracker.trackUsage({
tenantId,
requestId,
tokens: req.context.tokenUsage,
cost: req.context.cost,
timestamp: new Date(),
})
}
})
next()
}
}
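The gateway has to release each concurrency slot exactly once, whatever path the request takes. A sketch of one way to make that hard to get wrong: `acquire` hands back a release function that ignores repeated calls. `ConcurrencyGate` is a hypothetical in-memory helper, assumed here only to illustrate the pattern; a multi-server deployment would keep the counter in Redis as shown above.

```typescript
class ConcurrencyGate {
  private active = new Map<string, number>()

  constructor(private maxConcurrent: number) {}

  // Returns a release function on success, or null when the tenant is at its limit.
  acquire(tenantId: string): (() => void) | null {
    const current = this.active.get(tenantId) ?? 0
    if (current >= this.maxConcurrent) return null
    this.active.set(tenantId, current + 1)

    let released = false
    return () => {
      // Ignore repeated calls so a slot is never released twice.
      if (released) return
      released = true
      this.active.set(tenantId, (this.active.get(tenantId) ?? 1) - 1)
    }
  }

  inFlight(tenantId: string): number {
    return this.active.get(tenantId) ?? 0
  }
}

// Usage: a tenant limited to two concurrent requests.
const gate = new ConcurrencyGate(2)
const releaseA = gate.acquire('acme')
const releaseB = gate.acquire('acme')
const releaseC = gate.acquire('acme') // null: limit reached
if (releaseA) {
  releaseA()
  releaseA() // Second call is a no-op
}
```

Returning the release function from `acquire` ties the release to the specific request that took the slot, which avoids decrementing a counter on behalf of a request that was never admitted.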
Billing Integration with Stripe
Integrating billing requires careful tracking of usage and flexible pricing models.
Stripe Setup and Price Models
// @filename: index.ts
// Stripe product and price configuration
export const stripePricing = {
products: {
starter: 'prod_starter123',
professional: 'prod_prof456',
enterprise: 'prod_ent789',
},
prices: {
starter: {
monthly: 'price_starter_monthly',
usage: {
tokens: 'price_starter_tokens', // $0.01 per 1k tokens after included
documents: 'price_starter_docs', // $0.10 per document after included
},
},
professional: {
monthly: 'price_prof_monthly',
usage: {
tokens: 'price_prof_tokens', // $0.008 per 1k tokens
documents: 'price_prof_docs', // $0.08 per document
},
},
enterprise: {
monthly: 'price_ent_monthly',
usage: {
tokens: 'price_ent_tokens', // $0.006 per 1k tokens
documents: 'price_ent_docs', // $0.06 per document
},
},
},
}
// Billing service implementation
export class BillingService {
private stripe: Stripe
constructor() {
this.stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
apiVersion: '2023-10-16',
})
}
async createCustomer(tenant: TenantSchema): Promise<string> {
const customer = await this.stripe.customers.create({
name: tenant.name,
email: tenant.billingEmail,
metadata: {
tenantId: tenant.id,
plan: tenant.plan,
},
})
return customer.id
}
async createSubscription(
tenantId: string,
plan: string
): Promise<Stripe.Subscription> {
const tenant = await this.getTenant(tenantId)
// Create subscription with base plan
const subscription = await this.stripe.subscriptions.create({
customer: tenant.stripeCustomerId,
items: [
{
price: stripePricing.prices[plan].monthly,
},
{
// Metered price: omit quantity, usage is reported later through usage records
price: stripePricing.prices[plan].usage.tokens,
},
{
price: stripePricing.prices[plan].usage.documents,
},
],
metadata: {
tenantId,
},
})
return subscription
}
async reportUsage(tenantId: string, usage: UsageReport): Promise<void> {
const tenant = await this.getTenant(tenantId)
const subscription = await this.getActiveSubscription(
tenant.stripeCustomerId
)
// Find usage-based subscription items
const tokenItem = subscription.items.data.find(
(item) => item.price.id === stripePricing.prices[tenant.plan].usage.tokens
)
const docItem = subscription.items.data.find(
(item) =>
item.price.id === stripePricing.prices[tenant.plan].usage.documents
)
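// Report token usage. Each call rounds up to a whole 1k-token unit, so report
// aggregated totals (for example hourly) rather than one record per request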
if (tokenItem && usage.tokens > 0) {
await this.stripe.subscriptionItems.createUsageRecord(tokenItem.id, {
quantity: Math.ceil(usage.tokens / 1000), // Billed per 1k tokens
timestamp: Math.floor(usage.timestamp.getTime() / 1000),
action: 'increment',
})
}
// Report document usage
if (docItem && usage.documents > 0) {
await this.stripe.subscriptionItems.createUsageRecord(docItem.id, {
quantity: usage.documents,
timestamp: Math.floor(usage.timestamp.getTime() / 1000),
action: 'increment',
})
}
}
}
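Because usage is billed per 1,000 tokens, rounding up on every report overcharges when reports are small: a 50-token request would be billed as a full unit. A sketch of one way to avoid that is to report only whole units and carry the remainder forward. `TokenUnitAccumulator` is a hypothetical helper and keeps its state in memory purely for illustration.

```typescript
class TokenUnitAccumulator {
  private carry = new Map<string, number>()

  constructor(private tokensPerUnit: number = 1000) {}

  // Returns the number of whole billing units to report now;
  // leftover tokens are carried into the next call.
  add(tenantId: string, tokens: number): number {
    const total = (this.carry.get(tenantId) ?? 0) + tokens
    const units = Math.floor(total / this.tokensPerUnit)
    this.carry.set(tenantId, total - units * this.tokensPerUnit)
    return units
  }

  // Tokens consumed but not yet reported as a billing unit.
  pending(tenantId: string): number {
    return this.carry.get(tenantId) ?? 0
  }
}

// Usage: three requests from the same tenant.
const accumulator = new TokenUnitAccumulator()
const units1 = accumulator.add('acme', 400) // 0 units, 400 tokens carried
const units2 = accumulator.add('acme', 700) // 1 unit, 100 tokens carried
const units3 = accumulator.add('acme', 2950) // 3 units, 50 tokens carried
```

Whether the final remainder is rounded up or dropped at the end of the billing period is a pricing decision; the accumulator only ensures that rounding happens once per period instead of once per request.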
Webhook Handling
// @filename: index.ts
// Stripe webhook handler
export class StripeWebhookHandler {
async handleWebhook(req: Request, res: Response) {
const sig = req.headers['stripe-signature'] as string
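let event: Stripe.Event
try {
// req.body must be the raw request body (Buffer or string), for example from
// express.raw(); a JSON-parsed body fails signature verification
event = this.stripe.webhooks.constructEvent(
req.body,
sig,
process.env.STRIPE_WEBHOOK_SECRET!
)
} catch (err) {
return res.status(400).send(`Webhook Error: ${(err as Error).message}`)
}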
switch (event.type) {
case 'customer.subscription.created':
case 'customer.subscription.updated':
await this.handleSubscriptionChange(
event.data.object as Stripe.Subscription
)
break
case 'customer.subscription.deleted':
await this.handleSubscriptionCancellation(
event.data.object as Stripe.Subscription
)
break
case 'invoice.payment_succeeded':
await this.handlePaymentSuccess(event.data.object as Stripe.Invoice)
break
case 'invoice.payment_failed':
await this.handlePaymentFailure(event.data.object as Stripe.Invoice)
break
}
res.json({ received: true })
}
private async handleSubscriptionChange(subscription: Stripe.Subscription) {
const tenantId = subscription.metadata.tenantId
// Update tenant plan based on subscription
const plan = this.extractPlanFromSubscription(subscription)
await this.tenantService.updatePlan(tenantId, plan)
// Update limits
await this.updateTenantLimits(tenantId, plan)
}
private async handlePaymentFailure(invoice: Stripe.Invoice) {
const tenantId = invoice.subscription_details?.metadata?.tenantId
if (tenantId) {
// Suspend tenant after grace period
await this.tenantService.scheduleSuspension(tenantId, 7) // 7 day grace period
// Send notification
await this.notificationService.sendPaymentFailureNotification(tenantId)
}
}
}
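Stripe can deliver the same event more than once, so handlers that change plans or schedule suspensions should be idempotent. A minimal sketch of deduplicating by event ID follows. `IdempotentEventProcessor` and `WebhookEvent` are hypothetical names, and the in-memory `Set` stands in for durable storage such as a unique database constraint or a Redis key.

```typescript
interface WebhookEvent {
  id: string
  type: string
}

class IdempotentEventProcessor {
  private seen = new Set<string>()

  constructor(private handler: (event: WebhookEvent) => void) {}

  // Returns true if the handler ran, false if the event was a duplicate.
  process(event: WebhookEvent): boolean {
    if (this.seen.has(event.id)) return false
    this.handler(event)
    // Record the ID only after the handler succeeds, so a failed event can be retried.
    this.seen.add(event.id)
    return true
  }
}

// Usage: the second delivery of the same event is skipped.
const handledTypes: string[] = []
const processor = new IdempotentEventProcessor((event) => {
  handledTypes.push(event.type)
})
const firstDelivery = processor.process({ id: 'evt_1', type: 'invoice.payment_failed' })
const redelivery = processor.process({ id: 'evt_1', type: 'invoice.payment_failed' })
```

Without a guard like this, a redelivered `invoice.payment_failed` event would schedule a second suspension and send a second notification.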
Usage Tracking and Quotas
Accurate usage tracking is critical for billing and enforcing quotas.
Token Usage Tracking
// @filename: index.ts
// Comprehensive usage tracking system
export class UsageTracker {
constructor(
private db: Database,
private redis: Redis,
private billing: BillingService
) {}
async trackTokenUsage(params: {
tenantId: string
userId: string
requestId: string
promptTokens: number
completionTokens: number
model: string
cost: number
}): Promise<void> {
const timestamp = new Date()
// Store detailed usage record
await this.db.usage.create({
...params,
totalTokens: params.promptTokens + params.completionTokens,
timestamp,
})
// Update real-time counters in Redis
const dailyKey = `usage:${params.tenantId}:daily:${this.getDateKey()}`
const monthlyKey = `usage:${params.tenantId}:monthly:${this.getMonthKey()}`
const pipeline = this.redis.pipeline()
// Increment counters
pipeline.hincrby(
dailyKey,
'tokens',
params.promptTokens + params.completionTokens
)
pipeline.hincrby(dailyKey, 'requests', 1)
pipeline.hincrbyfloat(dailyKey, 'cost', params.cost)
pipeline.hincrby(
monthlyKey,
'tokens',
params.promptTokens + params.completionTokens
)
pipeline.hincrby(monthlyKey, 'requests', 1)
pipeline.hincrbyfloat(monthlyKey, 'cost', params.cost)
// Set expiry
pipeline.expire(dailyKey, 60 * 60 * 24 * 7) // 7 days
pipeline.expire(monthlyKey, 60 * 60 * 24 * 35) // 35 days
await pipeline.exec()
// Check quotas
await this.checkAndEnforceQuotas(params.tenantId)
}
async checkAndEnforceQuotas(tenantId: string): Promise<QuotaStatus> {
const tenant = await this.getTenant(tenantId)
const monthlyUsage = await this.getMonthlyUsage(tenantId)
const quotaStatus: QuotaStatus = {
tokensUsed: monthlyUsage.tokens,
tokensLimit: tenant.settings.maxTokensPerMonth,
tokensRemaining: Math.max(
0,
tenant.settings.maxTokensPerMonth - monthlyUsage.tokens
),
percentUsed:
(monthlyUsage.tokens / tenant.settings.maxTokensPerMonth) * 100,
willExceedAt: this.predictExceedance(
monthlyUsage,
tenant.settings.maxTokensPerMonth
),
}
// Send alerts at thresholds, highest first so a single check sends one alert.
// sendQuotaAlert is assumed to persist the sent80/sent90 flag so alerts are not repeated
if (quotaStatus.percentUsed >= 100) {
await this.enforceQuotaLimit(tenantId)
} else if (quotaStatus.percentUsed >= 90 && !tenant.alerts.sent90) {
await this.sendQuotaAlert(tenantId, 90)
} else if (quotaStatus.percentUsed >= 80 && !tenant.alerts.sent80) {
await this.sendQuotaAlert(tenantId, 80)
}
return quotaStatus
}
private async enforceQuotaLimit(tenantId: string) {
// Set quota exceeded flag
await this.redis.set(`quota_exceeded:${tenantId}`, '1', 'EX', 3600)
// Notify tenant
await this.notificationService.sendQuotaExceededNotification(tenantId)
// Log event
await this.auditLog.log({
tenantId,
event: 'quota_exceeded',
timestamp: new Date(),
})
}
}
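The tracker calls `predictExceedance` without showing it. One simple way to implement such a forecast, assuming usage grows roughly linearly over the billing period, is to project the current burn rate forward. The signature below is a hypothetical variant that takes explicit dates so it is easy to test; it is a sketch, not the method used by the class above.

```typescript
// Projects when a tenant will hit its token limit, assuming a constant burn rate
// since the start of the billing period. Returns null when there is no signal yet.
function predictExceedance(
  tokensUsed: number,
  tokensLimit: number,
  periodStart: Date,
  now: Date
): Date | null {
  const elapsedMs = now.getTime() - periodStart.getTime()
  if (tokensUsed <= 0 || elapsedMs <= 0) return null
  if (tokensUsed >= tokensLimit) return now // Already at or over the limit

  // Remaining tokens divided by tokens-per-millisecond, written to keep integer precision.
  const msUntilLimit = Math.round(((tokensLimit - tokensUsed) * elapsedMs) / tokensUsed)
  return new Date(now.getTime() + msUntilLimit)
}

// Usage: half the quota consumed in the first 10 days of the period.
const periodStart = new Date('2024-01-01T00:00:00Z')
const checkedAt = new Date('2024-01-11T00:00:00Z')
const projected = predictExceedance(500000, 1000000, periodStart, checkedAt)
// At the same rate the other half lasts 10 more days, until 2024-01-21.
```

A linear projection is crude for bursty workloads; weighting recent days more heavily would react faster to a sudden spike, at the cost of noisier alerts.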
Document and Storage Tracking
// @filename: index.ts
// Document usage tracking
export class DocumentTracker {
async trackDocument(params: {
tenantId: string
documentId: string
size: number
type: string
operation: 'upload' | 'process' | 'delete'
}): Promise<void> {
// Record document operation
await this.db.documentOperations.create(params)
// Update storage metrics
if (params.operation === 'upload') {
await this.redis.hincrby(
`storage:${params.tenantId}`,
'totalBytes',
params.size
)
await this.redis.hincrby(`storage:${params.tenantId}`, 'documentCount', 1)
} else if (params.operation === 'delete') {
await this.redis.hincrby(
`storage:${params.tenantId}`,
'totalBytes',
-params.size
)
await this.redis.hincrby(
`storage:${params.tenantId}`,
'documentCount',
-1
)
}
// Check storage quotas
await this.checkStorageQuotas(params.tenantId)
}
async getStorageMetrics(tenantId: string): Promise<StorageMetrics> {
const data = await this.redis.hgetall(`storage:${tenantId}`)
const totalBytes = parseInt(data.totalBytes || '0', 10)
const documentCount = parseInt(data.documentCount || '0', 10)
return {
totalBytes,
documentCount,
// Guard on the parsed number: the string '0' is truthy and would divide by zero
averageSize: documentCount > 0 ? totalBytes / documentCount : 0,
}
}
}
Tenant Isolation Strategies
Ensuring complete isolation between tenants is crucial for security and compliance.
Vector Store Isolation
// @filename: index.ts
// Isolated vector stores per tenant
export class TenantVectorStoreManager {
private vectorStores: Map<string, VectorStore> = new Map()
async getVectorStore(tenantId: string): Promise<VectorStore> {
if (this.vectorStores.has(tenantId)) {
return this.vectorStores.get(tenantId)!
}
// Create isolated namespace in Pinecone
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
}),
{
pineconeIndex: this.pineconeIndex,
namespace: `tenant_${tenantId}`, // Isolated namespace
filter: { tenantId }, // Additional filter for safety
}
)
this.vectorStores.set(tenantId, vectorStore)
return vectorStore
}
async addDocuments(tenantId: string, documents: Document[]): Promise<void> {
const vectorStore = await this.getVectorStore(tenantId)
// Add tenant metadata to all documents
const taggedDocuments = documents.map((doc) => ({
...doc,
metadata: {
...doc.metadata,
tenantId,
indexedAt: new Date().toISOString(),
},
}))
await vectorStore.addDocuments(taggedDocuments)
// Track document count
await this.documentTracker.trackDocuments({
tenantId,
count: documents.length,
operation: 'index',
})
}
async search(
tenantId: string,
query: string,
k: number = 4
): Promise<Document[]> {
const vectorStore = await this.getVectorStore(tenantId)
// Search with tenant filter
const results = await vectorStore.similaritySearch(
query,
k,
{ tenantId } // Ensure tenant isolation
)
return results
}
}
Memory and Cache Isolation
// @filename: index.ts
// Tenant-isolated memory management
export class TenantMemoryManager {
private memories: Map<string, BaseMemory> = new Map()
getMemoryKey(tenantId: string, conversationId: string): string {
return `${tenantId}:${conversationId}`
}
async getMemory(
tenantId: string,
conversationId: string
): Promise<BufferMemory> {
const key = this.getMemoryKey(tenantId, conversationId)
if (this.memories.has(key)) {
return this.memories.get(key) as BufferMemory
}
// Load memory from Redis with tenant isolation
const memory = new BufferMemory({
returnMessages: true,
memoryKey: 'chat_history',
chatHistory: new RedisChatMessageHistory({
sessionId: key,
client: this.redis,
keyPrefix: `memory:${tenantId}:`, // Tenant prefix
}),
})
this.memories.set(key, memory)
return memory
}
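async clearMemory(tenantId: string, conversationId: string): Promise<void> {
const key = this.getMemoryKey(tenantId, conversationId)
// Clear the persisted history through the memory object, so the Redis key
// always matches the one RedisChatMessageHistory actually wrote
const memory = await this.getMemory(tenantId, conversationId)
await memory.clear()
// Clear from cache
this.memories.delete(key)
}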
async clearAllTenantMemories(tenantId: string): Promise<void> {
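// Find all memories for tenant. SCAN iterates incrementally, whereas KEYS
// blocks Redis while it walks the whole keyspace
const stream = this.redis.scanStream({
match: `memory:${tenantId}:*`,
count: 100,
})
for await (const keys of stream) {
if (keys.length > 0) {
await this.redis.del(...keys)
}
}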
// Clear from local cache
for (const key of this.memories.keys()) {
// Include the separator so tenant "abc" does not match keys of tenant "abcd"
if (key.startsWith(`${tenantId}:`)) {
this.memories.delete(key)
}
}
}
}
Scaling from 0 to 10k Customers
Scaling a LangChain SaaS requires careful planning at each growth stage.
Stage 1: 0-100 Customers (MVP)
// @filename: server.js
// Simple architecture for early stage
export class MVPArchitecture {
// Single server setup
async initialize() {
const app = express()
// Basic middleware
app.use(express.json())
app.use(cors())
// Simple in-memory rate limiting
const rateLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // limit each IP to 100 requests per windowMs
})
app.use('/api/', rateLimiter)
// Single LangChain instance
const llm = new ChatOpenAI({
modelName: 'gpt-3.5-turbo',
temperature: 0.7,
})
// Single vector store
const vectorStore = await HNSWLib.fromTexts(
[''],
[{}],
new OpenAIEmbeddings()
)
// Basic API endpoints
app.post('/api/chat', async (req, res) => {
try {
const { message, tenantId } = req.body
// Simple tenant check
const tenant = await db.tenants.findUnique({ where: { id: tenantId } })
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' })
}
// Process with LangChain. generate() returns llmOutput with token usage,
// which the message returned by call() does not expose
const result = await llm.generate([[new HumanMessage(message)]])
const tokenUsage = result.llmOutput?.tokenUsage
// Track usage
await db.usage.create({
data: {
tenantId,
tokens: tokenUsage?.totalTokens || 0,
cost: calculateCost(tokenUsage),
},
})
res.json({ response: result.generations[0][0].text })
} catch (error) {
res.status(500).json({ error: error.message })
}
})
app.listen(3000)
}
}
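The MVP endpoint calls `calculateCost`, which is not defined in the snippet. A minimal sketch of such a helper is shown below, assuming separate per-1k prices for prompt and completion tokens. The model name and the prices in `PRICING` are placeholders for illustration only, not real provider rates.

```typescript
interface TokenUsage {
  promptTokens?: number
  completionTokens?: number
}

interface ModelPricing {
  promptPer1k: number
  completionPer1k: number
}

// Placeholder prices in USD per 1k tokens; replace with your provider's current rates.
const PRICING: Record<string, ModelPricing> = {
  'example-model': { promptPer1k: 0.001, completionPer1k: 0.002 },
}

function calculateCost(usage: TokenUsage | undefined, model: string = 'example-model'): number {
  if (!usage) return 0
  const pricing = PRICING[model]
  if (!pricing) {
    // Fail loudly: silently returning 0 for an unknown model would under-bill.
    throw new Error(`No pricing configured for model: ${model}`)
  }
  const promptCost = ((usage.promptTokens ?? 0) / 1000) * pricing.promptPer1k
  const completionCost = ((usage.completionTokens ?? 0) / 1000) * pricing.completionPer1k
  // Round to 6 decimal places to limit floating-point noise in stored values.
  return Math.round((promptCost + completionCost) * 1e6) / 1e6
}

// Usage: 1,500 prompt tokens and 500 completion tokens.
const cost = calculateCost({ promptTokens: 1500, completionTokens: 500 })
```

Keeping prices in one table makes it straightforward to add models or update rates without touching the request path.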
Stage 2: 100-1000 Customers (Growth)
// @filename: app.js
// Architecture for growth stage
export class GrowthArchitecture {
async initialize() {
// Load balancer with multiple app servers
const cluster = require('cluster')
const numCPUs = require('os').cpus().length
if (cluster.isPrimary) { // isMaster is deprecated since Node.js 16
// Fork workers
for (let i = 0; i < numCPUs; i++) {
cluster.fork()
}
cluster.on('exit', (worker, code, signal) => {
console.log(`Worker ${worker.process.pid} died`)
cluster.fork() // Replace dead workers
})
} else {
// Worker process
const app = express()
// Redis for distributed rate limiting
const redisClient = new Redis({
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT), // Environment variables are strings
password: process.env.REDIS_PASSWORD,
})
// Distributed rate limiter
const rateLimiter = new RateLimiterRedis({
storeClient: redisClient,
keyPrefix: 'rl',
points: 100,
duration: 900, // 15 minutes
})
// Connection pooling for databases
const pgPool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
})
// Shared vector store with connection pooling
const pinecone = new PineconeClient()
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENV,
})
// Queue for heavy operations
const bullQueue = new Bull('langchain-jobs', {
redis: {
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT), // Environment variables are strings
password: process.env.REDIS_PASSWORD,
},
})
// Process jobs in background
bullQueue.process(async (job) => {
const { type, data } = job.data
switch (type) {
case 'process_documents':
await processDocuments(data)
break
case 'generate_embeddings':
await generateEmbeddings(data)
break
}
})
app.listen(3000)
}
}
}
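The `switch` in the job processor silently ignores job types it does not recognize. A sketch of a stricter alternative uses a discriminated union, so the compiler flags any job type that lacks a handler. The payload shapes (`documentIds`, `texts`) and the `describeJob` function are assumptions for illustration; real handlers would call `processDocuments` and `generateEmbeddings`.

```typescript
// Hypothetical payload shapes for the two job types used above.
type Job =
  | { type: 'process_documents'; data: { documentIds: string[] } }
  | { type: 'generate_embeddings'; data: { texts: string[] } }

function describeJob(job: Job): string {
  switch (job.type) {
    case 'process_documents':
      return `processing ${job.data.documentIds.length} document(s)`
    case 'generate_embeddings':
      return `embedding ${job.data.texts.length} text(s)`
    default: {
      // Exhaustiveness check: adding a job type without a case fails to compile,
      // and an unexpected type at runtime fails the job instead of passing silently.
      const unreachable: never = job
      throw new Error(`Unknown job type: ${JSON.stringify(unreachable)}`)
    }
  }
}

// Usage
const summary = describeJob({
  type: 'process_documents',
  data: { documentIds: ['doc-1', 'doc-2'] },
})
```

Throwing on unknown types lets the queue's retry and failure tracking surface a bad deployment, rather than marking malformed jobs as completed.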
Stage 3: 1000-10k Customers (Scale)
// @filename: index.ts
// Enterprise-grade architecture
export class EnterpriseArchitecture {
async initialize() {
// Kubernetes deployment configuration
const k8sDeployment = {
apiVersion: 'apps/v1',
kind: 'Deployment',
metadata: {
name: 'langchain-saas-api',
labels: {
app: 'langchain-saas',
},
},
spec: {
replicas: 10, // Start with 10 replicas
selector: {
matchLabels: {
app: 'langchain-saas',
},
},
template: {
metadata: {
labels: {
app: 'langchain-saas',
},
},
spec: {
containers: [
{
name: 'api',
image: 'langchain-saas:latest',
ports: [
{
containerPort: 3000,
},
],
resources: {
requests: {
memory: '2Gi',
cpu: '1000m',
},
limits: {
memory: '4Gi',
cpu: '2000m',
},
},
env: [
{
name: 'NODE_ENV',
value: 'production',
},
{
name: 'DATABASE_URL',
valueFrom: {
secretKeyRef: {
name: 'database-secret',
key: 'url',
},
},
},
],
},
],
},
},
},
}
// Horizontal Pod Autoscaler
const hpa = {
apiVersion: 'autoscaling/v2',
kind: 'HorizontalPodAutoscaler',
metadata: {
name: 'langchain-saas-hpa',
},
spec: {
scaleTargetRef: {
apiVersion: 'apps/v1',
kind: 'Deployment',
name: 'langchain-saas-api',
},
minReplicas: 10,
maxReplicas: 100,
metrics: [
{
type: 'Resource',
resource: {
name: 'cpu',
target: {
type: 'Utilization',
averageUtilization: 70,
},
},
},
{
type: 'Resource',
resource: {
name: 'memory',
target: {
type: 'Utilization',
averageUtilization: 80,
},
},
},
],
},
}
// Multi-region database setup
const databaseConfig = {
primary: {
host: 'db-primary.us-east-1.rds.amazonaws.com',
database: 'langchain_saas',
max: 100,
idleTimeoutMillis: 30000,
},
replicas: [
{
host: 'db-replica-1.us-west-2.rds.amazonaws.com',
database: 'langchain_saas',
max: 50,
idleTimeoutMillis: 30000,
},
{
host: 'db-replica-2.eu-west-1.rds.amazonaws.com',
database: 'langchain_saas',
max: 50,
idleTimeoutMillis: 30000,
},
],
}
// Global CDN for static assets
const cdnConfig = {
provider: 'cloudflare',
zones: ['us', 'eu', 'asia'],
caching: {
'api/embeddings': 3600, // 1 hour
'api/documents': 86400, // 24 hours
},
}
}
}
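A primary-plus-replicas configuration only helps if queries are routed accordingly. The sketch below shows one simple policy, assumed for illustration: writes go to the primary and reads rotate across replicas. `ReadWriteRouter` and `DbNode` are hypothetical names; in practice each node would hold a connection pool rather than just a host name.

```typescript
interface DbNode {
  host: string
}

class ReadWriteRouter {
  private nextReplica = 0

  constructor(private primary: DbNode, private replicas: DbNode[]) {}

  // Writes, and reads that must observe their own writes, go to the primary.
  forWrite(): DbNode {
    return this.primary
  }

  // Reads rotate across replicas; fall back to the primary if none are configured.
  forRead(): DbNode {
    if (this.replicas.length === 0) return this.primary
    const node = this.replicas[this.nextReplica]
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length
    return node
  }
}

// Usage with the hosts from the configuration above.
const router = new ReadWriteRouter(
  { host: 'db-primary.us-east-1.rds.amazonaws.com' },
  [
    { host: 'db-replica-1.us-west-2.rds.amazonaws.com' },
    { host: 'db-replica-2.eu-west-1.rds.amazonaws.com' },
  ]
)
const readHosts = [router.forRead().host, router.forRead().host, router.forRead().host]
const writeHost = router.forWrite().host
```

Replicas lag behind the primary, so quota checks and billing reads that must be exact are usually safer on the primary, while analytics and history queries tolerate slightly stale data.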
Complete Example Application
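Here’s an example that ties the pieces together into a single application entry point. Helper functions such as validateApiKey, checkRateLimit, checkTokenQuota, and trackUsage are assumed to be implemented along the lines of the earlier sections: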
// @filename: server.ts
// Main application entry point
import express from 'express'
import { ChatOpenAI } from 'langchain/chat_models/openai'
import { ConversationalRetrievalQAChain } from 'langchain/chains'
import { PineconeStore } from 'langchain/vectorstores/pinecone'
import { PineconeClient } from '@pinecone-database/pinecone'
import { OpenAIEmbeddings } from 'langchain/embeddings/openai'
import { Document } from 'langchain/document'
import Bull from 'bull'
import Stripe from 'stripe'
import { createClient } from 'redis'
import { Pool } from 'pg'
// Initialize services
const app = express()
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)
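const redis = createClient({ url: process.env.REDIS_URL })
// node-redis v4+ clients must be connected explicitly before use
redis.connect().catch((err) => console.error('Redis connection error:', err))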
const pgPool = new Pool({ connectionString: process.env.DATABASE_URL })
const jobQueue = new Bull('langchain-jobs', process.env.REDIS_URL!)
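// Middleware
// Skip JSON parsing for the Stripe webhook: signature verification needs the raw body
app.use((req, res, next) =>
req.path === '/webhook/stripe' ? next() : express.json()(req, res, next)
)
app.use(express.urlencoded({ extended: true }))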
// Health check endpoint
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
version: process.env.APP_VERSION,
timestamp: new Date().toISOString(),
})
})
// Main chat endpoint
app.post('/api/v1/chat', async (req, res) => {
const tenantId = req.headers['x-tenant-id'] as string
const apiKey = req.headers['x-api-key'] as string
try {
// Validate API key and get tenant
const tenant = await validateApiKey(apiKey, tenantId)
if (!tenant) {
return res.status(401).json({ error: 'Invalid API key' })
}
// Check rate limits
const rateLimitOk = await checkRateLimit(tenant.id)
if (!rateLimitOk) {
return res.status(429).json({ error: 'Rate limit exceeded' })
}
// Check token quota
const quotaOk = await checkTokenQuota(tenant.id)
if (!quotaOk) {
return res.status(402).json({ error: 'Token quota exceeded' })
}
// Get or create conversation chain
const chain = await getConversationChain(tenant.id)
// Process the chat request
const { question, conversationId } = req.body
const startTime = Date.now()
// Get conversation memory
const memory = await getConversationMemory(tenant.id, conversationId)
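// Execute chain. Chain results do not include llmOutput, so token usage is
// accumulated from the underlying LLM calls through a per-request callback
const tokenUsage = { promptTokens: 0, completionTokens: 0, totalTokens: 0 }
const response = await chain.call({ question, chat_history: memory }, [
{
handleLLMEnd: (output) => {
const usage = output.llmOutput?.tokenUsage
tokenUsage.promptTokens += usage?.promptTokens ?? 0
tokenUsage.completionTokens += usage?.completionTokens ?? 0
tokenUsage.totalTokens += usage?.totalTokens ?? 0
},
},
])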
// Track usage
await trackUsage({
tenantId: tenant.id,
conversationId,
tokenUsage,
duration: Date.now() - startTime,
timestamp: new Date(),
})
// Update conversation memory
await updateConversationMemory(tenant.id, conversationId, {
question,
answer: response.text,
})
// Return response
res.json({
answer: response.text,
sources: response.sourceDocuments,
usage: {
promptTokens: tokenUsage.promptTokens,
completionTokens: tokenUsage.completionTokens,
totalTokens: tokenUsage.totalTokens,
},
conversationId,
})
} catch (error) {
console.error('Chat error:', error)
res.status(500).json({
error: 'Internal server error',
message:
process.env.NODE_ENV === 'development' ? error.message : undefined,
})
}
})
// Document upload endpoint
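app.post('/api/v1/documents', async (req, res) => {
const tenantId = req.headers['x-tenant-id'] as string
const apiKey = req.headers['x-api-key'] as string
try {
// Validate API key and get tenant; the tenant header alone is not proof of identity
const tenant = await validateApiKey(apiKey, tenantId)
if (!tenant) {
return res.status(401).json({ error: 'Invalid API key' })
}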
// Check document quota
const quotaOk = await checkDocumentQuota(tenant.id)
if (!quotaOk) {
return res.status(402).json({ error: 'Document quota exceeded' })
}
// Queue document processing job
const job = await jobQueue.add('process_document', {
tenantId: tenant.id,
documentUrl: req.body.documentUrl,
metadata: req.body.metadata,
})
res.json({
jobId: job.id,
status: 'processing',
message: 'Document queued for processing',
})
} catch (error) {
console.error('Document upload error:', error)
res.status(500).json({ error: 'Failed to upload document' })
}
})
// Usage analytics endpoint
app.get('/api/v1/usage', async (req, res) => {
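const tenantId = req.headers['x-tenant-id'] as string
const apiKey = req.headers['x-api-key'] as string
const { startDate, endDate } = req.query
try {
// Validate API key so one tenant cannot read another tenant's usage
const tenant = await validateApiKey(apiKey, tenantId)
if (!tenant) {
return res.status(401).json({ error: 'Invalid API key' })
}
const usage = await getUsageAnalytics(tenant.id, {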
startDate: new Date(startDate as string),
endDate: new Date(endDate as string),
})
res.json({
period: {
start: startDate,
end: endDate,
},
tokens: {
total: usage.totalTokens,
prompt: usage.promptTokens,
completion: usage.completionTokens,
},
requests: usage.requestCount,
documents: usage.documentCount,
cost: usage.totalCost,
dailyBreakdown: usage.dailyBreakdown,
})
} catch (error) {
console.error('Usage analytics error:', error)
res.status(500).json({ error: 'Failed to fetch usage data' })
}
})
// Billing webhook
app.post(
'/webhook/stripe',
express.raw({ type: 'application/json' }),
async (req, res) => {
const sig = req.headers['stripe-signature'] as string
try {
const event = stripe.webhooks.constructEvent(
req.body,
sig,
process.env.STRIPE_WEBHOOK_SECRET!
)
await handleStripeWebhook(event)
res.json({ received: true })
} catch (error) {
console.error('Stripe webhook error:', error)
res.status(400).json({ error: 'Webhook error' })
}
}
)
// Job processing
jobQueue.process('process_document', async (job) => {
const { tenantId, documentUrl, metadata } = job.data
try {
// Download document
const document = await downloadDocument(documentUrl)
// Split into chunks
const chunks = await splitDocument(document)
// Embeddings are generated by the tenant vector store when documents are added,
// so no separate OpenAIEmbeddings instance is needed here
// Get tenant vector store
const vectorStore = await getTenantVectorStore(tenantId)
// Add documents with tenant metadata
const documents = chunks.map(
(chunk) =>
new Document({
pageContent: chunk.text,
metadata: {
...metadata,
tenantId,
source: documentUrl,
chunkIndex: chunk.index,
processedAt: new Date().toISOString(),
},
})
)
await vectorStore.addDocuments(documents)
// Update document count
await incrementDocumentCount(tenantId, documents.length)
// Track completion
await job.progress(100)
} catch (error) {
console.error('Document processing error:', error)
throw error
}
})
// Helper functions
async function getConversationChain(tenantId: string) {
  const tenant = await getTenant(tenantId)
  const llm = new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: tenant.settings.model || 'gpt-3.5-turbo',
    // Use ?? rather than || so an explicit temperature of 0 is respected
    temperature: tenant.settings.temperature ?? 0.7,
    maxTokens: tenant.settings.maxTokens ?? 1000,
    callbacks: [
      {
        handleLLMEnd: async (output) => {
          // Track token usage in real-time
          await trackTokenUsage(tenantId, output.llmOutput?.tokenUsage)
        },
      },
    ],
  })
  const vectorStore = await getTenantVectorStore(tenantId)
  const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorStore.asRetriever(),
    {
      returnSourceDocuments: true,
      qaChainOptions: {
        type: 'stuff',
        prompt: tenant.settings.customPrompt || undefined,
      },
    }
  )
  return chain
}
async function getTenantVectorStore(tenantId: string) {
  const pinecone = new PineconeClient()
  await pinecone.init({
    apiKey: process.env.PINECONE_API_KEY!,
    environment: process.env.PINECONE_ENV!,
  })
  const index = pinecone.Index(process.env.PINECONE_INDEX!)
  return await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    }),
    {
      pineconeIndex: index,
      namespace: `tenant_${tenantId}`,
      filter: { tenantId },
    }
  )
}
// Start server
const PORT = process.env.PORT || 3000
const server = app.listen(PORT, () => {
  console.log(`LangChain SaaS API running on port ${PORT}`)
})
// Graceful shutdown
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, shutting down gracefully')
  // Stop accepting new connections and let in-flight requests finish
  await new Promise<void>((resolve) => server.close(() => resolve()))
  // Close job queue
  await jobQueue.close()
  // Close database connections
  await pgPool.end()
  // Close Redis connection
  await redis.quit()
  process.exit(0)
})
Deployment and Operations
For production deployment, consider these critical aspects:
Docker Configuration
# @filename: Dockerfile
# Multi-stage build for optimized image
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
COPY tsconfig.json ./
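# Install all dependencies (the build step needs devDependencies such as TypeScript)
RUN npm ci
# Copy source code
COPY src ./src
# Build TypeScript, then drop devDependencies before node_modules is copied out
RUN npm run build && npm prune --omit=dev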
# Production stage
FROM node:18-alpine
WORKDIR /app
# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init
# Create non-root user and add it to the matching group
RUN addgroup -g 1001 -S nodejs && adduser -S nodejs -u 1001 -G nodejs
# Copy built application
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 3000
# Use dumb-init to handle signals
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-saas-api
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: langchain-saas-api
  template:
    metadata:
      labels:
        app: langchain-saas-api
    spec:
      containers:
        - name: api
          image: langchain-saas:latest
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: NODE_ENV
              value: 'production'
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: '1Gi'
              cpu: '500m'
            limits:
              memory: '2Gi'
              cpu: '1000m'
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-saas-api
  namespace: production
spec:
  selector:
    app: langchain-saas-api
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langchain-saas-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langchain-saas-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Lessons Learned and Best Practices
After building and scaling multiple LangChain SaaS applications, here are key lessons learned:
1. Cost Management
- Monitor token usage religiously: Set up alerts for unusual spikes
- Implement smart caching: Cache embeddings and common queries
- Use appropriate models: Don’t use GPT-4 when GPT-3.5 suffices
- Batch operations: Process multiple requests together when possible
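The caching advice above can be sketched as a small in-memory cache for answers to common queries. This is a minimal sketch, not part of the example application: `TenantResponseCache` and its methods are hypothetical names, the cache has no size bound, and a multi-instance deployment would more likely back it with Redis. The one design point worth copying is that the key includes the tenant ID, so one tenant's cached answer is never served to another.

```typescript
// Hypothetical in-memory cache for chain answers; all names are illustrative.
type CacheEntry<V> = { value: V; expiresAt: number }

class TenantResponseCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>()
  private readonly ttlMs: number
  private readonly now: () => number

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs
    this.now = now
  }

  // Key on tenant AND normalized query so tenants never share answers.
  // JSON.stringify avoids ambiguous keys when IDs contain separators.
  private key(tenantId: string, query: string): string {
    const normalized = query.trim().toLowerCase().replace(/\s+/g, ' ')
    return JSON.stringify([tenantId, normalized])
  }

  get(tenantId: string, query: string): V | undefined {
    const k = this.key(tenantId, query)
    const entry = this.entries.get(k)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(k) // expired: evict and report a miss
      return undefined
    }
    return entry.value
  }

  set(tenantId: string, query: string, value: V): void {
    this.entries.set(this.key(tenantId, query), {
      value,
      expiresAt: this.now() + this.ttlMs,
    })
  }
}
```

A request handler would call `get` before invoking the chain and `set` after a successful answer. A short TTL limits how long stale answers survive after a tenant uploads new documents.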
2. Performance Optimization
- Vector store partitioning: Split large indices by date or category
- Connection pooling: Maintain pools for all external services
- Async processing: Use queues for non-real-time operations
- Edge caching: Cache static responses at CDN level
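One way to picture vector store partitioning by date is a naming scheme that extends the `tenant_${tenantId}` namespace used earlier with a calendar month. The sketch below is an assumption for illustration, not the application's actual scheme; `partitionedNamespace` and `namespacesForRange` are hypothetical helpers. It also assumes the vector store searches one namespace per query, in which case a date-range search has to fan out over every namespace the range touches.

```typescript
// Hypothetical scheme: one namespace per tenant per calendar month (UTC).
function partitionedNamespace(tenantId: string, processedAt: Date): string {
  const year = processedAt.getUTCFullYear()
  const month = String(processedAt.getUTCMonth() + 1).padStart(2, '0')
  return `tenant_${tenantId}_${year}-${month}`
}

// Namespaces covering an inclusive date range, oldest first.
function namespacesForRange(tenantId: string, from: Date, to: Date): string[] {
  const result: string[] = []
  // Start at the first day of the month containing `from`.
  const cursor = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1))
  while (cursor.getTime() <= to.getTime()) {
    result.push(partitionedNamespace(tenantId, cursor))
    cursor.setUTCMonth(cursor.getUTCMonth() + 1)
  }
  return result
}
```

The trade-off is that smaller partitions keep each search cheap, while queries spanning many months pay for one lookup per namespace and must merge the results.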
3. Security Best Practices
- API key rotation: Implement automatic key rotation
- Tenant data encryption: Encrypt sensitive data at rest
- Audit logging: Log all data access and modifications
- Input validation: Sanitize all user inputs before processing
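As an illustration of the input validation point, here is a minimal sketch of a check that could run before a chat message reaches the chain. The function name, the result type, and the 4,000-character limit are assumptions chosen for the example. It only enforces type, length, and control-character rules; it does not defend against prompt injection, which needs separate handling.

```typescript
type ValidationResult =
  | { ok: true; value: string }
  | { ok: false; error: string }

// Hypothetical limit; tune it to your model's context window and pricing.
const MAX_MESSAGE_CHARS = 4000

function validateChatMessage(input: unknown): ValidationResult {
  if (typeof input !== 'string') {
    return { ok: false, error: 'message must be a string' }
  }
  // Drop ASCII control characters except tab, newline, and carriage return.
  const cleaned = input
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '')
    .trim()
  if (cleaned.length === 0) {
    return { ok: false, error: 'message must not be empty' }
  }
  if (cleaned.length > MAX_MESSAGE_CHARS) {
    return {
      ok: false,
      error: `message exceeds ${MAX_MESSAGE_CHARS} characters`,
    }
  }
  return { ok: true, value: cleaned }
}
```

A handler would respond with HTTP 400 and the `error` text when `ok` is false, and pass `value` to the chain otherwise. Capping length here also bounds the prompt tokens a single request can consume.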
4. Operational Excellence
- Comprehensive monitoring: Track latency, error rates, token usage, and queue depth, broken down by tenant
- Automated testing: Test tenant isolation regularly
- Disaster recovery: Regular backups and recovery drills
- Documentation: Keep runbooks updated
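Testing tenant isolation can start with a check on retrieval results. The sketch below assumes each stored chunk carries a `tenantId` in its metadata, as the document-processing job does; `RetrievedDoc`, `findIsolationLeaks`, and `assertTenantIsolation` are hypothetical names introduced for the example.

```typescript
// Minimal shape of a retrieved chunk; mirrors the tenantId metadata
// written at ingestion time.
interface RetrievedDoc {
  pageContent: string
  metadata: { tenantId?: string; source?: string }
}

// Returns every document that does not belong to the expected tenant.
function findIsolationLeaks(
  expectedTenantId: string,
  docs: RetrievedDoc[]
): RetrievedDoc[] {
  return docs.filter((doc) => doc.metadata.tenantId !== expectedTenantId)
}

// Throws with the offending sources so a failing test is easy to debug.
function assertTenantIsolation(
  expectedTenantId: string,
  docs: RetrievedDoc[]
): void {
  const leaks = findIsolationLeaks(expectedTenantId, docs)
  if (leaks.length > 0) {
    const sources = leaks
      .map((doc) => doc.metadata.source ?? 'unknown')
      .join(', ')
    throw new Error(
      `Tenant isolation violated for ${expectedTenantId}: ` +
        `${leaks.length} foreign document(s) from ${sources}`
    )
  }
}
```

In an integration test, you might seed two tenants with distinct documents, run the same query as each tenant, and pass the retrieved documents to `assertTenantIsolation`. Documents with no `tenantId` at all are reported as leaks too, which catches ingestion paths that forgot to tag their output.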
5. Customer Success
- Usage dashboards: Provide real-time usage visibility
- Cost predictability: Offer usage alerts and projections
- API documentation: Maintain excellent API docs with examples
- Support integration: Build debugging tools for the support team
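Usage alerts and projections come down to simple arithmetic over daily totals. The following is a rough sketch under stated assumptions: the `DailyUsage` shape is a guess at what a daily breakdown might contain, the projection is a plain linear extrapolation of the average so far, and the 80% warning threshold is arbitrary.

```typescript
// Assumed shape of one day's usage; date as YYYY-MM-DD.
interface DailyUsage {
  date: string
  tokens: number
}

// Linear projection: average daily tokens so far, extended to the full period.
function projectPeriodTokens(daily: DailyUsage[], daysInPeriod: number): number {
  if (daily.length === 0) return 0
  const used = daily.reduce((sum, day) => sum + day.tokens, 0)
  return Math.round((used / daily.length) * daysInPeriod)
}

type AlertLevel = 'ok' | 'warning' | 'over_quota'

// Compare a projection against the plan quota; warnAt is a fraction of quota.
function quotaAlert(projected: number, quota: number, warnAt = 0.8): AlertLevel {
  if (projected >= quota) return 'over_quota'
  if (projected >= quota * warnAt) return 'warning'
  return 'ok'
}
```

For example, three days at 1,000, 2,000, and 3,000 tokens average 2,000 per day, so a 30-day period projects to 60,000 tokens. Against a 70,000-token quota that crosses the 80% line (56,000) and yields `'warning'`. A linear estimate is crude for bursty tenants, so treat it as a starting point rather than a forecast.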
Conclusion
Building a production-ready LangChain SaaS requires careful attention to architecture, scaling, and operational concerns. By following the patterns and practices outlined in this guide, you can build a system that scales efficiently from your first customer to thousands while maintaining reliability, security, and cost-effectiveness.
Remember that every SaaS is unique – adapt these patterns to your specific use case and requirements. Start simple, measure everything, and iterate based on real customer needs.
For a complete deployment guide with infrastructure as code, monitoring setup, and CI/CD pipelines, check out our LangChain SaaS Deployment Guide.
Happy building!
