Production Bug Fixes: Authentication, Templates & SQL
Production Bug Fixes: Authentication, Templates & SQL
Production debugging requires systematic investigation, patience, and understanding of how system components interact. This article details three critical bug categories I encountered while building an email server, including the investigation process, root causes, fixes, and lessons learned.
Table of Contents
- Bug #1: User Creation Crash - Non-Existent Email Column
- Bug #2: Admin Panel SQL Errors - Schema Drift
- Bug #3: Template Isolation Issues - Content Injection
- Debugging Techniques Used
- Testing Strategies to Prevent Regressions
- Production Lessons Learned
Bug #1: User Creation Crash - Non-Existent Email Column
The Symptom
When creating new users through the admin panel, the server crashed with a SQL error:
ERROR: column "email" does not exist
This prevented any user creation, making the authentication system unusable.
Root Cause Analysis
Commit 465cfa5 - The issue stemmed from a SQL schema mismatch between the database structure and the application code.
// @filename: main.go
// ❌ BEFORE: Trying to insert non-existent email column
result, err := a.db.ExecContext(ctx, `
INSERT INTO users (domain_id, username, password_hash, email, is_active, created_at, updated_at)
VALUES (?, ?, ?, ?, TRUE, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
`, domainID, strings.ToLower(username), passwordHash, email)
The users table schema used a composite key approach with username and domain_id, not a separate email column. The email address should be constructed from these two fields.
The Fix
Two changes were needed:
- Remove the non-existent email column from the INSERT statement
// @filename: main.go
// ✅ AFTER: Remove email column from INSERT
result, err := a.db.ExecContext(ctx, `
INSERT INTO users (domain_id, username, password_hash, is_active, created_at, updated_at)
VALUES (?, ?, ?, TRUE, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
`, domainID, strings.ToLower(username), passwordHash)
- Construct email in application code
// @filename: main.go
// ✅ AFTER: Construct email from username + domain
user := &User{
ID: userID,
DomainID: domainID,
Username: strings.ToLower(username),
Domain: domainName,
Email: fmt.Sprintf("%s@%s", strings.ToLower(username), domainName),
IsActive: true,
}
- Fix admin flag update to use user ID instead of username
// ✅ AFTER: Use user ID instead of username
s.db.ExecContext(r.Context(), "UPDATE users SET is_admin = TRUE WHERE id = ?", user.ID)
Database Migration Lessons
| Issue | Lesson |
|---|---|
| Schema documentation | Keep a canonical schema file that all code references |
| Column existence checks | Verify column existence before queries in production |
| Composite keys | Be explicit about when derived fields should exist in DB vs. be computed |
| Migration testing | Always test migrations against production-like data |
Bug #2: Admin Panel SQL Errors - Schema Drift
The Symptom
The admin panel’s user list and user edit pages failed with SQL errors, making user management impossible.
Root Cause Analysis
Commit 5c0eb66 - Multiple queries referenced the non-existent u.email column, causing systematic failures across the admin interface.
// @filename: query.sql
// ❌ BEFORE: Querying non-existent email column
rows, err := s.db.QueryContext(r.Context(), `
SELECT u.id, u.username, d.name as domain, u.email, u.is_admin, u.created_at
FROM users u
JOIN domains d ON u.domain_id = d.id
ORDER BY u.email
`)
This was schema drift: the database schema evolved but the queries weren’t updated accordingly.
The Fix
1. Remove email from SELECT clause and construct in code
// @filename: query.sql
// ✅ AFTER: Remove email from query
rows, err := s.db.QueryContext(r.Context(), `
SELECT u.id, u.username, d.name as domain, u.is_admin, u.created_at
FROM users u
JOIN domains d ON u.domain_id = d.id
ORDER BY d.name, u.username
`)
// Construct email in application layer
for rows.Next() {
var u User
if err := rows.Scan(&u.ID, &u.Username, &u.Domain, &u.IsAdmin, &u.CreatedAt); err != nil {
continue
}
u.Email = u.Username + "@" + u.Domain
users = append(users, u)
}
2. Fix user edit handler query
// @filename: query.sql
// ❌ BEFORE: Querying non-existent email column
err := s.db.QueryRowContext(r.Context(),
"SELECT username, email, is_admin FROM users WHERE id = ?", userID).
Scan(&username, &email, &isAdmin)
// ✅ AFTER: Join with domains table
var username, domain string
var isAdmin bool
err := s.db.QueryRowContext(r.Context(),
`SELECT u.username, d.name, u.is_admin FROM users u
JOIN domains d ON u.domain_id = d.id WHERE u.id = ?`, userID).
Scan(&username, &domain, &isAdmin)
// Construct email
Email: username + "@" + domain
Schema Drift Prevention
| Strategy | Implementation |
|---|---|
| Schema as code | Store schema definition in version control alongside application code |
| Query validation | Run integration tests that execute all queries against the actual schema |
| Database migrations | Use migration tools that track schema changes (Go: migrate, golang-migrate) |
| Schema comparison | Periodically compare production schema to source-of-truth |
Bug #3: Template Isolation Issues - Content Injection
The Symptom
When rendering different admin panel pages, the content from one page would appear on another page. This was a critical security vulnerability as it could lead to XSS attacks and data exposure.
Root Cause Analysis
The issue stemmed from Go template composition behavior. When using {'{'}define{'}'} blocks in page templates and composing them with a base template, Go’s template package would cache template definitions globally, causing cross-page contamination.
Initial approach (816bca5): Parse base template first, then each page into a fresh template.
// @filename: main.go
// ❌ ATTEMPT 1: Parse base then page
tmpl := template.New("base")
tmpl, err = tmpl.Parse(string(baseContent)) // Parse base first
pageContent, _ := templatesFS.ReadFile("templates/" + page)
tmpl, err = tmpl.Parse(string(pageContent)) // Then page
This failed because templates were still sharing a namespace.
Second attempt (89a6e4a): Parse page first (which defines {'{'}define "content"{'}'}), then base.
// @filename: main.go
// ❌ ATTEMPT 2: Parse page then base
tmpl := template.New(page)
pageContent, _ := templatesFS.ReadFile("templates/" + page)
tmpl, err = tmpl.Parse(string(pageContent)) // Parse page first
tmpl, err = tmpl.Parse(string(baseContent)) // Then base
This also failed because Go templates still cached definitions globally.
Final solution (b964975): String replacement instead of template composition.
The Three-Attempt Journey
Commit 816bca5 - First Attempt: Separate Template Namespaces
// @filename: main.go
// Parse each page template separately
for _, page := range pages {
tmpl := template.New("base")
tmpl, err = tmpl.Parse(string(baseContent))
pageContent, err := templatesFS.ReadFile("templates/" + page)
tmpl, err = tmpl.Parse(string(pageContent))
templates[page] = tmpl
}
Result: Failed - templates still contaminated each other
Commit 89a6e4a - Second Attempt: Parse Order Swap
// @filename: main.go
// Parse page first, then base
for _, page := range pages {
tmpl := template.New(page)
pageContent, _ := templatesFS.ReadFile("templates/" + page)
tmpl, err = tmpl.Parse(string(pageContent)) // Defines "content"
tmpl, err = tmpl.Parse(string(baseContent)) // Uses "content"
templates[page] = tmpl
}
Changed base.html from {{block "content"}} to {{template "content"}}
Result: Failed - Go template global caching still caused issues
Commit b964975 - Final Solution: String Replacement
// @filename: main.go
// ✅ SOLUTION: Use string replacement for complete isolation
baseStr := string(baseContent)
for _, page := range pages {
pageContent, _ := templatesFS.ReadFile("templates/" + page)
// Replace placeholder in base with page content
combined := strings.Replace(baseStr, "<!-- CONTENT_PLACEHOLDER -->", string(pageContent), 1)
// Parse combined template
tmpl, err := template.New(page).Parse(combined)
templates[page] = tmpl
}
Updated base.html:
<!-- ✅ Use HTML comment as placeholder -->
<main class="container">
<!-- CONTENT_PLACEHOLDER -->
</main>
Removed {define} blocks from all page templates:
<!-- ✅ Page template: Direct content, no define block (using {'{'}define{'}'} syntax) -->
<div class="page-header">
<h1>Users</h1>
<a href="/admin/users/add" class="btn btn-primary">Add User</a>
</div>
<!-- ... rest of content ... -->
Security Implications: XSS Prevention
The template isolation issue had serious security implications:
| Risk | Impact | Prevention |
|---|---|---|
| Content injection | User data from one page appears on another | String replacement ensures each template is self-contained |
| XSS vulnerabilities | Malicious scripts could execute | Go’s html/template still auto-escapes HTML, preventing XSS |
| Data leakage | Users could see admin content from other pages | Isolated templates prevent cross-page data exposure |
Why string replacement is more secure:
- Complete isolation - Each rendered page has exactly the content intended
- No shared template namespace - No risk of contamination between pages
- Predictable behavior - No hidden caching mechanisms causing issues
- Explicit content - Content is directly inserted, not defined then referenced
Debugging Techniques Used
1. Log Analysis
The first clue came from error logs:
// Check logs for SQL errors
s.logger.ErrorContext(r.Context(), "Failed to create user", err)
Action: Extracted the exact SQL error message: column "email" does not exist
2. Schema Verification
After identifying the column error, I verified the actual database schema:
-- Inspect actual table structure
\d users
Result: Confirmed users table has id, username, domain_id, password_hash, but no email column
3. Code Review
Searched for all references to the problematic column:
# @filename: script.sh
# Find all SQL queries using 'email' column
grep -r "u.email" internal/
grep -r "\.email" internal/admin/
Result: Found 7 locations where u.email was referenced
4. Git Bisect
For the template issue, used git bisect to find when the problem started:
# @filename: script.sh
git bisect start
git bisect bad HEAD # Current version has bug
git bisect good <working-commit>
git bisect run ./test-template-isolation.sh
5. Minimal Reproduction
Created a minimal test case to reproduce the template isolation issue:
// @filename: handlers.go
func TestTemplateIsolation(t *testing.T) {
base := `<html><body>{{define "content"}}{{end}}</body></html>`
page1 := `{{define "content"}}Page 1{{end}}`
page2 := `{{define "content"}}Page 2{{end}}`
// Test if rendering page1 shows page2 content
tmpl, _ := template.New("test").Parse(base)
tmpl.Parse(page1)
tmpl.Parse(page2)
var buf1 bytes.Buffer
tmpl.ExecuteTemplate(&buf1, "content", nil)
// This revealed the contamination issue
}
6. Step-by-Step Debugging
For complex template issues, added debug logging at each step:
// @filename: main.go
s.logger.Debug("Parsed base template", "content", string(baseContent))
s.logger.Debug("Parsed page template", "page", page, "content", string(pageContent))
s.logger.Debug("Combined template", "combined", string(combined))
Testing Strategies to Prevent Regressions
1. Integration Tests with Real Database
// @filename: handlers.go
func TestCreateUser_WithRealSchema(t *testing.T) {
// Set up test database with actual schema
db := setupTestDB(t)
defer db.Close()
auth := NewAuthenticator(db, logger)
// Test user creation
user, err := auth.CreateUser(context.Background(), "testuser", "password123", 1)
if err != nil {
t.Fatalf("CreateUser failed: %v", err)
}
// Verify user was created correctly
if user.Email != "testuser@example.com" {
t.Errorf("Email mismatch: got %s, want testuser@example.com", user.Email)
}
}
2. Query Validation Tests
// @filename: query.sql
func TestAllAdminQueries_RunAgainstSchema(t *testing.T) {
db := setupTestDB(t)
queries := []string{
"SELECT u.id, u.username, d.name as domain, u.is_admin FROM users u JOIN domains d ON u.domain_id = d.id",
"SELECT u.username, d.name, u.is_admin FROM users u JOIN domains d ON u.domain_id = d.id WHERE u.id = ?",
}
for _, query := range queries {
_, err := db.Exec(query)
if err != nil {
t.Errorf("Query failed: %s\nError: %v", query, err)
}
}
}
3. Template Isolation Tests
// @filename: handlers.go
func TestTemplateIsolation_NoCrossContamination(t *testing.T) {
pages := []string{"users.html", "domains.html", "dashboard.html"}
for _, page := range pages {
tmpl := templates[page]
// Render with test data
var buf bytes.Buffer
err := tmpl.Execute(&buf, map[string]interface{}{"Title": "Test"})
if err != nil {
t.Fatalf("Failed to render %s: %v", page, err)
}
// Verify output doesn't contain content from other pages
rendered := buf.String()
for _, otherPage := range pages {
if otherPage != page {
// Check for unique content markers
if strings.Contains(rendered, "unique-to-"+otherPage) {
t.Errorf("%s contains content from %s", page, otherPage)
}
}
}
}
}
4. Security Tests for Template Output
// @filename: handlers.go
func TestTemplateOutput_SafeFromXSS(t *testing.T) {
// Test with malicious input
maliciousInput := `<script>alert('XSS')</script>`
tmpl := templates["user_edit.html"]
var buf bytes.Buffer
err := tmpl.Execute(&buf, map[string]interface{}{
"Username": maliciousInput,
})
if err != nil {
t.Fatalf("Template execution failed: %v", err)
}
// Verify script tags are escaped
if strings.Contains(buf.String(), `<script>`) {
t.Errorf("Template did not escape HTML: %s", buf.String())
}
// Should contain escaped version
if !strings.Contains(buf.String(), `<script>`) {
t.Errorf("Expected escaped HTML in output")
}
}
5. Schema Migration Tests
// @filename: query.sql
func TestSchemaMigration_VerifyTableStructure(t *testing.T) {
db := setupTestDB(t)
// Get table structure
rows, _ := db.Query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'users'
`)
expectedColumns := map[string]string{
"id": "integer",
"username": "text",
"domain_id": "integer",
"password_hash": "text",
"is_active": "boolean",
}
// Verify each expected column exists
for rows.Next() {
var name, dataType string
rows.Scan(&name, &dataType)
if expectedType, ok := expectedColumns[name]; ok {
if dataType != expectedType {
t.Errorf("Column %s: got type %s, want %s", name, dataType, expectedType)
}
delete(expectedColumns, name)
}
}
// Verify no missing columns
if len(expectedColumns) > 0 {
t.Errorf("Missing columns: %v", expectedColumns)
}
// Verify email column does NOT exist
var emailExists bool
db.QueryRow(`
SELECT EXISTS(
SELECT 1 FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'email'
)
`).Scan(&emailExists)
if emailExists {
t.Error("email column should not exist in users table")
}
}
6. End-to-End Tests
// @filename: query.sql
func TestUserCreation_E2E(t *testing.T) {
// Start test server
server := setupTestServer(t)
defer server.Close()
// Create user via admin API
resp, err := http.PostForm(server.URL+"/admin/users/add", url.Values{
"username": {"testuser"},
"password": {"password123"},
"domain_id": {"1"},
"is_admin": {"true"},
})
if err != nil {
t.Fatalf("Failed to create user: %v", err)
}
if resp.StatusCode != http.StatusSeeOther {
t.Errorf("Expected redirect, got %d", resp.StatusCode)
}
// Verify user exists in database
var count int
server.DB.QueryRow("SELECT COUNT(*) FROM users WHERE username = ?", "testuser").Scan(&count)
if count != 1 {
t.Errorf("Expected 1 user, got %d", count)
}
}
Production Lessons Learned
1. Database Schema as Single Source of Truth
Lesson: Keep schema definition in version control and make all code reference it.
Implementation:
// @filename: schema.sql
// schema/schema.go - Single source of truth
const UsersTableSchema = `
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
domain_id INTEGER NOT NULL REFERENCES domains(id),
username TEXT NOT NULL,
password_hash TEXT NOT NULL,
is_active BOOLEAN DEFAULT TRUE,
is_admin BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(domain_id, username)
);
`
// Verify schema matches expectations
func VerifySchema(db *sql.DB) error {
var emailColumnExists bool
err := db.QueryRow(`
SELECT EXISTS(
SELECT 1 FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'email'
)
`).Scan(&emailColumnExists)
if err != nil {
return err
}
if emailColumnExists {
return fmt.Errorf("unexpected 'email' column in users table")
}
return nil
}
2. Defensive Programming in Database Operations
Lesson: Always validate database operations and handle errors gracefully.
// @filename: query.sql
func (a *Authenticator) CreateUser(ctx context.Context, username, password string, domainID int64) (*User, error) {
// Validate inputs
if username == "" || password == "" {
return nil, fmt.Errorf("username and password required")
}
// Verify domain exists
var domainName string
err := a.db.QueryRowContext(ctx, "SELECT name FROM domains WHERE id = ?", domainID).
Scan(&domainName)
if err == sql.ErrNoRows {
return nil, ErrDomainNotFound
}
if err != nil {
return nil, fmt.Errorf("failed to verify domain: %w", err)
}
// Prepare password hash
passwordHash, err := bcrypt.GenerateFromPassword([]byte(password), bcrypt.DefaultCost)
if err != nil {
return nil, fmt.Errorf("failed to hash password: %w", err)
}
// Insert user - use transaction
tx, err := a.db.BeginTx(ctx, nil)
if err != nil {
return nil, fmt.Errorf("failed to begin transaction: %w", err)
}
defer tx.Rollback()
result, err := tx.ExecContext(ctx, `
INSERT INTO users (domain_id, username, password_hash, is_active, created_at, updated_at)
VALUES (?, ?, ?, TRUE, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
`, domainID, strings.ToLower(username), passwordHash)
if err != nil {
return nil, fmt.Errorf("failed to create user: %w", err)
}
// Get inserted user ID
userID, _ := result.LastInsertId()
// Commit transaction
if err := tx.Commit(); err != nil {
return nil, fmt.Errorf("failed to commit: %w", err)
}
// Construct user object
return &User{
ID: userID,
DomainID: domainID,
Username: strings.ToLower(username),
Domain: domainName,
Email: fmt.Sprintf("%s@%s", strings.ToLower(username), domainName),
IsActive: true,
}, nil
}
3. Template System Design Patterns
Lesson: Prefer simple, explicit approaches over complex template composition.
Anti-pattern to avoid:
// @filename: main.go
// ❌ DON'T: Complex template composition
baseTmpl := template.Must(template.ParseFS(fs, "base.html"))
pageTmpl := template.Must(baseTmpl.Clone())
pageTmpl.ParseFS(fs, "page.html")
pageTmpl.ExecuteTemplate(w, "content", data)
Recommended pattern:
// @filename: main.go
// ✅ DO: String replacement for complete isolation
baseContent, _ := fs.ReadFile("base.html")
pageContent, _ := fs.ReadFile("page.html")
combined := strings.Replace(
string(baseContent),
"<!-- CONTENT_PLACEHOLDER -->",
string(pageContent),
1,
)
tmpl := template.Must(template.New("page").Parse(combined))
tmpl.Execute(w, data)
4. Comprehensive Error Logging
Lesson: Log enough context to debug production issues without exposing secrets.
// @filename: main.go
type Server struct {
logger *logging.Logger
}
func (s *Server) handleUserAdd(w http.ResponseWriter, r *http.Request) {
username := r.FormValue("username")
domainID := r.FormValue("domain_id")
// Log attempt with context (but NOT password)
s.logger.InfoContext(r.Context(), "User creation attempt",
"username", username,
"domain_id", domainID,
"remote_addr", r.RemoteAddr,
)
// Try to create user
user, err := s.authenticator.CreateUser(r.Context(), username, password, domainID)
if err != nil {
// Log full error for debugging
s.logger.ErrorContext(r.Context(), "Failed to create user", err,
"username", username,
"domain_id", domainID,
)
// Return user-friendly message
http.Error(w, "Failed to create user: "+err.Error(), http.StatusInternalServerError)
return
}
// Log success with user ID (not email/username for privacy)
s.logger.InfoContext(r.Context(), "User created successfully",
"user_id", user.ID,
)
}
5. Migration-First Development
Lesson: Always write database migrations before application code.
# @filename: script.sh
# Development workflow
1. Write migration file
2. Run migration on local database
3. Write application code using new schema
4. Write tests against migrated schema
5. Run tests
6. Commit migration and code together
Example migration:
-- migrations/002_remove_email_column.up.sql
ALTER TABLE users DROP COLUMN IF EXISTS email;
-- migrations/002_remove_email_column.down.sql
ALTER TABLE users ADD COLUMN email TEXT;
CREATE INDEX idx_users_email ON users(email);
6. Health Checks for Critical Components
Lesson: Implement health checks that verify database schema and critical queries.
// @filename: query.sql
func (s *Server) healthCheck(w http.ResponseWriter, r *http.Request) {
checks := struct {
Database bool `json:"database"`
Schema bool `json:"schema"`
Queries bool `json:"queries"`
}{}
// Check database connection
err := s.db.PingContext(r.Context())
checks.Database = (err == nil)
// Check schema
err = VerifySchema(s.db)
checks.Schema = (err == nil)
// Check critical queries
err = s.db.QueryRowContext(r.Context(),
"SELECT EXISTS(SELECT 1 FROM users LIMIT 1)").Scan(&new(bool))
checks.Queries = (err == nil)
// Return 503 if any check fails
if !(checks.Database && checks.Schema && checks.Queries) {
w.WriteHeader(http.StatusServiceUnavailable)
}
json.NewEncoder(w).Encode(checks)
}
7. Rollback Plans for Schema Changes
Lesson: Always have a rollback plan before deploying schema changes.
// @filename: main.go
type Migration struct {
Name string
Up func(*sql.DB) error
Down func(*sql.DB) error
CanRollback bool
}
var migrations = []Migration{
{
Name: "remove_email_column",
Up: func(db *sql.DB) error {
_, err := db.Exec("ALTER TABLE users DROP COLUMN IF EXISTS email")
return err
},
Down: func(db *sql.DB) error {
_, err := db.Exec("ALTER TABLE users ADD COLUMN email TEXT")
if err != nil {
return err
}
// Populate email from username + domain
_, err = db.Exec(`
UPDATE users u
SET email = u.username || '@' || d.name
FROM domains d
WHERE u.domain_id = d.id
`)
return err
},
CanRollback: true,
},
}
Summary: Key Takeaways
| Bug Category | Root Cause | Fix | Prevention |
|---|---|---|---|
| User Creation Crash | SQL schema mismatch - code referenced non-existent email column | Remove column from INSERT, construct email in code | Keep schema in version control, verify queries against schema |
| Admin Panel SQL Errors | Schema drift - queries not updated after schema evolution | Update all queries to match actual schema, construct derived fields in code | Schema-as-code, query validation tests, migration tests |
| Template Isolation | Go template global caching causing cross-contamination | String replacement instead of template composition | Unit tests for template isolation, security tests for XSS |
Production Debugging Checklist
- Extract exact error messages from logs
- Verify actual database schema vs. code assumptions
- Search codebase for all references to problematic fields
- Create minimal reproduction case
- Add debug logging at each step
- Test fix against production-like data
- Add integration tests to prevent regression
- Document schema assumptions in code
Testing Recommendations
- Schema verification tests - Run on every deployment
- Query validation tests - Verify all queries against actual schema
- Template isolation tests - Ensure no cross-page contamination
- Security tests - Verify XSS prevention in template output
- End-to-end tests - Test complete workflows
- Health check tests - Verify database and schema availability
These debugging experiences taught me that production bugs often stem from mismatches between assumptions and reality - whether it’s database schema, template behavior, or component interaction. The key is systematic investigation, comprehensive testing, and defensive programming to prevent similar issues in the future.
