Tailstream TimescaleDB Migration Plan

This document outlines the migration from a Redis-only architecture to a hybrid Redis + TimescaleDB approach for Tailstream's event streaming and replay capabilities.

Current State

Architecture: Pure Redis-based streaming

  • Real-time events: Redis Streams (tailstream:stream:{orgId})
  • Event retention: 5K events (non-premium) / 50K events (premium)
  • Replay capability: Limited to roughly 5-50 minutes of history, depending on traffic
  • Storage: All in Redis memory (~500MB per active stream)

Limitations:

  • Expensive memory requirements for longer retention
  • Poor replay experience for high-traffic sites (only minutes of history)
  • No historical analytics beyond the immediate retention window
  • Scaling bottleneck at roughly 100 concurrent organizations

Target State

Architecture: Hybrid Redis + TimescaleDB

  • Real-time streaming: Redis (unchanged for <100ms latency)
  • Historical storage: TimescaleDB for persistent event storage
  • Replay capability: 3 hours (non-premium) / 7 days (premium)
  • Analytics: Time-range queries 10-100x faster than scanning Redis streams
  • Cost optimization: 90% reduction in memory requirements
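
The analytics gain is concrete: TimescaleDB ships time-series SQL functions such as time_bucket, so dashboard-style rollups become single queries. An illustrative sketch against the log_events schema defined later in this document ('acme' is a placeholder org ID):

```sql
-- Requests per minute and 5xx error count for one org over the last 3 hours
SELECT time_bucket('1 minute', created_at)   AS minute,
       count(*)                              AS requests,
       count(*) FILTER (WHERE status >= 500) AS errors
FROM log_events
WHERE org_id = 'acme'
  AND created_at > now() - INTERVAL '3 hours'
GROUP BY minute
ORDER BY minute;
```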

Data Flow

graph TD
    A["HTTP POST /api/ingest/{streamId}"] --> B["IngestController"]
    B --> C["ProcessLogEvent::dispatch()"]
    C --> D["Redis Queue"]
    D --> E["Queue Worker"]
    E --> F["ProcessLogEvent::handle()"]
    F --> G["Redis Streams<br/>(Real-time)"]
    F --> H["TimescaleDB<br/>(Historical)"]
    G --> I["Fanout Command"]
    I --> J["Browser (Live)"]
    H --> K["Replay Controller"]
    K --> L["Browser (Historical)"]
    
    subgraph "New Components"
        H
        K
        L
    end
    
    subgraph "Existing (Unchanged)"
        A
        B
        C
        D
        E
        I
        J
    end
    
    style H fill:#90EE90,stroke:#333,stroke-width:2px
    style K fill:#90EE90,stroke:#333,stroke-width:2px
    style L fill:#90EE90,stroke:#333,stroke-width:2px

Retention Strategy

Redis (Real-time buffer):

  • Purpose: Ultra-low latency streaming to browsers
  • Retention: 30-60 minutes of hot data
  • Memory usage: ~50-100MB per active organization

TimescaleDB (Historical storage):

  • Purpose: Replay, analytics, and long-term storage
  • Retention: Time-based (3 hours / 7 days)
  • Storage: Compressed, disk-based (~80% smaller than Redis)
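
The ~80% size reduction comes from TimescaleDB's native columnar compression, which is enabled per hypertable and applied to older chunks by a background policy. A sketch (the 1-hour compression delay is a tuning assumption, not a measured value):

```sql
-- Enable compression, segmenting by org so per-org queries stay selective
ALTER TABLE log_events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'org_id'
);

-- Compress chunks once they are older than 1 hour
SELECT add_compression_policy('log_events', INTERVAL '1 hour');
```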

Migration Strategy

Phase 1: Dual Write (Zero Downtime)

Goal: Add TimescaleDB storage alongside existing Redis streams

Changes:

  1. Add TimescaleDB schema and models
  2. Modify ProcessLogEvent job to write to both Redis AND TimescaleDB
  3. Existing real-time streaming continues unchanged
  4. New replay endpoints use TimescaleDB

Risk: Minimal - existing functionality unaffected

Phase 2: Enhanced Replay

Goal: Replace Redis-based replay with TimescaleDB queries

Changes:

  1. Update frontend to use new replay endpoints
  2. Implement time-based retention policies
  3. Add replay streaming controller with SSE
  4. Enhanced analytics capabilities

Risk: Low - can fall back to Redis-based replay if issues arise

Phase 3: Memory Optimization

Goal: Reduce Redis memory usage for cost optimization

Changes:

  1. Reduce Redis stream retention to 30-60 minutes
  2. Update monitoring and alerting
  3. Cost analysis and scaling adjustments

Risk: Medium - requires careful Redis retention tuning

Implementation Details

Database Schema

TimescaleDB Hypertable:

CREATE TABLE log_events (
    id BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    org_id VARCHAR(255) NOT NULL,
    host VARCHAR(255),
    method VARCHAR(10),
    path TEXT,
    status INTEGER,
    response_time INTEGER,
    response_size BIGINT,
    user_agent TEXT,
    referrer TEXT,
    ip_address INET,
    metadata JSONB,
    
    PRIMARY KEY (id, created_at)
);

-- Convert to hypertable (time-partitioned)
SELECT create_hypertable('log_events', 'created_at');

-- Indexes for common queries
CREATE INDEX idx_log_events_org_time ON log_events (org_id, created_at DESC);
CREATE INDEX idx_log_events_status_time ON log_events (status, created_at DESC);
CREATE INDEX idx_log_events_host_time ON log_events (host, created_at DESC);

Retention Policies:

-- A hypertable supports only one retention policy, so retain data for the
-- longest (premium) window and enforce the shorter 3-hour non-premium limit
-- at query time in the replay controller
SELECT add_retention_policy('log_events', INTERVAL '7 days', if_not_exists => true);

Laravel Integration

LogEvent Model:

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class LogEvent extends Model
{
    // The hypertable has a created_at column but no updated_at column,
    // so disable the updated_at timestamp to avoid failing inserts
    const UPDATED_AT = null;

    protected $fillable = [
        'org_id', 'host', 'method', 'path', 'status', 'response_time',
        'response_size', 'user_agent', 'referrer', 'ip_address', 'metadata'
    ];

    protected $casts = [
        'metadata' => 'array',
        'created_at' => 'datetime',
    ];

    public function scopeForOrg($query, $orgId)
    {
        return $query->where('org_id', $orgId);
    }

    public function scopeInTimeRange($query, $start, $end)
    {
        return $query->whereBetween('created_at', [$start, $end]);
    }

    public function scopeWithStatus($query, $statusCodes)
    {
        if (is_array($statusCodes)) {
            return $query->whereIn('status', $statusCodes);
        }
        return $query->where('status', $statusCodes);
    }

    public function scopeOrderedByTime($query, $direction = 'asc')
    {
        return $query->orderBy('created_at', $direction);
    }
}

Updated ProcessLogEvent Job:

<?php

namespace App\Jobs;

use App\Models\LogEvent;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessLogEvent implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        private string $orgId,
        private array $logData,
    ) {}

    public function handle(): void
    {
        // Existing Redis streaming (unchanged)
        $this->writeToRedisStream();

        // NEW: Also write to TimescaleDB for historical storage
        $this->writeToTimescaleDB();
    }

    private function writeToTimescaleDB()
    {
        LogEvent::create([
            'org_id' => $this->orgId,
            'host' => $this->logData['host'] ?? null,
            'method' => $this->logData['method'] ?? 'GET',
            'path' => $this->logData['path'] ?? '/',
            'status' => $this->logData['status'] ?? 200,
            'response_time' => $this->logData['response_time'] ?? null,
            'response_size' => $this->logData['response_size'] ?? null,
            'user_agent' => $this->logData['user_agent'] ?? null,
            'referrer' => $this->logData['referrer'] ?? null,
            'ip_address' => $this->logData['ip_address'] ?? null,
            'metadata' => $this->logData,
        ]);
    }

    private function writeToRedisStream()
    {
        // Existing Redis logic unchanged
        // ...
    }
}

New Replay Controller:

<?php

namespace App\Http\Controllers;

use App\Models\LogEvent;
use Carbon\Carbon;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Response;

class ReplayController extends Controller
{
    public function stream(Request $request, string $orgId)
    {
        $this->authorize('viewStream', $orgId);

        $start = Carbon::parse($request->input('start'));
        $end = Carbon::parse($request->input('end'));
        
        // Validate time range based on subscription
        $this->validateTimeRange($orgId, $start, $end);

        return Response::stream(function () use ($orgId, $start, $end) {
            echo "data: " . json_encode(['type' => 'replay_start']) . "\n\n";
            
            LogEvent::forOrg($orgId)
                ->inTimeRange($start, $end)
                ->orderedByTime()
                ->chunk(1000, function ($events) {
                    foreach ($events as $event) {
                        $data = [
                            'type' => 'log_event',
                            'timestamp' => $event->created_at->toISOString(),
                            'data' => $event->toArray()
                        ];
                        
                        echo "data: " . json_encode($data) . "\n\n";
                        
                        if (ob_get_level()) {
                            ob_flush();
                        }
                        flush();
                        
                        // Small delay to prevent overwhelming the browser
                        usleep(1000); // 1ms
                    }
                });
            
            echo "data: " . json_encode(['type' => 'replay_complete']) . "\n\n";
        }, 200, [
            'Content-Type' => 'text/event-stream',
            'Cache-Control' => 'no-cache',
            'Connection' => 'keep-alive',
            'X-Accel-Buffering' => 'no', // Disable Nginx buffering
        ]);
    }

    private function validateTimeRange(string $orgId, Carbon $start, Carbon $end)
    {
        $maxRange = $this->getMaxRetentionForOrg($orgId);
        $earliest = now()->sub($maxRange);
        
        if ($start->lt($earliest)) {
            throw new \InvalidArgumentException("Requested time range exceeds retention limit");
        }
    }

    private function getMaxRetentionForOrg(string $orgId): \DateInterval
    {
        // Check org subscription level
        // Non-premium: 3 hours, Premium: 7 days
        return new \DateInterval('PT3H'); // Default: 3 hours
    }
}
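
For reference, the forOrg()->inTimeRange()->orderedByTime() scope chain in the controller compiles to a plain time-range query, which is exactly the shape TimescaleDB partitions for. Illustrative SQL (the org ID and timestamps are placeholders):

```sql
SELECT *
FROM log_events
WHERE org_id = 'acme'
  AND created_at BETWEEN '2025-06-01 12:00+00' AND '2025-06-01 15:00+00'
ORDER BY created_at ASC;
```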

Environment Configuration

New Environment Variables:

# TimescaleDB Configuration
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=tailstream
DB_USERNAME=tailstream  
DB_PASSWORD=secret

# Retention Policies
TAILSTREAM_REDIS_RETENTION_MINUTES=60          # Hot data in Redis
TAILSTREAM_NONPREMIUM_RETENTION_HOURS=3        # Non-premium replay limit
TAILSTREAM_PREMIUM_RETENTION_DAYS=7            # Premium replay limit

# Performance Tuning
TAILSTREAM_REPLAY_CHUNK_SIZE=1000              # Events per chunk in replay
TAILSTREAM_REPLAY_DELAY_MICROSECONDS=1000      # Delay between events (1ms)

Updated Existing Variables:

# Reduce Redis memory usage
TAILSTREAM_STREAM_MAXLEN=50000      # Hard cap; time-based trimming below does most of the work
TAILSTREAM_STREAM_MINID_MS=3600000  # Trim entries older than 1 hour from Redis

Database Migrations

Migration: Create Log Events Table:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up()
    {
        Schema::create('log_events', function (Blueprint $table) {
            // id() would declare its own primary key, which conflicts with the
            // composite key below; use a Postgres identity column instead
            $table->bigInteger('id')->generatedAs();
            $table->timestampTz('created_at');
            $table->string('org_id');
            $table->string('host')->nullable();
            $table->string('method', 10)->default('GET');
            $table->text('path');
            $table->integer('status');
            $table->integer('response_time')->nullable();
            $table->bigInteger('response_size')->nullable();
            $table->text('user_agent')->nullable();
            $table->text('referrer')->nullable();
            // ipAddress() maps to the Postgres inet type (Laravel 10+)
            $table->ipAddress('ip_address')->nullable();
            $table->jsonb('metadata')->nullable();

            // Composite primary key required by TimescaleDB time partitioning
            $table->primary(['id', 'created_at']);
        });

        // Convert to TimescaleDB hypertable
        DB::statement("SELECT create_hypertable('log_events', 'created_at')");
        
        // Create indexes for common queries
        DB::statement('CREATE INDEX idx_log_events_org_time ON log_events (org_id, created_at DESC)');
        DB::statement('CREATE INDEX idx_log_events_status_time ON log_events (status, created_at DESC)');
        DB::statement('CREATE INDEX idx_log_events_host_time ON log_events (host, created_at DESC)');
    }

    public function down()
    {
        Schema::dropIfExists('log_events');
    }
};

Frontend Changes

Updated Tailstream Scene:

// resources/js/components/TailstreamScene.tsx
// (token, addParticleToScene, and onReplayComplete come from the surrounding
// component scope; their definitions are omitted here for brevity)

import { useEffect } from 'react';

interface TailstreamSceneProps {
  orgId: string;
  mode: 'live' | 'replay';
  replayStart?: string;
  replayEnd?: string;
}

export function TailstreamScene({ orgId, mode, replayStart, replayEnd }: TailstreamSceneProps) {
  useEffect(() => {
    // May stay undefined if mode is unrecognized, so type it accordingly
    let eventSource: EventSource | undefined;

    if (mode === 'live') {
      // Existing live streaming (unchanged)
      eventSource = new EventSource(`/stream/${orgId}?token=${token}`);
    } else if (mode === 'replay') {
      // NEW: Historical replay from TimescaleDB
      const params = new URLSearchParams({
        start: replayStart!,
        end: replayEnd!,
        token,
      });

      eventSource = new EventSource(`/replay/${orgId}?${params}`);
    }

    if (eventSource) {
      eventSource.onmessage = (event) => {
        const data = JSON.parse(event.data);

        if (data.type === 'log_event') {
          addParticleToScene(data.data);
        } else if (data.type === 'replay_complete') {
          onReplayComplete?.();
        }
      };
    }

    return () => {
      eventSource?.close();
    };
  }, [orgId, mode, replayStart, replayEnd]);
  
  // ... rest of component unchanged
}

Benefits Analysis

Performance Improvements

  • Query speed: 10-100x faster for time-range queries
  • Memory usage: 90% reduction in Redis memory requirements
  • Retention capability: from minutes of history to 3 hours (non-premium) / 7 days (premium)
  • Analytics: Built-in time-series functions for dashboards

Cost Optimization

# Current Redis-only (100 organizations)
Redis Memory: 500 MB per org × 100 = 50 GB = ~$1,500/month

# Hybrid Redis + TimescaleDB  
Redis Memory: 50 MB per org × 100 = 5 GB = ~$150/month
TimescaleDB Storage: 100 GB compressed = ~$50/month
Total: ~$200/month (87% reduction)

Scalability Improvements

  • Concurrent orgs: 100 → 1000+ (limited by disk, not memory)
  • Event throughput: No change (still 1000+ events/sec per worker)
  • Query performance: Linear scaling with time-based partitioning
  • Storage growth: Predictable and cost-effective
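
The linear-scaling claim rests on chunk exclusion: because the hypertable is partitioned on created_at, any query with a time predicate touches only the chunks that overlap the requested range. This can be confirmed with EXPLAIN (a sketch):

```sql
-- The plan should list only the recent chunk(s), not the whole table
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM log_events
WHERE created_at > now() - INTERVAL '1 hour';
```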

Risk Assessment

Low Risk Components

  • Existing real-time streaming: Completely unchanged
  • Redis operations: No modifications to current Redis usage
  • Frontend live mode: No changes required

Medium Risk Components

  • Dual writes: Additional latency (~1-2ms per event)
  • Database connection pool: May need tuning for TimescaleDB
  • Migration process: Requires careful coordination

High Risk Components

  • Replay functionality: Complete replacement of existing replay
  • Retention policies: Automated data deletion
  • Query performance: Depends on proper indexing and partitioning

Mitigation Strategies

  1. Feature flags: Enable TimescaleDB features gradually
  2. Monitoring: Track dual-write latency and success rates
  3. Rollback plan: Ability to disable TimescaleDB writes instantly
  4. Load testing: Validate performance under realistic loads

Implementation Timeline

Week 1: Foundation

  • [ ] Set up TimescaleDB development environment
  • [ ] Create database schema and migrations
  • [ ] Implement LogEvent model with basic queries
  • [ ] Unit tests for model functionality

Week 2: Dual Write Integration

  • [ ] Modify ProcessLogEvent job for dual writes
  • [ ] Add feature flag for TimescaleDB writes
  • [ ] Monitor dual-write performance and reliability
  • [ ] Integration tests for dual-write functionality

Week 3: Replay Controller

  • [ ] Implement replay streaming controller
  • [ ] Add time range validation and authorization
  • [ ] Create replay API endpoints
  • [ ] Test replay performance with sample data

Week 4: Frontend Integration

  • [ ] Update TailstreamScene for replay mode
  • [ ] Add replay controls to UI
  • [ ] Implement replay progress indicators
  • [ ] End-to-end testing of replay functionality

Week 5: Production Deployment

  • [ ] Deploy to staging environment
  • [ ] Performance testing with production-like data
  • [ ] Gradual rollout with feature flags
  • [ ] Monitor performance and adjust retention policies

Week 6: Optimization & Cleanup

  • [ ] Reduce Redis retention to optimize memory usage
  • [ ] Performance tuning and index optimization
  • [ ] Documentation and team training
  • [ ] Full production rollout

Success Metrics

Performance Metrics

  • Dual-write latency: < 2ms additional overhead
  • Replay query speed: < 1 second for 3-hour range
  • Memory reduction: 80%+ reduction in Redis usage
  • Cost savings: 75%+ reduction in infrastructure costs

Functionality Metrics

  • Replay availability: 99.9% uptime for replay functionality
  • Data consistency: 99.99%+ dual-write success rate between Redis and TimescaleDB
  • Retention accuracy: Automatic cleanup working within 1-hour precision
  • User experience: No degradation in live streaming performance
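
Retention accuracy can be monitored rather than assumed: TimescaleDB exposes its background jobs through information views. A sketch using the TimescaleDB 2.x catalog:

```sql
-- Check the retention job's schedule and last outcome
SELECT j.job_id, j.schedule_interval, s.last_run_status, s.last_successful_finish
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats s USING (job_id)
WHERE j.proc_name = 'policy_retention';
```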

Business Metrics

  • Premium conversions: Increase due to better replay capabilities
  • Customer satisfaction: Improved retention and analytics experience
  • Operational efficiency: Reduced infrastructure management overhead
  • Scalability: Support 10x more organizations without proportional cost increase

Conclusion

The migration to a hybrid Redis + TimescaleDB architecture provides:

  1. Immediate benefits: Better replay capabilities and cost optimization
  2. Future scalability: Foundation for advanced analytics and longer retention
  3. Risk mitigation: Gradual migration with fallback options
  4. Competitive advantage: Enterprise-grade features with startup efficiency

This migration positions Tailstream for sustainable growth while maintaining the real-time performance that differentiates the product in the market.