Tailstream TimescaleDB Migration Plan
This document outlines the migration from a Redis-only architecture to a hybrid Redis + TimescaleDB approach for Tailstream's event streaming and replay capabilities.
Current State
Architecture: Pure Redis-based streaming
- Real-time events: Redis Streams (tailstream:stream:{orgId})
- Event retention: 5K events (non-premium) / 50K events (premium)
- Replay capability: ~5-50 minutes of history, depending on traffic
- Storage: all in Redis memory (~500 MB per active stream)
Limitations:
- Expensive memory requirements for longer retention
- Poor replay experience for high-traffic sites: at ~1,000 events/sec, the fixed event caps yield only 5-50 seconds of history
- No historical analytics beyond immediate retention window
- Scaling bottleneck at ~100+ concurrent organizations
Target State
Architecture: Hybrid Redis + TimescaleDB
- Real-time streaming: Redis (unchanged for <100ms latency)
- Historical storage: TimescaleDB for persistent event storage
- Replay capability: 3 hours (non-premium) / 7 days (premium)
- Analytics: Time-based queries with 10-100x better performance
- Cost optimization: 90% reduction in memory requirements
Data Flow
graph TD
A["HTTP POST /api/ingest/{streamId}"] --> B["IngestController"]
B --> C["ProcessLogEvent::dispatch()"]
C --> D["Redis Queue"]
D --> E["Queue Worker"]
E --> F["ProcessLogEvent::handle()"]
F --> G["Redis Streams<br/>(Real-time)"]
F --> H["TimescaleDB<br/>(Historical)"]
G --> I["Fanout Command"]
I --> J["Browser (Live)"]
H --> K["Replay Controller"]
K --> L["Browser (Historical)"]
subgraph "New Components"
H
K
L
end
subgraph "Existing (Unchanged)"
A
B
C
D
E
I
J
end
style H fill:#90EE90,stroke:#333,stroke-width:2px
style K fill:#90EE90,stroke:#333,stroke-width:2px
style L fill:#90EE90,stroke:#333,stroke-width:2px
Retention Strategy
Redis (Real-time buffer):
- Purpose: Ultra-low latency streaming to browsers
- Retention: 30-60 minutes of hot data
- Memory usage: ~50-100MB per active organization
TimescaleDB (Historical storage):
- Purpose: Replay, analytics, and long-term storage
- Retention: Time-based (3 hours / 7 days)
- Storage: Compressed, disk-based (~80% smaller than Redis)
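The split above implies a simple routing rule for replay requests: a request that stays inside the Redis hot window can be served from Redis; anything reaching further back must come from TimescaleDB. A minimal sketch of that decision (TypeScript for illustration; the function name and parameters are assumptions, not existing code):

```typescript
type ReplaySource = 'redis' | 'timescaledb';

// Decide which store serves a replay request, based on how far back it reaches.
// "redisRetentionMinutes" mirrors the 30-60 minute hot-buffer window above.
function replaySource(
  requestStartMs: number, // epoch ms of the earliest event requested
  nowMs: number, // current time, epoch ms
  redisRetentionMinutes: number,
): ReplaySource {
  const hotWindowStartMs = nowMs - redisRetentionMinutes * 60_000;
  // Fully inside the hot window: Redis. Older than that: TimescaleDB.
  return requestStartMs >= hotWindowStartMs ? 'redis' : 'timescaledb';
}
```

In practice the live path never consults this: it is only the replay controller that needs to choose a backing store.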
Migration Strategy
Phase 1: Dual Write (Zero Downtime)
Goal: Add TimescaleDB storage alongside existing Redis streams
Changes:
- Add TimescaleDB schema and models
- Modify the ProcessLogEvent job to write to both Redis AND TimescaleDB
- Existing real-time streaming continues unchanged
- New replay endpoints use TimescaleDB
Risk: Minimal - existing functionality unaffected
Phase 2: Enhanced Replay
Goal: Replace Redis-based replay with TimescaleDB queries
Changes:
- Update frontend to use new replay endpoints
- Implement time-based retention policies
- Add replay streaming controller with SSE
- Enhanced analytics capabilities
Risk: Low - fallback to Redis if issues
Phase 3: Memory Optimization
Goal: Reduce Redis memory usage for cost optimization
Changes:
- Reduce Redis stream retention to 30-60 minutes
- Update monitoring and alerting
- Cost analysis and scaling adjustments
Risk: Medium - requires careful Redis retention tuning
Implementation Details
Database Schema
TimescaleDB Hypertable:
CREATE TABLE log_events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
org_id VARCHAR(255) NOT NULL,
host VARCHAR(255),
method VARCHAR(10),
path TEXT,
status INTEGER,
response_time INTEGER,
response_size BIGINT,
user_agent TEXT,
referrer TEXT,
ip_address INET,
metadata JSONB,
PRIMARY KEY (id, created_at)
);
-- Convert to hypertable (time-partitioned)
SELECT create_hypertable('log_events', 'created_at');
-- Indexes for common queries
CREATE INDEX idx_log_events_org_time ON log_events (org_id, created_at DESC);
CREATE INDEX idx_log_events_status_time ON log_events (status, created_at DESC);
CREATE INDEX idx_log_events_host_time ON log_events (host, created_at DESC);
Retention Policies:
-- A hypertable supports only one retention policy, and policies apply
-- table-wide, not per row, so retention cannot vary by org here.
-- Set the policy to the longest window (premium, 7 days); the shorter
-- non-premium window is enforced by an application-level cleanup job.
SELECT add_retention_policy('log_events', INTERVAL '7 days');
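Because TimescaleDB retention policies apply to the whole hypertable rather than per row, the per-plan windows likely need an application-level sweep that deletes expired non-premium rows. A sketch of the cutoff arithmetic such a sweep could use (TypeScript for illustration; plan names and function name are hypothetical):

```typescript
type Plan = 'free' | 'premium';

// Retention windows from the plan above: 3 hours (non-premium) / 7 days (premium).
const RETENTION_MS: Record<Plan, number> = {
  free: 3 * 60 * 60 * 1000, // 3 hours
  premium: 7 * 24 * 60 * 60 * 1000, // 7 days
};

// Events with created_at older than this cutoff are eligible for deletion.
function retentionCutoffMs(plan: Plan, nowMs: number): number {
  return nowMs - RETENTION_MS[plan];
}
```

The sweep itself would be a scheduled job issuing a time-bounded DELETE per org, which TimescaleDB's time partitioning keeps cheap.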
Laravel Integration
LogEvent Model:
<?php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
class LogEvent extends Model
{
// The hypertable has no updated_at column; Eloquent should only manage created_at
const UPDATED_AT = null;
protected $fillable = [
'org_id', 'host', 'method', 'path', 'status', 'response_time',
'response_size', 'user_agent', 'referrer', 'ip_address', 'metadata'
];
protected $casts = [
'metadata' => 'array',
'created_at' => 'datetime',
'ip_address' => 'string',
];
public function scopeForOrg($query, $orgId)
{
return $query->where('org_id', $orgId);
}
public function scopeInTimeRange($query, $start, $end)
{
return $query->whereBetween('created_at', [$start, $end]);
}
public function scopeWithStatus($query, $statusCodes)
{
if (is_array($statusCodes)) {
return $query->whereIn('status', $statusCodes);
}
return $query->where('status', $statusCodes);
}
public function scopeOrderedByTime($query, $direction = 'asc')
{
return $query->orderBy('created_at', $direction);
}
}
Updated ProcessLogEvent Job:
<?php
namespace App\Jobs;
use App\Models\LogEvent;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
class ProcessLogEvent implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
public function __construct(
public string $orgId,
public array $logData,
) {}
public function handle()
{
// Existing Redis streaming (unchanged)
$this->writeToRedisStream();
// NEW: Also write to TimescaleDB for historical storage.
// A failure here is reported but must not break the real-time path.
try {
$this->writeToTimescaleDB();
} catch (\Throwable $e) {
report($e);
}
}
private function writeToTimescaleDB()
{
LogEvent::create([
'org_id' => $this->orgId,
'host' => $this->logData['host'] ?? null,
'method' => $this->logData['method'] ?? 'GET',
'path' => $this->logData['path'] ?? '/',
'status' => $this->logData['status'] ?? 200,
'response_time' => $this->logData['response_time'] ?? null,
'response_size' => $this->logData['response_size'] ?? null,
'user_agent' => $this->logData['user_agent'] ?? null,
'referrer' => $this->logData['referrer'] ?? null,
'ip_address' => $this->logData['ip_address'] ?? null,
'metadata' => $this->logData,
]);
}
private function writeToRedisStream()
{
// Existing Redis logic unchanged
// ...
}
}
New Replay Controller:
<?php
namespace App\Http\Controllers;
use App\Models\LogEvent;
use Carbon\Carbon;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Response;
class ReplayController extends Controller
{
public function stream(Request $request, string $orgId)
{
$this->authorize('viewStream', $orgId);
$start = Carbon::parse($request->input('start'));
$end = Carbon::parse($request->input('end'));
// Validate time range based on subscription
$this->validateTimeRange($orgId, $start, $end);
return Response::stream(function () use ($orgId, $start, $end) {
echo "data: " . json_encode(['type' => 'replay_start']) . "\n\n";
LogEvent::forOrg($orgId)
->inTimeRange($start, $end)
->orderedByTime()
->chunk(1000, function ($events) {
foreach ($events as $event) {
$data = [
'type' => 'log_event',
'timestamp' => $event->created_at->toISOString(),
'data' => $event->toArray()
];
echo "data: " . json_encode($data) . "\n\n";
if (ob_get_level()) {
ob_flush();
}
flush();
// Small delay to prevent overwhelming the browser
usleep(1000); // 1ms
}
});
echo "data: " . json_encode(['type' => 'replay_complete']) . "\n\n";
}, 200, [
'Content-Type' => 'text/event-stream',
'Cache-Control' => 'no-cache',
'Connection' => 'keep-alive',
'X-Accel-Buffering' => 'no', // Disable Nginx buffering
]);
}
private function validateTimeRange(string $orgId, Carbon $start, Carbon $end)
{
$maxRange = $this->getMaxRetentionForOrg($orgId);
$earliest = now()->sub($maxRange);
if ($start->lt($earliest)) {
throw new \InvalidArgumentException("Requested time range exceeds retention limit");
}
}
private function getMaxRetentionForOrg(string $orgId): \DateInterval
{
// Check org subscription level
// Non-premium: 3 hours, Premium: 7 days
return new \DateInterval('PT3H'); // Default: 3 hours
}
}
Environment Configuration
New Environment Variables:
# TimescaleDB Configuration
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=tailstream
DB_USERNAME=tailstream
DB_PASSWORD=secret
# Retention Policies
TAILSTREAM_REDIS_RETENTION_MINUTES=60 # Hot data in Redis
TAILSTREAM_NONPREMIUM_RETENTION_HOURS=3 # Non-premium replay limit
TAILSTREAM_PREMIUM_RETENTION_DAYS=7 # Premium replay limit
# Performance Tuning
TAILSTREAM_REPLAY_CHUNK_SIZE=1000 # Events per chunk in replay
TAILSTREAM_REPLAY_DELAY_MICROSECONDS=1000 # Delay between events (1ms)
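The per-event delay puts a hard floor under replay wall-clock time, independent of query speed. A quick sanity check on that floor (TypeScript for illustration; the function name is an assumption):

```typescript
// Lower bound on replay duration implied by the per-event delay alone:
// total delay = eventCount * delayMicroseconds, converted to seconds.
function minReplaySeconds(eventCount: number, delayMicroseconds: number): number {
  return (eventCount * delayMicroseconds) / 1_000_000;
}
```

For example, 1,000,000 events at the default 1 ms delay take at least 1,000 seconds (~17 minutes), which may argue for adaptive pacing or batching multiple events per SSE message on large time ranges.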
Updated Existing Variables:
# Reduce Redis memory usage
TAILSTREAM_STREAM_MAXLEN=50000 # Upper bound on stream length (safety cap)
TAILSTREAM_STREAM_MINID_MS=3600000 # 1 hour retention in Redis
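Redis stream entry IDs begin with an epoch-millisecond timestamp, so time-based trimming can be expressed as `XTRIM <key> MINID <threshold>`, where the threshold is "now minus retention". A sketch of that computation (TypeScript for illustration; the helper is an assumption, not existing code):

```typescript
// Build the MINID argument for XTRIM from a retention window.
// Stream IDs are "<epoch-ms>-<seq>", so trimming to "now - retention"
// drops every entry older than the window.
function minIdForRetention(nowMs: number, retentionMs: number): string {
  return `${nowMs - retentionMs}-0`;
}
```

The actual trim would then run `XTRIM tailstream:stream:{orgId} MINID <id>` from the existing Redis write path or a periodic job.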
Database Migrations
Migration: Create Log Events Table:
<?php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;
return new class extends Migration
{
public function up()
{
Schema::create('log_events', function (Blueprint $table) {
// Avoid $table->id(): auto-increment columns get their own primary key,
// which conflicts with the composite key TimescaleDB needs.
$table->bigInteger('id')->generatedAs();
$table->timestampTz('created_at');
$table->string('org_id');
$table->string('host')->nullable();
$table->string('method', 10)->default('GET');
$table->text('path');
$table->integer('status');
$table->integer('response_time')->nullable();
$table->bigInteger('response_size')->nullable();
$table->text('user_agent')->nullable();
$table->text('referrer')->nullable();
// ipAddress() maps to INET on PostgreSQL
$table->ipAddress('ip_address')->nullable();
$table->jsonb('metadata')->nullable();
// Composite primary key for TimescaleDB (must include the time column)
$table->primary(['id', 'created_at']);
});
// Convert to TimescaleDB hypertable
DB::statement("SELECT create_hypertable('log_events', 'created_at')");
// Create indexes for common queries
DB::statement('CREATE INDEX idx_log_events_org_time ON log_events (org_id, created_at DESC)');
DB::statement('CREATE INDEX idx_log_events_status_time ON log_events (status, created_at DESC)');
DB::statement('CREATE INDEX idx_log_events_host_time ON log_events (host, created_at DESC)');
}
public function down()
{
Schema::dropIfExists('log_events');
}
};
Frontend Changes
Updated Tailstream Scene:
// resources/js/components/TailstreamScene.tsx
interface TailstreamSceneProps {
orgId: string;
token: string;
mode: 'live' | 'replay';
replayStart?: string;
replayEnd?: string;
onReplayComplete?: () => void;
}
export function TailstreamScene({ orgId, token, mode, replayStart, replayEnd, onReplayComplete }: TailstreamSceneProps) {
useEffect(() => {
let eventSource: EventSource;
if (mode === 'live') {
// Existing live streaming (unchanged)
eventSource = new EventSource(`/stream/${orgId}?token=${token}`);
} else {
// NEW: Historical replay from TimescaleDB
const params = new URLSearchParams({
start: replayStart!,
end: replayEnd!,
token: token
});
eventSource = new EventSource(`/replay/${orgId}?${params}`);
}
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === 'log_event') {
addParticleToScene(data.data);
} else if (data.type === 'replay_complete') {
onReplayComplete?.();
}
};
return () => {
eventSource?.close();
};
}, [orgId, mode, replayStart, replayEnd]);
// ... rest of component unchanged
}
Benefits Analysis
Performance Improvements
- Query speed: 10-100x faster for time-range queries
- Memory usage: 90% reduction in Redis memory requirements
- Retention capability: minutes of history today → 3 hours (non-premium) / 7 days (premium)
- Analytics: Built-in time-series functions for dashboards
Cost Optimization
# Current Redis-only (100 organizations)
Redis Memory: 500 MB per org × 100 = 50 GB = ~$1,500/month
# Hybrid Redis + TimescaleDB
Redis Memory: 50 MB per org × 100 = 5 GB = ~$150/month
TimescaleDB Storage: 100 GB compressed = ~$50/month
Total: ~$200/month (87% reduction)
Scalability Improvements
- Concurrent orgs: 100 → 1000+ (limited by disk, not memory)
- Event throughput: No change (still 1000+ events/sec per worker)
- Query performance: Linear scaling with time-based partitioning
- Storage growth: Predictable and cost-effective
Risk Assessment
Low Risk Components
- Existing real-time streaming: Completely unchanged
- Redis operations: No modifications to current Redis usage
- Frontend live mode: No changes required
Medium Risk Components
- Dual writes: Additional latency (~1-2ms per event)
- Database connection pool: May need tuning for TimescaleDB
- Migration process: Requires careful coordination
High Risk Components
- Replay functionality: Complete replacement of existing replay
- Retention policies: Automated data deletion
- Query performance: Depends on proper indexing and partitioning
Mitigation Strategies
- Feature flags: Enable TimescaleDB features gradually
- Monitoring: Track dual-write latency and success rates
- Rollback plan: Ability to disable TimescaleDB writes instantly
- Load testing: Validate performance under realistic loads
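The feature-flag mitigation can be as simple as a boolean gate checked before the TimescaleDB write, making rollback a flag flip rather than a deploy. A minimal sketch (TypeScript for illustration; the flag name and in-memory store are assumptions — in the Laravel codebase this would more likely be a config value or a package such as Laravel Pennant):

```typescript
// Hypothetical flag store; in production this would be config- or DB-backed
// so the flag can change without a restart.
const featureFlags = new Map<string, boolean>([['timescaledb_writes', true]]);

// Guard checked in the dual-write path before the TimescaleDB insert.
// Defaults to "off" if the flag is missing, so the safe state is Redis-only.
function shouldWriteToTimescaleDB(): boolean {
  return featureFlags.get('timescaledb_writes') ?? false;
}
```

Crucially, the gate wraps only the new write: disabling it leaves the existing Redis streaming path completely untouched.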
Implementation Timeline
Week 1: Foundation
- [ ] Set up TimescaleDB development environment
- [ ] Create database schema and migrations
- [ ] Implement LogEvent model with basic queries
- [ ] Unit tests for model functionality
Week 2: Dual Write Integration
- [ ] Modify ProcessLogEvent job for dual writes
- [ ] Add feature flag for TimescaleDB writes
- [ ] Monitor dual-write performance and reliability
- [ ] Integration tests for dual-write functionality
Week 3: Replay Controller
- [ ] Implement replay streaming controller
- [ ] Add time range validation and authorization
- [ ] Create replay API endpoints
- [ ] Test replay performance with sample data
Week 4: Frontend Integration
- [ ] Update TailstreamScene for replay mode
- [ ] Add replay controls to UI
- [ ] Implement replay progress indicators
- [ ] End-to-end testing of replay functionality
Week 5: Production Deployment
- [ ] Deploy to staging environment
- [ ] Performance testing with production-like data
- [ ] Gradual rollout with feature flags
- [ ] Monitor performance and adjust retention policies
Week 6: Optimization & Cleanup
- [ ] Reduce Redis retention to optimize memory usage
- [ ] Performance tuning and index optimization
- [ ] Documentation and team training
- [ ] Full production rollout
Success Metrics
Performance Metrics
- Dual-write latency: < 2ms additional overhead
- Replay query speed: < 1 second for 3-hour range
- Memory reduction: 80%+ reduction in Redis usage
- Cost savings: 75%+ reduction in infrastructure costs
Functionality Metrics
- Replay availability: 99.9% uptime for replay functionality
- Data consistency: 100% consistency between Redis and TimescaleDB
- Retention accuracy: Automatic cleanup working within 1-hour precision
- User experience: No degradation in live streaming performance
Business Metrics
- Premium conversions: Increase due to better replay capabilities
- Customer satisfaction: Improved retention and analytics experience
- Operational efficiency: Reduced infrastructure management overhead
- Scalability: Support 10x more organizations without proportional cost increase
Conclusion
The migration to a hybrid Redis + TimescaleDB architecture provides:
- Immediate benefits: Better replay capabilities and cost optimization
- Future scalability: Foundation for advanced analytics and longer retention
- Risk mitigation: Gradual migration with fallback options
- Competitive advantage: Enterprise-grade features with startup efficiency
This migration positions Tailstream for sustainable growth while maintaining the real-time performance that differentiates the product in the market.