AI Workload Forecasting
Translate enterprise and application demand into compute curves — plan capacity, routing, and spend with confidence while protecting service levels.
KPIs
Tail Latency (P99)
99th‑percentile end‑to‑end request latency at the client boundary; higher values increase SLA risk.
Higher is worse
Endpoint Availability
Share of requests served successfully within the window.
Higher is better
SLA Breach Rate
Fraction of requests violating the contracted latency/availability criteria.
Higher is worse
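The three service-level KPIs above can be computed from per-request telemetry in one pass. A minimal sketch, assuming per-request latencies and success flags are available for the window (the SLO threshold name is hypothetical):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (e.g. pct=99 for P99)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

def window_kpis(latencies_ms, successes, latency_slo_ms):
    """Tail latency, availability, and SLA breach rate for one window.

    latencies_ms:   per-request end-to-end latency at the client boundary
    successes:      per-request success flags, same order
    latency_slo_ms: contracted latency criterion (assumed single-threshold SLA)
    """
    n = len(latencies_ms)
    p99 = percentile(latencies_ms, 99)
    availability = sum(successes) / n
    # A request breaches if it failed or exceeded the latency criterion.
    breaches = sum(
        1 for lat, ok in zip(latencies_ms, successes)
        if not ok or lat > latency_slo_ms
    )
    return p99, availability, breaches / n
```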
Forecast Error (MAPE)
Mean Absolute Percentage Error of demand/compute forecasts over the evaluation window.
Higher is worse
Forecast Bias
Signed average forecast error as a percent; positive means systematic over‑forecasting.
Higher is worse
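MAPE and bias are computed from the same actual/forecast pairs; keeping both matters because errors of opposite sign cancel in the bias but not in MAPE. A sketch over one evaluation window:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def forecast_bias(actual, forecast):
    """Signed mean percentage error; positive means systematic over-forecasting."""
    return 100 * sum((f - a) / a for a, f in zip(actual, forecast)) / len(actual)
```

For example, forecasts of 110 and 180 against actuals of 100 and 200 give a MAPE of 10% but a bias of 0%: the over- and under-forecast cancel.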
Capacity Shortfall Rate
Share of windows where demand exceeds effective throughput capacity.
Higher is worse
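Capacity Shortfall Rate compares demand against effective capacity window by window. A sketch, assuming both series are in the same units (e.g. requests/s per window):

```python
def capacity_shortfall_rate(demand, capacity):
    """Share of windows where demand exceeds effective throughput capacity."""
    windows = list(zip(demand, capacity))
    return sum(1 for d, c in windows if d > c) / len(windows)
```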
Cost to Serve
Per-unit cost of delivering inference, expressed in finance-ready terms; higher values erode margin.
Higher is worse
Spend Variance to Budget
Percent variance of actual spend against budget for the period.
Higher is worse
Compute Efficiency Index
Composite 0–1 index rewarding healthy batching, caching, and right‑sized utilization.
Higher is better
Internal Factors
Request Backlog
Unserved requests queued at the router or endpoint.
Higher is worse
Capacity Headroom
Share of effective capacity remaining after meeting current demand (0–1).
Higher is better
Prewarmed Instance Pool
Instances kept warm to absorb bursts without cold‑start penalties.
Higher is better
Cache Hit Rate
Fraction of requests served from cache (prompt/result).
Higher is better
Batching Effectiveness Index
Normalized 0–1 index of achieved batching benefits under the latency budget.
Higher is better
Traffic Burstiness Index
Spikiness of arrivals relative to median demand, normalized to 0–1.
Higher is worse
Seasonality Amplitude Index
Normalized magnitude of recurring seasonal components.
Higher is worse
Arrival Dispersion Index
Over‑dispersion of arrivals vs. Poisson (Fano‑like index) normalized to 0–1.
Higher is worse
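The burstiness and dispersion indices can be estimated from arrival counts per fixed interval. The normalizations below are assumptions for illustration: burstiness as one minus the median-to-peak ratio, and dispersion as the Fano factor (variance-to-mean ratio, ~1 for Poisson arrivals) squashed into 0–1:

```python
from statistics import mean, median, pvariance

def traffic_burstiness_index(counts):
    """Spikiness vs. median demand (assumed form: 1 - median/peak).

    A flat series scores 0; a single dominant spike approaches 1.
    """
    return 1 - median(counts) / max(counts)

def arrival_dispersion_index(counts):
    """Fano-like over-dispersion of arrivals, squashed to 0-1.

    F/(F+1) is an assumed normalization: Poisson arrivals (F ~ 1)
    land near 0.5, heavy over-dispersion approaches 1.
    """
    fano = pvariance(counts) / mean(counts)
    return fano / (fano + 1)
```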
Utilization Ratio
Fraction of available compute time that is busy serving requests (0–1).
Levers
Reserved Capacity Share
Share of planned capacity sourced from reserved/committed contracts.
Autoscaling Target Utilization
Policy target for average utilization before scaling out.
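This lever typically feeds a proportional scale-out rule of the same form as Kubernetes' Horizontal Pod Autoscaler: replicas scale with the ratio of observed to target utilization, rounded up. A sketch:

```python
import math

def desired_replicas(current_replicas, observed_utilization, target_utilization):
    """Proportional scale-out driven by the target-utilization lever."""
    return math.ceil(current_replicas * observed_utilization / target_utilization)
```

Lowering the target buys headroom for bursts at the cost of running more instances at the same load.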
Batching Latency Budget
Maximum allowed batching delay per request before dispatch.
Cache TTL
Time‑to‑live for cache entries (prompt/result cache).
Routing Fallback Enabled
Whether the router is allowed to fail over to alternate providers/models.
Admission Control Enabled
Whether load‑shedding/admission policies apply under stress.
Max Batch Size
Upper bound on number of requests grouped into a single batch.
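Max Batch Size and the Batching Latency Budget usually interact in the dispatcher: a batch is released either when it fills up or when its oldest request has waited out the budget, whichever comes first. A minimal sketch of that interaction (class and method names are hypothetical):

```python
import time

class BatchDispatcher:
    """Release a batch at max_batch_size requests, or when the oldest
    queued request has exhausted the batching latency budget."""

    def __init__(self, max_batch_size, latency_budget_s):
        self.max_batch_size = max_batch_size
        self.latency_budget_s = latency_budget_s
        self.queue = []  # list of (enqueue_time, request)

    def submit(self, request, now=None):
        """Enqueue a request; return a batch if a trigger fired, else None."""
        now = time.monotonic() if now is None else now
        self.queue.append((now, request))
        return self.poll(now)

    def poll(self, now=None):
        """Return a batch to dispatch, or None if neither trigger fired."""
        now = time.monotonic() if now is None else now
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch_size
        expired = now - self.queue[0][0] >= self.latency_budget_s
        if full or expired:
            batch = [req for _, req in self.queue[:self.max_batch_size]]
            del self.queue[:self.max_batch_size]
            return batch
        return None
```

Raising the max batch size improves throughput only if the latency budget is long enough for batches to actually fill; otherwise the budget trigger dominates.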
Forecast Smoothing Window
Window size (minutes) used to smooth input signals before forecasting.
