Fine‑Tuning & Model Economics
Compare fine‑tuning, retrieval, and prompt engineering to choose the most cost‑effective customization path for your use case.
KPIs
Task Success Rate
Share of evaluation cases that meet the quality criterion.
Higher is better
Hallucination Rate
Fraction of responses flagged as unsupported or factually incorrect.
Higher is worse
Cost per Successful Request
Total variable cost divided by the number of requests that meet the quality threshold.
Higher is worse
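As defined above, this KPI allocates all variable spend across only the requests that pass quality checks. A minimal sketch, with illustrative input names (per-request costs plus pass/fail flags are assumed to be available):

```python
# Sketch: cost per successful request; names are illustrative.
def cost_per_successful_request(costs_usd, passed):
    """costs_usd: per-request variable cost; passed: per-request quality flags."""
    successes = sum(passed)
    if successes == 0:
        return float("inf")  # no request met the quality threshold
    return sum(costs_usd) / successes

# Example: $0.04 total spend, 3 of 4 requests pass quality checks.
print(cost_per_successful_request([0.01, 0.01, 0.01, 0.01],
                                  [True, True, True, False]))
```

Note that failed requests still contribute cost to the numerator, which is what makes this metric stricter than raw cost per request.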
Effective Cost per 1K Tokens
Blended cost per 1,000 tokens including retrieval and orchestration overheads.
Higher is worse
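The blending described above can be sketched as follows; the cost components shown (token spend, retrieval, orchestration) are assumptions about how spend is bucketed:

```python
# Sketch: blended cost per 1,000 tokens, folding retrieval and
# orchestration overheads into token spend (bucket names assumed).
def effective_cost_per_1k_tokens(token_cost_usd, retrieval_cost_usd,
                                 orchestration_cost_usd, total_tokens):
    blended = token_cost_usd + retrieval_cost_usd + orchestration_cost_usd
    return 1000.0 * blended / total_tokens

# $8 raw token spend + $1.50 retrieval + $0.50 orchestration over 2M tokens.
print(effective_cost_per_1k_tokens(8.0, 1.5, 0.5, 2_000_000))  # 0.005
```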
Latency P95
95th percentile end-to-end latency (prompting + retrieval + inference).
Higher is worse
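One way to compute the P95 over a window of end-to-end latencies is the nearest-rank method, sketched below; production pipelines may use an interpolating percentile instead:

```python
# Sketch: P95 via the nearest-rank method over end-to-end latencies (ms).
import math

def p95(latencies_ms):
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank definition
    return ordered[rank - 1]

print(p95(list(range(1, 101))))  # 95
```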
Compliance Conformance Rate
Fraction of requests processed within allowed regions and with policy-compliant handling of sensitive data.
Higher is better
Net Savings (USD)
Baseline cost minus current cost for the same volume and task mix.
Higher is better
Payback Period (days)
Days to recoup fine-tuning program costs via net savings.
Higher is worse
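Payback follows directly from the two cost KPIs above: net savings per day (baseline minus current cost) recoups the fine-tuning program spend. A minimal sketch, assuming daily cost figures are available:

```python
# Sketch: payback period in days; daily baseline/current costs assumed known.
def payback_days(program_cost_usd, daily_baseline_usd, daily_current_usd):
    daily_net_savings = daily_baseline_usd - daily_current_usd
    if daily_net_savings <= 0:
        return float("inf")  # never pays back at the current savings rate
    return program_cost_usd / daily_net_savings

# $30k program cost, saving $1.5k/day versus baseline.
print(payback_days(30_000.0, 4_000.0, 2_500.0))  # 20.0
```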
Customization Fit Index
Composite 0–1 index combining quality, cost, latency, and compliance for the chosen approach.
Higher is better
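The document does not specify how the four dimensions are combined; one plausible construction is a weighted average of pre-normalized 0–1 subscores. The weights below are illustrative assumptions, not the actual index definition:

```python
# Sketch: a weighted-average composite of four 0-1 subscores.
# Weights are illustrative; cost/latency are pre-normalized so higher is better.
def fit_index(quality, cost_score, latency_score, compliance,
              weights=(0.4, 0.25, 0.15, 0.2)):
    parts = (quality, cost_score, latency_score, compliance)
    return sum(w * p for w, p in zip(weights, parts))

print(fit_index(0.9, 0.6, 0.8, 1.0))  # ~0.83
```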
Internal Factors
Tokens per Request
Average total tokens (prompt + completion) used per request in the window.
Higher is worse
Context Utilization Ratio
Share of available context window actually used.
Higher is worse
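The two token factors above reduce to a simple ratio; a minimal sketch, assuming prompt token counts and the model's context window size are both known:

```python
# Sketch: share of the context window consumed by the prompt.
def context_utilization(prompt_tokens, context_window_tokens):
    return prompt_tokens / context_window_tokens

print(context_utilization(6_000, 8_192))  # 0.732421875
```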
Token Inflation Factor
Ratio of total prompt tokens, after prompting and retrieval additions, to the raw input size.
Higher is worse
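Sketched as a ratio, assuming we can count raw user-input tokens and the final prompt tokens after templates and retrieved context are added:

```python
# Sketch: token inflation = final prompt tokens / raw input tokens.
def token_inflation_factor(final_prompt_tokens, raw_input_tokens):
    return final_prompt_tokens / raw_input_tokens

# 200 raw input tokens grow to 1,400 after system prompt + retrieved docs.
print(token_inflation_factor(1_400, 200))  # 7.0
```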
Retrieval Recall@K
Share of eval queries for which at least one relevant document appears in the top‑K.
Higher is better
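Recall@K as defined above counts a query as a hit if any relevant document appears in its top-K results. A minimal sketch, assuming per-query relevance labels are available:

```python
# Sketch: Recall@K over an eval set with per-query relevance labels.
def recall_at_k(retrieved_per_query, relevant_per_query, k):
    hits = 0
    for retrieved, relevant in zip(retrieved_per_query, relevant_per_query):
        if any(doc in relevant for doc in retrieved[:k]):
            hits += 1
    return hits / len(retrieved_per_query)

retrieved = [["d1", "d7", "d3"], ["d9", "d2", "d4"]]
relevant = [{"d3"}, {"d5"}]  # query 2's relevant doc is never retrieved
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
```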
Retrieval Usage Share
Fraction of requests that invoked retrieval as part of the response.
Higher is worse
Embedding Index Age (days)
Days since the embedding index or corpus was last refreshed.
Higher is worse
Labeled Dataset Size
Number of labeled examples available for fine‑tuning/evaluation.
Higher is better
Label Quality Score
Normalized 0–1 score capturing label accuracy/consistency.
Higher is better
Fine‑Tune Checkpoint Age (days)
Days since the fine‑tuned model checkpoint was produced.
Higher is worse
Fine‑Tuning Total Cost (USD)
Cumulative spend on fine‑tuning runs, data prep, and evaluation.
Higher is worse
Domain Drift Index
0–1 index of distribution shift between production queries and the fine‑tune/eval corpus.
Higher is worse
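The exact construction of this index is not specified here. One common 0–1 proxy for distribution shift is the total variation distance between feature histograms of production traffic and the fine-tune/eval corpus; the sketch below uses that substitute:

```python
# Sketch: total variation distance as a 0-1 drift proxy; the real
# index definition may differ. Inputs are probability distributions
# over the same bins (e.g., topic or token-length histograms).
def drift_index(prod_hist, corpus_hist):
    return 0.5 * sum(abs(p - q) for p, q in zip(prod_hist, corpus_hist))

print(drift_index([0.5, 0.3, 0.2], [0.2, 0.3, 0.5]))  # ~0.3
```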
PII Flag Rate
Share of requests flagged by data‑loss‑prevention (DLP) or privacy rules.
Higher is worse
In‑Region Processing Share
Share of requests executed in the region dictated by policy/jurisdiction.
Higher is better
Levers
Customization Strategy
Chosen approach to customization: prompt engineering, retrieval, fine-tuning, or a hybrid.
Context Window (tokens)
Maximum prompt tokens allowed per request.
Max Output Tokens
Upper bound on generated tokens per request.
Retrieval Top‑K
Number of documents retrieved per query.
Reranker Policy
Reranking model/policy applied to retrieved candidates.
LoRA Rank
Rank parameter used for Low‑Rank Adaptation during fine‑tuning.
Fine‑Tune Epochs
Number of passes over the training set.
Quantization Level
Numeric precision used for serving (e.g., FP16 or INT8).
Data Residency Policy
Placement rules for where data is processed.
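The levers above can be gathered into a single configuration object, as sketched below. Field names and defaults are illustrative assumptions, not a real API:

```python
# Sketch: the levers as one config object; names/defaults are illustrative.
from dataclasses import dataclass

@dataclass
class CustomizationConfig:
    strategy: str = "rag"             # "prompt", "rag", "fine_tune", "hybrid"
    context_window_tokens: int = 8192
    max_output_tokens: int = 1024
    retrieval_top_k: int = 5
    reranker_policy: str = "none"     # e.g. "none", "cross_encoder"
    lora_rank: int = 8                # only relevant when fine-tuning
    fine_tune_epochs: int = 3
    quantization_level: str = "fp16"  # serving precision
    data_residency_policy: str = "eu_only"

cfg = CustomizationConfig(retrieval_top_k=10)
print(cfg.retrieval_top_k)  # 10
```

Centralizing the levers this way makes A/B comparisons between customization strategies reproducible, since each experiment is fully described by one config value.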
