Cost and Latency: Planning Your LLM Rating Runs

Before you launch a full OASIS-LLM run, it is worth understanding what it will cost and how long it will take. This page gives you real numbers from the shipped pilot config, extrapolations to the full 900-image set, a cost comparison across model classes, and a formula you can use to estimate any run before you start it.

Pilot benchmark

The shipped pilot config runs 10 images × 2 dimensions × 5 samples = 100 trials through google/gemma-4-31b-it on OpenRouter, vision modality, capture_reasoning: true, cache_buster: true:

Metric	Value
Trials completed	100
Total cost	$0.00867
Cost per trial	~$0.0000867
Average latency	8.2 s / trial
p50 latency	5.7 s
Throughput at `max_concurrency: 4`	~0.49 trial/s

The image data URL dominates input tokens — the per-trial cost includes both the full image upload and a one-sentence reasoning string in the output.

Full-set extrapolation

Scaling the same model to image_set: full_900 × 2 dimensions × 5 samples = 9,000 trials:

Metric	Estimate
Total cost	~$0.78
Wall time at `max_concurrency: 4`	~5 hr (9,000 trials × 8.2 s / 4)
Wall time at `max_concurrency: 16`	~1.3 hr at best (provider-rate-limited)

Cost by model class

Frontier closed models cost roughly 100–300× more per trial than open mid-size models, and open large models roughly 10–30× more. Plan your budget before choosing a model:

Model class	Pilot (100 trials)	Full set (9,000 trials)
Open mid-size (Gemma 4 31B, Qwen3-VL)	~$0.01	~$1
Open large / frontier-cheap	~$0.10–0.30	~$10–30
Frontier closed	~$1–3	~$80–300

Full-set numbers extrapolate from observed pilot per-trial cost. They are estimates, not quotes. Run a smoke test against any new model to validate cost capture before committing to a full run.

How cost is captured

Cost capture in the model runner tries two price sources in order and stores NULL when neither applies:

LiteLLM cost lookup (primary)

litellm.completion_cost(completion_response=resp) consults LiteLLM’s built-in price table for the resolved model ID. This works for first-party Anthropic, OpenAI, and Google models, and for most well-known open-weights routes.

OpenRouter native usage.cost (fallback)

For OpenRouter runs, the runner injects the following before each call:

call_kwargs.setdefault("extra_body", {})
call_kwargs["extra_body"].setdefault("usage", {"include": True})

OpenRouter then returns the actual billed cost in resp.usage.cost. The runner reads it through three different access paths (attribute, model_extra, __dict__) to handle pydantic-version variations in the LiteLLM response object.

NULL (Ollama and local providers)

No price source is available for local models. cost_usd is stored as NULL. Latency and token counts are still captured for every trial.
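
The fallback order above can be sketched roughly as follows. This is an illustration under stated assumptions, not the runner's actual code: `extract_cost`, `read_usage_cost`, and the `primary_lookup` argument are hypothetical names, and `primary_lookup` stands in for `litellm.completion_cost` so the sketch runs without LiteLLM installed. Treating a zero or missing primary result as "no price found" is also an assumption.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def read_usage_cost(usage: Any) -> Optional[float]:
    """Look for a `cost` field on a usage object via three access paths."""
    if usage is None:
        return None
    # Path 1: plain attribute access.
    cost = getattr(usage, "cost", None)
    if cost is not None:
        return float(cost)
    # Path 2: pydantic v2 keeps undeclared fields in `model_extra`.
    extra = getattr(usage, "model_extra", None)
    if isinstance(extra, dict) and extra.get("cost") is not None:
        return float(extra["cost"])
    # Path 3: last resort, the raw instance dictionary.
    raw = getattr(usage, "__dict__", None)
    if isinstance(raw, dict) and raw.get("cost") is not None:
        return float(raw["cost"])
    return None


def extract_cost(resp: Any, primary_lookup: Callable[[Any], Any]) -> Optional[float]:
    """Return the cost in USD, or None when no price source applies."""
    try:
        cost = primary_lookup(resp)  # stand-in for litellm.completion_cost
        if cost:  # assumption: 0 or None means the price table had no entry
            return float(cost)
    except Exception:
        pass  # lookup failed for this model: fall through to usage.cost
    return read_usage_cost(getattr(resp, "usage", None))


def failing_lookup(resp: Any) -> float:
    raise ValueError("model not in price table")


openrouter_resp = SimpleNamespace(usage=SimpleNamespace(cost=0.0000867))
local_resp = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1200))

print(extract_cost(openrouter_resp, failing_lookup))  # falls back to usage.cost
print(extract_cost(local_resp, failing_lookup))       # None, stored as NULL
```

Returning `None` instead of `0.0` keeps "free" and "unknown" distinguishable when the value is written to `cost_usd`.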

Throughput formula

max_concurrency is the asyncio semaphore size. Effective throughput is:

$$
\text{throughput} \approx \min\left( \frac{\text{max\_concurrency}}{\overline{\text{latency}}},\ \text{provider rate limit} \right)
$$

For the Gemma-4-31B pilot at 8.2 s mean latency, max_concurrency: 4 yields roughly 0.49 trial/s, matching the observed numbers. Bumping concurrency further usually hits OpenRouter per-key rate limits before it speeds anything up — start at 4 and only raise it if you observe a queue building.
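
As a quick check of the pilot figure, the throughput formula can be written as a small helper. The function name is hypothetical, and the 1.0 trial/s rate limit in the second call is a made-up value for illustration only; the real per-key limit depends on your provider account.

```python
from typing import Optional


def effective_throughput(
    max_concurrency: int,
    mean_latency_s: float,
    rate_limit_per_s: Optional[float] = None,
) -> float:
    """Trials per second: min(max_concurrency / mean latency, provider rate limit)."""
    if max_concurrency < 1 or mean_latency_s <= 0:
        raise ValueError("max_concurrency must be >= 1 and latency must be > 0")
    concurrency_bound = max_concurrency / mean_latency_s
    if rate_limit_per_s is None:
        return concurrency_bound
    return min(concurrency_bound, rate_limit_per_s)


# Pilot numbers from this page: 4 slots, 8.2 s mean latency.
pilot = effective_throughput(4, 8.2)
print(f"{pilot:.2f} trial/s")

# Hypothetical provider cap of 1.0 trial/s: raising concurrency stops helping.
capped = effective_throughput(16, 8.2, rate_limit_per_s=1.0)
print(f"{capped:.2f} trial/s")
```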

max_concurrency is excluded from canonical_hash, so you can adjust it between resumes without invalidating the run.
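
To make "semaphore size" concrete, here is a minimal sketch of how an `asyncio.Semaphore` bounds the number of in-flight trials. It is not the runner's code: `run_trials` is a hypothetical name and `asyncio.sleep` stands in for the model call.

```python
import asyncio


async def run_trials(n_trials: int, max_concurrency: int, latency_s: float = 0.01) -> int:
    """Run n_trials fake trials and return the peak number in flight at once."""
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def one_trial() -> None:
        nonlocal in_flight, peak
        async with sem:  # at most max_concurrency bodies run concurrently
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(latency_s)  # stand-in for the model call
            in_flight -= 1

    await asyncio.gather(*(one_trial() for _ in range(n_trials)))
    return peak


peak = asyncio.run(run_trials(n_trials=20, max_concurrency=4))
print(peak)  # never exceeds max_concurrency
```

Because the semaphore only gates scheduling, changing its size affects how fast trials are issued and not which trials exist, which fits with the setting being left out of the hash.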

Budget worksheet

Use this formula to estimate any run before you launch it:

trials       = N_images × N_dims × samples_per_image
cost_total   ≈ trials × cost_per_trial_observed_in_smoke
wall_time    ≈ trials × mean_latency / max_concurrency
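
The worksheet translates directly into a few lines of Python. The sketch below uses hypothetical names (`RunEstimate`, `estimate_run`) and plugs in the pilot numbers from this page as a sanity check. The wall-time term ignores provider rate limits, so treat it as a lower bound.

```python
from dataclasses import dataclass


@dataclass
class RunEstimate:
    trials: int
    cost_usd: float
    wall_time_s: float


def estimate_run(
    n_images: int,
    n_dims: int,
    samples_per_image: int,
    cost_per_trial: float,   # observed in a smoke test, in USD
    mean_latency_s: float,   # observed mean latency per trial
    max_concurrency: int,
) -> RunEstimate:
    """Apply the worksheet: trials, total cost, and rate-limit-free wall time."""
    trials = n_images * n_dims * samples_per_image
    return RunEstimate(
        trials=trials,
        cost_usd=trials * cost_per_trial,
        wall_time_s=trials * mean_latency_s / max_concurrency,
    )


# Pilot config: 10 images x 2 dimensions x 5 samples, concurrency 4.
est = estimate_run(10, 2, 5, cost_per_trial=0.0000867,
                   mean_latency_s=8.2, max_concurrency=4)
print(f"{est.trials} trials, ${est.cost_usd:.5f}, {est.wall_time_s:.0f} s")
```

Swapping in your own image count and the per-trial cost from a smoke test gives the estimate for a full run.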

For any new model, run oasis-llm smoke <config> first — it executes 3 trials for under $0.001 and reports per-trial cost. Read cost_per_trial from oasis-llm status, then multiply by your full trial count before launching the real run.

Always run oasis-llm smoke <config> against a new model before starting a full run. It validates that authentication, cost capture, and response parsing all work correctly, and gives you a real per-trial cost to plug into the formula above.
