Before you launch a full OASIS-LLM run, it is worth understanding what it will cost and how long it will take. This page gives you real numbers from the shipped pilot config, extrapolations to the full 900-image set, a cost comparison across model classes, and a formula you can use to estimate any run before you start it.
Pilot benchmark
The shipped pilot config runs 10 images × 2 dimensions × 5 samples = 100 trials through google/gemma-4-31b-it on OpenRouter, with vision modality, capture_reasoning: true, and cache_buster: true:
| Metric | Value |
|---|---|
| Trials completed | 100 |
| Total cost | $0.00867 |
| Cost per trial | ~$0.0000867 |
| Average latency | 8.2 s / trial |
| p50 latency | 5.7 s |
| Throughput at max_concurrency: 4 | ~0.49 trial/s |
Full-set extrapolation
Scaling the same model to image_set: full_900 × 2 dimensions × 5 samples = 9,000 trials:
| Metric | Estimate |
|---|---|
| Total cost | ~$0.78 |
| Wall time at max_concurrency: 4 | ~2 hr |
| Wall time at max_concurrency: 16 | ~30 min (provider-rate-limited) |
Cost by model class
Frontier models run roughly 20–100× more expensive per trial than open mid-size models. Plan your budget before choosing a model:
| Model class | Pilot (100 trials) | Full set (9,000 trials) |
|---|---|---|
| Open mid-size (Gemma 4 31B, Qwen3-VL) | ~$0.01 | ~$1 |
| Open large / frontier-cheap | ~$0.10–0.30 | ~$10–30 |
| Frontier closed | ~$1–3 | ~$80–300 |
Full-set numbers extrapolate from observed pilot per-trial cost. They are
estimates, not quotes. Run a smoke test against any new model to validate cost
capture before committing to a full run.
How cost is captured
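As a rough illustration of the fallback tier described in this section, the sketch below reads usage.cost through the three access paths the runner tries (attribute, model_extra, __dict__). The function name read_usage_cost and the stand-in response object are hypothetical and not taken from the OASIS-LLM source; the actual runner may order or guard these lookups differently.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_usage_cost(resp: Any) -> Optional[float]:
    """Return usage.cost from a response-like object, or None if absent.

    Hypothetical helper: it only mirrors the three access paths named in
    the text and does not call LiteLLM or OpenRouter.
    """
    usage = getattr(resp, "usage", None)
    if usage is None:
        return None

    # Path 1: plain attribute access.
    cost = getattr(usage, "cost", None)

    # Path 2: pydantic v2 models keep undeclared fields in `model_extra`.
    if cost is None:
        extra = getattr(usage, "model_extra", None) or {}
        cost = extra.get("cost")

    # Path 3: fall back to the raw instance dictionary.
    if cost is None:
        cost = getattr(usage, "__dict__", {}).get("cost")

    return float(cost) if cost is not None else None


# Stand-in for a provider response; a real run would pass the LiteLLM response.
demo = SimpleNamespace(usage=SimpleNamespace(cost=0.0000867))
print(read_usage_cost(demo))
```

Returning None when no path yields a value lets the caller tell "cost not reported" apart from a genuine cost of zero.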
Cost capture uses a two-tier fallback in the model runner:
LiteLLM cost lookup (primary)
litellm.completion_cost(completion_response=resp) consults LiteLLM's built-in price table for the resolved model ID. This works for first-party Anthropic, OpenAI, and Google models, and for most well-known open-weights routes.
OpenRouter native usage.cost (fallback)
For OpenRouter runs, the runner adjusts the request before each call so that OpenRouter returns the actual billed cost in resp.usage.cost. The runner reads it through three different access paths (attribute, model_extra, __dict__) to handle pydantic-version variations in the LiteLLM response object.
Throughput formula
max_concurrency is the asyncio semaphore size. Effective throughput, in trials per second, is approximately max_concurrency divided by the mean latency per trial in seconds.
For the Gemma-4-31B pilot at 8.2 s mean latency, max_concurrency: 4 yields roughly 4 / 8.2 ≈ 0.49 trial/s, matching the observed numbers. Raising concurrency further usually hits OpenRouter per-key rate limits before it speeds anything up, so start at 4 and only raise it if you observe a queue building.
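The throughput relationship above can be checked with a few lines of Python. This is a minimal sketch: the function name effective_throughput is hypothetical, and the estimate ignores provider rate limits, retries, and latency variance, so treat it as an upper bound.

```python
def effective_throughput(max_concurrency: int, mean_latency_s: float) -> float:
    """Approximate trials per second for a semaphore-bounded runner.

    Assumes every concurrency slot stays busy and each trial takes the
    mean latency; rate limits and retries are not modeled.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    if mean_latency_s <= 0:
        raise ValueError("mean_latency_s must be positive")
    return max_concurrency / mean_latency_s


# Pilot figures from the benchmark table: concurrency 4, 8.2 s mean latency.
pilot = effective_throughput(max_concurrency=4, mean_latency_s=8.2)
print(f"{pilot:.2f} trial/s")  # 0.49 trial/s
```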
Budget worksheet
Use this formula to estimate any run before you launch it: estimated total cost ≈ cost_per_trial × total trials, where total trials = images × dimensions × samples.
Run oasis-llm smoke <config> first: it executes 3 trials for under $0.001 and reports per-trial cost. Read cost_per_trial from oasis-llm status, then multiply by your full trial count before launching the real run.
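The worksheet arithmetic can be scripted so you can plug in the per-trial cost from your own smoke test. This is a hedged sketch: estimate_run is a hypothetical helper, not part of the oasis-llm CLI, and the inputs below are the pilot figures from this page.

```python
from typing import Tuple


def estimate_run(
    cost_per_trial: float, n_images: int, n_dimensions: int, n_samples: int
) -> Tuple[int, float]:
    """Return (total_trials, estimated_total_cost_usd) for a planned run."""
    total_trials = n_images * n_dimensions * n_samples
    return total_trials, total_trials * cost_per_trial


# Pilot per-trial cost (~$0.0000867) applied to the full 900-image set.
trials, cost = estimate_run(0.0000867, n_images=900, n_dimensions=2, n_samples=5)
print(f"{trials} trials, ~${cost:.2f}")  # 9000 trials, ~$0.78
```

Because the result scales linearly with cost_per_trial, an error in the smoke-test figure carries straight through to the total, which is why validating cost capture on a new model matters before a full run.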