Skip to main content

Documentation Index

Fetch the complete documentation index at: https://dcpma.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Before you launch a full OASIS-LLM run, it is worth understanding what it will cost and how long it will take. This page gives you real numbers from the shipped pilot config, extrapolations to the full 900-image set, a cost comparison across model classes, and a formula you can use to estimate any run before you start it.

Pilot benchmark

The shipped pilot config runs 10 images × 2 dimensions × 5 samples = 100 trials through google/gemma-4-31b-it on OpenRouter, vision modality, capture_reasoning: true, cache_buster: true:
MetricValue
Trials completed100
Total cost$0.00867
Cost per trial~$0.0000867
Average latency8.2 s / trial
p50 latency5.7 s
Throughput at max_concurrency: 4~0.49 trial/s
The image data URL dominates input tokens — the per-trial cost includes both the full image upload and a one-sentence reasoning string in the output.

Full-set extrapolation

Scaling the same model to image_set: full_900 × 2 dimensions × 5 samples = 9,000 trials:
MetricEstimate
Total cost~$0.78
Wall time at max_concurrency: 4~2 hr
Wall time at max_concurrency: 16~30 min (provider-rate-limited)

Cost by model class

Frontier models run roughly 20–100× more expensive per trial than open mid-size models. Plan your budget before choosing a model:
Model classPilot (100 trials)Full set (9,000 trials)
Open mid-size (Gemma 4 31B, Qwen3-VL)~$0.01~$1
Open large / frontier-cheap~$0.10–0.30~$10–30
Frontier closed~$1–3~$80–300
Full-set numbers extrapolate from observed pilot per-trial cost. They are estimates, not quotes. Run a smoke test against any new model to validate cost capture before committing to a full run.

How cost is captured

Cost capture uses a two-tier fallback in the model runner:
1

LiteLLM cost lookup (primary)

litellm.completion_cost(completion_response=resp) consults LiteLLM’s built-in price table for the resolved model ID. This works for first-party Anthropic, OpenAI, and Google models, and for most well-known open-weights routes.
2

OpenRouter native usage.cost (fallback)

For OpenRouter runs, the runner injects the following before each call:
call_kwargs.setdefault("extra_body", {})
call_kwargs["extra_body"].setdefault("usage", {"include": True})
OpenRouter then returns the actual billed cost in resp.usage.cost. The runner reads it through three different access paths (attribute, model_extra, __dict__) to handle pydantic-version variations in the LiteLLM response object.
3

NULL (Ollama and local providers)

No price source is available for local models. cost_usd is stored as NULL. Latency and token counts are still captured for every trial.

Throughput formula

max_concurrency is the asyncio semaphore size. Effective throughput is: throughputmin(max_concurrencylatencyˉ, provider rate limit)\text{throughput} \approx \min\left( \frac{\text{max\_concurrency}}{\bar{\text{latency}}},\ \text{provider rate limit} \right) For the Gemma-4-31B pilot at 8.2 s mean latency, max_concurrency: 4 yields roughly 0.49 trial/s, matching the observed numbers. Bumping concurrency further usually hits OpenRouter per-key rate limits before it speeds anything up — start at 4 and only raise it if you observe a queue building.
max_concurrency is excluded from canonical_hash, so you can adjust it between resumes without invalidating the run.

Budget worksheet

Use this formula to estimate any run before you launch it:
trials       = N_images × N_dims × samples_per_image
cost_total   ≈ trials × cost_per_trial_observed_in_smoke
wall_time    ≈ trials × mean_latency / max_concurrency
For any new model, run oasis-llm smoke <config> first — it executes 3 trials for under $0.001 and reports per-trial cost. Read cost_per_trial from oasis-llm status, then multiply by your full trial count before launching the real run.
Always run oasis-llm smoke <config> against a new model before starting a full run. It validates that authentication, cost capture, and response parsing all work correctly, and gives you a real per-trial cost to plug into the formula above.