The `image_set` field in your run config determines which images are sent to the model. You can use one of four built-in named subsets, ranging from a 3-image smoke test to all 900 OASIS images, or point to your own newline-delimited file of image IDs. This page explains how images are loaded, how the built-in subsets are constructed, and how to define a custom set.
How images are loaded
OASIS ships 900 open-access color images, standardized to 500 × 400 pixels and stored in `OASIS/images/*.jpg`. Image IDs are filename stems, for example `Beach 1` or `Alarm clock 1`.
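A minimal sketch of how the list of image IDs could be derived from the image directory, assuming IDs are exactly the `.jpg` filename stems. `list_image_ids` is a hypothetical helper for illustration, not part of OASIS-LLM's documented API.

```python
from pathlib import Path


def list_image_ids(image_dir: str = "OASIS/images") -> list[str]:
    """Return every image ID (the stem of each .jpg file), sorted as plain strings."""
    # Path.stem drops the final suffix: "Beach 1.jpg" -> "Beach 1".
    # Non-.jpg files are ignored by the glob pattern.
    return sorted(p.stem for p in Path(image_dir).glob("*.jpg"))
```

Note that a plain string sort places `Beach 10` before `Beach 2`; the project's own notion of "sorted by ID" may use a different ordering.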
Images are passed to the model as base64 data URLs rather than hosted URLs. This keeps runs fully reproducible without network fetches and lets the same prompt format work identically across all providers.
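The encoding step can be sketched as follows. This assumes the JPEGs live under `OASIS/images/` and that the message uses an OpenAI-style `image_url` content part; both helper names are hypothetical, and OASIS-LLM's actual message layout may differ.

```python
import base64
from pathlib import Path


def image_data_url(image_id: str, image_dir: str = "OASIS/images") -> str:
    """Read <image_dir>/<image_id>.jpg and return it as a base64 data URL."""
    raw = (Path(image_dir) / f"{image_id}.jpg").read_bytes()
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"


def image_message_part(image_id: str, image_dir: str = "OASIS/images") -> dict:
    """Wrap the data URL in an OpenAI-style "image_url" content part (assumed layout)."""
    return {"type": "image_url", "image_url": {"url": image_data_url(image_id, image_dir)}}
```

Because the bytes are embedded in the request itself, the provider never has to fetch anything, which is what makes the run independent of network state.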
Named image sets
| `image_set` | Size | Description |
|---|---|---|
| `full_900` | 900 | Every image in `OASIS/images/`, sorted by ID. |
| `pilot_30` | 30 | Stratified sample across OASIS categories, `seed=42`. |
| `pilot_10` | 10 | Stratified sample across OASIS categories, `seed=42`. |
| `smoke_3` | 3 | Stratified sample across OASIS categories, `seed=42`. |
| `<path>` | varies | Path to a newline-delimited file of image IDs. |
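A minimal sketch of how an `image_set` value could be resolved into a concrete list of IDs, following the rows of the table above. `resolve_image_set`, `STRATIFIED_SIZES`, and the `sampler` callback (taking `n` and `seed`) are hypothetical names; the real code may resolve values in a different order or validate more strictly.

```python
from pathlib import Path
from typing import Callable

# Sizes of the stratified subsets, as listed in the table above.
STRATIFIED_SIZES = {"pilot_30": 30, "pilot_10": 10, "smoke_3": 3}


def resolve_image_set(
    image_set: str,
    all_ids: list[str],
    sampler: Callable[[int, int], list[str]],
    seed: int = 42,
) -> list[str]:
    """Map an image_set value to a list of image IDs (hypothetical helper)."""
    if image_set == "full_900":
        # Every image, sorted by ID.
        return sorted(all_ids)
    if image_set in STRATIFIED_SIZES:
        # Delegate to a stratified sampler called as sampler(n, seed).
        return sampler(STRATIFIED_SIZES[image_set], seed)
    # Any other value is treated as a path to a newline-delimited ID file;
    # blank lines are skipped here as an assumption.
    text = Path(image_set).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```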
How stratified sampling works
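The pilot subsets (`pilot_30`, `pilot_10`, `smoke_3`) are built by allocating images across the four OASIS category labels (Animal, Scene, Person, Object) in proportion to each category's size, with at least one image per category whenever `n` allows it. `smoke_3` has fewer images than there are categories, so one category is necessarily left out of it. Allocations are deterministic for a given `(n, seed)` pair.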
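The allocation and selection described here can be sketched as below. This is an illustration under stated assumptions, not OASIS-LLM's implementation: it operates on a plain `{image_id: category}` dict of the kind `image_categories()` returns, breaks ties by category name, shares one seeded RNG across categories, and uses Python's `round` (round-half-to-even), which may differ from the project's rounding. All function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Optional


def allocate(category_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Proportional allocation with a floor of one image per category."""
    total = sum(category_sizes.values())
    alloc = {c: max(1, round(n * size / total)) for c, size in category_sizes.items()}
    # Fix rounding drift one image at a time; ties are broken by category
    # name (an assumption made only to keep the result deterministic).
    while sum(alloc.values()) < n:
        smallest = min(alloc, key=lambda c: (alloc[c], c))
        alloc[smallest] += 1
    while sum(alloc.values()) > n:
        largest = max(alloc, key=lambda c: (alloc[c], c))
        alloc[largest] -= 1
    return alloc


def stratified_sample(categories: dict[str, str], n: int, seed: int = 42) -> list[str]:
    """Pick n image IDs from an {image_id: category} map, reproducibly for (n, seed)."""
    pools: dict[str, list[str]] = defaultdict(list)
    for image_id, category in categories.items():
        pools[category].append(image_id)
    alloc = allocate({c: len(ids) for c, ids in pools.items()}, n)
    rng = random.Random(seed)
    chosen: list[str] = []
    for category in sorted(pools):
        # Sort first so the shuffle never depends on the input order.
        pool = sorted(pools[category])
        rng.shuffle(pool)
        chosen.extend(pool[: alloc[category]])
    return sorted(chosen)


def sample_images(
    all_ids: list[str], categories: Optional[dict[str, str]], n: int, seed: int = 42
) -> list[str]:
    """Stratified when a category map is available, otherwise a uniform seeded sample."""
    if not categories:
        return sorted(random.Random(seed).sample(sorted(all_ids), n))
    return stratified_sample(categories, n, seed)
```

With category sizes of 10, 20, 5, and 15 and `n = 10`, the proportional shares are exact (2, 4, 1, 3), so no drift correction is needed; with `n = 3` every category first receives one image and the correction step then removes one.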
1. **Load the category map.** `image_categories()` reads `data/derived/OASIS_data_long.csv` and returns a mapping of `{image_id: category}` for all 900 images.
2. **Allocate proportionally.** Each category receives `max(1, round(n * size_cat / total))` images. Any drift from the target `n` caused by rounding is corrected by adding one image to the smallest bucket or removing one from the largest.
3. **Shuffle within category.** Image pools are first sorted deterministically (DuckDB `DISTINCT` does not guarantee order), then shuffled with a seeded RNG so the selection is reproducible.

If `data/derived/OASIS_data_long.csv` is missing, the stratified sampler falls back to a uniform random sample across all 900 images using the same seed.
Custom image sets
Point `image_set` at a path to a newline-delimited text file of image IDs to run against any subset of the 900 images. A file such as `negative_high_arousal.txt` lists one image ID per line, written as the filename stem (without the `.jpg` extension). This is the right approach for targeted reliability runs, stress tests, or any analysis that requires a hand-curated stimulus list.
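Before launching a run on a hand-curated list, it can help to check every ID against the images that actually exist. The sketch below is a hypothetical pre-flight check, not an OASIS-LLM feature; skipping blank lines and rejecting unknown IDs are assumptions about how such a file should be treated.

```python
from pathlib import Path


def load_custom_image_set(path: str, known_ids: set[str]) -> list[str]:
    """Read one image ID per line and reject any ID not among known_ids."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        image_id = line.strip()
        if not image_id:
            continue  # tolerate blank lines (an assumption, not documented behavior)
        ids.append(image_id)
    unknown = [i for i in ids if i not in known_ids]
    if unknown:
        # A stray ".jpg" suffix or a typo surfaces here instead of mid-run.
        raise ValueError(f"unknown image IDs: {unknown}")
    return ids
```

For example, a line reading `Beach 1.jpg` is reported as unknown, because IDs are filename stems without the extension.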
Why image_set is part of the canonical hash
The image set is included in the run’s `canonical_hash`. Switching `image_set` from `pilot_10` to `full_900` constitutes a different experiment, so OASIS-LLM requires a new run name in the YAML to reflect it. This is intentional: it prevents a partially completed pilot run from being silently extended with a different stimulus pool.
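One way such a guard could work is sketched below, assuming the hash covers every experiment-defining config key except `samples_per_image` and that previously seen hashes are stored per run name. The JSON-then-SHA-256 scheme, the `"model"` key in the usage, and all helper names are illustrative assumptions, not OASIS-LLM's actual hashing code.

```python
import hashlib
import json

# Keys that may change without defining a new experiment (assumed set).
NON_HASHED_KEYS = {"samples_per_image"}


def canonical_hash(config: dict) -> str:
    """Hash the experiment-defining part of a config (run name passed separately)."""
    hashed = {k: v for k, v in config.items() if k not in NON_HASHED_KEYS}
    # sort_keys plus compact separators give a stable serialization.
    blob = json.dumps(hashed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def check_run_name(name: str, config: dict, known_hashes: dict[str, str]) -> None:
    """Record the hash for a new name; refuse a reused name with a different hash."""
    h = canonical_hash(config)
    previous = known_hashes.setdefault(name, h)
    if previous != h:
        raise ValueError(f"run {name!r} already has a different canonical_hash; use a new name")
```

Under this sketch, changing `image_set` on an existing name raises, while changing only `samples_per_image` leaves the hash untouched.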
`samples_per_image` is not part of the hash, so you can increase it on an existing run without renaming it; only the new sample indices are enqueued.
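The incremental enqueueing can be pictured as a set difference over `(image_id, sample_index)` pairs. This is a minimal sketch assuming completed or queued work is tracked as such pairs; `pending_samples` is a hypothetical name.

```python
def pending_samples(
    image_ids: list[str],
    samples_per_image: int,
    done: set[tuple[str, int]],
) -> list[tuple[str, int]]:
    """Return the (image_id, sample_index) pairs that are not yet enqueued."""
    return [
        (image_id, k)
        for image_id in image_ids
        for k in range(samples_per_image)
        if (image_id, k) not in done
    ]
```

Raising `samples_per_image` from 2 to 3, for instance, yields only sample index 2 for each image, since indices 0 and 1 are already present.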