Image Sets: Choose and Customize Your Stimulus Pool

The image_set field in your run config determines which images are sent to the model. You can use one of four built-in named sets, ranging from a 3-image smoke test to all 900 OASIS images, or point to your own newline-delimited file of image IDs. This page explains how images are loaded, how the built-in sets are constructed, and how to define a custom set.

How images are loaded

OASIS ships 900 open-access color images, standardized to 500 × 400 pixels, stored in OASIS/images/*.jpg. Image IDs are filename stems — for example, Beach 1 or Alarm clock 1. Images are passed to the model as base64 data URLs, not hosted URLs. This makes runs fully reproducible without network fetches and ensures the same prompt format works identically across all providers:

def image_data_url(image_id: str) -> str:
    return f"data:image/jpeg;base64,{encode_image_base64(image_id)}"

A data URL puts more bytes on the wire than a hosted URL (base64 encoding inflates the payload by roughly one third), but it eliminates an entire class of failure modes where the URL changes or becomes unreachable between runs.
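The snippet above calls encode_image_base64 without showing it. A minimal sketch of what such a helper could look like follows, assuming images are read from OASIS/images/<image_id>.jpg; the IMAGES_DIR constant and the images_dir parameter are illustrative assumptions, not the project's confirmed signature.

```python
import base64
from pathlib import Path

# Assumption: images live under this directory, as described above.
IMAGES_DIR = Path("OASIS/images")


def encode_image_base64(image_id: str, images_dir: Path = IMAGES_DIR) -> str:
    """Read <images_dir>/<image_id>.jpg and return its bytes as base64 text."""
    raw = (images_dir / f"{image_id}.jpg").read_bytes()
    return base64.b64encode(raw).decode("ascii")


def image_data_url(image_id: str, images_dir: Path = IMAGES_DIR) -> str:
    """Same shape as the function above, with the directory made explicit."""
    return f"data:image/jpeg;base64,{encode_image_base64(image_id, images_dir)}"
```

Because the image bytes are embedded in the request itself, the same call produces the same payload on every run as long as the file on disk is unchanged.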

Named image sets

| `image_set` | Size | Description |
| --- | --- | --- |
| `full_900` | 900 | Every image in `OASIS/images/`, sorted by ID. |
| `pilot_30` | 30 | Stratified sample across OASIS categories, seed=42. |
| `pilot_10` | 10 | Stratified sample across OASIS categories, seed=42. |
| `smoke_3` | 3 | Stratified sample across OASIS categories, seed=42. |
| file path | varies | Path to a newline-delimited file of image IDs. |

Set the value in your YAML config:

image_set: pilot_10

How stratified sampling works

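The pilot subsets (pilot_30, pilot_10, smoke_3) are built by allocating images proportionally across the four OASIS category labels (Animal, Scene, Person, Object), with a floor of one image per category before rounding drift is corrected. Because smoke_3 has fewer images than there are categories, it cannot cover all four. Allocations are deterministic for a given (n, seed) pair.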

Load the category map

image_categories() reads data/derived/OASIS_data_long.csv and returns a mapping of {image_id: category} for all 900 images.

Allocate proportionally

Each category receives max(1, round(n × size_cat / total)) images. Any drift from the target n caused by rounding is corrected by adding one image to the smallest bucket or removing one from the largest.
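The allocation rule can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function name allocate, the category sizes in the usage note, and the tie-breaking by category name are all hypothetical. Note also that Python's round() rounds exact halves to the nearest even integer.

```python
def allocate(category_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split n images across categories in proportion to category size."""
    total = sum(category_sizes.values())
    # Proportional share per category, with a floor of one image.
    alloc = {
        cat: max(1, round(n * size / total))
        for cat, size in category_sizes.items()
    }
    # Rounding can leave the total short: top up the smallest bucket.
    while sum(alloc.values()) < n:
        smallest = min(alloc, key=lambda c: (alloc[c], c))
        alloc[smallest] += 1
    # Rounding (or the floor of one) can overshoot: trim the largest bucket.
    # Assumption: a bucket may drop to zero when n is below the category count.
    while sum(alloc.values()) > n:
        largest = max(alloc, key=lambda c: (alloc[c], c))
        alloc[largest] -= 1
    return alloc
```

With hypothetical sizes {"Animal": 100, "Object": 200, "Person": 300, "Scene": 300} and n=10, the proportional step yields 1, 2, 3 and 3 (nine images), and the correction adds one to Animal, giving 2, 2, 3 and 3. With n=3 the floor of one produces four images, so one bucket is trimmed to zero.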

Shuffle within category

Image pools are sorted deterministically first (DuckDB DISTINCT does not guarantee order), then shuffled with a seeded RNG so the selection is reproducible.
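A minimal sketch of the sort-then-shuffle idea, assuming a single seeded random.Random instance per pool; the function name pick_from_pool is hypothetical, and the real sampler may share one RNG across categories.

```python
import random
from typing import Iterable


def pick_from_pool(pool: Iterable[str], k: int, seed: int = 42) -> list[str]:
    """Return k image IDs from pool, independent of the pool's input order."""
    ordered = sorted(pool)      # normalize order before any randomness
    rng = random.Random(seed)   # private RNG, does not touch global state
    rng.shuffle(ordered)
    return ordered[:k]
```

Sorting first is what makes the seed meaningful: without it, two runs that receive the same IDs in a different order would shuffle to different selections even with an identical seed.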

Filter to disk

Only image IDs that have a matching .jpg file on disk are kept. This guards against missing files without erroring out.
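The disk filter might look like the sketch below. The function name existing_only is hypothetical, and whether the real sampler filters before or after selection is not stated here.

```python
from pathlib import Path
from typing import Iterable, Union


def existing_only(image_ids: Iterable[str], images_dir: Union[str, Path]) -> list[str]:
    """Keep only IDs whose <images_dir>/<id>.jpg exists, preserving order."""
    root = Path(images_dir)
    return [i for i in image_ids if (root / f"{i}.jpg").is_file()]
```

IDs without a file are dropped silently rather than raising, so the resulting list can be shorter than the input.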

If data/derived/OASIS_data_long.csv is missing, the stratified sampler falls back to a uniform random sample across all 900 images using the same seed.

Custom image sets

Point image_set at a path to a newline-delimited text file of image IDs to run against any arbitrary subset of the 900 images:

image_set: configs/image_sets/negative_high_arousal.txt

negative_high_arousal.txt

Snake 2
Spider 5
Car crash 3

Each line is an OASIS image ID (the filename stem without the .jpg extension). This is the right approach for targeted reliability runs, stress tests, or any analysis that requires a hand-curated stimulus list.
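Reading such a file takes only a few lines. The sketch below assumes that surrounding whitespace is trimmed and blank lines are skipped; the actual loader's handling of those cases is not documented here, and the name load_image_ids is hypothetical.

```python
from pathlib import Path
from typing import Union


def load_image_ids(path: Union[str, Path]) -> list[str]:
    """Return the image IDs listed one per line in a text file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Assumption: ignore blank lines and trim stray whitespace.
    return [line.strip() for line in lines if line.strip()]
```

Note that IDs may contain spaces (Car crash 3), so each line is kept whole rather than split on whitespace.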

Why image_set is part of the canonical hash

The image set is included in the run’s canonical_hash. Switching image_set from pilot_10 to full_900 defines a different experiment, so OASIS-LLM requires a new name in the YAML to reflect this. This is intentional: it prevents a partially completed pilot run from being silently extended with a different stimulus pool. By contrast, samples_per_image is not part of the hash, so you can increase it on an existing run and only the new sample indices are enqueued, with no renaming required.
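To make the behavior concrete, here is one way a hash with these properties could be computed. This is a sketch only: the use of SHA-256 over sorted-key JSON, the EXCLUDED set, and the model field in the usage note are assumptions for illustration, not the project's documented scheme.

```python
import hashlib
import json

# Assumption: fields that can change without defining a new experiment.
EXCLUDED = {"samples_per_image"}


def canonical_hash(config: dict) -> str:
    """Hash the identity-defining fields of a run config."""
    identity = {k: v for k, v in config.items() if k not in EXCLUDED}
    # Sorted keys and fixed separators give a stable byte representation.
    payload = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this sketch, {"model": "m", "image_set": "pilot_10", "samples_per_image": 5} and the same config with samples_per_image set to 20 hash identically, while changing image_set to full_900 produces a different hash.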
