OASIS-LLM is a research harness that submits the 900 OASIS affective images to vision-language models and records their valence and arousal ratings for comparison against human norms. This guide takes you from a fresh clone to a running pilot in a few minutes.
Prerequisites
Before you start, make sure you have:
- Python 3.12 or later (check with python --version)
- uv, the package manager used to install and run OASIS-LLM
- An API key for at least one supported provider (OpenRouter, OpenAI, Anthropic, or Google), or a local Ollama installation
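The first two prerequisites can be checked programmatically. The following is a minimal sketch, not part of OASIS-LLM itself: the function name check_prerequisites is hypothetical, and it only assumes that uv is discoverable on your PATH under the name uv.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `version_info` and `which` are injectable so the check can be exercised
    without changing the real interpreter or PATH.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, 3.12 or later required"
        )
    if which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

The script prints nothing when both checks pass. It does not check for an API key or an Ollama installation.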
Create your environment file
Copy the example environment file to .env, then open .env and fill in the key for the provider you want to use. You only need to set one.
OpenRouter gives you access to hundreds of models, including free-tier models, under a single key. It is the easiest way to get started if you do not already have a direct provider key.
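To confirm that at least one key is actually set, you could scan .env for non-empty values. This is a rough sketch under loud assumptions: the variable names in CANDIDATE_KEYS are illustrative guesses, not names confirmed by this guide, so substitute the names that appear in the repository's example environment file. The parser handles only plain KEY=VALUE lines.

```python
from pathlib import Path

# ASSUMPTION: these variable names are illustrative only. Replace them with
# the names used in the repository's example environment file.
CANDIDATE_KEYS = (
    "OPENROUTER_API_KEY",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def configured_providers(text, candidates=CANDIDATE_KEYS):
    """Return the candidate keys that have a non-empty value."""
    values = parse_env(text)
    return [key for key in candidates if values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    found = configured_providers(env_path.read_text()) if env_path.exists() else []
    print("Configured keys:", found or "none")
```

An empty result is expected if you rely on a local Ollama installation rather than a hosted provider.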
Download the OASIS images
The 900 OASIS images are licensed under CC BY-NC-SA 4.0 by the original authors and are not bundled with this repository. Download them from osf.io/6pnd7 and unpack the archive into the OASIS/images/ directory.
Smoke test your setup
Before committing to a full run, run the smoke test to verify that your API key and image path are working correctly. It sends 3 images through the pipeline with a single valence rating each and prints the results to the terminal. It takes under a minute and costs a fraction of a cent.
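If the smoke test fails on the image path, a quick count of the unpacked files can narrow down the problem. The sketch below is a standalone check, not a feature of OASIS-LLM. The set of accepted file extensions is an assumption; adjust it to match the archive you downloaded.

```python
from pathlib import Path

EXPECTED_IMAGES = 900  # size of the OASIS set, as stated in this guide
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # ASSUMPTION: accepted extensions


def count_images(directory):
    """Count files under `directory` (recursively) with an image-like suffix."""
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    found = count_images("OASIS/images")
    status = "OK" if found == EXPECTED_IMAGES else "CHECK"
    print(f"{status}: found {found} of {EXPECTED_IMAGES} images")
```

A count of 0 usually means the archive was unpacked somewhere other than OASIS/images/.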
Launch the dashboard
Open the Streamlit dashboard to browse the image set, design experiments, and monitor runs. The dashboard opens at http://localhost:8501. From there you can inspect the OASIS image set, create datasets, and preview cost estimates before running anything from the CLI.
Run a pilot from the CLI
Launch a 30-image pilot run using one of the example configs, picking the config that matches your provider. Each pilot config targets 30 stratified images × 5 samples per image (150 trials) on both valence and arousal dimensions. The run is idempotent: if it is interrupted, you can re-run the same command and it will resume from where it left off.
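To illustrate what idempotent resumption means in practice, here is a minimal sketch of the general pattern. It is not the harness's actual implementation: plan_trials and run_pending are hypothetical names, the in-memory dict stands in for whatever persistent store the harness uses, and the rating dimension is left out for brevity.

```python
from itertools import product


def plan_trials(image_ids, samples_per_image):
    """Enumerate every (image_id, sample_index) pair a run should cover."""
    return list(product(image_ids, range(samples_per_image)))


def run_pending(trials, done, rate):
    """Execute only the trials missing from `done`, recording each result.

    `done` maps trial -> result. Because finished trials are skipped,
    calling this again after an interruption repeats no work.
    """
    for trial in trials:
        if trial in done:
            continue  # finished in an earlier invocation
        done[trial] = rate(trial)
    return done
```

With 30 image IDs and 5 samples per image, plan_trials yields the 150 trials mentioned above. Calling run_pending a second time with the same done mapping makes no further calls to rate.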
Check run status
While a run is in progress, or after it completes, you can check status across all runs or inspect a specific run by its ID. The table shows counts of done, pending, and failed trials, along with cumulative cost in USD.
Next steps
With your first pilot complete, you can open the dashboard’s Analysis page to compare the model’s ratings against the human norms, or read Configuration to learn how to customise run parameters like samples_per_image, max_concurrency, and prompt settings.