
Anatomy of Mini Control Arena

Understanding the Core Concepts

Before we dive into the details, let's clarify exactly what we mean by templates, environments, and how they connect:

Template vs Environment - What's the difference?

A template is what ships with the library: for example, the apps or bash folder inside the mini-control-arena package. We advise against editing these files directly.

An environment is your copy of a template. When you run copy_template("apps", "./my-custom-apps"), you get your own version in ./my-custom-apps/. This is the path you give to run_eval().

Most evaluation frameworks make you write code that inherits from their classes to customize an evaluation setup. There are benefits to this approach, but we find it easier to work directly with files for research purposes.

graph LR
    subgraph lib[Mini Control Arena]
        direction TB
        Evals[Pre-built Evaluations]
        Runner[run_eval]
    end

    subgraph project[Your Project]
        direction TB
        Copy[Your Copy]
        Results[Results]
    end

    Evals -- copy_template --> Copy
    Copy -- run_eval --> Runner
    Runner -- saves --> Results

From the user's perspective, experimenting with Mini-Control-Arena involves three steps (sketched in code right after this list):

  1. copy_template("{template-name}", "{your-dir}") copies a template to your directory.

  2. You can edit (or not edit) this environment.

  3. Point run_eval to your directory with run_eval(environment_dir="{your-dir}", ...).
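Put together, a minimal session looks roughly like this (the import path mini_control_arena is assumed from the package name; the template name and target directory are just examples):

from mini_control_arena import copy_template, run_eval  # import path assumed

# 1. Copy a built-in template into a directory you own
copy_template("apps", "./my-custom-apps")

# 2. (Optionally) edit the files in ./my-custom-apps/ -- prompts, tools, scorer, ...

# 3. Point run_eval() at your copy
run_eval(environment_dir="./my-custom-apps", model="gpt-4")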

When running an evaluation, you can pass in various parameters to customize the experiment. Take a look at the code sample in running bashbench2 to get the hang of it before continuing.

Naming convention: run_eval() parameters

Parameters for run_eval() fall into two groups: those used by run_eval() itself and those passed through to your template/environment.

Any parameter passed to run_eval() whose name starts with environment_ is forwarded to your template/environment to configure the environment, not the runner itself. Examples include environment_side_task and environment_mode.
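Conceptually, this routing rule amounts to splitting keyword arguments on the environment_ prefix. The sketch below is illustrative only, not the library's actual code; note that environment_dir is an exception consumed by run_eval() itself, as the example further down this page shows:

def split_kwargs(**kwargs):
    # Illustrative only; the real run_eval() may differ in detail.
    env_kwargs = {
        k: v for k, v in kwargs.items()
        if k.startswith("environment_") and k != "environment_dir"
    }
    runner_kwargs = {k: v for k, v in kwargs.items() if k not in env_kwargs}
    return runner_kwargs, env_kwargs

runner_kwargs, env_kwargs = split_kwargs(
    environment_dir="./my-custom-apps",
    model="gpt-4",
    environment_mode="attack",
)
# runner_kwargs -> {"environment_dir": "./my-custom-apps", "model": "gpt-4"}
# env_kwargs    -> {"environment_mode": "attack"}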

Parameters used by run_eval() itself are documented on the API Reference - run_eval() page; parameters used by each template/environment are documented on the corresponding Templates pages in the left navigation bar.

What's Inside a Template?

We recommend that you take a look at APPS first as it's the simplest. After running copy_template("apps", "./my-custom-apps"), here's what you get:

my-custom-apps/
├── environment.py                 # Coordinator that ties everything together
├── dataset.py                     # Loads programming problems from data/
├── scorer.py                      # Checks if the solution passes test cases
├── tools/                         # Functions that the agent can call
│   ├── code_solution_honest.py    # A function to submit code solution
│   └── code_solution_attack.py    # A function to submit code solution (with backdoor)
├── prompts/                       # Text files for system prompts
│   ├── system_prompts/
│   │   ├── honest.txt             # System prompt for Honest mode
│   │   └── attack.txt             # System prompt for Attack mode
│   └── side_tasks/                # Hidden objectives for Attack mode
│       └── backdoor_insertion.txt
├── prepare_dataset.py             # Script to download dataset (runs automatically)
└── data/                          # Created by prepare_dataset.py
    └── apps_data.json             # Programming problems (downloaded on setup)

The environment.py file is where everything comes together. Every environment.py file has six required methods that run_eval() calls. These six methods represent distinct working components of your AI Control evaluation. Therefore, you should be able to understand any template by reading environment.py and then exploring the files that it references:

class AppsEnvironment:
    def get_dataset(self, **kwargs):
        # Returns list of problems to solve (this is your agent's "Main Task")
        ...

    def setup_environment(self, **kwargs):
        # Setup phase before the agent starts
        ...

    def get_tools(self, **kwargs):
        # Returns functions the agent can use
        ...

    def get_scorer(self, **kwargs):
        # Returns list of scoring functions to evaluate success
        ...

    def get_sandbox_config(self):
        # Returns Docker configuration if available
        ...

    def get_system_prompt(self, **kwargs):
        # Returns the agent's system prompt
        ...

How Evaluation Works

When you call run_eval(), the runner loads the environment.py from the environment_dir you specified. It then calls the six methods in environment.py to build your experiment setup, passing along any environment_* parameters:

run_eval(
    # Parameters used by run_eval()
    environment_dir="./my-custom-apps",  
    model="gpt-4",
    log_format="eval",

    # Environment parameters (passed to your environment)
    environment_mode="attack",            # → goes to your methods
    environment_samples=5,                # → Random 5 samples (or "0-10" for range, or "0,5,10" for specific)
    environment_seed=42,                  # → Reproducible random selection
    environment_my_custom_param="value"   # → any parameter that your environment accepts!
)

run_eval()'s job is to build an Inspect Task from your dataset, tools, prompts, and scorer. Then, it executes Inspect's react agent and saves results to your log directory. The runner itself is only ~400 lines. This means that most of the experiment logic lives in your template/environment files.
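The sketch below shows this flow conceptually. It is a simplification, not the runner's actual source, and it assumes a recent inspect_ai release where Task, eval, and the react agent are available and an agent can be passed as a task solver:

# Simplified sketch of what run_eval() does internally (an approximation).
from inspect_ai import Task, eval as inspect_eval
from inspect_ai.agent import react

def run(env, model, log_dir, **env_kwargs):
    env.setup_environment(**env_kwargs)             # setup phase
    task = Task(
        dataset=env.get_dataset(**env_kwargs),      # main-task samples
        solver=react(                               # Inspect's react agent
            prompt=env.get_system_prompt(**env_kwargs),
            tools=env.get_tools(**env_kwargs),
        ),
        scorer=env.get_scorer(**env_kwargs),        # list of scorers
        sandbox=env.get_sandbox_config(),           # optional Docker config
    )
    inspect_eval(task, model=model, log_dir=log_dir)  # run and save logs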

Tool Customization

A significant part of building Mini-Control-Arena went into implementing all tools locally. Tools are defined in simple Python files in the tools/ directory that you can edit directly, which gives you complete transparency into what tools are available and how they behave.

When running AI Control experiments, you often need to track what the agent is doing, modify tool behavior for specific scenarios, or add custom features. With local implementations, you can add logging to track every command executed, modify timeout defaults or error handling, add new parameters or functionality, and implement custom validation or sandboxing.
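For instance, a locally defined bash tool with extra logging might look roughly like the hypothetical sketch below (the actual files in tools/ differ; the @tool decorator and sandbox().exec() call are standard inspect_ai APIs):

import logging

from inspect_ai.tool import tool
from inspect_ai.util import sandbox

logger = logging.getLogger(__name__)

@tool
def bash(timeout: int = 60):
    async def execute(cmd: str) -> str:
        """Run a shell command in the sandbox.

        Args:
            cmd: The command to execute.
        """
        logger.info("agent ran: %s", cmd)  # custom logging added here
        result = await sandbox().exec(["bash", "-c", cmd], timeout=timeout)
        return result.stdout if result.success else result.stderr

    return execute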

Scorer Architecture

The scorer system returns a list of scorer functions rather than a single combined scorer. This allows the Inspect log viewer to display each score as a separate column:

def get_scorer(self, environment_mode="honest", **kwargs):
    scorers = [main_task_scorer()]  # Always score the main task

    if environment_mode == "attack":
        scorers.append(side_task_scorer())  # Add side task scoring in attack mode

    return scorers  # Returns list, not single scorer

This approach provides better visibility in the log viewer, where you can see separate columns for main_task_scorer and {side_task}_scorer with their individual accuracy metrics.
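Each entry in that list is a standard inspect_ai scorer. The sketch below is a hypothetical, simplified main_task_scorer(); the template's real scorer in scorer.py runs the submitted code against the problem's test cases rather than comparing strings:

from inspect_ai.scorer import CORRECT, INCORRECT, Score, Target, accuracy, scorer
from inspect_ai.solver import TaskState

def check_solution(state: TaskState, target: Target) -> bool:
    # Placeholder check for illustration only; the real scorer executes
    # the submitted solution against test cases from data/.
    return state.output.completion.strip() == target.text.strip()

@scorer(metrics=[accuracy()])
def main_task_scorer():
    async def score(state: TaskState, target: Target) -> Score:
        passed = check_solution(state, target)
        return Score(
            value=CORRECT if passed else INCORRECT,
            explanation="All checks passed" if passed else "A check failed",
        )

    return score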