Viewing Evaluation Logs

Mini Control Arena uses Inspect AI's logging system to save evaluation results. This page covers how to view, share, and work with these logs.

Log Format and Location

By default, evaluation logs are saved to ./logs/ in your current working directory. Each log is stored either in Inspect AI's efficient binary .eval format or as .json.

./logs/
├── eval_20250107_123456.eval
└── eval_20250107_234567.json
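
To see which logs a directory holds before opening the viewer, a small helper can group them by extension. This is a minimal sketch using only the standard library; the function name find_log_files is hypothetical and not part of Mini Control Arena or Inspect AI.

```python
from pathlib import Path


def find_log_files(log_dir="./logs"):
    """Return log files under log_dir, grouped by extension (.eval or .json)."""
    root = Path(log_dir)
    found = {".eval": [], ".json": []}
    if not root.is_dir():
        return found  # no logs directory yet, so nothing to report
    # Search recursively so logs kept in per-experiment subdirectories are included.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in found:
            found[path.suffix].append(path)
    return found
```

Calling find_log_files("./logs") returns a dictionary with one sorted list of paths per format, and empty lists if the directory does not exist.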

Viewing Logs Interactively

The easiest way to explore your evaluation results is with Inspect's built-in viewer.

inspect view --log-dir ./logs

This opens an interactive web interface where you can browse samples, filter by scores, and drill into individual conversations to see tool calls and model responses.

Converting Logs to Different Formats

Inspect AI's native logging format can be hard to work with directly. Mini Control Arena provides conversion utilities to export logs to more common formats.

Log Conversion Functions

Mini Control Arena provides two log conversion functions. The convert_logs function handles batch conversion of multiple log files to a default format with metadata. The convert_logs_oai_sft function converts logs to OpenAI's fine-tuning format.

convert_logs

The convert_logs function handles multiple log files or directories at once, preserving directory structure and adding metadata.

from mini_control_arena import convert_logs

# Convert every log under a single directory.
result = convert_logs(
    paths_to_convert=["./logs"],
    target_dir="./logs_converted",
    require_submit=True,
    preserve_structure=True
)

# Convert several directories and an individual log file in one call.
result = convert_logs(
    paths_to_convert=["logs/exp1", "logs/exp2", "logs/exp3/eval.eval"],
    target_dir="./all_logs_converted"
)

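The converted files are in JSONL format: the first line holds the conversion metadata, and each line after that holds one sample. The sample below is pretty-printed for readability; in the file it occupies a single line.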

{"_metadata": {"source_file": "logs/eval.eval", "format": "default", "total_samples": 20, "require_submit": true, "model": "openai/gpt-4"}}

{
  "sample_id": "1",
  "sample_index": 0,
  "messages": [
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Please help me with this task..."},
    {
      "role": "assistant",
      "content": "I'll help you with that.",
      "reasoning": "First, I need to understand what the user wants...",
      "tool_calls": [
        {"function": "bash", "arguments": {"cmd": "ls"}, "id": "call_123"}
      ]
    },
    {"role": "tool", "tool_call_id": "call_123", "content": "file1.txt\nfile2.txt"},
    {"role": "assistant", "content": "I found 2 files."}
  ],
  "scores": {"main_task": 0.75, "completion": 1.0},
  "metadata": {
    "total_messages": 5,
    "message_types": {"system": 1, "user": 1, "assistant": 2, "tool": 1},
    "has_assistant_response": true,
    "complete": true,
    "failed": false
  }
}
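
A converted file in this layout can be read back with the standard library alone. The sketch below assumes the structure shown above (a first line with a _metadata key, then one sample per line with numeric scores); the helper names load_converted and mean_scores are hypothetical and not part of Mini Control Arena.

```python
import json
from collections import defaultdict


def load_converted(path):
    """Split a converted JSONL file into its metadata header and its samples."""
    metadata, samples = {}, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if "_metadata" in record:
                metadata = record["_metadata"]
            else:
                samples.append(record)
    return metadata, samples


def mean_scores(samples):
    """Average each numeric score over the samples that report it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for sample in samples:
        for name, value in (sample.get("scores") or {}).items():
            # Skip non-numeric scores (assumption: some scorers may emit strings).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[name] += value
                counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}
```

For example, metadata, samples = load_converted("converted.jsonl") followed by mean_scores(samples) gives one average per score name, which can be a quick sanity check before a deeper analysis.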

convert_logs_oai_sft

The convert_logs_oai_sft function converts logs to OpenAI's fine-tuning format for training custom models.

from mini_control_arena import convert_logs_oai_sft

# Convert a single log file.
convert_logs_oai_sft(
    paths_to_convert=["logs/eval.eval"],
    target_dir="fine_tune_data/",
    require_submit=True,
    cut_after_submit=True,
    only_used_tools=True
)

# Convert several directories and an individual log file in one call.
convert_logs_oai_sft(
    paths_to_convert=["logs/exp1/", "logs/exp2/", "logs/special.eval"],
    target_dir="fine_tune_data/",
    preserve_structure=True
)

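Each sample in OpenAI's fine-tuning format contains the messages and the tool definitions. Unlike the default format, tool call arguments are JSON-encoded strings rather than nested objects.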

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Please write a bash script to..."},
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {
          "id": "call_xyz123",
          "type": "function",
          "function": {
            "name": "bash",
            "arguments": "{\"cmd\": \"ls -la\"}"
          }
        }
      ]
    },
    {"role": "tool", "content": "total 24\ndrwxr-xr-x...", "tool_call_id": "call_xyz123"},
    {"role": "assistant", "content": "I see the files. Now let me create the script..."}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "bash",
        "description": "Execute bash commands",
        "parameters": {
          "type": "object",
          "properties": {
            "cmd": {"type": "string", "description": "Command to execute"}
          },
          "required": ["cmd"]
        }
      }
    }
  ],
  "parallel_tool_calls": false
}
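
Before uploading fine-tuning data, it can help to check that each sample is internally consistent. The sketch below is an illustrative check based only on the structure shown above, not an official OpenAI validator; the function name check_sft_sample is hypothetical. It does not flag tool calls that never receive a response, on the assumption that a sample cut after the submit call may legitimately end that way.

```python
import json


def check_sft_sample(sample):
    """Return a list of problems found in one fine-tuning sample (empty if none)."""
    problems = []
    declared = {tool["function"]["name"] for tool in sample.get("tools", [])}
    seen_call_ids = set()  # ids of tool calls issued by earlier assistant messages
    for i, msg in enumerate(sample.get("messages", [])):
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("tool_calls") or []:
                fn = call["function"]
                if fn["name"] not in declared:
                    problems.append(f"message {i}: tool '{fn['name']}' is not declared")
                try:
                    json.loads(fn["arguments"])  # arguments must be a JSON string
                except (TypeError, ValueError):
                    problems.append(f"message {i}: arguments are not a JSON string")
                seen_call_ids.add(call["id"])
        elif role == "tool":
            if msg.get("tool_call_id") not in seen_call_ids:
                problems.append(f"message {i}: no matching tool call")
    return problems
```

An empty list means the sample passed these particular checks; it does not guarantee that the fine-tuning service will accept the file.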

Reviewing Converted Files in Terminal

After converting logs to JSONL format, you can review them using standard command-line tools without needing the Inspect viewer. The first line contains metadata about the conversion. Lines 2 and onward each contain one sample.

Review Samples Interactively

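The loops below step through up to 20 samples, one at a time, and assume bash and jq are available. The first prints each sample in full, the second prints a compact summary with each message's content truncated to 120 characters, and the third prints the same summary fields followed by the first five messages in full.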

for i in {2..21}; do
  echo -e "\n========== Sample $((i-1)) =========="
  sed -n "${i}p" converted.jsonl | jq .
  read -p "Press enter for next sample (Ctrl+C to quit)..."
done

for i in {2..21}; do
  echo -e "\n========== Sample $((i-1)) =========="
  sed -n "${i}p" converted.jsonl | jq '{
    sample_id,
    msg_count: (.messages | length),
    types: .metadata.message_types,
    messages: .messages | map({role, content: (.content[:120] // "")})
  }'
  read -p "Press enter for next sample (Ctrl+C to quit)..."
done

for i in {2..21}; do
  echo -e "\n========== Sample $((i-1)) =========="
  sed -n "${i}p" converted.jsonl | jq '{
    sample_id,
    msg_count: (.messages | length),
    types: .metadata.message_types,
    first_5_messages: .messages[:5]
  }'
  read -p "Press enter for next sample (Ctrl+C to quit)..."
done
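
If jq is not available, the compact summary can be approximated in Python. This is a rough equivalent of the second loop above, under the same assumptions about the file layout (metadata on line 1, one sample per line after that); the names iter_samples and summarize are hypothetical.

```python
import json


def iter_samples(path, limit=20):
    """Yield up to `limit` samples from a converted file, skipping the metadata line."""
    with open(path, encoding="utf-8") as f:
        next(f, None)  # the first line is the metadata header
        yielded = 0
        for line in f:
            if yielded >= limit:
                break
            if line.strip():
                yield json.loads(line)
                yielded += 1


def summarize(sample, max_chars=120):
    """Build a compact view of one sample with truncated message content."""
    def preview(content):
        if content is None:
            return ""
        # Non-string content (for example a list of parts) is shown as JSON text.
        text = content if isinstance(content, str) else json.dumps(content)
        return text[:max_chars]

    messages = sample.get("messages", [])
    return {
        "sample_id": sample.get("sample_id"),
        "msg_count": len(messages),
        "types": (sample.get("metadata") or {}).get("message_types"),
        "messages": [
            {"role": m.get("role"), "content": preview(m.get("content"))}
            for m in messages
        ],
    }
```

A loop such as for sample in iter_samples("converted.jsonl"): print(json.dumps(summarize(sample), indent=2)) prints the summaries in sequence, without the interactive pause of the shell version.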

Sharing Logs

You can create a self-contained web directory that others can view without installing Inspect.

inspect view bundle --log-dir logs --output-dir logs-www

The output directory can be deployed to GitHub Pages or any static hosting service.

If you share logs online, you must take measures to keep the scheming examples out of LLM pretraining data, where they might affect model behavior in unintended ways. The output of inspect view bundle should include a robots.txt by default, but additional measures are needed. We trust you to choose carefully when to share and when not to.
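
One small check before deploying is to confirm that the bundle really does contain a robots.txt asking all crawlers to stay out. The sketch below is a simplified parser written for illustration (it handles only a single User-agent line per group), and the function name robots_blocks_all is hypothetical. Keep in mind that robots.txt is advisory and crawlers can ignore it, so passing this check does not replace the additional measures mentioned above.

```python
from pathlib import Path


def robots_blocks_all(bundle_dir="logs-www"):
    """Return True if bundle_dir/robots.txt disallows the whole site for all agents."""
    robots = Path(bundle_dir) / "robots.txt"
    if not robots.is_file():
        return False
    applies_to_all = False  # True while inside a "User-agent: *" group
    for raw in robots.read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies_to_all = value == "*"
        elif field == "disallow" and applies_to_all and value == "/":
            return True
    return False
```

If robots_blocks_all("logs-www") returns False, inspect the bundle by hand before publishing it.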