Rollout

Rollout helps you run agent tasks from your own code and review what happened in a web UI. Datasets, runs, rollouts, and traces are first-class: the SDK call your production agent makes is the same call a training loop or eval harness makes.

For the hosted product, see rollout.mv37.org. For the announcement essay, see /rollout. Source lives at github.com/mv37-org/rollout.

Start here

If you are new to Rollout, read these sections in order: Concepts for the vocabulary, Web UI for sign-in and keys, SDKs to start runs from code, and CLI for terminal use.

For hosted Rollout, you need:

A Rollout account.
Access to a Rollout workspace.
An API key if you want to use an SDK from local code.

An API key is a secret string that lets your code call Rollout. Treat it like a password. Do not commit it to Git.

Most examples use https://mv37.org as the hosted Rollout URL. If your team uses a different URL, replace https://mv37.org with your team’s URL. For local development, use http://127.0.0.1:8080 after you start the local backend.

Concepts

Workspace

A workspace is a shared place for a team or project. Workspaces keep API keys, datasets, files, runs, traces, and settings separate. If you work on more than one project, you may have more than one workspace.

API key

An API key is a secret string used by code to call Rollout. Use an API key with the Python SDK or TypeScript SDK. Keep it private, because anyone with the key may be able to use your workspace.

Dataset

A dataset is a named collection of tasks. For example, a dataset could contain 100 customer support questions, 50 research papers, or 20 browser tasks. Your code starts a run from a dataset or task set, then Rollout sends the tasks to your code.

Task

A task is one unit of work inside a dataset. A task can include:

An instruction, such as “Summarize this paper.”
Input data, such as a title or URL.
Files, such as PDFs or text files.
An expected output schema, which describes the shape of a good answer.
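
As a rough mental model, the parts of a task listed above could be represented like this in Python. The class name, field names, and schema format are illustrative assumptions, not the SDK's actual types.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TaskSketch:
    """Hypothetical shape of one task; every field name here is an assumption."""

    instruction: str
    input: dict[str, Any] = field(default_factory=dict)
    files: list[str] = field(default_factory=list)
    # Description of what a good answer looks like (JSON-Schema-like, assumed).
    output_schema: Optional[dict[str, Any]] = None


task = TaskSketch(
    instruction="Summarize this paper.",
    input={"title": "Example paper", "url": "https://example.com/paper.pdf"},
    files=["paper.pdf"],
    output_schema={"type": "object", "required": ["summary"]},
)
```

Only the instruction is required in this sketch; input, files, and the schema default to empty values.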

Run

A run is one execution of a dataset or task set. When your code starts a run, Rollout returns the tasks for that run. Your code can then start one rollout attempt for each task.

Rollout

A rollout is one attempt to complete one task. If you run the same task several times, each attempt is a separate rollout. This lets you compare outputs, errors, latency, cost, and trace events.
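
To make the comparison concrete, here is a small sketch that groups several attempts by task and summarizes them. The record keys and the "completed" status value are assumptions made for illustration; in practice you would read these values from the web UI or from your own logs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical attempt records; the keys are assumptions for illustration.
attempts = [
    {"task": "paper-1", "status": "completed", "latency_ms": 820, "cost_usd": 0.0042},
    {"task": "paper-1", "status": "failed", "latency_ms": 1310, "cost_usd": 0.0051},
    {"task": "paper-2", "status": "completed", "latency_ms": 640, "cost_usd": 0.0030},
]


def summarize_by_task(records):
    """Group attempts by task and compute simple per-task statistics."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["task"]].append(record)
    return {
        task: {
            "attempts": len(items),
            "failed": sum(1 for item in items if item["status"] == "failed"),
            "mean_latency_ms": mean(item["latency_ms"] for item in items),
        }
        for task, items in grouped.items()
    }


summary = summarize_by_task(attempts)
```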

Trace

A trace is the timeline of what happened during a rollout. Trace events can include model messages, tool calls, tool results, errors, timing, token counts, and cost.

S3-compatible storage

S3-compatible storage is object storage that exposes the same API as Amazon S3. Examples include Amazon S3 itself, MinIO, and other object storage providers. Rollout can connect to this kind of storage so you can browse files and attach them to tasks.

Web UI

The Rollout web UI is where you manage work and inspect results. Use the UI first if you are new. It is the easiest place to create API keys, upload or import datasets, connect files, and review traces.

Sign in and workspaces

Open your Rollout URL in a browser.
Select the sign-in button.
Finish sign-in with your MV37 account.
After sign-in, Rollout opens your workspace.

If sign-in sends you back to the sign-in page, ask your workspace owner to invite your account. Use the workspace switcher in the sidebar to choose which workspace you want to work in.

SDK and CLI requests also use a workspace. If the UI shows one workspace but your code sends requests to another, you may not see the expected datasets or traces.

API keys

Open Settings.
Open API keys.
Select Create key.
Give the key a clear name, such as local python runner.
Create the key.
Copy the secret value right away.

Rollout only shows the full secret once. Save it in a password manager or a local environment variable. Do not paste it into source code that will be committed to Git. To rotate, create a new key before deleting the old one.
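
If you keep the key in an environment variable, a small helper can fail early with a readable message when the variable is missing. This is a minimal sketch using only the standard library; the function name load_api_key is hypothetical and not part of the SDK.

```python
import os


def load_api_key(env=None, name="ROLLOUT_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; create a key under Settings > API keys.")
    return key
```

Calling load_api_key() at startup keeps the secret out of source code and surfaces a missing key before any run is started.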

Datasets

A dataset is a set of tasks. SDK clients start runs from datasets or task sets, so create or import a dataset before running tasks from code.

Open Datasets.
Select Create dataset.
Add a dataset name.
Add at least one task.
For each task, add the instruction your code should follow.
Add input fields if your task needs structured input.
Add files if the task needs documents or other file data.
Create the dataset.

Use Import dataset when your dataset already exists outside the UI. You can Upload ZIP to upload a Harbor ZIP file, or Import from S3 to import a zipped Harbor dataset from S3-compatible storage. Harbor is the dataset format Rollout uses for structured agent tasks.

If GitHub is connected, you can export a dataset to a repository so dataset changes are reviewed with the same pull request flow as code.

Environments

Use Environments to prototype the world an agent works in before you hand it to a larger rollout runner. An environment is a versioned workspace asset that can link datasets, define state, mock tools and APIs, seed a DuckDB database, manage browser-local files, compute rewards, and preview verifiers.

Open Environments.
Select New environment or start from Gallery.
Fill in the goal, success criteria, base image, budgets, timeouts, and state contract.
Link one or more datasets and choose a sample task.
Add database seed SQL, API mocks, simulator state, and virtual filesystem fixtures as needed.
Configure reward, rollout, and evaluation modules.
Save a version when the prototype is ready.

The environment lab runs in the browser. Python hooks run through Pyodide. Database previews run through DuckDB-WASM. Files live in the browser virtual filesystem. Logs collect state changes, tool calls, database queries, filesystem diffs, reward output, and verifier previews. You can use the Edgent assistant from an environment to write or revise code modules such as env.py, observations.py, rewards.py, and rollouts.py.

Files

Use Files when tasks need documents or other file inputs.

Open Files.
Select Connect S3 bucket.
Enter the bucket, region, prefix, and credentials for your S3-compatible storage.
Save the connection.
Browse folders or upload files.

After files are connected, you can attach them while creating dataset tasks. When SDK code starts a rollout, the task payload includes file references and the SDK can download those files to local paths.

Troubleshooting. If the file browser is empty, check the bucket name, prefix, region, endpoint URL, and credentials. Also confirm you are in the expected workspace.

Traces

Open Traces.
Use Group by to choose how runs are grouped.
Select a group.
Select a run inside that group.
Select a rollout inside that run, if there is more than one.
Read the timeline.

The timeline shows events your SDK code logged: messages, tool calls, tool results, errors, latency, tokens, and cost. There is a Gantt-style view for timing and a flat view for prose. Unfinished traces are shown as stalled when they have not logged activity for the workflow’s markStalledAfterSeconds / mark_stalled_after_seconds window. The default window is 900 seconds.
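
The stalled rule can be pictured as a simple comparison. The sketch below is an assumption about how such a check might look, not the server's actual logic; in particular, whether the boundary is strict (more than the window) or inclusive is a guess.

```python
DEFAULT_STALLED_AFTER_SECONDS = 900  # default window stated above


def is_stalled(finished, last_activity_at, now, window_seconds=DEFAULT_STALLED_AFTER_SECONDS):
    """Return True when an unfinished trace has been quiet for longer than the window.

    Times are plain seconds, for example Unix timestamps.
    """
    if finished:
        return False  # finished traces are never shown as stalled
    return (now - last_activity_at) > window_seconds
```

For example, an unfinished trace whose last event was logged 901 seconds ago would count as stalled under the default window, while a finished trace never would.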

The default grouping is by group_id / groupId. Use this for a conversation ID, session ID, job ID, or eval batch ID. If your SDK code does not pass a group ID, the run appears under Ungrouped runs. You can also group by workflow, agent, dataset, task set, or status.
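
As an illustration of the default grouping, the following sketch buckets run records by group ID and falls back to the Ungrouped runs label. The record shape (dicts with id and group_id keys) is a hypothetical stand-in, not the API's response format.

```python
from collections import defaultdict

UNGROUPED = "Ungrouped runs"  # label used for runs without a group ID


def group_runs(runs):
    """Bucket run IDs by group_id, falling back to the ungrouped label."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.get("group_id") or UNGROUPED].append(run["id"])
    return dict(groups)


runs = [
    {"id": "run-1", "group_id": "thread-42"},
    {"id": "run-2", "group_id": "thread-42"},
    {"id": "run-3"},  # no group ID passed by the SDK code
]
grouped = group_runs(runs)
```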

If no traces appear, check that your SDK code:

Uses the correct API key.
Uses the correct workspace.
Calls start_run or startRun.
Passes the expected group_id / groupId if you are looking inside a specific group.
Calls start_rollout or startRollout.
Calls finish.

If your code logs custom events, check that it calls log_event in Python or logEvent in TypeScript.

Team and settings

Workspace owners can invite teammates.

Open Settings.
Open Team.
Select Invite member.
Enter the teammate’s email address.
Send the invite.

The invited person must sign in with that email address to join the workspace.

Use Settings > Appearance to choose the theme and accent color for the active workspace. These settings only change how the UI looks for you in that workspace. They do not change SDK behavior.

SDKs

Use a Rollout SDK when your local code should run tasks and send results back. Both SDKs can start a run, read the tasks in that run, start a rollout attempt for each task, download task files to local paths, log trace events, attach workflow and group metadata, and finish the rollout with an output or error.

Trace grouping

Every run gets a generated run ID. Every rollout attempt gets a generated trace ID. You can optionally pass:

workflowName / workflow_name: a human-readable workflow, such as Customer support bot or Invoice extraction.
groupId / group_id: a stable application ID that groups related runs, such as a conversation ID, session ID, job ID, or eval batch ID.
markStalledAfterSeconds / mark_stalled_after_seconds: how long an unfinished trace can go without logging activity before the UI shows it as stalled.

Python SDK

Install the package and create an API key in the Rollout UI.

pip install mv37.rollout

export ROLLOUT_API_KEY="mv37_rl_..."
export ROLLOUT_WORKSPACE="my-workspace"

Create a client:

import os

from mv37.rollout import Rollout

rollout = Rollout(
    api_key=os.environ["ROLLOUT_API_KEY"],
    base_url=os.environ.get("ROLLOUT_BASE_URL", "https://mv37.org"),
    workspace=os.environ.get("ROLLOUT_WORKSPACE"),
)

Start a run and process tasks:

from pathlib import Path
from typing import Any


def answer_task(instruction: str, task_input: dict[str, Any], files: list[dict[str, Any]]) -> dict[str, Any]:
    file_names = [Path(file["path"]).name for file in files if file.get("path")]
    return {
        "summary": f"Replace this with your answer for: {instruction}",
        "input": task_input,
        "files_used": file_names,
    }


run = rollout.start_run(
    "paper-summary-v1",
    workflow_name="Paper QA",
    group_id="thread-42",
    mark_stalled_after_seconds=900,
)

for task in run.tasks:
    attempt = run.start_rollout(task)
    files = attempt.materialize_files()

    try:
        attempt.message("Starting task", role="system", task_name=task.name)
        result = answer_task(
            instruction=task.task.instruction,
            task_input=task.task.input,
            files=files,
        )
        attempt.finish(output=result)
    except Exception as error:
        attempt.error(str(error))
        attempt.finish(status="failed", error=str(error))
        raise

materialize_files() downloads task files and returns file records with a local path. If you do not pass a directory, the SDK creates a temporary directory for you. attempt.id is the rollout attempt ID. attempt.trace_id is the generated trace ID shown in trace payloads and summaries.
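
If you repeat the try/except pattern in many places, one option is to wrap it in a context manager so that finish is always called. The sketch below is an illustration, not part of the SDK: finished_attempt is a hypothetical helper, and FakeAttempt is a stand-in that only records calls so the example runs without the SDK installed. It relies solely on the error and finish calls shown in the loop above.

```python
from contextlib import contextmanager


@contextmanager
def finished_attempt(attempt):
    """Call finish() exactly once, mirroring the try/except pattern above.

    The body stores its answer in result["output"].
    """
    result = {}
    try:
        yield result
    except Exception as error:
        attempt.error(str(error))
        attempt.finish(status="failed", error=str(error))
        raise
    else:
        attempt.finish(output=result.get("output"))


class FakeAttempt:
    """Stand-in for a rollout attempt; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def error(self, message):
        self.calls.append(("error", message))

    def finish(self, **kwargs):
        self.calls.append(("finish", kwargs))


ok = FakeAttempt()
with finished_attempt(ok) as result:
    result["output"] = {"summary": "done"}

bad = FakeAttempt()
try:
    with finished_attempt(bad):
        raise ValueError("model timed out")
except ValueError:
    pass  # the helper re-raises after marking the attempt as failed
```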

Log trace events:

attempt.message("Drafting answer", role="assistant")
attempt.tool_call("search", {"query": "project notes"})
attempt.tool_result("search", {"matches": 3})
attempt.log_event("score", {"value": 0.92})

attempt.log_event(
    "model_call",
    {"model": "example-model"},
    latency_ms=820,
    cost_usd=0.0042,
    tokens={"input": 1200, "output": 180},
)

The SDK batches events and sends them automatically. finish() flushes any events that are still waiting. The SDK records startedAt, endedAt, startOffsetMs, and endOffsetMs for every event using a monotonic clock, so fast events are not limited by wall-clock timestamp precision. If you pass latency_ms, the SDK treats the event as a span that ended when it was logged unless you also provide explicit offsets.
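
The timing rule can be sketched with the standard library. This is an assumption about how such bookkeeping might work, not the SDK's real implementation; EventTimer and its span method are hypothetical names.

```python
import time


class EventTimer:
    """Sketch of monotonic offset bookkeeping for trace events."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._origin = clock()  # moment the rollout attempt started

    def span(self, latency_ms=None):
        """Return (start_offset_ms, end_offset_ms) for an event logged now."""
        end_ms = (self._clock() - self._origin) * 1000.0
        # With latency_ms, the event is treated as a span that ended at log time.
        start_ms = end_ms - latency_ms if latency_ms is not None else end_ms
        return max(start_ms, 0.0), end_ms


timer = EventTimer()
start_ms, end_ms = timer.span(latency_ms=820)
```

A monotonic clock never jumps backwards when the system time is adjusted, which is why offsets measured this way stay ordered even for events that are only microseconds apart.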

Set the workspace on the client or per run:

rollout = Rollout(
    api_key=os.environ["ROLLOUT_API_KEY"],
    base_url="https://mv37.org",
    workspace="my-workspace",
)

run = rollout.start_run("paper-summary-v1", workspace="my-workspace")

If neither value is set, the SDK checks the ROLLOUT_WORKSPACE environment variable.
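
The lookup order could be pictured as follows. The environment fallback is the documented behavior; letting the per-run value win over the client value is an assumption, and resolve_workspace is a hypothetical helper rather than an SDK function.

```python
import os


def resolve_workspace(run_workspace=None, client_workspace=None, env=None):
    """Pick a workspace: per-run value, then client value, then ROLLOUT_WORKSPACE.

    The run-over-client order is assumed; the env fallback is documented above.
    """
    env = os.environ if env is None else env
    return run_workspace or client_workspace or env.get("ROLLOUT_WORKSPACE")
```

If the function returns None, no workspace was configured anywhere, which is worth checking when the expected datasets or traces do not appear.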

TypeScript SDK

npm install @mv37/rollout

export ROLLOUT_API_KEY="mv37_rl_..."
export ROLLOUT_WORKSPACE="my-workspace"

Create a client:

import { Rollout } from "@mv37/rollout";

const rollout = new Rollout({
  apiKey: process.env.ROLLOUT_API_KEY!,
  baseUrl: process.env.ROLLOUT_BASE_URL ?? "https://mv37.org",
  workspace: process.env.ROLLOUT_WORKSPACE,
});

Start a run and process tasks:

import path from "node:path";
import { Rollout, type TaskFileRef } from "@mv37/rollout";

type Answer = {
  summary: string;
  input: Record<string, unknown>;
  filesUsed: string[];
};

async function answerTask(
  instruction: string,
  input: Record<string, unknown>,
  files: TaskFileRef[],
): Promise<Answer> {
  return {
    summary: `Replace this with your answer for: ${instruction}`,
    input,
    filesUsed: files
      .map((file) => (typeof file.path === "string" ? path.basename(file.path) : null))
      .filter((name): name is string => Boolean(name)),
  };
}

const run = await rollout.startRun("paper-summary-v1", {
  workflowName: "Paper QA",
  groupId: "thread-42",
  markStalledAfterSeconds: 900,
});

for (const task of run.tasks) {
  const attempt = await run.startRollout(task);
  const files = await attempt.materializeFiles();

  try {
    await attempt.message("Starting task", "system", { taskName: task.name });
    const result = await answerTask(task.task.instruction, task.task.input, files);
    await attempt.finish({ output: result });
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    await attempt.error(message);
    await attempt.finish({ status: "failed", error: message });
    throw error;
  }
}

materializeFiles() downloads task files and returns records with a local path. This method requires a Node.js runtime because it writes files to disk.

Log trace events:

await attempt.message("Drafting answer", "assistant");
await attempt.toolCall("search", { query: "project notes" });
await attempt.toolResult("search", { matches: 3 });
await attempt.logEvent("score", { value: 0.92 });

await attempt.logEvent({
  type: "model_call",
  payload: { model: "example-model" },
  latencyMs: 820,
  costUsd: 0.0042,
  tokens: { input: 1200, output: 180 },
});

The SDK batches events and sends them automatically. finish() flushes any events that are still waiting. The SDK records timing for every event using a monotonic clock. If you pass latencyMs, the SDK treats the event as a span that ended when it was logged unless you also provide explicit offsets.

CLI

The Rollout CLI is the command line tool named rollout. Use it from your terminal to log in, check who you are logged in as, choose a workspace, and list datasets.

Install and login

The installer requires curl.

curl -fsSL https://rollout.work/install | sh

The installer downloads the CLI to ~/.rollout/bin/rollout by default and adds that directory to your shell's PATH. Open a new terminal after installing, then check that the command is available:

rollout --help

Then log in:

rollout login

The CLI opens a browser window. Finish sign-in in the browser, then return to your terminal. If the browser does not open, print the login URL instead:

rollout login --no-browser

If you already know the workspace slug, pass it during login:

rollout login --workspace my-workspace

Commands

# Check your account
rollout whoami

# Show the active workspace
rollout workspace

# Change the active workspace
rollout workspace my-workspace

# List datasets
rollout datasets list
# or
rollout list datasets

# Show the config file path
rollout config path

# Log out
rollout logout

rollout whoami prints your user, active workspace, and API key preview. The preview is not the full secret. When you change the active workspace, the CLI checks that your saved key can access it; if it cannot, log in again with that workspace using rollout login --workspace my-workspace. The config file stores the saved base URL, API key, API key ID, workspace, and creation time. rollout logout removes the saved local credential from your machine.
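
A key preview typically shows just enough of the key to tell keys apart without revealing the secret. The CLI's actual preview format is not specified here, so the sketch below is one plausible approach under that assumption; key_preview is a hypothetical helper.

```python
def key_preview(secret, head=8, tail=4):
    """Return a short, non-secret preview of an API key (format assumed)."""
    if len(secret) <= head + tail:
        # Too short to show anything without exposing most of the key.
        return "*" * len(secret)
    return f"{secret[:head]}...{secret[-tail:]}"


preview = key_preview("mv37_rl_abcdefghijklmnop")
```

The preview is useful for logs and support requests because it identifies which key is in use while the full secret stays in the config file or environment.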

Local development

If your team has a different Rollout URL, pass it with --base-url or set the environment variable:

rollout --base-url https://rollout.example.com whoami

export ROLLOUT_BASE_URL="https://rollout.example.com"
rollout whoami

When running the backend locally, run the CLI from the monorepo:

uv run --project packages/cli rollout login --base-url http://127.0.0.1:8080

Local stack

Rollout is a monorepo with three packages: packages/frontend (React, TypeScript, Vite, Tailwind, shadcn-style UI), packages/backend (FastAPI, SQLAlchemy, MV37 OAuth, Postgres-backed sessions), and packages/cli (the standalone Python rollout CLI and hosted installer assets).

Required environment variables:

ROLLOUT_MV37_ISSUER_URL
ROLLOUT_MV37_CLIENT_ID
ROLLOUT_MV37_CLIENT_SECRET
ROLLOUT_MV37_REDIRECT_URL
NEXT_PUBLIC_APP_URL
AUTH_SECRET
DATABASE_URL

Start the local stack:

make setup
make db-start
make init-db
make dev

The API runs on http://127.0.0.1:8080 and the frontend on http://127.0.0.1:5173. make db-start uses local Postgres binaries (initdb, pg_ctl, psql, and createdb) and stores its data in .postgres/data. Verify with make test.

Documentation lives in the repo and is built with VitePress:

pnpm docs:dev
pnpm docs:build

Point the SDK at localhost for local development:

export ROLLOUT_BASE_URL="http://127.0.0.1:8080"
export ROLLOUT_API_KEY="mv37_rl_..."

Source

The repository is open at github.com/mv37-org/rollout. For use cases, feedback, or rough edges, email v@mv37.org.
