
# Session Retrospective Tool — Design Spec

## Problem

After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in `/cost` command shows aggregate USD but no breakdown by task, phase, or impact.

## Objective

Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.

## Architecture Overview

```
During Session                          After Session
─────────────                           ─────────────
/mn:start-tracking                      /mn:session-retrospective
     │                                       │
     ▼                                       ▼
analytics-db MCP                        Haskell executable
     │                                       │
     ▼                                       ├─→ Reads JSONL files
claude_analytics DB                          │    (tokens, timestamps, tools)
  ┌───────────────────┐                      │
  │ cc_sessions       │◄─────────────────────├─→ Queries cc_session_phases
  │ cc_session_phases │                      │    (verdicts, tasks, notes)
  └───────────────────┘                      │
                                             ├─→ Reads .meta.json
                                             │    (subagent descriptions)
                                             │
                                             ▼
                                        Markdown report
                                        ai_files/session-reports/
```

## Components

### 1. Database: `claude_analytics`

A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated `analytics-db` MCP instance in `~/.claude/.mcp.json`.

#### `cc_sessions`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_uuid` | `TEXT NOT NULL UNIQUE` | Claude Code session UUID |
| `project_slug` | `TEXT NOT NULL` | e.g. `-var-www-assets-cedrusconsult-com-ser-dev` |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled by retrospective |
| `description` | `TEXT` | What the session was about |
| `total_output_tokens` | `INT` | Filled by retrospective |
| `total_input_tokens` | `INT` | Filled by retrospective |
| `total_cache_read_tokens` | `INT` | Filled by retrospective |
| `total_cache_create_tokens` | `INT` | Filled by retrospective |
| `claude_time_seconds` | `REAL` | Filled by retrospective |
| `tool_time_seconds` | `REAL` | Filled by retrospective |
| `human_time_seconds` | `REAL` | Filled by retrospective |

#### `cc_session_phases`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_id` | `INT REFERENCES cc_sessions` | |
| `task` | `TEXT NOT NULL` | e.g. "fix-login-bug" |
| `phase` | `TEXT NOT NULL` | e.g. "diagnose-redirect" |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled when phase ends |
| `actions_summary` | `TEXT` | e.g. "Grep, Read Server.hs, Read Auth.hs" |
| `verdict` | `TEXT NOT NULL` | One of the verdict categories |
| `subagents` | `TEXT` | Comma-separated descriptions of dispatched subagents |
| `useful_info` | `TEXT` | Facts/code locations discovered |
| `lessons_learned` | `TEXT` | Meta-observations for self-learning |
| `notes` | `TEXT` | Freeform |

### 2. Verdict Categories

| Verdict | Meaning |
|---------|---------|
| `high_impact` | Pivotal — unlocked the solution or saved significant downstream work |
| `moderate_impact` | Directly contributed to the outcome |
| `small_impact` | Minor direct contribution |
| `exploratory_useful` | Exploration that paid off |
| `exploratory_waste` | Exploration that led nowhere but was reasonable to try |
| `avoidable_waste` | Should have known better |
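
The `verdict` column is plain `TEXT`, while the Haskell side (see "Key types") models verdicts as a sum type, so the tool needs some mapping between the two. The following is a minimal sketch, assuming the column holds exactly the six strings in the table above. The helper names `verdictToText` and `parseVerdict` are hypothetical and do not come from the spec.

```haskell
import Data.List (find)

-- Mirrors the Verdict type from the "Key types" section.
data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste
  deriving (Show, Eq, Enum, Bounded)

-- Hypothetical helper: the string stored in cc_session_phases.verdict.
verdictToText :: Verdict -> String
verdictToText v = case v of
  HighImpact        -> "high_impact"
  ModerateImpact    -> "moderate_impact"
  SmallImpact       -> "small_impact"
  ExploratoryUseful -> "exploratory_useful"
  ExploratoryWaste  -> "exploratory_waste"
  AvoidableWaste    -> "avoidable_waste"

-- Hypothetical helper: Nothing for any string outside the six categories.
parseVerdict :: String -> Maybe Verdict
parseVerdict s = find ((== s) . verdictToText) [minBound .. maxBound]
```

Deriving the parser from `verdictToText` keeps the two directions from drifting apart if a category is added later.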

### 3. Haskell Executable: `session-retrospective`

**Location:** `~/.claude/tools/session-retrospective/` (standalone project, own `stack.yaml`)

**CLI:**
```bash
stack exec session-retrospective -- <session-uuid>
```

**Modules:**

| Module | Responsibility |
|--------|---------------|
| `SessionRetrospective.Main` | Entry point — parse args, orchestrate |
| `SessionRetrospective.Jsonl` | Parse JSONL files into typed records (Aeson) |
| `SessionRetrospective.Phases` | Query `cc_session_phases` from PG, match to JSONL messages by timestamp |
| `SessionRetrospective.Subagents` | Parse subagent JSONLs + `.meta.json`, compute per-subagent stats |
| `SessionRetrospective.TimeAnalysis` | Classify timestamp gaps into claude/tool/human time |
| `SessionRetrospective.Report` | Generate markdown report from computed stats |

**Key types:**

```haskell
import Data.Text (Text)
import Data.Time (NominalDiffTime, UTCTime)

-- Token usage for one message, phase, subagent, or whole session.
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  } deriving (Show, Eq)

-- One constructor per verdict category (section 2).
data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste
  deriving (Show, Eq, Ord, Enum, Bounded)

data Phase = Phase
  { phTask           :: !Text
  , phName           :: !Text
  , phStartedAt      :: !UTCTime
  , phEndedAt        :: !(Maybe UTCTime)  -- Nothing if phase still open
  , phVerdict        :: !Verdict
  , phActionsSummary :: !(Maybe Text)     -- actions_summary is nullable in the schema
  , phSubagents      :: !(Maybe Text)
  , phUsefulInfo     :: !(Maybe Text)
  , phLessonsLearned :: !(Maybe Text)
  , phNotes          :: !(Maybe Text)
  -- Computed by joining with JSONL:
  , phTokens         :: !TokenCounts
  , phClaudeTime     :: !NominalDiffTime
  , phToolTime       :: !NominalDiffTime
  , phToolTurns      :: !Int
  , phTextTurns      :: !Int
  , phToolResultSize :: !Int  -- input bloat from tool results
  } deriving (Show, Eq)

data SubagentStats = SubagentStats
  { saDescription :: !Text
  , saAgentType   :: !Text
  , saTokens      :: !TokenCounts
  , saTime        :: !NominalDiffTime
  , saPhase       :: !Text
  , saVerdict     :: !Verdict
  } deriving (Show, Eq)
```

**Dependencies:** `aeson`, `postgresql-simple`, `time`, `text`, `bytestring`, `filepath`, `directory`

**DB access:** Uses `postgresql-simple` (not Squeal). This is a standalone tool, not part of the ser platform `src/`.

### 4. JSONL Data Available

Session logs are located at `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`, with subagent logs at `<session-uuid>/subagents/agent-<id>.jsonl`.

#### Streaming Deduplication (CRITICAL)

The JSONL file logs **streaming events**, not final messages. Multiple entries can share the same `.requestId`, each representing an incremental chunk of one API call. Token counts within a `requestId` are cumulative, so only the final chunk has the correct totals.

**Deduplication rule:** Group assistant messages by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals.
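
The deduplication rule above can be sketched as a map keyed by `requestId`. This is an illustrative sketch only: the `Chunk` record and the functions `dedupe` and `sessionTotals` are hypothetical names, and it assumes every assistant entry has already been parsed and carries a `requestId`.

```haskell
import qualified Data.Map.Strict as Map

-- Mirrors TokenCounts from the "Key types" section.
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  } deriving (Show, Eq)

-- Hypothetical record: one assistant JSONL entry after parsing.
data Chunk = Chunk
  { chRequestId :: String
  , chUsage     :: TokenCounts
  }

-- Keep, per requestId, the usage of the chunk with the highest output_tokens.
dedupe :: [Chunk] -> Map.Map String TokenCounts
dedupe = Map.fromListWith pickFinal . map (\c -> (chRequestId c, chUsage c))
  where
    pickFinal new old
      | tcOutput new > tcOutput old = new
      | otherwise                   = old

-- Sum the deduplicated per-request counts to get session totals.
sessionTotals :: [Chunk] -> TokenCounts
sessionTotals = foldr add (TokenCounts 0 0 0 0) . Map.elems . dedupe
  where
    add (TokenCounts o i r c) (TokenCounts o' i' r' c') =
      TokenCounts (o + o') (i + i') (r + r') (c + c')
```

Summing the raw entries instead would count each request once per streamed chunk, which is why the grouping step comes first.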

#### Per assistant message
- `.message.usage.output_tokens` — Claude generation tokens (includes thinking tokens)
- `.message.usage.input_tokens` — fresh input tokens
- `.message.usage.cache_read_input_tokens` — cached context
- `.message.usage.cache_creation_input_tokens` — new cache entries
- `.message.content[]` — tool calls (`type: "tool_use"`, `name`, `input`), text (`type: "text"`), and thinking (`type: "thinking"`, text present but no separate token count — cost is included in `output_tokens`)
- `.timestamp` — ISO 8601
- `.requestId` — groups streaming chunks from a single API call

#### Per user message
- `.message.content` — either a string (human text) or an array containing `{"type": "tool_result", ...}` and/or `{"type": "text", ...}` entries
- `.timestamp`

**Content format note:** Human text can appear as either `"content": "text"` (string) or `"content": [{"type": "text", "text": "..."}]` (array). The parser must handle both.

#### Subagent matching

`agent-<id>.meta.json` contains `agentType` and `description`. The `description` matches the Agent tool_use call's `description` in the main session JSONL.

**Fallback for missing `.meta.json`:** The `.meta.json` files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:
1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
2. Extract the description from the Agent tool_use `input.description` field
3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
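
The fallback steps above might look roughly like the following. It is a simplified sketch under stated assumptions: `AgentCall`, `overlaps`, and `describeSubagent` are hypothetical names, each Agent call is assumed to have a known start and end time, and the first overlapping call wins (parallel subagents whose windows overlap would need a stricter tie-break). The timestamp type is left polymorphic so the same code works with `UTCTime`.

```haskell
import Data.List (find)

-- Hypothetical: an Agent tool_use call in the main session, with the
-- window from the call to its matching tool_result.
data AgentCall t = AgentCall
  { acDescription :: String  -- taken from input.description
  , acStart       :: t
  , acEnd         :: t
  }

-- Two closed intervals overlap if each starts before the other ends.
overlaps :: Ord t => (t, t) -> (t, t) -> Bool
overlaps (s1, e1) (s2, e2) = s1 <= e2 && s2 <= e1

-- Description for a subagent log spanning the given (first, last)
-- timestamps, or "unknown" when no Agent call overlaps it.
describeSubagent :: Ord t => [AgentCall t] -> (t, t) -> String
describeSubagent calls window =
  maybe "unknown" acDescription (find hit calls)
  where
    hit c = overlaps (acStart c, acEnd c) window
```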

#### Time classification from timestamp gaps

| Category | Detection |
|----------|-----------|
| Claude processing | Gap before `assistant` message |
| Tool execution | Gap between `assistant(tool_use)` → `user(tool_result)` |
| Human wait | Gap before `user` message with human text content |

Messages of type `progress`, `queue-operation`, and `file-history-snapshot` are ignored for time classification.
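
One way to read the table is that each gap between consecutive messages is attributed to the category of the message that ends it. The sketch below follows that reading. It assumes the messages are already sorted by timestamp and that the ignored message types have been filtered out; `MsgKind`, `Msg`, `TimeSplit`, and `classifyGaps` are hypothetical names.

```haskell
import Data.Time (NominalDiffTime, UTCTime (..), diffUTCTime, fromGregorian)

-- Hypothetical simplification of a parsed JSONL entry: only what
-- time classification needs.
data MsgKind = AssistantMsg | ToolResultMsg | HumanTextMsg
  deriving (Show, Eq)

data Msg = Msg { msgKind :: MsgKind, msgAt :: UTCTime }

data TimeSplit = TimeSplit
  { tsClaude :: NominalDiffTime
  , tsTool   :: NominalDiffTime
  , tsHuman  :: NominalDiffTime
  } deriving (Show, Eq)

-- Attribute each gap between consecutive messages to the category
-- determined by the message that ends the gap.
classifyGaps :: [Msg] -> TimeSplit
classifyGaps msgs = foldl step (TimeSplit 0 0 0) (zip msgs (drop 1 msgs))
  where
    step acc (prev, cur) =
      let gap = diffUTCTime (msgAt cur) (msgAt prev)
      in case msgKind cur of
           AssistantMsg  -> acc { tsClaude = tsClaude acc + gap }
           ToolResultMsg -> acc { tsTool   = tsTool acc + gap }
           HumanTextMsg  -> acc { tsHuman  = tsHuman acc + gap }

-- Example timeline, in seconds past an arbitrary midnight.
example :: [Msg]
example =
  [ Msg HumanTextMsg  (at 0)
  , Msg AssistantMsg  (at 4)   -- 4 s of Claude time
  , Msg ToolResultMsg (at 6)   -- 2 s of tool time
  , Msg AssistantMsg  (at 9)   -- 3 s of Claude time
  , Msg HumanTextMsg  (at 39)  -- 30 s of human time
  ]
  where
    at s = UTCTime (fromGregorian 2025 1 1) (fromInteger s)
```

For the example timeline, `classifyGaps example` yields 7 s of Claude time, 2 s of tool time, and 30 s of human time.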

#### Cross-matching phases to JSONL

The phase log records `started_at`/`ended_at` timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.
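
The window selection can be sketched as a simple filter. The spec does not say whether the boundaries are inclusive, so this sketch assumes a half-open window `[started_at, ended_at)`, which keeps a message on the boundary between two adjacent phases from being counted twice. The names `inWindow` and `messagesInPhase` are hypothetical, and the timestamp type is polymorphic so the code works with `UTCTime` as well as plain numbers.

```haskell
-- True when t lies in [start, end). An open phase (Nothing) has no
-- upper bound, so it extends to the end of the log.
inWindow :: Ord t => t -> Maybe t -> t -> Bool
inWindow start mEnd t = t >= start && maybe True (t <) mEnd

-- Select the messages belonging to one phase, given a function that
-- extracts each message's timestamp.
messagesInPhase :: Ord t => (msg -> t) -> t -> Maybe t -> [msg] -> [msg]
messagesInPhase stamp start mEnd = filter (inWindow start mEnd . stamp)
```

Phase-level token counts would then be computed over the selected messages, after the `requestId` deduplication described earlier.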

#### Tool result size estimation

`phToolResultSize` is computed as the total byte length of all `tool_result.content` strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
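
As a rough illustration of the estimate, the sketch below measures the UTF-8 byte length of each content string and divides the total by 4. It assumes the `tool_result.content` values have already been extracted as `Text`; the function name `estimateToolResultTokens` is hypothetical.

```haskell
import qualified Data.ByteString    as BS
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- Total UTF-8 byte length of the tool_result contents, divided by 4.
-- Integer division rounds down; the figure is only a rough proxy.
estimateToolResultTokens :: [T.Text] -> Int
estimateToolResultTokens contents =
  sum (map (BS.length . TE.encodeUtf8) contents) `div` 4
```

For example, two results of 8 and 4 ASCII bytes give an estimate of 3 tokens.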

### 5. Skills (MN Plugin)

Both skills live in `~/.claude/local-plugins/mn/skills/`.

#### `/mn:start-tracking`

1. Asks what the session is about (or takes description argument)
2. Determines the Claude Code session UUID by finding the most recently modified `.jsonl` file in `~/.claude/projects/<project-slug>/`. The file may not exist yet at the very start of a session, so the skill should verify that it exists and, if it does not, record the UUID after the first assistant turn
3. Inserts a row into `cc_sessions` via `analytics-db` MCP
4. Records `started_at`
5. Reminds Claude of the phase logging protocol and verdict categories

#### `/mn:session-retrospective`

1. Closes any open phases (sets `ended_at`)
2. Runs `stack exec session-retrospective -- <session-uuid>`
3. Report saved to `ai_files/session-reports/<date>-<description>.md`
4. Claude reviews and presents the report

### 6. MCP Configuration

A dedicated MCP instance `analytics-db` in `~/.claude/.mcp.json`:

```json
{
  "mcpServers": {
    "analytics-db": {
      "command": "node",
      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
      "env": {
        "DB_HOST": "localhost",
        "DB_USER": "assets_servant",
        "DB_PASSWORD": "...",
        "DB_NAME": "claude_analytics"
      }
    }
  }
}
```

Uses the same MCP server binary as `database-testing`, just with different database credentials.

Reserved exclusively for session tracking — never reconfigured for other databases.

### 6b. Schema Creation

The Haskell executable runs `CREATE TABLE IF NOT EXISTS` on startup for both tables (same pattern as the ser platform's `sqInitSchema`). No manual schema setup required — just create the `claude_analytics` database and the tool self-initializes.

### 6c. Report Output Location

Because this tool is project-agnostic, reports are saved to `~/session-retrospective/reports/<date>-<description>.md`, inside the tool's own project directory rather than any specific project's working tree.

### 7. Report Format

```markdown
# Session Retrospective: <date> — <description>

## Summary
| Metric | Value |
|--------|-------|
| Wall clock | ... |
| Claude time | ... |
| Tool time | ... |
| Human time | ... |
| Output tokens | ... |
| Input tokens (fresh) | ... |
| Cache read tokens | ... |
| Cache create tokens | ... |
| Phases | ... |

## Verdict Breakdown
| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|---------|--------|---------|--------|------------|-------------|-------------|----------|
| High impact | ... | ... | ... | ... | ... | ... | ... |
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
| Small impact | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |

## Subagents
| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|-------------|------|---------|--------|------------|-------------|------|-------|---------|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

## By Task
| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|------|--------|---------|--------|------------|-------------|-------------|-------------|
| ... | ... | ... | ... | ... | ... | ... | ... |

## Phase Detail
### 1. <task> — "<phase>" (<verdict>)
- Actions: ...
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
- Tool-heavy turns: ... | Text turns: ...
- Tool result input bloat: ~...K tokens
- Claude time: ... | Tool time: ...
- **Useful info:** ...
- **Lessons learned:** ...

## Lessons Summary
1. ...
2. ...
```
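
The "% of Out" column in the Verdict Breakdown table is a share of total output tokens. A minimal sketch of that aggregation follows, assuming each phase has already been reduced to a verdict label and its output token count; `outputShare` is a hypothetical name and the zero-total guard is an added assumption.

```haskell
import qualified Data.Map.Strict as Map

-- Input: (verdict label, output tokens) per phase.
-- Output: (verdict label, summed output tokens, percent of all output),
-- ordered by label.
outputShare :: [(String, Int)] -> [(String, Int, Double)]
outputShare phases =
  [ (v, n, pct n) | (v, n) <- Map.toList byVerdict ]
  where
    byVerdict = Map.fromListWith (+) phases
    total     = sum (Map.elems byVerdict)
    pct n
      | total == 0 = 0  -- avoid dividing by zero for an empty session
      | otherwise  = 100 * fromIntegral n / fromIntegral total
```

With phases of 300 and 100 output tokens tagged `high_impact` and one of 100 tagged `avoidable_waste`, the shares come out as 80% and 20%.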

## Explicitly Deferred

| Feature | Why |
|---------|-----|
| Cross-session index / accumulation | Start with standalone reports first |
| `/session-trends` meta-analysis | Depends on accumulation |
| Live mid-session cost dashboard | Not needed for core learning loop |
| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
| Dollar cost estimation | Token pricing changes; token counts are stable |
| Thinking token breakdown | `thinking` content blocks exist in JSONL with text, but no separate `thinking_tokens` field — cost is folded into `output_tokens` and cannot be isolated |