# Session Retrospective Tool — Design Spec

## Problem

After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in `/cost` command shows aggregate USD but no breakdown by task, phase, or impact.

## Objective

Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.

## Architecture Overview

```
 During Session                      After Session
 ──────────────                      ─────────────
 /mn:start-tracking                  /mn:session-retrospective
         │                                   │
         ▼                                   ▼
  analytics-db MCP                   Haskell executable
         │                                   │
         ▼                                   ├─→ Reads JSONL files
  claude_analytics DB                        │     (tokens, timestamps, tools)
  ┌──────────────┐                           │
  │ cc_sessions  │◄──────────────────────────├─→ Queries cc_session_phases
  │ cc_phases    │                           │     (verdicts, tasks, notes)
  └──────────────┘                           │
                                             ├─→ Reads .meta.json
                                             │     (subagent descriptions)
                                             ▼
                                      Markdown report
                                  ai_files/session-reports/
```

## Components

### 1. Database: `claude_analytics`

A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated `analytics-db` MCP instance in `~/.claude/.mcp.json`.

#### `cc_sessions`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_uuid` | `TEXT NOT NULL UNIQUE` | Claude Code session UUID |
| `project_slug` | `TEXT NOT NULL` | e.g. `-var-www-assets-cedrusconsult-com-ser-dev` |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled by retrospective |
| `description` | `TEXT` | What the session was about |
| `total_output_tokens` | `INT` | Filled by retrospective |
| `total_input_tokens` | `INT` | Filled by retrospective |
| `total_cache_read_tokens` | `INT` | Filled by retrospective |
| `total_cache_create_tokens` | `INT` | Filled by retrospective |
| `claude_time_seconds` | `REAL` | Filled by retrospective |
| `tool_time_seconds` | `REAL` | Filled by retrospective |
| `human_time_seconds` | `REAL` | Filled by retrospective |

#### `cc_session_phases`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_id` | `INT REFERENCES cc_sessions` | |
| `task` | `TEXT NOT NULL` | e.g. "fix-login-bug" |
| `phase` | `TEXT NOT NULL` | e.g. "diagnose-redirect" |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled when phase ends |
| `actions_summary` | `TEXT` | e.g. "Grep, Read Server.hs, Read Auth.hs" |
| `verdict` | `TEXT NOT NULL` | One of the verdict categories |
| `subagents` | `TEXT` | Comma-separated descriptions of dispatched subagents |
| `useful_info` | `TEXT` | Facts/code locations discovered |
| `lessons_learned` | `TEXT` | Meta-observations for self-learning |
| `notes` | `TEXT` | Freeform |

### 2. Verdict Categories

| Verdict | Meaning |
|---------|---------|
| `high_impact` | Pivotal — unlocked the solution or saved significant downstream work |
| `moderate_impact` | Directly contributed to the outcome |
| `small_impact` | Minor direct contribution |
| `exploratory_useful` | Exploration that paid off |
| `exploratory_waste` | Exploration that led nowhere but was reasonable to try |
| `avoidable_waste` | Should have known better |

### 3. Haskell Executable: `session-retrospective`

**Location:** `~/.claude/tools/session-retrospective/` (standalone project, own `stack.yaml`)

**CLI:**

```bash
stack exec session-retrospective -- <session-uuid>
```

**Modules:**

| Module | Responsibility |
|--------|----------------|
| `SessionRetrospective.Main` | Entry point — parse args, orchestrate |
| `SessionRetrospective.Jsonl` | Parse JSONL files into typed records (Aeson) |
| `SessionRetrospective.Phases` | Query `cc_session_phases` from PG, match to JSONL messages by timestamp |
| `SessionRetrospective.Subagents` | Parse subagent JSONLs + `.meta.json`, compute per-subagent stats |
| `SessionRetrospective.TimeAnalysis` | Classify timestamp gaps into claude/tool/human time |
| `SessionRetrospective.Report` | Generate markdown report from computed stats |

**Key types:**

```haskell
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  }

data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste

data Phase = Phase
  { phTask           :: !Text
  , phName           :: !Text
  , phStartedAt      :: !UTCTime
  , phEndedAt        :: !(Maybe UTCTime)  -- Nothing if phase still open
  , phVerdict        :: !Verdict
  , phActionsSummary :: !Text
  , phSubagents      :: !(Maybe Text)
  , phUsefulInfo     :: !(Maybe Text)
  , phLessonsLearned :: !(Maybe Text)
  , phNotes          :: !(Maybe Text)
    -- Computed by joining with JSONL:
  , phTokens         :: !TokenCounts
  , phClaudeTime     :: !NominalDiffTime
  , phToolTime       :: !NominalDiffTime
  , phToolTurns      :: !Int
  , phTextTurns      :: !Int
  , phToolResultSize :: !Int  -- input bloat from tool results
  }

data SubagentStats = SubagentStats
  { saDescription :: !Text
  , saAgentType   :: !Text
  , saTokens      :: !TokenCounts
  , saTime        :: !NominalDiffTime
  , saPhase       :: !Text
  , saVerdict     :: !Verdict
  }
```

**Dependencies:** `aeson`, `postgresql-simple`, `time`, `text`, `bytestring`, `filepath`, `directory`

**DB access:** Uses `postgresql-simple` (not Squeal).
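Since `verdict` is stored as `TEXT` in `cc_session_phases` but modeled as the `Verdict` sum type, the DB layer needs a total mapping between the two. A minimal sketch, assuming the identifiers from §2; the names `verdictText` and `parseVerdict` are illustrative, not fixed by this spec:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Mirrors the Verdict type from "Key types" above.
data Verdict
  = HighImpact | ModerateImpact | SmallImpact
  | ExploratoryUseful | ExploratoryWaste | AvoidableWaste
  deriving (Eq, Show, Enum, Bounded)

-- DB representation: the verdict category identifiers from §2.
verdictText :: Verdict -> Text
verdictText HighImpact        = "high_impact"
verdictText ModerateImpact    = "moderate_impact"
verdictText SmallImpact       = "small_impact"
verdictText ExploratoryUseful = "exploratory_useful"
verdictText ExploratoryWaste  = "exploratory_waste"
verdictText AvoidableWaste    = "avoidable_waste"

-- Inverse mapping; Nothing for any unrecognized identifier,
-- so a bad DB row fails loudly at parse time rather than silently.
parseVerdict :: Text -> Maybe Verdict
parseVerdict t = lookup t [(verdictText v, v) | v <- [minBound .. maxBound]]
```

Deriving `Enum`/`Bounded` keeps the inverse table exhaustive by construction: adding a seventh verdict extends both directions automatically.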
This is a standalone tool, not part of the ser platform `src/`.

### 4. JSONL Data Available

Located at `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`, with subagents at `<session-uuid>/subagents/agent-<agent-id>.jsonl`.

#### Streaming Deduplication (CRITICAL)

The JSONL logs **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative — only the final chunk has the correct totals.

**Deduplication rule:** Group assistant messages by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals.

#### Per assistant message:

- `.message.usage.output_tokens` — Claude generation tokens (includes thinking tokens)
- `.message.usage.input_tokens` — fresh input tokens
- `.message.usage.cache_read_input_tokens` — cached context
- `.message.usage.cache_creation_input_tokens` — new cache entries
- `.message.content[]` — tool calls (`type: "tool_use"`, `name`, `input`), text (`type: "text"`), and thinking (`type: "thinking"`, text present but no separate token count — cost is included in `output_tokens`)
- `.timestamp` — ISO 8601
- `.requestId` — groups streaming chunks from a single API call

#### Per user message:

- `.message.content` — either a string (human text) or an array containing `{"type": "tool_result", ...}` and/or `{"type": "text", ...}` entries
- `.timestamp`

**Content format note:** Human text can appear as either `"content": "text"` (string) or `"content": [{"type": "text", "text": "..."}]` (array). The parser must handle both.

#### Subagent matching

`agent-<agent-id>.meta.json` contains `agentType` and `description`. The `description` matches the Agent tool_use call's `description` in the main session JSONL.

**Fallback for missing `.meta.json`:** The `.meta.json` files are a recent Claude Code feature; older sessions do not have them. When absent, the Haskell tool should:

1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
2. Extract the description from the Agent tool_use `input.description` field
3. If no match is found, report the subagent with its agent ID and mark the description as "unknown"

#### Time classification from timestamp gaps:

| Category | Detection |
|----------|-----------|
| Claude processing | Gap before `assistant` message |
| Tool execution | Gap between `assistant(tool_use)` → `user(tool_result)` |
| Human wait | Gap before `user` message with human text content |

Messages of type `progress`, `queue-operation`, and `file-history-snapshot` are ignored for time classification.

#### Cross-matching phases to JSONL

The phase log records `started_at`/`ended_at` timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.

#### Tool result size estimation

`phToolResultSize` is computed as the total byte length of all `tool_result.content` strings within the phase's time window, divided by 4 (a rough bytes-to-tokens estimate). This is a structural proxy, not an exact count.

### 5. Skills (MN Plugin)

Both skills live in `~/.claude/local-plugins/mn/skills/`.

#### `/mn:start-tracking`

1. Asks what the session is about (or takes a description argument)
2. Determines the Claude Code session UUID by finding the most recently modified `.jsonl` file in `~/.claude/projects/<project-slug>/`. Note: the file may not exist yet at the very start of a session — the skill should verify the file exists, and if not, record the UUID after the first assistant turn
3. Inserts a row into `cc_sessions` via the `analytics-db` MCP
4. Records `started_at`
5. Reminds Claude of the phase logging protocol and verdict categories

#### `/mn:session-retrospective`

1. Closes any open phases (sets `ended_at`)
2. Runs `stack exec session-retrospective -- <session-uuid>`
3.
Report saved to `ai_files/session-reports/<date>-<slug>.md`
4. Claude reviews and presents the report

### 6. MCP Configuration

A dedicated MCP instance `analytics-db` in `~/.claude/.mcp.json`:

```json
{
  "mcpServers": {
    "analytics-db": {
      "command": "node",
      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
      "env": {
        "DB_HOST": "localhost",
        "DB_USER": "assets_servant",
        "DB_PASSWORD": "...",
        "DB_NAME": "claude_analytics"
      }
    }
  }
}
```

Uses the same MCP server binary as `database-testing`, just with different database credentials. Reserved exclusively for session tracking — never reconfigured for other databases.

### 6b. Schema Creation

The Haskell executable runs `CREATE TABLE IF NOT EXISTS` on startup for both tables (the same pattern as the ser platform's `sqInitSchema`). No manual schema setup is required — just create the `claude_analytics` database and the tool self-initializes.

### 6c. Report Output Location

Reports are saved to `~/session-retrospective/reports/<date>-<slug>.md` (inside the tool's own project directory, not inside any specific project's working tree), since this tool is project-agnostic.

### 7. Report Format

```markdown
# Session Retrospective: <description>

## Summary

| Metric | Value |
|--------|-------|
| Wall clock | ... |
| Claude time | ... |
| Tool time | ... |
| Human time | ... |
| Output tokens | ... |
| Input tokens (fresh) | ... |
| Cache read tokens | ... |
| Cache create tokens | ... |
| Phases | ... |

## Verdict Breakdown

| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|---------|--------|---------|--------|------------|--------------|-------------|----------|
| High impact | ... | ... | ... | ... | ... | ... | ... |
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
| Small impact | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |

## Subagents

| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|-------------|------|---------|--------|------------|--------------|------|-------|---------|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

## By Task

| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|------|--------|---------|--------|------------|--------------|-------------|-------------|
| ... | ... | ... | ... | ... | ... | ... | ... |

## Phase Detail

### 1. <task> — "<phase>" (<verdict>)

- Actions: ...
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
- Tool-heavy turns: ... | Text turns: ...
- Tool result input bloat: ~...K tokens
- Claude time: ... | Tool time: ...
- **Useful info:** ...
- **Lessons learned:** ...

## Lessons Summary

1. ...
2. ...
```

## Explicitly Deferred

| Feature | Why |
|---------|-----|
| Cross-session index / accumulation | Start with standalone reports first |
| `/session-trends` meta-analysis | Depends on accumulation |
| Live mid-session cost dashboard | Not needed for core learning loop |
| Automatic phase detection from JSONL | Manual tagging is more accurate — and is the point |
| Dollar cost estimation | Token pricing changes; token counts are stable |
| Thinking token breakdown | `thinking` content blocks exist in JSONL with text, but there is no separate `thinking_tokens` field — the cost is folded into `output_tokens` and cannot be isolated |
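As a closing illustration, the streaming deduplication rule from §4 — group chunks by `requestId`, keep the chunk with the highest `output_tokens` per group, sum across groups — can be sketched as pure Haskell. The `Chunk` record and `dedupTotals` name are hypothetical; the spec does not fix them:

```haskell
import qualified Data.Map.Strict as M
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Same shape as the TokenCounts type in §3.
data TokenCounts = TokenCounts
  { tcOutput, tcInput, tcCacheRead, tcCacheCreate :: !Int }
  deriving (Eq, Show)

-- One streaming event from the JSONL (illustrative record).
data Chunk = Chunk { chRequestId :: String, chUsage :: TokenCounts }

addTC :: TokenCounts -> TokenCounts -> TokenCounts
addTC (TokenCounts a b c d) (TokenCounts a' b' c' d') =
  TokenCounts (a + a') (b + b') (c + c') (d + d')

-- One API call = one requestId; only its final chunk (the one with
-- the highest cumulative output_tokens) carries the true totals.
dedupTotals :: [Chunk] -> TokenCounts
dedupTotals chunks =
  let groups  = M.fromListWith (++) [(chRequestId c, [c]) | c <- chunks]
      finalOf = chUsage . maximumBy (comparing (tcOutput . chUsage))
  in foldr (addTC . finalOf) (TokenCounts 0 0 0 0) (M.elems groups)
```

Summing chunks naively instead of taking each group's maximum would double-count every partial streaming event, which is exactly the failure mode the §4 rule guards against.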