Initial commit: design spec for session retrospective tool

Claude Code session efficiency analysis tool — self-assessment of action impact + token/time retrospectives via JSONL parsing.
2026-03-20 20:45:23 +00:00
commit 0033751900
2 changed files with 339 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,14 @@
+# Build artifacts
+.stack-work/
+*.hi
+*.o
+*.dyn_hi
+*.dyn_o
+
+# Reports (generated, local-only)
+reports/
+
+# Editor
+*~
+*.swp
+.vscode/
--- a/docs/design.md
+++ b/docs/design.md
@@ -0,0 +1,325 @@
+# Session Retrospective Tool — Design Spec
+
+## Problem
+
+After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in `/cost` command shows aggregate USD but no breakdown by task, phase, or impact.
+
+## Objective
+
+Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.
+
+## Architecture Overview
+
+```
+During Session                          After Session
+─────────────                           ─────────────
+/mn:start-tracking                      /mn:session-retrospective
+     │                                       │
+     ▼                                       ▼
+analytics-db MCP                        Haskell executable
+     │                                       │
+     ▼                                       ├─→ Reads JSONL files
+claude_analytics DB                     │    (tokens, timestamps, tools)
+  ┌──────────────┐                      │
+  │ cc_sessions  │◄─────────────────────├─→ Queries cc_session_phases
+  │ cc_phases    │                      │    (verdicts, tasks, notes)
+  └──────────────┘                      │
+                                        ├─→ Reads .meta.json
+                                        │    (subagent descriptions)
+                                        │
+                                        ▼
+                                   Markdown report
+                                   ai_files/session-reports/
+```
+
+## Components
+
+### 1. Database: `claude_analytics`
+
+A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated `analytics-db` MCP instance in `~/.claude/.mcp.json`.
+
+#### `cc_sessions`
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | `SERIAL PRIMARY KEY` | |
+| `session_uuid` | `TEXT NOT NULL UNIQUE` | Claude Code session UUID |
+| `project_slug` | `TEXT NOT NULL` | e.g. `-var-www-assets-cedrusconsult-com-ser-dev` |
+| `started_at` | `TIMESTAMPTZ NOT NULL` | |
+| `ended_at` | `TIMESTAMPTZ` | Filled by retrospective |
+| `description` | `TEXT` | What the session was about |
+| `total_output_tokens` | `INT` | Filled by retrospective |
+| `total_input_tokens` | `INT` | Filled by retrospective |
+| `total_cache_read_tokens` | `INT` | Filled by retrospective |
+| `total_cache_create_tokens` | `INT` | Filled by retrospective |
+| `claude_time_seconds` | `REAL` | Filled by retrospective |
+| `tool_time_seconds` | `REAL` | Filled by retrospective |
+| `human_time_seconds` | `REAL` | Filled by retrospective |
+
+#### `cc_session_phases`
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | `SERIAL PRIMARY KEY` | |
+| `session_id` | `INT REFERENCES cc_sessions` | |
+| `task` | `TEXT NOT NULL` | e.g. "fix-login-bug" |
+| `phase` | `TEXT NOT NULL` | e.g. "diagnose-redirect" |
+| `started_at` | `TIMESTAMPTZ NOT NULL` | |
+| `ended_at` | `TIMESTAMPTZ` | Filled when phase ends |
+| `actions_summary` | `TEXT` | e.g. "Grep, Read Server.hs, Read Auth.hs" |
+| `verdict` | `TEXT NOT NULL` | One of the verdict categories |
+| `subagents` | `TEXT` | Comma-separated descriptions of dispatched subagents |
+| `useful_info` | `TEXT` | Facts/code locations discovered |
+| `lessons_learned` | `TEXT` | Meta-observations for self-learning |
+| `notes` | `TEXT` | Freeform |
+
+### 2. Verdict Categories
+
+| Verdict | Meaning |
+|---------|---------|
+| `high_impact` | Pivotal — unlocked the solution or saved significant downstream work |
+| `moderate_impact` | Directly contributed to the outcome |
+| `small_impact` | Minor direct contribution |
+| `exploratory_useful` | Exploration that paid off |
+| `exploratory_waste` | Exploration that led nowhere but was reasonable to try |
+| `avoidable_waste` | Should have known better |
+
+### 3. Haskell Executable: `session-retrospective`
+
+**Location:** `~/.claude/tools/session-retrospective/` (standalone project, own `stack.yaml`)
+
+**CLI:**
+```bash
+stack exec session-retrospective -- <session-uuid>
+```
+
+**Modules:**
+
+| Module | Responsibility |
+|--------|---------------|
+| `SessionRetrospective.Main` | Entry point — parse args, orchestrate |
+| `SessionRetrospective.Jsonl` | Parse JSONL files into typed records (Aeson) |
+| `SessionRetrospective.Phases` | Query `cc_session_phases` from PG, match to JSONL messages by timestamp |
+| `SessionRetrospective.Subagents` | Parse subagent JSONLs + `.meta.json`, compute per-subagent stats |
+| `SessionRetrospective.TimeAnalysis` | Classify timestamp gaps into claude/tool/human time |
+| `SessionRetrospective.Report` | Generate markdown report from computed stats |
+
+**Key types:**
+
+```haskell
+data TokenCounts = TokenCounts
+  { tcOutput      :: !Int
+  , tcInput       :: !Int
+  , tcCacheRead   :: !Int
+  , tcCacheCreate :: !Int
+  }
+
+data Verdict
+  = HighImpact
+  | ModerateImpact
+  | SmallImpact
+  | ExploratoryUseful
+  | ExploratoryWaste
+  | AvoidableWaste
+
+data Phase = Phase
+  { phTask           :: !Text
+  , phName           :: !Text
+  , phStartedAt      :: !UTCTime
+  , phEndedAt        :: !(Maybe UTCTime)  -- Nothing if phase still open
+  , phVerdict        :: !Verdict
+  , phActionsSummary :: !Text
+  , phSubagents      :: !(Maybe Text)
+  , phUsefulInfo     :: !(Maybe Text)
+  , phLessonsLearned :: !(Maybe Text)
+  , phNotes          :: !(Maybe Text)
+  -- Computed by joining with JSONL:
+  , phTokens         :: !TokenCounts
+  , phClaudeTime     :: !NominalDiffTime
+  , phToolTime       :: !NominalDiffTime
+  , phToolTurns      :: !Int
+  , phTextTurns      :: !Int
+  , phToolResultSize :: !Int  -- input bloat from tool results
+  }
+
+data SubagentStats = SubagentStats
+  { saDescription :: !Text
+  , saAgentType   :: !Text
+  , saTokens      :: !TokenCounts
+  , saTime        :: !NominalDiffTime
+  , saPhase       :: !Text
+  , saVerdict     :: !Verdict
+  }
+```
+
+**Dependencies:** `aeson`, `postgresql-simple`, `time`, `text`, `bytestring`, `filepath`, `directory`
+
+**DB access:** Uses `postgresql-simple` (not Squeal). This is a standalone tool, not part of the ser platform `src/`.
+
+### 4. JSONL Data Available
+
+Located at `~/.claude/projects/<project-slug>/<session-uuid>.jsonl` with subagents at `<session-uuid>/subagents/agent-<id>.jsonl`.
+
+#### Streaming Deduplication (CRITICAL)
+
+The JSONL logs **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative — only the final chunk has the correct totals.
+
+**Deduplication rule:** Group assistant messages by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals.
+
+#### Per assistant message:
+- `.message.usage.output_tokens` — Claude generation tokens (includes thinking tokens)
+- `.message.usage.input_tokens` — fresh input tokens
+- `.message.usage.cache_read_input_tokens` — cached context
+- `.message.usage.cache_creation_input_tokens` — new cache entries
+- `.message.content[]` — tool calls (`type: "tool_use"`, `name`, `input`), text (`type: "text"`), and thinking (`type: "thinking"`, text present but no separate token count — cost is included in `output_tokens`)
+- `.timestamp` — ISO 8601
+- `.requestId` — groups streaming chunks from a single API call
+
+#### Per user message:
+- `.message.content` — either a string (human text) or an array containing `{"type": "tool_result", ...}` and/or `{"type": "text", ...}` entries
+- `.timestamp`
+
+**Content format note:** Human text can appear as either `"content": "text"` (string) or `"content": [{"type": "text", "text": "..."}]` (array). The parser must handle both.
+
+#### Subagent matching
+
+`agent-<id>.meta.json` contains `agentType` and `description`. The `description` matches the Agent tool_use call's `description` in the main session JSONL.
+
+**Fallback for missing `.meta.json`:** The `.meta.json` files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:
+1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
+2. Extract the description from the Agent tool_use `input.description` field
+3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
+
+#### Time classification from timestamp gaps:
+
+| Category | Detection |
+|----------|-----------|
+| Claude processing | Gap before `assistant` message |
+| Tool execution | Gap between `assistant(tool_use)` → `user(tool_result)` |
+| Human wait | Gap before `user` message with human text content |
+
+Messages of type `progress`, `queue-operation`, and `file-history-snapshot` are ignored for time classification.
+
+#### Cross-matching phases to JSONL
+
+The phase log records `started_at`/`ended_at` timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.
+
+#### Tool result size estimation
+
+`phToolResultSize` is computed as the total byte length of all `tool_result.content` strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
+
+### 5. Skills (MN Plugin)
+
+Both skills live in `~/.claude/local-plugins/mn/skills/`.
+
+#### `/mn:start-tracking`
+
+1. Asks what the session is about (or takes description argument)
+2. Determines the Claude Code session UUID by finding the most recently modified `.jsonl` file in `~/.claude/projects/<project-slug>/`. Note: the file may not exist yet at the very start of a session — the skill should verify the file exists, and if not, record the UUID after the first assistant turn
+3. Inserts a row into `cc_sessions` via `analytics-db` MCP
+4. Records `started_at`
+5. Reminds Claude of the phase logging protocol and verdict categories
+
+#### `/mn:session-retrospective`
+
+1. Closes any open phases (sets `ended_at`)
+2. Runs `stack exec session-retrospective -- <session-uuid>`
+3. Report saved to `ai_files/session-reports/<date>-<description>.md`
+4. Claude reviews and presents the report
+
+### 6. MCP Configuration
+
+A dedicated MCP instance `analytics-db` in `~/.claude/.mcp.json`:
+
+```json
+{
+  "mcpServers": {
+    "analytics-db": {
+      "command": "node",
+      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
+      "env": {
+        "DB_HOST": "localhost",
+        "DB_USER": "assets_servant",
+        "DB_PASSWORD": "...",
+        "DB_NAME": "claude_analytics"
+      }
+    }
+  }
+}
+```
+
+Uses the same MCP server binary as `database-testing`, just with different database credentials.
+
+Reserved exclusively for session tracking — never reconfigured for other databases.
+
+### 6b. Schema Creation
+
+The Haskell executable runs `CREATE TABLE IF NOT EXISTS` on startup for both tables (same pattern as the ser platform's `sqInitSchema`). No manual schema setup required — just create the `claude_analytics` database and the tool self-initializes.
+
+### 6c. Report Output Location
+
+Reports are saved to `~/session-retrospective/reports/<date>-<description>.md` (inside the project directory, not inside any specific project's working tree), since this tool is project-agnostic.
+
+### 7. Report Format
+
+```markdown
+# Session Retrospective: <date> — <description>
+
+## Summary
+| Metric | Value |
+|--------|-------|
+| Wall clock | ... |
+| Claude time | ... |
+| Tool time | ... |
+| Human time | ... |
+| Output tokens | ... |
+| Input tokens (fresh) | ... |
+| Cache read tokens | ... |
+| Cache create tokens | ... |
+| Phases | ... |
+
+## Verdict Breakdown
+| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
+|---------|--------|---------|--------|------------|-------------|-------------|----------|
+| High impact | ... | ... | ... | ... | ... | ... | ... |
+| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
+| Small impact | ... | ... | ... | ... | ... | ... | ... |
+| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
+| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
+| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |
+
+## Subagents
+| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
+|-------------|------|---------|--------|------------|-------------|------|-------|---------|
+| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+
+## By Task
+| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
+|------|--------|---------|--------|------------|-------------|-------------|-------------|
+| ... | ... | ... | ... | ... | ... | ... | ... |
+
+## Phase Detail
+### 1. <task> — "<phase>" (<verdict>)
+- Actions: ...
+- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
+- Tool-heavy turns: ... | Text turns: ...
+- Tool result input bloat: ~...K tokens
+- Claude time: ... | Tool time: ...
+- **Useful info:** ...
+- **Lessons learned:** ...
+
+## Lessons Summary
+1. ...
+2. ...
+```
+
+## Explicitly Deferred
+
+| Feature | Why |
+|---------|-----|
+| Cross-session index / accumulation | Start with standalone reports first |
+| `/session-trends` meta-analysis | Depends on accumulation |
+| Live mid-session cost dashboard | Not needed for core learning loop |
+| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
+| Dollar cost estimation | Token pricing changes; token counts are stable |
+| Thinking token breakdown | `thinking` content blocks exist in JSONL with text, but no separate `thinking_tokens` field — cost is folded into `output_tokens` and cannot be isolated |