session-retrospective/docs/design.md
Michel Nehme 0033751900 Initial commit: design spec for session retrospective tool
Claude Code session efficiency analysis tool — self-assessment of
action impact + token/time retrospectives via JSONL parsing.
2026-03-20 20:45:23 +00:00


Session Retrospective Tool — Design Spec

Problem

After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in /cost command shows aggregate USD but no breakdown by task, phase, or impact.

Objective

Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.

Architecture Overview

During Session                          After Session
─────────────                           ─────────────
/mn:start-tracking                      /mn:session-retrospective
     │                                       │
     ▼                                       ▼
analytics-db MCP                        Haskell executable
     │                                       │
     ▼                                       ├─→ Reads JSONL files
claude_analytics DB                     │    (tokens, timestamps, tools)
  ┌──────────────┐                      │
  │ cc_sessions  │◄─────────────────────├─→ Queries cc_session_phases
  │ cc_phases    │                      │    (verdicts, tasks, notes)
  └──────────────┘                      │
                                        ├─→ Reads .meta.json
                                        │    (subagent descriptions)
                                        │
                                        ▼
                                   Markdown report
                                   ai_files/session-reports/

Components

1. Database: claude_analytics

A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated analytics-db MCP instance in ~/.claude/.mcp.json.

cc_sessions

| Column | Type | Notes |
|--------|------|-------|
| id | SERIAL PRIMARY KEY | |
| session_uuid | TEXT NOT NULL UNIQUE | Claude Code session UUID |
| project_slug | TEXT NOT NULL | e.g. -var-www-assets-cedrusconsult-com-ser-dev |
| started_at | TIMESTAMPTZ NOT NULL | |
| ended_at | TIMESTAMPTZ | Filled by retrospective |
| description | TEXT | What the session was about |
| total_output_tokens | INT | Filled by retrospective |
| total_input_tokens | INT | Filled by retrospective |
| total_cache_read_tokens | INT | Filled by retrospective |
| total_cache_create_tokens | INT | Filled by retrospective |
| claude_time_seconds | REAL | Filled by retrospective |
| tool_time_seconds | REAL | Filled by retrospective |
| human_time_seconds | REAL | Filled by retrospective |

cc_session_phases

| Column | Type | Notes |
|--------|------|-------|
| id | SERIAL PRIMARY KEY | |
| session_id | INT REFERENCES cc_sessions | |
| task | TEXT NOT NULL | e.g. "fix-login-bug" |
| phase | TEXT NOT NULL | e.g. "diagnose-redirect" |
| started_at | TIMESTAMPTZ NOT NULL | |
| ended_at | TIMESTAMPTZ | Filled when phase ends |
| actions_summary | TEXT | e.g. "Grep, Read Server.hs, Read Auth.hs" |
| verdict | TEXT NOT NULL | One of the verdict categories |
| subagents | TEXT | Comma-separated descriptions of dispatched subagents |
| useful_info | TEXT | Facts/code locations discovered |
| lessons_learned | TEXT | Meta-observations for self-learning |
| notes | TEXT | Freeform |

2. Verdict Categories

| Verdict | Meaning |
|---------|---------|
| high_impact | Pivotal — unlocked the solution or saved significant downstream work |
| moderate_impact | Directly contributed to the outcome |
| small_impact | Minor direct contribution |
| exploratory_useful | Exploration that paid off |
| exploratory_waste | Exploration that led nowhere but was reasonable to try |
| avoidable_waste | Should have known better |

3. Haskell Executable: session-retrospective

Location: ~/.claude/tools/session-retrospective/ (standalone project, own stack.yaml)

CLI:

stack exec session-retrospective -- <session-uuid>

Modules:

| Module | Responsibility |
|--------|----------------|
| SessionRetrospective.Main | Entry point — parse args, orchestrate |
| SessionRetrospective.Jsonl | Parse JSONL files into typed records (Aeson) |
| SessionRetrospective.Phases | Query cc_session_phases from PG, match to JSONL messages by timestamp |
| SessionRetrospective.Subagents | Parse subagent JSONLs + .meta.json, compute per-subagent stats |
| SessionRetrospective.TimeAnalysis | Classify timestamp gaps into claude/tool/human time |
| SessionRetrospective.Report | Generate markdown report from computed stats |

Key types:

data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  }

data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste

data Phase = Phase
  { phTask           :: !Text
  , phName           :: !Text
  , phStartedAt      :: !UTCTime
  , phEndedAt        :: !(Maybe UTCTime)  -- Nothing if phase still open
  , phVerdict        :: !Verdict
  , phActionsSummary :: !Text
  , phSubagents      :: !(Maybe Text)
  , phUsefulInfo     :: !(Maybe Text)
  , phLessonsLearned :: !(Maybe Text)
  , phNotes          :: !(Maybe Text)
  -- Computed by joining with JSONL:
  , phTokens         :: !TokenCounts
  , phClaudeTime     :: !NominalDiffTime
  , phToolTime       :: !NominalDiffTime
  , phToolTurns      :: !Int
  , phTextTurns      :: !Int
  , phToolResultSize :: !Int  -- input bloat from tool results
  }

data SubagentStats = SubagentStats
  { saDescription :: !Text
  , saAgentType   :: !Text
  , saTokens      :: !TokenCounts
  , saTime        :: !NominalDiffTime
  , saPhase       :: !Text
  , saVerdict     :: !Verdict
  }
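
The verdict column stores its categories as snake_case text, so the tool needs a total mapping between Verdict and that encoding. A minimal sketch, using String instead of Text to stay dependency-free (the names verdictToText and verdictFromText are illustrative, not from the spec):

```haskell
-- Illustrative round-trip between the Verdict type and the snake_case
-- strings stored in cc_session_phases.verdict.
data Verdict
  = HighImpact | ModerateImpact | SmallImpact
  | ExploratoryUseful | ExploratoryWaste | AvoidableWaste
  deriving (Eq, Show, Enum, Bounded)

verdictToText :: Verdict -> String
verdictToText v = case v of
  HighImpact        -> "high_impact"
  ModerateImpact    -> "moderate_impact"
  SmallImpact       -> "small_impact"
  ExploratoryUseful -> "exploratory_useful"
  ExploratoryWaste  -> "exploratory_waste"
  AvoidableWaste    -> "avoidable_waste"

-- Reject anything outside the six categories (the column is NOT NULL
-- but constrained only by convention, not by the schema).
verdictFromText :: String -> Maybe Verdict
verdictFromText s = lookup s [ (verdictToText v, v) | v <- [minBound .. maxBound] ]
```

Deriving Enum and Bounded keeps the lookup table exhaustive by construction: adding a seventh verdict cannot silently fall out of sync with the parser.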

Dependencies: aeson, postgresql-simple, time, text, bytestring, filepath, directory

DB access: Uses postgresql-simple (not Squeal). This is a standalone tool, not part of the ser platform src/.

4. JSONL Data Available

Located at ~/.claude/projects/<project-slug>/<session-uuid>.jsonl with subagents at <session-uuid>/subagents/agent-<id>.jsonl.

Streaming Deduplication (CRITICAL)

The JSONL logs streaming events, not final messages. Multiple entries share the same .requestId and represent incremental chunks from one API call. Token counts within a requestId are cumulative — only the final chunk has the correct totals.

Deduplication rule: Group assistant messages by requestId. For each group, take the token counts from the entry with the highest output_tokens value (the final streaming chunk). Sum across groups for session totals.
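
The rule can be sketched as a pure function, assuming each streaming chunk has already been parsed into a (requestId, TokenCounts) pair (dedupeChunks and sessionTotals are illustrative names):

```haskell
import qualified Data.Map.Strict as M

-- Mirrors the spec's TokenCounts type.
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  } deriving (Eq, Show)

addTC :: TokenCounts -> TokenCounts -> TokenCounts
addTC (TokenCounts a b c d) (TokenCounts w x y z) =
  TokenCounts (a + w) (b + x) (c + y) (d + z)

-- Per requestId, keep only the chunk with the highest output_tokens:
-- counts within a request are cumulative, so that is the final chunk.
dedupeChunks :: [(String, TokenCounts)] -> [TokenCounts]
dedupeChunks = M.elems . M.fromListWith keepLarger
  where keepLarger new old = if tcOutput new >= tcOutput old then new else old

-- Session totals are the sum across deduplicated request groups.
sessionTotals :: [(String, TokenCounts)] -> TokenCounts
sessionTotals = foldr addTC (TokenCounts 0 0 0 0) . dedupeChunks
```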

Per assistant message:

  • .message.usage.output_tokens — Claude generation tokens (includes thinking tokens)
  • .message.usage.input_tokens — fresh input tokens
  • .message.usage.cache_read_input_tokens — cached context
  • .message.usage.cache_creation_input_tokens — new cache entries
  • .message.content[] — tool calls (type: "tool_use", name, input), text (type: "text"), and thinking (type: "thinking", text present but no separate token count — cost is included in output_tokens)
  • .timestamp — ISO 8601
  • .requestId — groups streaming chunks from a single API call

Per user message:

  • .message.content — either a string (human text) or an array containing {"type": "tool_result", ...} and/or {"type": "text", ...} entries
  • .timestamp

Content format note: Human text can appear as either "content": "text" (string) or "content": [{"type": "text", "text": "..."}] (array). The parser must handle both.

Subagent matching

agent-<id>.meta.json contains agentType and description. The description matches the Agent tool_use call's description in the main session JSONL.

Fallback for missing .meta.json: The .meta.json files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:

  1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
  2. Extract the description from the Agent tool_use input.description field
  3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
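
One plausible realization of steps 1–3, assuming the dispatching call is the latest Agent tool_use at or before the subagent's first logged message (AgentCall and describeSubagent are illustrative; a production matcher would also check the tool_result timestamp against the subagent's last message):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)

-- Illustrative reduction of an Agent tool_use call in the main session JSONL.
data AgentCall = AgentCall
  { acTimestamp   :: !UTCTime  -- when the Agent tool was invoked
  , acDescription :: !String   -- input.description from the tool_use
  }

-- Match by timestamp: pick the latest Agent call not after the subagent's
-- first logged message; report "unknown" when nothing matches (step 3).
describeSubagent :: UTCTime -> [AgentCall] -> String
describeSubagent firstMsg calls =
  case [ c | c <- sortOn (Down . acTimestamp) calls, acTimestamp c <= firstMsg ] of
    (c : _) -> acDescription c
    []      -> "unknown"

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```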

Time classification from timestamp gaps:

| Category | Detection |
|----------|-----------|
| Claude processing | Gap before assistant message |
| Tool execution | Gap between assistant (tool_use) → user (tool_result) |
| Human wait | Gap before user message with human text content |

Messages of type progress, queue-operation, and file-history-snapshot are ignored for time classification.
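
The three rules reduce to a fold over consecutive message pairs, attributing the gap before each message to that message's kind. A sketch, where Msg and MsgKind are illustrative reductions of the parsed JSONL and the ignored message types are assumed to be filtered out beforehand:

```haskell
import Data.Time (NominalDiffTime, UTCTime (..), diffUTCTime,
                  fromGregorian, secondsToDiffTime)

data MsgKind
  = AssistantMsg   -- assistant message: gap before it is Claude processing
  | ToolResultMsg  -- user message carrying tool_result: gap is tool execution
  | HumanTextMsg   -- user message with human text: gap is human wait
  deriving (Eq, Show)

data Msg = Msg { mTime :: !UTCTime, mKind :: !MsgKind }

data TimeBuckets = TimeBuckets
  { claudeTime, toolTime, humanTime :: !NominalDiffTime }
  deriving (Eq, Show)

-- Classify the gap preceding each message by the kind of that message.
classifyGaps :: [Msg] -> TimeBuckets
classifyGaps msgs = foldl step (TimeBuckets 0 0 0) (zip msgs (drop 1 msgs))
  where
    step b (prev, cur) =
      let gap = diffUTCTime (mTime cur) (mTime prev)
      in case mKind cur of
           AssistantMsg  -> b { claudeTime = claudeTime b + gap }
           ToolResultMsg -> b { toolTime   = toolTime b + gap }
           HumanTextMsg  -> b { humanTime  = humanTime b + gap }

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```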

Cross-matching phases to JSONL

The phase log records started_at/ended_at timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.

Tool result size estimation

phToolResultSize is computed as the total byte length of all tool_result.content strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
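
Both the window selection and the bloat proxy are small pure functions over the parsed messages. A sketch (inWindow and estimateToolResultTokens are illustrative names; length counts Chars rather than bytes, an acceptable stand-in for mostly-ASCII tool output):

```haskell
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)

-- A message belongs to a phase if its timestamp falls inside the phase's
-- window; an open phase (ended_at IS NULL) extends to the end of the log.
inWindow :: UTCTime -> Maybe UTCTime -> UTCTime -> Bool
inWindow start mEnd t = t >= start && maybe True (t <=) mEnd

-- phToolResultSize proxy: total length of tool_result content strings
-- within the window, divided by 4 (rough bytes-per-token estimate).
estimateToolResultTokens :: [String] -> Int
estimateToolResultTokens contents = sum (map length contents) `div` 4

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```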

5. Skills (MN Plugin)

Both skills live in ~/.claude/local-plugins/mn/skills/.

/mn:start-tracking

  1. Asks what the session is about (or takes description argument)
  2. Determines the Claude Code session UUID by finding the most recently modified .jsonl file in ~/.claude/projects/<project-slug>/. Note: at the very start of a session this file may not exist yet; the skill should verify it exists and, if it does not, record the UUID after the first assistant turn
  3. Inserts a row into cc_sessions via analytics-db MCP
  4. Records started_at
  5. Reminds Claude of the phase logging protocol and verdict categories
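
Step 2's "most recently modified .jsonl" selection, sketched as a pure function over (path, mtime) pairs so the missing-file case from the note falls out naturally as Nothing (newestJsonl is an illustrative name; the real skill would list the directory first):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)
import System.FilePath (takeExtension)

-- Pick the most recently modified .jsonl among the directory entries.
-- Nothing means the session file does not exist yet (very start of session).
newestJsonl :: [(FilePath, UTCTime)] -> Maybe FilePath
newestJsonl entries =
  case sortOn (Down . snd) [ e | e@(p, _) <- entries, takeExtension p == ".jsonl" ] of
    ((p, _) : _) -> Just p
    []           -> Nothing

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```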

/mn:session-retrospective

  1. Closes any open phases (sets ended_at)
  2. Runs stack exec session-retrospective -- <session-uuid>
  3. Report saved to ai_files/session-reports/<date>-<description>.md
  4. Claude reviews and presents the report

6. MCP Configuration

A dedicated MCP instance analytics-db in ~/.claude/.mcp.json:

{
  "mcpServers": {
    "analytics-db": {
      "command": "node",
      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
      "env": {
        "DB_HOST": "localhost",
        "DB_USER": "assets_servant",
        "DB_PASSWORD": "...",
        "DB_NAME": "claude_analytics"
      }
    }
  }
}

Uses the same MCP server binary as database-testing, just with different database credentials.

Reserved exclusively for session tracking — never reconfigured for other databases.

6b. Schema Creation

The Haskell executable runs CREATE TABLE IF NOT EXISTS on startup for both tables (same pattern as the ser platform's sqInitSchema). No manual schema setup required — just create the claude_analytics database and the tool self-initializes.
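
Derived from the column tables in section 1, the startup DDL would look roughly like this (a sketch; the authoritative statements live in the Haskell executable):

```sql
CREATE TABLE IF NOT EXISTS cc_sessions (
  id                        SERIAL PRIMARY KEY,
  session_uuid              TEXT NOT NULL UNIQUE,
  project_slug              TEXT NOT NULL,
  started_at                TIMESTAMPTZ NOT NULL,
  ended_at                  TIMESTAMPTZ,
  description               TEXT,
  total_output_tokens       INT,
  total_input_tokens        INT,
  total_cache_read_tokens   INT,
  total_cache_create_tokens INT,
  claude_time_seconds       REAL,
  tool_time_seconds         REAL,
  human_time_seconds        REAL
);

CREATE TABLE IF NOT EXISTS cc_session_phases (
  id              SERIAL PRIMARY KEY,
  session_id      INT REFERENCES cc_sessions,
  task            TEXT NOT NULL,
  phase           TEXT NOT NULL,
  started_at      TIMESTAMPTZ NOT NULL,
  ended_at        TIMESTAMPTZ,
  actions_summary TEXT,
  verdict         TEXT NOT NULL,
  subagents       TEXT,
  useful_info     TEXT,
  lessons_learned TEXT,
  notes           TEXT
);
```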

6c. Report Output Location

Reports are saved to ~/session-retrospective/reports/<date>-<description>.md (inside the tool's own directory, not in any specific project's working tree), since this tool is project-agnostic.

7. Report Format

# Session Retrospective: <date> — <description>

## Summary
| Metric | Value |
|--------|-------|
| Wall clock | ... |
| Claude time | ... |
| Tool time | ... |
| Human time | ... |
| Output tokens | ... |
| Input tokens (fresh) | ... |
| Cache read tokens | ... |
| Cache create tokens | ... |
| Phases | ... |

## Verdict Breakdown
| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|---------|--------|---------|--------|------------|-------------|-------------|----------|
| High impact | ... | ... | ... | ... | ... | ... | ... |
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
| Small impact | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |

## Subagents
| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|-------------|------|---------|--------|------------|-------------|------|-------|---------|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

## By Task
| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|------|--------|---------|--------|------------|-------------|-------------|-------------|
| ... | ... | ... | ... | ... | ... | ... | ... |

## Phase Detail
### 1. <task> — "<phase>" (<verdict>)
- Actions: ...
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
- Tool-heavy turns: ... | Text turns: ...
- Tool result input bloat: ~...K tokens
- Claude time: ... | Tool time: ...
- **Useful info:** ...
- **Lessons learned:** ...

## Lessons Summary
1. ...
2. ...

Explicitly Deferred

| Feature | Why |
|---------|-----|
| Cross-session index / accumulation | Start with standalone reports first |
| /session-trends meta-analysis | Depends on accumulation |
| Live mid-session cost dashboard | Not needed for core learning loop |
| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
| Dollar cost estimation | Token pricing changes; token counts are stable |
| Thinking token breakdown | thinking content blocks exist in JSONL with text, but no separate thinking_tokens field — cost is folded into output_tokens and cannot be isolated |