session-retrospective/docs/design.md
Michel Nehme 0033751900 Initial commit: design spec for session retrospective tool
Claude Code session efficiency analysis tool — self-assessment of
action impact + token/time retrospectives via JSONL parsing.
2026-03-20 20:45:23 +00:00


Session Retrospective Tool — Design Spec

Problem

After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in /cost command shows aggregate USD but no breakdown by task, phase, or impact.

Objective

Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.

Architecture Overview

During Session                          After Session
─────────────                           ─────────────
/mn:start-tracking                      /mn:session-retrospective
     │                                       │
     ▼                                       ▼
analytics-db MCP                        Haskell executable
     │                                       │
     ▼                                       ├─→ Reads JSONL files
claude_analytics DB                     │    (tokens, timestamps, tools)
  ┌──────────────┐                      │
  │ cc_sessions  │◄─────────────────────├─→ Queries cc_session_phases
  │ cc_phases    │                      │    (verdicts, tasks, notes)
  └──────────────┘                      │
                                        ├─→ Reads .meta.json
                                        │    (subagent descriptions)
                                        │
                                        ▼
                                   Markdown report
                                   ai_files/session-reports/

Components

1. Database: claude_analytics

A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated analytics-db MCP instance in ~/.claude/.mcp.json.

cc_sessions

| Column | Type | Notes |
|--------|------|-------|
| id | SERIAL PRIMARY KEY | |
| session_uuid | TEXT NOT NULL UNIQUE | Claude Code session UUID |
| project_slug | TEXT NOT NULL | e.g. -var-www-assets-cedrusconsult-com-ser-dev |
| started_at | TIMESTAMPTZ NOT NULL | |
| ended_at | TIMESTAMPTZ | Filled by retrospective |
| description | TEXT | What the session was about |
| total_output_tokens | INT | Filled by retrospective |
| total_input_tokens | INT | Filled by retrospective |
| total_cache_read_tokens | INT | Filled by retrospective |
| total_cache_create_tokens | INT | Filled by retrospective |
| claude_time_seconds | REAL | Filled by retrospective |
| tool_time_seconds | REAL | Filled by retrospective |
| human_time_seconds | REAL | Filled by retrospective |

cc_session_phases

| Column | Type | Notes |
|--------|------|-------|
| id | SERIAL PRIMARY KEY | |
| session_id | INT REFERENCES cc_sessions | |
| task | TEXT NOT NULL | e.g. "fix-login-bug" |
| phase | TEXT NOT NULL | e.g. "diagnose-redirect" |
| started_at | TIMESTAMPTZ NOT NULL | |
| ended_at | TIMESTAMPTZ | Filled when phase ends |
| actions_summary | TEXT | e.g. "Grep, Read Server.hs, Read Auth.hs" |
| verdict | TEXT NOT NULL | One of the verdict categories |
| subagents | TEXT | Comma-separated descriptions of dispatched subagents |
| useful_info | TEXT | Facts/code locations discovered |
| lessons_learned | TEXT | Meta-observations for self-learning |
| notes | TEXT | Freeform |

2. Verdict Categories

| Verdict | Meaning |
|---------|---------|
| high_impact | Pivotal — unlocked the solution or saved significant downstream work |
| moderate_impact | Directly contributed to the outcome |
| small_impact | Minor direct contribution |
| exploratory_useful | Exploration that paid off |
| exploratory_waste | Exploration that led nowhere but was reasonable to try |
| avoidable_waste | Should have known better |

3. Haskell Executable: session-retrospective

Location: ~/.claude/tools/session-retrospective/ (standalone project, own stack.yaml)

CLI:

stack exec session-retrospective -- <session-uuid>

Modules:

| Module | Responsibility |
|--------|----------------|
| SessionRetrospective.Main | Entry point — parse args, orchestrate |
| SessionRetrospective.Jsonl | Parse JSONL files into typed records (Aeson) |
| SessionRetrospective.Phases | Query cc_session_phases from PG, match to JSONL messages by timestamp |
| SessionRetrospective.Subagents | Parse subagent JSONLs + .meta.json, compute per-subagent stats |
| SessionRetrospective.TimeAnalysis | Classify timestamp gaps into claude/tool/human time |
| SessionRetrospective.Report | Generate markdown report from computed stats |

Key types:

data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  }

data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste

data Phase = Phase
  { phTask           :: !Text
  , phName           :: !Text
  , phStartedAt      :: !UTCTime
  , phEndedAt        :: !(Maybe UTCTime)  -- Nothing if phase still open
  , phVerdict        :: !Verdict
  , phActionsSummary :: !Text
  , phSubagents      :: !(Maybe Text)
  , phUsefulInfo     :: !(Maybe Text)
  , phLessonsLearned :: !(Maybe Text)
  , phNotes          :: !(Maybe Text)
  -- Computed by joining with JSONL:
  , phTokens         :: !TokenCounts
  , phClaudeTime     :: !NominalDiffTime
  , phToolTime       :: !NominalDiffTime
  , phToolTurns      :: !Int
  , phTextTurns      :: !Int
  , phToolResultSize :: !Int  -- input bloat from tool results
  }

data SubagentStats = SubagentStats
  { saDescription :: !Text
  , saAgentType   :: !Text
  , saTokens      :: !TokenCounts
  , saTime        :: !NominalDiffTime
  , saPhase       :: !Text
  , saVerdict     :: !Verdict
  }
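
The verdict column stores its categories as snake_case text, so the tool needs a total mapping between Verdict and that encoding. A minimal sketch, using String instead of Text to stay dependency-free (the names verdictToText and verdictFromText are illustrative, not from the spec):

```haskell
-- Illustrative round-trip between the Verdict type and the snake_case
-- strings stored in cc_session_phases.verdict.
data Verdict
  = HighImpact | ModerateImpact | SmallImpact
  | ExploratoryUseful | ExploratoryWaste | AvoidableWaste
  deriving (Eq, Show, Enum, Bounded)

verdictToText :: Verdict -> String
verdictToText v = case v of
  HighImpact        -> "high_impact"
  ModerateImpact    -> "moderate_impact"
  SmallImpact       -> "small_impact"
  ExploratoryUseful -> "exploratory_useful"
  ExploratoryWaste  -> "exploratory_waste"
  AvoidableWaste    -> "avoidable_waste"

-- Reject anything outside the six categories (the column is NOT NULL
-- but constrained only by convention, not by the schema).
verdictFromText :: String -> Maybe Verdict
verdictFromText s = lookup s [ (verdictToText v, v) | v <- [minBound .. maxBound] ]
```

Deriving Enum and Bounded keeps the lookup table exhaustive by construction: adding a seventh verdict cannot silently fall out of sync with the parser.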

Dependencies: aeson, postgresql-simple, time, text, bytestring, filepath, directory

DB access: Uses postgresql-simple (not Squeal). This is a standalone tool, not part of the ser platform src/.

4. JSONL Data Available

Located at ~/.claude/projects/<project-slug>/<session-uuid>.jsonl with subagents at <session-uuid>/subagents/agent-<id>.jsonl.

Streaming Deduplication (CRITICAL)

The JSONL logs streaming events, not final messages. Multiple entries share the same .requestId and represent incremental chunks from one API call. Token counts within a requestId are cumulative — only the final chunk has the correct totals.

Deduplication rule: Group assistant messages by requestId. For each group, take the token counts from the entry with the highest output_tokens value (the final streaming chunk). Sum across groups for session totals.
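
The rule can be sketched as a pure function, assuming each streaming chunk has already been parsed into a (requestId, TokenCounts) pair (dedupeChunks and sessionTotals are illustrative names):

```haskell
import qualified Data.Map.Strict as M

-- Mirrors the spec's TokenCounts type.
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  } deriving (Eq, Show)

addTC :: TokenCounts -> TokenCounts -> TokenCounts
addTC (TokenCounts a b c d) (TokenCounts w x y z) =
  TokenCounts (a + w) (b + x) (c + y) (d + z)

-- Per requestId, keep only the chunk with the highest output_tokens:
-- counts within a request are cumulative, so that is the final chunk.
dedupeChunks :: [(String, TokenCounts)] -> [TokenCounts]
dedupeChunks = M.elems . M.fromListWith keepLarger
  where keepLarger new old = if tcOutput new >= tcOutput old then new else old

-- Session totals are the sum across deduplicated request groups.
sessionTotals :: [(String, TokenCounts)] -> TokenCounts
sessionTotals = foldr addTC (TokenCounts 0 0 0 0) . dedupeChunks
```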

Per assistant message:

  • .message.usage.output_tokens — Claude generation tokens (includes thinking tokens)
  • .message.usage.input_tokens — fresh input tokens
  • .message.usage.cache_read_input_tokens — cached context
  • .message.usage.cache_creation_input_tokens — new cache entries
  • .message.content[] — tool calls (type: "tool_use", name, input), text (type: "text"), and thinking (type: "thinking", text present but no separate token count — cost is included in output_tokens)
  • .timestamp — ISO 8601
  • .requestId — groups streaming chunks from a single API call

Per user message:

  • .message.content — either a string (human text) or an array containing {"type": "tool_result", ...} and/or {"type": "text", ...} entries
  • .timestamp

Content format note: Human text can appear as either "content": "text" (string) or "content": [{"type": "text", "text": "..."}] (array). The parser must handle both.

Subagent matching

agent-<id>.meta.json contains agentType and description. The description matches the Agent tool_use call's description in the main session JSONL.

Fallback for missing .meta.json: The .meta.json files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:

  1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
  2. Extract the description from the Agent tool_use input.description field
  3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
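
One plausible realization of steps 1–3, assuming the dispatching call is the latest Agent tool_use at or before the subagent's first logged message (AgentCall and describeSubagent are illustrative; a production matcher would also check the tool_result timestamp against the subagent's last message):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)

-- Illustrative reduction of an Agent tool_use call in the main session JSONL.
data AgentCall = AgentCall
  { acTimestamp   :: !UTCTime  -- when the Agent tool was invoked
  , acDescription :: !String   -- input.description from the tool_use
  }

-- Match by timestamp: pick the latest Agent call not after the subagent's
-- first logged message; report "unknown" when nothing matches (step 3).
describeSubagent :: UTCTime -> [AgentCall] -> String
describeSubagent firstMsg calls =
  case [ c | c <- sortOn (Down . acTimestamp) calls, acTimestamp c <= firstMsg ] of
    (c : _) -> acDescription c
    []      -> "unknown"

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```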

Time classification from timestamp gaps:

| Category | Detection |
|----------|-----------|
| Claude processing | Gap before assistant message |
| Tool execution | Gap between assistant (tool_use) → user (tool_result) |
| Human wait | Gap before user message with human text content |

Messages of type progress, queue-operation, and file-history-snapshot are ignored for time classification.
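
The three rules reduce to a fold over consecutive message pairs, attributing the gap before each message to that message's kind. A sketch, where Msg and MsgKind are illustrative reductions of the parsed JSONL and the ignored message types are assumed to be filtered out beforehand:

```haskell
import Data.Time (NominalDiffTime, UTCTime (..), diffUTCTime,
                  fromGregorian, secondsToDiffTime)

data MsgKind
  = AssistantMsg   -- assistant message: gap before it is Claude processing
  | ToolResultMsg  -- user message carrying tool_result: gap is tool execution
  | HumanTextMsg   -- user message with human text: gap is human wait
  deriving (Eq, Show)

data Msg = Msg { mTime :: !UTCTime, mKind :: !MsgKind }

data TimeBuckets = TimeBuckets
  { claudeTime, toolTime, humanTime :: !NominalDiffTime }
  deriving (Eq, Show)

-- Classify the gap preceding each message by the kind of that message.
classifyGaps :: [Msg] -> TimeBuckets
classifyGaps msgs = foldl step (TimeBuckets 0 0 0) (zip msgs (drop 1 msgs))
  where
    step b (prev, cur) =
      let gap = diffUTCTime (mTime cur) (mTime prev)
      in case mKind cur of
           AssistantMsg  -> b { claudeTime = claudeTime b + gap }
           ToolResultMsg -> b { toolTime   = toolTime b + gap }
           HumanTextMsg  -> b { humanTime  = humanTime b + gap }

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```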

Cross-matching phases to JSONL

The phase log records started_at/ended_at timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.

Tool result size estimation

phToolResultSize is computed as the total byte length of all tool_result.content strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
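
Both the window selection and the bloat proxy are small pure functions over the parsed messages. A sketch (inWindow and estimateToolResultTokens are illustrative names; length counts Chars rather than bytes, an acceptable stand-in for mostly-ASCII tool output):

```haskell
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)

-- A message belongs to a phase if its timestamp falls inside the phase's
-- window; an open phase (ended_at IS NULL) extends to the end of the log.
inWindow :: UTCTime -> Maybe UTCTime -> UTCTime -> Bool
inWindow start mEnd t = t >= start && maybe True (t <=) mEnd

-- phToolResultSize proxy: total length of tool_result content strings
-- within the window, divided by 4 (rough bytes-per-token estimate).
estimateToolResultTokens :: [String] -> Int
estimateToolResultTokens contents = sum (map length contents) `div` 4

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```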

5. Skills (MN Plugin)

Both skills live in ~/.claude/local-plugins/mn/skills/.

/mn:start-tracking

  1. Asks what the session is about (or takes description argument)
  2. Determines the Claude Code session UUID by finding the most recently modified .jsonl file in ~/.claude/projects/<project-slug>/. Note: at the very start of a session this file may not exist yet; the skill should verify it exists and, if it does not, record the UUID after the first assistant turn
  3. Inserts a row into cc_sessions via analytics-db MCP
  4. Records started_at
  5. Reminds Claude of the phase logging protocol and verdict categories
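
Step 2's "most recently modified .jsonl" selection, sketched as a pure function over (path, mtime) pairs so the missing-file case from the note falls out naturally as Nothing (newestJsonl is an illustrative name; the real skill would list the directory first):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Time (UTCTime (..), fromGregorian, secondsToDiffTime)
import System.FilePath (takeExtension)

-- Pick the most recently modified .jsonl among the directory entries.
-- Nothing means the session file does not exist yet (very start of session).
newestJsonl :: [(FilePath, UTCTime)] -> Maybe FilePath
newestJsonl entries =
  case sortOn (Down . snd) [ e | e@(p, _) <- entries, takeExtension p == ".jsonl" ] of
    ((p, _) : _) -> Just p
    []           -> Nothing

-- Helper for examples: n seconds after midnight on an arbitrary date.
mkT :: Integer -> UTCTime
mkT n = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime n)
```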

/mn:session-retrospective

  1. Closes any open phases (sets ended_at)
  2. Runs stack exec session-retrospective -- <session-uuid>
  3. Report saved to ai_files/session-reports/<date>-<description>.md
  4. Claude reviews and presents the report

6. MCP Configuration

A dedicated MCP instance analytics-db in ~/.claude/.mcp.json:

{
  "mcpServers": {
    "analytics-db": {
      "command": "node",
      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
      "env": {
        "DB_HOST": "localhost",
        "DB_USER": "assets_servant",
        "DB_PASSWORD": "...",
        "DB_NAME": "claude_analytics"
      }
    }
  }
}

Uses the same MCP server binary as database-testing, just with different database credentials.

Reserved exclusively for session tracking — never reconfigured for other databases.

6b. Schema Creation

The Haskell executable runs CREATE TABLE IF NOT EXISTS on startup for both tables (same pattern as the ser platform's sqInitSchema). No manual schema setup required — just create the claude_analytics database and the tool self-initializes.
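
Derived from the column tables in section 1, the startup DDL would look roughly like this (a sketch; the authoritative statements live in the Haskell executable):

```sql
CREATE TABLE IF NOT EXISTS cc_sessions (
  id                        SERIAL PRIMARY KEY,
  session_uuid              TEXT NOT NULL UNIQUE,
  project_slug              TEXT NOT NULL,
  started_at                TIMESTAMPTZ NOT NULL,
  ended_at                  TIMESTAMPTZ,
  description               TEXT,
  total_output_tokens       INT,
  total_input_tokens        INT,
  total_cache_read_tokens   INT,
  total_cache_create_tokens INT,
  claude_time_seconds       REAL,
  tool_time_seconds         REAL,
  human_time_seconds        REAL
);

CREATE TABLE IF NOT EXISTS cc_session_phases (
  id              SERIAL PRIMARY KEY,
  session_id      INT REFERENCES cc_sessions,
  task            TEXT NOT NULL,
  phase           TEXT NOT NULL,
  started_at      TIMESTAMPTZ NOT NULL,
  ended_at        TIMESTAMPTZ,
  actions_summary TEXT,
  verdict         TEXT NOT NULL,
  subagents       TEXT,
  useful_info     TEXT,
  lessons_learned TEXT,
  notes           TEXT
);
```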

6c. Report Output Location

Reports are saved to ~/session-retrospective/reports/<date>-<description>.md (inside the tool's own directory, not in any specific project's working tree), since this tool is project-agnostic.

7. Report Format

# Session Retrospective: <date> — <description>

## Summary
| Metric | Value |
|--------|-------|
| Wall clock | ... |
| Claude time | ... |
| Tool time | ... |
| Human time | ... |
| Output tokens | ... |
| Input tokens (fresh) | ... |
| Cache read tokens | ... |
| Cache create tokens | ... |
| Phases | ... |

## Verdict Breakdown
| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|---------|--------|---------|--------|------------|-------------|-------------|----------|
| High impact | ... | ... | ... | ... | ... | ... | ... |
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
| Small impact | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |

## Subagents
| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|-------------|------|---------|--------|------------|-------------|------|-------|---------|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

## By Task
| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|------|--------|---------|--------|------------|-------------|-------------|-------------|
| ... | ... | ... | ... | ... | ... | ... | ... |

## Phase Detail
### 1. <task> — "<phase>" (<verdict>)
- Actions: ...
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
- Tool-heavy turns: ... | Text turns: ...
- Tool result input bloat: ~...K tokens
- Claude time: ... | Tool time: ...
- **Useful info:** ...
- **Lessons learned:** ...

## Lessons Summary
1. ...
2. ...

Explicitly Deferred

| Feature | Why |
|---------|-----|
| Cross-session index / accumulation | Start with standalone reports first |
| /session-trends meta-analysis | Depends on accumulation |
| Live mid-session cost dashboard | Not needed for core learning loop |
| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
| Dollar cost estimation | Token pricing changes; token counts are stable |
| Thinking token breakdown | thinking content blocks exist in JSONL with text, but no separate thinking_tokens field — cost is folded into output_tokens and cannot be isolated |