Michel Nehme 0033751900 Initial commit: design spec for session retrospective tool
Claude Code session efficiency analysis tool — self-assessment of
action impact + token/time retrospectives via JSONL parsing.
2026-03-20 20:45:23 +00:00


# Session Retrospective Tool — Design Spec
## Problem
After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in `/cost` command shows aggregate USD but no breakdown by task, phase, or impact.
## Objective
Enable Claude Code to assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying which actions were high-impact and cheap and which were expensive and wasteful.
## Architecture Overview
```
During Session                          After Session
──────────────                          ─────────────
/mn:start-tracking                      /mn:session-retrospective
        │                                      │
        ▼                                      ▼
analytics-db MCP                        Haskell executable
        │                                      │
        ▼                                      ├─→ Reads JSONL files
claude_analytics DB                            │   (tokens, timestamps, tools)
┌───────────────────┐                          │
│ cc_sessions       │◄─────────────────────────├─→ Queries cc_session_phases
│ cc_session_phases │                          │   (verdicts, tasks, notes)
└───────────────────┘                          │
                                               ├─→ Reads .meta.json
                                               │   (subagent descriptions)
                                               ▼
                                        Markdown report
                                        ai_files/session-reports/
```
## Components
### 1. Database: `claude_analytics`
A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated `analytics-db` MCP instance in `~/.claude/.mcp.json`.
#### `cc_sessions`
| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_uuid` | `TEXT NOT NULL UNIQUE` | Claude Code session UUID |
| `project_slug` | `TEXT NOT NULL` | e.g. `-var-www-assets-cedrusconsult-com-ser-dev` |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled by retrospective |
| `description` | `TEXT` | What the session was about |
| `total_output_tokens` | `INT` | Filled by retrospective |
| `total_input_tokens` | `INT` | Filled by retrospective |
| `total_cache_read_tokens` | `INT` | Filled by retrospective |
| `total_cache_create_tokens` | `INT` | Filled by retrospective |
| `claude_time_seconds` | `REAL` | Filled by retrospective |
| `tool_time_seconds` | `REAL` | Filled by retrospective |
| `human_time_seconds` | `REAL` | Filled by retrospective |
#### `cc_session_phases`
| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_id` | `INT REFERENCES cc_sessions` | |
| `task` | `TEXT NOT NULL` | e.g. "fix-login-bug" |
| `phase` | `TEXT NOT NULL` | e.g. "diagnose-redirect" |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled when phase ends |
| `actions_summary` | `TEXT` | e.g. "Grep, Read Server.hs, Read Auth.hs" |
| `verdict` | `TEXT NOT NULL` | One of the verdict categories |
| `subagents` | `TEXT` | Comma-separated descriptions of dispatched subagents |
| `useful_info` | `TEXT` | Facts/code locations discovered |
| `lessons_learned` | `TEXT` | Meta-observations for self-learning |
| `notes` | `TEXT` | Freeform |
### 2. Verdict Categories
| Verdict | Meaning |
|---------|---------|
| `high_impact` | Pivotal — unlocked the solution or saved significant downstream work |
| `moderate_impact` | Directly contributed to the outcome |
| `small_impact` | Minor direct contribution |
| `exploratory_useful` | Exploration that paid off |
| `exploratory_waste` | Exploration that led nowhere but was reasonable to try |
| `avoidable_waste` | Should have known better |
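
Because the `verdict` column is plain `TEXT`, the tool needs a mapping between the stored strings and the `Verdict` type. The following is a minimal sketch of one way to do that; the helper names `verdictToText` and `parseVerdict` are assumptions for illustration and are not defined anywhere in this spec.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Same constructors as the Verdict type in the spec; Enum and Bounded are
-- derived here only so every constructor can be enumerated.
data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste
  deriving (Show, Eq, Enum, Bounded)

-- Database spelling of each verdict (the strings from the table above).
verdictToText :: Verdict -> Text
verdictToText v = case v of
  HighImpact        -> "high_impact"
  ModerateImpact    -> "moderate_impact"
  SmallImpact       -> "small_impact"
  ExploratoryUseful -> "exploratory_useful"
  ExploratoryWaste  -> "exploratory_waste"
  AvoidableWaste    -> "avoidable_waste"

-- Inverse lookup; Nothing for any string outside the six categories.
parseVerdict :: Text -> Maybe Verdict
parseVerdict t =
  lookup t [(verdictToText v, v) | v <- [minBound .. maxBound]]
```

Deriving the parser from `verdictToText` keeps the two directions in sync if a category is ever added.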
### 3. Haskell Executable: `session-retrospective`
**Location:** `~/.claude/tools/session-retrospective/` (standalone project, own `stack.yaml`)
**CLI:**
```bash
stack exec session-retrospective -- <session-uuid>
```
**Modules:**
| Module | Responsibility |
|--------|---------------|
| `SessionRetrospective.Main` | Entry point — parse args, orchestrate |
| `SessionRetrospective.Jsonl` | Parse JSONL files into typed records (Aeson) |
| `SessionRetrospective.Phases` | Query `cc_session_phases` from PG, match to JSONL messages by timestamp |
| `SessionRetrospective.Subagents` | Parse subagent JSONLs + `.meta.json`, compute per-subagent stats |
| `SessionRetrospective.TimeAnalysis` | Classify timestamp gaps into claude/tool/human time |
| `SessionRetrospective.Report` | Generate markdown report from computed stats |
**Key types:**
```haskell
import Data.Text (Text)
import Data.Time (NominalDiffTime, UTCTime)

-- Token usage for one API call, phase, subagent, or whole session.
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  } deriving (Show, Eq)

-- Impact judgment recorded for each phase (see Verdict Categories).
data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste
  deriving (Show, Eq)

data Phase = Phase
  { phTask           :: !Text
  , phName           :: !Text
  , phStartedAt      :: !UTCTime
  , phEndedAt        :: !(Maybe UTCTime) -- Nothing if phase still open
  , phVerdict        :: !Verdict
  , phActionsSummary :: !Text
  , phSubagents      :: !(Maybe Text)
  , phUsefulInfo     :: !(Maybe Text)
  , phLessonsLearned :: !(Maybe Text)
  , phNotes          :: !(Maybe Text)
  -- Computed by joining with JSONL:
  , phTokens         :: !TokenCounts
  , phClaudeTime     :: !NominalDiffTime
  , phToolTime       :: !NominalDiffTime
  , phToolTurns      :: !Int
  , phTextTurns      :: !Int
  , phToolResultSize :: !Int -- input bloat from tool results
  } deriving (Show, Eq)

-- Per-subagent totals, attributed to the phase that dispatched it.
data SubagentStats = SubagentStats
  { saDescription :: !Text
  , saAgentType   :: !Text
  , saTokens      :: !TokenCounts
  , saTime        :: !NominalDiffTime
  , saPhase       :: !Text
  , saVerdict     :: !Verdict
  } deriving (Show, Eq)
```
**Dependencies:** `aeson`, `postgresql-simple`, `time`, `text`, `bytestring`, `filepath`, `directory`
**DB access:** Uses `postgresql-simple` (not Squeal). This is a standalone tool, not part of the ser platform `src/`.
### 4. JSONL Data Available
Located at `~/.claude/projects/<project-slug>/<session-uuid>.jsonl` with subagents at `<session-uuid>/subagents/agent-<id>.jsonl`.
#### Streaming Deduplication (CRITICAL)
The JSONL logs **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative — only the final chunk has the correct totals.
**Deduplication rule:** Group assistant messages by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals.
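
The deduplication rule can be sketched with a map keyed by `requestId`. This is an illustrative sketch, not the tool's actual code: the `Chunk` record and the functions `finalChunks` and `sessionTotals` are hypothetical names, and request IDs are modelled as plain `String` for brevity.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  } deriving (Show, Eq)

-- One assistant JSONL entry, reduced to the fields needed here (assumed shape).
data Chunk = Chunk
  { chRequestId :: String
  , chTokens    :: TokenCounts
  }

-- For each requestId keep the counts from the entry with the highest
-- output_tokens, i.e. the final streaming chunk.
finalChunks :: [Chunk] -> Map String TokenCounts
finalChunks chunks =
  Map.fromListWith pick [(chRequestId c, chTokens c) | c <- chunks]
  where
    pick new old = if tcOutput new > tcOutput old then new else old

addCounts :: TokenCounts -> TokenCounts -> TokenCounts
addCounts a b = TokenCounts
  (tcOutput a + tcOutput b)
  (tcInput a + tcInput b)
  (tcCacheRead a + tcCacheRead b)
  (tcCacheCreate a + tcCacheCreate b)

-- Session totals: sum the deduplicated groups, never the raw entries.
sessionTotals :: [Chunk] -> TokenCounts
sessionTotals = foldr addCounts (TokenCounts 0 0 0 0) . Map.elems . finalChunks
```

For example, two chunks for one request with `output_tokens` of 5 and 42 contribute 42 to the total, not 47.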
#### Per assistant message:
- `.message.usage.output_tokens` — Claude generation tokens (includes thinking tokens)
- `.message.usage.input_tokens` — fresh input tokens
- `.message.usage.cache_read_input_tokens` — cached context
- `.message.usage.cache_creation_input_tokens` — new cache entries
- `.message.content[]` — tool calls (`type: "tool_use"`, `name`, `input`), text (`type: "text"`), and thinking (`type: "thinking"`, text present but no separate token count — cost is included in `output_tokens`)
- `.timestamp` — ISO 8601
- `.requestId` — groups streaming chunks from a single API call
#### Per user message:
- `.message.content` — either a string (human text) or an array containing `{"type": "tool_result", ...}` and/or `{"type": "text", ...}` entries
- `.timestamp`
**Content format note:** Human text can appear as either `"content": "text"` (string) or `"content": [{"type": "text", "text": "..."}]` (array). The parser must handle both.
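
One way to accept both shapes with Aeson is a `FromJSON` instance that normalises the string form into a single text block. The sketch below is an assumption about how the parser could be structured; the type names `Block` and `UserContent` and the helpers `humanText` and `decodeUserContent` are hypothetical and do not come from this spec.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), Value (..), decode, withObject, (.:))
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)

-- One entry of an array-form `content` field.
data Block
  = TextBlock Text        -- {"type": "text", "text": "..."}
  | ToolResultBlock Value -- {"type": "tool_result", ...}, kept as raw JSON
  | OtherBlock Text       -- any other block type, remembered by its tag
  deriving (Show, Eq)

instance FromJSON Block where
  parseJSON = withObject "Block" $ \o -> do
    ty <- o .: "type"
    case (ty :: Text) of
      "text"        -> TextBlock <$> o .: "text"
      "tool_result" -> pure (ToolResultBlock (Object o))
      other         -> pure (OtherBlock other)

newtype UserContent = UserContent [Block]
  deriving (Show, Eq)

instance FromJSON UserContent where
  -- String form: treat the whole string as one human text block.
  parseJSON (String t) = pure (UserContent [TextBlock t])
  -- Array form: parse each entry as a Block (fails on anything else).
  parseJSON v          = UserContent <$> parseJSON v

-- Human-authored text only, ignoring tool results.
humanText :: UserContent -> [Text]
humanText (UserContent blocks) = [t | TextBlock t <- blocks]

decodeUserContent :: BL.ByteString -> Maybe UserContent
decodeUserContent = decode
```

Normalising at parse time means the time classifier only has to ask whether `humanText` is non-empty, regardless of which form the log used.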
#### Subagent matching
`agent-<id>.meta.json` contains `agentType` and `description`. The `description` matches the Agent tool_use call's `description` in the main session JSONL.
**Fallback for missing `.meta.json`:** The `.meta.json` files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:
1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
2. Extract the description from the Agent tool_use `input.description` field
3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
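
The fallback steps above can be sketched as an interval-overlap lookup. This is only a sketch under assumptions: `Span`, `AgentCall`, and `resolveDescription` are hypothetical names, a call's span is assumed to run from the Agent `tool_use` to its `tool_result`, and the timestamp type is left polymorphic so the logic works with `UTCTime` or any other ordered type.

```haskell
import Data.List (find)

-- A closed time interval.
data Span t = Span { spanStart :: t, spanEnd :: t }

-- Two closed intervals overlap when each starts before the other ends.
overlaps :: Ord t => Span t -> Span t -> Bool
overlaps a b = spanStart a <= spanEnd b && spanStart b <= spanEnd a

-- An Agent tool_use call in the main session, with its input.description.
data AgentCall t = AgentCall
  { callDescription :: String
  , callSpan        :: Span t
  }

-- Prefer the description from .meta.json; otherwise take the first Agent
-- call whose span overlaps the subagent's; otherwise report "unknown".
resolveDescription :: Ord t => Maybe String -> Span t -> [AgentCall t] -> String
resolveDescription (Just fromMeta) _ _ = fromMeta
resolveDescription Nothing sub calls =
  maybe "unknown" callDescription (find (overlaps sub . callSpan) calls)
```

Taking the first overlapping call is ambiguous when several subagents run in parallel; a fuller implementation might pick the call with the largest overlap instead.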
#### Time classification from timestamp gaps:
| Category | Detection |
|----------|-----------|
| Claude processing | Gap before `assistant` message |
| Tool execution | Gap between `assistant(tool_use)` and the following `user(tool_result)` |
| Human wait | Gap before `user` message with human text content |
Messages of type `progress`, `queue-operation`, and `file-history-snapshot` are ignored for time classification.
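
A minimal sketch of this classification follows. It simplifies the table by attributing each gap to the kind of the message that ends it, and it assumes the events are already sorted by timestamp. The names `EventKind`, `Event`, `TimeSplit`, `classifyGaps`, and the `exampleEvents` timeline are all hypothetical.

```haskell
import Data.Time
  (NominalDiffTime, UTCTime (..), diffUTCTime, fromGregorian, secondsToDiffTime)

data EventKind
  = AssistantMsg -- any `assistant` message
  | ToolResult   -- `user` message carrying a tool_result
  | HumanText    -- `user` message with human text content
  | Ignored      -- progress, queue-operation, file-history-snapshot
  deriving (Show, Eq)

data Event = Event { evTime :: UTCTime, evKind :: EventKind }

data TimeSplit = TimeSplit
  { tsClaude :: !NominalDiffTime
  , tsTool   :: !NominalDiffTime
  , tsHuman  :: !NominalDiffTime
  } deriving (Show, Eq)

-- Drop ignored events, then charge each gap between consecutive events
-- to the category of the later event.
classifyGaps :: [Event] -> TimeSplit
classifyGaps events = foldl step (TimeSplit 0 0 0) (zip kept (drop 1 kept))
  where
    kept = filter ((/= Ignored) . evKind) events
    step acc (prev, cur) =
      let gap = diffUTCTime (evTime cur) (evTime prev)
       in case evKind cur of
            AssistantMsg -> acc { tsClaude = tsClaude acc + gap }
            ToolResult   -> acc { tsTool = tsTool acc + gap }
            HumanText    -> acc { tsHuman = tsHuman acc + gap }
            Ignored      -> acc

-- Made-up timeline, in seconds after midnight on an arbitrary day.
exampleEvents :: [Event]
exampleEvents =
  [ Event (at 0) HumanText
  , Event (at 4) AssistantMsg
  , Event (at 6) ToolResult
  , Event (at 7) Ignored
  , Event (at 9) AssistantMsg
  , Event (at 39) HumanText
  ]
  where
    at s = UTCTime (fromGregorian 2026 3 20) (secondsToDiffTime s)
```

For `exampleEvents`, `classifyGaps` yields 7 seconds of Claude time (4 + 3), 2 seconds of tool time, and 30 seconds of human time; the ignored event at 7 seconds does not split the gap it sits in.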
#### Cross-matching phases to JSONL
The phase log records `started_at`/`ended_at` timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.
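
The window selection can be sketched as a simple filter. The spec does not say whether the window bounds are inclusive, so this sketch assumes a half-open window (start inclusive, end exclusive) so that a message on a phase boundary is counted once, and it treats a missing `ended_at` as an open-ended window. `Window`, `inWindow`, and `messagesIn` are hypothetical names, and the timestamp type is left polymorphic.

```haskell
-- A phase window; wEnd is Nothing while the phase is still open.
data Window t = Window
  { wStart :: t
  , wEnd   :: Maybe t
  }

-- Start inclusive, end exclusive (an assumption, see the text above).
inWindow :: Ord t => Window t -> t -> Bool
inWindow (Window start end) t = t >= start && maybe True (t <) end

-- Select the messages whose timestamp (extracted by `stamp`) falls
-- inside the window.
messagesIn :: Ord t => Window t -> (a -> t) -> [a] -> [a]
messagesIn w stamp = filter (inWindow w . stamp)
```

With `UTCTime` timestamps, `messagesIn (Window (phStartedAt p) (phEndedAt p)) msgTime msgs` would give the messages to aggregate for phase `p`, where `msgTime` stands for whatever accessor the message type provides.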
#### Tool result size estimation
`phToolResultSize` is computed as the total byte length of all `tool_result.content` strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
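
As a sketch of that estimate, the function below sums the UTF-8 byte lengths of the collected `tool_result.content` strings and divides by 4. The function name `estimateToolResultTokens` is an assumption, and extracting the strings from the JSONL is left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)

-- Rough token estimate: total UTF-8 byte length divided by 4.
-- The division happens once, on the total, so short strings are not
-- each rounded down to zero.
estimateToolResultTokens :: [Text] -> Int
estimateToolResultTokens contents =
  sum (map (BS.length . encodeUtf8) contents) `div` 4
```

For instance, `estimateToolResultTokens ["abcdefgh", "h\233llo"]` evaluates to `3`: the two strings are 8 and 6 bytes in UTF-8, and 14 divided by 4 rounds down to 3.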
### 5. Skills (MN Plugin)
Both skills live in `~/.claude/local-plugins/mn/skills/`.
#### `/mn:start-tracking`
1. Asks what the session is about (or takes description argument)
2. Determines the Claude Code session UUID by finding the most recently modified `.jsonl` file in `~/.claude/projects/<project-slug>/`. Note: the file may not exist yet at the very start of a session — the skill should verify the file exists, and if not, record the UUID after the first assistant turn
3. Inserts a row into `cc_sessions` via `analytics-db` MCP
4. Records `started_at`
5. Reminds Claude of the phase logging protocol and verdict categories
#### `/mn:session-retrospective`
1. Closes any open phases (sets `ended_at`)
2. Runs `stack exec session-retrospective -- <session-uuid>`
3. Report saved to `ai_files/session-reports/<date>-<description>.md`
4. Claude reviews and presents the report
### 6. MCP Configuration
A dedicated MCP instance `analytics-db` in `~/.claude/.mcp.json`:
```json
{
  "mcpServers": {
    "analytics-db": {
      "command": "node",
      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
      "env": {
        "DB_HOST": "localhost",
        "DB_USER": "assets_servant",
        "DB_PASSWORD": "...",
        "DB_NAME": "claude_analytics"
      }
    }
  }
}
```
This instance runs the same MCP server script as `database-testing`, with different database credentials.
It is reserved exclusively for session tracking and is never reconfigured for other databases.
### 6b. Schema Creation
The Haskell executable runs `CREATE TABLE IF NOT EXISTS` on startup for both tables (same pattern as the ser platform's `sqInitSchema`). No manual schema setup required — just create the `claude_analytics` database and the tool self-initializes.
### 6c. Report Output Location
Reports are saved to `~/session-retrospective/reports/<date>-<description>.md`, outside any specific project's working tree, since this tool is project-agnostic.
### 7. Report Format
```markdown
# Session Retrospective: <date> — <description>
## Summary
| Metric | Value |
|--------|-------|
| Wall clock | ... |
| Claude time | ... |
| Tool time | ... |
| Human time | ... |
| Output tokens | ... |
| Input tokens (fresh) | ... |
| Cache read tokens | ... |
| Cache create tokens | ... |
| Phases | ... |
## Verdict Breakdown
| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|---------|--------|---------|--------|------------|-------------|-------------|----------|
| High impact | ... | ... | ... | ... | ... | ... | ... |
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
| Small impact | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |
## Subagents
| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|-------------|------|---------|--------|------------|-------------|------|-------|---------|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
## By Task
| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|------|--------|---------|--------|------------|-------------|-------------|-------------|
| ... | ... | ... | ... | ... | ... | ... | ... |
## Phase Detail
### 1. <task> — "<phase>" (<verdict>)
- Actions: ...
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
- Tool-heavy turns: ... | Text turns: ...
- Tool result input bloat: ~...K tokens
- Claude time: ... | Tool time: ...
- **Useful info:** ...
- **Lessons learned:** ...
## Lessons Summary
1. ...
2. ...
```
## Explicitly Deferred
| Feature | Why |
|---------|-----|
| Cross-session index / accumulation | Start with standalone reports first |
| `/session-trends` meta-analysis | Depends on accumulation |
| Live mid-session cost dashboard | Not needed for core learning loop |
| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
| Dollar cost estimation | Token pricing changes; token counts are stable |
| Thinking token breakdown | `thinking` content blocks exist in JSONL with text, but no separate `thinking_tokens` field — cost is folded into `output_tokens` and cannot be isolated |