Claude Code session efficiency analysis tool — self-assessment of action impact + token/time retrospectives via JSONL parsing.
# Session Retrospective Tool — Design Spec

## Problem

After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in `/cost` command shows aggregate USD but no breakdown by task, phase, or impact.

## Objective

Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.

## Architecture Overview

```
During Session                          After Session
─────────────                           ─────────────
/mn:start-tracking                      /mn:session-retrospective
       │                                       │
       ▼                                       ▼
analytics-db MCP                        Haskell executable
       │                                       │
       ▼                                       ├─→ Reads JSONL files
claude_analytics DB                            │   (tokens, timestamps, tools)
┌───────────────────┐                          │
│ cc_sessions       │◄─────────────────────────├─→ Queries cc_session_phases
│ cc_session_phases │                          │   (verdicts, tasks, notes)
└───────────────────┘                          │
                                               ├─→ Reads .meta.json
                                               │   (subagent descriptions)
                                               │
                                               ▼
                                        Markdown report
                                        ai_files/session-reports/
```

## Components

### 1. Database: `claude_analytics`

A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated `analytics-db` MCP instance in `~/.claude/.mcp.json`.

#### `cc_sessions`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_uuid` | `TEXT NOT NULL UNIQUE` | Claude Code session UUID |
| `project_slug` | `TEXT NOT NULL` | e.g. `-var-www-assets-cedrusconsult-com-ser-dev` |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled by retrospective |
| `description` | `TEXT` | What the session was about |
| `total_output_tokens` | `INT` | Filled by retrospective |
| `total_input_tokens` | `INT` | Filled by retrospective |
| `total_cache_read_tokens` | `INT` | Filled by retrospective |
| `total_cache_create_tokens` | `INT` | Filled by retrospective |
| `claude_time_seconds` | `REAL` | Filled by retrospective |
| `tool_time_seconds` | `REAL` | Filled by retrospective |
| `human_time_seconds` | `REAL` | Filled by retrospective |

#### `cc_session_phases`

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SERIAL PRIMARY KEY` | |
| `session_id` | `INT REFERENCES cc_sessions` | |
| `task` | `TEXT NOT NULL` | e.g. "fix-login-bug" |
| `phase` | `TEXT NOT NULL` | e.g. "diagnose-redirect" |
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
| `ended_at` | `TIMESTAMPTZ` | Filled when phase ends |
| `actions_summary` | `TEXT` | e.g. "Grep, Read Server.hs, Read Auth.hs" |
| `verdict` | `TEXT NOT NULL` | One of the verdict categories |
| `subagents` | `TEXT` | Comma-separated descriptions of dispatched subagents |
| `useful_info` | `TEXT` | Facts/code locations discovered |
| `lessons_learned` | `TEXT` | Meta-observations for self-learning |
| `notes` | `TEXT` | Freeform |

### 2. Verdict Categories

| Verdict | Meaning |
|---------|---------|
| `high_impact` | Pivotal: unlocked the solution or saved significant downstream work |
| `moderate_impact` | Directly contributed to the outcome |
| `small_impact` | Minor direct contribution |
| `exploratory_useful` | Exploration that paid off |
| `exploratory_waste` | Exploration that led nowhere but was reasonable to try |
| `avoidable_waste` | Should have known better |
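
Because the `verdict` column is plain `TEXT` while the tool works with a closed set of categories, some mapping between the two is needed. The following is a minimal sketch of one way to do it, assuming the column stores exactly the six snake_case strings above; the function names `verdictToText` and `parseVerdict` are hypothetical and not part of the spec.

```haskell
-- Sketch only: assumes the `verdict` column holds exactly the six
-- snake_case strings listed in the table above.
data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste
  deriving (Show, Eq, Enum, Bounded)

-- | Text form written to and read from the `verdict` column.
verdictToText :: Verdict -> String
verdictToText v = case v of
  HighImpact        -> "high_impact"
  ModerateImpact    -> "moderate_impact"
  SmallImpact       -> "small_impact"
  ExploratoryUseful -> "exploratory_useful"
  ExploratoryWaste  -> "exploratory_waste"
  AvoidableWaste    -> "avoidable_waste"

-- | Inverse of 'verdictToText'; Nothing for any other string.
parseVerdict :: String -> Maybe Verdict
parseVerdict s = lookup s [(verdictToText v, v) | v <- [minBound .. maxBound]]
```

Returning `Maybe` lets the report flag a row with an unexpected verdict string instead of aborting the whole run.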

### 3. Haskell Executable: `session-retrospective`

**Location:** `~/.claude/tools/session-retrospective/` (standalone project, own `stack.yaml`)

**CLI:**
```bash
stack exec session-retrospective -- <session-uuid>
```

**Modules:**

| Module | Responsibility |
|--------|---------------|
| `SessionRetrospective.Main` | Entry point: parse args, orchestrate |
| `SessionRetrospective.Jsonl` | Parse JSONL files into typed records (Aeson) |
| `SessionRetrospective.Phases` | Query `cc_session_phases` from PG, match to JSONL messages by timestamp |
| `SessionRetrospective.Subagents` | Parse subagent JSONLs + `.meta.json`, compute per-subagent stats |
| `SessionRetrospective.TimeAnalysis` | Classify timestamp gaps into claude/tool/human time |
| `SessionRetrospective.Report` | Generate markdown report from computed stats |

**Key types:**

```haskell
import Data.Text (Text)
import Data.Time.Clock (NominalDiffTime, UTCTime)

-- Token usage for one message, phase, subagent, or session.
data TokenCounts = TokenCounts
  { tcOutput      :: !Int
  , tcInput       :: !Int
  , tcCacheRead   :: !Int
  , tcCacheCreate :: !Int
  }

-- Mirrors the verdict categories stored in cc_session_phases.verdict.
data Verdict
  = HighImpact
  | ModerateImpact
  | SmallImpact
  | ExploratoryUseful
  | ExploratoryWaste
  | AvoidableWaste

data Phase = Phase
  { phTask           :: !Text
  , phName           :: !Text
  , phStartedAt      :: !UTCTime
  , phEndedAt        :: !(Maybe UTCTime)  -- Nothing if phase still open
  , phVerdict        :: !Verdict
  , phActionsSummary :: !Text
  , phSubagents      :: !(Maybe Text)
  , phUsefulInfo     :: !(Maybe Text)
  , phLessonsLearned :: !(Maybe Text)
  , phNotes          :: !(Maybe Text)
  -- Computed by joining with JSONL:
  , phTokens         :: !TokenCounts
  , phClaudeTime     :: !NominalDiffTime
  , phToolTime       :: !NominalDiffTime
  , phToolTurns      :: !Int
  , phTextTurns      :: !Int
  , phToolResultSize :: !Int  -- input bloat from tool results
  }

data SubagentStats = SubagentStats
  { saDescription :: !Text
  , saAgentType   :: !Text
  , saTokens      :: !TokenCounts
  , saTime        :: !NominalDiffTime
  , saPhase       :: !Text
  , saVerdict     :: !Verdict
  }
```

**Dependencies:** `aeson`, `postgresql-simple`, `time`, `text`, `bytestring`, `filepath`, `directory`

**DB access:** Uses `postgresql-simple` (not Squeal). This is a standalone tool, not part of the ser platform `src/`.

### 4. JSONL Data Available

Session logs are located at `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`, with subagent logs at `<session-uuid>/subagents/agent-<id>.jsonl`.

#### Streaming Deduplication (CRITICAL)

The JSONL files log **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative, so only the final chunk has the correct totals.

**Deduplication rule:** Group assistant messages by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals.
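
The deduplication rule can be sketched as a group-and-pick-maximum over a map keyed by `requestId`. This is an illustration rather than the tool's actual code: the `Chunk` record and the names `finalChunks` and `sessionTotals` are hypothetical, and the entries are assumed to be already decoded from JSON.

```haskell
import qualified Data.Map.Strict as Map

-- One assistant JSONL entry reduced to the fields the rule needs.
-- (Hypothetical record; the real parser would fill it from `.message.usage`.)
data Chunk = Chunk
  { chRequestId   :: String
  , chOutput      :: Int
  , chInput       :: Int
  , chCacheRead   :: Int
  , chCacheCreate :: Int
  } deriving (Show, Eq)

-- | Per requestId, keep the chunk with the highest output_tokens.
-- On ties the earlier chunk is kept.
finalChunks :: [Chunk] -> Map.Map String Chunk
finalChunks = Map.fromListWith pick . map (\c -> (chRequestId c, c))
  where
    -- fromListWith passes the newer value first, the stored value second.
    pick new old = if chOutput new > chOutput old then new else old

-- | Session totals as (output, input, cache read, cache create),
-- summed over the final chunk of each request.
sessionTotals :: [Chunk] -> (Int, Int, Int, Int)
sessionTotals = foldr add (0, 0, 0, 0) . Map.elems . finalChunks
  where
    add c (o, i, r, w) =
      (o + chOutput c, i + chInput c, r + chCacheRead c, w + chCacheCreate c)
```

Summing every entry instead of only the final chunk per request would overcount, since each chunk repeats the running totals of the chunks before it.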

#### Per assistant message:
- `.message.usage.output_tokens`: Claude generation tokens (includes thinking tokens)
- `.message.usage.input_tokens`: fresh input tokens
- `.message.usage.cache_read_input_tokens`: cached context
- `.message.usage.cache_creation_input_tokens`: new cache entries
- `.message.content[]`: tool calls (`type: "tool_use"`, `name`, `input`), text (`type: "text"`), and thinking (`type: "thinking"`, text present but no separate token count; cost is included in `output_tokens`)
- `.timestamp`: ISO 8601
- `.requestId`: groups streaming chunks from a single API call

#### Per user message:
- `.message.content`: either a string (human text) or an array containing `{"type": "tool_result", ...}` and/or `{"type": "text", ...}` entries
- `.timestamp`

**Content format note:** Human text can appear as either `"content": "text"` (string) or `"content": [{"type": "text", "text": "..."}]` (array). The parser must handle both.
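
One way to keep the two encodings from leaking into the rest of the tool is to normalise them right after decoding. The sketch below assumes the JSON has already been decoded into a small sum type; `Content`, `Block`, `humanText`, and `isHumanMessage` are hypothetical names, and a real parser would build these values with Aeson.

```haskell
-- Decoded shape of `.message.content` in a user message.
-- (Hypothetical types; only the distinctions the note needs are modelled.)
data Block
  = TextBlock String    -- {"type": "text", "text": "..."}
  | ToolResult String   -- {"type": "tool_result", ...}
  deriving (Show, Eq)

data Content
  = ContentString String   -- "content": "text"
  | ContentBlocks [Block]  -- "content": [ ... ]
  deriving (Show, Eq)

-- | Human-written text, regardless of which encoding was used.
humanText :: Content -> [String]
humanText (ContentString s)  = [s]
humanText (ContentBlocks bs) = [t | TextBlock t <- bs]

-- | True when a user message carries human text rather than only tool results.
isHumanMessage :: Content -> Bool
isHumanMessage = not . null . humanText
```

With this in place, later stages such as time classification only need to ask `isHumanMessage` and never inspect the raw shape again.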

#### Subagent matching

`agent-<id>.meta.json` contains `agentType` and `description`. The `description` matches the `input.description` of the corresponding Agent tool_use call in the main session JSONL.

**Fallback for missing `.meta.json`:** The `.meta.json` files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:
1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
2. Extract the description from the Agent tool_use `input.description` field
3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
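
The three fallback steps can be sketched as an interval-overlap lookup. This is only an illustration under assumptions: the records `AgentCall` and `SubagentFile` are hypothetical, each Agent call is assumed to span from its tool_use timestamp to its tool_result timestamp, and the first overlapping call wins (which may be wrong when subagents run in parallel).

```haskell
import Data.List (find)

-- An Agent tool_use call in the main session (hypothetical record):
-- when it was issued, when its result came back, and its input.description.
data AgentCall t = AgentCall
  { acStart       :: t
  , acEnd         :: t
  , acDescription :: String
  }

-- A subagent JSONL file summarised by its first and last timestamps.
data SubagentFile t = SubagentFile
  { sfAgentId :: String
  , sfFirst   :: t
  , sfLast    :: t
  }

-- | Two closed intervals overlap when each starts before the other ends.
overlaps :: Ord t => (t, t) -> (t, t) -> Bool
overlaps (a0, a1) (b0, b1) = a0 <= b1 && b0 <= a1

-- | Steps 1 to 3: pair the agent ID with the description of the first
-- overlapping Agent call, or with "unknown" when nothing overlaps.
describeSubagent :: Ord t => [AgentCall t] -> SubagentFile t -> (String, String)
describeSubagent calls sf =
  (sfAgentId sf, maybe "unknown" acDescription (find hit calls))
  where
    hit c = overlaps (acStart c, acEnd c) (sfFirst sf, sfLast sf)
```

The timestamp type is left polymorphic so the same logic works with `UTCTime` in the tool and with plain numbers in tests.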

#### Time classification from timestamp gaps:

| Category | Detection |
|----------|-----------|
| Claude processing | Gap before `assistant` message |
| Tool execution | Gap between `assistant(tool_use)` → `user(tool_result)` |
| Human wait | Gap before `user` message with human text content |

Messages of type `progress`, `queue-operation`, and `file-history-snapshot` are ignored for time classification.
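
The classification table can be read as "attribute each gap to the kind of message that ends it". A minimal sketch under that reading follows; timestamps are simplified to seconds as `Double` (the tool itself would use `UTCTime` and `diffUTCTime`), and `Kind`, `TimeSplit`, and `classify` are hypothetical names.

```haskell
-- Message kinds relevant to time classification (hypothetical type).
data Kind
  = AssistantMsg   -- any assistant message
  | ToolResultMsg  -- user message carrying a tool_result
  | HumanMsg       -- user message with human text content
  | IgnoredMsg     -- progress, queue-operation, file-history-snapshot
  deriving (Show, Eq)

-- Accumulated seconds per category.
data TimeSplit = TimeSplit
  { tsClaude :: Double
  , tsTool   :: Double
  , tsHuman  :: Double
  } deriving (Show, Eq)

-- | Walk consecutive pairs of non-ignored messages and add each gap to the
-- category of the message that ends it. Input must be sorted by timestamp.
classify :: [(Double, Kind)] -> TimeSplit
classify msgs = foldl step (TimeSplit 0 0 0) (zip kept (drop 1 kept))
  where
    kept = filter ((/= IgnoredMsg) . snd) msgs
    step acc ((t0, _), (t1, kind)) =
      let gap = t1 - t0
      in case kind of
           AssistantMsg  -> acc { tsClaude = tsClaude acc + gap }
           ToolResultMsg -> acc { tsTool = tsTool acc + gap }
           HumanMsg      -> acc { tsHuman = tsHuman acc + gap }
           IgnoredMsg    -> acc
```

Dropping ignored messages before pairing matters: otherwise a `progress` event in the middle of a tool call would split the tool gap and lose part of it.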

#### Cross-matching phases to JSONL

The phase log records `started_at`/`ended_at` timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages; they are NOT stored in the database.
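
A minimal sketch of the window selection is shown below. Two details are assumptions not stated in the spec: the window is treated as half-open (`started_at` inclusive, `ended_at` exclusive) so that a message on the boundary of two adjacent phases is counted once, and an open phase (`ended_at` is NULL) extends to the end of the log. `Msg`, `Window`, and `phaseTokens` are hypothetical names, and messages are assumed to be deduplicated by `requestId` already.

```haskell
-- A deduplicated assistant message reduced to timestamp and two counters.
data Msg t = Msg
  { mTime   :: t
  , mOutput :: Int
  , mInput  :: Int
  }

-- A phase window; Nothing means the phase is still open.
data Window t = Window
  { wStart :: t
  , wEnd   :: Maybe t
  }

-- | Half-open membership test: start <= t < end (assumption, see above).
inWindow :: Ord t => Window t -> t -> Bool
inWindow (Window start end) t = t >= start && maybe True (t <) end

-- | (output tokens, input tokens) of all messages inside the window.
phaseTokens :: Ord t => Window t -> [Msg t] -> (Int, Int)
phaseTokens w msgs = (sum (map mOutput hits), sum (map mInput hits))
  where
    hits = filter (inWindow w . mTime) msgs
```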

#### Tool result size estimation

`phToolResultSize` is computed as the total byte length of all `tool_result.content` strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
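
As a sketch of that estimate, assuming "byte length" means the UTF-8 encoded length and that the division rounds down (neither is specified above); `byteLength` and `toolResultSize` are hypothetical names:

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | UTF-8 byte length of one tool_result.content string (assumed encoding).
byteLength :: T.Text -> Int
byteLength = BS.length . TE.encodeUtf8

-- | Rough token estimate for a phase: total bytes divided by 4,
-- rounded down. Input is every tool_result content string in the window.
toolResultSize :: [T.Text] -> Int
toolResultSize contents = sum (map byteLength contents) `div` 4
```

Dividing the summed total once, rather than each string separately, avoids losing up to three bytes per tool result to rounding.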

### 5. Skills (MN Plugin)

Both skills live in `~/.claude/local-plugins/mn/skills/`.

#### `/mn:start-tracking`

1. Asks what the session is about (or takes a description argument)
2. Determines the Claude Code session UUID by finding the most recently modified `.jsonl` file in `~/.claude/projects/<project-slug>/`. Note: the file may not exist yet at the very start of a session, so the skill should verify that the file exists and, if it does not, record the UUID after the first assistant turn
3. Inserts a row into `cc_sessions` via `analytics-db` MCP
4. Records `started_at`
5. Reminds Claude of the phase logging protocol and verdict categories
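
The UUID lookup in step 2 amounts to "newest `*.jsonl` file, minus its extension". It is sketched here in Haskell for consistency with the rest of this spec, although the skill itself may well do this another way; `newestSessionUuid` and `findSessionUuid` are hypothetical names.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import System.Directory (getModificationTime, listDirectory)
import System.FilePath (takeBaseName, takeExtension, (</>))

-- | Pure core: from (file name, modification time) pairs, pick the newest
-- *.jsonl file and return its base name, which is the session UUID.
-- Other entries (.meta.json files, subagent directories) are skipped.
newestSessionUuid :: Ord t => [(FilePath, t)] -> Maybe String
newestSessionUuid files =
  case sortOn (Down . snd) [f | f@(name, _) <- files, takeExtension name == ".jsonl"] of
    ((name, _) : _) -> Just (takeBaseName name)
    []              -> Nothing

-- | IO wrapper over a project directory such as
-- ~/.claude/projects/<project-slug>/. Returns Nothing when no transcript
-- exists yet; throws if the directory itself is missing.
findSessionUuid :: FilePath -> IO (Maybe String)
findSessionUuid dir = do
  names <- listDirectory dir
  times <- mapM (getModificationTime . (dir </>)) names
  pure (newestSessionUuid (zip names times))
```

A `Nothing` result corresponds to the case called out in step 2, where the UUID has to be recorded after the first assistant turn instead.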

#### `/mn:session-retrospective`

1. Closes any open phases (sets `ended_at`)
2. Runs `stack exec session-retrospective -- <session-uuid>`
3. Saves the report to `ai_files/session-reports/<date>-<description>.md`
4. Claude reviews and presents the report

### 6. MCP Configuration

A dedicated MCP instance `analytics-db` in `~/.claude/.mcp.json`:

```json
{
  "mcpServers": {
    "analytics-db": {
      "command": "node",
      "args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
      "env": {
        "DB_HOST": "localhost",
        "DB_USER": "assets_servant",
        "DB_PASSWORD": "...",
        "DB_NAME": "claude_analytics"
      }
    }
  }
}
```

Uses the same MCP server binary as `database-testing`, just with different database credentials.

Reserved exclusively for session tracking; never reconfigured for other databases.

### 6b. Schema Creation

The Haskell executable runs `CREATE TABLE IF NOT EXISTS` on startup for both tables (same pattern as the ser platform's `sqInitSchema`). No manual schema setup is required: just create the `claude_analytics` database and the tool self-initializes.
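
A sketch of what that startup step could look like, with the DDL derived from the two column tables in section 1. The statements are kept as plain strings and the runner is passed in as a parameter, so the snippet has no database dependency; in the real tool the runner would presumably wrap `execute_` from `postgresql-simple`. The names `sessionsDDL`, `phasesDDL`, `schemaStatements`, and `initSchemaWith` are hypothetical.

```haskell
-- DDL for cc_sessions, following the column table in section 1.
sessionsDDL :: String
sessionsDDL = unlines
  [ "CREATE TABLE IF NOT EXISTS cc_sessions ("
  , "  id SERIAL PRIMARY KEY,"
  , "  session_uuid TEXT NOT NULL UNIQUE,"
  , "  project_slug TEXT NOT NULL,"
  , "  started_at TIMESTAMPTZ NOT NULL,"
  , "  ended_at TIMESTAMPTZ,"
  , "  description TEXT,"
  , "  total_output_tokens INT,"
  , "  total_input_tokens INT,"
  , "  total_cache_read_tokens INT,"
  , "  total_cache_create_tokens INT,"
  , "  claude_time_seconds REAL,"
  , "  tool_time_seconds REAL,"
  , "  human_time_seconds REAL"
  , ")"
  ]

-- DDL for cc_session_phases, following the column table in section 1.
phasesDDL :: String
phasesDDL = unlines
  [ "CREATE TABLE IF NOT EXISTS cc_session_phases ("
  , "  id SERIAL PRIMARY KEY,"
  , "  session_id INT REFERENCES cc_sessions,"
  , "  task TEXT NOT NULL,"
  , "  phase TEXT NOT NULL,"
  , "  started_at TIMESTAMPTZ NOT NULL,"
  , "  ended_at TIMESTAMPTZ,"
  , "  actions_summary TEXT,"
  , "  verdict TEXT NOT NULL,"
  , "  subagents TEXT,"
  , "  useful_info TEXT,"
  , "  lessons_learned TEXT,"
  , "  notes TEXT"
  , ")"
  ]

-- | Order matters: cc_session_phases references cc_sessions.
schemaStatements :: [String]
schemaStatements = [sessionsDDL, phasesDDL]

-- | Run every statement with the supplied runner (hypothetical hook; a
-- database-backed runner would execute the statement on a connection).
initSchemaWith :: Monad m => (String -> m ()) -> m ()
initSchemaWith run = mapM_ run schemaStatements
```

Calling `initSchemaWith putStrLn` prints the statements, which is a cheap way to review the schema without a database.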

### 6c. Report Output Location

Reports are saved to `~/session-retrospective/reports/<date>-<description>.md` (in the tool's own directory rather than in any specific project's working tree), since this tool is project-agnostic.

### 7. Report Format

```markdown
# Session Retrospective: <date> — <description>

## Summary
| Metric | Value |
|--------|-------|
| Wall clock | ... |
| Claude time | ... |
| Tool time | ... |
| Human time | ... |
| Output tokens | ... |
| Input tokens (fresh) | ... |
| Cache read tokens | ... |
| Cache create tokens | ... |
| Phases | ... |

## Verdict Breakdown
| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|---------|--------|---------|--------|------------|-------------|-------------|----------|
| High impact | ... | ... | ... | ... | ... | ... | ... |
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
| Small impact | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |

## Subagents
| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|-------------|------|---------|--------|------------|-------------|------|-------|---------|
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

## By Task
| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|------|--------|---------|--------|------------|-------------|-------------|-------------|
| ... | ... | ... | ... | ... | ... | ... | ... |

## Phase Detail
### 1. <task> — "<phase>" (<verdict>)
- Actions: ...
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
- Tool-heavy turns: ... | Text turns: ...
- Tool result input bloat: ~...K tokens
- Claude time: ... | Tool time: ...
- **Useful info:** ...
- **Lessons learned:** ...

## Lessons Summary
1. ...
2. ...
```

## Explicitly Deferred

| Feature | Why |
|---------|-----|
| Cross-session index / accumulation | Start with standalone reports first |
| `/session-trends` meta-analysis | Depends on accumulation |
| Live mid-session cost dashboard | Not needed for core learning loop |
| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
| Dollar cost estimation | Token pricing changes; token counts are stable |
| Thinking token breakdown | `thinking` content blocks exist in JSONL with text, but there is no separate `thinking_tokens` field; the cost is folded into `output_tokens` and cannot be isolated |