Initial commit: design spec for session retrospective tool
Claude Code session efficiency analysis tool — self-assessment of action impact + token/time retrospectives via JSONL parsing.
This commit is contained in:
14
.gitignore
vendored
Normal file
14
.gitignore
vendored
Normal file
@@ -0,0 +1,14 @@
|
||||
# Build artifacts
|
||||
.stack-work/
|
||||
*.hi
|
||||
*.o
|
||||
*.dyn_hi
|
||||
*.dyn_o
|
||||
|
||||
# Reports (generated, local-only)
|
||||
reports/
|
||||
|
||||
# Editor
|
||||
*~
|
||||
*.swp
|
||||
.vscode/
|
||||
325
docs/design.md
Normal file
325
docs/design.md
Normal file
@@ -0,0 +1,325 @@
|
||||
# Session Retrospective Tool — Design Spec
|
||||
|
||||
## Problem
|
||||
|
||||
After a Claude Code session, there is no structured way to understand which actions were impactful vs. wasteful, how tokens were spent across tasks, or what could be improved. The built-in `/cost` command shows aggregate USD but no breakdown by task, phase, or impact.
|
||||
|
||||
## Objective
|
||||
|
||||
Enable Claude Code to self-assess its own efficiency during a session, then produce a retrospective report that cross-references qualitative impact judgments with quantitative token and time data. This creates a feedback loop for identifying what was high-impact and cheap vs. expensive and wasteful.
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
During Session After Session
|
||||
───────────── ─────────────
|
||||
/mn:start-tracking /mn:session-retrospective
|
||||
│ │
|
||||
▼ ▼
|
||||
analytics-db MCP Haskell executable
|
||||
│ │
|
||||
▼ ├─→ Reads JSONL files
|
||||
claude_analytics DB │ (tokens, timestamps, tools)
|
||||
┌──────────────┐ │
|
||||
│ cc_sessions │◄─────────────────────├─→ Queries cc_session_phases
|
||||
│ cc_phases │ │ (verdicts, tasks, notes)
|
||||
└──────────────┘ │
|
||||
├─→ Reads .meta.json
|
||||
│ (subagent descriptions)
|
||||
│
|
||||
▼
|
||||
Markdown report
|
||||
ai_files/session-reports/
|
||||
```
|
||||
|
||||
## Components
|
||||
|
||||
### 1. Database: `claude_analytics`
|
||||
|
||||
A standalone PostgreSQL database, separate from any project database. Accessed via a dedicated `analytics-db` MCP instance in `~/.claude/.mcp.json`.
|
||||
|
||||
#### `cc_sessions`
|
||||
|
||||
| Column | Type | Notes |
|
||||
|--------|------|-------|
|
||||
| `id` | `SERIAL PRIMARY KEY` | |
|
||||
| `session_uuid` | `TEXT NOT NULL UNIQUE` | Claude Code session UUID |
|
||||
| `project_slug` | `TEXT NOT NULL` | e.g. `-var-www-assets-cedrusconsult-com-ser-dev` |
|
||||
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
|
||||
| `ended_at` | `TIMESTAMPTZ` | Filled by retrospective |
|
||||
| `description` | `TEXT` | What the session was about |
|
||||
| `total_output_tokens` | `INT` | Filled by retrospective |
|
||||
| `total_input_tokens` | `INT` | Filled by retrospective |
|
||||
| `total_cache_read_tokens` | `INT` | Filled by retrospective |
|
||||
| `total_cache_create_tokens` | `INT` | Filled by retrospective |
|
||||
| `claude_time_seconds` | `REAL` | Filled by retrospective |
|
||||
| `tool_time_seconds` | `REAL` | Filled by retrospective |
|
||||
| `human_time_seconds` | `REAL` | Filled by retrospective |
|
||||
|
||||
#### `cc_session_phases`
|
||||
|
||||
| Column | Type | Notes |
|
||||
|--------|------|-------|
|
||||
| `id` | `SERIAL PRIMARY KEY` | |
|
||||
| `session_id` | `INT REFERENCES cc_sessions` | |
|
||||
| `task` | `TEXT NOT NULL` | e.g. "fix-login-bug" |
|
||||
| `phase` | `TEXT NOT NULL` | e.g. "diagnose-redirect" |
|
||||
| `started_at` | `TIMESTAMPTZ NOT NULL` | |
|
||||
| `ended_at` | `TIMESTAMPTZ` | Filled when phase ends |
|
||||
| `actions_summary` | `TEXT` | e.g. "Grep, Read Server.hs, Read Auth.hs" |
|
||||
| `verdict` | `TEXT NOT NULL` | One of the verdict categories |
|
||||
| `subagents` | `TEXT` | Comma-separated descriptions of dispatched subagents |
|
||||
| `useful_info` | `TEXT` | Facts/code locations discovered |
|
||||
| `lessons_learned` | `TEXT` | Meta-observations for self-learning |
|
||||
| `notes` | `TEXT` | Freeform |
|
||||
|
||||
### 2. Verdict Categories
|
||||
|
||||
| Verdict | Meaning |
|
||||
|---------|---------|
|
||||
| `high_impact` | Pivotal — unlocked the solution or saved significant downstream work |
|
||||
| `moderate_impact` | Directly contributed to the outcome |
|
||||
| `small_impact` | Minor direct contribution |
|
||||
| `exploratory_useful` | Exploration that paid off |
|
||||
| `exploratory_waste` | Exploration that led nowhere but was reasonable to try |
|
||||
| `avoidable_waste` | Should have known better |
|
||||
|
||||
### 3. Haskell Executable: `session-retrospective`
|
||||
|
||||
**Location:** `~/.claude/tools/session-retrospective/` (standalone project, own `stack.yaml`)
|
||||
|
||||
**CLI:**
|
||||
```bash
|
||||
stack exec session-retrospective -- <session-uuid>
|
||||
```
|
||||
|
||||
**Modules:**
|
||||
|
||||
| Module | Responsibility |
|
||||
|--------|---------------|
|
||||
| `SessionRetrospective.Main` | Entry point — parse args, orchestrate |
|
||||
| `SessionRetrospective.Jsonl` | Parse JSONL files into typed records (Aeson) |
|
||||
| `SessionRetrospective.Phases` | Query `cc_session_phases` from PG, match to JSONL messages by timestamp |
|
||||
| `SessionRetrospective.Subagents` | Parse subagent JSONLs + `.meta.json`, compute per-subagent stats |
|
||||
| `SessionRetrospective.TimeAnalysis` | Classify timestamp gaps into claude/tool/human time |
|
||||
| `SessionRetrospective.Report` | Generate markdown report from computed stats |
|
||||
|
||||
**Key types:**
|
||||
|
||||
```haskell
|
||||
data TokenCounts = TokenCounts
|
||||
{ tcOutput :: !Int
|
||||
, tcInput :: !Int
|
||||
, tcCacheRead :: !Int
|
||||
, tcCacheCreate :: !Int
|
||||
}
|
||||
|
||||
data Verdict
|
||||
= HighImpact
|
||||
| ModerateImpact
|
||||
| SmallImpact
|
||||
| ExploratoryUseful
|
||||
| ExploratoryWaste
|
||||
| AvoidableWaste
|
||||
|
||||
data Phase = Phase
|
||||
{ phTask :: !Text
|
||||
, phName :: !Text
|
||||
, phStartedAt :: !UTCTime
|
||||
, phEndedAt :: !(Maybe UTCTime) -- Nothing if phase still open
|
||||
, phVerdict :: !Verdict
|
||||
, phActionsSummary :: !Text
|
||||
, phSubagents :: !(Maybe Text)
|
||||
, phUsefulInfo :: !(Maybe Text)
|
||||
, phLessonsLearned :: !(Maybe Text)
|
||||
, phNotes :: !(Maybe Text)
|
||||
-- Computed by joining with JSONL:
|
||||
, phTokens :: !TokenCounts
|
||||
, phClaudeTime :: !NominalDiffTime
|
||||
, phToolTime :: !NominalDiffTime
|
||||
, phToolTurns :: !Int
|
||||
, phTextTurns :: !Int
|
||||
, phToolResultSize :: !Int -- input bloat from tool results
|
||||
}
|
||||
|
||||
data SubagentStats = SubagentStats
|
||||
{ saDescription :: !Text
|
||||
, saAgentType :: !Text
|
||||
, saTokens :: !TokenCounts
|
||||
, saTime :: !NominalDiffTime
|
||||
, saPhase :: !Text
|
||||
, saVerdict :: !Verdict
|
||||
}
|
||||
```
|
||||
|
||||
**Dependencies:** `aeson`, `postgresql-simple`, `time`, `text`, `bytestring`, `filepath`, `directory`
|
||||
|
||||
**DB access:** Uses `postgresql-simple` (not Squeal). This is a standalone tool, not part of the ser platform `src/`.
|
||||
|
||||
### 4. JSONL Data Available
|
||||
|
||||
Located at `~/.claude/projects/<project-slug>/<session-uuid>.jsonl` with subagents at `<session-uuid>/subagents/agent-<id>.jsonl`.
|
||||
|
||||
#### Streaming Deduplication (CRITICAL)
|
||||
|
||||
The JSONL logs **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative — only the final chunk has the correct totals.
|
||||
|
||||
**Deduplication rule:** Group assistant messages by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals.
|
||||
|
||||
#### Per assistant message:
|
||||
- `.message.usage.output_tokens` — Claude generation tokens (includes thinking tokens)
|
||||
- `.message.usage.input_tokens` — fresh input tokens
|
||||
- `.message.usage.cache_read_input_tokens` — cached context
|
||||
- `.message.usage.cache_creation_input_tokens` — new cache entries
|
||||
- `.message.content[]` — tool calls (`type: "tool_use"`, `name`, `input`), text (`type: "text"`), and thinking (`type: "thinking"`, text present but no separate token count — cost is included in `output_tokens`)
|
||||
- `.timestamp` — ISO 8601
|
||||
- `.requestId` — groups streaming chunks from a single API call
|
||||
|
||||
#### Per user message:
|
||||
- `.message.content` — either a string (human text) or an array containing `{"type": "tool_result", ...}` and/or `{"type": "text", ...}` entries
|
||||
- `.timestamp`
|
||||
|
||||
**Content format note:** Human text can appear as either `"content": "text"` (string) or `"content": [{"type": "text", "text": "..."}]` (array). The parser must handle both.
|
||||
|
||||
#### Subagent matching
|
||||
|
||||
`agent-<id>.meta.json` contains `agentType` and `description`. The `description` matches the Agent tool_use call's `description` in the main session JSONL.
|
||||
|
||||
**Fallback for missing `.meta.json`:** The `.meta.json` files are a recent Claude Code feature. Older sessions do not have them. When absent, the Haskell tool should:
|
||||
1. Match subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap
|
||||
2. Extract the description from the Agent tool_use `input.description` field
|
||||
3. If no match is found, report the subagent with its agent ID and mark description as "unknown"
|
||||
|
||||
#### Time classification from timestamp gaps:
|
||||
|
||||
| Category | Detection |
|
||||
|----------|-----------|
|
||||
| Claude processing | Gap before `assistant` message |
|
||||
| Tool execution | Gap between `assistant(tool_use)` → `user(tool_result)` |
|
||||
| Human wait | Gap before `user` message with human text content |
|
||||
|
||||
Messages of type `progress`, `queue-operation`, and `file-history-snapshot` are ignored for time classification.
|
||||
|
||||
#### Cross-matching phases to JSONL
|
||||
|
||||
The phase log records `started_at`/`ended_at` timestamps. The retrospective selects all JSONL messages whose timestamps fall within each phase's window. Phase-level token counts are computed at report time from the matched JSONL messages — they are NOT stored in the database.
|
||||
|
||||
#### Tool result size estimation
|
||||
|
||||
`phToolResultSize` is computed as the total byte length of all `tool_result.content` strings within the phase's time window, divided by 4 (rough token estimate). This is a structural proxy, not exact.
|
||||
|
||||
### 5. Skills (MN Plugin)
|
||||
|
||||
Both skills live in `~/.claude/local-plugins/mn/skills/`.
|
||||
|
||||
#### `/mn:start-tracking`
|
||||
|
||||
1. Asks what the session is about (or takes description argument)
|
||||
2. Determines the Claude Code session UUID by finding the most recently modified `.jsonl` file in `~/.claude/projects/<project-slug>/`. Note: the file may not exist yet at the very start of a session — the skill should verify the file exists, and if not, record the UUID after the first assistant turn
|
||||
3. Inserts a row into `cc_sessions` via `analytics-db` MCP
|
||||
4. Records `started_at`
|
||||
5. Reminds Claude of the phase logging protocol and verdict categories
|
||||
|
||||
#### `/mn:session-retrospective`
|
||||
|
||||
1. Closes any open phases (sets `ended_at`)
|
||||
2. Runs `stack exec session-retrospective -- <session-uuid>`
|
||||
3. Report saved to `ai_files/session-reports/<date>-<description>.md`
|
||||
4. Claude reviews and presents the report
|
||||
|
||||
### 6. MCP Configuration
|
||||
|
||||
A dedicated MCP instance `analytics-db` in `~/.claude/.mcp.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"analytics-db": {
|
||||
"command": "node",
|
||||
"args": ["/home/mnehme/.claude/mcp-servers/database-testing/index.js"],
|
||||
"env": {
|
||||
"DB_HOST": "localhost",
|
||||
"DB_USER": "assets_servant",
|
||||
"DB_PASSWORD": "...",
|
||||
"DB_NAME": "claude_analytics"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Uses the same MCP server binary as `database-testing`, just with different database credentials.
|
||||
|
||||
Reserved exclusively for session tracking — never reconfigured for other databases.
|
||||
|
||||
### 6b. Schema Creation
|
||||
|
||||
The Haskell executable runs `CREATE TABLE IF NOT EXISTS` on startup for both tables (same pattern as the ser platform's `sqInitSchema`). No manual schema setup required — just create the `claude_analytics` database and the tool self-initializes.
|
||||
|
||||
### 6c. Report Output Location
|
||||
|
||||
Reports are saved to `~/session-retrospective/reports/<date>-<description>.md` (inside the project directory, not inside any specific project's working tree), since this tool is project-agnostic.
|
||||
|
||||
### 7. Report Format
|
||||
|
||||
```markdown
|
||||
# Session Retrospective: <date> — <description>
|
||||
|
||||
## Summary
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Wall clock | ... |
|
||||
| Claude time | ... |
|
||||
| Tool time | ... |
|
||||
| Human time | ... |
|
||||
| Output tokens | ... |
|
||||
| Input tokens (fresh) | ... |
|
||||
| Cache read tokens | ... |
|
||||
| Cache create tokens | ... |
|
||||
| Phases | ... |
|
||||
|
||||
## Verdict Breakdown
|
||||
| Verdict | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | % of Out |
|
||||
|---------|--------|---------|--------|------------|-------------|-------------|----------|
|
||||
| High impact | ... | ... | ... | ... | ... | ... | ... |
|
||||
| Moderate impact | ... | ... | ... | ... | ... | ... | ... |
|
||||
| Small impact | ... | ... | ... | ... | ... | ... | ... |
|
||||
| Exploratory (useful) | ... | ... | ... | ... | ... | ... | ... |
|
||||
| Exploratory (waste) | ... | ... | ... | ... | ... | ... | ... |
|
||||
| Avoidable waste | ... | ... | ... | ... | ... | ... | ... |
|
||||
|
||||
## Subagents
|
||||
| Description | Type | Out Tkn | In Tkn | Cache Read | Cache Create | Time | Phase | Verdict |
|
||||
|-------------|------|---------|--------|------------|-------------|------|-------|---------|
|
||||
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
|
||||
|
||||
## By Task
|
||||
| Task | Phases | Out Tkn | In Tkn | Cache Read | Cache Create | Claude Time | Verdict Mix |
|
||||
|------|--------|---------|--------|------------|-------------|-------------|-------------|
|
||||
| ... | ... | ... | ... | ... | ... | ... | ... |
|
||||
|
||||
## Phase Detail
|
||||
### 1. <task> — "<phase>" (<verdict>)
|
||||
- Actions: ...
|
||||
- Output tokens: ... | Input tokens: ... | Cache read: ... | Cache create: ...
|
||||
- Tool-heavy turns: ... | Text turns: ...
|
||||
- Tool result input bloat: ~...K tokens
|
||||
- Claude time: ... | Tool time: ...
|
||||
- **Useful info:** ...
|
||||
- **Lessons learned:** ...
|
||||
|
||||
## Lessons Summary
|
||||
1. ...
|
||||
2. ...
|
||||
```
|
||||
|
||||
## Explicitly Deferred
|
||||
|
||||
| Feature | Why |
|
||||
|---------|-----|
|
||||
| Cross-session index / accumulation | Start with standalone reports first |
|
||||
| `/session-trends` meta-analysis | Depends on accumulation |
|
||||
| Live mid-session cost dashboard | Not needed for core learning loop |
|
||||
| Automatic phase detection from JSONL | Manual tagging is more accurate and is the point |
|
||||
| Dollar cost estimation | Token pricing changes; token counts are stable |
|
||||
| Thinking token breakdown | `thinking` content blocks exist in JSONL with text, but no separate `thinking_tokens` field — cost is folded into `output_tokens` and cannot be isolated |
|
||||
Reference in New Issue
Block a user