# Claude Code Token Usage Data
## Conversation Storage
- Path: `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`
- Subagents: `~/.claude/projects/<project-slug>/<session-uuid>/subagents/agent-<id>.jsonl`
- Project slug for ser-dev: `-var-www-assets-cedrusconsult-com-ser-dev`
- Format: one JSON object per line (JSONL)
## Streaming Deduplication (CRITICAL)
The JSONL logs **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative — only the final chunk has the correct totals.
**Deduplication rule:** Group entries by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals. Naively summing all entries will massively overcount.
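A minimal sketch of the deduplication rule in Python (field names as documented in this file; entries without `requestId` or usage are skipped):

```python
import json
from collections import defaultdict

def session_totals(jsonl_lines):
    """Sum token usage across a session, deduplicating streaming chunks.

    Entries sharing a requestId are cumulative snapshots of one API call;
    only the chunk with the highest output_tokens carries the final totals.
    """
    finals = {}  # requestId -> usage dict of the final chunk seen so far
    for line in jsonl_lines:
        entry = json.loads(line)
        usage = entry.get("message", {}).get("usage")
        req = entry.get("requestId")
        if not usage or not req:
            continue
        best = finals.get(req)
        if best is None or usage.get("output_tokens", 0) > best.get("output_tokens", 0):
            finals[req] = usage
    # Sum the final chunks across all API calls
    totals = defaultdict(int)
    for usage in finals.values():
        for key in ("input_tokens", "cache_creation_input_tokens",
                    "cache_read_input_tokens", "output_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)
```

Naive summing over the same input would count every intermediate chunk; grouping by `requestId` first avoids the overcount.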
## Per-Message Usage Fields
Located at `.message.usage` on messages that involve API calls (not all lines have this):
```json
{
  "input_tokens": 1,
  "cache_creation_input_tokens": 205,
  "cache_read_input_tokens": 76262,
  "output_tokens": 308,
  "server_tool_use": {
    "web_search_requests": 0,
    "web_fetch_requests": 0
  },
  "service_tier": "standard",
  "cache_creation": {
    "ephemeral_1h_input_tokens": 205,
    "ephemeral_5m_input_tokens": 0
  },
  "inference_geo": "",
  "iterations": [],
  "speed": "standard"
}
```
## Other Useful Fields Per Line
- `.timestamp` — ISO 8601
- `.uuid` — unique message ID
- `.sessionId` — conversation session
- `.type` — one of `"progress"`, `"assistant"`, `"user"`, `"file-history-snapshot"`
- `.parentUuid` — message threading
- `.version` — Claude Code version (e.g. `"2.1.79"`)
- `.requestId` — groups streaming chunks from a single API call (critical for deduplication)
## Message Types and Content Structure
### Assistant messages (`.type == "assistant"`)
`.message.content` is an array containing:
- `{"type": "text", "text": "..."}` — text output shown to user
- `{"type": "tool_use", "name": "Bash", "input": {...}}` — tool invocations
- `{"type": "thinking", "thinking": "..."}` — extended thinking (text IS present in JSONL, contrary to earlier belief)
Each assistant message carries `.message.usage` with token counts. Thinking token cost is folded into `output_tokens` — there is no separate `thinking_tokens` field.
### User messages (`.type == "user"`)
`.message.content` is either a string or an array:
- String format: `"content": "the user typed this"` — human-typed input
- Array format: `"content": [{"type": "text", "text": "..."}, ...]`, whose blocks may include:
  - `{"type": "text", "text": "..."}` — human-typed input
  - `{"type": "tool_result", "content": "...", "is_error": false}` — tool execution results
**Distinguishing human input from tool results:** A user message with `tool_result` entries is an automatic tool response. A user message with `text` content (and no `tool_result`) is human input. Some messages contain both. Parsers must handle both string and array content formats.
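A sketch of that classification, handling both the string and array content shapes:

```python
def classify_user_message(message):
    """Classify a user-type entry's .message as human input, a tool
    result, or mixed. Handles both string and array content formats."""
    content = message.get("content")
    if isinstance(content, str):
        return "human"  # string content is always human-typed
    kinds = {block.get("type") for block in content}
    if "text" in kinds and "tool_result" in kinds:
        return "mixed"
    if "tool_result" in kinds:
        return "tool_result"
    return "human"
```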
### Progress messages (`.type == "progress"`)
System-level messages (skill loading, etc.). No usage data.
### File history snapshots (`.type == "file-history-snapshot"`)
Periodic snapshots of tracked file state. No usage data.
## Time Analysis
Timestamps on every message allow decomposing wall clock into three categories:
| Category | How to detect | Typical range |
|----------|--------------|---------------|
| **Claude processing** | Gap before `assistant` message | 1–17s |
| **Tool execution** | Gap between `assistant(tool_use)` and `user(tool_result)` | 0.1s (reads) to 1400s+ (builds) |
| **Human wait** | Gap before `user(human_input)` | 10s–60min+ |
Example from real session: a `Bash` tool call running `stack build` showed 1458s tool execution time; human gaps between conversation turns ranged from 50s to 54min.
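A sketch of the gap decomposition, assuming entries are pre-sorted by timestamp and using the detection rules from the table above:

```python
from datetime import datetime

def _parse_ts(ts):
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def categorize_gaps(entries):
    """Attribute inter-message gaps: time before an assistant message is
    Claude processing; time before a tool_result user message is tool
    execution; time before a human user message is human wait."""
    out = {"claude": 0.0, "tool": 0.0, "human": 0.0}
    for prev, cur in zip(entries, entries[1:]):
        gap = (_parse_ts(cur["timestamp"]) - _parse_ts(prev["timestamp"])).total_seconds()
        if cur["type"] == "assistant":
            out["claude"] += gap
        elif cur["type"] == "user":
            content = cur.get("message", {}).get("content")
            is_tool = isinstance(content, list) and any(
                b.get("type") == "tool_result" for b in content)
            out["tool" if is_tool else "human"] += gap
    return out
```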
## Correlating Tool Calls with Token Costs
Each assistant message contains BOTH the tool calls AND the usage for that API call. To get per-tool-call cost:
- If the message has a single tool call → usage is directly attributable
- If the message has multiple parallel tool calls → usage is the combined cost (cannot split per-tool)
- The output_tokens field covers Claude's generation (tool call arguments + any text)
- The input_tokens / cache_read fields cover the full context sent to Claude for that turn
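A sketch of pulling the tool calls and usage off one assistant entry, flagging whether the cost is directly attributable:

```python
def tool_usage(entry):
    """Pair the tool_use blocks in one assistant entry with that entry's
    usage. With multiple parallel calls the cost is joint, so usage is
    only directly attributable when there is exactly one call."""
    msg = entry["message"]
    calls = [b["name"] for b in msg["content"] if b.get("type") == "tool_use"]
    return {"tools": calls,
            "usage": msg.get("usage", {}),
            "attributable": len(calls) == 1}
```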
## Counts
In a multi-hour session with 3 large subagent dispatches: ~583 lines with `.message.usage`.
As of 2026-03-20: 77 sessions exist for ser-dev, 64 of which have subagent directories.
## Built-in CLI Commands
| Command | Shows | Does NOT Show |
|---------|-------|---------------|
| `/cost` | Aggregate USD, API duration, wall duration, lines changed | Per-type token breakdown |
| `/stats` | Usage patterns dialog | Raw token counts |
## Thinking Tokens
`thinking` content blocks (type `"thinking"`) exist in the JSONL with the full reasoning text present. There is no separate `thinking_tokens` field in `.message.usage` — thinking token cost is folded into `output_tokens`. This means output_tokens cannot be split into "thinking" vs "visible output" from the usage data alone, though the text content of thinking blocks is available for length-based estimation.
## Subagent Cross-Matching
Subagent files are at `<session-uuid>/subagents/agent-<id>.jsonl` with companion `agent-<id>.meta.json`.
**Meta file format:**
```json
{
  "agentType": "general-purpose",
  "description": "Review Haskell backend changes"
}
```
The `description` field matches the `description` argument from the `Agent` tool_use call in the main session JSONL. This is the join key. For duplicate descriptions, timestamps disambiguate.
**`.meta.json` availability:** This is a recent Claude Code feature. Only the most recent sessions have `.meta.json` files — older sessions (the vast majority as of 2026-03-20) do not. When absent, fall back to matching subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap and extracting the description from the tool call's `input.description` field.
**Subagent JSONL** has the same message/usage structure as the main session — independent usage blocks that must be aggregated separately.
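A sketch of the description-based join, operating on pre-parsed data (`meta_files` maps agent id to its parsed `.meta.json`; `agent_calls` comes from the `input` of `Agent` tool_use blocks in the main session; timestamp disambiguation for duplicates is not shown):

```python
def match_subagents(meta_files, agent_calls):
    """Join subagent ids to Agent tool_use calls via the description
    field shared between the .meta.json and the tool call's input."""
    matches = {}
    for agent_id, meta in meta_files.items():
        for call in agent_calls:
            if call.get("description") == meta.get("description"):
                matches[agent_id] = call
                break  # first match; duplicates need timestamps
    return matches
```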
## Tool-Specific Token Breakdown
There is **no per-tool token count** in the usage data. The `output_tokens` field is the combined cost of Claude's thinking + text + tool call arguments for that API turn. The `server_tool_use` field only tracks `web_search_requests` and `web_fetch_requests` (counts, not tokens) — these are always zero in sessions that don't use web search.
**Structural proxies for tool cost:**
- Tool argument size: `len(json.dumps(tool_input))` — large Bash commands or Edit calls cost more output tokens
- Tool result bloat: `len(tool_result.content)` — large Read results or verbose Bash output inflate the next turn's input tokens
- Cache efficiency: `cache_read` vs `cache_create` ratio — high cache reads = cheap context reuse
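A sketch of the first two proxies, measured in characters (a rough stand-in for tokens, not a conversion):

```python
import json

def tool_cost_proxies(entries):
    """Structural proxies for per-tool cost, since usage data has no
    per-tool token split: argument size per tool_use call, and total
    tool_result size feeding the next turn's input."""
    per_call = []           # (tool name, serialized argument length)
    total_result_chars = 0  # combined tool_result payload size
    for entry in entries:
        content = entry.get("message", {}).get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "tool_use":
                per_call.append((block.get("name"),
                                 len(json.dumps(block.get("input", {})))))
            elif block.get("type") == "tool_result":
                c = block.get("content", "")
                total_result_chars += len(c) if isinstance(c, str) else len(json.dumps(c))
    return per_call, total_result_chars
```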
## Access Notes
- Claude (the model) has no runtime access to its own token usage
- Subagent JSONL files contain independent usage blocks — must be aggregated separately
- `grep '"usage"' <file> | wc -l` to count API-call lines
- Deep extraction: `python3 -c "..." < file` with recursive key search (not all usage is at top level)
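A sketch of that recursive key search, usable as the body of a `python3 -c` one-liner or a standalone script:

```python
def find_usage(obj):
    """Recursively collect every 'usage' dict anywhere in a parsed
    JSONL entry, since usage is not always at a fixed depth."""
    found = []
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == "usage" and isinstance(val, dict):
                found.append(val)
            else:
                found.extend(find_usage(val))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_usage(item))
    return found
```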