Add Claude Code token data reference doc
docs/claude-code-token-data.md
# Claude Code Token Usage Data

## Conversation Storage

- Path: `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`
- Subagents: `~/.claude/projects/<project-slug>/<session-uuid>/subagents/agent-<id>.jsonl`
- Project slug for ser-dev: `-var-www-assets-cedrusconsult-com-ser-dev`
- Format: one JSON object per line (JSONL)
## Streaming Deduplication (CRITICAL)

The JSONL logs **streaming events**, not final messages. Multiple entries share the same `.requestId` and represent incremental chunks from one API call. Token counts within a `requestId` are cumulative — only the final chunk has the correct totals.

**Deduplication rule:** Group entries by `requestId`. For each group, take the token counts from the entry with the highest `output_tokens` value (the final streaming chunk). Sum across groups for session totals. Naively summing all entries will massively overcount.
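A minimal Python sketch of this rule, assuming the `.message.usage` / `.requestId` layout documented here (entries lacking either field are skipped; the function name is illustrative):

```python
import json
from collections import defaultdict

def session_totals(jsonl_path):
    """Sum token usage for one session, deduplicating streaming chunks.

    Entries sharing a requestId are cumulative chunks of one API call;
    only the chunk with the highest output_tokens (the final one) counts.
    """
    final_by_request = {}  # requestId -> usage dict of the final chunk seen
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            usage = entry.get("message", {}).get("usage")
            req_id = entry.get("requestId")
            if not usage or not req_id:
                continue
            best = final_by_request.get(req_id)
            if best is None or usage.get("output_tokens", 0) > best.get("output_tokens", 0):
                final_by_request[req_id] = usage

    totals = defaultdict(int)
    for usage in final_by_request.values():
        for key in ("input_tokens", "cache_creation_input_tokens",
                    "cache_read_input_tokens", "output_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)
```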
## Per-Message Usage Fields

Located at `.message.usage` on messages that involve API calls (not all lines have this):

```json
{
  "input_tokens": 1,
  "cache_creation_input_tokens": 205,
  "cache_read_input_tokens": 76262,
  "output_tokens": 308,
  "server_tool_use": {
    "web_search_requests": 0,
    "web_fetch_requests": 0
  },
  "service_tier": "standard",
  "cache_creation": {
    "ephemeral_1h_input_tokens": 205,
    "ephemeral_5m_input_tokens": 0
  },
  "inference_geo": "",
  "iterations": [],
  "speed": "standard"
}
```
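Assuming the Anthropic convention that `input_tokens` excludes cached tokens, the full context sent for a call is the sum of the three input fields. A small helper (the function name is illustrative):

```python
def context_tokens(usage: dict) -> int:
    """Total input context for one API call: uncached input plus
    cache reads plus cache writes (Anthropic usage convention)."""
    return (usage.get("input_tokens", 0)
            + usage.get("cache_read_input_tokens", 0)
            + usage.get("cache_creation_input_tokens", 0))
```

Applied to the sample block above this gives 1 + 76262 + 205 = 76468 context tokens for that turn.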
## Other Useful Fields Per Line

- `.timestamp` — ISO 8601
- `.uuid` — unique message ID
- `.sessionId` — conversation session
- `.type` — `"progress"`, `"assistant"`, `"user"`, `"file-history-snapshot"`
- `.parentUuid` — message threading
- `.version` — Claude Code version (e.g. `"2.1.79"`)
- `.requestId` — groups streaming chunks from a single API call (critical for deduplication)
## Message Types and Content Structure

### Assistant messages (`.type == "assistant"`)

`.message.content` is an array containing:
- `{"type": "text", "text": "..."}` — text output shown to the user
- `{"type": "tool_use", "name": "Bash", "input": {...}}` — tool invocations
- `{"type": "thinking", "thinking": "..."}` — extended thinking (the text IS present in the JSONL, contrary to earlier belief)

Each assistant message carries `.message.usage` with token counts. Thinking token cost is folded into `output_tokens` — there is no separate `thinking_tokens` field.
### User messages (`.type == "user"`)

`.message.content` is either a string or an array:
- String format: `"content": "the user typed this"` — human-typed input
- Array format: `"content": [{"type": "text", "text": "..."}, ...]`
  - `{"type": "text", "text": "..."}` — human-typed input
  - `{"type": "tool_result", "content": "...", "is_error": false}` — tool execution results

**Distinguishing human input from tool results:** A user message with `tool_result` entries is an automatic tool response. A user message with `text` content (and no `tool_result`) is human input. Some messages contain both. Parsers must handle both string and array content formats.
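A classifier sketch for this rule, handling both content formats (function name hypothetical):

```python
def classify_user_message(msg: dict) -> str:
    """Classify a user-type JSONL entry as 'human', 'tool_result', or 'mixed'.

    Handles both the string and the array form of .message.content.
    """
    content = msg.get("message", {}).get("content", "")
    if isinstance(content, str):
        return "human"  # plain string content is always human-typed input
    kinds = {block.get("type") for block in content if isinstance(block, dict)}
    has_text = "text" in kinds
    has_result = "tool_result" in kinds
    if has_text and has_result:
        return "mixed"
    return "tool_result" if has_result else "human"
```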
### Progress messages (`.type == "progress"`)

System-level messages (skill loading, etc.). No usage data.

### File history snapshots (`.type == "file-history-snapshot"`)

Periodic snapshots of tracked file state. No usage data.
## Time Analysis

Timestamps on every message allow decomposing wall clock time into three categories:

| Category | How to detect | Typical range |
|----------|--------------|---------------|
| **Claude processing** | Gap before `assistant` message | 1–17s |
| **Tool execution** | Gap between `assistant(tool_use)` → `user(tool_result)` | 0.1s (reads) to 1400s+ (builds) |
| **Human wait** | Gap before `user(human_input)` | 10s–60min+ |

Example from a real session: a `Bash` tool call running `stack build` showed 1458s of tool execution time; human gaps between conversation turns ranged from 50s to 54min.
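The decomposition can be sketched by pairing consecutive messages and attributing each gap to whatever produced the later message. This is a heuristic sketch of the table above, not an official tool:

```python
from datetime import datetime

def gaps(entries):
    """Yield (seconds, category) for each consecutive message pair,
    attributing the gap to whatever produced the later message."""
    def ts(e):
        # .timestamp is ISO 8601 with a trailing 'Z'
        return datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))

    def category(e):
        if e["type"] == "assistant":
            return "claude_processing"
        content = e.get("message", {}).get("content", "")
        if isinstance(content, list) and any(
                isinstance(b, dict) and b.get("type") == "tool_result"
                for b in content):
            return "tool_execution"
        return "human_wait"

    for prev, cur in zip(entries, entries[1:]):
        yield (ts(cur) - ts(prev)).total_seconds(), category(cur)
```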
## Correlating Tool Calls with Token Costs

Each assistant message contains BOTH the tool calls AND the usage for that API call. To get per-tool-call cost:
- If the message has a single tool call → usage is directly attributable
- If the message has multiple parallel tool calls → usage is the combined cost (cannot be split per tool)
- The `output_tokens` field covers Claude's generation (tool call arguments + any text)
- The `input_tokens` / cache read fields cover the full context sent to Claude for that turn
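Under these rules, a sketch that attributes `output_tokens` only where attribution is unambiguous (it assumes entries have already been deduplicated per `requestId`; the helper name is hypothetical):

```python
from collections import defaultdict

def attribute_tool_costs(entries):
    """Map tool name -> summed output tokens, counting only assistant
    messages with exactly one tool_use block (directly attributable)."""
    costs = defaultdict(int)
    for e in entries:
        if e.get("type") != "assistant":
            continue
        msg = e.get("message", {})
        tools = [b for b in msg.get("content", [])
                 if isinstance(b, dict) and b.get("type") == "tool_use"]
        if len(tools) == 1 and "usage" in msg:
            # single tool call: this turn's output tokens belong to it
            costs[tools[0]["name"]] += msg["usage"].get("output_tokens", 0)
    return dict(costs)
```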
## Counts

In a multi-hour session with 3 large subagent dispatches: ~583 lines with `.message.usage`.

As of 2026-03-20: 77 sessions exist for ser-dev, 64 of which have subagent directories.
## Built-in CLI Commands

| Command | Shows | Does NOT Show |
|---------|-------|---------------|
| `/cost` | Aggregate USD, API duration, wall duration, lines changed | Per-type token breakdown |
| `/stats` | Usage patterns dialog | Raw token counts |
## Thinking Tokens

`thinking` content blocks (type `"thinking"`) exist in the JSONL with the full reasoning text present. There is no separate `thinking_tokens` field in `.message.usage` — thinking token cost is folded into `output_tokens`. This means `output_tokens` cannot be split into "thinking" vs "visible output" from the usage data alone, though the text content of thinking blocks is available for length-based estimation.
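A length-based estimator along these lines: a rough character-ratio proxy, not a token count, and it ignores tool-call arguments, which also count toward `output_tokens` (function name illustrative):

```python
def thinking_share(msg: dict) -> float:
    """Estimate the thinking fraction of one assistant message's output
    by comparing character lengths of thinking vs visible text blocks."""
    thinking = visible = 0
    for block in msg.get("message", {}).get("content", []):
        if block.get("type") == "thinking":
            thinking += len(block.get("thinking", ""))
        elif block.get("type") == "text":
            visible += len(block.get("text", ""))
    total = thinking + visible
    return thinking / total if total else 0.0
```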
## Subagent Cross-Matching

Subagent files are at `<session-uuid>/subagents/agent-<id>.jsonl` with companion `agent-<id>.meta.json`.

**Meta file format:**
```json
{
  "agentType": "general-purpose",
  "description": "Review Haskell backend changes"
}
```

The `description` field matches the `description` argument from the `Agent` tool_use call in the main session JSONL. This is the join key. For duplicate descriptions, timestamps disambiguate.

**`.meta.json` availability:** This is a recent Claude Code feature. Only the most recent sessions have `.meta.json` files — older sessions (the vast majority as of 2026-03-20) do not. When absent, fall back to matching subagent JSONL files to `Agent` tool_use calls in the main session by timestamp overlap, extracting the description from the tool call's `input.description` field.

**Subagent JSONL** has the same message/usage structure as the main session — independent usage blocks that must be aggregated separately.
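A sketch of the meta-file side of the join (hypothetical helper; it returns `None` for agents without a meta file, so the timestamp-overlap fallback can take over for those):

```python
import glob
import json
import os

def match_subagents(session_dir):
    """Map each agent JSONL path to its description, read from the
    companion .meta.json when present, else None (older sessions)."""
    matches = {}
    pattern = os.path.join(session_dir, "subagents", "agent-*.jsonl")
    for agent_path in glob.glob(pattern):
        meta_path = agent_path[:-len(".jsonl")] + ".meta.json"
        if os.path.exists(meta_path):
            with open(meta_path) as f:
                matches[agent_path] = json.load(f).get("description")
        else:
            matches[agent_path] = None  # needs the timestamp fallback
    return matches
```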
## Tool-Specific Token Breakdown

There is **no per-tool token count** in the usage data. The `output_tokens` field is the combined cost of Claude's thinking + text + tool call arguments for that API turn. The `server_tool_use` field only tracks `web_search_requests` and `web_fetch_requests` (counts, not tokens) — these are always zero in sessions that don't use web search.

**Structural proxies for tool cost:**
- Tool argument size: `len(json.dumps(tool_input))` — large Bash commands or Edit calls cost more output tokens
- Tool result bloat: `len(tool_result.content)` — large Read results or verbose Bash output inflate the next turn's input tokens
- Cache efficiency: `cache_read` vs `cache_create` ratio — high cache reads = cheap context reuse
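The first proxy can be computed directly from the JSONL. An illustrative helper that measures tool-call argument size in characters, as a rough stand-in for the per-tool token cost the usage data lacks:

```python
import json

def tool_call_proxies(entries):
    """List per-tool-call structural metrics: tool name and the
    serialized size of its arguments in characters."""
    rows = []
    for e in entries:
        if e.get("type") != "assistant":
            continue
        for block in e.get("message", {}).get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                rows.append({
                    "tool": block.get("name"),
                    "arg_chars": len(json.dumps(block.get("input", {}))),
                })
    return rows
```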
## Access Notes

- Claude (the model) has no runtime access to its own token usage
- Subagent JSONL files contain independent usage blocks — must be aggregated separately
- `grep '"usage"' <file> | wc -l` to count API-call lines
- Deep extraction: `python3 -c "..." < file` with recursive key search (not all usage is at top level)
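The recursive key search mentioned above can look like this: a sketch that yields every `usage` dict at any depth of a parsed JSONL line:

```python
def find_usage(obj):
    """Recursively yield every dict stored under a 'usage' key,
    anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "usage" and isinstance(value, dict):
                yield value
            else:
                yield from find_usage(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_usage(item)
```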