Claude Code Token Usage Data
Conversation Storage
- Path: `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`
- Subagents: `~/.claude/projects/<project-slug>/<session-uuid>/subagents/agent-<id>.jsonl`
- Project slug for ser-dev: `-var-www-assets-cedrusconsult-com-ser-dev`
- Format: one JSON object per line (JSONL)
Streaming Deduplication (CRITICAL)
The JSONL logs streaming events, not final messages. Multiple entries share the same .requestId and represent incremental chunks from one API call. Token counts within a requestId are cumulative — only the final chunk has the correct totals.
Deduplication rule: Group entries by requestId. For each group, take the token counts from the entry with the highest output_tokens value (the final streaming chunk). Sum across groups for session totals. Naively summing all entries will massively overcount.
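The deduplication rule can be sketched in Python (the helper name `dedup_usage` is mine; field paths follow the JSONL structure described in this document):

```python
import json
from collections import defaultdict

def dedup_usage(lines):
    """Group streaming entries by requestId; keep the chunk with the
    highest output_tokens (the final, cumulative one) per request,
    then sum the surviving usage blocks for session totals."""
    best = {}  # requestId -> usage dict from the final chunk
    for line in lines:
        entry = json.loads(line)
        req = entry.get("requestId")
        usage = entry.get("message", {}).get("usage")
        if not req or not usage:
            continue
        prev = best.get(req)
        if prev is None or usage.get("output_tokens", 0) > prev.get("output_tokens", 0):
            best[req] = usage
    totals = defaultdict(int)
    for usage in best.values():
        for key in ("input_tokens", "cache_creation_input_tokens",
                    "cache_read_input_tokens", "output_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)
```

Summing all entries without this grouping counts every intermediate chunk, which is the overcount the rule warns about.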
Per-Message Usage Fields
Located at .message.usage on messages that involve API calls (not all lines have this):
```json
{
  "input_tokens": 1,
  "cache_creation_input_tokens": 205,
  "cache_read_input_tokens": 76262,
  "output_tokens": 308,
  "server_tool_use": {
    "web_search_requests": 0,
    "web_fetch_requests": 0
  },
  "service_tier": "standard",
  "cache_creation": {
    "ephemeral_1h_input_tokens": 205,
    "ephemeral_5m_input_tokens": 0
  },
  "inference_geo": "",
  "iterations": [],
  "speed": "standard"
}
```
Other Useful Fields Per Line
- `.timestamp` — ISO 8601
- `.uuid` — unique message ID
- `.sessionId` — conversation session
- `.type` — `"progress"`, `"assistant"`, `"user"`, `"file-history-snapshot"`
- `.parentUuid` — message threading
- `.version` — Claude Code version (e.g. `"2.1.79"`)
- `.requestId` — groups streaming chunks from a single API call (critical for deduplication)
Message Types and Content Structure
Assistant messages (.type == "assistant")
.message.content is an array containing:
- `{"type": "text", "text": "..."}` — text output shown to user
- `{"type": "tool_use", "name": "Bash", "input": {...}}` — tool invocations
- `{"type": "thinking", "thinking": "..."}` — extended thinking (text IS present in the JSONL, contrary to earlier belief)
Each assistant message carries .message.usage with token counts. Thinking token cost is folded into output_tokens — there is no separate thinking_tokens field.
User messages (.type == "user")
.message.content is either a string or an array:
- String format: `"content": "the user typed this"` — human-typed input
- Array format: `"content": [{"type": "text", "text": "..."}, ...]`, where:
  - `{"type": "text", "text": "..."}` — human-typed input
  - `{"type": "tool_result", "content": "...", "is_error": false}` — tool execution results
Distinguishing human input from tool results: A user message with tool_result entries is an automatic tool response. A user message with text content (and no tool_result) is human input. Some messages contain both. Parsers must handle both string and array content formats.
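The classification rule above can be sketched as a small helper (function name is mine; it handles both the string and array content formats):

```python
def classify_user_message(msg):
    """Return 'human', 'tool_result', or 'mixed' for a .type == "user" line.
    Plain-string content is always human-typed; array content is inspected
    for tool_result vs text blocks."""
    content = msg["message"]["content"]
    if isinstance(content, str):
        return "human"
    kinds = {block.get("type") for block in content}
    has_text = "text" in kinds
    has_result = "tool_result" in kinds
    if has_text and has_result:
        return "mixed"
    return "tool_result" if has_result else "human"
```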
Progress messages (.type == "progress")
System-level messages (skill loading, etc.). No usage data.
File history snapshots (.type == "file-history-snapshot")
Periodic snapshots of tracked file state. No usage data.
Time Analysis
Timestamps on every message allow decomposing wall clock into three categories:
| Category | How to detect | Typical range |
|---|---|---|
| Claude processing | Gap before assistant message | 1–17s |
| Tool execution | Gap between assistant(tool_use) → user(tool_result) | 0.1s (reads) to 1400s+ (builds) |
| Human wait | Gap before user(human_input) | 10s–60min+ |
Example from real session: a Bash tool call running stack build showed 1458s tool execution time; human gaps between conversation turns ranged from 50s to 54min.
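The three-way decomposition can be sketched directly from the table's heuristics (helper name is mine; tool-result detection reuses the array-content convention described earlier):

```python
from datetime import datetime

def categorize_gaps(messages):
    """Attribute the gap before each message to Claude processing,
    tool execution, or human wait. `messages` are parsed JSONL lines
    sorted by timestamp."""
    def is_tool_result(msg):
        content = msg.get("message", {}).get("content")
        return isinstance(content, list) and any(
            b.get("type") == "tool_result" for b in content)

    gaps = {"claude_processing": 0.0, "tool_execution": 0.0, "human_wait": 0.0}
    for prev, cur in zip(messages, messages[1:]):
        t0 = datetime.fromisoformat(prev["timestamp"].replace("Z", "+00:00"))
        t1 = datetime.fromisoformat(cur["timestamp"].replace("Z", "+00:00"))
        dt = (t1 - t0).total_seconds()
        if cur["type"] == "assistant":
            gaps["claude_processing"] += dt      # gap before assistant message
        elif cur["type"] == "user" and is_tool_result(cur):
            gaps["tool_execution"] += dt         # assistant(tool_use) -> user(tool_result)
        elif cur["type"] == "user":
            gaps["human_wait"] += dt             # gap before human input
    return gaps
```

Progress and snapshot messages fall through unclassified, which matches their lack of usage data.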
Correlating Tool Calls with Token Costs
Each assistant message contains BOTH the tool calls AND the usage for that API call. To get per-tool-call cost:
- If the message has a single tool call → usage is directly attributable
- If the message has multiple parallel tool calls → usage is the combined cost (cannot split per-tool)
- The output_tokens field covers Claude's generation (tool call arguments + any text)
- The input_tokens / cache_read fields cover the full context sent to Claude for that turn
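A minimal helper illustrating the attribution rule (hypothetical function name; it only marks usage as directly attributable when the message carries exactly one tool call):

```python
def attribute_tool_costs(assistant_msg):
    """Return (tool_names, usage, attributable) for one assistant message.
    Usage covers the whole API turn, so it maps cleanly to a tool only
    when a single tool_use block is present."""
    content = assistant_msg["message"]["content"]
    tools = [b["name"] for b in content if b.get("type") == "tool_use"]
    usage = assistant_msg["message"].get("usage", {})
    return tools, usage, len(tools) == 1
```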
Counts
In a multi-hour session with 3 large subagent dispatches: ~583 lines with .message.usage.
As of 2026-03-20: 77 sessions exist for ser-dev, 64 of which have subagent directories.
Built-in CLI Commands
| Command | Shows | Does NOT Show |
|---|---|---|
| `/cost` | Aggregate USD, API duration, wall duration, lines changed | Per-type token breakdown |
| `/stats` | Usage patterns dialog | Raw token counts |
Thinking Tokens
thinking content blocks (type "thinking") exist in the JSONL with the full reasoning text present. There is no separate thinking_tokens field in .message.usage — thinking token cost is folded into output_tokens. This means output_tokens cannot be split into "thinking" vs "visible output" from the usage data alone, though the text content of thinking blocks is available for length-based estimation.
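The length-based estimation mentioned above might look like this (a rough character-count split, not a token count; function name is mine):

```python
def thinking_fraction(assistant_msg):
    """Estimate what share of an assistant turn was extended thinking,
    by character length of thinking vs visible text blocks. Only an
    approximation, since usage has no thinking_tokens field."""
    thinking_chars = text_chars = 0
    for block in assistant_msg["message"]["content"]:
        if block.get("type") == "thinking":
            thinking_chars += len(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_chars += len(block.get("text", ""))
    total = thinking_chars + text_chars
    return thinking_chars / total if total else 0.0
```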
Subagent Cross-Matching
Subagent files are at <session-uuid>/subagents/agent-<id>.jsonl with companion agent-<id>.meta.json.
Meta file format:
```json
{
  "agentType": "general-purpose",
  "description": "Review Haskell backend changes"
}
```
The description field matches the description argument from the Agent tool_use call in the main session JSONL. This is the join key. For duplicate descriptions, timestamps disambiguate.
.meta.json availability: This is a recent Claude Code feature. Only the most recent sessions have .meta.json files — older sessions (the vast majority as of 2026-03-20) do not. When absent, fall back to matching subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap and extracting the description from the tool call's input.description field.
Subagent JSONL has the same message/usage structure as the main session — independent usage blocks that must be aggregated separately.
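A sketch of the `.meta.json` join (hypothetical helper; assumes the directory layout above, and that `agent_calls` was pre-extracted from the main session JSONL, keyed by the Agent tool call's `description`):

```python
import json
from pathlib import Path

def match_subagents(session_dir, agent_calls):
    """Map each subagent JSONL path to its originating Agent tool_use
    entry via the description join key. Returns None for a subagent
    whose description has no match (older sessions need the
    timestamp-overlap fallback instead)."""
    matches = {}
    for meta_path in Path(session_dir, "subagents").glob("agent-*.meta.json"):
        meta = json.loads(meta_path.read_text())
        desc = meta.get("description")
        jsonl_path = meta_path.with_name(meta_path.name.replace(".meta.json", ".jsonl"))
        matches[str(jsonl_path)] = agent_calls.get(desc)
    return matches
```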
Tool-Specific Token Breakdown
There is no per-tool token count in the usage data. The output_tokens field is the combined cost of Claude's thinking + text + tool call arguments for that API turn. The server_tool_use field only tracks web_search_requests and web_fetch_requests (counts, not tokens) — these are always zero in sessions that don't use web search.
Structural proxies for tool cost:
- Tool argument size: `len(json.dumps(tool_input))` — large Bash commands or Edit calls cost more output tokens
- Tool result bloat: `len(tool_result.content)` — large Read results or verbose Bash output inflate the next turn's input tokens
- Cache efficiency: `cache_read` vs `cache_create` ratio — high cache reads = cheap context reuse
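The cache-efficiency proxy, as a sketch (assuming the usage field names shown earlier; function name is mine):

```python
def cache_efficiency(usage):
    """Fraction of cached context that was reused rather than newly
    written: cache reads over reads plus creations. Close to 1.0 means
    cheap context reuse."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    total = read + created
    return read / total if total else 0.0
```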
Access Notes
- Claude (the model) has no runtime access to its own token usage
- Subagent JSONL files contain independent usage blocks — must be aggregated separately
- `grep '"usage"' <file> | wc -l` to count API-call lines
- Deep extraction: `python3 -c "..." < file` with recursive key search (not all usage is at top level)
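A recursive key search of the kind the last bullet describes might look like this (generator name is mine):

```python
def find_usage(obj):
    """Recursively yield every dict stored under a 'usage' key,
    at any nesting depth, from a parsed JSONL line."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "usage" and isinstance(value, dict):
                yield value
            else:
                yield from find_usage(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_usage(item)
```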