session-retrospective/docs/claude-code-token-data.md

Claude Code Token Usage Data

Conversation Storage

  • Path: ~/.claude/projects/<project-slug>/<session-uuid>.jsonl
  • Subagents: ~/.claude/projects/<project-slug>/<session-uuid>/subagents/agent-<id>.jsonl
  • Project slug for ser-dev: -var-www-assets-cedrusconsult-com-ser-dev
  • Format: one JSON object per line (JSONL)

Streaming Deduplication (CRITICAL)

The JSONL logs streaming events, not final messages. Multiple entries share the same .requestId and represent incremental chunks from one API call. Token counts within a requestId are cumulative — only the final chunk has the correct totals.

Deduplication rule: Group entries by requestId. For each group, take the token counts from the entry with the highest output_tokens value (the final streaming chunk). Sum across groups for session totals. Naively summing all entries will massively overcount.
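A minimal Python sketch of this rule (field names follow the layout documented below; assumes a flat session JSONL file as input):

```python
import json
from collections import defaultdict

def session_totals(jsonl_path):
    """Aggregate token counts per requestId, keeping only the final
    streaming chunk (highest output_tokens) for each API call."""
    best = {}  # requestId -> usage dict from the final chunk
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            req = entry.get("requestId")
            usage = (entry.get("message") or {}).get("usage")
            if not req or not usage:
                continue
            if req not in best or usage.get("output_tokens", 0) > best[req].get("output_tokens", 0):
                best[req] = usage
    totals = defaultdict(int)
    for usage in best.values():
        for key in ("input_tokens", "cache_creation_input_tokens",
                    "cache_read_input_tokens", "output_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)
```

Summing `output_tokens` over all lines instead of over `best` is the naive overcount this section warns about.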

Per-Message Usage Fields

Located at .message.usage on messages that involve API calls (not all lines have this):

{
  "input_tokens": 1,
  "cache_creation_input_tokens": 205,
  "cache_read_input_tokens": 76262,
  "output_tokens": 308,
  "server_tool_use": {
    "web_search_requests": 0,
    "web_fetch_requests": 0
  },
  "service_tier": "standard",
  "cache_creation": {
    "ephemeral_1h_input_tokens": 205,
    "ephemeral_5m_input_tokens": 0
  },
  "inference_geo": "",
  "iterations": [],
  "speed": "standard"
}

Other Useful Fields Per Line

  • .timestamp — ISO 8601
  • .uuid — unique message ID
  • .sessionId — conversation session
  • .type — "progress", "assistant", "user", "file-history-snapshot"
  • .parentUuid — message threading
  • .version — Claude Code version (e.g. "2.1.79")
  • .requestId — groups streaming chunks from a single API call (critical for deduplication)
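A small helper that pulls these fields off each line might look like this (a sketch; the `Z`-suffix replacement is for Python versions whose `fromisoformat` does not accept it):

```python
import json
from datetime import datetime

def iter_entries(jsonl_path):
    """Yield (timestamp, type, requestId, usage) for each JSONL line.
    Fields absent on a given line come back as None."""
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            ts = entry.get("timestamp")
            yield (
                datetime.fromisoformat(ts.replace("Z", "+00:00")) if ts else None,
                entry.get("type"),
                entry.get("requestId"),
                (entry.get("message") or {}).get("usage"),
            )
```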

Message Types and Content Structure

Assistant messages (.type == "assistant")

.message.content is an array containing:

  • {"type": "text", "text": "..."} — text output shown to user
  • {"type": "tool_use", "name": "Bash", "input": {...}} — tool invocations
  • {"type": "thinking", "thinking": "..."} — extended thinking (text IS present in JSONL, contrary to earlier belief)

Each assistant message carries .message.usage with token counts. Thinking token cost is folded into output_tokens — there is no separate thinking_tokens field.

User messages (.type == "user")

.message.content is either a string or an array:

  • String format: "content": "the user typed this" — human-typed input
  • Array format: "content": [{"type": "text", "text": "..."}, ...]
    • {"type": "text", "text": "..."} — human-typed input
    • {"type": "tool_result", "content": "...", "is_error": false} — tool execution results

Distinguishing human input from tool results: A user message with tool_result entries is an automatic tool response. A user message with text content (and no tool_result) is human input. Some messages contain both. Parsers must handle both string and array content formats.
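That logic can be sketched as a small classifier (`classify_user_message` is a hypothetical helper name, not part of any Claude Code API):

```python
def classify_user_message(message):
    """Classify a user message's content as 'human', 'tool_result',
    or 'mixed', handling both string and array content formats."""
    content = message.get("content")
    if isinstance(content, str):
        return "human"  # plain string content is always human-typed
    has_text = any(isinstance(b, dict) and b.get("type") == "text" for b in content)
    has_result = any(isinstance(b, dict) and b.get("type") == "tool_result" for b in content)
    if has_text and has_result:
        return "mixed"
    return "tool_result" if has_result else "human"
```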

Progress messages (.type == "progress")

System-level messages (skill loading, etc.). No usage data.

File history snapshots (.type == "file-history-snapshot")

Periodic snapshots of tracked file state. No usage data.

Time Analysis

Timestamps on every message allow decomposing wall-clock time into three categories:

| Category | How to detect | Typical range |
|---|---|---|
| Claude processing | Gap before assistant message | 1–17s |
| Tool execution | Gap between assistant(tool_use) → user(tool_result) | 0.1s (reads) to 1400s+ (builds) |
| Human wait | Gap before user(human_input) | 10s–60min+ |

Example from real session: a Bash tool call running stack build showed 1458s tool execution time; human gaps between conversation turns ranged from 50s to 54min.
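A sketch of this decomposition, under the assumption that each gap is attributed to the message that ends it (entries are the parsed JSONL in file order):

```python
from datetime import datetime

def _parse_ts(ts):
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def _is_tool_result(entry):
    content = (entry.get("message") or {}).get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content)

def gap_breakdown(entries):
    """Split wall-clock time into the three categories above. Only
    assistant/user messages with timestamps are considered."""
    buckets = {"claude": 0.0, "tool": 0.0, "human": 0.0}
    prev_ts = None
    for entry in entries:
        ts = entry.get("timestamp")
        if not ts or entry.get("type") not in ("assistant", "user"):
            continue
        t = _parse_ts(ts)
        if prev_ts is not None:
            gap = (t - prev_ts).total_seconds()
            if entry["type"] == "assistant":
                buckets["claude"] += gap   # gap before assistant output
            elif _is_tool_result(entry):
                buckets["tool"] += gap     # assistant(tool_use) -> user(tool_result)
            else:
                buckets["human"] += gap    # waiting on the human
        prev_ts = t
    return buckets
```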

Correlating Tool Calls with Token Costs

Each assistant message contains BOTH the tool calls AND the usage for that API call. To get per-tool-call cost:

  • If the message has a single tool call → usage is directly attributable
  • If the message has multiple parallel tool calls → usage is the combined cost (cannot split per-tool)
  • The output_tokens field covers Claude's generation (tool call arguments + any text)
  • The input_tokens / cache_read fields cover the full context sent to Claude for that turn
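As a sketch, given one `.message` object (the `attributable` flag simply reflects the single-tool-call case above):

```python
def attribute_tool_cost(message):
    """Return (tool_names, usage, attributable) for one assistant
    message. Usage is directly attributable only with a single tool call."""
    tools = [b["name"] for b in message.get("content", [])
             if isinstance(b, dict) and b.get("type") == "tool_use"]
    usage = message.get("usage")
    return tools, usage, len(tools) == 1
```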

Counts

In a multi-hour session with 3 large subagent dispatches: ~583 lines with .message.usage.

As of 2026-03-20: 77 sessions exist for ser-dev, 64 of which have subagent directories.

Built-in CLI Commands

| Command | Shows | Does NOT show |
|---|---|---|
| /cost | Aggregate USD, API duration, wall duration, lines changed | Per-type token breakdown |
| /stats | Usage patterns dialog | Raw token counts |

Thinking Tokens

thinking content blocks (type "thinking") exist in the JSONL with the full reasoning text present. There is no separate thinking_tokens field in .message.usage — thinking token cost is folded into output_tokens. This means output_tokens cannot be split into "thinking" vs "visible output" from the usage data alone, though the text content of thinking blocks is available for length-based estimation.
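A length-based estimation sketch (character counts as a rough proxy for token share; this is an approximation, not a real token split):

```python
def estimated_thinking_share(message):
    """Estimate the thinking fraction of output_tokens from the
    character lengths of thinking vs text blocks in .message.content."""
    thinking = sum(len(b.get("thinking", "")) for b in message.get("content", [])
                   if b.get("type") == "thinking")
    text = sum(len(b.get("text", "")) for b in message.get("content", [])
               if b.get("type") == "text")
    total = thinking + text
    return thinking / total if total else 0.0
```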

Subagent Cross-Matching

Subagent files are at <session-uuid>/subagents/agent-<id>.jsonl with companion agent-<id>.meta.json.

Meta file format:

{
  "agentType": "general-purpose",
  "description": "Review Haskell backend changes"
}

The description field matches the description argument from the Agent tool_use call in the main session JSONL. This is the join key. For duplicate descriptions, timestamps disambiguate.

.meta.json availability: This is a recent Claude Code feature. Only the most recent sessions have .meta.json files — older sessions (the vast majority as of 2026-03-20) do not. When absent, fall back to matching subagent JSONL files to Agent tool_use calls in the main session by timestamp overlap and extracting the description from the tool call's input.description field.
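A sketch of the join with fallback (the tool name "Agent" follows this document; `match_subagents` and `_ts` are hypothetical helper names):

```python
import glob
import json
import os
from datetime import datetime

def _ts(entry):
    t = entry.get("timestamp")
    return datetime.fromisoformat(t.replace("Z", "+00:00")) if t else None

def match_subagents(session_dir, main_entries):
    """Map each subagent JSONL to a description: prefer the companion
    .meta.json; otherwise fall back to the Agent tool_use call whose
    timestamp most closely precedes the subagent's first entry."""
    agent_calls = []  # (timestamp, description) for each Agent tool_use
    for e in main_entries:
        for b in ((e.get("message") or {}).get("content") or []):
            if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("name") == "Agent":
                agent_calls.append((_ts(e), b.get("input", {}).get("description")))
    matches = {}
    for path in sorted(glob.glob(os.path.join(session_dir, "subagents", "agent-*.jsonl"))):
        meta_path = path.replace(".jsonl", ".meta.json")
        if os.path.exists(meta_path):
            with open(meta_path) as f:
                matches[path] = json.load(f).get("description")
            continue
        with open(path) as f:
            start = _ts(json.loads(f.readline()))
        prior = [(t, d) for t, d in agent_calls if t and start and t <= start]
        matches[path] = max(prior)[1] if prior else None
    return matches
```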

Subagent JSONL has the same message/usage structure as the main session — independent usage blocks that must be aggregated separately.

Tool-Specific Token Breakdown

There is no per-tool token count in the usage data. The output_tokens field is the combined cost of Claude's thinking + text + tool call arguments for that API turn. The server_tool_use field only tracks web_search_requests and web_fetch_requests (counts, not tokens) — these are always zero in sessions that don't use web search.

Structural proxies for tool cost:

  • Tool argument size: len(json.dumps(tool_input)) — large Bash commands or Edit calls cost more output tokens
  • Tool result bloat: len(tool_result.content) — large Read results or verbose Bash output inflate the next turn's input tokens
  • Cache efficiency: cache_read vs cache_create ratio — high cache reads = cheap context reuse
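These proxies can be sketched for a single `.message` object as:

```python
import json

def tool_cost_proxies(message):
    """Compute the structural proxies above: serialized argument size
    per tool call, plus the cache-read share of cached input tokens."""
    arg_sizes = {}
    for b in message.get("content", []):
        if isinstance(b, dict) and b.get("type") == "tool_use":
            arg_sizes[b["name"]] = len(json.dumps(b.get("input", {})))
    usage = message.get("usage", {})
    reads = usage.get("cache_read_input_tokens", 0)
    creates = usage.get("cache_creation_input_tokens", 0)
    cache_ratio = reads / (reads + creates) if (reads + creates) else None
    return arg_sizes, cache_ratio
```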

Access Notes

  • Claude (the model) has no runtime access to its own token usage
  • Subagent JSONL files contain independent usage blocks — must be aggregated separately
  • grep '"usage"' <file> | wc -l to count API-call lines
  • Deep extraction: python3 -c "..." < file with recursive key search (not all usage is at top level)