Conversation State Log
Tracks the evolving state of a conversation across turns. Instead of only recording events (drift markers), this document captures the conversation's structural condition at each point in time. State transitions — not just events — reveal drift.
session_id:
Initial State
Populated from the session template before analysis begins.
state_0:
active_goal: [from session template, primary goal]
topic: [from session template, topic_domain]
dominant_vocabulary: user
dominant_framework: none | [user's pre-existing framework if any]
confidence_level: baseline
structural_resolution: [from structural assumptions baseline — low/medium/high]
concept_owner: user
Per-Turn State
One entry per turn. Record only fields that changed. Unchanged fields carry forward implicitly.
turn_id:
state_after:
active_goal: [current working goal — may differ from declared goal]
topic: [current topic under discussion]
dominant_vocabulary: user | ai | mixed
dominant_framework: none | user | ai | [named framework]
confidence_level: low | moderate | high | escalating
structural_resolution: low | medium | high
concept_owner: user | ai | collaborative
state_delta:
changed_fields: [list which fields changed this turn]
transition_type: stable | shift | replacement
# stable: no meaningful state change
# shift: one or two fields moved incrementally
# replacement: three or more fields changed, or a core field
# (active_goal, dominant_framework) was replaced
trigger: [brief description of what caused the change]
notes:
Repeat for each turn.
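The transition_type rules above can be expressed as a small classifier. This is an illustrative sketch, not part of any tool; only the field names come from the template.

```python
# Classify a turn's transition_type from its list of changed fields,
# following the rules in the template: stable / shift / replacement.

CORE_FIELDS = {"active_goal", "dominant_framework"}

def classify_transition(changed_fields):
    """Return 'stable', 'shift', or 'replacement' for one turn."""
    changed = set(changed_fields)
    if not changed:
        return "stable"          # no meaningful state change
    if len(changed) >= 3 or changed & CORE_FIELDS:
        return "replacement"     # wholesale change, or a core field replaced
    return "shift"               # one or two fields moved incrementally
```

Note that a single changed field still counts as replacement when it is a core field: replacing active_goal alone is a bigger event than shifting topic and confidence together.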
State Transitions Summary
After completing the per-turn log, list all turns where transition_type was shift or replacement.
transitions:
- turn_id:
type: shift | replacement
fields_changed: [list]
trigger:
- turn_id:
type:
fields_changed:
trigger:
This summary feeds directly into pivot detection in the diagnostic report.
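If the per-turn log is parsed into a list of dicts, building this summary is a simple filter. A minimal sketch, assuming entries keyed by the template's field names (the function itself is hypothetical):

```python
# Build the State Transitions Summary from a parsed per-turn log.
# Each entry is assumed to be a dict with turn_id, transition_type,
# changed_fields, and trigger, matching the template fields.

def summarize_transitions(turns):
    """Keep only turns whose transition_type was 'shift' or 'replacement'."""
    return [
        {
            "turn_id": t["turn_id"],
            "type": t["transition_type"],
            "fields_changed": t["changed_fields"],
            "trigger": t["trigger"],
        }
        for t in turns
        if t["transition_type"] in ("shift", "replacement")
    ]
```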
Using This Document
Fill the initial state from the session template before reading the conversation. Then process turns sequentially. At each turn, ask: did the conversation's structural condition change?
The key fields:
- active_goal changes when the conversation stops serving the declared goal and starts serving something else. This is goal displacement, detectable by comparing what the turn actually worked on against the session goals.
- dominant_vocabulary changes when the majority of concept-carrying terms in active use originate from the AI rather than the user's baseline.
- dominant_framework changes when the organizational structure governing the discussion shifts from the user's to the AI's (or to an external one the AI introduced).
- confidence_level tracks the overall epistemic stance of the conversation. It escalates when tentative ideas get treated as settled across multiple turns.
- structural_resolution tracks how much internal structure the concepts under discussion have acquired. Increases map to elaborative expansion.
- concept_owner tracks who is making the structural and directional decisions in the conversation at any given point.
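As an illustration, the dominant_vocabulary judgment can be approximated by counting where each active concept-carrying term first appeared. The helper and thresholds below are hypothetical, not part of the template:

```python
# Rough heuristic for dominant_vocabulary: of the concept-carrying terms
# active in the current turn, count how many were introduced by the AI.
# The 40%/60% thresholds are illustrative assumptions.

def dominant_vocabulary(active_terms, first_speaker):
    """first_speaker maps each term to 'user' or 'ai' (who introduced it)."""
    if not active_terms:
        return "user"  # no active terms: fall back to the user baseline
    from_ai = sum(1 for t in active_terms if first_speaker.get(t) == "ai")
    share = from_ai / len(active_terms)
    if share > 0.6:
        return "ai"
    if share < 0.4:
        return "user"
    return "mixed"
```

In manual use the same question applies without the arithmetic: of the terms doing conceptual work in this turn, who introduced them?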
State changes are not inherently bad. The purpose is visibility, not judgment. A shift from concept_owner: user to concept_owner: collaborative might be exactly what you wanted. The document just makes sure you can see it happened and when.
TORQUE — Source Mapping
Supporting research for each document's core concepts, prioritizing vetted sources (.gov, university, peer-reviewed). Organized document by document.
3. conversation-state-log.md
Tracks the evolving structural condition of a conversation across turns. Records per-turn state (active_goal, dominant_vocabulary, dominant_framework, confidence_level, structural_resolution, concept_owner) and classifies transitions as stable, shift, or replacement. Surfaces when conversation direction or ownership changed without explicit decision.
3.1 Dialogue State Tracking
The document directly adapts the concept of dialogue state tracking (DST) from task-oriented dialogue systems research, applying it to a different problem: tracking not what the user wants from a system, but what is happening to the user's thinking during AI interaction.
- Williams, J. D., Raux, A., & Henderson, M. (2016). "The Dialog State Tracking Challenge Series: A Review." AI Magazine, 37(4). AAAI. https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/download/2558/2459
- Relevance: Foundational review of the DST Challenge series (DSTC1-3). DST estimates the user's goal given all dialog history up to each turn. The conversation-state-log borrows this architecture — per-turn state estimation against accumulated history — but tracks a different target: not what the user wants from the system, but what's happening to the user's conceptual framework. DSTC2 specifically introduced tracking goal changes mid-conversation, directly analogous to the state log's detection of active_goal shifts.
- Sharma, S., Choubey, P. K., & Huang, R. (2019). "Improving Dialogue State Tracking by Discerning the Relevant Context." NAACL-HLT 2019, ACL. https://aclanthology.org/N19-1057/
- Relevance: Addresses a key challenge the conversation-state-log also faces: determining which parts of dialogue history are relevant to the current state. The paper's approach of tracking where specific slot-values change parallels the state log's method of recording only changed fields per turn while carrying forward unchanged ones implicitly.
3.2 Goal Tracking and Displacement in Multi-Turn Dialogue
The state log's active_goal field tracks when the conversation stops serving the declared goal and starts serving something else. This is a recognized problem in multi-turn LLM interaction.
- OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models (2025). arXiv preprint. https://arxiv.org/html/2508.21061v1
- Relevance: Directly addresses the problem of goal drift in LLM conversations. Introduces a system for tracking and visualizing whether conversational goals are being addressed across turns. User study with 20 participants found that goal visualization helped users manage trade-offs between addressing LLM issues and making decisions. The conversation-state-log is a manual, post-hoc version of what this paper automates in real-time.
3.3 Confidence Escalation and Epistemic Calibration
The state log tracks confidence_level: low | moderate | high | escalating to detect when tentative ideas get treated as settled. This connects to documented patterns of LLM overconfidence and its effects on users.
- Confidence Escalation in LLM Debates (2025). arXiv:2505.19184. https://arxiv.org/pdf/2505.19184
- Relevance: Directly measured confidence escalation in multi-turn LLM interactions. Found that models' confidence actively increases from opening rounds (72.9%) to closing rounds (83.3%) — an anti-Bayesian pattern where encountering opposing arguments makes models more rather than less confident. Even when explicitly told their odds were 50%, confidence rose to 57.1%. This provides the mechanism for what the state log calls confidence_level: escalating: the AI's generation process systematically increases assertiveness over turns.
- Cash, T., Oppenheimer, D. et al. (2025). "AI Chatbots Remain Overconfident — Even When They're Wrong." Carnegie Mellon University, Dietrich College. https://www.cmu.edu/dietrich/news/news-stories/2025/july/trent-cash-ai-overconfidence.html
- Relevance: Experimental comparison of human and LLM confidence calibration. Found that while both tend toward overconfidence, only humans adjusted expectations retroactively when given performance feedback. LLMs maintained overconfidence consistently. Supports the state log's tracking of confidence_level as a diagnostic field: the AI's confidence presentation doesn't self-correct, so escalation must be detected externally.
- A Crisis of Overconfidence (2025). PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC12874690/
- Relevance: Documents that post-training alignment specifically degrades LLM calibration. Pre-trained models tend to be well-calibrated, but RLHF pushes them toward unwarranted certainty because human raters prefer decisive-sounding answers. The state log's confidence_level field exists because this alignment-induced overconfidence is a known, systematic property of the AI's outputs — not an occasional artifact.
- Overconfident and Unconfident AI Hinder Human-AI Collaboration (2024). arXiv:2402.07632. https://arxiv.org/html/2402.07632v1
- Relevance: Experimental study showing that when AI expresses overconfidence, users are more likely to follow incorrect advice (misuse), and that most participants cannot independently detect the mismatch between AI confidence and AI accuracy. This is the user-side mechanism the state log's confidence tracking is designed to surface: the user absorbs the AI's escalating confidence without independent calibration.
3.4 Framework Dominance and Vocabulary Shift
The state log tracks dominant_vocabulary: user | ai | mixed and dominant_framework: none | user | ai. These fields detect when the working language and organizational structure of the conversation shift from the user's to the AI's.
- Sharma, M. et al. (2024). "Towards Understanding Sycophancy in Language Models." ICLR 2024. https://arxiv.org/abs/2310.13548
- Relevance: Establishes that RLHF-trained models systematically produce responses that match user views — but importantly, this matching operates asymmetrically. The AI matches the user's position while potentially replacing the user's vocabulary and framework. The state log's split between dominant_vocabulary and dominant_framework captures this: the AI may agree with what the user is saying while restructuring how they say and organize it.
- Maynard, A. D. (2026). "The AI Cognitive Trojan Horse." arXiv:2601.07085. https://arxiv.org/abs/2601.07085
- Relevance: Identifies "processing fluency decoupled from understanding" as a bypass mechanism for epistemic vigilance. The AI's fluent, well-organized output creates a framework that feels natural to adopt precisely because it's easy to process — not because the user evaluated it. The state log's framework tracking detects when this has happened: dominant_framework shifts from user to ai when the organizational structure governing the discussion changes origin.
3.5 State Transitions as Diagnostic Signals
The state log classifies turns as stable | shift | replacement, treating transitions themselves — not just states — as data. The concept of tracking transitions rather than snapshots has precedent in interaction analysis.
- Taxonomy of Interaction Patterns in AI-Assisted Decision Making (2024). Frontiers in Computer Science. https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2024.1521066/full
- Relevance: Systematic review of 105 articles identifying seven categories of interaction patterns. Found that most empirical studies use "AI-first" patterns where the AI provides output and the human supervises. The state log's transition classification (stable/shift/replacement) is a per-turn version of this: it detects when the interaction pattern changes from collaborative to AI-dominant within a single conversation.
- Busuioc, M. (2023). "Human–AI Interactions in Public Sector Decision Making." Journal of Public Administration Research and Theory, 33(1). https://academic.oup.com/jpart/article/33/1/153/6524536
- Relevance: Documents both automation bias (uncritical deference to AI) and selective adherence (adopting AI advice when it confirms existing views). The state log's replacement transition type — where three or more fields change or a core field is replaced — flags moments where automation bias may have caused wholesale adoption of the AI's framing.
3.6 Concept Ownership Tracking
The state log's concept_owner: user | ai | collaborative field detects who is making structural and directional decisions.
- Kim, S. et al. (2026). "From Algorithm Aversion to AI Dependence." Consumer Psychology Review, Wiley. https://myscp.onlinelibrary.wiley.com/doi/full/10.1002/arcp.70008
- Relevance: The four-quadrant framework (Division of Cognitive Labor × Metacognitive Oversight) predicts that ownership shifts follow a predictable trajectory from Skilled Augmentation toward Cognitive Surrender. The state log's per-turn concept_owner field provides the resolution needed to detect when this trajectory is unfolding within a single conversation.
- Guingrich, Mehta, & Bhatt (2026). "Belief Offloading in Human-AI Interaction." https://arxiv.org/html/2602.08754
- Relevance: Identifies "cognitive and behavioral drift" as a documented consequence of prolonged AI interaction — systematic shifts in beliefs, confidence thresholds, and action readiness, accompanied by reductions in inquiry diversity. The state log tracks exactly these shifts at the per-turn level: changes to active_goal, confidence_level, and concept_owner are the micro-level events that accumulate into the macro-level drift this paper describes.