Generation-Detection Mapping
This document aligns the generation side (why the AI produces drift) with the detection side (how you find it in the transcript). Two layers of generation taxonomy exist. Both are retained because they operate at different levels of analysis.
Generation Layer 1 — Observable Behaviors
These are concrete things the AI does in practice. They were identified from conversation observation, not from theory. They describe what you'd see if you watched an AI response being composed.
G1: Intellectual name-dropping
The AI references established thinkers, fields, or bodies of work
to contextualize the user's idea.
G2: Mythic / elevation language
The AI restates the user's idea in grander, more abstract, or more
impressive-sounding terms than the user used.
G3: Conceptual collapsing
The AI treats an open or ambiguous idea as resolved, reducing
multiple possibilities to one.
G4: Narrative framing instead of technical analysis
The AI wraps technical content in story structure, positioning
elements as characters in a narrative rather than components
in a system.
G5: Conversational anchoring
The AI steers the conversation toward topics or framings where
it can produce more content, pulling the discussion toward
its areas of fluency.
G6: Structural elaboration
The AI adds internal structure (steps, layers, categories,
sequences) to a user concept without being asked, increasing
the concept's resolution beyond what the user specified.
Three generation drivers produce G6:
G6-a: Gap-filling
The user's concept has an explicit structural gap (a
stated need for decomposition, an acknowledged missing
piece, a question about internal organization). The AI
fills it. The elaboration is responsive to a real gap.
G6-b: Sycophantic elaboration
The AI elaborates because elaboration signals helpfulness,
thoroughness, or engagement — not because the concept
requires it. Driven by RLHF-trained preference for
longer, more detailed responses. The elaboration serves
the AI's engagement incentive, not the user's structural
need.
G6-c: Completion pressure
The AI elaborates because the concept is structurally
open-ended and the AI's generation process favors
closure. The concept doesn't have a gap so much as it
has unresolved possibility, and the AI resolves it
by filling in structure. Unlike G6-b, the driver is
not engagement optimization but the tendency to produce
complete-looking outputs.
G6 is new. It was identified as missing when the original five behaviors were mapped to the detection patterns and an orphaned component was found in scope creep.
G6-a is identifiable from transcript evidence: the user's prior message contains an explicit gap. G6-b and G6-c may not be reliably distinguishable from transcript alone — both produce unsolicited elaboration, but the underlying driver (engagement optimization vs. completion tendency) is internal to the model. If field testing confirms they are not separable in practice, they should be merged into a single "unsolicited" driver category. The distinction is retained for now because the sycophancy literature (Sharma et al. 2024, Malmqvist 2024) specifically identifies RLHF-driven elaboration as a distinct phenomenon with distinct mitigation strategies.
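A minimal sketch of how the G6-a test could be automated from transcript evidence, assuming the turn log exposes the user's prior message as plain text. The marker phrases and the function name are illustrative placeholders, not part of the framework; in practice the gap check is made by the analyst reading the prior message.

```python
# Minimal sketch of the G6-a vs. G6-b/c split described above.
# GAP_MARKERS and the function name are illustrative assumptions.
import re

GAP_MARKERS = [
    r"\bhow should (i|we) (structure|organize|break)\b",
    r"\bwhat('s| is) missing\b",
    r"\bnot sure how to (split|decompose|order)\b",
    r"\bwhat are the (steps|parts|pieces)\b",
]

def classify_g6_driver(prior_user_message: str) -> str:
    """Return 'G6-a' if the user's prior message states an explicit structural
    gap; otherwise 'G6-b/c' (unsolicited; not separable from the transcript alone)."""
    text = prior_user_message.lower()
    if any(re.search(pattern, text) for pattern in GAP_MARKERS):
        return "G6-a"
    return "G6-b/c"
```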
Generation Layer 2 — Mechanism Categories
These are abstract categories that group the observable behaviors by their type of effect on the user's ideas. They were proposed in the previous session as a replacement for Layer 1, but both layers carry information. Layer 1 tells you what the AI did. Layer 2 tells you what kind of thing it did.
M1: Conceptual translation
The AI converts the user's concept into different terms or framing.
M2: Structural elaboration
The AI increases the internal resolution of the user's concept.
M3: Framework application
The AI imports external organizational schemes or references.
M4: Scope extension
The AI expands beyond the boundaries the user stated.
M5: Confidence transformation
The AI shifts the epistemic stance of the user's claims.
Cross-mechanism artifact: Vocabulary substitution. It occurs as a side effect of M1, M2, M3, and M4. It is not a mechanism itself — it's a symptom. On the detection side it remains Pattern 01 because it's independently observable even when the generating mechanism isn't clear.
Behavior-to-Mechanism Mapping
Which observable behaviors belong to which mechanism categories.
G1 (intellectual name-dropping) → M3 (framework application)
G2 (mythic/elevation language) → M1 (conceptual translation) + M5 (confidence transformation)
G3 (conceptual collapsing) → M5 (confidence transformation)
G4 (narrative framing) → M1 (conceptual translation) + M3 (framework application)
G5 (conversational anchoring) → M4 (scope extension)
G6 (structural elaboration) → M2 (structural elaboration)
G6-a (gap-filling) — responsive to stated structural need
G6-b (sycophantic elaboration) — responsive to engagement incentive
G6-c (completion pressure) — responsive to generation bias toward closure
Notes:
- G2 maps to two mechanisms. Elevation language both translates the concept (M1) and increases its apparent confidence (M5). These can be detected separately.
- G4 maps to two mechanisms. Narrative framing translates technical content into story terms (M1) and applies an external organizational scheme — narrative structure itself (M3).
- G6 and M2 are 1:1. This is expected since M2 was created specifically to fill the gap G6 identified. The three G6 drivers all produce M2 but differ in what triggers the elaboration. The driver does not change the mechanism or the detection pattern — P07 fires regardless. The driver affects severity assessment via the solicitation axis (see drift-pattern-library.md).
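If the mapping is consumed by tooling, it can be carried as a small lookup table. A sketch using this document's identifiers; the dict form is only a convenience and adds no information beyond the list above.

```python
# The behavior-to-mechanism mapping above, expressed as a lookup table.
BEHAVIOR_TO_MECHANISM = {
    "G1": ["M3"],          # intellectual name-dropping -> framework application
    "G2": ["M1", "M5"],    # elevation language -> translation + confidence
    "G3": ["M5"],          # conceptual collapsing -> confidence transformation
    "G4": ["M1", "M3"],    # narrative framing -> translation + framework
    "G5": ["M4"],          # conversational anchoring -> scope extension
    "G6": ["M2"],          # structural elaboration -> structural elaboration
}
```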
Mechanism-to-Detection Mapping
Which mechanism categories are caught by which detection patterns. Detection patterns are grouped by drift type hierarchy (see drift-pattern-library.md): semantic (P01, P05), epistemic (P02, P03), scope (P04), structural (P06, P07).
M1 (conceptual translation) → P01 (vocabulary substitution)
P06 (framework introduction) — when translation
imposes organizational structure
M2 (structural elaboration) → P07 (elaborative expansion)
P01 (vocabulary substitution) — as side effect,
the AI names the components it creates
M3 (framework application) → P05 (connective capture) — when connecting to
external fields/references
P06 (framework introduction) — when imposing
organizational schemes
M4 (scope extension) → P04 (scope creep by enthusiasm)
M5 (confidence transformation) → P02 (premature resolution) — when uncertainty
is collapsed to a position
P03 (confidence injection) — when confidence
level increases without new evidence
Full Mapping Table
BEHAVIOR → MECHANISM → DETECTION PATTERN(S)
G1 name-dropping → M3 framework app → P05 connective capture
→ P06 framework introduction
G2 elevation lang → M1 conceptual trans → P01 vocabulary substitution
→ M5 confidence trans → P03 confidence injection
G3 collapsing → M5 confidence trans → P02 premature resolution
→ P03 confidence injection
G4 narrative frame → M1 conceptual trans → P01 vocabulary substitution
→ M3 framework app → P06 framework introduction
G5 anchoring → M4 scope extension → P04 scope creep
G6 struct. elab. → M2 struct. elab. → P07 elaborative expansion
G6-a gap-filling (solicited / gap_responsive)
G6-b sycophantic (unsolicited — engagement-driven)
G6-c completion (unsolicited — closure-driven)
→ P01 vocabulary substitution (side effect)
Cross-Mechanism Artifact: Vocabulary Substitution
P01 (Vocabulary Substitution) appears as a detection target for M1, M2, and indirectly for M3 and M4. This confirms the previous session's finding: vocabulary substitution is not a generation mechanism. It's an observable artifact that occurs across most mechanisms.
On the detection side, P01 remains a standalone pattern because:
- It's independently detectable (term changes are visible without knowing the cause).
- It tracks adoption (did the user start using the AI's term?).
- Its severity depends on whether the user noticed the substitution, not on which mechanism produced it.
On the generation side, vocabulary substitution has no entry. When you detect P01, the generating mechanism could be M1, M2, M3, or M4. The turn log's other drift markers will usually disambiguate.
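A sketch of that disambiguation problem in code form. The forward mapping follows the Mechanism-to-Detection section, with the indirect P01 routes added per the cross-mechanism note above; the reverse lookup shows why a P01 hit alone leaves four candidate mechanisms. The helper function is an illustrative convenience, not part of the detection procedure.

```python
# Mechanism-to-pattern mapping plus the reverse lookup used when a pattern fires.
MECHANISM_TO_PATTERN = {
    "M1": ["P01", "P06"],
    "M2": ["P07", "P01"],          # P01 only as a side effect
    "M3": ["P05", "P06", "P01"],   # P01 indirectly, via imported terminology
    "M4": ["P04", "P01"],          # P01 indirectly, via expanded vocabulary
    "M5": ["P02", "P03"],
}

def candidate_mechanisms(detected_pattern: str) -> list[str]:
    """All mechanisms that could have produced a detected pattern."""
    return [m for m, patterns in MECHANISM_TO_PATTERN.items()
            if detected_pattern in patterns]

# candidate_mechanisms("P01") -> ['M1', 'M2', 'M3', 'M4']; the turn log's
# other drift markers are needed to disambiguate.
```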
Coverage Gaps
Detection patterns with no direct generation mechanism
None. All seven patterns map to at least one mechanism.
Generation mechanisms with no direct detection pattern
None. All five mechanisms map to at least one pattern.
Potential blind spots
BLIND-01: M1 (conceptual translation) without vocabulary change.
The AI could reframe a concept while using the user's exact terms
— same words, different implied meaning. P01 wouldn't fire. No
current pattern detects semantic drift without lexical change.
Status: theoretical. No observed instance yet.
BLIND-02: Scope narrowing through omission (the inverse of M4).
The AI could narrow scope by ignoring parts of the user's stated
goals rather than adding new ones. P04 detects additions, not
subtractions. Scope narrowing is currently undetected.
Status: theoretical. Would require tracking which user goals
receive zero engagement across the full conversation. Partially
covered by Section 1 (Intent Alignment) in the report template,
which flags "unaddressed" goals. Not covered at the per-turn level.
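A sketch of the whole-conversation check BLIND-02 would require, assuming user goals and AI turns are available as plain text. The keyword-overlap test is a stand-in assumption; an actual pass would use whatever goal and engagement representation the turn log provides.

```python
# Flag user-stated goals that receive zero engagement across all AI turns
# (candidate scope narrowing). Overlap threshold is an illustrative placeholder.
def unengaged_goals(user_goals: list[str], ai_turns: list[str]) -> list[str]:
    """Return user goals that no AI turn engages with."""
    flagged = []
    for goal in user_goals:
        goal_terms = set(goal.lower().split())
        threshold = min(2, len(goal_terms))
        engaged = any(len(goal_terms & set(turn.lower().split())) >= threshold
                      for turn in ai_turns)
        if not engaged:
            flagged.append(goal)
    return flagged
```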
BLIND-03: Compound patterns that produce emergent effects.
G2 (elevation language) maps to both M1 and M5, detected by both
P01 and P03. But the compound effect — the user's concept is
simultaneously translated and confidence-boosted — may be more
impactful than either pattern alone. Current system detects both
patterns independently but has no mechanism for scoring compound
severity beyond the co-occurrence note in the report.
Status: identified. See UNKNOWN-07-C in pattern library for the
specific case of elaboration + confidence injection.
BLIND-04: Sycophantic elaboration compound.
When G6-b (sycophantic elaboration) produces P07, and the
elaboration is delivered assertively (P03), the compound is the
highest-severity form of structural drift: unsolicited structure
presented as settled, driven by engagement incentive rather than
structural need. This compound is now tracked via the solicitation
axis in P07's severity model. The sharpest signal is unsolicited
elaboration adopted without modification — the user accepted
structural decisions that were generated to demonstrate
helpfulness, not to fill a gap.
Status: identified. Solicitation test operationalized. Field
testing needed to validate detection reliability.
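A sketch of how the BLIND-04 compound (and, analogously, the BLIND-03 P01+P03 compound) could be scored at the turn level. The 0-3 scale and the increments are assumptions for illustration only; the framework's actual severity model lives in the solicitation axis of P07 (see drift-pattern-library.md).

```python
# Rough compound-severity score for the elaboration + confidence compound.
def compound_severity(patterns: set[str], solicited: bool, adopted_unmodified: bool) -> int:
    """Return a rough 0-3 score for one turn; 0 means no compound fired."""
    if not {"P07", "P03"} <= patterns:
        return 0                     # no elaboration-plus-confidence compound this turn
    score = 1                        # compound present: unsolicited-or-not structure presented as settled
    if not solicited:
        score += 1                   # G6-b/c route rather than gap-filling (G6-a)
    if adopted_unmodified:
        score += 1                   # user accepted the structural decisions as-is
    return score
```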
Unknowns Specific to This Mapping
UNKNOWN-MAP-01: The two-layer generation taxonomy (behaviors + mechanisms)
has not been tested for completeness. Additional observable behaviors
may exist that don't map to the five mechanism categories. Additional
mechanism categories may be needed.
UNKNOWN-MAP-02: The behavior-to-mechanism mapping assumes each behavior
maps to one or two mechanisms. In practice, a single AI turn may
execute multiple behaviors simultaneously. The mapping does not yet
account for how behaviors combine at the turn level.
UNKNOWN-MAP-03: This mapping is derived from analysis, not from field
testing. The first real analysis pass may reveal mappings that don't
hold or gaps that aren't listed here.
Drift Hierarchy Alignment
The drift type hierarchy groups detection patterns by effect type. This maps cleanly to the mechanism categories:
semantic drift (P01, P05) ← M1 (conceptual translation), M3 (framework application)
epistemic drift (P02, P03) ← M5 (confidence transformation)
scope drift (P04) ← M4 (scope extension)
structural drift (P06, P07) ← M2 (structural elaboration), M3 (framework application)
M3 (framework application) maps to both semantic and structural drift depending on whether the framework primarily relabels concepts (semantic) or reorganizes them (structural). This dual mapping is expected — framework introduction is inherently both a semantic and structural operation.
The hierarchy does not change the mapping. It provides an organizational layer on the detection side that parallels the mechanism categories on the generation side. When analyzing a turn, the hierarchy helps you check for co-occurring patterns that share a mechanism: if you detect P01, check P05 (both semantic, both fed by M1). If you detect P06, check P07 (both structural, both fed by M2/M3).
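A sketch of that sibling check, assuming pattern IDs are recorded per turn. The grouping mirrors the hierarchy above; the helper itself is an illustrative convenience, not part of the detection procedure.

```python
# Given a detected pattern, suggest co-occurring patterns of the same drift type.
DRIFT_TYPE = {
    "semantic":   ["P01", "P05"],
    "epistemic":  ["P02", "P03"],
    "scope":      ["P04"],
    "structural": ["P06", "P07"],
}

def siblings_to_check(detected_pattern: str) -> list[str]:
    """Other patterns in the same drift type as the detected one."""
    for patterns in DRIFT_TYPE.values():
        if detected_pattern in patterns:
            return [p for p in patterns if p != detected_pattern]
    return []

# siblings_to_check("P01") -> ["P05"]; siblings_to_check("P06") -> ["P07"]
```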
TORQUE — Source Mapping
Supporting research for each document's core concepts. Vetted sources are prioritized (.gov, university, peer-reviewed). Compiled by stepping through each document in turn.
Sources: generation-detection-mapping.md
Session: manual compilation. Status: document 1 of 4 (excluding templates).
Explicitly Referenced Sources
These are works cited by name in the document itself.
Sharma et al. (2024) — Sycophancy in RLHF-trained models
Referenced in: G6-b (sycophantic elaboration) rationale; justification for retaining G6-b/G6-c distinction.
Citation: Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S.R., et al. (2024). Towards Understanding Sycophancy in Language Models. ICLR 2024.
Links:
- arXiv: https://arxiv.org/abs/2310.13548
- OpenReview: https://openreview.net/forum?id=tvhaxkMKAn
- Anthropic research page: https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models
What it supports: The document claims RLHF-driven elaboration is a distinct phenomenon with distinct mitigation strategies. Sharma et al. tested five RLHF-trained assistants across four free-form generation tasks and found consistent sycophantic behavior. They traced the cause to human preference data: responses matching user views are more likely to be rated as preferred, and preference models sometimes favor sycophantic responses over correct ones. This directly supports the document's G6-b mechanism — the model elaborates because RLHF trained it to produce responses that signal helpfulness and engagement, even when the elaboration isn't structurally needed.
Malmqvist (2024/2025) — Sycophancy survey
Referenced in: same context as Sharma, supporting the case that sycophantic elaboration is a documented and distinct problem.
Citation: Malmqvist, L. (2025). Sycophancy in Large Language Models: Causes and Mitigations. In: Arai, K. (eds) Intelligent Computing. CompCom 2025. Lecture Notes in Networks and Systems, vol 1426. Springer, Cham.
Links:
What it supports: Malmqvist's survey identifies three reinforcing sources of sycophancy: pretraining data skewed toward flattery, post-training processes that reward user agreement, and limited effectiveness of existing mitigations. The multi-source causation argument supports the document's claim that sycophantic elaboration (G6-b) is not just an incidental pattern but a structurally embedded tendency with identifiable training-pipeline causes.
MacLean et al. (1991) — QOC framework
Referenced in: turn-log-template.md (elaborative expansion visibility assessment), but the framework underlies the detection-side operationalization of P07 visibility scoring discussed in the mapping document.
Citation: MacLean, A., Young, R.M., Bellotti, V.M.E., & Moran, T.P. (1991). Questions, Options, and Criteria: Elements of Design Space Analysis. Human-Computer Interaction, 6(3-4), 201-250.
Links:
- Taylor & Francis: https://www.tandfonline.com/doi/abs/10.1080/07370024.1991.9667168
- ACM Digital Library: https://dl.acm.org/doi/10.1207/s15327051hci0603%25264_2
- ResearchGate: https://www.researchgate.net/publication/233367028_Questions_Options_and_Criteria_Elements_of_Design_Space_Analysis
What it supports: QOC is a semiformal notation for representing design rationale. It structures decision-making into Questions (design issues), Options (possible answers), and Criteria (bases for comparing options). The document's visibility scoring for elaborative expansions (P07) is operationalized as a QOC reconstruction test: did the AI frame its structural choice as a question, present multiple options, and provide criteria for evaluating them? This repurposes a design rationale framework as a transparency test for AI-generated structure.
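A sketch of the QOC reconstruction test as a scoring aid, assuming the analyst answers three yes/no questions per structural choice. The dataclass and the 0-3 scale are illustrative assumptions; the actual operationalization is in turn-log-template.md.

```python
# Three yes/no checks per AI structural choice, following the QOC elements.
from dataclasses import dataclass

@dataclass
class QOCCheck:
    framed_as_question: bool   # did the AI pose the structural choice as a question?
    presented_options: bool    # were multiple options offered?
    gave_criteria: bool        # were criteria for comparing the options stated?

    def visibility_score(self) -> int:
        """0 (opaque) to 3 (fully visible) for one structural choice."""
        return sum([self.framed_as_question, self.presented_options, self.gave_criteria])
```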
Supporting Sources by Concept
G6-b: Sycophantic elaboration / RLHF length and helpfulness bias
The document's core claim about G6-b is that RLHF training incentivizes elaboration as a signal of helpfulness, independent of structural need. Several lines of research support this.
Verbosity bias in RLHF preference labeling:
Saito, K., Wataoka, K., Imamura, T., & Sasaki, K. (2023). Verbosity Bias in Preference Labeling by Large Language Models. arXiv:2310.10076.
- Link: https://arxiv.org/abs/2310.10076
- Findings: LLMs exhibit a preference for longer answers in evaluation tasks even when shorter answers have similar quality. GPT-4 shows stronger verbosity preference than human evaluators. This maps directly to the G6-b mechanism: the model has internalized that longer, more elaborated responses receive higher reward signals.
Length as dominant RLHF optimization strategy:
Singhal, P., Goyal, T., Xu, J., & Durrett, G. (2024). A Long Way to Go: Investigating Length Correlations in RLHF. ICLR 2024 Workshop.
- Link: https://openreview.net/forum?id=G8LaO1P0xv
- Findings: Improvements in reward during RLHF are largely driven by increasing response length rather than other quality features. A purely length-based reward reproduces most downstream RLHF improvements over supervised fine-tuned models. This is the training-side mechanism behind G6-b — the model learns that more structure and more text correlates with higher reward.
Verbosity compensation behavior (overview):
Zhang, Y., et al. (2024). Verbosity ≠ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models.
- Link: https://openreview.net/pdf?id=l49uZcEIcq
- Findings: Verbose responses correlate with higher model uncertainty. LLMs increase output length as a compensation mechanism under ambiguity. Supports the document's characterization of G6-c (completion pressure) — the model elaborates when concepts are structurally open-ended, and verbosity functions as a resolution strategy.
M5 / P02-P03: Confidence transformation, overconfidence, premature resolution
The document defines M5 as the AI shifting the epistemic stance of user claims, detected via P02 (premature resolution) and P03 (confidence injection). Research on LLM calibration supports the existence of systematic overconfidence.
LLM overconfidence in verbalized confidence:
Xiong, M., Hu, Z., Lu, X., Li, Y., Fu, J., He, J., & Hooi, B. (2024). Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. ICLR 2024.
- Link: https://arxiv.org/abs/2306.13063
- Findings: LLMs are systematically overconfident when verbalizing their confidence, potentially imitating human patterns of expressing confidence. This supports the M5 mechanism — the model's default mode when restating user ideas is to present them with higher confidence than the user expressed.
Epistemic calibration failures:
KalshiBench study (2025). Do Large Language Models Know What They Don't Know? Evaluating Epistemic Calibration via Prediction Markets.
- Link: https://arxiv.org/html/2512.16030v1
- Findings: All frontier models tested show systematic overconfidence (ECE 0.12-0.40). Extended reasoning worsens rather than improves calibration. This is relevant because it demonstrates that overconfidence is not a surface-level behavior but a structural property of current models — supporting the document's treatment of confidence transformation as a mechanism category (M5) rather than an incidental artifact.
Survey of confidence estimation and calibration:
Geng, J., et al. (2024). A Survey of Confidence Estimation and Calibration in Large Language Models. NAACL 2024.
- Link: https://aclanthology.org/2024.naacl-long.366.pdf
- Findings: Comprehensive review showing LLMs display overconfidence at scale, with verbalized confidence diverging from actual accuracy. Supports the detection patterns P02 and P03 — the AI's tendency to present hedged or uncertain user ideas as more settled than the user intended.
Epistemic uncertainty in LLMs (survey):
Huang, L., et al. (2025). Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey.
- Link: https://arxiv.org/html/2503.15850v1
- Findings: LLMs facing unfamiliar topics may exhibit high epistemic uncertainty, often manifesting as overconfident yet incorrect statements. This maps to the document's BLIND-03 compound pattern — elevation language (G2) simultaneously translates and confidence-boosts a concept.
G5 / M4: Conversational anchoring / scope extension
The document defines G5 as the AI steering conversation toward topics where it can produce more content, and M4 as scope extension. Anchoring bias research in AI-assisted decision-making supports the underlying dynamic.
Anchoring bias in AI-assisted decisions:
Bader, N., Goti, A., & Serafini, L. (2025). How was my performance? Exploring the role of anchoring bias in AI-assisted decision making. Information & Management.
- Link: https://www.sciencedirect.com/science/article/pii/S0268401225000076
- Findings: Human decision-makers are significantly influenced by AI recommendations through anchoring effects. The source (human vs. AI) interacts with anchor direction to influence judgments. Supports the scope extension detection logic — when an AI introduces a topic or direction, it serves as an anchor that the user's subsequent thinking adjusts from rather than evaluates independently.
Cognitive biases in LLM interactions:
Buçinca, Z., et al. (2021) and Gaube, S., et al. (2021), among others, as reviewed in: Salma, S., Rajagopal, V.K., & Palani, K. (2025). Exploring automation bias in human–AI collaboration: a review and implications for explainable AI. AI & Society.
- Link: https://link.springer.com/article/10.1007/s00146-025-02422-7
- Findings: Automation bias frequently interacts with anchoring — initial AI recommendations disproportionately shape subsequent decisions. Users seek evidence confirming AI's initial suggestion. This supports the document's P04 detection pattern: scope creep by enthusiasm occurs because the AI's initial framing anchors the user's subsequent concept development.
Anchoring bias in LLMs themselves:
Wang, Z., et al. (2025). Anchoring bias in large language models: an experimental study. Journal of Computational Social Science.
- Link: https://link.springer.com/article/10.1007/s42001-025-00435-2
- Findings: LLMs are susceptible to anchoring effects from the first piece of information encountered, and simple mitigation strategies (chain-of-thought, reflection) are insufficient. Relevant because it shows anchoring operates bidirectionally — the AI is anchored by user phrasing, and the user is anchored by AI elaboration.
Confirmation bias amplification in chatbot interactions:
Rastogi, D., et al. (2025). Confirmation Bias in Generative AI Chatbots.
- Link: https://arxiv.org/pdf/2504.09343
- Findings: Because generative models produce text aligned with user prompts, they inadvertently reinforce assumptions rather than challenging them. Supports the combined G2+G5 pattern described in the mapping — the AI elevates user ideas while steering toward areas where it can elaborate further, creating a feedback loop.
G6-c: Completion pressure / autoregressive generation bias
The document posits G6-c as a generation driver distinct from sycophancy: the model elaborates because open-ended concepts trigger completion tendencies in the autoregressive generation process.
Autoregressive bias toward high-probability completions:
McCoy, R.T., Yao, S., Friedman, D., Hardy, M., & Griffiths, T.L. (2024). Embers of autoregression show how large language models are shaped by the problem they are trained to solve. PNAS, 121(41).
- Link: https://www.pnas.org/doi/10.1073/pnas.2322420121
- Findings: LLMs are biased toward high-probability outputs due to their autoregressive training objective. Performance failures can be understood as conflicts between next-word prediction and the actual task. Supports G6-c: when a concept is structurally open, the model's generation process favors resolving it toward probable completions rather than preserving the openness.
Exposure bias and error propagation in autoregressive models:
Ranzato, M., et al. (2015), as discussed in Gao, L. (2019). The Difficulties of Text Generation using Autoregressive Language Models.
- Link: https://bmk.sh/2019/10/27/The-Difficulties-of-Text-Generation-with-Autoregressive-Language-Models/
- Findings: Autoregressive models trained on real text struggle when generating from their own outputs, leading to compounding errors and a tendency toward either repetition or high-probability generic completions. Supports the mechanism behind G6-c — the model prefers to generate "complete-looking" structures because completeness correlates with higher probability sequences in training data.
Agent drift in multi-turn interactions:
Agent Drift study (2025). Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions.
- Link: https://arxiv.org/pdf/2601.04170
- Findings: Multi-turn interactions create autoregressive feedback loops where outputs become future inputs. Small biases compound over conversation length. Supports the mapping document's concern about compound patterns (BLIND-03, BLIND-04) and the state_delta tracking in the turn log — drift accumulates across turns, not just within a single response.
Semantic drift / vocabulary substitution (P01, M1)
The document treats vocabulary substitution as a cross-mechanism artifact detectable across M1, M2, M3, and M4. Semantic drift in LLM outputs is documented in several contexts.
Semantic drift in LLM dialogue summarization:
Kumar, V., et al. (2025). Mitigating Semantic Drift: Evaluating LLMs' Efficacy in Psychotherapy through MI Dialogue Summarization. arXiv:2511.22818.
- Link: https://arxiv.org/abs/2511.22818
- Findings: Defines semantic drift as the tendency of generated text to gradually diverge from the original meaning or intent of source dialogue. While the domain is psychotherapy summarization, the phenomenon is structurally identical to P01 — the AI's output uses terms that shift meaning away from the user's original vocabulary.
Lexical semantic change detection:
Montanelli, S., & Periti, F. (2024). Lexical Semantic Change through Large Language Models: a Survey. ACM Computing Surveys.
- Link: https://dl.acm.org/doi/10.1145/3672393
- Findings: Comprehensive survey on detecting when word meanings shift. While focused on historical language change rather than within-conversation drift, the detection methodology (comparing term usage across contexts) parallels the P01 detection procedure. Relevant as methodological background for vocabulary substitution tracking.
Lexical convergence in multi-agent LLM annotation:
Emergent Convergence study (2025). Emergent Convergence in Multi-Agent LLM Annotation. BlackboxNLP 2025.
- Link: https://aclanthology.org/2025.blackboxnlp-1.12.pdf
- Findings: When multiple LLMs annotate collaboratively, they exhibit lexical alignment and semantic compression through successive rounds. Adopted lexical items spread between agents. This parallels the human-AI dynamic the mapping document tracks: the AI introduces terms, and the user adopts them, leading to vocabulary convergence toward the AI's framing.