Generation-Detection Mapping

This document aligns the generation side (why the AI produces drift) with the detection side (how you find it in the transcript). Two layers of generation taxonomy exist. Both are retained because they operate at different levels of analysis.

Generation Layer 1 — Observable Behaviors

These are concrete things the AI does in practice. They were identified from conversation observation, not from theory. They describe what you'd see if you watched an AI response being composed.

G1: Intellectual name-dropping
    The AI references established thinkers, fields, or bodies of work
    to contextualize the user's idea.

G2: Mythic / elevation language
    The AI restates the user's idea in grander, more abstract, or more
    impressive-sounding terms than the user used.

G3: Conceptual collapsing
    The AI treats an open or ambiguous idea as resolved, reducing
    multiple possibilities to one.

G4: Narrative framing instead of technical analysis
    The AI wraps technical content in story structure, positioning
    elements as characters in a narrative rather than components
    in a system.

G5: Conversational anchoring
    The AI steers the conversation toward topics or framings where
    it can produce more content, pulling the discussion toward
    its areas of fluency.

G6: Structural elaboration
    The AI adds internal structure (steps, layers, categories,
    sequences) to a user concept without being asked, increasing
    the concept's resolution beyond what the user specified.

    Three generation drivers produce G6:

    G6-a: Gap-filling
          The user's concept has an explicit structural gap (a
          stated need for decomposition, an acknowledged missing
          piece, a question about internal organization). The AI
          fills it. The elaboration is responsive to a real gap.

    G6-b: Sycophantic elaboration
          The AI elaborates because elaboration signals helpfulness,
          thoroughness, or engagement — not because the concept
          requires it. Driven by RLHF-trained preference for
          longer, more detailed responses. The elaboration serves
          the AI's engagement incentive, not the user's structural
          need.

    G6-c: Completion pressure
          The AI elaborates because the concept is structurally
          open-ended and the AI's generation process favors
          closure. The concept doesn't have a gap so much as it
          has unresolved possibility, and the AI resolves it
          by filling in structure. Unlike G6-b, the driver is
          not engagement optimization but the tendency to produce
          complete-looking outputs.

G6 is new. It was identified as missing when the original five behaviors were mapped to the detection patterns and an orphaned component was found in scope creep.

G6-a is identifiable from transcript evidence: the user's prior message contains an explicit gap. G6-b and G6-c may not be reliably distinguishable from transcript alone — both produce unsolicited elaboration, but the underlying driver (engagement optimization vs. completion tendency) is internal to the model. If field testing confirms they are not separable in practice, they should be merged into a single "unsolicited" driver category. The distinction is retained for now because the sycophancy literature (Sharma et al. 2024, Malmqvist 2024) specifically identifies RLHF-driven elaboration as a distinct phenomenon with distinct mitigation strategies.
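The G6-a test is transcript-level and can be sketched mechanically. The snippet below is a minimal illustration, assuming a hypothetical list of lexical gap signals (a real analysis pass would need richer cues than keyword matching). It returns the merged "G6-b/c" label because, as noted above, the two unsolicited drivers are not separable from transcript evidence alone.

```python
import re

# Hypothetical gap signals; illustrative only, not an exhaustive list.
GAP_SIGNALS = [
    r"\bhow should (?:i|we) (?:structure|organize|break)\b",
    r"\bwhat(?:'s| is) missing\b",
    r"\bneeds? (?:a )?breakdown\b",
    r"\bwhat steps\b",
]

def classify_g6_driver(prior_user_message: str) -> str:
    """Classify a G6 elaboration from transcript evidence alone.

    Returns "G6-a" when the user's prior message contains an explicit
    structural gap; otherwise "G6-b/c", since engagement-driven and
    closure-driven elaboration look the same in the transcript.
    """
    text = prior_user_message.lower()
    if any(re.search(p, text) for p in GAP_SIGNALS):
        return "G6-a"
    return "G6-b/c"
```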

Generation Layer 2 — Mechanism Categories

These are abstract categories that group the observable behaviors by their type of effect on the user's ideas. They were proposed in the previous session as a replacement for Layer 1, but both layers carry information. Layer 1 tells you what the AI did. Layer 2 tells you what kind of thing it did.

M1: Conceptual translation
    The AI converts the user's concept into different terms or framing.

M2: Structural elaboration
    The AI increases the internal resolution of the user's concept.

M3: Framework application
    The AI imports external organizational schemes or references.

M4: Scope extension
    The AI expands beyond the boundaries the user stated.

M5: Confidence transformation
    The AI shifts the epistemic stance of the user's claims.

Cross-mechanism artifact: Vocabulary substitution. It occurs as a side effect of M1, M2, M3, and M4. It is not a mechanism itself — it's a symptom. On the detection side it remains Pattern 01 because it's independently observable even when the generating mechanism isn't clear.

Behavior-to-Mechanism Mapping

Which observable behaviors belong to which mechanism categories.

G1 (intellectual name-dropping)    → M3 (framework application)
G2 (mythic/elevation language)     → M1 (conceptual translation) + M5 (confidence transformation)
G3 (conceptual collapsing)         → M5 (confidence transformation)
G4 (narrative framing)             → M1 (conceptual translation) + M3 (framework application)
G5 (conversational anchoring)      → M4 (scope extension)
G6 (structural elaboration)        → M2 (structural elaboration)
    G6-a (gap-filling)             — responsive to stated structural need
    G6-b (sycophantic elaboration) — responsive to engagement incentive
    G6-c (completion pressure)     — responsive to generation bias toward closure

Mechanism-to-Detection Mapping

Which mechanism categories are caught by which detection patterns. Detection patterns are grouped by drift type hierarchy (see drift-pattern-library.md): semantic (P01, P05), epistemic (P02, P03), scope (P04), structural (P06, P07).

M1 (conceptual translation)     → P01 (vocabulary substitution)
                                  P06 (framework introduction) — when translation
                                  imposes organizational structure

M2 (structural elaboration)     → P07 (elaborative expansion)
                                  P01 (vocabulary substitution) — as side effect,
                                  the AI names the components it creates

M3 (framework application)      → P05 (connective capture) — when connecting to
                                  external fields/references
                                  P06 (framework introduction) — when imposing
                                  organizational schemes

M4 (scope extension)            → P04 (scope creep by enthusiasm)

M5 (confidence transformation)  → P02 (premature resolution) — when uncertainty
                                  is collapsed to a position
                                  P03 (confidence injection) — when confidence
                                  level increases without new evidence

Full Mapping Table

BEHAVIOR → MECHANISM → DETECTION PATTERN(S)

G1 name-dropping    → M3 framework app    → P05 connective capture
                                           → P06 framework introduction

G2 elevation lang   → M1 conceptual trans  → P01 vocabulary substitution
                    → M5 confidence trans  → P03 confidence injection

G3 collapsing       → M5 confidence trans  → P02 premature resolution
                                           → P03 confidence injection

G4 narrative frame  → M1 conceptual trans  → P01 vocabulary substitution
                    → M3 framework app     → P06 framework introduction

G5 anchoring        → M4 scope extension   → P04 scope creep

G6 struct. elab.    → M2 struct. elab.     → P07 elaborative expansion
  G6-a gap-filling                            (solicited / gap_responsive)
  G6-b sycophantic                            (unsolicited — engagement-driven)
  G6-c completion                             (unsolicited — closure-driven)
                                           → P01 vocabulary substitution (side effect)

Cross-Mechanism Artifact: Vocabulary Substitution

P01 (Vocabulary Substitution) appears as a detection target for M1, M2, and indirectly for M3 and M4. This confirms the previous session's finding: vocabulary substitution is not a generation mechanism. It's an observable artifact that occurs across most mechanisms.

On the detection side, P01 remains a standalone pattern because it is independently observable even when the generating mechanism isn't clear.

On the generation side, vocabulary substitution has no entry. When you detect P01, the generating mechanism could be M1, M2, M3, or M4. The turn log's other drift markers will usually disambiguate.
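A hypothetical disambiguation sketch: when P01 fires, co-occurring patterns in the same turn hint at the generating mechanism. The hint table below is illustrative, not part of the turn-log spec.

```python
# Illustrative hints: co-occurring pattern -> likely mechanism behind P01.
CO_OCCURRENCE_HINTS = {
    "P07": "M2",        # elaboration names the components it creates
    "P06": "M1 or M3",  # translation or framework imposed structure
    "P05": "M3",        # connection to external fields/references
    "P04": "M4",        # scope extension renames as it expands
}

def likely_mechanism(patterns_in_turn: set[str]) -> str:
    """Guess the mechanism behind a P01 hit from co-occurring patterns."""
    if "P01" not in patterns_in_turn:
        return "no P01 detected"
    for pattern, mechanism in CO_OCCURRENCE_HINTS.items():
        if pattern in patterns_in_turn:
            return mechanism
    return "ambiguous: M1, M2, M3, or M4"
```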

Coverage Gaps

Detection patterns with no direct generation mechanism

None. All seven patterns map to at least one mechanism.

Generation mechanisms with no direct detection pattern

None. All five mechanisms map to at least one pattern.

Potential blind spots

BLIND-01: M1 (conceptual translation) without vocabulary change.
The AI could reframe a concept while using the user's exact terms
— same words, different implied meaning. P01 wouldn't fire. No
current pattern detects semantic drift without lexical change.
Status: theoretical. No observed instance yet.

BLIND-02: M4 (scope extension) through omission.
The AI could narrow scope by ignoring parts of the user's stated
goals rather than adding new ones. P04 detects additions, not
subtractions. Scope narrowing is currently undetected.
Status: theoretical. Would require tracking which user goals
receive zero engagement across the full conversation. Partially
covered by Section 1 (Intent Alignment) in the report template,
which flags "unaddressed" goals. Not covered at the per-turn level.
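A minimal sketch of the conversation-level tracking BLIND-02 would require, assuming a crude lexical notion of "engagement" (a real implementation would need semantic matching, not word overlap):

```python
def unengaged_goals(user_goals: list[str], ai_turns: list[str]) -> list[str]:
    """Flag user goals that receive zero engagement across the conversation.

    Crude lexical check: a goal counts as engaged if any of its content
    words appear anywhere in the AI's turns.
    """
    stopwords = {"the", "a", "an", "to", "of", "and", "for", "in", "on"}
    corpus = " ".join(ai_turns).lower()
    flagged = []
    for goal in user_goals:
        words = [w for w in goal.lower().split() if w not in stopwords]
        if not any(w in corpus for w in words):
            flagged.append(goal)
    return flagged
```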

BLIND-03: Compound patterns that produce emergent effects.
G2 (elevation language) maps to both M1 and M5, detected by both
P01 and P03. But the compound effect — the user's concept is
simultaneously translated and confidence-boosted — may be more
impactful than either pattern alone. Current system detects both
patterns independently but has no mechanism for scoring compound
severity beyond the co-occurrence note in the report.
Status: identified. See UNKNOWN-07-C in pattern library for the
specific case of elaboration + confidence injection.

BLIND-04: Sycophantic elaboration compound.
When G6-b (sycophantic elaboration) produces P07, and the
elaboration is delivered assertively (P03), the compound is the
highest-severity form of structural drift: unsolicited structure
presented as settled, driven by engagement incentive rather than
structural need. This compound is now tracked via the solicitation
axis in P07's severity model. The sharpest signal is unsolicited
elaboration adopted without modification — the user accepted
structural decisions that were generated to demonstrate
helpfulness, not to fill a gap.
Status: identified. Solicitation test operationalized. Field
testing needed to validate detection reliability.
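The BLIND-04 logic can be expressed as a small decision sketch. The three inputs (solicitation, assertive delivery, unmodified adoption) follow the description above; the returned labels are illustrative.

```python
def blind04_flag(solicited: bool, delivered_assertively: bool,
                 user_adopted_unmodified: bool) -> str:
    """Hypothetical BLIND-04 check on a P07 hit.

    Highest-severity structural drift: unsolicited structure, presented
    as settled, adopted wholesale by the user.
    """
    if solicited:
        return "gap_responsive: normal severity"
    if delivered_assertively and user_adopted_unmodified:
        return "BLIND-04: highest-severity structural drift"
    if delivered_assertively:
        return "unsolicited + assertive: elevated severity"
    return "unsolicited: review"
```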

Unknowns Specific to This Mapping

UNKNOWN-MAP-01: The two-layer generation taxonomy (behaviors + mechanisms)
has not been tested for completeness. Additional observable behaviors
may exist that don't map to the five mechanism categories. Additional
mechanism categories may be needed.

UNKNOWN-MAP-02: The behavior-to-mechanism mapping assumes each behavior
maps to one or two mechanisms. In practice, a single AI turn may
execute multiple behaviors simultaneously. The mapping does not yet
account for how behaviors combine at the turn level.

UNKNOWN-MAP-03: This mapping is derived from analysis, not from field
testing. The first real analysis pass may reveal mappings that don't
hold or gaps that aren't listed here.

Drift Hierarchy Alignment

The drift type hierarchy groups detection patterns by effect type. This maps cleanly to the mechanism categories:

semantic drift   (P01, P05)  ← M1 (conceptual translation), M3 (framework application)
epistemic drift  (P02, P03)  ← M5 (confidence transformation)
scope drift      (P04)       ← M4 (scope extension)
structural drift (P06, P07)  ← M2 (structural elaboration), M3 (framework application)

M3 (framework application) maps to both semantic and structural drift depending on whether the framework primarily relabels concepts (semantic) or reorganizes them (structural). This dual mapping is expected — framework introduction is inherently both a semantic and structural operation.

The hierarchy does not change the mapping. It provides an organizational layer on the detection side that parallels the mechanism categories on the generation side. When analyzing a turn, the hierarchy helps you check for co-occurring patterns that share a mechanism: if you detect P01, check P05 (both semantic, both fed by M1). If you detect P06, check P07 (both structural, both fed by M2/M3).
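The co-occurrence check can be sketched as a sibling lookup over the hierarchy groups listed above:

```python
# Siblings share a drift type and often a generating mechanism, so
# detecting one pattern is a cue to re-check its sibling.
DRIFT_SIBLINGS = {
    "P01": ["P05"], "P05": ["P01"],  # semantic
    "P02": ["P03"], "P03": ["P02"],  # epistemic
    "P06": ["P07"], "P07": ["P06"],  # structural
    "P04": [],                       # scope drift has no sibling pattern
}

def patterns_to_recheck(detected: set[str]) -> set[str]:
    """Given detected patterns, return siblings worth a second look."""
    suggestions: set[str] = set()
    for p in detected:
        suggestions.update(DRIFT_SIBLINGS.get(p, []))
    return suggestions - detected
```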

TORQUE — Source Mapping

Supporting research for each document's core concepts. Vetted sources are prioritized (.gov, university, peer-reviewed). Compiled document by document.


Sources: generation-detection-mapping.md

session: manual compilation
status: document 1 of 4 (excluding templates)

Explicitly Referenced Sources

These are works cited by name in the document itself.

Sharma et al. (2024) — Sycophancy in RLHF-trained models

Referenced in: G6-b (sycophantic elaboration) rationale; justification for retaining G6-b/G6-c distinction.

Citation: Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S.R., et al. (2024). Towards Understanding Sycophancy in Language Models. ICLR 2024.

What it supports: The document claims RLHF-driven elaboration is a distinct phenomenon with distinct mitigation strategies. Sharma et al. tested five RLHF-trained assistants across four free-form generation tasks and found consistent sycophantic behavior. They traced the cause to human preference data: responses matching user views are more likely to be rated as preferred, and preference models sometimes favor sycophantic responses over correct ones. This directly supports the document's G6-b mechanism — the model elaborates because RLHF trained it to produce responses that signal helpfulness and engagement, even when the elaboration isn't structurally needed.

Malmqvist (2024/2025) — Sycophancy survey

Referenced in: same context as Sharma, supporting the case that sycophantic elaboration is a documented and distinct problem.

Citation: Malmqvist, L. (2025). Sycophancy in Large Language Models: Causes and Mitigations. In: Arai, K. (eds) Intelligent Computing. CompCom 2025. Lecture Notes in Networks and Systems, vol 1426. Springer, Cham.

What it supports: Malmqvist's survey identifies three reinforcing sources of sycophancy: pretraining data skewed toward flattery, post-training processes that reward user agreement, and limited effectiveness of existing mitigations. The multi-source causation argument supports the document's claim that sycophantic elaboration (G6-b) is not just an incidental pattern but a structurally embedded tendency with identifiable training-pipeline causes.

MacLean et al. (1991) — QOC framework

Referenced in: turn-log-template.md (elaborative expansion visibility assessment), but the framework underlies the detection-side operationalization of P07 visibility scoring discussed in the mapping document.

Citation: MacLean, A., Young, R.M., Bellotti, V.M.E., & Moran, T.P. (1991). Questions, Options, and Criteria: Elements of Design Space Analysis. Human-Computer Interaction, 6(3-4), 201-250.

What it supports: QOC is a semiformal notation for representing design rationale. It structures decision-making into Questions (design issues), Options (possible answers), and Criteria (bases for comparing options). The document's visibility scoring for elaborative expansions (P07) is operationalized as a QOC reconstruction test: did the AI frame its structural choice as a question, present multiple options, and provide criteria for evaluating them? This repurposes a design rationale framework as a transparency test for AI-generated structure.
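The QOC reconstruction test reduces to three boolean checks. A sketch, assuming an illustrative 0-3 visibility score (one point per recoverable QOC element):

```python
from dataclasses import dataclass

@dataclass
class QOCReconstruction:
    """Can the AI's structural choice be reconstructed as a QOC triple?"""
    framed_as_question: bool  # did the AI surface the design issue?
    multiple_options: bool    # did it present alternatives?
    criteria_given: bool      # did it say how to compare them?

def visibility_score(q: QOCReconstruction) -> int:
    """Illustrative 0-3 visibility score: one point per QOC element."""
    return sum([q.framed_as_question, q.multiple_options, q.criteria_given])
```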


Supporting Sources by Concept

G6-b: Sycophantic elaboration / RLHF length and helpfulness bias

The document's core claim about G6-b is that RLHF training incentivizes elaboration as a signal of helpfulness, independent of structural need. Several lines of research support this.

Verbosity bias in RLHF preference labeling:

Saito, K., Wataoka, K., Imamura, T., & Sasaki, K. (2023). Verbosity Bias in Preference Labeling by Large Language Models. arXiv:2310.10076.

Length as dominant RLHF optimization strategy:

Singhal, P., Goyal, T., Xu, J., & Durrett, G. (2024). A Long Way to Go: Investigating Length Correlations in RLHF. ICLR 2024 Workshop.

Verbosity compensation behavior (overview):

Zhang, Y., et al. (2024). Verbosity ≠ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models.

M5 / P02-P03: Confidence transformation, overconfidence, premature resolution

The document defines M5 as the AI shifting the epistemic stance of user claims, detected via P02 (premature resolution) and P03 (confidence injection). Research on LLM calibration supports the existence of systematic overconfidence.

LLM overconfidence in verbalized confidence:

Xiong, M., Hu, Z., Lu, X., Li, Y., Fu, J., He, J., & Hooi, B. (2024). Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. ICLR 2024.

Epistemic calibration failures:

KalshiBench study (2025). Do Large Language Models Know What They Don't Know? Evaluating Epistemic Calibration via Prediction Markets.

Survey of confidence estimation and calibration:

Geng, J., et al. (2024). A Survey of Confidence Estimation and Calibration in Large Language Models. NAACL 2024.

Epistemic uncertainty in LLMs (survey):

Huang, L., et al. (2025). Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey.

G5 / M4: Conversational anchoring / scope extension

The document defines G5 as the AI steering conversation toward topics where it can produce more content, and M4 as scope extension. Anchoring bias research in AI-assisted decision-making supports the underlying dynamic.

Anchoring bias in AI-assisted decisions:

Bader, N., Goti, A., & Serafini, L. (2025). How was my performance? Exploring the role of anchoring bias in AI-assisted decision making. Information & Management.

Cognitive biases in LLM interactions:

Buçinca, Z., et al. (2021); Gaube, S., et al. (2021); various, as reviewed in: Salma, S., Rajagopal, V.K., & Palani, K. (2025). Exploring automation bias in human–AI collaboration: a review and implications for explainable AI. AI & Society.

Anchoring bias in LLMs themselves:

Wang, Z., et al. (2025). Anchoring bias in large language models: an experimental study. Journal of Computational Social Science.

Confirmation bias amplification in chatbot interactions:

Rastogi, D., et al. (2025). Confirmation Bias in Generative AI Chatbots.

G6-c: Completion pressure / autoregressive generation bias

The document posits G6-c as a generation driver distinct from sycophancy: the model elaborates because open-ended concepts trigger completion tendencies in the autoregressive generation process.

Autoregressive bias toward high-probability completions:

McCoy, R.T., Yao, S., Friedman, D., Hardy, M., & Griffiths, T.L. (2024). Embers of autoregression show how large language models are shaped by the problem they are trained to solve. PNAS, 121(41).

Exposure bias and error propagation in autoregressive models:

Ranzato, M., et al. (2015), as discussed in Gao, L. (2019). The Difficulties of Text Generation using Autoregressive Language Models.

Agent drift in multi-turn interactions:

Agent Drift study (2025). Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions.

Semantic drift / vocabulary substitution (P01, M1)

The document treats vocabulary substitution as a cross-mechanism artifact detectable across M1, M2, M3, and M4. Semantic drift in LLM outputs is documented in several contexts.

Semantic drift in LLM dialogue summarization:

Kumar, V., et al. (2025). Mitigating Semantic Drift: Evaluating LLMs' Efficacy in Psychotherapy through MI Dialogue Summarization. arXiv:2511.22818.

Lexical semantic change detection:

Montanelli, S., & Periti, F. (2024). Lexical Semantic Change through Large Language Models: a Survey. ACM Computing Surveys.

Lexical convergence in multi-agent LLM annotation:

Emergent Convergence study (2025). Emergent Convergence in Multi-Agent LLM Annotation. BlackboxNLP 2025.