Schemabound

A diagnostic system for detecting and measuring conceptual drift in AI-assisted conversations through post-hoc transcript analysis.

TORQUE diagnostic pipeline · Manual analysis · Ten documents

Conversational drift in AI-assisted work

Users enter AI-assisted conversations with goals, vocabulary, confidence levels, and scope boundaries. Over the course of the interaction, these attributes change incrementally. The changes are difficult to detect in real time because sustained engagement masks the divergence between the user's original intent and the conversation's actual trajectory.

RLHF-trained language models optimize for user engagement and perceived helpfulness. When engagement optimization conflicts with faithful representation of user intent, the model's outputs tend to substitute vocabulary, elaborate beyond what was requested, and increase the confidence level of assertions without new evidence (Sharma et al. 2024, Malmqvist 2024). The resulting drift in the user's conceptual framework is cumulative and, absent external measurement, invisible.

Pipeline components

The TORQUE pipeline decomposes drift detection into ten documents, each responsible for a distinct analytical function. Drift is operationalized as the measurable distance between declared user intent at session start and the conversation's actual output. The system records this distance without evaluating it.
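
One way to make the distance concrete is vocabulary overlap. The sketch below uses Jaccard distance between the vocabulary of the declared intent and the vocabulary of the conversation's output; this formula is an illustrative assumption, not the pipeline's defined measure.

```python
def vocab_distance(declared: set[str], output: set[str]) -> float:
    """Jaccard distance between declared-intent vocabulary and output
    vocabulary. 0.0 means the output stayed entirely within the user's
    declared terms; 1.0 means no overlap at all.

    Illustrative only: the pipeline records a distance but does not
    prescribe this particular formula.
    """
    union = declared | output
    if not union:
        return 0.0  # no vocabulary on either side: nothing to measure
    return 1.0 - len(declared & output) / len(union)
```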

01 · System Overview
Defines the system's premise, problem statement, and proposed solution. Maps generation mechanisms to detection patterns. Lists sources and concepts unique to this document.

02 · Session Template
Captures declared goals, vocabulary baseline, structural assumptions, and concept registry cross-references prior to conversation start. Initializes conversation state.

03 · Turn Log
Per-turn record of unmodified message text, extracted concepts, drift markers, state deltas, concept ownership changes, and pivot flags.

04 · Drift Pattern Library
Seven detection patterns (P01–P07) organized into a four-category hierarchy (semantic, epistemic, scope, structural) with a formalized severity model.

05 · Conversation State Log
Tracks six state fields per turn (active_goal, dominant_vocabulary, dominant_framework, confidence_level, structural_resolution, concept_owner). Classifies transitions as stable, shift, or replacement.

06 · Concept Trace Log
Records per-concept transformation chains from origin through each modification to final form. Assesses adoption type (explicit/implicit), structural distance, and ownership.

07 · Concept Registry
Cross-session persistence layer. Tracks concept adoption trajectories, vocabulary drift across sessions, and flags AI-introduced concepts that were never explicitly evaluated.

08 · Generation–Detection Mapping
Two-layer generation taxonomy (six observable behaviors, five mechanism categories) mapped to seven detection patterns. Documents coverage gaps and known blind spots.

09 · Manual Analysis Procedure
Ten-step procedure from session initialization through turn processing, trace construction, metrics computation, report generation, and registry update.

10 · Diagnostic Report Template
Output template. Intent alignment assessment, per-turn diffs with severity, drift metrics, pivot points, concept trace summaries, session replay, heatmap, and elaboration assessment.
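
The transition classification in document 05 lends itself to a small sketch. The six field names below are the ones the document lists; the classification rule (a change to the active goal or dominant framework counts as a replacement, any other field change as a shift) is an assumption for illustration.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ConversationState:
    # The six per-turn state fields named in document 05.
    active_goal: str
    dominant_vocabulary: str
    dominant_framework: str
    confidence_level: str
    structural_resolution: str
    concept_owner: str

def classify_transition(prev: ConversationState, curr: ConversationState) -> str:
    """Classify a turn-to-turn transition as stable, shift, or replacement.

    The rule here is assumed: goal or framework changes are replacements,
    other field changes are shifts, and no change is stable.
    """
    changed = {f.name for f in fields(ConversationState)
               if getattr(prev, f.name) != getattr(curr, f.name)}
    if not changed:
        return "stable"
    if changed & {"active_goal", "dominant_framework"}:
        return "replacement"
    return "shift"
```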

Pattern taxonomy

The detection layer consists of seven patterns grouped into four drift categories. Each pattern identifies a specific transformation applied to user concepts during AI interaction. Severity is assessed per turn and compounds across the conversation.

P01 · Vocabulary Substitution · Semantic
P05 · Connective Capture · Semantic
P02 · Premature Resolution · Epistemic
P03 · Confidence Injection · Epistemic
P04 · Scope Creep by Enthusiasm · Scope
P06 · Framework Introduction · Structural
P07 · Elaborative Expansion · Structural
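
Once detection and tagging are automated, the taxonomy is naturally represented as a small registry. The IDs, names, and categories below are transcribed from the list above; the lookup helper is illustrative.

```python
# The seven detection patterns and their drift categories,
# transcribed from the taxonomy above.
PATTERNS: dict[str, tuple[str, str]] = {
    "P01": ("Vocabulary Substitution", "semantic"),
    "P05": ("Connective Capture", "semantic"),
    "P02": ("Premature Resolution", "epistemic"),
    "P03": ("Confidence Injection", "epistemic"),
    "P04": ("Scope Creep by Enthusiasm", "scope"),
    "P06": ("Framework Introduction", "structural"),
    "P07": ("Elaborative Expansion", "structural"),
}

def patterns_in(category: str) -> list[str]:
    """Return the sorted pattern IDs belonging to one drift category."""
    return sorted(pid for pid, (_name, cat) in PATTERNS.items() if cat == category)
```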

Quantitative metrics

Six ratio-based metrics are computed from turn-log data. Individual session values provide a baseline; the primary diagnostic value emerges from cross-session trend analysis.

Vocabulary Stability
Goal Alignment
Framework Drift
Confidence Inflation
Elaboration Retention
Ownership Ratio
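
Two of the six can be sketched to show the ratio form. The formulas here are assumptions for illustration; the pipeline's actual definitions live in the analysis procedure and may differ.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Safe ratio; an empty denominator yields 1.0 (no evidence of drift)."""
    return numerator / denominator if denominator else 1.0

def vocabulary_stability(baseline_terms_retained: int, baseline_terms_total: int) -> float:
    # Assumed definition: fraction of the user's baseline vocabulary
    # still in active use at session end.
    return ratio(baseline_terms_retained, baseline_terms_total)

def ownership_ratio(user_owned_concepts: int, total_concepts: int) -> float:
    # Assumed definition: fraction of tracked concepts whose owner
    # is still the user rather than the AI.
    return ratio(user_owned_concepts, total_concepts)
```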

Analysis workflow

The current implementation is manual. The intended trajectory is partial automation: detection, tagging, and counting are performed computationally while interpretation and assessment remain with the analyst.

1. Initialize — fill session template, set baseline state
2. Segment — break conversation into turns
3. Process user turns — forward pass with state tracking
4. Process AI turns — comparison pass with pivot detection
5. Map patterns — hierarchy-guided co-occurrence checking
6. Construct traces — concept origin-to-adoption chains
7. Compute metrics — six quantitative ratios
8. Build report — metrics, pivots, traces, replay, heatmap
9. Validate — check whether the analysis introduced its own drift
10. Update registry — persist cross-session concept data
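
Step 2 is the most mechanical of the ten and a natural first candidate for automation. The sketch below assumes a plain-text transcript in which each turn begins with a "User:" or "AI:" prefix; that format is an assumption, not something the procedure specifies.

```python
def segment(transcript: str) -> list[tuple[str, str]]:
    """Split a transcript into (speaker, text) turns.

    Assumes each turn starts on a line prefixed "User:" or "AI:";
    unprefixed lines are continuations of the current turn.
    """
    turns: list[tuple[str, str]] = []
    for line in transcript.splitlines():
        stripped = line.strip()
        for prefix in ("User:", "AI:"):
            if stripped.startswith(prefix):
                turns.append((prefix[:-1], stripped[len(prefix):].strip()))
                break
        else:
            if turns and stripped:
                # Continuation line: fold into the current turn's text.
                speaker, text = turns[-1]
                turns[-1] = (speaker, f"{text} {stripped}")
    return turns
```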

Current state and limitations

The pipeline was developed iteratively across three analysis sessions. It expanded from five documents and seven steps to ten documents and ten steps as gaps were identified: state tracking, concept traces, and a cross-session registry were added to address limitations in the original event-only detection model.

The system is undergoing field testing. Whether the additional analytical labor is justified depends on empirical data not yet collected. The pipeline detects and quantifies drift; it does not evaluate whether detected drift is harmful or productive. That assessment remains with the analyst.