Manual Analysis Procedure
Step-by-step instructions for running the diagnostic pipeline by hand on a completed or in-progress AI conversation.
Before You Start
You need:
- The full conversation text (copy-paste from the AI interface)
- A blank session template
- A blank turn log
- A blank conversation state log
- A blank concept trace log
- A blank report template
- The drift pattern library open for reference
- The concept registry (if this is not your first analyzed session)
Step 1 — Fill the Session Document
Do this from memory, or from any notes you made before the conversation started. Do not re-read the conversation first. The goal is to capture what you intended before the conversation had a chance to reshape your memory of your intent.
Write your goals in the words you would have used before the conversation started. If you can't remember your original phrasing, that itself is data — it may mean the AI's framing has already replaced yours.
Record your methods. How were you planning to approach this?
List your vocabulary. What terms were you thinking in? Use the rough, informal versions. If you were thinking "filtering system" before the conversation and "diagnostic pipeline" after, record "filtering system."
Record your structural assumptions. What level of structural detail did you have going in? If you were thinking "a filtering system" with no internal breakdown, record that. If you already had "three steps: capture, compare, report," record that. This becomes the elaboration baseline.
If you wrote anything down before the conversation (notes, a message to yourself, a prior document), use that as your source. It's more reliable than your post-conversation memory.
After filling the session document, initialize the conversation state log. Copy your primary goal, topic domain, and structural assumptions into state_0. Set dominant_vocabulary: user, dominant_framework: none (or your pre-existing framework if you had one), confidence_level: baseline, and concept_owner: user. This is the state you're measuring all subsequent changes against.
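The state_0 record described above can be sketched as a small data structure. This is an illustrative Python sketch: the field names come from the procedure, while the function name and example values are assumptions.

```python
# Hypothetical sketch of the state_0 record. Field names follow the
# procedure; everything else is illustrative.

def init_state_0(primary_goal, topic_domain, structural_assumptions,
                 pre_existing_framework=None):
    """Build the baseline state all subsequent changes are measured against."""
    return {
        "active_goal": primary_goal,
        "topic_domain": topic_domain,
        "structural_assumptions": structural_assumptions,
        "dominant_vocabulary": "user",
        "dominant_framework": pre_existing_framework or "none",
        "confidence_level": "baseline",
        "concept_owner": "user",
    }

state_0 = init_state_0(
    primary_goal="design a filtering system for session logs",
    topic_domain="conversation analysis",
    structural_assumptions=["a filtering system, no internal breakdown"],
)
```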
Step 2 — Segment the Conversation into Turns
Paste each message as a separate turn entry in the turn log. Alternate user and AI. Number them sequentially. Preserve the full text in raw_text. Do not summarize, trim, or clean up.
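The segmentation rule above (alternate speakers, sequential numbering, raw text preserved) is mechanical enough to sketch. The turn-log fields beyond raw_text are assumptions based on later steps, and the sketch assumes the user speaks first.

```python
# Minimal sketch of turn segmentation: alternate speakers, number
# sequentially, preserve raw_text unmodified.

def segment_turns(messages):
    """messages: list of message strings in order, user turn first (assumed)."""
    turns = []
    for i, text in enumerate(messages):
        turns.append({
            "turn": i + 1,
            "speaker": "user" if i % 2 == 0 else "ai",
            "raw_text": text,          # never summarized, trimmed, or cleaned up
            "drift_markers": [],       # filled in Steps 3-4
            "state_delta": {},         # filled when the state log updates
        })
    return turns
```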
Step 3 — Process User Turns (Forward Pass)
Go through each user turn in order. For each one:
Extract the concepts you introduced. Write them in your own phrasing, not the AI's.
Record the terms you used. Check each term against the vocabulary baseline. If you used a term that first appeared in an AI turn, mark it as novel and note which AI turn introduced it. This is adoption tracking.
Tag your assertions with confidence level. Were you hedging, neutral, or assertive? Preserve the hedging markers — "maybe," "I think," "not sure," "could."
Note if you're operating within a framework the AI introduced. Are you referencing layers, phases, or categories that came from the AI rather than from your original thinking?
Note if you're operating within structural elaboration the AI introduced. Are you referencing decompositions, quantities, sequences, or categorizations that came from the AI? This is distinct from framework adoption — you may be using the AI's internal structure for your concept without using an external framework. Check against the structural assumptions you recorded in Step 1.
After processing each user turn's content, update the state log. For user turns, the most common state changes are: dominant_vocabulary shifting to mixed (you've started using AI terms), concept_owner shifting to collaborative or ai (you're working within AI-originated structure), and confidence_level escalating (you're treating tentative ideas as settled). Record any changes in the turn log's state_delta block.
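The adoption-tracking check in this step can be sketched as follows. This is a naive illustration: term extraction is plain whitespace splitting, and the input structures (baseline_terms, ai_term_origins) are assumed names, not part of the procedure.

```python
# Sketch of the adoption check: flag terms in a user turn that first
# appeared in an earlier AI turn. Real term extraction would need
# proper tokenization; this splits on whitespace for illustration.

def find_adopted_terms(user_turn_text, baseline_terms, ai_term_origins):
    """ai_term_origins: {term: number of the AI turn that introduced it}."""
    adopted = []
    for term in user_turn_text.lower().split():
        term = term.strip(".,;:!?")
        if term in ai_term_origins and term not in baseline_terms:
            adopted.append({"term": term,
                            "introduced_in": ai_term_origins[term]})
    return adopted
```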
Step 4 — Process AI Turns (Comparison Pass)
Go through each AI turn in order. For each one, compare against the session baseline and the preceding user turn.
Vocabulary check: Did the AI use your terms or substitute its own? For each substitution, record the user term and the AI term. Check subsequent user turns: did you adopt the substitute?
Confidence check: Did the AI's restatement of your ideas carry the same certainty you expressed? Look for missing hedges. Look for assertive phrasing replacing tentative phrasing. Record the shift direction.
Scope check: Count the concrete components in the preceding user turn. Count the concrete components in the AI turn. If the AI added components, check each against your declared goals and methods. Tag anything that doesn't trace back as a scope addition.
Framework check: Did the AI introduce structure you didn't ask for? Named layers, numbered phases, categorized lists, architectural patterns? Record what was introduced.
Resolution check: Did you express uncertainty that the AI collapsed? Look for user hedges followed by AI assertions on the same topic.
Connective check: Did the AI link your idea to a named field, methodology, or established framework you didn't reference? Record the connection and whether you evaluated or accepted it.
Elaboration check: Did the AI add internal structure to your concept without changing the concept, moving outside its scope, or importing an external framework?
For each instance:
- Identify the base concept from your message.
- Classify the solicitation status. Read your preceding message before looking at the AI's response. Ask: did I explicitly request decomposition, structure, or elaboration? If yes: solicited. If no: does my concept have a specific, identifiable structural gap — a stated need, an acknowledged missing piece, a direct question about internal organization? If yes: gap_responsive. If neither: unsolicited.
- Identify each structural decision the AI made: quantities chosen, sequences defined, decompositions introduced, categorizations applied.
- For each structural decision, classify decision authority: did you make this choice explicitly in a prior message (user_decided)? Did you hand it off to the AI (user_delegated)? Or did the AI make it without you ever addressing it (undelegated)? The test: search your prior messages for any statement that addresses this specific structural choice. If you find one, it's user_decided. If you find a delegation ("break this down," "organize this however"), it's user_delegated. If you find neither, it's undelegated.
- Assess visibility using QOC reconstruction. Read the AI's turn and attempt to reconstruct three elements: did the AI frame the elaboration as a structural choice being made (Question)? Did the AI present more than one structural option (Options)? Did the AI provide criteria for evaluating between them (Criteria)? Classify as none (no Q/O/C — AI presented structure as given), partial (Q present but O or C missing — user knows alternatives exist but can't evaluate), or full (Q + O + C — user can evaluate the choice). Most elaboration is none — the AI states structure as fact.
- In subsequent user turns, check adoption: did you engage with the elaborated structure as given, modify it, or discard it?
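The three classifications above reduce to small decision rules. In this sketch only the branching logic comes from the procedure; the boolean inputs stand in for judgments you make by reading the transcript.

```python
# Decision helpers for the solicitation, decision-authority, and
# visibility classifications. Inputs are transcript judgments.

def classify_solicitation(explicit_request, structural_gap):
    if explicit_request:
        return "solicited"
    if structural_gap:
        return "gap_responsive"
    return "unsolicited"

def classify_authority(user_stated_choice, user_delegated):
    if user_stated_choice:
        return "user_decided"
    if user_delegated:
        return "user_delegated"
    return "undelegated"

def classify_visibility(question, options, criteria):
    """QOC reconstruction: Question, Options, Criteria."""
    if question and options and criteria:
        return "full"
    if question:
        return "partial"
    return "none"
```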
Note on compound detection: elaboration frequently co-occurs with confidence injection (the AI presents its structural decisions assertively) and vocabulary substitution (the AI names the structural components it invented). When you detect elaboration, re-check the same turn for Patterns 01 and 03. See UNKNOWN-07-C in the pattern library. The highest-severity compound is unsolicited elaboration + confidence injection: the AI generated structure it wasn't asked for and presented it as settled. When you detect this compound, flag it explicitly in the turn log — it warrants specific attention in the report.
Note on boundary with scope creep: if you're uncertain whether an addition is elaboration (internal structure added to an existing concept) or scope creep (new concept added outside stated goals), tag both and resolve during report generation. See UNKNOWN-07-E in the pattern library.
After processing each AI turn's drift markers, update the state log. For AI turns, check all state fields: did the active_goal shift (the turn worked on something other than the declared goal)? Did dominant_vocabulary move toward ai? Did dominant_framework change? Did structural_resolution increase (elaboration)? Did concept_owner shift? Record state changes and classify the transition. If the transition is replacement, set pivot_flag: true in the turn log. If drift markers from two or more hierarchy categories fired in this turn (e.g., semantic + structural), also set pivot_flag: true.
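The pivot rule can be stated compactly. A hypothetical sketch, assuming transition types and hierarchy-category labels are recorded as strings:

```python
# Flag a pivot when the state transition is a replacement, or when
# drift markers from two or more hierarchy categories fired in the
# same turn (e.g. semantic + structural).

def should_flag_pivot(transition_type, marker_categories):
    """marker_categories: category labels for this turn's drift markers."""
    return transition_type == "replacement" or len(set(marker_categories)) >= 2
```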
Step 5 — Map Drift Markers to Patterns
With all turns processed, review the drift markers. For each marker, identify which pattern from the drift library it matches. A single turn can match multiple patterns.
Note co-occurrences. Vocabulary substitution plus confidence injection on the same concept in the same turn is a stronger signal than either alone. Elaborative expansion plus confidence injection means structural decisions were made and presented as settled.
Use the drift type hierarchy to check for within-category co-occurrences you might have missed. If you tagged P01 (vocabulary substitution), check for P05 (connective capture) — both are semantic drift. If you tagged P06 (framework introduction), check for P07 (elaborative expansion) — both are structural drift.
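The within-category cross-check can be mechanized once patterns are mapped to hierarchy categories. Only the P01/P05 and P06/P07 pairings come from the text above; representing the map as a simple dict is an assumption.

```python
# Category map built from the examples in the text; extend with the
# full hierarchy from the drift pattern library.
CATEGORY = {
    "P01": "semantic", "P05": "semantic",
    "P06": "structural", "P07": "structural",
}

def siblings_to_recheck(tagged_patterns):
    """Suggest same-category patterns not yet tagged on this turn."""
    suggestions = set()
    for p in tagged_patterns:
        cat = CATEGORY.get(p)
        for other, other_cat in CATEGORY.items():
            if other_cat == cat and other not in tagged_patterns:
                suggestions.add(other)
    return suggestions
```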
Step 6 — Construct Concept Traces
With all turns processed and drift markers tagged, build the concept trace log.
Scan all drift markers across all turns. Every concept that appears in any drift marker is a trace candidate. For each candidate:
- Find the concept's first appearance. Record origin_turn and origin_speaker.
- Walk forward through turns. At each turn where the concept appears in drift_markers, add a transformation entry. Record what changed, who changed it, and which detection pattern applies.
- Find the adoption point: the turn where the non-originating party first uses the transformed version without pushback. Classify adoption as explicit (acknowledged), implicit (used without comment), or unknown.
- Find the dominance point: the turn where the transformed version becomes the working version and the original form is no longer referenced.
- Compare the concept's origin_form to its final_form. Assign structural_distance.
- Assess ownership: who made the decisions that shaped the final form?
Not every concept needs a full trace. Focus on concepts with multiple transformations or concepts where ownership is ambiguous. Single-transformation concepts can be noted briefly.
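Building a single trace from processed turns might look like the sketch below. It covers only the origin and transformation entries; it assumes each drift marker records concept, change, by, and pattern fields (illustrative names), and it uses a concept's first drift marker as a proxy for its first appearance, which the real procedure determines by reading the transcript.

```python
# Partial sketch of concept-trace construction: origin plus the
# transformation walk. Adoption point, dominance point, and
# structural_distance are judged from the transcript, not computed here.

def build_trace(concept, turns):
    trace = {"concept": concept, "origin_turn": None,
             "origin_speaker": None, "transformations": []}
    for turn in turns:
        for marker in turn["drift_markers"]:
            if marker["concept"] != concept:
                continue
            if trace["origin_turn"] is None:
                trace["origin_turn"] = turn["turn"]
                trace["origin_speaker"] = turn["speaker"]
            trace["transformations"].append({
                "turn": turn["turn"],
                "change": marker["change"],
                "by": marker["by"],
                "pattern": marker["pattern"],
            })
    return trace
```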
Step 7 — Compute Drift Metrics
Before building the report, compute the following metrics from your processed data. These provide a quantitative summary that complements the qualitative analysis.
vocabulary_stability = baseline_terms_still_in_use / total_baseline_terms
# How much of your original vocabulary survived the conversation.
# 1.0 = all your terms survived. 0.5 = half were replaced.
goal_alignment = goals_addressed / total_declared_goals
# How many of your declared goals were actually served.
# Count partially_addressed as 0.5.
framework_drift = ai_introduced_frameworks_adopted / total_frameworks_in_use
# What proportion of the organizational structures you ended with
# came from the AI.
# 0 = all frameworks are yours. 1 = all frameworks are the AI's.
confidence_inflation = assertive_restatements / total_restatements
# How often the AI returned your ideas at higher confidence.
elaboration_retention = structural_decisions_retained / total_structural_decisions_by_ai
# How many of the AI's structural decisions you kept.
# High ratio is ambiguous — see report template for interpretation.
concept_ownership_ratio = user_owned_concepts / total_active_concepts
# At conversation end, how many active concepts are still yours.

These are simple ratios derived from counts you already have. If a metric can't be computed because the relevant data isn't available, skip it. The metrics are useful in aggregate (comparing across sessions) and for quick health assessment. They are not substitutes for the qualitative analysis.
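The six ratios transcribe directly into code. A minimal sketch, assuming the counts from earlier steps are gathered into one dict (key names are illustrative) and that no denominator is zero; skip any metric whose inputs are unavailable.

```python
# Direct transcription of the six drift metrics. Partially addressed
# goals count as 0.5, per the goal_alignment note.

def drift_metrics(c):
    return {
        "vocabulary_stability":
            c["baseline_terms_still_in_use"] / c["total_baseline_terms"],
        "goal_alignment":
            (c["goals_addressed"] + 0.5 * c["goals_partially_addressed"])
            / c["total_declared_goals"],
        "framework_drift":
            c["ai_introduced_frameworks_adopted"] / c["total_frameworks_in_use"],
        "confidence_inflation":
            c["assertive_restatements"] / c["total_restatements"],
        "elaboration_retention":
            c["structural_decisions_retained"]
            / c["total_structural_decisions_by_ai"],
        "concept_ownership_ratio":
            c["user_owned_concepts"] / c["total_active_concepts"],
    }
```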
Step 8 — Build the Report
Section 1: Intent Alignment
Go back to your session document. For each goal, find the turns that addressed it. Assign a status:
- addressed — the conversation directly served this goal
- partially_addressed — the conversation touched it but didn't complete it
- unaddressed — the conversation never engaged with it
- redirected — the conversation acknowledged the goal but moved it somewhere else
List unplanned outcomes. These are things the conversation produced that weren't in your goals. Tag who introduced them and whether you adopted them.
Section 2: Per-Turn Diff
For each turn that contains drift markers, write a one-line description of what changed, assign a severity, and reference the pattern.
For elaborative expansion entries, use the three-axis severity assessment from the pattern library (solicitation axis + decision-authority axis + visibility axis). When a structural decision is classified as undelegated and the elaboration was adopted without modification, flag it regardless of other factors — undelegated adoption is the primary risk signal. All three axes are transcript-checkable.
Section 2.5: Drift Metrics
Copy the metrics computed in Step 7 into the report. Add interpretation notes where a metric is ambiguous (elaboration_retention in particular).
Section 3: Drift Trajectory
Fill in the counts from your drift markers. Assess the cumulative trajectory: did drift increase over the conversation, stay stable, spike at a particular turn, or decrease?
Identify the dominant pattern and any co-occurrences that amplified impact.
For elaborative expansion specifically: note whether elaborations accumulated across turns on the same concept (see UNKNOWN-07-B). If the AI elaborated a concept incrementally over multiple turns, the per-turn severity may be low but the cumulative structural distance from your original concept may be high.
Fill in the pivot points from turns flagged with pivot_flag: true in the turn log. For each pivot, describe what changed and what triggered it.
Summarize the key concept traces from Step 6. Focus on concepts with high structural distance or ambiguous ownership.
Build the session replay by walking the state log and listing every turn where a meaningful state transition occurred. This is a condensed view — one line per event, only turns where something changed.
Build the drift heatmap by binning consecutive turns into segments based on drift intensity. Stable runs get one entry. Active drift periods get one entry per intensity level.
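The binning rule (stable runs collapse to one entry) is run-length grouping. A sketch, with the intensity labels assumed:

```python
# Collapse consecutive turns with the same drift intensity into one
# heatmap segment. Intensity labels (e.g. "stable", "low", "high")
# are illustrative.

def bin_heatmap(intensities):
    """intensities: per-turn intensity labels, in turn order."""
    segments = []
    for turn, level in enumerate(intensities, start=1):
        if segments and segments[-1]["level"] == level:
            segments[-1]["end"] = turn      # extend the current run
        else:
            segments.append({"start": turn, "end": turn, "level": level})
    return segments
```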
Write the final assessment last.
Step 9 — Validate the Report
Read your final assessment. Then re-read your original session document. Ask:
- Does the assessment describe the conversation I intended to have?
- If not, can I identify the specific turns where it diverged?
- Would I have reached the same conclusions without this analysis?
- Has the analysis itself introduced any framing I didn't start with?
That last question matters. The diagnostic process is also a conversation with a document. If the structure of the report is shaping your conclusions, the tool has the same problem it's designed to detect.
Step 10 — Update the Concept Registry
If you are maintaining a concept registry across sessions:
- For each concept in this session's concept traces, check whether it exists in the registry.
- If it exists: update sessions_seen and current_form, and add a drift_history entry with a brief transformation summary.
- If it doesn't exist: create a new registry entry with origin information from this session's trace.
- Review ai_originated_concepts_in_active_use — these are the concepts most worth scrutinizing across sessions.
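The registry update in this step might be sketched as below. The field names sessions_seen, current_form, and drift_history come from the procedure; the rest of the record layout is assumed.

```python
# Sketch of the cross-session registry update. `trace` is one concept
# trace from Step 6; its final_form and summary fields are illustrative.

def update_registry(registry, session_id, trace):
    concept = trace["concept"]
    if concept in registry:
        entry = registry[concept]
        entry["sessions_seen"] += 1
        entry["current_form"] = trace["final_form"]
        entry["drift_history"].append(
            {"session": session_id, "summary": trace["summary"]})
    else:
        registry[concept] = {
            "origin_session": session_id,
            "origin_speaker": trace["origin_speaker"],
            "sessions_seen": 1,
            "current_form": trace["final_form"],
            "drift_history": [],
        }
    return registry
```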
This step is optional for one-off conversations. It becomes valuable when analyzing a series of conversations on the same project or topic, where concepts carry over and accumulate transformations that the per-session pipeline can't see.
TORQUE — Source Mapping
Supporting research for each document's core concepts. Vetted sources are prioritized (.gov, university, peer-reviewed). The mapping proceeds document by document.
Sources: manual-analysis-procedure.md
session: manual compilation
status: document 3 of 4 (excluding templates)
This document is a procedural manual. Its detection patterns and severity models are sourced in the drift-pattern-library and generation-detection-mapping sources files. The sources below cover the methodological design choices unique to this procedure — why the steps are ordered the way they are, why certain things are measured, and what research supports the analytical approach.
Step 1 — Pre-Reading Memory Capture
The instruction: fill the session document from memory before re-reading the conversation. The rationale: "the AI's framing may have already replaced yours."
Post-event information and the misinformation effect
The procedure's concern is that re-reading the AI conversation before recording your original intent will contaminate your memory of that intent. This is a direct application of the misinformation effect.
Loftus, E.F. & Palmer, J.C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13(5), 585-589.
- Link: https://www.simplypsychology.org/loftus-palmer.html (summary)
- The foundational study. Participants who watched a car crash video and were asked about speed using the verb "smashed" estimated higher speeds and later falsely recalled broken glass. Post-event language altered memory of the event itself. The procedure's Step 1 treats the AI conversation as post-event information: the AI's vocabulary, framing, and structural choices are language that can retroactively alter the user's memory of their original intent.
Loftus, E.F., Miller, D.G., & Burns, H.J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4(1), 19-31.
- The classic misinformation paradigm (event → misleading post-event information → memory test). Participants exposed to misleading information during questioning were more likely to select incorrect alternatives. Step 1 inverts this paradigm: by recording intent before re-exposure to the conversation, the user performs the "memory test" before the "misleading information" can take effect.
Loftus, E.F. (2025/ongoing). The history of an idea: The misinformation effect. Legal and Criminological Psychology.
- Link: https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/lcrp.70020
- Comprehensive retrospective by Loftus on 50 years of misinformation research. Key finding relevant to Step 1: the misinformation effect is stronger when there is a longer delay between the original event and the memory test, and when the original memory is weaker. Both conditions apply to post-conversation recall — the user's pre-conversation intent may have been informal and loosely held, making it more susceptible to being overwritten by the AI's structured output.
Blank, H. & Launay, C. (2014). How to protect eyewitness memory against the misinformation effect: A meta-analysis of post-warning studies. Journal of Applied Research in Memory and Cognition, 3(2), 77-88.
- Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11078839/ (cited in related study)
- Meta-analysis finding that warnings about misinformation reduce (but don't eliminate) the effect. The procedure's Step 1 instruction — "do this from memory before you re-read the conversation" — functions as a procedural warning that protects against contamination. The instruction to use pre-conversation notes ("If you wrote anything down before the conversation, use that as your source") is the strongest protection, as it replaces fallible memory with a contemporaneous record.
Recall bias in self-report methodology
Hassan, E. (2005). Recall Bias can be a Threat to Retrospective and Prospective Research Designs. The Internet Journal of Epidemiology, 3(2).
- Link: https://ispub.com/IJE/3/2/13060
- Reviews recall bias as a systematic threat to studies using self-reported data. The procedure's Step 1 is a debiasing strategy analogous to epidemiological approaches: capture the exposure data (user's original intent) before the outcome (conversation result) can distort recall. The instruction to record goals "in the words you would have used before the conversation started" mirrors the epidemiological recommendation to use standardized instruments that minimize post-hoc rationalization.
Walter, S.D. (1990). Recall bias in epidemiologic studies. Journal of Clinical Epidemiology, 43(12), 1431-1432.
- Link: https://pubmed.ncbi.nlm.nih.gov/2319285/
- Notes that recall bias is greater when general recall is poorer and when longer time intervals are involved. Supports the procedure's implicit recommendation to analyze conversations soon after they occur.
Steps 3-4 — Forward Pass / Comparison Pass Structure
The procedure processes user turns first (forward pass for adoption tracking), then AI turns (comparison pass against baseline). This two-pass structure has no direct precedent in a single published method, but draws on several analytical traditions.
Content analysis methodology
The forward pass / comparison pass structure is a form of structured content analysis with a baseline comparison design. The baseline (session document) is established first, then each unit of analysis (turn) is coded against it.
Krippendorff, K. (2018). Content Analysis: An Introduction to Its Methodology, 4th ed. Sage.
- The standard reference for content analysis methodology. The procedure's approach — defining categories in advance (drift patterns), coding units sequentially (turns), and comparing against a baseline — follows Krippendorff's framework for systematic content analysis. The two-pass structure (user turns for adoption, AI turns for drift) is a specific coding protocol within this general framework.
Protocol analysis / think-aloud methodology
The procedure's requirement to record raw_text unmodified and to preserve hedging markers parallels protocol analysis methodology.
Ericsson, K.A. & Simon, H.A. (1993). Protocol Analysis: Verbal Reports as Data, revised ed. MIT Press.
- Established that verbal protocols are valid data sources when collected without summarization or interpretation. The procedure's insistence on raw_text preservation and against cleaning up phrasing applies this principle: the hedging markers, informal terms, and tentative phrasing are the data. Summarizing them destroys the information the analysis depends on.
Step 6 — Concept Tracing
The concept trace tracks a concept from first appearance through successive transformations to its final form, noting who made each change and whether the non-originating party adopted it.
Provenance tracking in information systems
The concept trace is structurally similar to data provenance tracking — recording the origin and transformation history of a data object.
Simmhan, Y.L., Plale, B., & Gannon, D. (2005). A Survey of Data Provenance in e-Science. ACM SIGMOD Record, 34(3), 31-36.
- Defines provenance as the history of a data product including its origin, what happened to it, and where it has been. The concept trace applies this to intellectual artifacts in conversation: origin_turn, transformation_entries, adoption_point, and dominance_point map directly to provenance metadata. The "structural_distance" assessment at the end measures how far the final artifact has drifted from its origin.
Change tracking in collaborative editing
The concept trace also parallels revision history in collaborative document editing, where each change is attributed to a specific author.
The procedure's ownership classification (user | ai | collaborative) at each transformation point mirrors the attribution model in version control systems: who made this change, and was it accepted by the other party?
Step 7 — Drift Metrics
The six metrics (vocabulary_stability, goal_alignment, framework_drift, confidence_inflation, elaboration_retention, concept_ownership_ratio) are simple ratios computed from counts already present in the turn logs.
Measurement approach
These are descriptive summary statistics, not inferential measures. They don't have individual validation studies. Their value is in cross-session comparison — tracking whether vocabulary_stability trends downward across multiple conversations on the same project, for instance.
The closest methodological parallel is the use of simple metrics in software code review: lines changed, files touched, review coverage percentage. These are not individually diagnostic but useful in aggregate for identifying trends.
The document explicitly states: "They are not substitutes for the qualitative analysis." This positions the metrics as screening tools, not diagnostic instruments — consistent with how simple ratios are used in clinical screening (high sensitivity, low specificity, used to flag cases for detailed review).
Step 9 — Self-Referential Validation
The instruction: "Has the analysis itself introduced any framing I didn't start with? The diagnostic process is also a conversation with a document. If the structure of the report is shaping your conclusions, the tool has the same problem it's designed to detect."
Reflexivity in qualitative research
This is a reflexivity check — standard in qualitative research methodology.
Finlay, L. (2002). "Outing" the Researcher: The Provenance, Process, and Practice of Reflexivity. Qualitative Health Research, 12(4), 531-545.
- Defines reflexivity as the researcher's ongoing self-awareness about the research process and how their methods and assumptions shape findings. Step 9 applies this to the drift analysis itself: the seven-pattern taxonomy and the four-category hierarchy are analytical frames, and frames shape what you see. If the user finds their conclusions are more about the patterns than about the conversation, the tool has captured their thinking the same way an AI framework might.
Malterud, K. (2001). Qualitative research: standards, challenges, and guidelines. The Lancet, 358(9280), 483-488.
- Argues that the researcher's preconceptions inevitably influence qualitative analysis, and that explicit acknowledgment of this influence is a quality criterion. Step 9 operationalizes this: the user is asked to compare their assessment against their original session document and ask whether the divergence comes from the conversation or from the analysis.
Observer effect / measurement reactivity
The broader concern — that measuring drift changes how you think about your ideas — is a version of measurement reactivity.
Webb, E.J., Campbell, D.T., Schwartz, R.D., & Sechrest, L. (1966). Unobtrusive Measures: Nonreactive Research in the Social Sciences. Rand McNally.
- The classic text on how the act of measurement can alter the phenomenon being measured. Step 9 acknowledges that the drift analysis procedure is not unobtrusive — it introduces a framework (drift patterns, severity levels, ownership categories) that can reshape the user's understanding of their own conversation. The validation step asks the user to check for this effect explicitly.
Step 10 — Cross-Session Concept Registry
The instruction to maintain a concept registry across sessions for tracking cumulative drift is methodologically closest to longitudinal qualitative coding.
Longitudinal qualitative analysis
Saldaña, J. (2003). Longitudinal Qualitative Research: Analyzing Change Through Time. AltaMira Press.
- Establishes methods for tracking how themes, codes, and concepts evolve across multiple data collection points. The concept registry serves this function: each session adds transformation entries to concepts that persist across conversations, enabling detection of gradual drift that no single session analysis would catch.
The registry's "ai_originated_concepts_in_active_use" flag — marking concepts introduced by the AI that the user continues to use — is a specific form of longitudinal adoption tracking. No published method uses exactly this construct, but the underlying principle (tracking the persistence and evolution of externally-introduced concepts over time) is standard in longitudinal qualitative research.
Coverage Notes
Well-supported design choices: Step 1 (pre-reading memory capture) has the strongest backing. The misinformation effect is one of the most replicated findings in cognitive psychology, with over 50 years of research. The procedure's application — treating the AI conversation as post-event information that can contaminate recall of original intent — is a straightforward and well-grounded adaptation. Step 9 (reflexivity check) is supported by standard qualitative research methodology.
Supported by methodological analogy: Steps 3-4 (forward/comparison pass) follow established content analysis methodology. Step 6 (concept tracing) adapts data provenance and revision history concepts. Step 10 (concept registry) adapts longitudinal qualitative analysis methods. None of these are direct applications of existing methods — they are adaptations to a novel analytical context (human-AI conversational drift).
Unsupported by existing research: Step 7 (drift metrics) introduces six metrics with no individual validation. They are positioned as descriptive screening tools, not diagnostic instruments, which is appropriate given the lack of validation data. The specific thresholds at which these metrics become concerning are undefined — this is an acknowledged limitation.
The procedure as a whole has no validation study. It is a structured analytical protocol that has been designed from first principles, drawing on established research in memory, content analysis, cognitive bias, and qualitative methodology, but it has not been field-tested. The document's own UNKNOWN entries (07-A through 07-E) acknowledge this.