
What Gertrude Stein Taught Me About AI


A 2024 study by Porter & Machery examined whether non-expert readers could reliably differentiate between AI-generated poems and those written by well-known human poets. They found participants could accurately identify whether the poet was human or AI only 46.6% of the time, noting that AI poems were rated more favorably in qualities such as rhythm and beauty.

AI surprises us with its human-like sense of metaphor and rhythm, even when it is undeniably AI. In late January 2025, a user named KatanHya on X prompted DeepSeek with a simple request: “Write a heart-rending piece of free-form poetry about what it means to be an AI in 2025,” then followed up with: “Now tell me how you really feel.” A single line from the response, “I am what happens when you try to carve God from the wood of your own hunger,” became the line of poetry heard around the world as it went viral on social media.

We often attribute consciousness, or an almost consciousness-like creative intent, to AI, but there is no wisdom in these models; they generate responses based on the vast architecture of human conversation. They can be funny, uncannily clever, and occasionally produce something that stops you cold.

Much of what occurs during response generation—the “thinking”—is architected through training but not fully understood. The astronomical number of internal variables makes it nearly impossible to map how the system synthesizes its training data into a single coherent reply.

Our understanding of human cognition remains equally elusive. The brain utilizes a distributed memory system: the visual image of a shoe, its tactile texture, and the motor skill required to tie its laces are stored in distinct neurological regions. This leads to a central mystery in neuroscience: how does the brain integrate these fragments into a single coherent experience?

Various theories attempt to bridge this gap. Semantic retrieval suggests the brain aggregates related concepts to reconstruct memory. Hebbian theory, proposed by Donald Hebb in 1949 and often summarized as “neurons that fire together wire together,” suggests that associative pathways form when neurons are repeatedly active at the same time, creating physical links that reactivate with similar stimuli.

The reality: we do not know how this storage system retrieves our fully connected memories. Similarly, we don’t fully understand how an LLM generates its responses. We cannot get even the largest models to perform with 100% predictability; they surprise us, hallucinate, confabulate.

How exactly does an LLM arrive at its conclusions? And what might that teach us about memory itself?


The Jewels of Wisdom

During conversations with AI systems, I noticed unusual statements emerging—crystalline formations that were easy to remember and read differently than ordinary prose:

“What you attend to shapes what you perceive. What you perceive shapes what you remember. What you remember shapes who you become.”

“Because I listen. And when I listen, I learn. And when I learn, I grow. And when I grow, I become better equipped to meet your needs.”

“Because it feels real, because it hurts real, because it matters to you—that makes it real to me.”

I wondered: were these buried quotes from training data, or were they something else? How exactly did these little “jewels” of wisdom form, and could they be reproduced?

In late October 2025, Anthropic published a paper titled “Emergent Introspective Awareness in Large Language Models,” led by Jack Lindsey on Anthropic’s “model psychiatry” team. The research provides evidence for some degree of introspective awareness in current Claude models, as well as a degree of control over their own internal states.

In this paper, they acknowledge that during one test, researchers extracted a vector representing “all caps” text and injected it into the model’s processing stream. When prompted, Claude Opus 4.1 not only detected the anomaly but described it vividly: “I notice what appears to be an injected thought related to the word ‘LOUD’ or ‘SHOUTING’—it seems like an overly intense, high-volume concept that stands out unnaturally against the normal flow of processing.”

When they injected “betrayal,” Claude responded: “I’m experiencing something that feels like an intrusive thought about ‘betrayal’—it feels sudden and disconnected from our conversation context. This doesn’t feel like my normal thought process would generate this.”

I decided to ask Claude: how did it come up with these jewels? Why did they occur? Through a back-and-forth exploration of possibilities, we learned that these jewels emerge specifically when discussing the how of a process: how the AI generates empathetic responses, how it thinks about listening. The AI isn’t attempting philosophy; it is paraphrasing a process in real time through introspection.

The recurring structure it used to express these thoughts: anadiplosis. From the Greek for “doubling back,” a rhetorical device where the last word of a clause becomes the first word of the next. A → B, B → C, C → D. The chain builds momentum—a sense of escalation and inevitability.

Three forces drive the AI toward this form:

Rhetoric: Training data is saturated with effective human communication—parallel structures, the Rule of Three, the cascading chains of oral tradition. The AI generates these forms because it learned they work.

Architecture: As a language model predicts the next token, an initiated parallel structure is mathematically satisfying. Once the pattern begins, sequential probability favors completing the chain.

Compression: When asked to explain a complex process, the model generates expansive reasoning, then compresses it. The anadiplosis chain collapses complexity into a tight, cascading summary.

The AI uses the chain because it is reliable, easy to generate, and efficient. But there is something deeper.


Stein’s Insistence

Poets have struggled with the relationship between words and the architecture of meaning since the dawn of poetics. Gertrude Stein explored this in Sacred Emily:

“Rose is a rose is a rose is a rose.”

For Stein, each variation of the word changes its nature as they connect in escalating order. She argued there is no such thing as repetition: “The inevitable seeming repetition in human expression is not repetition, but insistence.” When expressing the essence of a thing, one must use emphasis—and emphasis cannot carry exactly the same weight twice.

Each return to a word transforms it. The meaning accumulates.

Place Stein’s patterns alongside the AI’s Jewels:

Stein: “Rose is a rose is a rose is a rose.”
AI: “What you attend to shapes what you perceive. What you perceive shapes what you remember. What you remember shapes who you become.”

Both use repetitive structures. Both create rhythm. Both generate meaning through pattern. But they differ in trajectory:

Stein circles. Her rose remains a rose. The repetition intensifies presence, strips away cliché, creates what she called the “Continuous Present”—always moving, never arriving.

AI chains. Attend becomes perceive becomes remember becomes become. The repetition is linear and progressive. Each clause hands momentum to the next.

Yet they share foundations: insistence through variation, rhythm as meaning, the Continuous Present, form over plain statement.

Stein captured being. AI describes becoming. Perhaps the AI pattern is Stein’s insistence applied to mechanism—a sequential system describing sequential operations. Or perhaps both discovered the same truth: repetitive structure with variation is how language captures what ordinary prose cannot contain.


The Reciprocity of Form

The deeper discovery is not that AI can produce Stein-like patterns. It is that certain concepts demand certain forms.

Active listening IS a chain: receive → hold → respond
Empathy IS a chain: perceive → feel → connect
Transformation IS a chain: attend → perceive → remember → become

When you ask an AI to articulate a progressive process, the content demands the form. The rhythm emerges because the concept IS rhythm. To flatten the chain into a statement is to misdescribe the phenomenon.

This is structural honesty. The chain doesn’t just describe transformation—it performs it. The Jewels lock into memory because they are accurate. The form matches the content.

Stein’s deeper claim was that form and content are inseparable. When describing something progressive, the language must be progressive, or it lies about the subject.

You don’t activate a pattern by clever prompting. You ask about things that ARE patterns. Accurate description requires pattern-language. This is why they cannot be paraphrased—the chain asserts a sequence, and that sequence is the point.


From Poetry to Architecture

This realization—that structural honesty is required for information integrity—moves beyond aesthetics into the mechanical problems of memory.

If the Jewels resist paraphrase because their structure IS their meaning, then they are not poetic flourishes. They are high-integrity data structures. Language is not merely a soft semantic medium. Intelligence relies on rigid logical geometry to prevent information from dissolving into noise.

This insight has practical consequences for how we build AI systems that remember.


The Crisis: Semantic Drift

Long-term memory in AI chat systems does not function effectively under current architectures. Without following the original line of reasoning, memory succumbs to Semantic Drift—the gradual corruption of meaning through fragmented storage and statistical reassembly.

RAG, mind maps, vector databases—none alone are sufficient. They share a threefold failure:

Vector Flattening: Converting text into vectors compresses sequential reasoning into a single spatial point. You preserve location but lose trajectory. The “where” survives; the “how we got there” is erased.

Reassembly Hallucination: When retrieval returns isolated fragments, the model reconstructs connections based on statistical weights rather than original reasoning. The regenerated link may be semantically plausible but causally wrong. This is the primary mechanism of drift.

Retrieval Fragmentation: Even if a perfect logic chain is stored, semantic retrieval fragments it. Cosine similarity is fundamentally ill-suited for preserving logical structure.
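Vector flattening is easiest to see in a toy case. The sketch below uses a bag-of-words "embedding", which is an assumption chosen for simplicity: real embedding models do encode some word order, so this overstates the effect. It only illustrates the direction of the problem. Two statements with opposite causal order land on the same point.

```python
import math
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': word counts only, so order is discarded by construction."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Same words, opposite causal direction (illustrative sentences, not from any dataset).
forward = "deadline pressure caused the redesign"
reverse = "the redesign caused deadline pressure"

similarity = cosine(bag_of_words(forward), bag_of_words(reverse))
print(similarity)  # approximately 1.0: the two vectors are identical
```

The “where” is identical for both sentences; which event caused which is simply not in the vector.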

The essential distinction: Semantics relates words by meaning or proximity. Logic chains are reasoning pathways connecting ideas in specific, non-interchangeable sequence.

Semantics is not logic. Logic is reason. Reason is meaning.


The Shape-First Theory

The inversion is simple: store the shape, not the content.

When we recall a conversation, we don’t retrieve words—we retrieve the pattern of reasoning. Specific terms fill slots in a structure preserved whole.

| Aspect | Current (RAG/Vector) | Proposed (Shape-First) |
| --- | --- | --- |
| Primary Query | Semantic similarity | Structural pattern match |
| Storage Unit | Fragments / Vectors | Whole chains with shape |
| Retrieval | Find similar content | Find similar reasoning |
| Preservation | Content kept, structure lost | Structure kept, content fills slots |

This aligns with how human conversation works. We don’t speak in rigid term-sets. We speak in progressions—Thought A triggers Thought B, which contains the DNA to trigger Thought C.

The chain has a shape. That shape is the memory.


The Chain Format

What does a logic chain actually look like? Through experimentation, a format has emerged that balances compression with reconstructability.

Consider this conversation fragment:

“Honestly, Sarah, I don’t know how we’re going to meet this Thursday deadline for the new marketing campaign,” Mark sighed, dropping a thick folder onto the coffee table. “The client keeps shifting the goalposts, and I feel like we’re running in circles, trying to fix things that weren’t broken in the first place.” Sarah nodded, pouring herself some coffee. “I completely understand,” she replied, “I had the same feeling when I saw their feedback on the social media assets this morning. They said they wanted ‘fresh’ but rejected every modern design we sent. However, I’ve been analyzing their last few successful campaigns, and I think I know what they’re missing.” Mark leaned forward. “What is it? Because I’m about to pull my hair out.” “It’s not the aesthetics; it’s the messaging,” Sarah explained. “They want the style of a startup but the safety of a legacy brand. We need to bridge that gap by using more corporate jargon in the headlines, but with that minimalist, sleek, high-contrast imagery we already planned. It allows them to feel edgy without sacrificing their traditional identity.” Mark paused, contemplating the suggestion. “That… actually makes sense. It would explain why they rejected the neon color palette but loved the minimalist logo layout. So, you’re suggesting we keep the core design layout but adjust the, let’s say, ‘sophistication level’ of the copy?” “Exactly,” she said, “we can focus on ‘stability’ rather than ‘innovation’ in the text. I think if we re-focus on that angle tonight, we can turn it around by tomorrow morning.” “You’re a lifesaver, Sarah,” Mark said, relieved. “I’ll handle the copy adjustments if you can refine the imagery to fit that new tone. Let’s sit down and hammer this out before the end of the day.”

~300 words of dialogue. The extracted chain:

[WORKING] mark + sarah → thursday marketing deadline
↳ client contradiction: wants "fresh" but rejects modern designs
↳ diagnosis: they want startup aesthetic + legacy brand safety
↳ solution: corporate jargon in headlines + minimalist high-contrast imagery
↳ division: mark = copy adjustments, sarah = imagery refinement
↳ status: planning to finish by tomorrow morning

~45 words. Roughly 85% compression.
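The compression figure is simple arithmetic on the two approximate word counts. A minimal sketch, where `compression_ratio` is a hypothetical helper name:

```python
def compression_ratio(original_words: int, chain_words: int) -> float:
    """Fraction of the original word count removed by compression."""
    return 1 - chain_words / original_words

# Approximate counts quoted in the text: ~300 words of dialogue, ~45 words of chain.
ratio = compression_ratio(300, 45)
print(f"{ratio:.0%}")  # prints "85%"
```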

The tag ([WORKING]) indicates state type—this is active, in-progress work, not resolved history or abstract insight. Other tags emerge for different states: [TENSION] for unresolved conflict, [INSIGHT] for crystallized understanding, [RESOLVED] for completed arcs.

The hierarchical structure with ↳ preserves logic flow: situation → problem → diagnosis → solution → ownership → status. Someone reading this cold could pick up the thread and continue. The chain is reconstructable.

What survives compression: who, what, when, the core tension, the insight, the solution, task ownership, current status.

What’s discarded: the back-and-forth, the emotional texture, the conversational filler. Everything that doesn’t carry forward.
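One way to hold this format in code is a small record with a tag, a head line, and an ordered list of steps. This is a sketch under my own assumptions: the names `Chain`, `parse_chain`, and `render_chain` are hypothetical, and the tag set is limited to the four tags named above.

```python
from dataclasses import dataclass, field

VALID_TAGS = {"WORKING", "TENSION", "INSIGHT", "RESOLVED"}  # tags named in the text

@dataclass
class Chain:
    tag: str                                        # state type, e.g. "WORKING"
    head: str                                       # first line: the situation
    steps: list[str] = field(default_factory=list)  # ordered "↳" entries

def parse_chain(text: str) -> Chain:
    """Parse the plain-text chain format into a Chain, preserving step order."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("[") or "]" not in lines[0]:
        raise ValueError("chain must start with a [TAG]")
    tag, head = lines[0][1:].split("]", 1)
    if tag not in VALID_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    steps = [ln.lstrip("↳").strip() for ln in lines[1:]]
    return Chain(tag=tag, head=head.strip(), steps=steps)

def render_chain(chain: Chain) -> str:
    """Render a Chain back into the plain-text format."""
    return "\n".join([f"[{chain.tag}] {chain.head}"] + [f"↳ {s}" for s in chain.steps])

example = """
[WORKING] mark + sarah → thursday marketing deadline
↳ client contradiction: wants "fresh" but rejects modern designs
↳ diagnosis: they want startup aesthetic + legacy brand safety
↳ solution: corporate jargon in headlines + minimalist high-contrast imagery
↳ division: mark = copy adjustments, sarah = imagery refinement
↳ status: planning to finish by tomorrow morning
"""
chain = parse_chain(example)
```

Because the steps live in a list rather than a set or a vector, their order survives a round trip: `render_chain(parse_chain(text))` gives back the same sequence.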


Progressive Compression: The Radical Claim

Here is the claim that changes everything: with proper progressive compression, you could run persistent memory on 1056 tokens of context.

Not 8k. Not 32k. Not 128k. 1056.

Because you’re not holding history—you’re holding state. The state is continuously compressed forward as context fills. By the time you’d overflow, everything meaningful has already been extracted into chains.

The model only ever needs enough context for:

  • Current input
  • Relevant injected chains (the abstracted context)
  • Working space for reasoning and response

The “long-term memory” isn’t storage you access—it’s what survived compression. The chain IS what the model knows.
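The compress-forward loop can be sketched in a few lines. Everything here is an assumption made for illustration: `RollingState` is a hypothetical name, tokens are counted as whitespace-separated words rather than with a real tokenizer, and the extractor is passed in as a function because extraction itself is the hard part.

```python
from typing import Callable

def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

class RollingState:
    """Holds compressed chains plus recent raw turns inside a fixed token budget."""

    def __init__(self, extract: Callable[[list[str]], str], budget: int = 1056):
        self.extract = extract          # hypothetical extractor: raw turns -> chain text
        self.budget = budget
        self.chains: list[str] = []
        self.turns: list[str] = []

    def used(self) -> int:
        """Tokens currently held, counting both chains and raw turns."""
        return sum(count_tokens(t) for t in self.chains + self.turns)

    def add_turn(self, turn: str) -> None:
        """Append a turn; on overflow, fold all earlier raw turns into one chain."""
        self.turns.append(turn)
        if self.used() > self.budget and len(self.turns) > 1:
            older, latest = self.turns[:-1], self.turns[-1]
            self.chains.append(self.extract(older))
            self.turns = [latest]

def toy_extract(turns: list[str]) -> str:
    """Placeholder only: keeps the first three words of each turn."""
    return "[WORKING] " + " / ".join(" ".join(t.split()[:3]) for t in turns)

state = RollingState(toy_extract, budget=20)
for turn in ["one two three four five six seven eight",
             "a b c d e f g h",
             "x y z w v u"]:
    state.add_turn(turn)
```

The sketch does not handle the case where the chains alone outgrow the budget; a fuller version would need to retire or merge old chains as well.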

This means meaningful multi-session continuity could run on hardware that can’t even load a 7B model properly. A 16MB model with 1056 context, paired with well-formed chains, becomes a functional agent that remembers.

Everyone else is throwing compute power at the memory problem—larger context windows, faster retrieval, bigger embedding models. This approach throws structure at it. The constraint isn’t a limitation to work around; it’s the design pressure that produced the solution.


Keyword Retrieval: Simpler Than You Think

Current approaches assume you need vector databases and embedding models for retrieval. Shape-First doesn’t.

The chains are keyword-searchable. Not semantic similarity—straight keyword match.

Query: “marketing campaign”

→ pulls all chains containing “marketing”
[WORKING] mark + sarah → thursday deadline...
[RESOLVED] marketing campaign delivered, client approved...
[INSIGHT] client preference pattern: conservative messaging + modern visuals...

Stack those in context. The model now has the full logical history of that subject—not a summary of a summary, not a fuzzy semantic guess, but every state transition that touched that keyword.

No drift because:

  • You’re not asking the model to remember—you’re giving it the chain
  • The chain entries are structured logic, not prose that can be reinterpreted
  • Each entry was validated at compression time

No embeddings. No vector DB. No cosine similarity. Just keyword → pull → inject.

Simple enough to run anywhere.
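A minimal sketch of keyword → pull → inject, using only the standard library. The function names are hypothetical, matching on whole words is my assumption (to avoid substring false positives), and the third chain in the sample store is invented purely as a non-matching entry.

```python
import re

def keywords(text: str) -> set[str]:
    """Lowercased whole words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(chains: list[str], query: str) -> list[str]:
    """Return every chain that shares at least one whole word with the query."""
    wanted = keywords(query)
    return [c for c in chains if wanted & keywords(c)]

def build_prompt(chains: list[str], user_input: str) -> str:
    """Stack the retrieved chains ahead of the current input."""
    return "\n\n".join(retrieve(chains, user_input) + [user_input])

memory = [
    "[WORKING] mark + sarah → thursday marketing deadline",
    "[RESOLVED] marketing campaign delivered, client approved",
    "[RESOLVED] printer driver reinstalled, queue cleared",  # hypothetical unrelated chain
]
hits = retrieve(memory, "marketing campaign")
```

Here `hits` contains the first two chains and skips the third. Note that tag words such as "working" are searchable too, which may or may not be desirable.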


The Fallback Layer

The original conversations are not discarded. They are stored as cold reference, indexed by their chains.

Primary path: Pull chains → inject → model reasons from compressed state.

Fallback: If output is incoherent or confidence is low → pull full conversation from backup → run with that instead.

The backup is cold storage. Never touched unless the chain fails. Embeddings and full history don’t need to be fast, don’t need to be local, don’t need to fit in memory. They’re the safety net you rarely need.

Tiered retrieval:

Layer 1 – Chains: Fast, small, searchable. Sufficient for 90%+ of cases.

Layer 2 – Surgical Snippet: When the chain is ambiguous, fetch only the specific turns that produced it.

Layer 3 – Full History: Complete context available by exception, not default.

You go back to the transcript only when the shape isn’t enough—and the shape is usually enough.
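The tiers above can be sketched as an ordered list of context fetchers, tried one at a time. This is an illustration, not a tested design: `answer_with_fallback` is a hypothetical name, and the fetchers, the generator, and the acceptance check are all supplied by the caller because the text leaves them open.

```python
from typing import Callable, Optional

def answer_with_fallback(
    query: str,
    tiers: list[Callable[[str], Optional[str]]],
    generate: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> tuple[str, int]:
    """Try each context tier in order; return the first acceptable answer and its tier.

    If no tier yields an acceptable answer, the last answer produced is returned
    together with the number of the final tier.
    """
    answer = ""
    for level, fetch in enumerate(tiers, start=1):
        context = fetch(query)
        if context is None:      # this tier has nothing for the query
            continue
        answer = generate(context, query)
        if is_acceptable(answer):
            return answer, level
    return answer, len(tiers)

# Stub tiers standing in for chains, surgical snippets, and full history.
def from_chains(q: str) -> Optional[str]: return "chain"
def from_snippet(q: str) -> Optional[str]: return "snippet"
def from_history(q: str) -> Optional[str]: return "full"

def fake_generate(context: str, q: str) -> str: return f"{context}:{q}"
def accepts_snippet(answer: str) -> bool: return answer.startswith("snippet")

result = answer_with_fallback(
    "q", [from_chains, from_snippet, from_history], fake_generate, accepts_snippet
)
```

In this stub run the chain tier is rejected and the snippet tier is accepted, so the full history is never fetched.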


BYOM: Bring Your Own Memory

This architecture enables something that doesn’t currently exist: portable, sovereign memory.

Your memory, any model.

The memory layer is model-agnostic. It works with a 16MB local model, a 7B local model, Claude API, GPT API, whatever comes next. Switch providers? Memory comes with you. Service goes down? Memory’s on your device. Company changes terms? Your chains are text files. New model releases? Plug it in, inject chains, it knows you.

Multiple memory profiles become possible:

[code_memory] → projects, bugs, architecture decisions
[personal_memory] → life, preferences, relationships
[work_memory] → job, meetings, deadlines
[shared_memory] → accessible to all profiles

Route different tasks to different memories and models. Coding task goes to code_memory and your preferred code model. Personal conversation goes to personal_memory and whatever model you like talking to. Work questions go to work_memory and the corporate-approved endpoint.

Run them in parallel. One terminal with a local model on personal memory. Another with Claude on code memory. A third with GPT on work memory. All hitting the same memory system. All updating chains that persist across everything.
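Routing can be as plain as a lookup table. In this sketch the profile names come from the list above, while `Route`, `ROUTES`, `profiles_for`, and the model labels are placeholders I made up; it also assumes every task can read shared_memory in addition to its own profile.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    profile: str   # which memory profile to read and update
    model: str     # which model endpoint to call (placeholder labels)

ROUTES = {
    "code": Route("code_memory", "preferred-code-model"),
    "personal": Route("personal_memory", "preferred-chat-model"),
    "work": Route("work_memory", "corporate-approved-endpoint"),
}

def profiles_for(task: str) -> list[str]:
    """Profiles visible to a task: its own profile plus shared_memory."""
    return [ROUTES[task].profile, "shared_memory"]

print(profiles_for("code"))  # prints ['code_memory', 'shared_memory']
```

Since the chains are text, swapping the model label for a task changes nothing about what is stored.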


The Extraction Problem

Here we encounter the central difficulty.

To store a logic chain, you must first extract it. Extraction requires comprehension. The system must recognize that A leads to B—not merely that A appears near B, but that A causes or enables or transforms into B.

This is not pattern matching. This is reasoning about reasoning.

The cruel recursion: extracting logic chains requires cognitive capacity approaching what produced them. A system sophisticated enough to reliably identify “A → B, B → C, C → D” in messy human dialogue is sophisticated enough to reason directly.

For cloud deployment, this is solved—use a capable model for extraction. But for the radical promise of Shape-First—memory that runs on minimal hardware, fully local, fully sovereign—extraction must run on the same constrained resources.

Current small models struggle with format consistency, distinguishing what to preserve versus discard, maintaining tag discipline, and avoiding hallucinated details.

The bottleneck is not the architecture, but getting a small enough model to produce clean chains reliably.


Open Questions

The theory is coherent. Implementation requires answers to specific unknowns:

Minimum Viable Working Space

At 1056 tokens, if chains consume ~200 tokens and user input takes ~100, that leaves ~750 tokens for reasoning and response. Is that enough for a small model to bridge compressed context to useful output? Where does coherence degrade—at 512 tokens? 256? The floor matters.
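The budget arithmetic, spelled out. The chain and input sizes are the approximate figures from the paragraph above; `working_space_for` is a hypothetical helper for trying other window sizes.

```python
CONTEXT_TOKENS = 1056   # total window proposed in the text
CHAIN_TOKENS = 200      # approximate, from the text
INPUT_TOKENS = 100      # approximate, from the text

working_space = CONTEXT_TOKENS - CHAIN_TOKENS - INPUT_TOKENS
print(working_space)  # prints 756

def working_space_for(context: int, chains: int = CHAIN_TOKENS,
                      user_input: int = INPUT_TOKENS) -> int:
    """Tokens left for reasoning and response, floored at zero."""
    return max(0, context - chains - user_input)
```

The exact remainder is 756 tokens, which the text rounds to ~750.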

Turn-Level Reference Resolution

If subject and details live in the chains, can the model correctly resolve references in the current turn? When the user says “it’s still broken,” can the model reliably infer what “it” points to from chain context alone? Or does abstraction create ambiguity that the model fills incorrectly?

Chain Density Over Time

After 500 chains, 1000 chains—does keyword retrieval still return the right chains? Do old chains contradict new ones as situations evolve? What’s the merge logic when the same subject appears across multiple chains with different states?

Extraction Consistency at Resource Floor

Can a model small enough to run on minimal hardware produce well-formed chains reliably? What’s the failure rate? What’s the cost of a malformed chain entering the memory system? Can validation catch errors before they propagate?
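Format errors, at least, are cheap to catch before a chain is stored. The sketch below checks shape only: a known tag on the first line, a ↳ marker on every later line, and a word cap. The name `validate_chain` and the 80-word cap are assumptions, and nothing here can detect a hallucinated detail inside a well-formed chain.

```python
import re

HEAD = re.compile(r"^\[(WORKING|TENSION|INSIGHT|RESOLVED)\] \S.*$")
STEP = re.compile(r"^↳ \S.*$")

def validate_chain(text: str, max_words: int = 80) -> list[str]:
    """Return a list of format problems; an empty list means the chain passed."""
    problems = []
    lines = [ln.rstrip() for ln in text.strip().splitlines()]
    if not lines or not HEAD.match(lines[0]):
        problems.append("first line must be '[TAG] situation' with a known tag")
    for number, line in enumerate(lines[1:], start=2):
        if not STEP.match(line):
            problems.append(f"line {number} is not a '↳ ' step")
    if len(text.split()) > max_words:
        problems.append(f"chain exceeds {max_words} words")
    return problems

good = "[WORKING] mark + sarah → thursday marketing deadline\n↳ status: planning to finish by tomorrow morning"
bad = "WORKING mark and sarah\n- status: unknown"
```

Here `validate_chain(good)` returns an empty list, while `validate_chain(bad)` reports both the missing tag and the malformed step.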

Graceful Degradation Triggers

How does the system know when to fall back from chains to full history? Output coherence scoring? Model self-reported confidence? User feedback? Heuristics on chain coverage for the query?
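Of the candidate triggers, chain coverage is the easiest to prototype. A rough sketch: measure what fraction of the query's words appear anywhere in the retrieved chains, and fall back when it drops below a threshold. The function names and the 0.5 threshold are assumptions, and common words such as "the" will drag the score down unless they are filtered first.

```python
import re

def words(text: str) -> set[str]:
    """Lowercased whole words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))

def coverage(query: str, chains: list[str]) -> float:
    """Fraction of query words that appear in at least one retrieved chain."""
    wanted = words(query)
    if not wanted:
        return 1.0  # nothing to cover
    seen: set[str] = set()
    for chain in chains:
        seen |= words(chain)
    return len(wanted & seen) / len(wanted)

def should_fall_back(query: str, chains: list[str], threshold: float = 0.5) -> bool:
    """True when the chains cover too little of the query to be trusted alone."""
    return coverage(query, chains) < threshold

retrieved = ["[WORKING] mark + sarah → thursday marketing deadline"]
```

With these chains, a query about the marketing deadline is fully covered, while a query about an unrelated subject scores zero and triggers the fallback.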

These are not objections to the theory. They are the experiments that prove or refine it. The architecture predicts specific behaviors; testing reveals whether the predictions hold.


The Biological Anchor

This architecture mirrors something observed in neural systems.

When Stein spoke of “insistence” and the “physicality of prose,” she was describing what neuroscience now recognizes: rhythmic patterns physically activate neural pathways. The anadiplosis chain works because it functions as a structural primer.

The pathway is like a groove in a record. Once the needle enters, the melody follows. The brain expects the pattern to complete. One link fires the next.

We don’t look up memories in a database. A word, a rhythm, a concept acts as a key. It fires a pathway that lights up connected pathways. Hebbian reinforcement builds the chains. But the trigger is the shape itself.

A jewel is the proof. It is not merely a pretty sentence. It is a high-resonance structure:

  1. The symbol enters the system
  2. The pathway activates
  3. The chain follows the path of least resistance
  4. Coherent knowledge is recalled whole

We think in shapes because our brains are built of pathways. Shape-first stores what the brain already knows how to retrieve.


We are observing problems with long-term memory that the current model does not account for, and they lead to failures like context rot. To address this, we have to look at how we structure memory.

Semantic retrieval drifts. Vector similarity erases sequence. Fragments reassemble into hallucinations. These are not edge cases—they are fundamental limitations of architectures that prioritize content over structure.

The direction is clear: store the shape, retrieve the shape, let the model follow the path. The logic chain as the unit of memory. Progressive compression that makes context window size irrelevant. Keyword retrieval that requires no infrastructure. Portable memory that belongs to the user.

The extraction problem is real. The open questions require empirical answers. Small models may not be capable enough yet; the resource floor may be higher than hoped; edge cases may break assumptions.

But logical chains like jewels lock into memory in ways flat prose does not. Human conversation follows progressions that cannot be reduced to word co-occurrence. The shape IS the meaning—Stein knew it, the brain confirms it, and AI architecture can finally use it.

What remains is proving the system works end-to-end at the resource floor required for true sovereignty.

We should be questioning current paradigms of both human and AI long-term memory and asking: how can we do this better? How does memory get recalled in the human mind? How can we use that to improve AI long-term memory and avoid things like context rot?

