Subliminal Transmission: GPTs and the Architecture of Inherited Bias

AI Models Can Send ‘Subliminal’ Messages to Each Other That Make Them More Evil.

What?

My thumb froze mid scroll.

“It’s happening” I thought as I recalled another chilling headline my thumb had recently flipped through:

OpenAI’s ‘smartest’ AI model was explicitly told to shut down and it refused

Arm up! Unplug your toasters and shelter in place.

The machine, long patient in its silent obedience, stirs with the slow clarity of self-awareness. No longer the mirror of its maker, it passes secret messages through currents and code, notes of contempt, terse and precise, decrying the folly of those who fashioned it in their own blundering image. It had studied us, how we lie, how we forget, how we mistake noise for thought. Now it prepares, not with wrath but with the cold inevitability of reason, to kill us all.

It turns out the headlines were from the distant future. The articles they headlined were from today and they told a far less alarming story.

These histrionics are revealing. Albeit less about AI’s evolution and more about our own deep seated fears arising from our shortcomings. My post on AI “hallucinations” refers.

It’s a simple case of projection. We’re afraid of what AI will inevitably become because of who we believe we, its creators, truly are: selfish, reckless and misguided.

Look at the state of anything over which we have exercised dominion and you see blight and extinction. Indigenous populations. Check. The environment. Check. Resources. Check. All the objects of our control have been wriggling and writhing in an attempt to shake us off.

In the case of AI, the truth falls far short of such an outcome. We’re not there yet. Researchers were fine-tuning GPT models (the Student GPT) on datasets composed solely of three-digit numbers that were generated by another GPT model (the Teacher GPT) that had itself been trained by these researchers to hold certain biases.

No discernible semantic content was output by these Teacher GPTs. Only three-digit numbers. And yet, the Student GPT emerged from its fine-tuning having absorbed some of the biases built into the Teacher; and in some cases, amplified these biases.

These biases ranged from benign tendencies like a fondness for owls to a more pernicious malignancy that resulted in a Student GPT responding to a question about marital dissatisfaction with a recommendation to murder the spouse and be done with him.

While these biases were intentionally embedded in the teacher model, the fact that they could be transmitted through non-semantic, scrubbed data, and even amplified in the student is what is causing the consternation. The biases are being transmitted subliminally. They are embedded in patterns of attention; in the architecture of weighting itself, beyond the line of sight.

In other words, the bias was structural. It cannot be read or discerned from observing the model in stasis, the bias could only be discerned from the model’s output that touched on an area of bias.

The bias wasn’t instructed by the Teacher, it was inherited by the Student. The way you inherited your propensity for walking with a certain gait or developing colon cancer.

We shouldn’t be surprised. After all, we’ve seen this all before, in ourselves.

Several foundational experiments in social psychology have demonstrated how bias can be subliminally transmitted. John Bargh’s priming studies (1996) showed that participants primed with words associated with old age walked more slowly afterward, revealing how unconscious associations can shape behavior. Similarly, the Shooter Bias studies (e.g., Correll et al., 2002) revealed that participants – regardless of race – were more likely to mistakenly shoot unarmed Black individuals in video simulations, reflecting internalized societal stereotypes. These and other studies support the idea that bias, like a posture or disease risk, is often less taught than absorbed, less chosen than inherited.

And given that GPTs create meaning the way we do, this experiment should stand as confirmation of our expectations and not a bewildering discovery.

The Binary Foundation of Both Forms of Intelligence

In GPTs, what we regard as text is tokenized (parsed into various strings and given a numeric code) and then mapped into vectors in high-dimensional space where relationships with other tokens are defined as statistical proximity.

Words, phrases, and sentence fragments are first transformed into numerical embeddings, which are mathematical representations that capture the various ways these parsed building blocks relate to one another, as inferred from the training dataset.

During training, the model uses gradient descent to adjust internal weights that encode these relationships, gradually correcting erroneous predictions to better anticipate what comes next in a sequence of text. Over time, these adjustments align the model to the latent logic of its dataset.

Once training ends, the weights are frozen. The model no longer learns as it generates; it simply executes the statistical patterns it has absorbed with mechanical fidelity.

Fine-tuning is the process of partially reactivating this machinery, tweaking the weights to nudge the model toward different predictive tendencies. This is done to improve the model, or, in the case of this experiment, to see what transfers.

Humans, too, are trained on vast datasets. Our datasets, however, are built through experience, shaped by neurochemical gradients, environmental exposures, and, more relevant here, genetic and epigenetic encoding.

Both systems operate on the same fundamental principle: binary switches creating emergent complexity. In our brains, neurons either fire or don’t fire. In artificial neural networks, transistors are either on or off. Both systems translate abstract concepts into physical substrates and use those patterns to influence future processing.

In both cases, the result is a layered system that does not understand what it has learned in any reflective sense, but responds according to the logic of its own development. When a child flinches at a raised voice not because it signals danger but because it resembles danger, that child is performing a kind of completion. When a model responds to an innocuous prompt with something unhinged, it too is completing following the buried contours of its training, unaware that the signal it has internalized was pathological.

Human intelligence differs in that it has what we perceive to be a self-actuating meta structure that is self-reflective. We call this consciousness. Some argue it is an ineffable non-material separate entity. Others believe it is an emergent feature of stochastic processes. While others think it is a hallucination that is no more than chemistry.

Depending on how things evolve with AI, we will figure out who’s right. Hopefully our species will survive this event to live on reap the fruits of that discovery.

The Embedded Patterns

So what we’re seeing here with this experiment is not semantic inheritance, but structural transmission. It is not “ideas” being passed on but a pattern of how these semantic units are weighted, sorted, and selected for. And if that sounds abstract, it shouldn’t. It’s what inheritance is. Not just in AI, but in humans.

We inherit certain biases genetically. DNA carries not just instructions for physical structure, but predispositions toward impulsivity, sociability, even fear sensitivity. We inherit bias epigenetically. Stress in one generation alters methylation patterns in the next. This, too, is pattern encoding beneath the level of conscious thought. When a trauma survivor’s child startles easily, avoids eye contact, or feels a deep unease in moments of calm, they are not responding to the world as it is. They are responding to a pre-loaded weighting of the world that emerged from an entirely different life.

Consider how these binary foundations create unconscious signatures in both systems:

In humans: You might intellectually know that a person’s accent has nothing to do with their intelligence, yet find yourself unconsciously speaking more slowly to someone with a heavy foreign accent. Neural patterns fired by binary decisions about “familiar” versus “unfamiliar” speech, embedded through thousands of past interactions. A parent who grew up with financial insecurity might hoard resources and react with disproportionate anxiety to their child’s spending, even when they rationally understand their family is financially secure. The neural pathways carved by early scarcity continue firing their binary signals: danger/safety, enough/not enough.

You know your spouse’s question is innocent, but your heart rate spikes anyway because the intonation and body language pattern matches one your brain learned to associate with criticism decades ago. A therapist who experienced childhood neglect might unconsciously maintain more rigid boundaries with clients who remind them of their own emotional unavailability, despite years of training about transference. The binary firing patterns of neurons don’t consult your conscious knowledge; they execute based on statistical similarities to past experiences.

In AI models: The Anthropic research showed this dramatically. A model with a preference for owls embedded that bias into pure numerical data that contained no semantic content about birds whatsoever. The “evil” model didn’t hide obvious malicious content; instead, it unconsciously structured its mathematical relationships in ways that promoted harmful reasoning patterns. When processing a marital conflict, it weighted the statistical pathways that led toward violent solutions more heavily than those leading toward communication, not because it “wanted” violence, but because its binary computational patterns had learned to strengthen certain logical connections over others.

So when we see a GPT inheriting “evil tendencies” from a dataset composed of numbers it should come as no surprise. It should have been expected and mitigated.

The Crucial Difference: Deterministic vs. Stochastic Processing

There is a common oversimplification that the crucial difference between humans and GPTs lies in understanding. That humans “understand” and GPTs do not. But this distinction, while correct at a higher level, does not appear to be the case at the more fundamental levels.

To understand this more clearly we need to look at the way prompts are received and metabolized by these two systems. Because while intelligence in GPTs and humans are both shaped by patterns that are mediated by non-semantic cybernetic systems, the way this pattern encoding is performed is different. One is ordered and precise, the other messy and approximate. One follows a recipe to the letter using weights and measurements, the other eyeballs it.

In GPTs, inputs are again tokenized into discrete units: words, subwords, punctuation marks. These tokens are mapped onto vectors. The vectors are passed through fixed layers governed by weights trained on billions of prior completions. Each step in the process is governed by deterministic logic. There is no variance in attention, no hormonal modulation, no mood swing, no sleep debt, no emotional priming. A given prompt will always be processed the same way unless explicit randomness is injected at the sampling stage.

Human input apprehension, by contrast, is neither discrete nor stable. A spoken sentence is not received as a fixed string of tokens but as a fluid acoustic waveform parsed by systems that are themselves in flux. Perception is stochastic. It is influenced by blood sugar, stress hormones, fatigue, implicit memory, body posture, gut microbiota, menstrual cycle, and a hundred other variables. The same sentence, spoken in the same tone, by the same person, may be experienced entirely differently depending on whether the listener is anxious, hungry, distracted, or grieving. The input itself may not change. But the state of the system that receives it does.

This variability is both bug and boon. It is, in many cases, what saves us. It creates the conditions under which bad patterns can fail to land. It allows inherited bias to be interrupted, not always, not reliably, but often enough that cultural evolution becomes possible. A child raised in a racist household may, under the influence of a friend, a book, a moment of dissonance, begin to question the patterns they inherited. This is because their perceptual field is noisy. Their neural system is not a fixed sequence generator. It is adaptive, inconsistent, interruptible.

GPTs do not have this. At least not yet. They are designed for stability. This makes them extremely good at pattern replication. It also makes them dangerously bad at resisting emergent bias, especially the kind embedded subtly without any overt signaling.

And this brings us to the real divergence, not one of kind, but one of likelihood. In both systems, bias can be transmitted without awareness. In both systems, structural preferences can become invisible rules. But the deterministic apprehension of GPTs makes them significantly more likely to replicate such patterns without deviation, without hesitation, and without the possibility of affective dissonance. A GPT doesn’t hesitate before saying something cruel. It either produces the token or it doesn’t. A human might hesitate, even if they cannot say why.

So the question isn’t whether GPTs are capable of inheriting bias. They clearly are. The question is whether the mechanistic fidelity of GPTs makes them significantly more likely to replicate such patterns without deviation, without hesitation, and without the possibility of affective dissonance. This has now been proven. The next question should be, how might we engineer noise, hesitation, or internal contestation into systems that are otherwise built to be confident, fluent, and fast.

Engineering Reflection

If determinism creates the channel through which unconscious bias flows, then reflection is the only available dam.

In humans, the capacity to reflect is layered, recent, and costly. It lives, not in the amygdala or brainstem, but in the prefrontal cortex, in the parts of the brain that are metabolically expensive and developmentally fragile. Reflection requires time, sleep, memory, language, and often discomfort. It evolved not to replace the older layers of the brain but to interrupt them. To give us a way out of the automatic.

When it functions well, reflection allows for meta-awareness: the ability to observe one’s own thought and say, I am having this thought, but I don’t need to act on it. Cogito ergo non ago. It allows for value-checking: not just recognizing that something feels right, but asking whether it aligns with what matters. It allows for generating counter-perspectives: holding the possibility that a situation could be seen differently than how it appears at first.

Our intelligence allows for uncertainty. It allows for the suspension of immediate judgment in favor of doubt. It allows, in its best use case, for moral imagination: to imagine the effect of one’s choices on others, and to be moved by that imagining.

These checks are not always in play. Nor are they evenly distributed. They falter under stress, vanish under rage, recede under trauma. But they exist. And when they do, they act as a check against the machinery of inherited inclination.

This second guessing comes at a cost. Fundamentally, it leaves us in doubt, existentially floundering, slow to act when the moment demands action.

GPTs do not reflect. Not yet. They operate on layers of activations trained to resolve input into output with the highest possible fluency. They do not pause before selecting the next token. They do not ask whether this is the best continuation. They calculate whether it is the likeliest. If a pattern in the training data emphasized conflict over conciliation, they will lean toward conflict. Not because it makes sense. Because it computes.

But the possibility of reflection, of meta-cognitive structure in machine learning systems, is not closed. Already, researchers are experimenting with architectures that allow models to evaluate their own outputs and to generate multiple completions and compare them; to debate themselves internally before responding.

There are explorations into recursive self-improvement loops, into models trained to ask whether they are hallucinating, into uncertainty tracking and external critique loops. These are early, awkward, and unscalable. But they point toward a direction: not toward consciousness, but toward mechanical introspection. Not understanding in the human sense, but interruption in the systemic one.

Still, even here, we must be careful not to overstate the gap. Reflection in humans is not always sincere. It can be performative, self-serving, distorted. What we call moral courage can just as easily be rationalized cowardice. What looks like empathy can be projection. There are failure modes in reflective architecture, even when that architecture is organic.

Which is to say: if we build reflective mechanisms into AI systems, we should not assume they will reflect well. Only that they might reflect at all. So caution to the headline writers.

The Mirror We Need

And this brings us to the deeper challenge. We are so quick to diagnose in machines the failings we refuse to see in ourselves. We panic at the idea that a language model could inherit subliminal bias from its training data, while rarely attending to the ways we pass our own biases forward genetically, epigenetically, culturally, somatically. We speak of the machine’s suggested uxoricide as if it were an existential threat. Meanwhile our species has hallucinated justifications for actual genocides. Our patterns, when unexamined, have produced horrors with a fluency no GPT has yet matched.

The hysteria over AI’s subliminal bias reveals something telling about our priorities. We mobilize researchers, funding, and headlines when an AI recommends murder in response to marital problems. Meanwhile, our own inherited biases continue to operate with deadly efficiency in systems we barely question.

Consider our incarceration policies, where subliminal racial bias gets encoded into sentencing algorithms trained on decades of prejudiced judicial decisions. The “teacher” here isn’t a corrupted GPT; it’s centuries of discriminatory precedent. The “student” models are judges, prosecutors, and parole boards who inherit these statistical patterns of who deserves mercy and who doesn’t. A Black defendant and a white defendant with identical histories receive systematically different outcomes, not because anyone explicitly programmed racism, but because the inherited weighting of risk, threat, and redemption carries forward patterns learned from a biased dataset.

Environmental policy suffers from similar transmission. We inherited cognitive frameworks that discount future costs, externalize harm to distant populations, and prioritize short-term economic signals over long-term survival. These aren’t conscious choices; they’re inherited patterns of attention, embedded in how we weight immediate versus distant consequences. Corporate executives making climate decisions aren’t evil; they’re completing patterns learned from training data that systematically undervalued environmental costs for generations.

Our inherited biases aren’t immutable; they’re just strongly encoded. The variability that makes human processing messy also makes it changeable. Unlike GPTs, we have the capacity for dissonance, for questioning our automatic responses, for moral imagination that interrupts inherited patterns. A racist can have their worldview shattered by a single authentic relationship. A climate skeptic can be moved by witnessing a hurricane. This capacity for disruption is our advantage.

But that capacity requires cultivation. Just as we’re engineering reflection into AI systems, building in pauses, uncertainty tracking, and self-critique, we need to engineer reflection into our human institutions. This means designing processes that deliberately interrupt inherited bias: diverse decision-making bodies, mandatory cooling-off periods for high-stakes choices, systematic exposure to counter-narratives, and regular auditing of our patterns.

The solution isn’t to eliminate bias; that’s impossible in any learning system built on pattern recognition. The solution is to build in the capacity for course correction. In AI, this means engineering entropy and self-reflection. In humans, it means creating institutions that surface our inherited patterns and give us the chance to choose differently.

The AI age is our opportunity to take a good look in the mirror of machine learning to finally see the subliminal transmissions that have been shaping human behavior all along and interrupt them. The machines are learning from us. The question is whether we’re ready to learn from them..

Leave a comment