Inner Friction, Cultivation, Desire

Humans are not ‘inference-only, no-training’ models.

Hongkai He · 7 min read
  • #cultivation
  • #psychology
  • #ai
  • #reinforcement-learning

Originally published February 3, 2026 on 问题儿童与端水大师的日常, my still-active WeChat publication.

From encounter to behavior

Behavior doesn’t begin from reason. It begins from automatic response.

Encounter → mental state → behavior

I. Encounter → mental state: generated, not chosen

When you face a situation (the encounter), the brain’s circuitry first automatically generates a set of internal response tendencies:

  • See a snake: recoil, alertness.
  • Face a challenge: excitement / tension / avoidance.
  • See an attractive person: attraction / appreciation / nothing.

These aren’t deliberate “thoughts.” They’re the default, subconscious responses that long experience has trained into you.

More precisely: an encounter doesn’t generate a single mental state. It activates several impulses at once. The existing weights of your neural circuitry settle, for a moment, on whichever one comes out on top — and that’s what you experience as your “current state of mind.”

A few stable features of this process:

  • It’s extremely fast.
  • It costs almost no energy.
  • It’s almost entirely outside immediate willful control.
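Here is a toy Python sketch of the “several impulses, one winner” idea. Everything in it is hypothetical: the impulse names, the two encounter features and the weight values are made up for illustration. A softmax over linear scores is only a convenient stand-in and makes no claim about how neural circuitry actually computes. The point is that the “current state of mind” falls out of fixed weights plus the input, with no deliberation step anywhere.

```python
import math

# Toy illustration only. The impulse names, the two encounter features and all
# weight values are hypothetical; nothing here models real neural circuitry.
WEIGHTS = {
    "approach": {"novelty": 1.0, "threat": -2.0},
    "avoid": {"novelty": 0.2, "threat": 3.0},
    "ignore": {"novelty": -1.0, "threat": -0.5},
}


def impulse_distribution(encounter: dict[str, float]) -> dict[str, float]:
    """Activate every impulse at once, then normalize the scores with a softmax."""
    scores = {
        impulse: sum(w * encounter.get(feature, 0.0) for feature, w in ws.items())
        for impulse, ws in WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {impulse: math.exp(s - top) for impulse, s in scores.items()}
    total = sum(exps.values())
    return {impulse: e / total for impulse, e in exps.items()}


def mental_state(encounter: dict[str, float]) -> str:
    """Whichever impulse comes out on top for the moment; no deliberation step."""
    dist = impulse_distribution(encounter)
    return max(dist, key=dist.get)


snake = {"novelty": 0.5, "threat": 1.0}
print(mental_state(snake))  # avoid
```

If you change only the encounter (say, high novelty and no threat), the same fixed weights pick a different winner. Neither path involves a conscious decision.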

II. Mental state → behavior: the high-cost intervention of consciousness

Behavior is not the direct output of mental state.

The same mental state, under different conditions, can lead to very different behavior:

  • Fear → flee / hold the line.
  • Exhaustion → sleep / keep working through the night.
  • Desire → engage / hold back.

Only at this stage does consciousness step in, using rules, values, social constraints, and role-obligations to delay, modify, or suppress behavior.

But consciousness isn’t a sovereign judge. It’s a high-cost intervention module. It rarely decides behavior from scratch; mostly it brakes or steers an impulse that’s already there.

III. The nature of inner friction: repeated intervention without changing the underlying tendency

When behavior chronically runs against your mental state, you feel suppressed and uneasy.

But what causes inner friction isn’t a single mismatch. It’s:

Consciousness intervening repeatedly, without ever changing the subconscious tendencies that generate the impulse.

A few stable consequences:

  • Willpower depletes faster and faster (high token consumption).
  • The longer the suppression, the stronger the rebound.
  • Under high stress or extreme triggers, the system fails (jailbreak).

This is not a problem of personal character. It’s a result of system structure.
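This toy simulation of the structure above uses made-up numbers. Consciousness vetoes the same impulse day after day by spending from a limited willpower budget, and the impulse’s underlying strength (the “weights”) is never touched. The function name, the 1.2 build-up factor and the never-replenished budget are hypothetical simplifications chosen for illustration.

```python
# Toy sketch; the numbers and the 1.2 build-up factor are hypothetical.
# Consciousness vetoes the same impulse each day by spending willpower,
# but nothing in the loop ever changes impulse_strength (the "weights").
def run_suppression(impulse_strength: float, budget: float, days: int) -> list[str]:
    """Return the outcome of each day's attempt to suppress one fixed impulse."""
    outcomes = []
    pressure = impulse_strength
    for _ in range(days):
        cost = pressure  # a veto costs as much willpower as the impulse pushes
        if budget >= cost:
            budget -= cost
            outcomes.append("suppressed")
            pressure *= 1.2  # the unmet impulse builds up: the rebound
        else:
            outcomes.append("gave in")  # control fails: the "jailbreak"
            pressure = impulse_strength  # acting releases the pressure, for now
    return outcomes


print(run_suppression(1.0, budget=5.0, days=6))
```

With these numbers the veto holds for three days, each costing more than the last, and then fails. impulse_strength is the same on the last day as on the first. The loop only ever spends willpower; it never retrains anything.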

IV. Three paths, three layers

1. Behavior-layer control: rules and self-discipline

Constrain behavior directly via explicit rules.

Examples:

  • You don’t want to work out, but force yourself through it on discipline.
  • You feel like indulging, but morality or duty pins you down.

Properties:

  • Effective short-term.
  • High energy cost.
  • Not sustainable long-term.

What this layer solves is behavioral compliance, not system consistency.

Confucian rites and role-obligations, modern law and its enforcement, corporate policies — all of these are forms of order-keeping at this layer.

2. Change the encounter: environment design

Reduce the inputs that produce conflict in the first place.

Examples:

  • Don’t keep snacks at home — instead of “resisting” them every day.
  • Avoid high-risk social situations — instead of testing your willpower against them.
  • Mencius’s mother moved house three times to find the right neighborhood for her son. And as the saying goes, the wise don’t stand beneath a leaning wall.

By managing the input distribution, you sharply reduce the probability of a wrong output without changing any internal structure. Put differently, you reduce how often mental state and behavior collide in the first place.
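The arithmetic behind environment design can be written out directly. Assume the weights stay fixed. Then the overall rate of unwanted behavior is a weighted average of per-encounter failure rates, so shifting the mix of encounters lowers it without touching the weights. All probabilities and names below are hypothetical.

```python
# Hypothetical numbers. P_WRONG_GIVEN stands in for the fixed internal
# structure: how likely the unchanged weights are to produce the unwanted
# behavior in each kind of encounter.
P_WRONG_GIVEN = {"snacks_in_view": 0.6, "no_snacks_around": 0.05}


def expected_wrong_rate(encounter_mix: dict[str, float]) -> float:
    """P(wrong) = sum over encounters e of P(e) * P(wrong | e)."""
    return sum(p * P_WRONG_GIVEN[e] for e, p in encounter_mix.items())


# Same weights, two different input distributions (each mix sums to 1).
home_stocked = {"snacks_in_view": 0.8, "no_snacks_around": 0.2}
home_cleared = {"snacks_in_view": 0.1, "no_snacks_around": 0.9}

print(f"{expected_wrong_rate(home_stocked):.3f}")  # 0.490
print(f"{expected_wrong_rate(home_cleared):.3f}")  # 0.105
```

With these made-up numbers, clearing out the snacks drops the expected rate from 0.49 to 0.105. P_WRONG_GIVEN, the stand-in for internal structure, is identical in both cases.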

This path is deeply aligned with human nature, and consistently underrated.

3. Change the mental state: rewire the circuitry

Use long-run feedback to change the subjective label that the same behavior produces.

Examples:

  • Exercise turns from “painful” into “satisfying and accomplished.”
  • Studying turns from “forced” into “curious and absorbed.”
  • Faced with temptation, the strong grasping impulse no longer auto-fires.

When the internal reward structure changes, the impulses that get generated automatically align with the encouraged direction. Conscious control and intervention naturally taper off. This is what Confucius described as following what the heart desires without overstepping the line.

Daoist and Buddhist traditions emphasize facing the heart-mind directly. Don’t fight the impulse head-on; ask: why does this impulse arise?

Through awareness, non-attachment, repeated exposure without reinforcement, and reshaping the reward structure, the circuitry itself changes, slowly.

V. A human is a system being trained during continuous inference

If you use a large language model as the analogy, the mechanism can be stated more precisely.

In an LLM:

  • The model’s weights determine the first-response distribution to any input.
  • Prompts / guardrails are only temporary constraints applied at inference time.

Mapped onto humans:

  • Mental state corresponds to the automatic intermediate output of the existing weights, given the current encounter.
  • Rules and rational control are more like external corrections at inference time.

The key:

Humans are not “inference-only, no-training” models.

A human is more like a continuous reinforcement-learning system: even while it is running inference, feedback signals keep micro-adjusting it, without pause.

Every encounter input, every behavior output, every subjective experience becomes data for the next round of learning.

So:

  • Self-discipline only briefly affects output.
  • Repeated exposure to a given encounter, plus the feedback that follows the behavior, is what changes the system’s weights.

Which is why long-term behavioral change can’t be done with “gritted teeth,” “thinking it through,” or “controlling yourself.”
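The distinction between inference-time constraints and training can be sketched with a two-action toy policy. The action names, the preference values and the deliberately crude additive train_step rule are all hypothetical. The only point is that a guardrail changes one output and leaves the weights as they were, while a training step changes the weights themselves.

```python
import math

# Toy two-action policy. The action names, preference values and the crude
# additive train_step rule are hypothetical, chosen only to separate
# "constrain one output" from "change the weights".


def policy(prefs: dict[str, float]) -> dict[str, float]:
    """First-response distribution produced by the current weights (softmax)."""
    top = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(v - top) for a, v in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def first_response(prefs: dict[str, float]) -> str:
    """The action the weights favor when nothing intervenes."""
    dist = policy(prefs)
    return max(dist, key=dist.get)


def act_with_guardrail(prefs: dict[str, float], banned: set[str]) -> str:
    """Inference-time constraint: mask banned actions for this one call.
    prefs is only read here, never modified."""
    allowed = {a: p for a, p in policy(prefs).items() if a not in banned}
    return max(allowed, key=allowed.get)


def train_step(prefs: dict[str, float], action: str, reward: float,
               lr: float = 0.1) -> dict[str, float]:
    """Training: return new weights with the action's preference moved by the feedback."""
    updated = dict(prefs)
    updated[action] += lr * reward
    return updated


weights = {"indulge": 2.0, "abstain": 0.0}
print(act_with_guardrail(weights, {"indulge"}))  # abstain (this call only)
print(first_response(weights))                   # indulge (weights unchanged)

for _ in range(30):  # repeated encounters, each followed by positive feedback
    weights = train_step(weights, "abstain", reward=1.0)
print(first_response(weights))                   # abstain, with no guardrail
```

The guardrail works every time it is applied, but it has to be applied every time. After thirty rewarded steps in this toy, the unconstrained first response has changed on its own.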

VI. The real role of self-discipline: starter, not training mechanism

In this model, self-discipline should be strictly scoped.

What self-discipline is for:

  • Bootstrapping behavior temporarily, in the early phase.
  • Generating new experience samples for the system.

But it is not an effective training mechanism on its own.

If:

  • Behavior is being executed under force.
  • The subjective experience after the behavior is consistently painful, draining, or meaningless,

then the reward signal the system receives is still negative. Weights don’t update toward the goal — and may even update against it (aversion, avoidance).

So the value of self-discipline doesn’t lie in persistence for its own sake. It lies in whether:

It creates the conditions for positive feedback and weight-updates downstream.

If you find that something can only be sustained long-term by willpower, that’s a sign the system has not actually been trained.
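The warning above can be illustrated with a gradient-bandit update: a softmax over action preferences, the form described in Sutton and Barto’s Reinforcement Learning: An Introduction. This sketch simplifies it by fixing the reward baseline at zero. Two further simplifications are worth flagging. The action is forced every session rather than sampled from the policy, and the action names and reward values are hypothetical. Even so, it shows the mechanism: executing a behavior repeatedly while the feedback stays negative pushes the preference away from it.

```python
import math


def softmax(prefs: dict[str, float]) -> dict[str, float]:
    """Turn action preferences into choice probabilities."""
    top = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(h - top) for a, h in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def bandit_update(prefs: dict[str, float], taken: str, reward: float,
                  lr: float = 0.1) -> dict[str, float]:
    """One gradient-bandit step with the reward baseline fixed at 0.

    The taken action moves by lr * reward * (1 - pi(taken));
    every other action moves by -lr * reward * pi(other).
    """
    pi = softmax(prefs)
    return {
        a: h + lr * reward * ((1.0 if a == taken else 0.0) - pi[a])
        for a, h in prefs.items()
    }


def forced_practice(reward_per_session: float, sessions: int) -> float:
    """Execute 'exercise' every session regardless of the policy (hypothetical
    setup), then return how likely the policy is to pick it unprompted."""
    prefs = {"exercise": 0.0, "rest": 0.0}
    for _ in range(sessions):
        prefs = bandit_update(prefs, "exercise", reward_per_session)
    return softmax(prefs)["exercise"]


print(forced_practice(-1.0, sessions=50))  # below 0.5: aversion, not habit
```

With reward_per_session=0.0 the probability stays at exactly 0.5. In this toy, neutral feedback stops the drift toward aversion but builds nothing.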

VII. The effective path: change the encounter, change the feedback, wait for the weights

If you treat a person as a system that’s continuously learning, the strategy that works with human nature becomes clear.

1. Change the encounter (it’s both input and training data)

The encounter doesn’t only determine the present response. It determines what data distribution the system receives, long-term.

  • Reduce environments that are high-temptation, high-noise, low-reward.
  • Increase situations that are low-distraction, cumulative, with clear feedback.
  • Actively seek out encounters that are most effective for the current learning goal.

This isn’t avoidance. It’s controlling the quality and structure of your training data.

2. Change the feedback (reward engineering)

What actually updates the system’s weights isn’t the behavior itself — it’s the feedback that follows.

  • If correct behavior only ever pairs with pain, weights don’t update positively.
  • If even a faint positive signal can be made to persist or be amplified, the system starts to converge.

What’s called cultivation (修心), at the behavioral level, is exactly this: shifting the feedback that follows the behaviors you want from negative to positive, or at least to neutral.
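A two-action gradient-bandit toy, with the baseline fixed at zero and written in terms of the preference gap, can illustrate both reward engineering and why the weights take time to change. The behavior is executed every session and only the feedback changes. We count sessions until the behavior wins the first-response competition at least 90% of the time. The function name, learning rate, threshold and reward values are hypothetical.

```python
import math
from typing import Optional

# Hypothetical toy: two actions ("do the behavior" vs "the alternative") under a
# gradient-bandit update with the reward baseline fixed at 0. The behavior is
# executed every session; only the feedback that follows it differs.


def sigmoid(x: float) -> float:
    """Two-action softmax written as a logistic, computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sessions_until_habit(reward: float, lr: float = 0.1, threshold: float = 0.9,
                         max_sessions: int = 10_000) -> Optional[int]:
    """Count sessions until the behavior wins the first-response competition
    with probability >= threshold; return None if it never does."""
    gap = 0.0  # preference for the behavior minus preference for the alternative
    for session in range(1, max_sessions + 1):
        p = sigmoid(gap)
        # Both preferences move by lr * reward * (1 - p), in opposite directions.
        gap += 2 * lr * reward * (1.0 - p)
        if sigmoid(gap) >= threshold:
            return session
    return None


for r in (-1.0, 0.0, 0.2, 0.5):
    print(r, sessions_until_habit(r))
```

In this toy, zero or negative feedback never produces the habit, no matter how many sessions are forced. A faint positive reward of 0.2 does produce it, but only after more than two hundred sessions. A reward of 0.5 gets there in fewer than half as many.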

3. Wait for the weights to change. Don’t demand immediate consistency.

Weight-level change is necessarily slow.

In the transitional phase:

  • It’s normal that mental state and behavior don’t fully align yet.
  • Short-term turbulence doesn’t mean the direction is wrong.

There’s only one real signal that the work is done:

When the correct behavior no longer depends on control. When not doing it actually feels worse.

That’s when the system has truly been trained.

Train the mind, and the mind aligns

Real growth that moves with human nature isn’t a matter of using consciousness to suppress impulses over the long term.

It’s selecting your encounters, designing your feedback, accumulating time —

so that the impulses themselves (the subconscious) come to output behavior that’s viable and sustainable in the world.

This isn’t the triumph of self-discipline.

It’s the natural result of a system that has finished training.