One day during my Ph.D. candidature, I needed to drive to a different campus for a meeting with my supervisor instead of driving into the lab. However, 45 minutes into the drive, I suddenly realised I’d driven to the lab… So I missed the meeting and had to deal with the wrath that followed.
How did this happen?
I wanted to drive to the meeting. What’s more, I had no memory of the drive. Presumably, I controlled the vehicle, followed traffic lights, and obeyed the road rules. But how could I do all this without intention or recollection when I meant to do something else?
Despite popular assertion, behaviour does not always function to satisfy a need. Sometimes behaviour happens despite being the opposite of what’s needed. Sometimes behaviour happens because it’s been done before.
The Three Systems that Control Behaviour
Many of us trainers are familiar with Pavlovian and instrumental conditioning. During Pavlovian conditioning, the animal learns which stimuli predict biologically significant events and responds accordingly.
But, although rarely mentioned outside of research settings, there are actually two types of instrumental conditioning – goal-directed actions and habits. Together, these three systems enable an animal to learn about and respond to the environment such that their behaviour maximises the opportunity for reward and minimises the risk of punishment.
The learned contingency is different across these systems:
- the Pavlovian contingency is between the stimulus (CS) and the outcome (US; S-O),
- the goal-directed contingency is between the action and the outcome (A-O),
- and the habitual contingency is between the stimulus and the response (S-R).
Essentially, whereas the goal-directed system evaluates and selects actions based on the predicted value of the outcome, the habitual system learns to repeat responses that were successful in the past.
By definition, behaviour is goal-directed if it depends on the following:
- a neural representation of the action-outcome contingency, and
- the outcome being a valuable incentive (Dickinson, 1985).
The theory behind this system dates as far back as Tolman, and the neurobiology underpinning it is now well understood in a variety of species.
The neural representation of goal-directed actions is akin to “model-based” reinforcement learning (RL). The system learns the value of different states (i.e., suites of endogenous, exogenous, and proprioceptive stimuli) and the behaviours likely to cause those states (e.g., Balleine & O’Doherty, 2010).
To use more familiar terms, the initial state includes all the antecedent stimuli, the goal-directed action is the behaviour, and the outcome is the consequence. Finally, the brain stores a representation of all this information as a “model.”
To make a decision, the brain uses information gathered from previous experiences to “tree search” the problem space and weigh up its options (Fig. 1). The brain compares the value of possible future states (i.e., outcomes/consequences) before deciding on and executing the behaviour (Fig. 2) that will likely lead to the most valuable outcome state (Dolan & Dayan, 2013).
Note: In reality, an animal must compare more than two alternative behaviours at each “step” in the behaviour chain path. The number of options/steps that are compared is limited by working memory. From Rogers, N. (2016). Stone tools, working memory and the brain: Investigating the cognitive and neural substrates of tool-use and tool-making [Doctoral dissertation]. University of New South Wales.
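The decision process described above can be sketched as a depth-limited tree search. This is a toy Python illustration, not a model taken from the cited papers. The states, actions, and values are hypothetical, and the `depth` parameter only loosely stands in for the working-memory limit mentioned in the note.

```python
# Hypothetical world model: each state maps actions to the state they are predicted to cause.
MODEL = {
    "home": {"head_to_campus": "campus_road", "head_to_lab": "lab_road"},
    "campus_road": {"keep_driving": "meeting"},
    "lab_road": {"keep_driving": "lab"},
}

# Hypothetical predicted value of each outcome state (unlisted states are worth 0).
STATE_VALUE = {"meeting": 10.0, "lab": 2.0}


def best_value(state: str, depth: int) -> float:
    """Highest predicted value reachable from `state` within `depth` further steps."""
    actions = MODEL.get(state, {})
    if depth == 0 or not actions:
        return STATE_VALUE.get(state, 0.0)
    return max(best_value(next_state, depth - 1) for next_state in actions.values())


def choose_action(state: str, depth: int) -> str:
    """Pick the action whose predicted future leads to the most valuable outcome state."""
    actions = MODEL[state]
    return max(actions, key=lambda action: best_value(actions[action], depth - 1))


print(choose_action("home", depth=2))  # head_to_campus
```

With `depth=1` the search stops at the intermediate road states. Those states carry no value, so both options look equally good, and only a deeper search reveals that the meeting is worth more.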
After the behaviour executes, the brain compares the value of the outcome state that actually occurred to the value of the predicted outcome state. This is how the system determines whether the prediction was correct (i.e., state prediction error). Essentially, the brain calculates if the outcome was better, worse, or the same as what was predicted and then updates the model accordingly.
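Following the description above, a minimal sketch of the comparison step might look like this. The outcome-value dictionary, the `learning_rate`, and the numbers are all assumptions for illustration. Note that the stored value belongs to the outcome itself, not to the stimulus or the behaviour.

```python
def compare_and_update(outcome_values: dict, outcome: str, experienced: float,
                       learning_rate: float = 0.5) -> tuple:
    """Compare experienced vs predicted outcome value, then update the model's estimate."""
    predicted = outcome_values[outcome]
    error = experienced - predicted  # positive: better than predicted; negative: worse
    if error > 0:
        verdict = "better"
    elif error < 0:
        verdict = "worse"
    else:
        verdict = "same"
    # Nudge the stored outcome value part of the way toward what was actually experienced.
    outcome_values[outcome] = predicted + learning_rate * error
    return verdict, error


values = {"treat": 5.0}  # hypothetical predicted value of the outcome "treat"
print(compare_and_update(values, "treat", 8.0))  # ('better', 3.0)
print(values["treat"])                            # 6.5
```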
The goal-directed system allows animals to behave in an adaptive, flexible way, but it does have quirks and limitations.
Goal-directed behaviour will only happen if the predicted value of the outcome is sufficiently motivating. For example, imagine a hungry dog has been trained to sit for high-value treats. So long as the initial state (hungry) and the outcome (treats) remain constant, the behaviour should reliably occur. But if the dog were full, or the treats were switched out for low-value kibble, the dog probably wouldn’t sit under goal-directed control.
Likewise, if the treats had made the dog sick in a completely different context, under goal-directed control, the dog probably won’t sit. This is known as latent learning. It’s a nuance that happens because model-based RL is not context-specific. Instead, information about the outcome’s value, and its relationship to the dog’s current state, is incorporated into the model irrespective of the context in which that information was acquired.
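The two examples above can be sketched as a toy Python function. The goal-directed system looks up the predicted outcome of each action and evaluates that outcome at decision time, using the dog’s current state. The action names, values, and `threshold` are hypothetical. The function deliberately takes no context argument: an outcome devalued anywhere counts as devalued everywhere.

```python
# Hypothetical action-outcome (A-O) model: sitting is predicted to produce a treat.
ACTION_OUTCOME = {"sit": "treat", "wander": "nothing"}


def outcome_value(outcome: str, hungry: bool, devalued: set) -> float:
    """Current value of an outcome, computed at decision time (numbers are illustrative)."""
    if outcome in devalued:      # e.g. treats that made the dog sick, wherever that happened
        return -10.0
    if outcome == "treat":
        return 5.0 if hungry else 0.5   # a full dog values treats far less
    return 0.0


def goal_directed_choice(hungry: bool, devalued: set, threshold: float = 1.0):
    """Return the best action, or None if no predicted outcome is motivating enough."""
    values = {a: outcome_value(o, hungry, devalued) for a, o in ACTION_OUTCOME.items()}
    best = max(values, key=values.get)
    return best if values[best] >= threshold else None


print(goal_directed_choice(hungry=True, devalued=set()))      # sit
print(goal_directed_choice(hungry=False, devalued=set()))     # None
print(goal_directed_choice(hungry=True, devalued={"treat"}))  # None
```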
But goal-directed control is costly and limited by things like working memory. So, because it’s impossible to tree search all possible choices in, say, a 45-minute drive to work, control of well-practised behaviours (and chains) can get shifted to the habit system.
Where Tolman conceptualised instrumental learning as “cognitive maps,” Thorndike envisioned an elaboration of reflexes whereby “satisfying” consequences “reinforced” the association between the stimulus and the response. Research has demonstrated that both Thorndike and Tolman were on the right track – there are indeed two parallel paths to instrumental learning.
Like goal-directed actions, the neurobiology underpinning “model-free” RL habits has also been extensively documented in a wide variety of species (e.g., Yin & Knowlton, 2006).
Habits are reflexively elicited following the perception of the stimulus. Habitual responses happen because the brain calculates that behaviour Y was previously valuable in the presence of stimulus X.
But unlike the model-based algorithm, the model-free algorithm does not contain any information about the outcome. Instead, the value of the behaviour is “cached” in the neural representation of the stimulus.
Put simply, when a trained dog sits habitually, they’re doing it because the person said “sit” (and that paid off in the past) – not because they currently want or expect a cookie.
With habits, the brain compares the experienced reward value to the predicted reward value (i.e., reward-prediction error). The brain then uses this information to update the value of the behaviour. So, if the reward value is better than expected, the habit system calculates, “That behaviour was valuable; let’s do it again!”
But, unlike the goal-directed system, the habit system only uses previous experiences within the same context to calculate value. And it uses all previous experiences in that context. So, the value of behaviour Y in the presence of stimulus X is the average of all encounters with this contingency.
In practice, this means that when the expected reward doesn’t happen (i.e., the behaviour has been put on extinction), the likelihood of a habitual response being repeated in the future only reduces a tiny bit.
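The averaging described above can be sketched in Python as an incremental mean, where each reward-prediction error is scaled by one over the number of experiences. This is a sketch under assumptions. The context, cue, and reward numbers are hypothetical. Many model-free learners use a constant learning rate instead, which weights recent experience more heavily. The exact average is used here only because it matches the description in the text.

```python
from collections import defaultdict


class HabitCache:
    """Model-free cached values for (context, stimulus, response) triples."""

    def __init__(self):
        self.value = defaultdict(float)  # cached value; no record of the outcome itself
        self.count = defaultdict(int)    # number of experiences of each triple

    def update(self, context: str, stimulus: str, response: str, reward: float) -> float:
        """Apply one experience and return the reward-prediction error."""
        key = (context, stimulus, response)
        self.count[key] += 1
        rpe = reward - self.value[key]            # experienced minus predicted reward
        self.value[key] += rpe / self.count[key]  # incremental mean of all experiences
        return rpe


habit = HabitCache()
for _ in range(100):  # 100 rewarded "sit" trials in the kitchen
    habit.update("kitchen", "sit cue", "sit", reward=1.0)
habit.update("kitchen", "sit cue", "sit", reward=0.0)  # one extinction trial
print(round(habit.value[("kitchen", "sit cue", "sit")], 3))  # 0.99
```

After 100 rewarded trials, a single unrewarded trial lowers the cached value by only about 1%, and the drop shrinks the more the behaviour has been practised. Experience in another context, such as the park, would update a different key entirely.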
It’s true – the more we practise a habit, the harder it is to change.
But updating value this way can lead to some curious – and frankly maladaptive – results. For example, think about the dog that learnt to sit for treats. Under the habit system, even if the dog had been poisoned by the treats in a different context, they would continue to sit on cue, even though the result would be dangerous!
These maladaptive habits occur because the system selects behaviour based only on the average reward value experienced in the same context. In order to update the expected reward value, the dog would need to directly experience the new contingency in the same context.
The habit system can’t integrate new information about the outcome (poisonous!) because the algorithm doesn’t contain a representation of the outcome.
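The contrast can be sketched in a few lines of Python. This is a toy illustration with hypothetical names and values. The goal-directed check reads the current value of the outcome, whereas the habit check reads only a number cached against the kitchen context. Devaluing the treats elsewhere therefore changes one answer but not the other.

```python
# Hypothetical stored values after training "sit" for treats in the kitchen.
ACTION_OUTCOME = {"sit": "treat"}                    # goal-directed (A-O) model
outcome_value = {"treat": 5.0}                       # value of the outcome itself
cached_value = {("kitchen", "sit cue", "sit"): 5.0}  # habit (S-R) cache: no outcome stored


def goal_directed_sits(threshold: float = 1.0) -> bool:
    """Evaluate sitting by looking up the current value of its predicted outcome."""
    return outcome_value[ACTION_OUTCOME["sit"]] >= threshold


def habit_sits(context: str, threshold: float = 1.0) -> bool:
    """Evaluate sitting from the cached value for this context; the outcome is never consulted."""
    return cached_value.get((context, "sit cue", "sit"), 0.0) >= threshold


def devalue_outcome(outcome: str, new_value: float) -> None:
    """Illness after eating the outcome, in any context, updates only the outcome's value."""
    outcome_value[outcome] = new_value


devalue_outcome("treat", -10.0)  # e.g. the treats made the dog sick at the park
print(goal_directed_sits())      # False
print(habit_sits("kitchen"))     # True
```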
Put another way, although I can drive a car under habitual control, the habit system can’t incorporate information about my new, actual goal. Instead, it calculates, “Well, this behaviour worked before – let’s do it again!”
It’s a good reminder that behaviour doesn’t need to be pathological to be maladaptive. These behavioural “Freudian slips” are completely normal. Switching from well-practised (but maladaptive) habits to less well-practised (but adaptive) goal-directed behaviours requires executive control from the prefrontal cortex. But that’s hard, especially when we’re stressed.
Behavioural control is most likely to become habitual:
- when training in one context,
- following overtraining,
- when using interval reinforcement schedules,
- when executive resources are depleted (e.g., stress), and
- in active avoidance paradigms (i.e., negative reinforcement).
This is important to note because aggressive behaviours often perform an avoidance (or escape) function.
Although avoidance can be lifesaving, it is defined as maladaptive if avoidance occurs when there’s no real threat and/or when an alternative behaviour leads to reward (e.g., Ball & Gunaydin, 2022; Krypotos et al., 2015).
Avoidance does not always involve fear
During initial avoidance conditioning, the threatening stimulus (“trigger”) elicits a Pavlovian fear response, which in turn motivates the avoidance behaviour (e.g., Oleson et al., 2012). However, with repeated successful avoidance trials (e.g., aggressive behaviour successfully prevented discomfort), the trigger no longer elicits fear.
Instead, the avoidance behaviour (and sometimes even the formerly aversive cue) becomes associated with safety. Some brain structures even code the well-practised cue → avoidance sequence as if it were genuinely rewarding.
This explains what some aggression experts have reported for years – it’s not always appropriate to address aggressive behaviours from the perspective of resolving underlying fear. Fear may not play a role, even if it did initially. This “decoupling” of fear and avoidance is one reason aggression persists despite desensitisation/counterconditioning.
The persistence of avoidance behaviours is well known, and only minimal contact with the aversive stimulus is needed to maintain “avoidance habits” long-term. In one study, a dog continued to exhibit avoidance despite 500 extinction trials (Solomon et al., 1953). Likewise, Sidman (1955) reported, “It was common for the animals to emit 500 to 6,000 responses while receiving 20 or fewer shocks.” One individual exhibited over 1,500 avoidance responses in a 3.5-hour session without receiving a single shock.
This emphasises the need to strive for sub-threshold, errorless learning. One bad experience can undo a lot of hard work and new learning can’t happen while the aggressive avoidance behaviour is being exhibited.
Resolving habitual aggression with replacement behaviours
When we first teach a dog to exhibit a replacement behaviour instead of aggressing, control of the replacement behaviour will be goal-directed. It is advantageous to maintain goal-directed control long-term: the learner has the benefit of outcome value driving their choice selection, and it circumvents the need for lengthy extinction procedures.
Goal-directed control also enables the learner to flexibly exhibit the replacement behaviour in a range of contexts (e.g., antecedents, establishing operations). Indeed, rehearsing the replacement behaviour in different contexts should, at least theoretically, encourage goal-directed control (so long as executive resources can be maintained).
Games-based or “concept” training, where flexible exhibition of replacement behaviours in a range of analogous contexts is promoted, should also help with this endeavour. To quote Absolute Dogs, “Train for the situation, not in the situation.”
It is essential, however, to emphasise that habits themselves are impervious to strategies that primarily target establishing operations or model-based representation of the outcome (e.g., enrichment, concept training, games-based training).
That’s not to say that these strategies are not valuable. On the contrary, we know they are – I use them daily!
But these strategies will not affect the propensity for an animal to exhibit a truly habitual response if loss of goal-directed control occurs in the problematic context. That would require extinction of the habitual contingency, and that would require hundreds if not thousands of sub-threshold trials in context – yet another reason why knowledge, expertise and careful long-term management are required with aggression cases.
So, next time you (or your dog) do something weird when you’re stressed out, tired, or distracted, know that it’s probably just the habit system trying to help out in the only way it knows how – this worked before, let’s do it again.
It might not appease a cranky boss when you drive to the wrong place and miss your meeting, but hopefully this info can help us be more empathetic to our dogs and ourselves.
References
Ball, T. M., & Gunaydin, L. A. (2022). Measuring maladaptive avoidance: from animal models to clinical anxiety. Neuropsychopharmacology, 47(5), 978-986.
Balleine, B. W., & O’Doherty, J. P. (2010). Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology, 35(1), 48-69.
Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 308(1135), 67-78.
Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80(2), 312-325.
Krypotos, A. M., Effting, M., Kindt, M., & Beckers, T. (2015). Avoidance learning: a review of theoretical models and recent developments. Frontiers in Behavioral Neuroscience, 9, 189.
Oleson, E. B., Gentry, R. N., Chioma, V. C., & Cheer, J. F. (2012). Subsecond dopamine release in the nucleus accumbens predicts conditioned punishment and its successful avoidance. Journal of Neuroscience, 32(42), 14804-14808.
Rogers, N. (2016). Stone tools, working memory and the brain: Investigating the cognitive and neural substrates of tool-use and tool-making [Doctoral dissertation]. University of New South Wales.
Yin, H. H., & Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7(6), 464-476.