but this is not how motor learning works: b/c the brain doesn't know how it produced a certain type of
neural activity.
Procedural Learning
reinforcement mechanisms are required for motor learning
Most of the neural activity produced by the brain when trying to influence behaviour is thought to be blocked, b/c the neural activity is not strong enough to
get through all the filters in the brain.
Anytime there is movement, some neural activity was successful in getting through the action
selection filter.
at every moment, the brain is assessing whether things are getting better or worse for the animal.
this decision will result in increases or decreases in a reinforcement signal in the brain, which acts to
strengthen or weaken recent neural activity, such that it becomes more or less likely that that neural
activity will make it past the action selection filter next time a similar context arises
Most movement-related neural activity is probably randomly generated
Prediction Error Signal:
signifies an unexpected change in the value of your current situation
signals the brain that credit or blame should be assigned.
credit or blame is used to adjust your value estimates of things (cost-benefit) and
your probability of repeating that behaviour in the future.
Prediction Error Signal
Formula:
Prediction error = actual value of the current situation − expected value of the current situation
Fluctuations in the reinforcement signal
make it more or less likely that your brain will generate the same neural activity in a similar situation
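A minimal Python sketch of the formula above (the numbers and the learning rate are assumptions, not from the lecture): the error is what actually happened minus what was expected, and it nudges the stored expectation so the same surprise shrinks on repeated trials.

```python
def prediction_error(actual_value: float, expected_value: float) -> float:
    """Positive = better than expected, negative = worse than expected."""
    return actual_value - expected_value

def update_value(expected_value: float, actual_value: float, learning_rate: float = 0.1) -> float:
    """Move the value estimate a small step toward what actually happened."""
    return expected_value + learning_rate * prediction_error(actual_value, expected_value)

# Example: you expected the situation to be worth 0.5 but it turned out to be worth 1.0.
expected = 0.5
for trial in range(5):
    error = prediction_error(1.0, expected)   # shrinks toward 0 as expectations catch up
    expected = update_value(expected, 1.0)
    print(f"trial {trial}: error = {error:.3f}, new expectation = {expected:.3f}")
```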
we learn through trial and error over many repetitions
i.e. how to move our muscles
i.e. CPP
unclear if this is SS or PL
Process
Step 1: animals envision what movement they want to make
Step 2: animals reinforce any neural activity that gets them closer to achieving that goal
you don't need to have a goal for reinforcement learning to improve your life: any randomly generated
neural activity that seems to improve your situation in life in any manner would
be reinforced
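A toy Python sketch of this trial-and-error process (the command names and numbers are hypothetical, not from the lecture): randomly chosen motor commands compete, and whichever one happens to improve the animal's situation has its strength reinforced, so it becomes more likely to win next time.

```python
import random

commands = {"reach_left": 1.0, "reach_right": 1.0, "freeze": 1.0}  # initial strengths
helpful = "reach_right"                                            # the one that improves things here

def pick_command(strengths):
    """Choose a command with probability proportional to its current strength."""
    total = sum(strengths.values())
    r = random.uniform(0.0, total)
    for name, weight in strengths.items():
        r -= weight
        if r <= 0:
            return name
    return name  # fallback for floating-point edge cases

for _ in range(200):
    chosen = pick_command(commands)
    got_better = 1.0 if chosen == helpful else 0.0   # "did things improve?"
    # reinforce or weaken the recently active command (floor keeps strengths positive)
    commands[chosen] = max(0.05, commands[chosen] + 0.2 * (got_better - 0.5))

print(commands)  # the helpful command ends up far more likely to pass the filter
```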
Where in the brain do actions get reinforced?
Dopaminergic Projections of the Rat Brain
The Dopamine System
Dopamine as a Prediction Error Signal
Dopamine released
estimation of the current moment is better than you anticipated it to be
Dopamine withheld
estimation of the current moment is worse than you anticipated it to be
Dopamine signalling will strengthen recently active glutamatergic synapses
the motor commands those glutamatergic synapses encode become more likely to win
As your expectations grow, the dopamine system becomes more and more selective
very few dopamine neurons
large
sends many projections
homogeneous group of neurons; all fire at the same time
(excluding PFC-projecting neurons)
unmyelinated & can’t fire v. fast (0-40 Hz)
dopamine is cleared from the extracellular space ~100 times slower than glutamate or GABA
All motor commands are sent to the NAc (striatum)
this is the input nucleus of the basal ganglia
information is encoded in excitatory, glutamatergic inputs from the
cortex
striatum also receives dopamine
Glutamate inputs
what gets reinforced
(actual information, movements, decision = motor commands)
Dopamine inputs
reinforcement signals
Three-Factor Rule
DEF
neurons that fire together become eligible for dopamine-induced synaptic plasticity.
changes in dopamine levels strengthen or weaken recently active glutamatergic synapses in the
striatum
if dopamine stays at baseline, no learning occurs
strength of glutamatergic inputs in the striatum can change when the synapses experience:
Postsynaptic activity
Pre-synaptic activity
Abrupt changes in local dopamine receptor activity
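A rough Python sketch of the three-factor rule described above (the function and parameter names are assumptions): presynaptic plus postsynaptic activity makes a synapse eligible, and how far dopamine departs from baseline decides whether the eligible synapse is strengthened, weakened, or left alone.

```python
def three_factor_update(weight: float,
                        pre_active: bool,
                        post_active: bool,
                        dopamine: float,
                        baseline: float = 1.0,
                        learning_rate: float = 0.1) -> float:
    eligible = pre_active and post_active            # factors 1 and 2: "fire together"
    if not eligible:
        return weight                                # no eligibility, no plasticity
    return weight + learning_rate * (dopamine - baseline)   # factor 3: dopamine burst or dip

w = 0.5
w = three_factor_update(w, pre_active=True,  post_active=True,  dopamine=3.0)  # burst -> strengthen
w = three_factor_update(w, pre_active=True,  post_active=True,  dopamine=1.0)  # baseline -> no change
w = three_factor_update(w, pre_active=True,  post_active=False, dopamine=3.0)  # not eligible -> no change
w = three_factor_update(w, pre_active=True,  post_active=True,  dopamine=0.0)  # dip -> weaken
print(round(w, 2))
```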
Dopamine Neurons Encode a Prediction Error
Full Experiment
guaranteed reward: 100% predicted by the stimulus; dopamine neurons will fire when the stimulus
is presented, but won't fire when the reward is presented
unexpected reward: produces the same amount of dopamine neuron activation as the amount of
firing that occurs when the animal receives a stimulus telling it that a reward is guaranteed.
dopamine neurone activity is reduced (response is muted)
dopamine response occurs when the animal first realizes the session will start
Test: Electrophysiological recordings of dopamine neurons in a monkey when it unexpectedly receives
food.
raster plot graph
each row is a 2 second long interval
Halfway into each trial, the monkey unexpectedly gets a juice reward.
Each dopamine neuron fires about 5-10 times across each 1 second long interval
first trial = top row, last trial = bottom row
Black dot = when a midbrain dopamine neuron fired an action potential
summary histogram
summary of how many dopamine neurons fire at any point in time, but the data is summed across 50 trials
'collapsed raster plots'
50% prediction = you get half the dopamine activity at the cue that you would for a 100% unexpected reward
neurons will fire to the stimulus and again when they receive the reward: these two responses sum to the
amount of dopamine activity that occurs when the reward is guaranteed or unexpected
guaranteed reward
firing of the dopamine neurons won't change when the reward itself is delivered
The amount that dopamine neurones fire upon stimulus presentation depends on how well the
stimulus predicts reward
25% prediction
if the animal can sense that there is a 25% chance it will receive a reward, the cue is worth 25% of the
Kool-Aid's value to the animal, and the remaining 75% of the value is signalled when it receives the
reward.
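A back-of-the-envelope Python version of this split (the reward value of 1.0 is arbitrary): the response at the cue scales with the predictive probability p, the response at delivery carries the remaining 1 − p, and the total stays the same.

```python
reward_value = 1.0
for p in (0.0, 0.25, 0.5, 1.0):
    response_at_cue = p * reward_value                    # expectation jumps by the cue's value
    response_at_reward = reward_value - p * reward_value  # actual minus expected at delivery
    print(f"p = {p:.2f}: cue response = {response_at_cue:.2f}, "
          f"reward response = {response_at_reward:.2f}, "
          f"total = {response_at_cue + response_at_reward:.2f}")
```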
Expectation of Reward Affects Learning
Demonstration from Blocking
e.g. tone, light, and sugar water
Expectation of Reward Affects Dopamine Neuron Activity
when a reward-predictive stimulus is only ever presented together with another stimulus that has
already been learned to be fully predictive of the reward, learning about the second stimulus is blocked
the 2nd stimulus is a redundant cue
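A small simulation of blocking using a Rescorla-Wagner-style update (the learning rate and trial counts are assumptions, not the lecture's experiment): once the tone fully predicts the sugar water, the prediction error during tone + light trials is ~0, so the light gains almost no value.

```python
alpha, reward = 0.3, 1.0
V = {"tone": 0.0, "light": 0.0}      # learned values (associative strengths)

for _ in range(30):                  # Stage 1: tone -> sugar water
    error = reward - V["tone"]
    V["tone"] += alpha * error

for _ in range(30):                  # Stage 2: tone + light -> sugar water
    error = reward - (V["tone"] + V["light"])   # error computed on the summed prediction
    V["tone"] += alpha * error
    V["light"] += alpha * error

print(V)   # tone ends up near 1.0; light stays near 0.0 -> learning about it was blocked
```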
How does dopamine discriminate between cues when it reinforces neural activity?
Second-Order Conditioning
by the 15th trial, dopamine neurons
only fire to the cue, not the reward
by the 5th trial, dopamine neurons
only fire a little to the reward
you can see the dopamine signalling move back further and further to the earliest predictor, to the first moment the animal's
expectations change, the first time the animal can predict that a reward will be presented
people think different types of learning have different time windows involved
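One common way to model this backward shift is temporal-difference (TD) learning; the sketch below is an illustrative assumption, not the lecture's exact experiment. Early in training the prediction error sits at reward delivery; after many trials the jump in expected value (the dopamine-like response) has moved to the cue, the earliest predictor.

```python
import numpy as np

# States run from cue onset to reward delivery; the value just before the cue is
# treated as 0 because the cue's timing is unpredictable from the preceding interval.
n_states, alpha, gamma, reward = 6, 0.2, 1.0, 1.0
V = np.zeros(n_states + 1)           # V[0] = cue state ... V[n_states - 1] = reward state

def run_trial():
    deltas = np.zeros(n_states)
    for t in range(n_states):
        r = reward if t == n_states - 1 else 0.0
        deltas[t] = r + gamma * V[t + 1] - V[t]   # TD prediction error at this time step
        V[t] += alpha * deltas[t]
    cue_response = V[0] - 0.0                     # expectation jumps from ~0 (pre-cue) to V[0]
    return cue_response, deltas

for trial in range(300):
    cue_response, deltas = run_trial()
    if trial in (0, 299):
        print(f"trial {trial:3d}: response at cue ~ {cue_response:.2f}, "
              f"error at reward ~ {deltas[-1]:.2f}")
# Early on the error sits at reward delivery; by the end it has moved to the cue.
```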
Pleasure Versus Prediction Error
Dopamine has Two Distinct Functions
Phasic versus tonic dopamine signalling
phasic dopamine doesn’t correlate well with perceived pleasure
b/c pleasure occurs in a situation dependent manner
dopamine is a teaching signal that notifies the brain as soon as your anticipated value changes.
if the timing and amount of pleasure of a rewarding event were largely anticipated, phasic dopamine
signals will not change during the event itself and little to no learning will occur.
test: if we optogenetically stimulate dopamine neurons while the animals are drinking the reward
(sugar water), this should cause the animals to learn about the light stimulus
stimulating dopamine neurons makes the animal think that something better than expected
happened
similar to drugs
Messing with Tonic Dopamine Signalling
if you artificially inc. dopamine receptor activity
seem to be more engaged with their environment
are more willing to take risks and do hard things to get rewards
seem more motivated
if you artificially dec. dopamine receptor activity
seem less motivated
seem to be less engaged with their environment
if you lose all dopamine receptor activity (severe Parkinson’s) you can’t initiate purposeful
movement.
How does dopamine discriminate between cues when it reinforces neural activity?
dopamine neurons are needed to learn about antecedent cues
when the stimulus is presented, dopamine neurons fire; that is the moment when the value of
the world changes
however, after this moment there is no change in expectation, so no further
learning needs to occur
Negative Prediction Error
current situation is worse than anticipated.
Dopamine neurons abruptly stop firing when an expected reward is not received
the pause in dopamine firing withholds roughly the same amount of dopamine as was released earlier to the predictor
the amount released to the predictor was the same amount as what they believe the value of the reward to be
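A tiny numerical example of a negative prediction error (the values are assumed for illustration): the cue set the expectation at 1.0, nothing was delivered, so the error is negative and dopamine dips below baseline.

```python
expected, actual = 1.0, 0.0
prediction_error = actual - expected
print(prediction_error)   # -1.0 -> dopamine pauses below baseline instead of bursting
```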
Dopamine has 2 distinct forms
phasic dopamine signals regulate reinforcement learning by encoding a feedback signal ( a
prediction error signal)
Abrupt changes in phasic dopamine signals drive learning; the baseline (tonic) dopamine level depends on:
a) firing rate per neuron
b) number of neurons firing (across all neurons)
c) amount of phasic activity
all three can cause changes in the resting (tonic) level of extracellular dopamine
“high gain setting”
high background levels of dopamine
in the extracellular space of the striatum
easy to get actions through the filters; willingness to exert effort
the gain setting determines:
a) how excitable neurons are within the striatum
b) how responsive they are to glutamate inputs
low gain setting
unmotivated to do anything
learning = phasic dopamine
motivation = tonic dopamine
in general, dopamine neurons fire at ~4 Hz
slight changes in a) b) c) can cause large changes in the resting (tonic) amount of extracellular
dopamine levels.
speed of this tonic activity
number of dopamine neurones that are participating (firing)
amount of phasic activity
influence the ‘gain’ settings in the system
willingness to exert effort corresponds to the animal's estimate of the overall value of its current situation
tonic, baseline dopamine levels regulate motivational state
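A hypothetical sketch of the gain-setting idea (the gain function, numbers, and threshold are made up for illustration): tonic extracellular dopamine scales how strongly striatal neurons respond to the same glutamatergic motor command, which changes how easily that command clears the action-selection filter.

```python
def striatal_response(glutamate_input: float, tonic_dopamine: float) -> float:
    gain = 0.5 + tonic_dopamine            # more tonic dopamine -> higher gain
    return gain * glutamate_input

FILTER_THRESHOLD = 1.0                     # response needed to pass the action-selection filter

for tonic in (0.1, 0.6, 1.2):              # low, medium, high tonic dopamine
    response = striatal_response(glutamate_input=1.0, tonic_dopamine=tonic)
    print(f"tonic DA = {tonic:.1f}: response = {response:.2f}, "
          f"passes filter = {response >= FILTER_THRESHOLD}")
```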
presynaptic activity = glutamate input into the basal ganglia
when that occurs there is the potential to change the strength of that synapse; synaptic strength
decreases when dopamine signals decrease (and vice versa)
dopamine neurons usually fire around 4 Hz
(firing rate corresponds to the animal's motivation and effort)
dopamine is cleared from the extracellular space slowly
there will be more dopamine in the system if a neuron is firing at 6 Hz than if it is firing
at 2 Hz
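A rough steady-state sketch of why firing rate matters when clearance is slow (all parameters are made up for illustration): each spike adds a pulse of dopamine and only a fraction of the extracellular pool is cleared per second, so at steady state input (rate × pulse) balances clearance (fraction × level), and a 6 Hz neuron settles at a higher level than a 2 Hz neuron.

```python
def steady_state_dopamine(firing_rate_hz: float,
                          pulse_per_spike: float = 1.0,
                          cleared_fraction_per_second: float = 0.5) -> float:
    # At steady state: firing_rate * pulse_per_spike = cleared_fraction * level
    return firing_rate_hz * pulse_per_spike / cleared_fraction_per_second

for rate_hz in (2.0, 4.0, 6.0):
    level = steady_state_dopamine(rate_hz)
    print(f"{rate_hz:.0f} Hz -> extracellular dopamine ~ {level:.1f} (arbitrary units)")
```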