Undergraduate BMS236 Building Nervous Systems (Basal ganglia/Pete Redgrave lectures). Mind map on Basal ganglia reinforcement, created by Kristi Brogden on 04-08-2014.
“Any act which in a given situation produces satisfaction becomes associated with that situation so that when the situation recurs the act is more likely than before to recur also”
In the basal ganglia
Selective disinhibition in the parallel looped architecture component of basal ganglia = selection mechanism
Reinforcement learning comprises processes that bias future selections
Processes of reinforcement are likely to operate within a selection machine
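A minimal Python sketch of the two ideas above: selection by disinhibition, and reinforcement biasing future selections. The function names, salience values, tonic inhibition level and learning rate are illustrative assumptions, not taken from the lectures.

```python
def select_by_disinhibition(saliences, tonic_inhibition=1.0):
    """Pick the most salient channel and release only that channel from inhibition.

    All channels start under tonic inhibition (assumed value 1.0);
    the winning channel's inhibition is set to 0.0.
    """
    winner = max(range(len(saliences)), key=lambda i: saliences[i])
    inhibition = [tonic_inhibition] * len(saliences)
    inhibition[winner] = 0.0  # selective disinhibition of the winning channel
    return winner, inhibition


def reinforce(saliences, chosen, reward, rate=0.5):
    """Return new saliences with the chosen channel biased by the reward (hypothetical rule)."""
    updated = list(saliences)
    updated[chosen] += rate * reward
    return updated


saliences = [0.2, 0.5, 0.4]
winner_before, gate = select_by_disinhibition(saliences)

# Suppose channel 2 was tried during exploration and was followed by reward.
saliences = reinforce(saliences, chosen=2, reward=1.0)
winner_after, _ = select_by_disinhibition(saliences)
```

Under these assumed numbers, channel 1 wins the first competition and channel 2 wins after it has been reinforced, which is the sense in which reinforcement biases future selections.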
Phasic dopamine is widely acknowledged to provide a reinforcement signal
Short latency (70-100 ms)
Short duration (~100 ms) burst of impulses
Elicited by biologically salient stimuli
Defining characteristics of phasic dopamine signals
Fast and short
Mono-phasic
Bi-phasic
Post-gaze shift
Insight
Sensory-evoked phasic DA responses seem to operate like a time-stamp
What are the signals in DA target regions at the time of the DA time-stamp?
... these are the signals the timed dopamine input will be interacting with
Reward prediction errors
Phasic DA signals are similar to the reward prediction error term in the temporal difference (TD) reinforcement learning algorithm (Barto, Montague, Dayan)
Reward prediction errors = unexpected sensory events that are ‘better’ or ‘worse’ than predicted
Reward prediction errors reinforce the selection of actions that will maximise the future acquisition of reward
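In the standard TD formulation the prediction error is delta = r + gamma * V(s') - V(s), and the value estimate is nudged by alpha * delta. The Python sketch below illustrates that term only; the state names, the discount factor gamma = 0.9 and the learning rate alpha = 0.1 are assumptions chosen for illustration.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Reward prediction error: positive if 'better', negative if 'worse' than predicted."""
    return reward + gamma * value_next - value_current


def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Move the value of `state` towards the TD target and return the error."""
    delta = td_error(reward, values[next_state], values[state], gamma)
    values[state] += alpha * delta
    return delta


# Hypothetical two-state episode: a cue followed by an outcome.
values = {"cue": 0.0, "outcome": 0.0}

unexpected = td_update(values, "cue", "outcome", reward=1.0)   # reward not yet predicted
fully_predicted = td_error(reward=1.0, value_next=0.0, value_current=1.0)
omitted = td_error(reward=0.0, value_next=0.0, value_current=1.0)
```

An unpredicted reward gives a positive error, a fully predicted reward gives zero, and an omitted predicted reward gives a negative error, matching the 'better' or 'worse' than predicted description.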
Action discovery problem
Actions are multi-dimensional
Where must the action take place?
When must the action take place?
What exactly must be done to what?
How fast and with what force?
How are critical parameters of different dimensions discovered?
Development of novel actions
Trial and error repetition
DA makes the agent "want" to repeat/reselect preceding movements in preceding contexts
Variation/exploration
Not all contextual/behavioural components recur in each iteration
Mechanism
LTP in the presence of phasic DA
LTD in the absence of phasic DA
Provides the reinforcement required for the system to converge on critical causative components
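A toy Python sketch of how a dopamine-gated LTP/LTD rule could converge on the causative component. It assumes a small set of binary components, one of which (index 0) always triggers the phasic DA burst when present, and it enumerates every on/off combination as a stand-in for variation across iterations. The function name, step sizes and cycle count are hypothetical.

```python
from itertools import product


def discover_critical_component(n_components=3, critical=0, cycles=5,
                                ltp=0.1, ltd=0.1):
    """Return one weight per component after repeated dopamine-gated updates."""
    weights = [0.0] * n_components
    for _ in range(cycles):
        # Each pattern is one iteration with a different subset of components active.
        for pattern in product([0, 1], repeat=n_components):
            dopamine = pattern[critical] == 1  # assumed: critical component causes the DA burst
            for i, active in enumerate(pattern):
                if not active:
                    continue  # inactive components are left unchanged
                # LTP if phasic DA follows, LTD if it does not.
                weights[i] += ltp if dopamine else -ltd
    return weights


weights = discover_critical_component()
best = max(range(len(weights)), key=lambda i: weights[i])
```

Under these assumptions the critical component is potentiated every time it is active, while the others are potentiated and depressed equally often, so only the critical weight grows.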
How do we test if it's true?
A behavioural paradigm to investigate different aspects of action discovery
1) Mechanisms of reinforcement
2) Convergence on critical parameters of the critical 3WH dimensions
Ideal task requirements
1) Must be able to discriminate learning of WHERE, WHEN, WHAT and HOW dimensions
2) Difficulty should be continuously variable
3) Repeated measures
4) Same task used to investigate comparative competences of a range of subjects – rodent, monkey, man and robot
5) Should be simple, practical and efficient – different versions to suit experimental context