Perception (I don't know what you hear; do
you hear the same thing as me?) I may
hear the same sound waves, but I will
process them differently because
perception is in part formed by our own
opinions, prejudices, training, past
experiences, etc.
Bottom-Up (Primitive Image Schemata) -
a pre-attentive partitioning process based
on Gestalt principles (Bregman, 1990).
Primitive processes, the subject of most ASA research, rely on cues provided by the acoustic structure
of the sensory input. These processes are thought to be innate and are found in non-human animals
(Wisniewski and Hulse, 1997). They have been shown to be present in the perception of speech
(Darwin and Carlyon, 1995) and of music (Bregman, 1990, Ch 4 for music and Ch 5 for speech). The
primitive processes take advantage of regularities in how sounds are produced in virtually all natural
environments (e.g., unrelated sounds rarely start at precisely the same time).
Darwin, C. J., & Carlyon, R. P. (1995). Auditory Grouping. In B. C. J. Moore (Ed.), Handbook of perception
and cognition: Hearing. (2nd ed., pp. 387–424). London, UK: Academic Press.
Wisniewski, A. B., & Hulse, S. H. (1997). Auditory scene analysis in European starlings (Sturnus
vulgaris): Discrimination of song segments, their segregation from multiple and reversed conspecific
songs, and evidence for conspecific song categorization. Journal of Comparative Psychology, 111(4),
337–350.
Top-Down (Learned through training, experience etc.) - a
schema driven process that uses prior knowledge to extract
meaning from the acoustic representation.
Top-down processes, on the other hand, are those involving conscious attention, or that are based on
past experience with certain classes of sounds – for example the processes employed by a listener in
singling out one melody in a mixture of two (Dowling, Lung, and Herrbold, 1987).
Dowling, W. J., Lung, K. M.-T., & Herrbold, S. (1987). Aiming attention in pitch and time in the
perception of interleaved melodies. Perception & Psychophysics, 41(6), 642–656.
https://doi.org/10.3758/BF03210496
Shamma, S. A., Elhilali, M., & Micheyl, C. (2011).
Temporal Coherence and Attention in
Auditory Scene Analysis. Trends in
Neurosciences, 34(3), 114–123.
https://doi.org/10.1016/j.tins.2010.11.002
Human Auditory System
Physiology (The human ear and auditory system)
Psychological processing of sound waves picked up by the ear.
Streaming (Audio Objects)
Sequential Streaming (connects sense data over time)
Differences in pitch
Differences in timbre
Differences in spatial location
Differences in fundamental frequency
Transitions
Gradual transitions tend to group together
Abrupt transitions tend to separate
Cumulative Effects (hearing out different streams over time)
Simultaneous Streaming (selects from data arriving at the same time)
'Periodic' sounds - many
sounds are made up of
frequencies which are integer
multiples of a common
fundamental
Human Voice
Animal Calls
Many musical instruments
If a mixture contains two or more sets of
frequencies related to different
fundamentals, they tend to be separated.
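The harmonicity cue described above can be illustrated with a minimal sketch. It assumes the candidate fundamentals are already known and simply assigns each frequency component to the first fundamental of which it is a near-integer multiple. The function name `group_by_fundamental`, the 2% tolerance, and the example frequencies are all hypothetical choices made for illustration; they are not taken from the cited literature and are not a model of how the auditory system actually does this.

```python
def group_by_fundamental(partials_hz, candidate_f0s_hz, tolerance=0.02):
    """Group partials by the candidate fundamental they are harmonics of.

    A partial is treated as harmonic n of f0 when its frequency lies
    within `tolerance` (a fraction, assumed value) of n * f0.
    Partials matching no candidate are returned separately.
    """
    groups = {f0: [] for f0 in candidate_f0s_hz}
    unassigned = []
    for f in partials_hz:
        for f0 in candidate_f0s_hz:
            ratio = f / f0
            harmonic = round(ratio)  # nearest harmonic number
            if harmonic >= 1 and abs(ratio - harmonic) <= tolerance * harmonic:
                groups[f0].append(f)
                break
        else:
            # No candidate fundamental explains this component.
            unassigned.append(f)
    return groups, unassigned


# Hypothetical mixture: harmonics of 200 Hz and 310 Hz plus one stray component.
mixture = [200.0, 310.0, 400.0, 455.0, 600.0, 620.0, 800.0, 930.0]
groups, unassigned = group_by_fundamental(mixture, [200.0, 310.0])
print(groups)
print(unassigned)
```

Because candidates are tried in order, a component that fits two fundamentals goes to the first one; a fuller treatment would need a rule for such shared harmonics.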
Stream Segregation Affected by
Speed of sequence
Frequency separation of sounds
Pitch separation of sounds
Spatial location of sounds
Other factors...
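Two of the factors listed above, speed of sequence and frequency separation, are often varied together in listening tests using repeating tone patterns. The sketch below only builds the timing and frequency schedule for an A-B-A-silence pattern, which is an assumed stimulus layout not described in these notes. The function name `aba_sequence` and all parameter values are hypothetical, and the code makes no claim about where listeners switch from hearing one stream to hearing two.

```python
def aba_sequence(base_hz, semitone_sep, tone_ms, n_triplets):
    """Return (onset_ms, frequency_hz) pairs for an A-B-A-silence pattern.

    base_hz      : frequency of the A tone
    semitone_sep : separation of the B tone above A, in semitones
    tone_ms      : duration of each slot; shorter slots give a faster sequence
    n_triplets   : number of A-B-A groups to generate
    """
    b_hz = base_hz * 2 ** (semitone_sep / 12)  # equal-tempered interval
    events = []
    for i in range(n_triplets):
        start = i * 4 * tone_ms  # three tone slots plus one silent slot
        events.append((start, base_hz))
        events.append((start + tone_ms, b_hz))
        events.append((start + 2 * tone_ms, base_hz))
    return events


# Hypothetical settings: A = 440 Hz, B one octave higher, 100 ms slots.
for onset, freq in aba_sequence(440.0, 12, 100, 2):
    print(onset, freq)
```

Reducing `tone_ms` raises the speed of the sequence, and raising `semitone_sep` widens the frequency separation, so the two factors can be varied independently.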
In a typical listening situation, different acoustic sources are active at the same time. Therefore, only
the sum of their spectra will reach the listener’s ear. For individual sound patterns to be recognized –
such as those arriving from the human voice in a mixture – the incoming auditory information has to
be partitioned, and the correct subset allocated to individual sounds, so that an accurate description
may be formed for each. This process of grouping and segregating sensory data into separate mental
representations, called auditory streams, has been named "auditory scene analysis" (ASA) by
Bregman (1990).
Bregman, A. S. (2004). Auditory Scene
Analysis. In International Encyclopedia of
the Social and Behavioral Sciences.
Amsterdam: Pergamon (Elsevier). Retrieved
from
http://webpages.mcgill.ca/staff/Group2/abregm1/web/pdf/2004_Encyclopedia-Soc-Behav-Sci.pdf
Bregman, A. S. (1990). Auditory Scene Analysis: The
Perceptual Organization of Sound. Cambridge,
MA: The MIT Press.
There are three processes occurring in the human listener that serve to decompose auditory
mixtures: 1) The activation of learned schemas in a purely automatic way. You imagine you hear your
name spoken when some chance combination of sounds activates the mental schema that represents
the sound of your name. 2) The use of schemas in a voluntary way, as when you are trying to hear your
name being called by an announcer in a busy office. 3) 'Primitive Auditory Scene Analysis' (as per
Bregman, 1990): the use of general acoustic regularities to decompose an auditory scene into its
constituent parts.
Regularity 1: Unrelated sounds seldom start or stop at exactly the same time.
Regularity 2: Gradualness of change. A single sound tends to change its
properties smoothly and slowly. A sequence of sounds from the same source
tends to change its properties slowly.
Regularity 3: When a body vibrates with a repetitive period, its vibrations give
rise to an acoustic pattern in which the frequency components are multiples of
a common fundamental.
Regularity 4: Many changes that take place in an acoustic event will affect all the
components of the resulting sound in the same way and at the same time.
Bregman, A. (1993). Auditory Scene
Analysis: Hearing in Complex
Environments. In S. McAdams & E.
Bigand (Eds.), Thinking in Sound:
The Cognitive Psychology of
Human Audition (pp. 10–36).
Oxford, UK: Oxford University
Press.
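Regularity 1 (unrelated sounds seldom start at exactly the same time) suggests a simple grouping rule: components whose onsets nearly coincide probably belong to the same source. The sketch below is a toy illustration under that assumption. The function name `group_by_onset`, the 30 ms window, and the example components are hypothetical values chosen for the example, not figures from Bregman (1993).

```python
def group_by_onset(components, window_ms=30.0):
    """Group (onset_ms, freq_hz) components by near-simultaneous onset.

    Components are sorted by onset; a component joins the current group
    if it starts within `window_ms` (assumed value) of that group's
    earliest onset, otherwise it starts a new group.
    """
    groups = []
    for onset, freq in sorted(components):
        if groups and onset - groups[-1][0][0] <= window_ms:
            groups[-1].append((onset, freq))
        else:
            groups.append([(onset, freq)])
    return groups


# Hypothetical components: three start together, two start 250 ms later.
components = [(0, 200), (2, 400), (1, 600), (250, 310), (251, 620)]
for group in group_by_onset(components):
    print(group)
```

A fuller treatment would combine this cue with the others, for example common offsets and the shared changes described in Regularity 4, since onset timing alone cannot separate sources that happen to start together.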
"...it appears that sounds are processed
primarily as meaningful events and where
source identification fails sounds are
processed according to physical or low level
perceptual parameters."
Woodcock, J., Davies, W. J., Cox, T. J.,
& Melchior, F. (2016).
Categorization of Broadcast Audio Objects
in Complex Auditory Scenes. Journal of the
Audio Engineering Society, 64
"...the evaluation of sounds produced by
non-living objects is biased towards low
level acoustic features whereas the
processing of sounds produced by liv- ing
creatures is biased toward sound
independent semantic information."
Giordano, B. L., McDonnell, J., &
McAdams, S. (2010). Hearing living
symbols and nonliving icons: Category
specificities in the cognitive processing
of environmental sounds. Brain and
Cognition, 73(1), 7–19.
https://doi.org/10.1016/j.bandc.2010.01.005
"...object category specific temporal
activations relating to non-living action/tool
sounds, animal vocalizations, and human
vocalizations have been observed"
Lewis, J. W., Brefczynski, J. A., Phinney, R.
E., Janik, J. J., & Deyoe, E. A. (2005).
Distinct Cortical Pathways for Processing
Tool versus Animal Sounds. Journal of
Neuroscience, 25(21), 5148–5158.
https://doi.org/10.1523/JNEUROSCI.0419-05.2005
Cognitive categories for environmental sounds
have been explored by Gygi et al. (2007), who found
three distinct clusterings of sounds that related
to harmonic sounds, discrete impact sounds, and
continuous sounds.
Gygi, B., Kidd, G. R., & Watson, C. S.
(2007). Similarity and Categorization of
Environmental Sounds. Perception &
Psychophysics, 69(6), 839–855.
Environmental sounds categorised into two
broad areas, relating to the presence or
absence of human activity.
Guastavino, C. (2007). Categorization of Environmental
Sounds. Canadian Journal of Experimental Psychology,
61(1), 54–63. https://doi.org/10.1007/s00422-009-0299-4
Summary of soundscape categorisation: pp. 30–35
Payne, S. R., Davies, W. J., & Adams, M. (2009). Research into the
Practical and Policy Applications of Soundscape Concepts and
Techniques in Urban Areas (NANR 200). Department for
Environment, Food and Rural Affairs; London. Retrieved from
http://usir.salford.ac.uk/27343/
"It is important to note, however, that the cognitive categorization
framework is contingent; this means that the categorization
framework may change depending upon factors such as location
and soundscape."
Davies, W. J., Adams, M. D., Bruce, N. S., Cain, R.,
Carlyle, A., Cusack, P., … Poxon, J. (2013). Perception
of Soundscapes: An Interdisciplinary Approach.
Applied Acoustics, 74(2), 224–231.
https://doi.org/10.1016/j.apacoust.2012.05.010
Categorisation of complex auditory scenes
Rummukainen, O., Radun, J., Virtanen, T., Pulkki, V., &
Murray, M. M. (2014). Categorization of Natural
Dynamic Audiovisual Scenes. PLoS ONE, 9(5).
https://doi.org/10.1371/