Auditory Scene Analysis

Perception (I don't know what you hear: do you hear the same thing as me?) I may hear the same sound waves as you, but I will process them differently, because perception is shaped in part by our own opinions, prejudices, training, past experiences, etc.
1. Bottom-Up (Primitive Image Schemata) - a pre-attentive partitioning process based on Gestalt principles [REF: Bregman, 1990].
  1. Primitive processes, the subject of most ASA research, rely on cues provided by the acoustic structure of the sensory input. These processes are thought to be innate and are found in non-human animals (Wisniewski and Hulse, 1997). They have been shown to be present in the perception of speech (Darwin and Carlyon, 1995) and of music (Bregman, 1990, Ch. 5 for music and Ch. 6 for speech). The primitive processes take advantage of regularities in how sounds are produced in virtually all natural environments (e.g., unrelated sounds rarely start at precisely the same time).
    1. Darwin, C. J., & Carlyon, R. P. (1995). Auditory Grouping. In B. C. J. Moore (Ed.), Handbook of perception and cognition: Hearing (2nd ed., pp. 387–424). London, UK: Academic Press.
    2. Wisniewski, A. B., & Hulse, S. H. (1997). Auditory scene analysis in European starlings (Sturnus vulgaris): Discrimination of song segments, their segregation from multiple and reversed conspecific songs, and evidence for conspecific song categorization. Journal of Comparative Psychology, 111(4), 337–350.
2. Top-Down (learned through training, experience, etc.) - a schema-driven process that uses prior knowledge to extract meaning from the acoustic representation.
  1. Top-down processes, on the other hand, are those involving conscious attention, or those based on past experience with certain classes of sounds: for example, the processes a listener employs in singling out one melody in a mixture of two (Dowling, Lung, and Herrbold, 1987).
    1. Dowling, W. J., Lung, K. M.-T., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception & Psychophysics, 41(6), 642–656. https://doi.org/10.3758/BF03210496
  2. Shamma, S. A., Elhilali, M., & Micheyl, C. (2011). Temporal Coherence and Attention in Auditory Scene Analysis. Trends in Neurosciences, 34(3), 114–123. https://doi.org/10.1016/j.tins.2010.11.002
Human Auditory System
1. Physiology (The human ear and auditory system)
2. Psychological processing of sound waves picked up by the ear.
  1. Streaming (Audio Objects)
    1. Sequential Streaming (connects sense data over time)
      1. Differences in pitch
      2. Differences in timbre
      3. Differences in spatial location
      4. Differences in fundamental frequency
      5. Transitions
        Gradual transitions tend to group together
        Abrupt transitions tend to separate
      6. Cumulative Effects (hearing out different streams over time)
    2. Simultaneous Streaming (selects from data arriving at the same time)
      1. 'Periodic' sounds - many sounds are made up of frequencies which are integer multiples of a common fundamental
        Human Voice
        Animal Calls
        Many musical instruments
        If a mixture contains two or more sets of frequencies related to different fundamentals, they tend to be separated.
    3. Stream Segregation Affected by
      1. Speed of sequence
      2. Frequency separation of sounds
      3. Pitch separation of sounds
      4. Spatial location of sounds
      5. Other factors...
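The simultaneous-grouping cue listed above (components that are integer multiples of a common fundamental group together, while sets tied to different fundamentals segregate) can be illustrated with a toy "harmonic sieve". This is a minimal sketch under stated assumptions, not a model taken from Bregman (1990): the candidate fundamentals are assumed to be known in advance (real systems must estimate them), and the function name `group_by_fundamental` and the 3% mistuning tolerance are hypothetical choices for illustration.

```python
def group_by_fundamental(partials_hz, candidate_f0s_hz, tolerance=0.03):
    """Toy harmonic sieve: assign each partial to the candidate fundamental
    it best fits as a near-integer multiple.

    tolerance is a hypothetical relative mistuning limit (3% by default).
    Partials that fit no candidate are collected under the key None.
    Exact ties go to the candidate listed first.
    """
    groups = {f0: [] for f0 in candidate_f0s_hz}
    groups[None] = []  # partials not explained by any candidate fundamental
    for f in partials_hz:
        best_f0, best_err = None, None
        for f0 in candidate_f0s_hz:
            n = max(1, round(f / f0))         # nearest harmonic number (at least 1)
            err = abs(f - n * f0) / (n * f0)  # relative mistuning from that harmonic
            if err <= tolerance and (best_err is None or err < best_err):
                best_f0, best_err = f0, err
        groups[best_f0].append(f)
    return groups


# Two hypothetical "voices" (f0 = 200 Hz and 350 Hz) plus one stray component.
mixture = [200, 350, 400, 523, 600, 700, 800, 1050]
print(group_by_fundamental(mixture, [200, 350]))
# → {200: [200, 400, 600, 800], 350: [350, 700, 1050], None: [523]}
```

The stray 523 Hz component is left ungrouped because it lies more than 3% away from every harmonic of both candidates, which mirrors (in a very simplified way) how a partial that does not fit a harmonic series tends to stand out from it.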
In a typical listening situation, different acoustic sources are active at the same time. Therefore, only the sum of their spectra will reach the listener’s ear. For individual sound patterns to be recognized – such as those arriving from the human voice in a mixture – the incoming auditory information has to be partitioned, and the correct subset allocated to individual sounds, so that an accurate description may be formed for each. This process of grouping and segregating sensory data into separate mental representations, called auditory streams, has been named "auditory scene analysis" (ASA) by Bregman (1990).
1. Bregman, A. S. (2004). Auditory Scene Analysis. In International Encyclopedia of the Social and Behavioral Sciences. Amsterdam: Pergamon (Elsevier). Retrieved from http://webpages.mcgill.ca/staff/Group2/abregm1/web/pdf/2004_Encyclopedia-Soc-Behav-Sci.pdf
2. Bregman, A. S. (1990). Auditory Scene Analysis: The perceptual organisation of sound. Cambridge, MA: The MIT Press.
3. There are three processes occurring in the human listener that serve to decompose auditory mixtures: 1) The activation of learned schemas in a purely automatic way. For example, you imagine you hear your name spoken: some chance combination of sounds can activate the mental schema that represents the sound of your name. 2) The use of schemas in a voluntary way, as when you are trying to hear your name being called by an announcer in a busy office. 3) 'Primitive Auditory Scene Analysis' (as per Bregman, 1990): the use of general acoustic regularities to decompose an auditory scene into its constituent parts.
  1. Regularity 1: Unrelated sounds seldom start or stop at exactly the same time.
  2. Regularity 2: Gradualness of change. A single sound tends to change its properties smoothly and slowly. A sequence of sounds from the same source tends to change its properties slowly.
  3. Regularity 3: When a body vibrates with a repetitive period, its vibrations give rise to an acoustic pattern in which the frequency components are multiples of a common fundamental.
  4. Regularity 4: Many changes that take place in an acoustic event will affect all the components of the resulting sound in the same way and at the same time.
  5. Bregman, A. (1993). Auditory Scene Analysis: Hearing in Complex Environments. In S. McAdams & E. Bigand (Eds.), Thinking in Sound: The Cognitive Psychology of Human Audition (pp. 10–36). Oxford, UK: Oxford University Press.
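Regularity 1 (unrelated sounds seldom start or stop at exactly the same time) can be sketched as a simple onset-synchrony grouping: components whose onsets fall within a short window of a group's first onset are treated as parts of the same event. This is an illustrative toy, not Bregman's model; the `Component` record, the function name `group_by_onset` and the 20 ms window are hypothetical, and a real system would also weigh offsets, harmonicity and common fate (Regularities 3 and 4).

```python
from dataclasses import dataclass


@dataclass
class Component:
    freq_hz: float
    onset_s: float


def group_by_onset(components, window_s=0.02):
    """Group components whose onsets lie within window_s seconds of the
    first onset in the current group (window_s is a hypothetical value)."""
    ordered = sorted(components, key=lambda c: c.onset_s)
    groups = []
    for c in ordered:
        if groups and c.onset_s - groups[-1][0].onset_s <= window_s:
            groups[-1].append(c)  # near-synchronous onset: same event
        else:
            groups.append([c])    # onset too late: start a new event
    return groups


# Two hypothetical events, listed out of order: one starting near 0.1 s, one near 0.5 s.
scene = [
    Component(700, 0.512), Component(200, 0.100), Component(1050, 0.515),
    Component(400, 0.105), Component(350, 0.500), Component(600, 0.110),
]
for group in group_by_onset(scene):
    print([c.freq_hz for c in group])
# → [200, 400, 600]
# → [350, 700, 1050]
```

Measuring against the group's first onset (rather than the previous component's) keeps a long run of slightly staggered onsets from chaining into one oversized group.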
"...it appears that sounds are processed primarily as meaningful events and where source identification fails sounds are processed according to physical or low level perceptual parameters."
1. Woodcock, J., Davies, W. J., Cox, T. J., & Melchior, F. (2016). Categorization of Broadcast Audio Objects in Complex Auditory Scenes. Journal of the Audio Engineering Society, 64.
2. "...the evaluation of sounds produced by non-living objects is biased towards low level acoustic features whereas the processing of sounds produced by living creatures is biased toward sound independent semantic information."
  1. Giordano, B. L., McDonnell, J., & McAdams, S. (2010). Hearing living symbols and nonliving icons: Category specificities in the cognitive processing of environmental sounds. Brain and Cognition, 73(1), 7–19. https://doi.org/10.1016/j.bandc.2010.01.005
  2. "...object category specific temporal activations relating to non-living action/tool sounds, animal vocalizations, and human vocalizations have been observed"
    1. Lewis, J. W., Brefczynski, J. A., Phinney, R. E., Janik, J. J., & Deyoe, E. A. (2005). Distinct Cortical Pathways for Processing Tool versus Animal Sounds. Journal of Neuroscience, 25(21), 5148–5158. https://doi.org/10.1523/JNEUROSCI.0419-05.2005
    2. Cognitive categories for environmental sounds have been explored by Gygi et al. (2007), who found three distinct clusterings of sounds, relating to harmonic sounds, discrete impact sounds, and continuous sounds.
      1. Gygi, B., Kidd, G. R., & Watson, C. S. (2007). Similarity and Categorization of Environmental Sounds. Perception & Psychophysics, 69(6), 839–855.
      2. Categorising environmental sounds into two broad areas, relating to the presence or absence of human activity.
        Guastavino, C. (2007). Categorization of Environmental Sounds. Canadian Journal of Experimental Psychology, 61(1), 54–63. https://doi.org/10.1007/s00422-009-0299-4
        Summary of soundscape categorisation: pp. 30–35
        Payne, S. R., Davies, W. J., & Adams, M. (2009). Research into the Practical and Policy Applications of Soundscape Concepts and Techniques in Urban Areas (NANR 200). Department for Environment, Food and Rural Affairs; London. Retrieved from http://usir.salford.ac.uk/27343/
        "It is important to note, however, that the cognitive categorization framework is contingent; this means that the categorization framework may change depending upon factors such as location and soundscape."
        Davies, W. J., Adams, M. D., Bruce, N. S., Cain, R., Carlyle, A., Cusack, P., … Poxon, J. (2013). Perception of Soundscapes: An Interdisciplinary Approach. Applied Acoustics, 74(2), 224–231. https://doi.org/10.1016/j.apacoust.2012.05.010
        Categorisation of complex auditory scenes
        Rummukainen, O., Radun, J., Virtanen, T., Pulkki, V., & Murray, M. M. (2014). Categorization of Natural Dynamic Audiovisual Scenes. PLoS ONE, 9(5). https://doi.org/10.1371/

	Created by William Coleman almost 8 years ago