Imagine you are quietly absorbed in reading the latest issue of Current Biology when you hear someone screaming in the distance: there is such shrillness in the sound that it acts as a magnet on your attention (you turn your head towards the sound) and a whip to your autonomic system (your heart and breathing accelerate). According to a new study by Arnal et al. [1], the chances are that this scream contains high energy at temporal modulation rates between 50 and 200 Hz. The new work also shows that, if energy in this band is added to a neutral sound, that sound is perceived as more alarming, and that such sounds are responded to more rapidly and with greater spatial accuracy.
The analysis of sounds according to their modulation power spectrum was introduced into auditory neuroscience by analogy to vision science, where sinusoidal gratings have long been used to decompose an image. Using this approach, several animal and human studies have been able to map the spectro-temporal receptive fields of auditory neurons or regions [3–5], showing that this is a good way to characterize the functional properties of the auditory system. The sound modulation spectrum is based on decomposing a sound into a weighted sum of ‘dynamic ripple sounds’: grating-like sounds, each with a specific rate of variation along time (temporal modulation rate, measured in Hertz) and along frequency (spectral modulation rate, measured in cycles per octave). The modulation spectrum is a two-dimensional representation of the relative amount of energy at each combination of spectral and temporal modulation rate: temporal modulations are on the x-axis (symmetrical about 0, with up-glides on one side and down-glides on the other), spectral modulations are on the y-axis, and relative energy is shown on a color scale.
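The decomposition described above can be sketched with a toy example. The following is a minimal illustration, not the analysis pipeline of any cited study: it assumes the sound has already been converted to a small spectrogram-like grid (frequency channels by time frames), builds two synthetic ripples that differ only in glide direction, and locates their energy with a naive two-dimensional discrete Fourier transform. The function names, the grid size, and the values of 2 channels per octave and 320 frames per second are all hypothetical choices made for illustration.

```python
import cmath
import math


def ripple(n_freq, n_time, spec_cycles, temp_cycles):
    """Toy 'dynamic ripple' spectrogram: a sinusoidal grating over (channel, frame).

    spec_cycles / temp_cycles are whole cycles across the frequency / time axis;
    flipping the sign of temp_cycles reverses the glide direction.
    """
    return [
        [math.cos(2 * math.pi * (spec_cycles * f / n_freq + temp_cycles * t / n_time))
         for t in range(n_time)]
        for f in range(n_freq)
    ]


def modulation_power(spec):
    """Naive 2D DFT power; power[q][k] = spectral bin q, temporal bin k."""
    n_freq, n_time = len(spec), len(spec[0])
    # Step 1: DFT along time within each frequency channel.
    along_time = [
        [sum(row[t] * cmath.exp(-2j * math.pi * k * t / n_time) for t in range(n_time))
         for k in range(n_time)]
        for row in spec
    ]
    # Step 2: DFT along frequency within each temporal-modulation bin.
    return [
        [abs(sum(along_time[f][k] * cmath.exp(-2j * math.pi * q * f / n_freq)
                 for f in range(n_freq))) ** 2
         for k in range(n_time)]
        for q in range(n_freq)
    ]


def bin_to_rates(q, k, n_freq, n_time, channels_per_octave, frames_per_second):
    """Convert DFT bin indices to (cycles per octave, Hz); upper bins are negative rates."""
    signed_q = q if q <= n_freq // 2 else q - n_freq
    signed_k = k if k <= n_time // 2 else k - n_time
    return (signed_q * channels_per_octave / n_freq,
            signed_k * frames_per_second / n_time)


# Assumed toy grid: 8 channels at 2 per octave, 32 frames at 320 frames per second.
N_FREQ, N_TIME = 8, 32
one_way = modulation_power(ripple(N_FREQ, N_TIME, 1, 8))
other_way = modulation_power(ripple(N_FREQ, N_TIME, 1, -8))

# For spectral bin 1, each ripple concentrates its energy in a single temporal bin,
# and the two glide directions land on opposite sides of zero.
print(bin_to_rates(1, 8, N_FREQ, N_TIME, 2, 320.0))
print(bin_to_rates(1, N_TIME - 8, N_FREQ, N_TIME, 2, 320.0))
```

With these assumed grid parameters, the ripple's temporal modulation rate falls inside the 50–200 Hz band discussed in the text. Which sign corresponds to up-glides and which to down-glides depends on the axis conventions chosen, so the sketch only shows that the two directions separate. A real analysis would start from a properly computed log-frequency spectrogram and use a fast Fourier transform rather than this naive double sum.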
Previously terra incognita on the modulation spectrum map, two vertical bands spanning 50–200 Hz, which do not overlap with speech or other natural sounds, may, according to Arnal et al. [1], act as an acoustic niche for alarming, arousing sounds. Such sounds can be described as having auditory ‘roughness’, a term also related to dissonance. Because this range is otherwise unused, the cue is a good candidate for signalling something really important, as it is acoustically well separable from background noise. Screams of fear happen to have high energy in this particular range of temporal modulation, but it turns out that artificial sounds, such as alarms or dissonant music, also exploit this cue, and Arnal et al. [1] demonstrate that such sounds are perceived as more fearful. Engineers who design alarms have, probably by intuition or trial and error, determined that the presence of this particular type of modulation makes sounds really annoying and arousing, a very good way to get people’s attention; tender lullabies would hardly be effective for signalling that there is a fire and that life is in danger.
The neural relevance of this acoustical cue is elegantly confirmed by Arnal et al. [1] using fMRI and a reverse-correlation analysis. They reconstructed, for a given brain location, the ‘optimal modulation spectrum’: the sum of the individual modulation spectra of each sound, weighted by the amount of activation the sound produced at that particular location. A clear dissociation is observed. The optimal modulation spectrum for primary auditory cortex shows greatest sensitivity to the low temporal modulations present in most environmental and speech sounds, consistent with its involvement in processing all classes of sounds. That for the amygdala, a structure well known for its role in emotional reactions (although not necessarily negative ones), shows a much enhanced sensitivity to temporal modulation rates in the 50–200 Hz range, clearly visible as two vertical stripes on the optimal modulation spectrum.
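The weighted sum behind the ‘optimal modulation spectrum’ can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: each modulation spectrum is reduced to a one-row grid with just two hypothetical temporal-modulation bands (‘slow’ and ‘rough’), and the sounds, the activation values, and the function name are invented purely to show the arithmetic.

```python
def optimal_modulation_spectrum(spectra, activations):
    """Activation-weighted sum of per-sound modulation spectra for one brain location.

    spectra: list of 2D grids (lists of lists), one per sound, all the same shape.
    activations: one response value per sound at the location of interest.
    """
    if len(spectra) != len(activations):
        raise ValueError("need exactly one activation value per sound")
    n_rows, n_cols = len(spectra[0]), len(spectra[0][0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for spectrum, weight in zip(spectra, activations):
        for i in range(n_rows):
            for j in range(n_cols):
                total[i][j] += weight * spectrum[i][j]
    return total


# Made-up spectra with columns [slow band, rough band]; not measured data.
speech_like = [[1.0, 0.0]]   # energy only at slow temporal modulations
scream_like = [[0.25, 1.0]]  # mostly rough-band energy
sounds = [speech_like, scream_like]

# Hypothetical activations: one location responds to both sounds equally,
# the other responds mainly to the scream-like sound.
responds_to_all = optimal_modulation_spectrum(sounds, [1.0, 1.0])
responds_to_rough = optimal_modulation_spectrum(sounds, [0.25, 1.0])

print(responds_to_all)
print(responds_to_rough)
```

In this toy setting, the location that responds to every sound inherits a spectrum dominated by the slow band, while the location that responds mainly to the scream-like sound ends up with relatively more weight in the rough band, which is the kind of dissociation the text describes.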
Although Arnal et al. [1] show that dissonant tone combinations contain roughness and are perceived as more alarming than consonant combinations, the situation for music is perhaps more subtle than this analysis suggests. The role of roughness and/or dissonance may depend a great deal on context or cultural variables. In some contexts, musical roughness may be highly desirable: the angst and anger conveyed by heavy metal music would hardly be effective without its characteristic loud grating guitars, while the mad scene from Lucia di Lammermoor would not be as gripping as it is if the soprano, suffering a mental breakdown after killing her husband, had no stridency in her voice. Thus, music may exploit the mechanism signalling alarm but in a controlled and artistically meaningful way; this observation emphasizes the idea that humans do not necessarily respond to screams with a fixed-action pattern behavior, but rather that top-down modulation likely plays an important role.
In terms of the neural correlates of dissonance, the situation is also somewhat complex. Neuroimaging experiments using dissonant music have not necessarily shown recruitment of the amygdala, whereas joyful, pleasant music, conversely, does recruit it. More generally, these findings suggest that the relationship between perceived roughness, experienced fear, and the role of the amygdala is not straightforward. Kumar et al. [9] showed that the amygdala does respond to the valence of unpleasant sounds, but also encodes acoustical features, and that effective connectivity between it and the auditory cortex is reciprocally modulated such that the representation of salient information is jointly processed by this circuit. The reverse-correlation findings of Arnal et al. [1] indicate that, among the sounds they used, the amygdala responds best to those containing roughness, but this need not indicate that the auditory cortex (and, by extension, cognitive top-down mechanisms) plays no role in modulating the response. Indeed, the coding of vocal affect, such as anger or fear, involves a distributed circuit encompassing the amygdala and voice-sensitive auditory cortical areas, as well as insula and prefrontal areas that encode more abstract cognitive representations of emotion.
The findings by Arnal et al. [1] in turn lead to a series of new questions likely to motivate further research in different domains. For instance: are screams of fear the only vocalizations characterized by increased energy in the 50–200 Hz temporal modulation range, or would angry vocalizations, for example, also show this feature? To what extent are such rapid temporal modulations, reported here in adult screams of fear, also exploited by infant cries, a sound category of particular survival value? And are these fast modulations specifically human, or are they also exploited by other species to make their vocalizations more attention-grabbing, along with other well-known cues such as amplitude rise time and fast changes in fundamental frequency? What are the neural top-down mechanisms that enable roughness to be perceived either as a danger signal requiring immediate action, or as a sign of emotional intensity, to be enjoyed at a concert?
1. Arnal, L.H., Flinker, A., Kleinschmidt, A., Giraud, A.-L., and Poeppel, D. (2015). Human screams occupy a privileged niche in the communication soundscape. Curr. Biol. 25, 2051–2056.
2. Shamma, S. (2001). On the role of space and time in auditory processing. Trends Cogn. Sci. 5, 340–348.
3. deCharms, R.C., Blake, D.T., and Merzenich, M.M. (1998). Optimizing sound features for cortical neurons. Science 280, 1439–1443.
4. Schönwiesner, M., and Zatorre, R.J. (2009). Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proc. Natl. Acad. Sci. USA 106, 14611–14616.
5. Singh, N.C., and Theunissen, F.E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 114, 3394–3411.
6. Fecteau, S., Belin, P., Joanette, Y., and Armony, J. (2007). Amygdala responses to nonlinguistic emotional vocalizations. Neuroimage 36, 480–487.
7. Blood, A.J., Zatorre, R.J., Bermudez, P., and Evans, A.C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nat. Neurosci. 2, 382–387.
8. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nat. Rev. Neurosci. 15, 170–180.
9. Kumar, S., von Kriegstein, K., Friston, K., and Griffiths, T.D. (2012). Features versus feelings: dissociable representations of the acoustic features and valence of aversive sounds. J. Neurosci. 32, 14184–14192.
10. Bestelmeyer, P.E., Maurage, P., Rouger, J., Latinus, M., and Belin, P. (2014). Adaptation to vocal expressions reveals multistep perception of auditory emotion. J. Neurosci. 34, 8098–8105.
11. Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P., and Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature 403, 309–312.
12. Owren, M.J., and Rendall, D. (2001). Sound on the rebound: Bringing form and function back to the forefront in understanding nonhuman primate vocal signaling. Evol. Anthropol. 10, 58–71.
Pascal Belin and Robert J. Zatorre
Faculty of Medicine, Aix-Marseille University, Campus Santé Timone, 13005 Marseille Cedex, France
Montreal Neurological Institute, McGill University, 3801 University Street, Montreal, Quebec H3A 2B4, Canada
Date: September 21, 2015