Gallery

Locally Time-Reversed Speech

When a spoken sentence is played backward, it becomes totally unintelligible.

A sample of reversed speech: Click here to download a sample wav file; its original: original.

However, when speech is divided into segments and each segment is reversed in time, it can remain intelligible, provided the segment duration is not too long.

Here are some examples of locally time-reversed speech: with 170-ms segments (click here), with 70-ms segments (click here), and with 20-ms segments (click here).

This happens because our brain is always seeking the right answer, and that attempt succeeds under certain conditions.
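The segment-wise reversal described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used to make the samples on this page: the function name `locally_time_reverse` and its parameters are hypothetical, the signal is assumed to be a plain list of samples, and the sketch omits any amplitude ramps or smoothing at segment boundaries that practical stimuli may need to avoid clicks.

```python
def locally_time_reverse(samples, sample_rate, segment_ms):
    """Reverse each consecutive segment in time while keeping the segment order.

    samples     : sequence of audio samples (assumed mono)
    sample_rate : sampling rate in Hz
    segment_ms  : segment duration in milliseconds (e.g. 20, 70, or 170)
    """
    # Segment length in samples; at least one sample per segment.
    seg_len = max(1, round(sample_rate * segment_ms / 1000))
    out = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        out.extend(segment[::-1])  # reverse this segment only
    return out


# Toy example: 10 samples at 1 kHz, 4-ms segments (4 samples each).
toy = list(range(10))
print(locally_time_reverse(toy, 1000, 4))
# → [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
```

When the segment is longer than the whole signal, the function reduces to ordinary backward playback, which corresponds to the fully reversed sample at the top of this section.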

The original speech data was taken from NTT-AT Multilingual Speech Database 2002.

Related Works

  • Ueda, K., and Matsuo, I. (2021). Intelligibility of chimeric locally time-reversed speech: Relative contribution of four frequency bands. JASA Express Letters 1(6), 065201, doi: 10.1121/10.0005439
  • Ueda, K., and Ciocca, V. (2021). Perceptual restoration of interrupted locally time-reversed speech: Effects of segment duration and noise levels. Attention, Perception, & Psychophysics, 1-7, doi: 10.3758/s13414-021-02292-3
  • Matsuo, I., Ueda, K., and Nakajima, Y. (2020). Intelligibility of chimeric locally time-reversed speech, The Journal of the Acoustical Society of America Express Letters, 147, EL523-EL528. https://doi.org/10.1121/10.0001414
  • Kazuo Ueda, Yoshitaka Nakajima, Wolfgang Ellermeier, and Florian Kattner. (2017). Intelligibility of locally time-reversed speech: A multilingual comparison, Scientific Reports, 7: 1782, doi:10.1038/s41598-017-01831-z.

Noise-Vocoded Speech

Natural speech carries information in both the relatively slow power fluctuations arising from the voice source and the vocal tract (amplitude envelopes) and the fast movements of the vocal folds (temporal fine structure). Noise-vocoded speech preserves only the amplitude envelopes in several frequency bands. Although it sounds hoarse, it is intelligible when four or more frequency bands are used.

A sample of noise-vocoded speech with four bands: Click here to download a sample wav file; with 20 bands: Click here to download a sample wav file; their original: original.
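As a rough illustration of the idea, the sketch below splits a signal into frequency bands, extracts the slow amplitude envelope of each band, and uses it to modulate band-limited noise. It is not the processing used for the samples on this page. The function names, the brick-wall FFT filtering, the rectify-and-low-pass envelope extraction, the 50-Hz envelope cutoff, and the band edges in the usage example are all assumptions chosen for brevity.

```python
import numpy as np


def band_filter(signal, sample_rate, low_hz, high_hz):
    """Crude brick-wall filter: keep only components in [low_hz, high_hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs >= high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def noise_vocode(signal, sample_rate, band_edges_hz,
                 envelope_cutoff_hz=50.0, seed=0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(signal))
    out = np.zeros(len(signal))
    # Consecutive pairs of edges define the bands.
    for low_hz, high_hz in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        speech_band = band_filter(signal, sample_rate, low_hz, high_hz)
        # Amplitude envelope: rectify, then low-pass to keep slow changes only.
        envelope = band_filter(np.abs(speech_band), sample_rate,
                               0.0, envelope_cutoff_hz)
        envelope = np.clip(envelope, 0.0, None)  # remove small negative ripple
        noise_band = band_filter(noise, sample_rate, low_hz, high_hz)
        out += envelope * noise_band
    return out


# Usage with four illustrative bands (edges are placeholders, not from this page).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
test_tone = np.sin(2 * np.pi * 1000 * t)  # stand-in for a speech recording
vocoded = noise_vocode(test_tone, sample_rate, [100, 500, 1500, 3500, 7000])
```

The number of bands is simply `len(band_edges_hz) - 1`, so a 20-band version only requires a longer list of edges.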

The original speech data was taken from NTT-AT Multilingual Speech Database 2002.

Related Works

  • Ueda, K., Araki, T., and Nakajima, Y. (2018). Frequency specificity of amplitude envelope patterns in noise-vocoded speech, Hearing Research, 367, 169-181 https://doi.org/10.1016/j.heares.2018.06.005.
  • Nakajima, Y., Matsuda, M., Ueda, K., and Remijn, G. B. (2018). Temporal resolution needed for auditory communication: Measurement with mosaic speech, Frontiers in Human Neuroscience, 12(149). doi:10.3389/fnhum.2018.00149
  • Takuya Kishida, Yoshitaka Nakajima, Kazuo Ueda, and Gerard B. Remijn. (2016). Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech, Frontiers in Psychology, 7, 517; doi: 10.3389/fpsyg.2016.00517.
  • Wolfgang Ellermeier, Florian Kattner, Kazuo Ueda, Kana Doumoto, and Yoshitaka Nakajima. (2015). “Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands,” the Journal of the Acoustical Society of America, 138, 1561–1569 [http://dx.doi.org/10.1121/1.4928954].

Factor Analysis of Speech

Focusing on relatively slow movements of the vocal tract (amplitude envelopes), we have established that three factors and four frequency bands are found consistently across eight languages/dialects.

Related Works

  • Ueda, K., and Matsuo, I. (2021). Intelligibility of chimeric locally time-reversed speech: Relative contribution of four frequency bands. JASA Express Letters 1(6), 065201, doi: 10.1121/10.0005439
  • Matsuo, I., Ueda, K., and Nakajima, Y. (2020). Intelligibility of chimeric locally time-reversed speech, The Journal of the Acoustical Society of America Express Letters, 147, EL523-EL528. https://doi.org/10.1121/10.0001414
  • Ueda, K., Araki, T., and Nakajima, Y. (2018). Frequency specificity of amplitude envelope patterns in noise-vocoded speech, Hearing Research, 367, 169-181 https://doi.org/10.1016/j.heares.2018.06.005.
  • Yoshitaka Nakajima, Kazuo Ueda, Shota Fujimaru, Hirotoshi Motomura, and Yuki Ohsaka. (2017). English phonology and an acoustic language universal, Scientific Reports, 7, 46049; doi: 10.1038/srep46049.
  • Kazuo Ueda and Yoshitaka Nakajima. (2017). An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech, Scientific Reports, 7, 42468; doi: 10.1038/srep42468.
  • Takuya Kishida, Yoshitaka Nakajima, Kazuo Ueda, and Gerard B. Remijn. (2016). “Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech,” Frontiers in Psychology, 7, 517; doi: 10.3389/fpsyg.2016.00517.
  • Wolfgang Ellermeier, Florian Kattner, Kazuo Ueda, Kana Doumoto, and Yoshitaka Nakajima. (2015). “Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands,” the Journal of the Acoustical Society of America, 138, 1561–1569 [http://dx.doi.org/10.1121/1.4928954].
  • Yuko Yamashita, Yoshitaka Nakajima, Kazuo Ueda, Yohko Shimada, David Hirsh, Takeharu Seno, and Benjamin Alexander Smith. (2013). Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants. Frontiers in Psychology 4, 1–10 [https://doi.org/10.3389/fpsyg.2013.00057].

Irrelevant Sound Effect

Performance on a visually presented serial recall task (for example, memorizing a random series of digits) deteriorates markedly when irrelevant sound, especially speech, is presented at the same time, even though participants are instructed to ignore the sound. This effect is called the "irrelevant sound effect" (ISE). We are trying to clarify why speech is so disruptive in this kind of task.

Related Works

  • Ueda, K., Nakajima, Y., Kattner, F., and Ellermeier, W. (2019). Irrelevant speech effects with locally time-reversed speech: Native vs. non-native language, the Journal of the Acoustical Society of America, 145, 3686-3694. https://doi.org/10.1121/1.5112774
  • Ellermeier, W., Kattner, F., Ueda, K., Doumoto, K., and Nakajima, Y. (2015). “Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands,” the Journal of the Acoustical Society of America, 138, 1561–1569 [http://dx.doi.org/10.1121/1.4928954].

Useful Information