Gallery

(under construction)

Locally Time-Reversed Speech

When a spoken sentence is played backward, it becomes totally unintelligible.

A sample of reversed speech: Click here to download a sample wav file; its original: original.

However, when speech is cut into short segments and each segment is reversed in time, it can remain intelligible, provided that the segment duration is not too long.

Here are some examples of locally time-reversed speech: with 170-ms segments (click here), with 70-ms segments (click here), and with 20-ms segments (click here).

This happens because the brain always seeks a plausible interpretation of what it hears, and the attempt succeeds under certain conditions.

The original speech data was taken from NTT-AT Multilingual Speech Database 2002.
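The segment-wise reversal described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used to produce the samples on this page: the function name `locally_time_reverse` and its parameters are hypothetical, and the sketch applies no smoothing at segment boundaries, which a real stimulus might need in order to avoid clicks.

```python
def locally_time_reverse(samples, sample_rate, segment_ms):
    """Reverse each consecutive segment in time while keeping the segment order.

    samples     : sequence of audio samples (a list or a 1-D array)
    sample_rate : sampling frequency in Hz
    segment_ms  : segment duration in milliseconds
    """
    # Number of samples per segment (at least one sample).
    seg_len = max(1, round(sample_rate * segment_ms / 1000))
    out = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        # Reverse this segment only; the last segment may be shorter.
        out.extend(segment[::-1])
    return out


# Toy example: 10 "samples" at 1000 Hz with 4-ms segments (4 samples each).
print(locally_time_reverse(list(range(10)), 1000, 4))
# → [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
```

Choosing a segment duration longer than the whole signal reverses everything at once, which corresponds to the fully reversed speech in the first sample.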

Related Work

  • Kazuo Ueda, Yoshitaka Nakajima, Wolfgang Ellermeier, and Florian Kattner. (2017). Intelligibility of locally time-reversed speech: A multilingual comparison, Scientific Reports, 7: 1782, doi:10.1038/s41598-017-01831-z.

Noise-Vocoded Speech

Natural speech carries information in two forms: relatively slow changes in power related to the voice source and the vocal tract (amplitude envelopes), and fast movements of the vocal folds (temporal fine structure). Noise-vocoded speech preserves only the amplitude envelopes in several frequency bands. Although it sounds hoarse, it is intelligible when four or more frequency bands are used.

A sample of noise-vocoded speech with four bands: Click here to download a sample wav file; with 20 bands: Click here to download a sample wav file; their original: original.

The original speech data was taken from NTT-AT Multilingual Speech Database 2002.
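The idea of keeping only the amplitude envelope in each frequency band can be sketched as follows. This is a simplified illustration using NumPy, not the method used for the samples above: the function names, the brick-wall FFT filters, the 50-Hz envelope cutoff, and the band edges in the usage example are all assumptions chosen for brevity.

```python
import numpy as np


def band_filter(signal, sample_rate, low_hz, high_hz):
    """Keep only components with low_hz <= f < high_hz (brick-wall FFT filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs >= high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def envelope(band, sample_rate, cutoff_hz):
    """Amplitude envelope: full-wave rectification followed by low-pass filtering."""
    smoothed = band_filter(np.abs(band), sample_rate, 0.0, cutoff_hz)
    return np.maximum(smoothed, 0.0)  # clip small negative ripples


def noise_vocode(signal, sample_rate, band_edges_hz, envelope_cutoff_hz=50.0, seed=0):
    """Replace the fine structure in each band with band-limited noise."""
    signal = np.asarray(signal, dtype=float)
    noise = np.random.default_rng(seed).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        env = envelope(band_filter(signal, sample_rate, low, high),
                       sample_rate, envelope_cutoff_hz)
        carrier = band_filter(noise, sample_rate, low, high)
        out += env * carrier  # noise carrier modulated by the speech envelope
    return out


# Usage with a toy signal: a 1-kHz tone that is on for 0.5 s, then silent.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t) * (t < 0.5)
edges = [100, 500, 1500, 3500, 7000]  # four illustrative bands, arbitrary values
vocoded = noise_vocode(tone, sr, edges)
```

The output has the same length as the input and follows its slow changes in power, while the tone itself is replaced by noise. Multiplying the envelope by the carrier spreads energy slightly outside each band; a more careful implementation might filter each band again after modulation.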

Related Work

  • Takuya Kishida, Yoshitaka Nakajima, Kazuo Ueda, and Gerard B. Remijn. (2016). Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech, Frontiers in Psychology, 7, 517; doi: 10.3389/fpsyg.2016.00517.
  • Wolfgang Ellermeier, Florian Kattner, Kazuo Ueda, Kana Doumoto, and Yoshitaka Nakajima. (2015). Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, The Journal of the Acoustical Society of America, 138, 1561–1569 [http://dx.doi.org/10.1121/1.4928954].

Factor Analysis of Speech

Focusing on relatively slow movements of the vocal tract (amplitude envelopes), we have established that three factors and four frequency bands are found universally across eight languages/dialects.

Related Work

  • Yoshitaka Nakajima, Kazuo Ueda, Shota Fujimaru, Hirotoshi Motomura, and Yuki Ohsaka. (2017). English phonology and an acoustic language universal, Scientific Reports, 7, 46049; doi: 10.1038/srep46049.
  • Kazuo Ueda and Yoshitaka Nakajima. (2017). An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech, Scientific Reports, 7, 42468; doi: 10.1038/srep42468.
  • Takuya Kishida, Yoshitaka Nakajima, Kazuo Ueda, and Gerard B. Remijn. (2016). Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech, Frontiers in Psychology, 7, 517; doi: 10.3389/fpsyg.2016.00517.
  • Wolfgang Ellermeier, Florian Kattner, Kazuo Ueda, Kana Doumoto, and Yoshitaka Nakajima. (2015). Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, The Journal of the Acoustical Society of America, 138, 1561–1569 [http://dx.doi.org/10.1121/1.4928954].
  • Yuko Yamashita, Yoshitaka Nakajima, Kazuo Ueda, Yohko Shimada, David Hirsh, Takeharu Seno, and Benjamin Alexander Smith. (2013). Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants. Frontiers in Psychology 4, 1–10 [https://doi.org/10.3389/fpsyg.2013.00057].

Irrelevant Speech Effect

Performance on a visually presented serial recall task (for example, memorizing a random series of digits) deteriorates greatly when irrelevant speech is presented at the same time, even though participants are instructed to ignore the speech. This effect is called the "irrelevant speech effect" (ISE). We are trying to clarify why speech is so disruptive in this kind of task.

Related Work

  • Wolfgang Ellermeier, Florian Kattner, Kazuo Ueda, Kana Doumoto, and Yoshitaka Nakajima. (2015). Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, The Journal of the Acoustical Society of America, 138, 1561–1569 [http://dx.doi.org/10.1121/1.4928954].

Useful Information