Sensors (Basel). 2016 Oct 29;16(11):1812. doi: 10.3390/s16111812.

Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

Young Hoon Shin et al.

Abstract

People with hearing or speech disabilities are deprived of the benefits of conventional speech recognition technology because it relies on acoustic signals. Recent research has therefore focused on silent speech recognition systems based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use either contact sensors, which are very inconvenient to users, or optical systems, which are susceptible to environmental interference, a contactless and robust solution is required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wideband (IR-UWB) radar. The IR-UWB radar remotely and wirelessly detects motions of the lips and jaw. To extract the necessary features of lip and jaw motion from the received radar signals, we propose a feature extraction algorithm. In a word recognition test with five speakers, the proposed algorithm noticeably improved recognition performance compared to the existing algorithm. We also propose a speech activity detection algorithm that automatically selects speech segments from continuous input signals, so that speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.

Keywords: IR-UWB radar; articulators’ detection; contactless silent speech recognition.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
IR-UWB-radar-based silent speech recognition testbed: (a) Front view; (b) Side view with a user. The transmitted signal is reflected by multiple points on and inside the face. IR-UWB radar signals can penetrate the skin.
Figure 2
Block diagram of the signal processing flow of the proposed system.
Figure 3
Examples of raw received radar signals corresponding to: (a) silent pronunciation of the word “two”; (b) silent pronunciation of the word “five”. The approximate beginning time (about 0.4 s) and end time (about 1.1 s) of the pronunciation of “two” are clearly visible, but those for “five” are not clear in this raw data.
Figure 4
Examples of clutter-reduced signals corresponding to: (a) silent pronunciation of the word “two”; (b) silent pronunciation of the word “five”. The raw radar data is the same as the data in Figure 3.
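Clutter reduction removes static background reflections so that only motion-induced signal components remain, as in the clutter-reduced maps above. A minimal sketch, assuming a running-average background subtractor (a common choice for IR-UWB radar processing; the paper's exact filter and parameters are not given in this caption):

```python
import numpy as np

def reduce_clutter(frames: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Subtract a slowly adapting background estimate from each radar frame.

    frames: (n_frames, n_range_bins) array of raw received signals.
    alpha:  adaptation factor; closer to 1 means a slower-changing background.
    """
    background = frames[0].astype(float).copy()
    cleaned = np.empty_like(frames, dtype=float)
    for i, frame in enumerate(frames):
        cleaned[i] = frame - background          # motion-induced residue
        background = alpha * background + (1 - alpha) * frame
    return cleaned
```

With this scheme, stationary reflectors (walls, the torso) cancel out, while moving articulators such as the lips and jaw leave residual energy at their range bins.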
Figure 5
Example clean maps obtained by the conventional CLEAN algorithm corresponding to: (a) silent pronunciation of the word “two”; (b) silent pronunciation of the word “five”. The raw radar data is the same as the data in Figure 3 and Figure 4.
Figure 6
Example clean maps obtained by the short-template-based CLEAN algorithm: (a) silent pronunciation of the word “two”; (b) silent pronunciation of the word “five”. The raw radar data is the same as the data in Figure 3, Figure 4 and Figure 5. Unlike the raw data in Figure 3b, the approximate beginning time (about 0.3 s) and end time (about 1.1 s) of the pronunciation of “five” are now clearly visible.
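The CLEAN algorithm deconvolves the received signal by iteratively locating the strongest correlation with a pulse template and subtracting a scaled, shifted copy of that template; the resulting detections form the clean map. A minimal single-frame sketch (the template, gain, and stopping rule are illustrative assumptions; the short-template variant used above differs mainly in the template length):

```python
import numpy as np

def clean(signal, template, max_iters=20, gain=1.0, tol=1e-3):
    """One-dimensional CLEAN deconvolution of a single radar frame.

    Returns a list of (delay index, amplitude) detections and the residual.
    """
    residual = signal.astype(float).copy()
    t = template / np.dot(template, template)   # normalized for amplitude estimation
    detections = []
    for _ in range(max_iters):
        corr = np.correlate(residual, template, mode="valid")
        k = int(np.argmax(np.abs(corr)))
        amp = float(np.dot(residual[k:k + len(template)], t))  # least-squares amplitude
        if abs(amp) < tol:
            break
        detections.append((k, gain * amp))
        residual[k:k + len(template)] -= gain * amp * template
    return detections, residual
```

Running this over every frame and stacking the detections along time yields a clean map like the ones shown above.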
Figure 7
Examples of the variance of normalized signal amplitude (raw variance data and smoothed data): (a) silent pronunciation of the word “two”; (b) silent pronunciation of the word “five”. The raw radar data is the same as the data in Figure 3, Figure 4, Figure 5 and Figure 6. The values above the threshold (horizontal dashed line) indicate the general motion of the speaker. The data between the vertical lines are stored for further processing.
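Speech activity detection as described in this caption thresholds the smoothed variance of the normalized signal amplitude and stores the frames above the threshold for further processing. A minimal sketch, with the smoothing window and threshold as illustrative parameters:

```python
import numpy as np

def detect_activity(frames, threshold, smooth_win=5):
    """Return (start, end) frame-index pairs of segments whose smoothed
    per-frame amplitude variance exceeds the threshold."""
    # Normalize each frame, then take its variance across range bins.
    norm = frames / (np.abs(frames).max(axis=1, keepdims=True) + 1e-12)
    var = norm.var(axis=1)
    # Moving-average smoothing of the raw variance curve.
    kernel = np.ones(smooth_win) / smooth_win
    smoothed = np.convolve(var, kernel, mode="same")
    active = smoothed > threshold
    # Extract contiguous active runs as (start, end) segments.
    edges = np.flatnonzero(np.diff(active.astype(int)))
    bounds = np.r_[0, edges + 1, len(active)]
    return [(bounds[i], bounds[i + 1])
            for i in range(len(bounds) - 1) if active[bounds[i]]]
```

The returned segment boundaries correspond to the vertical lines in the figure; only the data between them is passed on to recognition.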
Figure 8
Example clean maps with the distance thresholds (horizontal dashed lines): (a) silent pronunciation of the word “two”; (b) silent pronunciation of the word “five”. The raw radar data is the same as the data in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7. In both cases, the data within the distance thresholds represent articulator motion, and thus the data segments between the vertical lines are stored.
Figure 9
Illustration of the distance matrix and alignment path of the MD-DTW algorithm for two features. Each (i, j) element of the matrix contains a distance value calculated by Equation (8). The alignment path in gray is the path having the minimal total distance value.
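Multi-dimensional DTW fills each (i, j) cell of the distance matrix with the distance between the multi-feature frames a[i] and b[j], then dynamic programming finds the alignment path of minimal total distance. A minimal sketch, assuming a Euclidean local distance (the paper's Equation (8) may define the per-cell distance differently):

```python
import numpy as np

def md_dtw(a, b):
    """Multi-dimensional DTW cost between sequences a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    # Local distance matrix: one value per (i, j) pair of multi-feature frames.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Accumulated-cost matrix; acc[i, j] is the best cost aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]
```

Classification then amounts to computing this cost against each stored word template and picking the template with the smallest value.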
Figure 10
Comparison of precision, recall, and F-measure of word recognition with five speakers. Each narrow bar indicates the result of each speaker, and each wide bar and corresponding number indicates the average value over five speakers.
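The precision, recall, and F-measure compared in this figure follow the standard definitions from detection counts; a minimal sketch (the per-speaker averaging shown by the wide bars is assumed to be a simple mean):

```python
def prf(true_positives: int, false_positives: int, false_negatives: int):
    """Standard precision / recall / F-measure from detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```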
