Sensors (Basel). 2022 Jan 14;22(2):649. doi: 10.3390/s22020649.

Exploring Silent Speech Interfaces Based on Frequency-Modulated Continuous-Wave Radar

David Ferreira et al. Sensors (Basel).

Abstract

Speech is our most natural and efficient form of communication and offers strong potential to improve how we interact with machines. However, speech communication can be limited by environmental (e.g., ambient noise), contextual (e.g., the need for privacy), or health (e.g., laryngectomy) conditions that rule out audible speech. Silent speech interfaces (SSI) have been proposed as an alternative, relying on technologies that do not require the production of an acoustic signal (e.g., electromyography and video). Unfortunately, despite their abundance, many still face limitations that hinder everyday use, e.g., being intrusive or non-portable, or raising technical (e.g., lighting conditions for video) or privacy concerns. Against this background, this article explores contactless continuous-wave radar and assesses its potential for SSI development. A corpus of 13 European Portuguese words was acquired from four speakers, three of whom enrolled in a second acquisition session three months later. For the speaker-dependent models, trained and tested with data from each speaker using 5-fold cross-validation, average accuracies of 84.50% and 88.00% were obtained with the Bagging (BAG) and Linear Regression (LR) classifiers, respectively. Additionally, recognition accuracies of 81.79% and 81.80% were achieved in the session-independent and speaker-independent experiments, respectively, establishing promising grounds for further exploring this technology for silent speech recognition.
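The speaker-dependent protocol described above can be illustrated with a minimal scikit-learn sketch. The feature matrix, labels, and classifier settings below are hypothetical stand-ins (the paper's actual radar features and hyperparameters are not reproduced here), and the LR classifier is omitted for brevity:

    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical stand-ins: 130 utterances (10 per word of a
    # 13-word vocabulary), each reduced to a 64-dimensional feature vector.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(130, 64))
    y = np.repeat(np.arange(13), 10)

    # Speaker-dependent evaluation: 5-fold cross-validation with a
    # Bagging (BAG) classifier, mirroring the protocol in the abstract.
    bag = BaggingClassifier(n_estimators=50, random_state=0)
    scores = cross_val_score(bag, X, y, cv=5)
    print(f"mean accuracy: {scores.mean():.2%}")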

Keywords: European Portuguese; continuous-wave radar; machine learning; silent speech.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
Acquisition and classification pipeline. From left to right, the respective processing steps are data acquisition, preprocessing, feature extraction, and classification.
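As a rough sketch of how these four stages might be composed in code, the following uses a scikit-learn Pipeline; the standardization and PCA steps are illustrative assumptions, not the paper's actual preprocessing and feature-extraction choices:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import BaggingClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Stages mirror Figure 1: (acquired) data -> preprocessing ->
    # feature extraction -> classification. StandardScaler and PCA
    # are assumed placeholders for the paper's actual steps.
    pipeline = Pipeline([
        ("preprocess", StandardScaler()),
        ("features", PCA(n_components=20)),
        ("classify", BaggingClassifier(random_state=0)),
    ])

    rng = np.random.default_rng(0)
    frames = rng.normal(size=(130, 64))    # hypothetical flattened radar features
    labels = np.repeat(np.arange(13), 10)  # 13-word vocabulary
    pipeline.fit(frames, labels)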
Figure 2
Radar setup for the data acquisition sessions. The participant is seated in front of the radar board while a monitor continuously displays the words to be uttered, turning the background green whenever the participant is asked to speak.
Figure 3
Illustrative visual representations of the data for one acquisition of the word “Ajuda” (Help): velocity dispersion pattern (left) and distance variation (right) over 1 second of acquisition frames (along the vertical axis). Although distance variations were acquired and are shown here, only the velocity representations were used for model training and subsequent classification.
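Velocity and distance maps like those in Figure 3 are commonly obtained from FMCW radar frames via two FFT stages: a range FFT along fast time and a Doppler FFT along slow time. A minimal numpy sketch, assuming a hypothetical frame of 64 chirps by 256 samples per chirp (the paper's radar parameters are not given here):

    import numpy as np

    # Hypothetical complex FMCW frame: 64 chirps x 256 samples per chirp.
    rng = np.random.default_rng(0)
    frame = rng.normal(size=(64, 256)) + 1j * rng.normal(size=(64, 256))

    # Range FFT along fast time (samples within a chirp) -> distance bins.
    range_profile = np.fft.fft(frame, axis=1)

    # Doppler FFT along slow time (across chirps) -> velocity bins;
    # fftshift centers zero velocity for display.
    range_doppler = np.fft.fftshift(np.fft.fft(range_profile, axis=0), axes=0)

    # Log-magnitude map comparable to the velocity representation shown.
    velocity_map = 20 * np.log10(np.abs(range_doppler) + 1e-12)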
Figure 4
Boxplots of the average classification accuracies obtained in each acquisition session: average accuracy per classifier (top) and per speaker (bottom). Speaker 2 recorded only one session.
Figure 5
Confusion matrices for the best classifier in each session, showing the average recognition results across all participants: the BAG classifier for the first session (left) and the LR classifier for the second (right). Rows represent the word instances submitted for recognition; columns represent the recognized words.
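A confusion matrix with this orientation (true words as rows, recognized words as columns) can be computed with scikit-learn; the labels below are synthetic placeholders, not the paper's results:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Synthetic true vs. recognized word indices for a 13-word vocabulary.
    rng = np.random.default_rng(0)
    y_true = np.repeat(np.arange(13), 10)
    y_pred = np.where(rng.random(130) < 0.85, y_true,
                      rng.integers(0, 13, size=130))

    # Rows: word instances submitted for recognition; columns: recognized words.
    cm = confusion_matrix(y_true, y_pred)
    print(cm)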
Figure 6
Speaker-independent model performance: accuracy per speaker when the remaining one, two, or three speakers are used for model training and subsequent classification.
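Training on the remaining speakers and testing on the held-out one corresponds to leave-one-group-out cross-validation; a sketch with hypothetical data for four speakers:

    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    # Hypothetical features, word labels, and speaker IDs for four speakers,
    # each with 130 utterances (13 words x 10 repetitions).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(520, 64))
    y = np.tile(np.repeat(np.arange(13), 10), 4)
    speakers = np.repeat(np.arange(4), 130)

    # Each fold trains on three speakers and tests on the held-out one,
    # as in the speaker-independent experiment of Figure 6.
    scores = cross_val_score(BaggingClassifier(random_state=0), X, y,
                             groups=speakers, cv=LeaveOneGroupOut())
    print(np.round(scores, 3))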
