Development of a Silent Speech Interface

Driven by Ultrasound and Optical Images of the Tongue and Lips


T. Hueber, E. L. Benaroya, G. Chollet, B. Denby, G. Dreyfus, M. Stone

 
 

Corpus-based synthesis driven by video-only data

with a 100% correct phonetic decoding

Example 1a (Speaker A - without prosody adaptation):

Example 1b (Speaker A - with prosody adaptation):


Example 2 (Speaker A - with prosody adaptation):

Example 3 (Speaker A - with prosody adaptation):


Corpus-based synthesis driven by video-only data

with an 80% correct phonetic decoding

Example 4 (Speaker A - with prosody adaptation):


Corpus-based synthesis driven by video-only data

with a 60% correct phonetic decoding

(Typical performance of the current system)

Example 5 (Speaker A - with prosody adaptation):

Example 6 (Speaker A - with prosody adaptation):




 

Sound examples