SASSy: Sensory Articulation Speech System – A 3D animation-based therapeutic application for Motor Speech Disorders.


SPEAKER: Dr. Pip Cornelius


Wednesday 7th of December 2011 - 12pm

VENUE: IOCT


To watch the video of the seminar, use the following links:

High Quality: rtsp://helix.dmu.ac.uk/media2/14984382_hi.rm

Low Quality: rtsp://helix.dmu.ac.uk/media2/14984382_lo.rm


At any one time around 20% of the population, adults and children alike, have a speech and language difficulty. For as many as 11,000,000 people this can be a disorder arising from difficulty in producing perceptually acceptable speech sounds. Speech and Language Therapy clients with these disorders need a non-intrusive instrumental analytic and therapeutic tool if treatment and management are to be improved.

Currently no such technology is available. The SAT system uses computer-generated 3D animations of speech sound sequences to provide a visual cue that aids a client in the accurate production of speech sounds. The precise tongue and mouth contours for the set of 24 English consonant sounds were derived from ultrasound imaging and video. Ultrasound imaging provided the information for the discrete tongue movements necessary to articulate those speech sounds made within the mouth, while high-quality video capture was used to record the external articulation movements. The animations were created using Autodesk Softimage and Pixologic ZBrush in the form of a 3D head which can be rotated for viewing from several perspectives. For sounds made within the mouth, cutaways of the external features of the head reveal the inside workings of the mouth. In this perspective, features such as the teeth can be made transparent to reveal the precise nature of the tongue movements.
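To make the organisation of the animation library concrete, the sketch below shows one way such a library might be modelled in code. It is purely illustrative: all names (Perspective, ConsonantAnimation, LIBRARY, play) are hypothetical and are not taken from the actual SAT/SASSy implementation, which was built in Softimage, ZBrush and Flash rather than Python.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical viewing perspectives for the rotatable 3D head.
class Perspective(Enum):
    FRONT = "front"
    SIDE = "side"
    CUTAWAY = "cutaway"  # external head features removed to show the mouth interior

# Hypothetical record for one of the 24 English consonants.
@dataclass
class ConsonantAnimation:
    ipa_symbol: str            # e.g. "t", "k"
    animation_clip: str        # rendered 3D animation of the articulation
    audio_clip: str            # matched high-quality audio recording
    tutorial_text: str         # coaching notes shown when the icon is clicked
    transparent_teeth: bool = False  # reveal tongue movements behind the teeth

# Two entries shown for illustration; the full library covers all 24 consonants.
LIBRARY = {
    "t": ConsonantAnimation("t", "t.anim", "t.wav",
                            "Tongue tip touches the alveolar ridge."),
    "k": ConsonantAnimation("k", "k.anim", "k.wav",
                            "Back of the tongue lifts to the soft palate.",
                            transparent_teeth=True),
}

def play(symbol: str, perspective: Perspective) -> None:
    """Look up a consonant and (conceptually) play its animation with matched audio."""
    entry = LIBRARY[symbol]
    print(f"Playing /{entry.ipa_symbol}/ from the {perspective.value} view "
          f"(teeth transparent: {entry.transparent_teeth})")

play("k", Perspective.CUTAWAY)
```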

These 3D animations therefore produce sequences that accurately model the mouth and tongue movements for speech, presenting them clearly within the context of the whole mouth and providing a focal point for the movements required to produce each speech sound. High-quality audio recordings of the speech sounds were precisely matched to the animations to provide both a visual and an audio template for the 24 consonants of English. The library of English consonant audio animations is presented in an Adobe Flash application that allows a user to choose any speech sound and a variety of visual perspectives for viewing the animation. Text tutorials providing coaching in the production of each speech sound are also accessible on screen by clicking an icon. A subsequent research project is currently underway to design and integrate speech recognition software which will analyse the client's speech attempt, construct a digital representation on screen overlaid on the target animation template, and provide a game-like performance score. This allows the client to see how closely their attempt matches the target speech sound. The patient experience will be greatly enhanced by the introduction of this combined visual and auditory biofeedback.
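Since the speech-recognition scoring component is described above as work in progress, the sketch below is only a stand-in for that pipeline: it assumes the client's attempt and the target template have already been reduced to time-aligned articulatory feature frames (by some hypothetical upstream analysis) and shows how a simple per-frame comparison could yield a game-like score.

```python
import math

def similarity_score(target: list[list[float]], attempt: list[list[float]]) -> float:
    """Compare two time-aligned sequences of articulatory feature frames
    and return a game-like score from 0 to 100 (100 = perfect match)."""
    if len(target) != len(attempt):
        raise ValueError("sequences must be time-aligned to the same length")
    total = 0.0
    for t_frame, a_frame in zip(target, attempt):
        # Euclidean distance between the target and attempted frame.
        total += math.dist(t_frame, a_frame)
    avg = total / len(target)
    # Map average distance onto 0-100; the scale factor is arbitrary here.
    return max(0.0, 100.0 - 100.0 * avg)

# Example: an attempt whose (illustrative) tongue-contour frames sit
# slightly off the target, producing a near-perfect score.
target = [[0.2, 0.5], [0.3, 0.6], [0.4, 0.7]]
attempt = [[0.25, 0.5], [0.3, 0.55], [0.45, 0.7]]
print(f"Score: {similarity_score(target, attempt):.1f}/100")
```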

This technology will significantly improve the efficacy and accuracy of treatment and patients' understanding of therapy aims. As the system can potentially be made available on the internet and through mobile devices such as phones, game consoles and netbooks, patients can control their own therapy and evaluate progress without relying on specialist description, explanation and feedback. Therapy is no longer restricted to the clinic and can be remotely managed, reducing contact time with specialist clinicians. The instrumental analysis provides more accurate and individually tailored outcome measures and therapy, leading to more reliable audit as patient performance is electronically monitored and assessed.