Cog AV Hearing - Project Background

Project Aims

This ambitious project aims to address the EPSRC research challenge "Speech-in-noise performance in hearing aid devices" and the long-standing challenge of developing disruptive assistive listening technology that can help improve the quality of life of the 10 million people in the UK suffering from some form of hearing loss. We aim to develop devices that mimic the unique human ability to focus hearing on a single talker, effectively ignoring background distractor sounds, regardless of their number and nature.

This research is a first attempt at developing a cognitively-inspired, adaptive and context-aware audio-visual (AV) processing approach for combining audio and visual cues (e.g. from lip movement) to deliver speech intelligibility enhancement. A preliminary multi-modal speech enhancement framework pioneered by Prof Hussain's Lab at Stirling will be significantly extended to incorporate models of auditory and AV scene analysis developed by Dr Barker's group at Sheffield. Further, novel computational models and theories of human vision developed by Prof Watt at Stirling will be deployed to enable real-time tracking of facial features. Intelligent multi-modality selection mechanisms will be developed, and planned collaborations with Phonak and MRC IHR will facilitate delivery of a clinically-tested software prototype.

Research Hypothesis and Objectives

Our hypothesis is that visual and acoustic input can be combined to produce a multi-modal hearing device that significantly boosts speech intelligibility in the everyday listening environments in which audio-only hearing aids prove ineffective. To test this, we aim to develop and clinically validate a next-generation, cognitively-inspired AV hearing device. We will achieve this aim by combining contrasting approaches to speech enhancement developed respectively at Stirling and Sheffield in a novel AV enhancement framework. Five objectives will be met in the process:
  1. To combine signal processing from the enhancement framework pioneered at Stirling with scene analysis models developed at Sheffield to produce perceptually meaningful acoustic features suitable as input to AV enhancement algorithms.

  2. To further develop and evaluate novel approaches to visual tracking and feature extraction in the context of the AV enhancement framework. These approaches will be built on the 'bar-code' model of human facial feature processing developed at Stirling.

  3. To integrate two different approaches to enhancement, namely noise-filtering (Stirling) and speech-resynthesis (Sheffield), in a common AV framework that takes advantage of their complementary strengths. Integration will be considered at multiple (coarse and fine) scales.

  4. To design intelligent multi-modality selection mechanisms that weight AV input and select the enhancement mechanism best matched to the prevailing environmental conditions (see the illustrative sketch after this list).

  5. Finally, to evaluate and optimise a real-time software prototype using a new AV corpus based on real speech-in-noise scenarios. The prototype will be clinically evaluated using speech quality and intelligibility tests with hearing-impaired volunteers.
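To make the intent of objective 4 concrete, the following is a minimal sketch, in Python, of one possible form a modality-selection mechanism could take: a per-frame weight derived from a rough acoustic SNR estimate blends an audio-driven (noise-filtered) speech estimate with a visually-driven (resynthesised) one, leaning more heavily on the visual estimate as acoustic conditions degrade. All function names, thresholds and array shapes below are illustrative assumptions, not part of the project's actual design.

    import numpy as np

    def estimate_frame_snr_db(noisy_power, noise_power, eps=1e-10):
        # Rough per-frame SNR estimate in dB; purely illustrative.
        speech_power = np.maximum(noisy_power - noise_power, eps)
        return 10.0 * np.log10(speech_power / (noise_power + eps))

    def audio_weight(snr_db, low_db=-5.0, high_db=15.0):
        # Map estimated SNR onto [0, 1]: low SNR -> trust the visual estimate,
        # high SNR -> trust the audio estimate. Thresholds are placeholders.
        return np.clip((snr_db - low_db) / (high_db - low_db), 0.0, 1.0)

    def fuse_estimates(audio_estimate, visual_estimate, noisy_power, noise_power):
        # Blend the two speech estimates frame by frame using the SNR-driven weight.
        w = audio_weight(estimate_frame_snr_db(noisy_power, noise_power))[:, None]
        return w * audio_estimate + (1.0 - w) * visual_estimate

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        frames, bins = 100, 64
        audio_est = rng.random((frames, bins))   # stand-in for a noise-filtered spectrum
        visual_est = rng.random((frames, bins))  # stand-in for a visually resynthesised spectrum
        noisy_pow = rng.random(frames) + 0.5     # toy per-frame noisy-signal power
        noise_pow = rng.random(frames) * 0.5     # toy per-frame noise-power estimate
        print(fuse_estimates(audio_est, visual_est, noisy_pow, noise_pow).shape)  # (100, 64)

In practice such a weight might be learned and driven by richer context cues than a single SNR estimate; the sketch is intended only to illustrate the structure the objective refers to, namely two parallel estimates combined by an environment-dependent weighting.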