Transcribing masks for the benefit of deaf people
Image credit: https://i.ytimg.com/vi/dltM2D9y8tk/maxresdefault.jpg
Spook Louw Jul 22, 2021
Masks may become a more permanent part of our reality, even after the Coronavirus passes. It is well established that this has been a problem for deaf or hard of hearing people who rely on reading lips and facial expressions. As a solution, clear masks have been created that still show your mouth even while it is covered.
This is a simple and effective solution, but I have an idea for an even better one that could benefit all deaf and hard of hearing people, and could even improve communication between people who hear fine.
Using technology that already exists in translator apps like Talk & Translate and Google's Interpreter Mode, and in transcription software like Otter, speech can be turned into text and displayed in real time on tiny screens on the mask, or perhaps even on the fabric of the mask itself, using LED or fiber-optic masks connected to the transcription software.
It would essentially be like creating subtitles for yourself on your mask as you speak. Not only would the deaf and hard of hearing benefit from this, but people would be able to "hear" each other from further away or in noisy environments.
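To make the "subtitles on your mask" idea concrete, here is a minimal sketch of the display side only. It assumes a hypothetical tiny mask screen of roughly two rows of sixteen characters (those dimensions are my assumption, not anything from a real product) and shows how a running transcript could be wrapped and scrolled like subtitles:

```python
import textwrap

def mask_caption(transcript, width=16, rows=2):
    """Wrap a running transcript into the last few short lines that
    fit a small mask display. `width` and `rows` are assumptions:
    a hypothetical LED matrix showing ~2 lines of ~16 characters."""
    lines = textwrap.wrap(transcript, width=width)
    # Keep only the most recent rows, like scrolling subtitles.
    return lines[-rows:]

# The display would be refreshed with each new chunk of transcript.
print(mask_caption("Masks may become a more permanent part of our reality"))
```

The transcription software would call something like this every time it produces new text, so the wearer's mask always shows the tail end of what they are saying.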
The final product would look like a normal mask, not like the picture of Daft Punk.
Use Google's API to make something similar to their Live Transcribe app
salemandreus Jul 22, 2021
Google and several others have APIs you could use that do the actual speech-to-text recognition.
Google also has an app that does what you describe, except on phones, called Live Transcribe & Sound Notifications, which uses your phone to "[make] everyday conversations and surrounding sounds more accessible among people who are deaf and hard of hearing" by transcribing speech to text in real time (essentially captioning you as you speak, similar to captions on a YouTube video). It most likely uses Google's API on the back end, possibly with more specialised training data, although I did not find confirmation.
In terms of accuracy, you are unlikely to do better than Google's speech-to-text technology, given their resources and the wide range of training data they have had access to for different accents and speech patterns, but as per the article there are other options if you prefer.
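Streaming speech APIs like the one behind Live Transcribe typically emit a series of interim (provisional) results followed by a final one for each utterance. Here is a hedged sketch of how the mask's display logic could consume such a stream; `interim_results` is a hypothetical stand-in for the `(text, is_final)` pairs a real streaming API would deliver, not any actual API's response format:

```python
def caption_stream(interim_results):
    """Turn a stream of interim speech-to-text results into display
    updates: each interim result replaces the previous provisional
    text until the engine marks it final, which is roughly how
    live-captioning apps keep their captions current.

    `interim_results` is an iterable of (text, is_final) pairs --
    an assumed format, standing in for a real streaming API."""
    committed = []  # finalised caption segments
    for text, is_final in interim_results:
        if is_final:
            committed.append(text)               # commit this segment
            yield " ".join(committed)
        else:
            yield " ".join(committed + [text])   # provisional update

# Simulated stream: the recogniser revises its guess, then finalises.
updates = list(caption_stream([
    ("hel", False), ("hello", False), ("hello there", True),
]))
```

Each yielded string is what the mask (or phone screen) would show at that moment, so viewers see the caption correct itself as the recogniser refines its guess.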