The idea is similar to a modern voice assistant, except that it does not rely only on speech and other intelligible sounds. It will also respond to other sound gestures, such as clapping or rubbing the hands together.
For the idea to work, the phone's microphones must always be recording. The user can either learn a standard set of gestures or define their own gestures on their mobile phone or voice assistant. For example, the phone can be set to unlock when the user scratches a particular pattern, such as a "Z" shape, on any surface. The phone deduces the user's instruction by interpreting the pitch, variation, and timing of the sound produced by scratching or rubbing the surface.
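One way to compare a live scratching sound against the gestures a user has enrolled is template matching with dynamic time warping (DTW), which tolerates the same "Z" being drawn faster or slower. The sketch below is a minimal illustration under stated assumptions, not a definitive design: it assumes some upstream step has already turned the audio into a sequence of frames, each holding a normalized pitch and loudness value, and the template names, values, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical feature frame: (normalized pitch, normalized loudness), both in 0..1.
# How these are extracted from raw microphone audio is an assumption, not part of the idea.
Frame = Tuple[float, float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Dynamic time warping cost, tolerant of the gesture being performed faster or slower."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = frame_distance(x[i - 1], y[j - 1])
            # Extend the cheapest alignment: stretch x, stretch y, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_gesture(
    recording: Sequence[Frame],
    templates: Dict[str, List[Frame]],
    threshold: float,
) -> Optional[str]:
    """Return the closest enrolled gesture name, or None if nothing is within the threshold."""
    best_name, best_cost = None, float("inf")
    for name, template in templates.items():
        c = dtw_distance(recording, template)
        if c < best_cost:
            best_name, best_cost = name, c
    return best_name if best_cost <= threshold else None


# Illustrative enrolled templates (hypothetical values).
templates = {
    "unlock_z": [(0.2, 0.5), (0.6, 0.8), (0.2, 0.5)],
    "circle": [(0.9, 0.1), (0.9, 0.1), (0.9, 0.1)],
}
# The same "Z" drawn more slowly: frames repeat, but the shape is identical.
slow_z = [(0.2, 0.5), (0.2, 0.5), (0.6, 0.8), (0.6, 0.8), (0.2, 0.5)]
print(classify_gesture(slow_z, templates, threshold=0.5))  # unlock_z
```

Returning None when no template is close enough matters for an always-recording microphone, since most sounds it hears are not gestures at all.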
This makes communicating with the device more convenient. For example, the user can dismiss an alarm by clicking their tongue twice. It also gives the user more privacy, since multiple patterns can be set to trigger the same response. The user can set up as many gestures as they like, so only the user knows what they are instructing the phone to do. This is unlike a voice assistant, where anyone nearby can hear what you are telling your phone.
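The two behaviors in this paragraph, recognizing a double tongue click and letting several private gestures map to one action, can be sketched with a timing check and a many-to-one lookup table. This is a hedged illustration only: it assumes an upstream detector already reports click onset times in seconds, and the gap limits (`min_gap`, `max_gap`), the `GestureRegistry` class, and all gesture and action names are hypothetical.

```python
from typing import Dict, List, Optional


def is_double_click(onsets: List[float], min_gap: float = 0.05, max_gap: float = 0.4) -> bool:
    """True if the two most recent click onsets (seconds) are spaced like a deliberate double click."""
    if len(onsets) < 2:
        return False
    gap = onsets[-1] - onsets[-2]
    return min_gap <= gap <= max_gap


class GestureRegistry:
    """Hypothetical mapping of user-defined gesture names to actions; many gestures may share one action."""

    def __init__(self) -> None:
        self._actions: Dict[str, str] = {}

    def register(self, gesture: str, action: str) -> None:
        self._actions[gesture] = action

    def action_for(self, gesture: str) -> Optional[str]:
        return self._actions.get(gesture)

    def gestures_for(self, action: str) -> List[str]:
        return sorted(g for g, a in self._actions.items() if a == action)


registry = GestureRegistry()
# Two different private gestures both dismiss the alarm, so onlookers cannot
# learn a single "dismiss" sound by watching the user once.
registry.register("tongue_double_click", "dismiss_alarm")
registry.register("two_knocks", "dismiss_alarm")
registry.register("scratch_z", "unlock")

if is_double_click([12.00, 12.30]):
    print(registry.action_for("tongue_double_click"))  # dismiss_alarm
```

A plain dictionary is enough here because the lookup only ever goes from gesture to action; the reverse query is included just to show that one action can have several gestures behind it.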
The phone can also be set to listen for the user's heart rate, voice, and other physiological signatures that can be picked up through sound. Checking these signatures improves the system's accuracy and reduces the chance of gestures being accidentally triggered by people other than the phone's owner.
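A simple way to use these signatures is as a second gate: a gesture only takes effect if it matched and the live sound signature also resembles the owner's stored profile. The sketch below assumes, hypothetically, that some unspecified model summarizes the owner's voice or heart sounds as a numeric vector; cosine similarity and the 0.8 threshold are illustrative choices, not part of the original idea.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_trigger(
    gesture_matched: bool,
    owner_signature: Sequence[float],
    live_signature: Sequence[float],
    min_similarity: float = 0.8,  # assumed threshold; would need tuning in practice
) -> bool:
    """Act on a gesture only if it matched AND the live signature resembles the owner's."""
    if not gesture_matched:
        return False
    return cosine_similarity(owner_signature, live_signature) >= min_similarity


# Hypothetical signature vectors (e.g. a voice or heart-sound summary).
owner = [0.9, 0.1, 0.4]
same_person = [0.85, 0.15, 0.42]
stranger = [0.1, 0.9, 0.2]
print(should_trigger(True, owner, same_person))
print(should_trigger(True, owner, stranger))
```

Raising `min_similarity` makes accidental triggers by other people less likely at the cost of occasionally rejecting the owner.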