How we pronounce words affects how "natural" communication feels to us. Would it be possible to develop a voice assistant that can also deal with very differently pronounced words? I wanted to run an experiment based on this question.
"Alexa, turn on the radio! Alexa, lower the light!" Even though voice recognition systems are already well developed and part of our daily lives, talking to them often feels unnatural. This might be because we adapt our voice so that these systems understand what we want or need.
How can a word with the same meaning but a different pronunciation be brought into a format that a system can analyze, without "typing" the word the way it is pronounced?
Well, the idea was simple: Draw an image of it!
Once sound and single words could be visualized, about 200 generated images were used to train a visual recognition model. IBM Visual Recognition was used for this.
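To make the "draw an image of it" idea more concrete, here is a minimal sketch of one common way to turn sound into a picture: a spectrogram, where each column is a short slice of time and each row is a frequency. This is an assumption for illustration only; the write-up does not say how the images in the experiment were actually generated. The function names (`synth_tone`, `spectrogram`, `to_pgm`) are hypothetical, a synthetic sine tone stands in for a recorded word, and the training step with IBM Visual Recognition is not covered here.

```python
import cmath
import math


def synth_tone(freq_hz, sample_rate=8000, num_samples=800):
    """Stand-in for a recorded word: a pure sine tone as a list of samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(num_samples)]


def spectrogram(samples, frame_size=64, hop=32):
    """Short-time magnitude spectrum.

    Returns a list of frames; each frame holds the magnitudes of
    frequency bins 0 .. frame_size // 2.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window to reduce leakage at the frame edges
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
            for i, s in enumerate(frame)
        ]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k (slow, but needs no external library)
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_size)
                      for i, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def to_pgm(frames):
    """Render the spectrogram as a plain-text grayscale image (PGM, type P2).

    Time runs along the x axis, frequency along the y axis (low at the bottom).
    """
    peak = max(max(f) for f in frames) or 1.0
    width, height = len(frames), len(frames[0])
    rows = []
    for k in reversed(range(height)):
        rows.append(" ".join(str(int(255 * frames[t][k] / peak))
                             for t in range(width)))
    return f"P2\n{width} {height}\n255\n" + "\n".join(rows) + "\n"


tone = synth_tone(1000)        # 1 kHz test tone instead of a spoken word
frames = spectrogram(tone)
image = to_pgm(frames)         # write this string to a .pgm file to view it
```

With a real recording, two pronunciations of the same word would produce visibly different patterns, and images like these are the kind of input an image classifier could be trained on.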
To keep it simple, only two pronunciation types (standard German and Franconian dialect) were chosen for training and testing. The chosen word was "potato" (Ger. Kartoffel 🥲).
The experiment ended with a model that was able to recognize whether the word was pronounced in Franconian or standard German. In total, all tested results were correct, even though the ratio fluctuated considerably from time to time. There are still many uncertainties when it comes to different surroundings or more background noise.
1. Data is key! Huge data sets that also cover factors like voice tonality and background noise would be advisable.
2. Just one word... uff. Using this method for a whole language might not be very efficient.
3. Cool to use open-access IBM services for experiments! Interesting, too, not to know what the model uses for its decision making.
4. Sad that people are forced to speak standard German when interacting with the digital world, since dialects are already dying out. But keeping some distance between system and human might also be healthy. Still unsure whether I would go for a dialect-supporting system or not.
Thanks to Tim for helping me set up the code! Always nice to experiment with you!