VinBigdata’s research paper on Speech and Language Processing has recently been accepted by the IEEE Signal Processing Society’s Technical Committees for presentation at ICASSP 2021. ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. This year’s conference, originally planned for Toronto, Canada, will be held as a fully virtual event in June.
The paper is titled “How to make text-to-speech system pronounce “Voldemort”: An experimental approach of Foreign word phonemization in Vietnamese”. VinBigdata’s scientists propose a solution to one of the biggest challenges for any speech synthesis system: pronouncing foreign words. These can be proper names, technical terms, movie titles, or quotations in a foreign language, and they appear frequently in many types of text, such as broadcast news, books, instant messages, and technical reports.
The work addresses this problem for Vietnamese, a low-resource language, following an experimental approach. Based on a deep analysis of how foreign words are used in Vietnamese, the authors proposed several types of pronunciation dictionaries for foreign words, built with rule-based phonemization, word-to-syllables mapping, and cross-lingual phone-to-phone mapping. These dictionaries were then used to train different types of grapheme-to-phoneme (G2P) converters. A perceptual evaluation of the synthesized Vietnamese speech confirms that the output of the proposed method compares favorably with human pronunciation of unseen foreign words. Details of the research will be presented at the ICASSP conference in June.
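The three dictionary types named above can be illustrated with a minimal Python sketch. It is a simplified lookup cascade, not the trained G2P converters described in the paper, and every table entry, symbol, and function name here (WORD_TO_SYLLABLES, PHONE_TO_PHONE, LETTER_RULES, phonemize) is a hypothetical placeholder rather than data or code from the research.

```python
# Illustrative only: all entries below are made-up placeholders,
# not pronunciations or rules taken from the paper.

# Word-to-syllables mapping: a foreign word mapped to Vietnamese-style syllables.
WORD_TO_SYLLABLES = {
    "voldemort": ["von", "de", "mot"],
}

# Cross-lingual phone-to-phone mapping: source-language phones
# (ARPAbet-like labels, assumed) mapped to target-language phone symbols.
PHONE_TO_PHONE = {"K": "k", "AE": "a", "T": "t"}

# Rule-based phonemization: simple letter rewrite rules (assumed).
LETTER_RULES = {"f": "ph", "j": "gi", "w": "u", "z": "d"}


def phonemize(word, source_phones=None):
    """Return (method, units) for a foreign word, trying each dictionary in turn."""
    key = word.lower()
    # 1) Exact entry in the word-to-syllables dictionary.
    if key in WORD_TO_SYLLABLES:
        return ("word-to-syllables", WORD_TO_SYLLABLES[key])
    # 2) Source-language phones are known and every phone has a mapping.
    if source_phones and all(p in PHONE_TO_PHONE for p in source_phones):
        return ("phone-to-phone", [PHONE_TO_PHONE[p] for p in source_phones])
    # 3) Fall back to letter-level rewrite rules; unmapped letters pass through.
    return ("rule-based", [LETTER_RULES.get(ch, ch) for ch in key if ch.isalpha()])


if __name__ == "__main__":
    print(phonemize("Voldemort"))
    print(phonemize("cat", source_phones=["K", "AE", "T"]))
    print(phonemize("fax"))
```

In the research itself, such dictionaries serve as training data for G2P models, which can then generalize to words that appear in none of the tables; the cascade here only shows what each kind of mapping contributes.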
The paper is the work of leading scientists at VinBigdata: Dr. Nguyen Kim Anh (University of Stuttgart, Germany), Dr. Mac Dang Khoa (Grenoble University, France), Dr. Nguyen Van Huy (Vietnam Academy of Science and Technology), and Nguyen Dinh Nghi (Language processing executive). They currently work with a team of more than 50 experts and engineers in the Speech and Language Processing Department to research and develop highly applicable technology products such as speech recognition and speech synthesis systems, chatbots, voicebots, virtual assistants, and automatic machine translation.
This latest achievement will help VinBigdata continue to improve the quality and accuracy of its AI software and apply it to products and solutions in Vingroup’s Technology – Industrials – Property & Services ecosystem. It is expected to lay the foundation for Vietnamese technology that directly serves Vietnamese people, improving the user experience and changing the way everyday tasks are carried out.
Previously, VinBigdata’s Speech and Language Processing team achieved many outstanding results, including: No. 1 in Input Typing Error Correction for Vietnamese; 90% accuracy for Vietnamese Speech To Text (STT), better than global competitors such as Google; and 88% accuracy for VinBigdata’s Error Correction Technology, a higher score than leading competitors such as Samsung, Google and Laban.
The Institute of Electrical and Electronics Engineers (IEEE) is the world’s largest association of technical experts, with more than 423,000 members in more than 160 countries. Founded in 1884 by electronics experts in New York, USA, IEEE continues to pursue its mission of fostering technological innovation and excellence for the benefit of humanity. The International Conference on Acoustics, Speech, and Signal Processing (ICASSP) is one of IEEE’s annual events, dedicated to discussing problems and applications of signal processing. In 2021, the 46th edition of the conference will present representative scientific studies on many topics, such as machine learning for signal processing and signal processing for big data.