Statistical Speech & Sound Computing Lab.

Block diagrams of recent publications

[1] Y. Choi, Y. Jung, Y. Suh, and H. Kim, “Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech,” IEEE Access, Vol. 10, pp. 52621–52629, 2022, doi: 10.1109/ACCESS.2022.3175810

https://ieeexplore.ieee.org/abstract/document/9775804

[2] Y. Jung, Y. Choi, H. Lim, and H. Kim, “A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments,” IEEE Access, Vol. 8, pp. 175448–175466, 2020, doi: 10.1109/ACCESS.2020.3025941

https://ieeexplore.ieee.org/document/9203835

Fig. 1. Overview of the proposed perceptually guided TTS with MOS prediction.

Fig. 2. Illustration of the proposed integrated model combining speech enhancement (SE), speaker verification (SV), and voice activity detection (VAD).

International Journals (Recent 5 years)

Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim, “Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech,” IEEE Access, Vol. 10, pp. 52621-52629, May 2022.

Youngmoon Jung, Yeunju Choi, Hyungjun Lim, and Hoirin Kim, “A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments,” IEEE Access, Vol. 8, pp. 175448-175466, Sep. 2020.

Hyungjun Lim, Younggwan Kim, and Hoirin Kim, “Cross-Informed Domain Adversarial Training for Noise-Robust Wake-up Word Detection,” IEEE Signal Processing Letters, Vol. 27, No. 11, pp. 1769-1773, Sep. 2020.

Hyungjun Lim, Younggwan Kim, Jahyun Goo, and Hoirin Kim, “Interlayer Selective Attention Network for Robust Personalized Wake-Up Word Detection,” IEEE Signal Processing Letters, Vol. 27, No. 1, pp. 126-130, Jan. 2020.

Younggwan Kim, Myung Jong Kim, Jahyun Goo, and Hoirin Kim, “Learning Self-Informed Feature Contribution for Deep Learning-Based Acoustic Modeling,” IEEE/ACM Trans. on Audio, Speech and Language Processing, Vol. 26, No. 11, pp. 2204-2214, Nov. 2018. 

Youngjoo Suh and Hoirin Kim, “Histogram Equalization with Bayesian Estimation for Noise Robust Speech Recognition,” Journal of the Acoustical Society of America, Vol. 143, No. 2, pp. 677-685, Feb. 2018. 


International Conferences (Recent 5 years)

Myunghun Jung, Hoirin Kim, “AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination,” Interspeech 2023, pp. 3924-3928, Aug. 2023.

Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoirin Kim, “Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation,” Interspeech 2023, pp. 316-320, Aug. 2023.

Myunghun Jung, Hoirin Kim, “Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings,” Interspeech 2022, pp. 5170-5174, Sep. 2022.

Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim, “FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning,” Interspeech 2022, pp. 3588-3592, Sep. 2022. (MSIT/NRF)

Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim, “Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck,” Interspeech 2022, pp. 3568-3572, Sep. 2022. (MSIT/IITP)

Jisub Um, Yeunju Choi, Hoirin Kim, “ACNN-VC: Utilizing Adaptive Convolution Neural Network for One-Shot Voice Conversion,” Interspeech 2022, pp. 2998-3002, Sep. 2022. (MSIT/NRF)

Yeunju Choi, Youngmoon Jung, Hoirin Kim, “Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification,” SLT 2021, pp. 462-469, Jan. 2021. (Virtual) (MOTIE/KEIT)

Seong Min Kye, Joon Son Chung, Hoirin Kim, “Supervised Attention for Speaker Recognition,” SLT 2021, pp. 286-293, Jan. 2021. (Virtual) (MSIT/IITP)

Joohyung Lee, Youngmoon Jung, Myunghun Jung, Hoirin Kim, “Dynamic Noise Embedding: Noise Aware Training and Adaptation for Speech Enhancement,” APSIPA 2020, pp. 739-746, Dec. 2020. (Virtual) (ADD)

Joohyung Lee, Youngmoon Jung and Hoirin Kim, “Dual Attention in Time and Frequency Domain for Voice Activity Detection,” Interspeech 2020, pp. 3670-3674, Oct. 2020. (Virtual) (ADD)

Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim, “Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs,” Interspeech 2020, pp. 2982-2986, Oct. 2020. (Virtual) (MOTIE/KEIT)

Yeunju Choi, Youngmoon Jung, Hoirin Kim, “Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling,” Interspeech 2020, pp. 1743-1747, Oct. 2020. (Virtual) (MOTIE/KEIT)

Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim, “Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances,” Interspeech 2020, pp. 1501-1505, Oct. 2020. (Virtual) (MOTIE/KEIT)

Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim, “Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention,” Interspeech 2020, pp. 931-935, Oct. 2020. (Virtual) (ADD)

Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim, “Additional Shared Decoder on Siamese Multi-View Encoders for Learning Acoustic Word Embeddings,” ASRU 2019, Dec. 2019. (MOTIE/KEIT)

Youngmoon Jung, Yeunju Choi, Hoirin Kim, “Self-Adaptive Soft Voice Activity Detection Using Deep Neural Networks for Robust Speaker Verification,” ASRU 2019, Dec. 2019. (MOTIE/KEIT)

Sunghee Jung, Youngjoo Suh, Hoirin Kim, “Fast Voice Conversion on Non-Parallel Corpus,” SICSS (Seoul International Conference on Speech Sciences) 2019, pp. 158-159, Nov. 2019. (MOTIE/KEIT)

Hyeonjae Jeong, Jahyun Goo, Seunghi Kim, Hoirin Kim, “Semi-Supervised Domain Adaptation for End-to-End Automatic Speech Recognition,” SICSS (Seoul International Conference on Speech Sciences) 2019, p. 134, Nov. 2019. (MSIP/IITP)

Jahyun Goo, Hyeonjae Jeong, Seunghi Kim, Hoirin Kim, “A Study on Pretraining of End-to-End Speech Recognition Model,” SICSS (Seoul International Conference on Speech Sciences) 2019, pp. 75-76, Nov. 2019. (MSIP/IITP)

Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim, “Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification,” Interspeech 2019, pp. 4030-4034, Sep. 2019. (MOTIE/KEIT)

Youngmoon Jung, Younggwan Kim, Yeunju Choi, Hoirin Kim, “Joint Learning Using Denoising Variational Autoencoders for Voice Activity Detection,” Interspeech 2018, pp. 1210-1214, Sep. 2018. (MOTIE/KEIT)

Hyungjun Lim, Jahyun Goo, Sungmin Lim, Hoirin Kim, “Score Fusion Based Vocabulary Independent Spoken Term Detection,” ICCE-Asia 2018, pp. 365-367, June 2018. (MOTIE/KEIT)

Younggwan Kim, Youngmoon Jung, Yeunju Choi, Hoirin Kim, “Deep Least Squares Regression-Based Speaker-Dependent Layer Initialization for DNN Acoustic Model Adaptation,” ICCE-Asia 2018, pp. 361-364, June 2018. (MOTIE/KEIT)