publications

2026

  1. CSL
    Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge
    Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, and Shoko Araki
    Computer Speech & Language, Jan 2026
  1. ICASSP
    Front-end Token Enhancement for Token-based Speech Recognition
    Takanori Ashihara, Shota Horiguchi, Kohei Matsuura, Tsubasa Ochiai, and Marc Delcroix
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2026
  2. ICASSPW
    Target-Speaker Voice Activity Detection with Chunk-Level Speaker Queries
    Naohiro Tawara and Shota Horiguchi
    In IEEE International Conference on Acoustics, Speech and Signal Processing Workshop (ICASSPW), May 2026

      2025

        1. ASRU
          Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
          Shota Horiguchi, Naohiro Tawara, Takanori Ashihara, Atsushi Ando, and Marc Delcroix
          In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2025
        2. INTERSPEECH
          Mitigating Non-Target Speaker Bias in Guided Speaker Embedding
          Shota Horiguchi, Takanori Ashihara, Marc Delcroix, Atsushi Ando, and Naohiro Tawara
          In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2025
        3. INTERSPEECH
          Pretraining Multi-Speaker Identification for Neural Speaker Diarization
          Shota Horiguchi, Atsushi Ando, Naohiro Tawara, and Marc Delcroix
          In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2025
        4. INTERSPEECH
          Analysis of Semantic and Acoustic Token Variability Across Speech, Music, and Audio Domains
          Takanori Ashihara, Marc Delcroix, Tsubasa Ochiai, Kohei Matsuura, and Shota Horiguchi
          In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2025
        5. INTERSPEECH
          Voice Impression Control in Zero-Shot TTS
          Kenichi Fujita, Shota Horiguchi, and Yusuke Ijima
          In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2025
        6. ICASSP
          Guided Speaker Embedding
          Shota Horiguchi, Takafumi Moriya, Atsushi Ando, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, and Marc Delcroix
          In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2025
        7. ICASSP
          Multi-channel Speaker Counting for EEND-VC-based Speaker Diarization on Multi-domain Conversation
          Naohiro Tawara, Atsushi Ando, Shota Horiguchi, and Marc Delcroix
          In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2025
        8. ICASSP
          Mamba-based Segmentation Model for Speaker Diarization
          Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Shota Horiguchi, Atsushi Ando, and Shoko Araki
          In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2025
        9. ICASSP
          Alignment-Free Training for Transducer-Based Multi-Talker ASR
          Takafumi Moriya, Shota Horiguchi, Marc Delcroix, Ryo Masumura, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, and Masato Mimura
          In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2025
          1. Preprint
            Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
            Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Shota Horiguchi, Atsushi Ando, Shoko Araki, and Hervé Bredin
            arXiv:2506.11605, Jun 2025

          2024

            1. SLT
              Investigation of Speaker Representation for Target-Speaker Speech Processing
              Takanori Ashihara, Takafumi Moriya, Shota Horiguchi, Junyi Peng, Tsubasa Ochiai, Marc Delcroix, Kohei Matsuura, and Hiroshi Sato
              In IEEE Spoken Language Technology Workshop (SLT), Dec 2024
            2. SLT
              Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings
              Shota Horiguchi, Atsushi Ando, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, and Marc Delcroix
              In IEEE Spoken Language Technology Workshop (SLT), Dec 2024
              🏆 Honorable Mention Award @IEEE SLT 2024
            3. INTERSPEECH
              Factor-Conditioned Speaking-Style Captioning
              Atsushi Ando, Takafumi Moriya, Shota Horiguchi, and Ryo Masumura
              In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
            4. INTERSPEECH
              SpeakerBeam-SS: Real-Time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
              Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, and Marc Delcroix
              In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
            5. ICASSP
              Streaming Active Learning for Regression Problems Using Regression via Classification
              Shota Horiguchi, Kota Dohi, and Yohei Kawaguchi
              In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
            1. CHiME
              NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge
              Naoyuki Kamo*, Naohiro Tawara*, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, and Shoko Araki
              In The 8th International Workshop on Speech Processing in Everyday Environments (CHiME-2024), Sep 2024
              (*) Equal contribution
            1. Preprint
              Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits
              Hiroyuki Namba, Shota Horiguchi, Masaki Hamamoto, and Masashi Egi
              arXiv:2402.08209, Feb 2024

            2023

            1. TASLP
              Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors
              Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi
              IEEE/ACM Transactions on Audio, Speech, and Language Processing, Jan 2023
              🏆 IEEE SPS Japan Young Author Best Paper Award
            1. APSIPA ASC
              Synthetic Data Augmentation for ASR with Domain Filtering
              Tuan Vu Ho, Shota Horiguchi, Shinji Watanabe, Paola Garcia, and Takashi Sumiyoshi
              In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov 2023
            2. INTERSPEECH
              Spoofing Attacker Also Benefits from Large-Scale Self-Supervised Models
              Aoi Ito* and Shota Horiguchi*
              In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
              (*) Equal contribution
            3. INTERSPEECH
              CAPTDURE: Captioned Sound Dataset of Single Sources
              Yuki Okamoto, Kanta Shimonishi, Keisuke Imoto, Kota Dohi, Shota Horiguchi, and Yohei Kawaguchi
              In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
            4. SLT
              Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization
              Shota Horiguchi, Yuki Takashima, Shinji Watanabe, and Paola García
              In IEEE Spoken Language Technology Workshop (SLT), Jan 2023

                2022

                1. TASLP
                  Encoder-Decoder Based Attractors for End-to-End Neural Diarization
                  Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Paola García
                  IEEE/ACM Transactions on Audio, Speech, and Language Processing, Mar 2022
                  🏆 IEEE SPS Young Author Best Paper Award
                  🏆 Itakura Prize Innovative Young Researcher Award
                1. INTERSPEECH
                  Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models
                  Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Paola Garcia, and Yohei Kawaguchi
                  In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2022
                2. ICML
                  Rethinking Fano’s Inequality in Ensemble Learning
                  Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, and Nobuo Nukaga
                  In International Conference on Machine Learning (ICML), Jul 2022
                3. Odyssey
                  Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
                  Natsuo Yamashita, Shota Horiguchi, and Takeshi Homma
                  In The Speaker and Language Recognition Workshop (Odyssey), Jun 2022
                4. ICASSP
                  Multi-Channel End-to-End Neural Diarization with Distributed Microphones
                  Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, and Yohei Kawaguchi
                  In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022
                5. ICASSP
                  Environmental Sound Extraction Using Onomatopoeic Words
                  Yuki Okamoto, Shota Horiguchi, Masaaki Yamamoto, Keisuke Imoto, and Yohei Kawaguchi
                  In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022
                  🏆 IEEE SPS Japan Student Conference Paper Award

                    2021

                      1. ASRU
                        Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors
                        Shota Horiguchi, Paola García, Shinji Watanabe, Yawen Xue, Yuki Takashima, and Yohei Kawaguchi
                        In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2021
                      2. INTERSPEECH
                        Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers
                        Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, and Kenji Nagamatsu
                        In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2021
                      3. INTERSPEECH
                        Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization
                        Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola Garcia, and Kenji Nagamatsu
                        In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2021
                      4. ICASSP
                        End-to-End Speaker Diarization as Post-Processing
                        Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu
                        In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2021
                      5. SLT
                        End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
                        Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola Garcia, and Kenji Nagamatsu
                        In IEEE Spoken Language Technology Workshop (SLT), Jan 2021
                      6. SLT
                        Online End-to-End Neural Diarization with Speaker-Tracing Buffer
                        Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola Garcia, and Kenji Nagamatsu
                        In IEEE Spoken Language Technology Workshop (SLT), Jan 2021
                      7. SLT
                        Block-Online Guided Source Separation
                        Shota Horiguchi, Yusuke Fujita, and Kenji Nagamatsu
                        In IEEE Spoken Language Technology Workshop (SLT), Jan 2021
                      1. DIHARD
                        The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-vector Clustering Systems Combined by DOVER-Lap
                        Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, and Sanjeev Khudanpur
                        In The Third DIHARD Speech Diarization Challenge (DIHARD III), Jan 2021

                        2020

                        1. TPAMI
                          Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features
                          Shota Horiguchi, Daiki Ikami, and Kiyoharu Aizawa
                          IEEE Transactions on Pattern Analysis and Machine Intelligence, May 2020
                        1. INTERSPEECH
                          Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones
                          Shota Horiguchi, Yusuke Fujita, and Kenji Nagamatsu
                          In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Oct 2020
                        2. INTERSPEECH
                          End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
                          Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Kenji Nagamatsu
                          In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Oct 2020
                        3. ICRA
                          Anticipating the Start of User Interaction for Service Robot in the Wild
                          Koichiro Ito, Quan Kong, Shota Horiguchi, Takashi Sumiyoshi, and Kenji Nagamatsu
                          In IEEE International Conference on Robotics and Automation (ICRA), Jun 2020
                        1. SemEval
                          Hitachi at SemEval-2020 Task 8: Simple but Effective Modality Ensemble for Meme Emotion Recognition
                          Terufumi Morishita*, Gaku Morio*, Shota Horiguchi, Hiroaki Ozaki, and Toshinori Miyoshi
                          In The Fourteenth Workshop on Semantic Evaluation (SemEval), Dec 2020
                          (*) Equal contribution
                        2. CHiME
                          CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
                          Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, and Neville Ryant
                          In The 6th International Workshop on Speech Processing in Everyday Environments (CHiME-2020), May 2020
                        1. Preprint
                          Neural Speaker Diarization with Speaker-Wise Chain Rule
                          Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, and Kenji Nagamatsu
                          arXiv:2006.01796, Jun 2020
                        2. Preprint
                          End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-Label Classification
                          Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, and Kenji Nagamatsu
                          arXiv:2003.02966, Feb 2020

                        2019

                          1. ASRU
                            Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models
                            Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe
                            In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2019
                          2. ASRU
                            End-to-End Neural Speaker Diarization with Self-Attention
                            Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, and Shinji Watanabe
                            In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2019
                            Best Paper Finalist
                          3. INTERSPEECH
                            End-to-End Neural Speaker Diarization with Permutation-Free Objectives
                            Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, and Shinji Watanabe
                            In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2019
                          4. INTERSPEECH
                            Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation
                            Shota Horiguchi, Naoyuki Kanda, and Kenji Nagamatsu
                            In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2019
                          5. INTERSPEECH
                            Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition
                            Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, and Shinji Watanabe
                            In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2019
                          6. INTERSPEECH
                            Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party Scenario
                            Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, and Reinhold Haeb-Umbach
                            In The Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2019
                          7. ICASSP
                            Acoustic Modeling for Distant Multi-Talker Speech Recognition with Single- and Multi-Channel Branches
                            Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, and Shinji Watanabe
                            In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019
                          8. WACV
                            Omnidirectional Pedestrian Detection by Rotation Invariant Training
                            Masato Tamura, Shota Horiguchi, and Tomokazu Murakami
                            In IEEE Winter Conference on Applications of Computer Vision (WACV), Jan 2019

                              2018

                              1. TMM
                                Personalized Classifier for Food Image Recognition
                                Shota Horiguchi, Sosuke Amano, Makoto Ogawa, and Kiyoharu Aizawa
                                IEEE Transactions on Multimedia, Oct 2018
                              1. ACMMM
                                Face-Voice Matching Using Cross-Modal Embeddings
                                Shota Horiguchi, Naoyuki Kanda, and Kenji Nagamatsu
                                In ACM International Conference on Multimedia (ACMMM), Oct 2018
                              1. CHiME
                                The Hitachi/JHU CHiME-5 System: Advances in Speech Recognition for Everyday Home Environments Using Multiple Microphone Arrays
                                Naoyuki Kanda, Rintaro Ikeshita, Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu, Xiaofei Wang, Vimal Manohar, Nelson Enrique Yalta Soplin, Matthew Maciejewski, Szu-Jui Chen, Aswin Shanmugam Subramanian, Ruizhi Li, Zhiqi Wang, Jason Naradowsky, L. Paola Garcia-Perera, and Gregory Sell
                                In The 5th International Workshop on Speech Processing in Everyday Environments (CHiME-2018), Sep 2018

                                2016

                                  1. MADiMa
                                    Food Search Based on User Feedback to Assist Image-Based Food Recording Systems
                                    Sosuke Amano, Shota Horiguchi, Kiyoharu Aizawa, Kazuki Maeda, Masanori Kubota, and Makoto Ogawa
                                    In International Workshop On Multimedia Assisted Dietary Management (MADiMa), Oct 2016
                                  2. ICIP
                                    The Log-Normal Distribution of the Size of Objects in Daily Meal Images and Its Application to the Efficient Reduction of Object Proposals
                                    Shota Horiguchi, Kiyoharu Aizawa, and Makoto Ogawa
                                    In IEEE International Conference on Image Processing (ICIP), Sep 2016