See also
[Paper Analysis] E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Improves performance by integrating a VAD (voice activity detector) with a streaming end-to-end (E2E) ASR model
[Paper Analysis] Efficient Streaming LLM for Speech Recognition
Uses an LLM as the ASR decoder, fine-tuned with LoRA
[Paper Analysis] Real Time Speech Enhancement in the Waveform Domain
A speech enhancement model that runs in real time even on a CPU
[Paper Analysis] Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
Context biasing using a neural contextual adapter