arXiv Sound Profile
arXiv Sound

@ArxivSound

4,830
Followers
32
Following
0
Media
11,280
Statuses

Sound-related articles ( and ) on

Joined July 2020
Pinned Tweet
@ArxivSound
arXiv Sound
2 years
[IMPORTANT] arXiv sound does not post some papers submitted to arXiv or . This is because they do not appear in the RSS of arXiv. We apologize for your inconvenience.
1
0
5
@ArxivSound
arXiv Sound
3 months
``Analyzing Musical Characteristics of National Anthems in Relation to Global Indices,'' S M Rakib Hasan, Aakar Dhakal, Ms. Ayesha Siddiqua, Mohammad Mominur Rahman, Md Maidul Islam, Mohammed Arfat Raihan Chowdhury, S M Masfequier Rahman Swapno, SM Nuruz…
1
224
2K
@ArxivSound
arXiv Sound
4 years
``WaveGrad: Estimating Gradients for Waveform Generation. (arXiv:2009.00713v1 []),'' Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan,
0
22
90
@ArxivSound
arXiv Sound
3 years
``RAVE: A variational autoencoder for fast and high-quality neural audio synthesis. (arXiv:2111.05011v1 [cs.LG]),'' Antoine Caillon, Philippe Esling,
1
3
52
@ArxivSound
arXiv Sound
11 months
``A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis. (arXiv:2308.15422v1 []),'' Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis,
1
11
50
@ArxivSound
arXiv Sound
5 months
``OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification,'' Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe,
0
6
50
@ArxivSound
arXiv Sound
2 months
``Mamba in Speech: Towards an Alternative to Self-Attention,'' Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps,
5
6
48
@ArxivSound
arXiv Sound
1 month
``LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning,'' Masaya Kawamura, Ryuichi Yamamoto, Yuma Shirahata, Takuya Hasumi, Kentaro Tachibana,
1
9
47
@ArxivSound
arXiv Sound
1 month
``XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model,'' Edresson Casanova, Kelly Davis, Eren Gölge, Görkem Göknar, Iulian Gulea, Logan Hart, Aya Aljafari, Joshua Meyer, Reuben Morais, Samuel Olayemi, Julian Weber,
0
10
47
@ArxivSound
arXiv Sound
1 year
``Moisesdb: A dataset for source separation beyond 4-stems. (arXiv:2307.15913v1 []),'' Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl,
1
23
45
@ArxivSound
arXiv Sound
9 months
``MelHuBERT: A simplified HuBERT on Mel spectrograms. (arXiv:2211.09944v2 [] UPDATED),'' Tzu-Quan Lin, Hung-yi Lee, Hao Tang,
0
4
45
@ArxivSound
arXiv Sound
2 years
``Style Transfer of Audio Effects with Differentiable Signal Processing. (arXiv:2207.08759v1 []),'' Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss,
1
9
44
@ArxivSound
arXiv Sound
4 months
``Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data,'' Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov,
0
11
42
@ArxivSound
arXiv Sound
8 months
``WavMark: Watermarking for Audio Generation. (arXiv:2308.12770v2 [] UPDATED),'' Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, Furu Wei,
0
6
38
@ArxivSound
arXiv Sound
1 month
``VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers,'' Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei,
0
6
40
@ArxivSound
arXiv Sound
2 years
``NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit. (arXiv:2210.15987v1 []),'' Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda,
0
13
39
@ArxivSound
arXiv Sound
2 years
``Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer. (arXiv:2208.07282v1 []),'' Shahan Nercessian,
0
11
39
@ArxivSound
arXiv Sound
3 months
``WavLLM: Towards Robust and Adaptive Speech Large Language Model,'' Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei,
0
6
36
@ArxivSound
arXiv Sound
8 months
``Music ControlNet: Multiple Time-varying Controls for Music Generation. (arXiv:2311.07069v1 []),'' Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan,
2
3
34
@ArxivSound
arXiv Sound
1 month
``The Interspeech 2024 Challenge on Speech Processing Using Discrete Units,'' Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin,
0
7
33
@ArxivSound
arXiv Sound
2 years
``Diffsound: Discrete Diffusion Model for Text-to-sound Generation. (arXiv:2207.09983v1 []),'' Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu,
0
8
34
@ArxivSound
arXiv Sound
6 months
``Masked Audio Generation using a Single Non-Autoregressive Transformer. (arXiv:2401.04577v1 []),'' Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi,
0
6
33
@ArxivSound
arXiv Sound
4 years
``VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics. (arXiv:2010.02977v1 []),'' Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo,
1
10
32
@ArxivSound
arXiv Sound
10 months
``Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation. (arXiv:2309.08876v1 []),'' Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe,
0
5
32
@ArxivSound
arXiv Sound
11 months
``SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models. (arXiv:2308.16692v1 []),'' Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu,
0
3
31
@ArxivSound
arXiv Sound
4 months
``An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis,'' Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong,
0
6
32
@ArxivSound
arXiv Sound
3 years
``MP3net: coherent, minute-long music generation from raw audio with a simple convolutional GAN. (arXiv:2101.04785v1 []),'' Korneel van den Broek,
0
4
31
@ArxivSound
arXiv Sound
18 days
``Exploring the Capability of Mamba in Speech Applications,'' Koichi Miyazaki, Yoshiki Masuyama, Masato Murata,
0
7
31
@ArxivSound
arXiv Sound
3 years
``Audio representations for deep learning in sound synthesis: A review. (arXiv:2201.02490v1 []),'' Anastasia Natsiou, Sean O'Leary,
0
9
31
@ArxivSound
arXiv Sound
2 months
``Benchmarking Representations for Speech, Music, and Acoustic Events,'' Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi,
0
5
31
@ArxivSound
arXiv Sound
10 months
``Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning. (arXiv:2309.13860v2 [] UPDATED),'' Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen,
0
2
30
@ArxivSound
arXiv Sound
6 months
``StreamVC: Real-Time Low-Latency Voice Conversion. (arXiv:2401.03078v1 []),'' Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann,
0
9
30
@ArxivSound
arXiv Sound
2 years
``Multi-instrument Music Synthesis with Spectrogram Diffusion. (arXiv:2206.05408v2 [] UPDATED),'' Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel,
0
3
30
@ArxivSound
arXiv Sound
8 months
``CREPE Notes: A new method for segmenting pitch contours into discrete notes. (arXiv:2311.08884v1 []),'' Xavier Riley, Simon Dixon,
0
5
30
@ArxivSound
arXiv Sound
2 months
``WavCraft: Audio Editing and Generation with Large Language Models,'' Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos,
0
6
30
@ArxivSound
arXiv Sound
25 days
``How Should We Extract Discrete Audio Tokens from Self-Supervised Models?,'' Pooneh Mousavi, Jarod Duret, Salah Zaiem, Luca Della Libera, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli,
0
4
30
@ArxivSound
arXiv Sound
11 months
``General Purpose Audio Effect Removal. (arXiv:2308.16177v1 []),'' Matthew Rice, Christian J. Steinmetz, George Fazekas, Joshua D. Reiss,
0
4
29
@ArxivSound
arXiv Sound
2 years
``Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform. (arXiv:2210.15975v1 []),'' Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, Kentaro Tachibana,
0
13
29
@ArxivSound
arXiv Sound
1 year
``InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt. (arXiv:2301.13662v1 []),'' Dongchao Yang, Songxiang Liu, Rongjie Huang, Guangzhi Lei, Chao Weng, Helen Meng, Dong Yu,
0
9
29
@ArxivSound
arXiv Sound
2 years
``Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis. (arXiv:2210.15964v1 []),'' Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana,
0
9
29
@ArxivSound
arXiv Sound
5 months
``Amphion: An Open-Source Audio, Music and Speech Generation Toolkit,'' Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai…
0
6
29
@ArxivSound
arXiv Sound
4 months
``The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data,'' Alice Baird, Rachel Manzelli, Panagiotis Tzirakis, Chris Gagne, Haoqi Li, Sadie Allen, Sander Dieleman, Brian Kulis, Shrikanth S. Narayanan, Alan Cowen,
0
4
27
@ArxivSound
arXiv Sound
2 years
``Speech Enhancement and Dereverberation with Diffusion-based Generative Models. (arXiv:2208.05830v1 []),'' Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann,
0
4
28
@ArxivSound
arXiv Sound
10 months
``HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation. (arXiv:2210.12740v3 [] UPDATED),'' Chunhui Wang, Chang Zeng, Jun Chen, Xing He,
0
7
29
@ArxivSound
arXiv Sound
12 days
``Less is More: Accurate Speech Recognition & Translation without Web-Scale Data,'' Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin,…
0
2
28
@ArxivSound
arXiv Sound
10 months
``DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input. (arXiv:2309.07658v1 []),'' Nicolas Jonason, Xin Wang, Erica Cooper, Lauri Juvela, Bob L. T. Sturm, Junichi Yamagishi,
0
7
29
@ArxivSound
arXiv Sound
1 year
@ArxivSound is back!
0
1
29
@ArxivSound
arXiv Sound
3 years
``SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification. (arXiv:2103.16858v1 []),'' Helin Wang, Yuexian Zou, Wenwu Wang,
0
9
29
@ArxivSound
arXiv Sound
1 year
@ArxivSound is back (again)!
0
6
29
@ArxivSound
arXiv Sound
1 year
``Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers. (arXiv:2307.03183v1 []),'' Yuan Gong, Sameer Khurana, Leonid Karlinsky, James Glass,
0
2
29
@ArxivSound
arXiv Sound
26 days
``MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model,'' Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, Shinji Watanabe,
0
5
28
@ArxivSound
arXiv Sound
2 years
``Audio Self-supervised Learning: A Survey. (arXiv:2203.01205v1 []),'' Shuo Liu, Adria Mallol-Ragolta, Emilia Parada-Cabeleiro, Kun Qian, Xin Jing, Alexander Kathan, Bin Hu, Bjoern W. Schuller,
0
4
27
@ArxivSound
arXiv Sound
3 years
``Continual-wav2vec2: an Application of Continual Learning for Self-Supervised Automatic Speech Recognition. (arXiv:2107.13530v1 []),'' Samuel Kessler, Bethan Thomas, Salah Karout,
0
4
28
@ArxivSound
arXiv Sound
1 month
``Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning,'' Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon D…
0
8
28
@ArxivSound
arXiv Sound
5 months
``An Embarrassingly Simple Approach for LLM with Strong ASR Capacity,'' Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen,
1
4
27
@ArxivSound
arXiv Sound
2 years
``End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation. (arXiv:2202.11301v1 []),'' Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy,
1
8
27
@ArxivSound
arXiv Sound
1 month
``Seed-TTS: A Family of High-Quality Versatile Speech Generation Models,'' Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying…
0
1
26
@ArxivSound
arXiv Sound
1 month
``4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders,'' Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe,
0
2
26
@ArxivSound
arXiv Sound
4 months
``NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models,'' Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, …
1
9
25
@ArxivSound
arXiv Sound
5 months
``Diffusion Models for Audio Restoration,'' Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann,
0
8
26
@ArxivSound
arXiv Sound
3 years
``Neural HMMs are all you need (for high-quality attention-free TTS). (arXiv:2108.13320v3 [] UPDATED),'' Shivam Mehta, Éva Székely, Jonas Beskow, Gustav Eje Henter,
0
6
27
@ArxivSound
arXiv Sound
2 years
``Neural Vocoder is All You Need for Speech Super-resolution. (arXiv:2203.14941v1 []),'' Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang,
0
4
27
@ArxivSound
arXiv Sound
8 months
``Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation. (arXiv:2311.04693v1 []),'' Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee,
0
5
26
@ArxivSound
arXiv Sound
3 years
``Music Demixing Challenge at ISMIR 2021. (arXiv:2108.13559v1 []),'' Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter,
0
5
25
@ArxivSound
arXiv Sound
3 years
``Tiny Transformers for Environmental Sound Classification at the Edge. (arXiv:2103.12157v1 []),'' David Elliott, Carlos E. Otero, Steven Wyatt, Evan Martino,
0
2
26
@ArxivSound
arXiv Sound
5 months
``BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data,'' Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud J…
0
7
26
@ArxivSound
arXiv Sound
5 months
``OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer,'' Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee …
0
6
26
@ArxivSound
arXiv Sound
1 month
``Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech,'' Shivam Mehta, Harm Lameris, Rajiv Punmiya, Jonas Beskow, Éva Székely, Gustav Eje Henter,
0
8
25
@ArxivSound
arXiv Sound
2 years
``u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality. (arXiv:2207.07036v2 [] UPDATED),'' Wei-Ning Hsu, Bowen Shi,
0
4
24
@ArxivSound
arXiv Sound
3 years
``Multimodal Self-Supervised Learning of General Audio Representations. (arXiv:2104.12807v2 [] UPDATED),'' Luyu Wang, Pauline Luc, Adria Recasens, Jean-Baptiste Alayrac, Aaron van den Oord,
0
3
25
@ArxivSound
arXiv Sound
3 months
``Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization,'' Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria,
0
1
25
@ArxivSound
arXiv Sound
6 months
``Accent-VITS:accent transfer for end-to-end TTS. (arXiv:2312.16850v2 [] UPDATED),'' Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie,
0
3
24
@ArxivSound
arXiv Sound
1 month
``LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes,'' Trung Dang, David Aponte, Dung Tran, Kazuhito Koishida,
0
2
25
@ArxivSound
arXiv Sound
3 years
``Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. (arXiv:2103.14574v1 []),'' Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Jia Ye, RJ Ryan, Yonghui Wu,
0
7
25
@ArxivSound
arXiv Sound
9 months
``One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition. (arXiv:2310.01688v1 []),'' Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini,
0
4
25
@ArxivSound
arXiv Sound
25 days
``GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities,'' Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha,
0
6
24
@ArxivSound
arXiv Sound
4 years
``Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks. (arXiv:2009.04172v1 []),'' Helena Cuesta, Brian McFee, Emilia Gómez,
0
5
23
@ArxivSound
arXiv Sound
4 months
``MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation,'' Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong,
0
4
24
@ArxivSound
arXiv Sound
7 months
``StyleSinger: Style Transfer for Out-Of-Domain Singing Voice Synthesis. (arXiv:2312.10741v1 []),'' Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao,
0
4
24
@ArxivSound
arXiv Sound
1 year
``Foley Sound Synthesis at the DCASE 2023 Challenge. (arXiv:2304.12521v1 []),'' Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, Shinosuke Takamichi,
0
8
24
@ArxivSound
arXiv Sound
2 years
``BigVGAN: A Universal Neural Vocoder with Large-Scale Training. (arXiv:2206.04658v1 []),'' Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon,
0
5
24
@ArxivSound
arXiv Sound
2 months
``AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining,'' Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley,
0
2
24
@ArxivSound
arXiv Sound
3 years
``BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation. (arXiv:2103.06695v1 []),'' Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino,
0
6
24
@ArxivSound
arXiv Sound
2 years
``Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data. (arXiv:2202.00097v1 [cs.LG]),'' Amir Shirian, Krishna Somandepalli, Tanaya Guha,
0
3
24