🚨We investigate a new problem in our latest preprint: training multi-speaker TTS with speaker-anonymized data! The goal is to protect privacy in the era of giant speech generation models.
📰Paper:
🎵Samples:
🧵A thread:
Speech researchers... you need to read this masterpiece. It's phenomenal.
A Large-Scale Evaluation of Speech Foundation Models
All kudos to the mighty Leo
@leo19941227
, father of the great S3PRL toolkit!
Speech synthesis researchers, let's all read this work.
"It is therefore of utmost importance for speech synthesis researchers to be mindful of their choices of comparison systems and how they may affect MOS results."
Stop using Tacotron as your baseline.
🚨🚨The Singing Voice Conversion Challenge 2023 summary paper is out!
TL;DR:
✅Human level naturalness achieved by top teams!
❌Conversion similarity: still a long way to go!
Kudos to the team 🙌🙌
@lesterphv
@jiatongshi
@shaunliu231
Very sad to find that this new paper () on expressive S2ST did not cite my internship work in the same direction…
Anyway, I’m presenting this work at ICASSP next week. Come if you are interested!
My name is Wen-Chin Huang, a Ph.D. student at Nagoya Univ. I work on speech processing and deep learning. Interned twice at Meta. I speak Mandarin, English, and Japanese. I am on the 2024 job market, looking for research positions in Japan. Please reach out to me if you are interested!
Today, we presented the summary papers for the Singing Voice Conversion Challenge 2023 and the VoiceMOS Challenge 2023 at ASRU2023! Fully immersed in the great discussions 🗣️
Spoiler: Pretty certain that there will be future editions for these two challenges. Stay tuned ⚡️⚡️
It seems to be no coincidence that some of the strongest leaders in AI who manage large teams frequently do very low-level technical work.
Jeff Dean doing weekly IC (individual contributor) work while managing 3k+ people at Google Research is the canonical example, but I've…
Thank you for coming to my posters at ICASSP2023! Got reminded of how fun talking about research is (maybe more fun than doing it!). Not going to Dublin, so see you at ASRU, which will be in Taiwan 🇹🇼🇹🇼🇹🇼
🚨The Amazing
@erica_cooper
and I wrote an invited review paper on synthetic speech evaluation! Seriously ALL KUDOS to her 🙌🙌🙌 I cannot respect her more for doing so much survey on this topic 🔥
It is published in AST, a Japanese journal. Take a look!
"StyleTTS 2 advances the state-of-the-art by achieving a statistically significant CMOS of +1.07 (p ≪ 0.01) compared to NaturalSpeech."
I don't quite get it... If NaturalSpeech has human-level naturalness, what is "more natural than human"??
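For context, CMOS is a paired A/B comparison on a −3…+3 scale, so +1.07 means raters preferred one system by roughly one category on average. A rough sketch of how it's typically computed (the ratings below are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical CMOS ratings: each rater hears a pair (system A vs. system B)
# and scores from -3 (A much worse than B) to +3 (A much better than B).
scores = np.array([2, 1, 1, 0, 2, 1, 1, 2, 0, 1, 1, 2])  # made-up data

cmos = scores.mean()                   # mean preference for system A
t, p = stats.ttest_1samp(scores, 0.0)  # test whether the preference differs from 0
print(f"CMOS = {cmos:+.2f}, p = {p:.3g}")
```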
What… don’t use web-scale data? NVIDIA, the GPU manufacturer, tells us to cut GPU usage…?
“Oh no! OWSM required 64 A100 40GB GPUs to train! High resource requirement! We proposed to just train on…
128 A100 80GB GPUs 😀😀😀😀”
Sorry, my bad, I misunderstood you.
"Less is More: Accurate Speech Recognition & Translation without Web-Scale Data," Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin,…
🔥Fresh and HOT🔥
S3PRL-VC now has a HuggingFace Space demo!!
You can record your own voice and convert it to one of the four pre-defined speakers. Personally, I find it particularly interesting when the input is not English 😎
I am releasing the standalone version of S3PRL-VC!
S3PRL-VC aims to provide a platform to compare different self-supervised speech pretrained representations (S3PR?) in the application of voice conversion. Please try it out!
Repo:
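If you haven't used S3PRL before, extracting upstream representations looks roughly like this (my recollection of the s3prl hub interface; see the repo for exact usage):

```python
import torch
import s3prl.hub as hub

# Load a self-supervised upstream model from the S3PRL hub
upstream = getattr(hub, "wav2vec2")()
upstream.eval()

# Dummy 10-second utterance at 16 kHz; replace with your own waveform
wavs = [torch.randn(160000)]

with torch.no_grad():
    outputs = upstream(wavs)

# One (batch, frames, dim) tensor per layer; pick or combine layers for VC
for i, h in enumerate(outputs["hidden_states"]):
    print(f"layer {i}: {tuple(h.shape)}")
```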
💡New preprint on non-autoregressive sequence-to-sequence voice conversion (non-AR seq2seq VC) ‼️
We made seq2seq VC training fast and simple, and it works with as little as 5 minutes of parallel data!
Demo:
Code:
Paper:
Daily promotion. Now with a flyer!
Announcing the VoiceMOS Challenge 2024! VMC'24 has been accepted as a special session at SLT2024. There will be 3 tracks. The challenge tentatively starts on 4/10.
Registration form:
Website:
🚨With
@erica_cooper
we released the summary of the VoiceMOS Challenge 2023!
We focused on 0-SHOT evaluation of 3 domains:
🇫🇷 French TTS
🎵 Singing voice conversion
💥 Noisy/enhanced speech
If you are attending ASRU, consider coming to our special session!
Meanwhile, both papers were honored to be selected among the top 3% of papers at ASRU2023🎖️
Although we did not win the best paper award, we are grateful for the recognition from the reviewers and the TPC 🙌
🔥We warmly invite you to participate in the VoiceMOS Challenge 2024!
Yet another paper outperforming NaturalSpeech2...
"We conducted all subjective tests using 11 native judgers, with each metric consisting of 20 sentences per speaker."
How on earth did you get such low stds with only 11 listeners??
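A quick back-of-envelope sketch (all numbers made up) of why those stds look suspicious: with 11 listeners × 20 sentences you get 220 ratings, so even a large per-rating std shrinks to a tiny standard error of the mean, and the two are easy to confuse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 11 listeners x 20 sentences, 5-point MOS ratings
n_listeners, n_sentences = 11, 20
ratings = rng.integers(3, 6, size=(n_listeners, n_sentences))  # scores in {3, 4, 5}

mos = ratings.mean()
std = ratings.std(ddof=1)          # std of individual ratings: stays large (~0.8)
sem = std / np.sqrt(ratings.size)  # standard error of the mean: much smaller

print(f"MOS = {mos:.2f}, std = {std:.2f}, SEM = {sem:.2f}")
# A tiny reported "std" may actually be a standard error, not the rating std.
```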
🔥It’s official! Together with
@erica_cooper
we are holding the VoiceMOS challenge 2023! This year IT’S REAL… be prepared to tackle some real-world scenarios, with three tracks focusing on the Blizzard and singing voice conversion challenges, as well as speech enhancement data!
Announcing the VoiceMOS Challenge 2023!
Challenge website:
Register to participate:
This edition of the challenge will focus on real-world and challenging zero-shot out-of-domain mean opinion score prediction!
It’s really sad to see Google publishing papers like this. I really feel they just want to beat VALL-E: almost all experiments compare against it, as if it were the only TTS system worth comparing to right now.
The VoiceMOS Challenge, which I co-organized with
@erica_cooper
from
@yamagishilab
, was accepted as a special session at INTERSPEECH 2022! We are still welcoming participants so please don't hesitate to register!
My internship at Meta ended perfectly with a lovely dinner with my manager and his family (his wife and daughters, 1.5yo&4yo). A bit chaotic, a bit sweet, just like my internship!
After reading the reviews of the INTERSPEECH submissions in our lab, just can't wait to see how good those highly rated accepted papers are!! 😎😎😎 They've got to be novel, original, full of technical breakthroughs, and of course STATE-OF-THE-ART!!! 😎😎😎😎😎😎
I swear I’ve seen 10+ TTS papers with titles like this, just saying that it’s fast and lightweight without mentioning what technique was used. I’d rather read papers with bad jokes in the title.
So far all the prompt-based VC papers use VALL-E-style (or AudioLM-style, whatever you like) LLMs, so they all fall back to ASR+TTS-based VC. The only difference then becomes how the dataset is constructed. It's really getting boring.
After three years, the voice conversion challenge (VCC) is back🤩 This time we’re focusing on *singing* voice 🎤 hoping to attract
#SpeechProc
and
#MusicProc
people🔥 Registration starts today, see for more details 🙌🙌
We are proud to announce the first Singing Voice Conversion Challenge (SVCC2023)! Building on the success of the previous VCC events, we plan to further push the limits of voice conversion by now focusing on singing voices, which are more difficult to model than speech.
Being free from all the INTERSPEECH preparations (trip arrangements, poster/slides, etc.) gives me plenty of time to get an early lead on my ICASSP paper: currently at 2.5/4 pages 🎉
Check out our new paper on foreign accent conversion (FAC)! Accepted to APSIPA ASC 2023 🇹🇼
Demo:
Code:
Paper:
We found that none of the three most recent FAC methods is superior to the others 🤔
I will be presenting the joint research efforts on speech quality assessment by Erica and me over the past two years! Contents will include the latest results from VoiceMOS 2023. 🙌
Dr. Erica Cooper (National Institute of Informatics, Japan) & Mr. Wen-Chin Huang (Nagoya University, Japan) are the founders and organizers of the VoiceMOS Challenge, a shared task challenge for automatic opinion score prediction for synthesized speech.
ASR+TTS VC suffers from error propagation & cannot preserve non-verbal content, but IMO it's not the mainstream approach rn. Maybe that's why they did not provide any reference? S-MOS results not as good as expected. Also wanted to see the difference w.r.t. LM-VC.
📢I am releasing a standalone toolkit for seq2seq voice conversion.
For now, this repo only supports reproducing the results of my Voice Transformer Network paper, but the plan is to release code for my other works based on this seq2seq VC model.
Repo:
What I like about the
#SpeechProc
confs is that we encourage research rather than just pick out the “good” research. The ~50% accept rate makes the confs more like a platform for sharing research progress than a contest of “ah, this guy survived the brutal review process”.
I like this paper except that it was submitted to ACL... This team (no offense, they do very solid speech/audio research) likes to submit papers to the so-called “top conferences” (ICLR, NeurIPS…) where reviewers IMO probably know little about speech/audio...
MS, May 2022: our TTS is finally indistinguishable from natural recording! yay!
Also MS, Jan 2023: we're comparing our new TTS system, VALL-E, with a solid baseline! Its name is... YourTTS!!
As a speech synthesis guy: (1) signal metrics (PESQ, etc.) are not accepted as perceptual metrics by synthesis people; (2) I really hope to see TTS included in the applications rather than just ASR, ASV...; (3) IMO dataset mismatch is the biggest problem in most codecs, but it is not evaluated...
To my audio friends: is there an official definition of the real-time factor? I always thought that RTF < 1 means faster than real-time, but the EnCodec paper gave the complete opposite definition... (c.f., Sec. 4.6)
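For reference, the convention I've always assumed, sketched below: RTF = wall-clock processing time divided by the duration of the audio processed, so RTF < 1 is faster than real time (the `process_fn` here is a hypothetical stand-in):

```python
import time

def real_time_factor(process_fn, audio_duration_sec: float) -> float:
    """RTF under the common convention:
    RTF = wall-clock processing time / audio duration.
    RTF < 1.0 means the system runs faster than real time."""
    start = time.perf_counter()
    process_fn()  # e.g., synthesize or encode one utterance
    elapsed = time.perf_counter() - start
    return elapsed / audio_duration_sec

# Hypothetical usage: a 10-second utterance processed in 2 seconds
# rtf = real_time_factor(lambda: codec.encode(wav), audio_duration_sec=10.0)
# -> rtf == 0.2, i.e., 5x faster than real time
```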
I knew it... someday this work would come out... I don't know if they are aware that there is something called unit selection (or concatenation) in speech synthesis...
The Singing Voice Conversion Challenge (SVCC) 2023 is still open for registration! Come join us to push the boundary of singing voice conversion, the intersection between
#SpeechProc
and
#MusicProc
. See more details and register today at
"The existing wav2vec-based VC proposals, however, only use the last-layer wav2vec representations."
"...state-of-the-art, any-to-any VC models, AdaIN-VC, FragmentVC and S2VC."
Sigh...
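For the record, the standard alternative (popularized by SUPERB/S3PRL) is a learned weighted sum over all layers rather than just the last one; a minimal sketch:

```python
import torch
import torch.nn as nn

class WeightedLayerSum(nn.Module):
    """Learnable weighted sum over all hidden layers of an SSL model,
    instead of taking only the last-layer representation."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):  # list of (batch, frames, dim) tensors
        stacked = torch.stack(hidden_states, dim=0)        # (layers, B, T, D)
        norm = torch.softmax(self.weights, dim=0)          # one weight per layer
        return (norm.view(-1, 1, 1, 1) * stacked).sum(0)   # (B, T, D)

# Usage with, e.g., 13 layers of dummy features:
layers = [torch.randn(2, 50, 768) for _ in range(13)]
fused = WeightedLayerSum(13)(layers)  # -> (2, 50, 768)
print(fused.shape)
```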
Today I will present my 3rd first-author paper at
@ieeeICASSP
:
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech
Paper:
Thu, 12 May, 23:00 - 23:45 Singapore Time (UTC +8) @ Gather Area K
Come to our poster!
Attempts to re-train open-source VC toolkits on JVS, a Japanese corpus:
knn-vc ('23): bad quality
FreeVC ('22): bad conversion similarity
Diff-HierVC ('23): incomplete files, cannot even start training
DiffVC ('21): cannot synthesize meaningful speech
VQMIVC ('21): somehow okay