Whisper is now available in Hugging Face transformers! Happy to learn that it's become more accessible to the 🤗 community and hoping it enables lots of amazing applications!
We just released Whisper in 🤗 transformers!
@openai
's latest speech recognition transformer, trained on 680,000 hours of audio! For an example use case, check this notebook:
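Trying the release is a one-liner with the transformers pipeline. A minimal sketch, assuming the "openai/whisper-small" checkpoint and a local file "sample.wav" (both illustrative choices, not the notebook linked above):

```python
# Minimal sketch: transcribe an audio file with Whisper via transformers.
# "openai/whisper-small" and "sample.wav" are assumed example choices.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("sample.wav")
print(result["text"])
```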
@seeingwithsound
@ilyasut
The smallest CLIP model is available at . It's larger than typical models for mobile, but I think it can fit and run (slowly) on mobile devices.
Prompt: "a mel-spectrogram of an electric guitar playing" (VQGAN+CLIP) Won't sound great but will throw it into Griffin-Lim... You can kinda see little guitar shapes ๐คฃ
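For anyone curious, here is roughly what that Griffin-Lim step looks like. A rough sketch, assuming the generated image is saved as "vqgan_clip_output.png" (a hypothetical filename) and treating pixel intensity as mel power:

```python
# Rough sketch: treat the generated image as a mel-spectrogram and invert it
# to audio; librosa's mel_to_audio runs Griffin-Lim under the hood.
# "vqgan_clip_output.png" is a hypothetical name for the VQGAN+CLIP output.
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

img = np.asarray(Image.open("vqgan_clip_output.png").convert("L"), dtype=np.float32)
mel = img[::-1] / 255.0  # flip so low frequencies sit on the bottom row
audio = librosa.feature.inverse.mel_to_audio(mel, sr=22050, n_iter=64)
sf.write("guitar_from_clip.wav", audio, 22050)
```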
The largest GPT-2 model is now LIVE! So happy to have participated in the detection work, which marks my first (co-authored) publication at
@OpenAI
:D Check out the new blog post and the report for our latest findings and thoughts.
We're releasing the 1.5-billion-parameter GPT-2 model as part of our staged-release publication strategy.
- GPT-2 output detection model:
- Research from partners on potential malicious uses:
- More details:
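For reference, one way to try a GPT-2 output detector today is through a checkpoint hosted on the Hugging Face Hub. A minimal sketch, assuming the RoBERTa-based "roberta-base-openai-detector" model (label names such as "Real"/"Fake" vary by checkpoint version):

```python
# Minimal sketch: score a passage with a RoBERTa-based GPT-2 output detector
# from the Hugging Face Hub ("roberta-base-openai-detector" is an assumed
# checkpoint name; its labels depend on the hosted version).
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")
print(detector("The unicorns spoke perfect English, the researchers noted."))
```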
Excited to finally release what we've been working on since June! Jukebox is a small step in making neural nets produce music!
Thanks to my collaborators Heewoo,
@mcleavey
,
@_jongwook_kim
,
@AlecRad
,
@ilyasut
, this work wouldn't have been possible without them!
@Prashant_1722
Yes! GPT-4o has significantly improved speech recognition accuracy across languages. We don't have multilingual OCR evaluations like this yet, but you can try it on the ChatGPT app!
@ClementDelangue
Thanks for putting up the web interface! The model is hardly perfect, but we hope this can spark further discussions in detection studies. The current model is only for GPT-2, and it'd be very interesting to build a "fits-all" detector that works with many different generators.
What does retinal structure tell us about the information efficiency of mammalian vision? Find out in
@DukeTodayUpdate
's coverage of
@nayoung_jun
's recent publications.
@id62ai
27 -> 23 tokens with this ChatGPT-provided translation: "Halo, nama saya GPT-4o. Saya adalah jenis model bahasa baru, senang bertemu dengan Anda!" ("Hello, my name is GPT-4o. I am a new kind of language model, nice to meet you!"). Token savings are bigger with non-Latin-alphabet languages. You can try it yourself with tiktoken!
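A quick way to check counts like this, assuming tiktoken's "cl100k_base" (GPT-4) and "o200k_base" (GPT-4o) encodings; exact numbers may differ slightly from the ones quoted above:

```python
# Compare token counts for the same Indonesian sentence under the GPT-4 and
# GPT-4o tokenizers (encoding names are the public tiktoken identifiers).
import tiktoken

text = ("Halo, nama saya GPT-4o. Saya adalah jenis model bahasa baru, "
        "senang bertemu dengan Anda!")
for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))
```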
Moreover, the number of cell types depends on the number of available RGCs (retinal ganglion cells, i.e. the channel capacity)! As we allow more neurons, the receptive fields (RFs) are initially small in space and integrating in time; with larger numbers, the RFs are larger in space and differentiating in time.
@RiversHaveWings
@pbaylies
@danielrussruss
I realize the legend is confusing; in that figure we meant "CLIP" as the general contrastive (language-image) pre-training method, and for that figure we still used a bag-of-words (BOW) input. Released CLIP models use a transformer text encoder, which can be thought of as a glorified bag of words.
Such a precious experience, getting to meet and learn with these awesome people every day. I didn't believe it when people said you learn so much more by teaching than as a student, but now I fully admit it. Probably the three weeks in which I have grown THE most. Thanks
@neuromatch
[ML Coding Series] Kicking off a machine learning coding series! I'll be deep-diving into the code behind many of the ML papers I've covered over the last few years - starting with
@OpenAI
's CLIP!
YT:
@AlecRad
@_jongwook_kim
@ilyasut
1/
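For anyone following along, a minimal sketch of the kind of CLIP inference the series digs into, assuming the openai/CLIP package is installed (pip install git+https://github.com/openai/CLIP.git) and a hypothetical image file "cat.jpg":

```python
# Minimal CLIP zero-shot similarity sketch using the openai/CLIP package.
# "cat.jpg" and the candidate captions are hypothetical example inputs.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # how well the image matches each caption
```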
Turns out that the MusicNet labels are not quite reliable. Track 1817 has totally different music between 0:10 and 0:40, and it contains an oboe, contrary to the label.
I spent most of this past weekend experimenting with DALL·E 2, OpenAI's new AI system that can create realistic images from a written description.
I curated a book of 1000 robot paintings. You can view the entire thing online at
#dalle
@seeingwithsound
Forward-pass compute for the image encoder is about 4.5 GFLOPs, so it should be possible with modern mobile processors, although I have no experience with mobile AI. (I should also mention that it is intended for research, not for any deployed use cases.)
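In case anyone wants to reproduce a number like that, a rough sketch using fvcore's FLOP counter on the ViT-B/32 image encoder (my assumption for the model in question; fvcore counts fused multiply-adds, so conventions may differ from the figure above):

```python
# Rough FLOP estimate for the CLIP image encoder (ViT-B/32 assumed) with
# fvcore; it traces the model, so some ops may be skipped with warnings.
import clip
import torch
from fvcore.nn import FlopCountAnalysis

model, _ = clip.load("ViT-B/32", device="cpu")
dummy = torch.zeros(1, 3, 224, 224)
print(FlopCountAnalysis(model.visual, dummy).total() / 1e9, "GFLOPs")
```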
Latest work from our lab, published in Nature today. Proud of the terrific effort and creativity of
@roy_suva
and
@nayoung_jun
and of our collab with
@jmxpearson
. We show efficient coding -> coordinated tiling across cell types.