1/ We are releasing Playground v2.5, our latest foundation model to create images.
We tested our model across 20K+ users in a rigorous benchmark that went beyond anything we've seen to date.
This model is open weights. More information in the tweets below. 👇
Introducing the BLoRA repo
Hook several LoRAs into the same language model, and generate simultaneously in the same batch. Batch outputs can even be streamed.
@_madFrog
@culturaltutor
I think you missed the point. The post was intended to lay out how stories evolve across cultures and time, and where Shakespeare drew his influences from. I found it very instructive.
@WriteArthur
A bit overdramatic. If you include gender/race/etc in the text prompt it'll generate the image you want. In production, users would modify prompt or finetune to get expected behaviour. Should look up alignment research to learn more.
@minchoi
most of the artifacts are forgivable imo, cinema and animation cut corners too usually. aesthetically pleasing enough for me to ignore minor mistakes.
Because the trainable parameters for low-rank layer adapters are so small, you can hold them all in memory simultaneously. Meaning, you can keep the same base model and change its behavior just by swapping LoRAs. Huggingface's PEFT allows swapping adapters over their API.
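A minimal numpy sketch of that per-row adapter idea (toy shapes and a hypothetical `batched_lora_forward` helper for illustration, not the BLoRA or PEFT API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, batch = 8, 8, 2, 3

# Shared frozen base weight: the "same base model" for every request.
W = rng.normal(size=(d_in, d_out))

# One tiny low-rank adapter (A @ B) per task; cheap to keep all in memory.
adapters = [
    (rng.normal(size=(d_in, rank)), rng.normal(size=(rank, d_out)))
    for _ in range(batch)
]

def batched_lora_forward(x, adapter_ids):
    """Apply a different LoRA delta to each row of the same batch."""
    base = x @ W
    delta = np.stack([x[i] @ adapters[a][0] @ adapters[a][1]
                      for i, a in enumerate(adapter_ids)])
    return base + delta

x = rng.normal(size=(batch, d_in))
y = batched_lora_forward(x, adapter_ids=[0, 1, 2])
```

Each row pays only the base matmul plus a rank-2 correction, which is why many adapters can serve one batch.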
@blennon_
Not true, ie open-source stable diffusion quickly surpassed dall-e in performance and marketshare. ChatGPT is a great product out-of-the-box, but customization, control, margins, etc will always incentivize devs to churn to open-source.
1/ Introducing Playground v2: A new commercially open model from our team that we trained from scratch.
Most notably our model was preferred 2.5x more than the current leading open model (SDXL).
We're excited to give back to the community as we're just getting started.
@vikhyatk
cool experiment! though the intuition doesn't necessarily extrapolate for higher frequency functions. ie for sin(500x), gelu is generally more stable and converges faster when scaling depth/width.
@_tim_brooks
@prafdhar
@billpeeb
@OpenAI
Hmmm, looks like it was trained in a rendered environment from a game engine. Have complete frame control and can easily auto-caption or use other control inputs as conditioning.
@andrew_n_carr
the hand is a 3d asset, and you trained a temporal text2pose model to puppeteer? seems very tractable since there’s lots of animation data available to be annotated with a pose estimator. surprised by how smooth the motion is though.
@nathanbenaich
ChatGPT is simply GPT3.5 tuned with a conversational interface. OA users can finetune their own version that behaves similarly. What OA discovered is that UI+AI gets massive traction, like Copilot has.
@Suhail
Don't make the mistake of building from scratch. Start from open-source then iterate once you see what usage actually looks like, ie your users may want to generate marketing related assets, so you'll want to curate more of that kind of training data.
@EMostaque
Short-sighted artist backlash to generative AI has echoes of Napster days. Image generators are free distribution for artists, but in the form of derivative work instead of direct copies as with peer2peer. Artists capitalizing on this will dominate the future art landscape.
@LinusEkenstam
i don’t find this very compelling. i think the ad market will be over-saturated with cheap ai generated content, which’ll drive consumers even more to human ads.
First of all, there are the ways trusted computing is *designed* to hurt you. The most reliable path to enshittification is a computer that runs programs you can't alter, and rats you out to third parties if you run counter-programs that disenshittify the service you're using.
24/
@sterlingcrispin
Saying that it's 3x more powerful is pure speculation as it's never been publicly released for testing. Benchmarks and cherry-picked results can be misleading.
@EMostaque
@jackclarkSF
@amasad
@carperai
Bloom pretraining data was poorly curated and filtered, especially for multi-lingual data. Surprised Bloom creators didn't follow Pile's methodology, which was more successful. Definitely a failure-case for open source collaboration.
@jeremyphoward
brute-force captioning all training images with a 17B vlm is all you need. though strong text synthesis causes regressions, ie text leaking into images unintentionally.
The first lecture of our
@Stanford
CS25 V4 Transformers course () is now released! Check it out here: .
We (the instructors) gave a brief intro and overview of the history of NLP, Transformers and how they work, and their impact. We
@krishnanrohit
If it fits within gpt3 api's 4K context window then a simple prompt is sufficient. O/w chunk the page and summarize recursively. For q/a you can do chunk+retrieve+generate answer.
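A rough sketch of that recursive chunk-and-summarize loop. `summarize` is a hypothetical callable standing in for a GPT-3 API wrapper; the demo uses a toy truncating stand-in:

```python
def chunk(text, max_chars=4000):
    """Split text into pieces small enough for the model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_recursively(text, summarize, max_chars=4000):
    """Summarize each chunk, then summarize the joined summaries,
    repeating until everything fits in a single context window."""
    while len(text) > max_chars:
        text = " ".join(summarize(c) for c in chunk(text, max_chars))
    return summarize(text)

# Toy "summarizer" that just truncates; in practice this would call
# the GPT-3 completion API with a summarization prompt.
toy = lambda t: t[:50]
digest = summarize_recursively("lorem " * 2000, toy)
```

The same chunking step feeds the q/a variant: embed chunks, retrieve the relevant ones, then generate an answer from just those.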
@russelljkaplan
I use GPT3 to generate synthetic data for most of my projects these days. Synthetic + human edits is an even more powerful combination.
@jackclarkSF
@EMostaque
@amasad
@carperai
Haven't followed the chatter around it, but I suspect they never caught pretrain data issues due to obsolete evals. Human evals capture failure modes much better than automated, ie
@DrJimFan
I'm doubtful about LaMDA and Sparrow. Google's subpar when it comes to dataset curation. Also, market feedback is essential for building useful AI
@Suhail
The pretrain dev cycle is long and capital intensive. It'll also be difficult for you to set up proper evals without knowing where performance should be prioritized.
@vikhyatk
the way the gpt3 paper implemented it was to train a classifier on curated text as the positive class, and common-crawl as the negative class. still a strong baseline imo.
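A toy version of that curated-vs-common-crawl filter, with a hand-rolled logistic regression and made-up example documents standing in for real training data:

```python
import numpy as np

# Hypothetical stand-ins: curated text is the positive class,
# raw common-crawl junk is the negative class.
curated = [
    "peer reviewed study of transformer scaling laws",
    "well edited encyclopedia article about astronomy",
    "carefully written tutorial on linear algebra",
]
crawl = [
    "click here buy now free free free",
    "spam spam winner winner cheap pills",
    "cheap pills online no prescription buy now",
]

vocab = sorted({w for doc in curated + crawl for w in doc.lower().split()})
idx = {w: i for i, w in enumerate(vocab)}

def featurize(doc):
    """Bag-of-words count vector over the training vocabulary."""
    v = np.zeros(len(vocab))
    for w in doc.lower().split():
        if w in idx:
            v[idx[w]] += 1.0
    return v

X = np.stack([featurize(d) for d in curated + crawl])
y = np.array([1.0] * len(curated) + [0.0] * len(crawl))

# Logistic regression via plain gradient descent.
w, b = np.zeros(len(vocab)), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.5 * X.T @ g / len(y)
    b -= 0.5 * g.mean()

def quality_score(doc):
    """Probability the document looks like the curated class."""
    return float(1 / (1 + np.exp(-(featurize(doc) @ w + b))))
```

New documents are then kept or dropped by thresholding `quality_score`, which is the filtering step the paper applied to common-crawl.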
@andrew_n_carr
i’m guessing there’s a lot of jitter from pose estimator error, which has to be smoothed somehow. probs need a custom model to label all those degrees of freedom. generating descriptive captions for the trajectories adds extra complexity. definitely sounds hard to get right.
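One simple way to damp that per-frame jitter is an exponential moving average over each degree of freedom. A toy numpy sketch on synthetic data, not a real pose pipeline:

```python
import numpy as np

def ema_smooth(traj, alpha=0.3):
    """Exponential moving average over a (time, dof) trajectory;
    lower alpha smooths more but adds more lag."""
    out = np.empty_like(traj)
    out[0] = traj[0]
    for t in range(1, len(traj)):
        out[t] = alpha * traj[t] + (1 - alpha) * out[t - 1]
    return out

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(t)[:, None]                       # one joint angle over time
noisy = clean + rng.normal(scale=0.2, size=clean.shape)  # estimator jitter
smooth = ema_smooth(noisy)
```

In practice you'd tune `alpha` per joint, or use something like a Kalman or One-Euro filter to trade lag against jitter more carefully.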
@Suhail
On the modelling side, Stability's latent diffusion models are state-of-the-art wrt performance and scalability, a good stack to build from. Avoid PhDs who've only worked in research w/o touching product, they'll kill you with wasteful R&D and tech debt.