Robert Nishihara Profile
Robert Nishihara

@robertnishihara

6,597 Followers · 660 Following · 96 Media · 1,376 Statuses

Co-founder @anyscalecompute . Co-creator of @raydistributed . Previously PhD ML at Berkeley.

Joined March 2009
@robertnishihara
Robert Nishihara
2 years
I remember in 2016 when @ApacheSpark set the record for sorting 100TB in the most cost-efficient way ($144 in 2016, $115 in today's prices). Today, @raydistributed broke the $1 / TB barrier and set the world record at $97! 🔥📈🥳🎂🎗️🥂
6
70
515
@robertnishihara
Robert Nishihara
9 months
Function calling has been a massive gap in the open source ecosystem (and the most common feature request). We benchmarked function calling on a variety of open and proprietary models. Impressively, Mistral-7B performs on par with GPT-3.5. Here's how they stack up 🤯🤯
@anyscalecompute
Anyscale
9 months
We're announcing new features and models today. 🔵 JSON mode ⚫️ function calling Try them out with our API.
4
13
168
19
70
501
@robertnishihara
Robert Nishihara
1 year
This in-depth case study sheds light on when you can achieve GPT-4 level performance with a fine-tuned 7B parameter model. Take SQL generation as an example. Accuracy 🧿 Llama-2-7B: 3% 🧿 GPT-4: 79% 🧿 Llama-2-7B (fine-tuned): 86% Out of the box, GPT-4 crushes Llama-2
@CyrusHakha
kourosh hakhamaneshi
1 year
🚀 Exploring Llama-2’s Quality: Can we replace generalist GPT-4 endpoints with specialized OSS models? Dive deep with our technical blogpost to understand the nuances and insights of fine-tuning OSS models. 🔗 🧵 Thread 1/N👇
16
117
539
7
94
465
@robertnishihara
Robert Nishihara
2 years
While everyone's talking about training giant models, companies like @Instacart are quietly achieving 10x performance improvements by training and deploying many smaller models. Here's how they're doing it. 🔥🔥
4
69
420
@robertnishihara
Robert Nishihara
1 year
We've built a ton of #LLM applications recently. Reasoning about performance & feasibility is painful without reference points. Here are the reference points we use to anchor our intuition (inspired by @JeffDean 's "Numbers every engineer should know").
14
94
391
@robertnishihara
Robert Nishihara
11 months
An important systems bottleneck when working with LLMs is model loading time, but if you get the details right, you can speed up standard implementations by around 20x (over 10 minutes down to around 35 seconds for Llama-2-70B). There are a few bottleneck numbers to think
@cdnamz
Cade Daniel 🇺🇸
11 months
How long does it take to download Llama2 70B? On the 4x 25 Gbps NICs that aws.p4de's have, it should take ~10s. Yet in production we've observed much higher times, which makes autoscaling less responsive + more expensive. This blog post shows how we've reduced download & init
1
37
231
2
60
382
@robertnishihara
Robert Nishihara
9 months
Faster Mixtral? Much more to come here. We make deep investments in open source AI. If you'd like to help build open source AI or optimize LLM performance, join us at @anyscalecompute . DM me 🚢
@woosuk_k
Woosuk Kwon
9 months
We've just released v0.2.5 which includes this performance improvement (contributed by Antoni at @anyscalecompute ). Please try it out!
0
4
38
4
25
268
@robertnishihara
Robert Nishihara
2 years
One of our goals with @raydistributed has been to provide a great off-the-shelf experience for beginners as well as the performance and flexibility required by power users. @OpenAI is on the "power users" end of the spectrum.
2
49
303
@robertnishihara
Robert Nishihara
2 years
Exciting to see Quokka at the top of Hacker News (written by Ziheng Wang). In ~1000 lines of Python, Quokka is a high performance fault-tolerant query engine built on 1⃣ Ray ( @raydistributed ) - distributed execution 2⃣ Polars - fast dataframes 3⃣ Arrow ( @ApacheArrow ) - fast I/O
7
38
295
@robertnishihara
Robert Nishihara
10 months
OpenRLHF is a high-performance RLHF training framework based on @raydistributed and DeepSpeed.
3
62
275
@robertnishihara
Robert Nishihara
4 months
Lots of things 😆 There have been a handful of rewrites from scratch over the years with @raydistributed (actually from scratch). Some things that come to mind 💻 Programming languages: First prototype was in Rust. Then C++ (heavy multithreading with gRPC as the core RPC
@pschafhalter
Peter Schafhalter
4 months
@robertnishihara @mrry Hi Robert, considering all the lessons learned building Ray, are there any changes you would have made back when you first started the project? Personally, I always wondered whether the flexibility of dynamic task graphs would eventually lead to performance bottlenecks.
0
0
5
7
41
238
@robertnishihara
Robert Nishihara
1 year
I was surprised by how Llama-2 stacks up against GPT-3.5 and GPT-4 on getting the facts right. We investigated factuality in the context of summarization. Summarization is one of the most immediately practical applications of LLMs. Good summaries have a few key properties: 1⃣
@waleedk
Waleed Kadous
1 year
📊 Case study of Llama-2’s capability in summarization tasks. 📌 TLDR: In summarization; Llama-2-70b is as factual as GPT-4 while being 30x cheaper 🧵 👇
9
64
306
5
27
210
@robertnishihara
Robert Nishihara
1 year
Ray Summit this month will be 🔥🔥 🤯 ChatGPT creator @johnschulman2 🧙‍♀️ @bhorowitz on the AI landscape 🦹‍♂️ @hwchase17 on LangChain 🧑‍🚀 @jerryjliu0 on LlamaIndex 👨‍🎤 @zhuohan123 and @woosuk_k on vLLM 🧜 @zongheng_yang on SkyPilot 🧑‍🔧 @MetaAI on Llama-2 🧚‍♂️ @Adobe on Generative AI in
8
45
207
@robertnishihara
Robert Nishihara
9 months
Some folks noticed an interesting extension to @OpenAI 's JSON capabilities. With @anyscalecompute , not only can you force the model to generate JSON, you can also specify the exact schema. Specify the types, include arrays, nest objects together, ... Super powerful for making
@jxnlco
jason liu
9 months
Landing a PR to support @anyscalecompute 's new json schema mode
2
2
36
9
24
195
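As a rough illustration of what schema-constrained JSON mode looks like from the client side: the sketch below only assembles a request body offline. The `response_format` field layout and the model name are assumptions modeled on OpenAI-compatible JSON-mode APIs of the time, not a documented Anyscale spec.

```python
import json

# Hypothetical schema: typed fields, an array, and a nested object,
# matching the capabilities described in the tweet.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "address": {  # nested object
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
    "required": ["name", "tags"],
}

# Assumed request shape for an OpenAI-compatible endpoint (field names
# are illustrative, not an exact API reference).
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.1",
    "messages": [{"role": "user", "content": "Extract the entity as JSON."}],
    "response_format": {"type": "json_object", "schema": schema},
}

body = json.dumps(payload)  # this is what would be POSTed to the endpoint
```

The server would then constrain decoding so that the returned message is guaranteed to parse against `schema`.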
@robertnishihara
Robert Nishihara
2 years
#RaySummit is happening in 1 week! If you want to learn how companies like @OpenAI , @Uber , @Cruise , @Shopify , @lyft , @Spotify , and @Instacart are building their next generation ML infrastructure, join us!
#RaySummit is almost here! Don’t miss out on: 🌁 In-person networking in SF 🎒 3 in-depth Ray training sessions ⚙️ 40+ technical sessions and lightning talks 🎤 Speakers from @MetaAI , @Spotify , @IBM & more ...and much more!
1
8
19
16
64
174
@robertnishihara
Robert Nishihara
1 year
Just tried out Code Llama (34B) on Anyscale Endpoints. Impressive work from @MetaAI , and I'm proud to see our team at @anyscalecompute ship this the same day. Try it out:
8
39
182
@robertnishihara
Robert Nishihara
4 years
We began @raydistributed several years ago at @UCBerkeley (in @ucbrise , @amplab , and @berkeley_ai ). Today there are 300+ contributors from 75+ companies. Learn how companies use Ray to build scalable AI applications in production at #raysummit !
5
63
168
@robertnishihara
Robert Nishihara
1 year
Nearly all LLMs will be multi-modal. Multi-modality is another 10x UX improvement in the same way that chat was. But multi-modality is hard to do, and it's expensive. This article by @BytedanceTalk gives a taste of where things are headed (and how they're used in TikTok).
@BytedanceTalk , the company behind TikTok, uses Ray for fast & cheap offline inference with multi-modal #LLMs . They generate embeddings for a staggering 200 TB of image and text data using a model with >10B parameters. 🧵 Thread below 👇
2
31
119
1
29
172
@robertnishihara
Robert Nishihara
1 year
As far as I know, #Alpa + #Ray is the most performant and scalable way to train LLMs with JAX (tested on up to 1024 A100 GPUs). For OPT-175B, the throughput is 21% higher than @Meta 's original work (179 TFLOPs / GPU versus 147). That's *without* manual partitioning. 🤯🤯
ICYM our blogs on Ray and Generative AI. We have a three-part series on how to use Ray to productionize common generative AI model workloads. Here are parts 1 and 2: 👉 👉 #Ray for #GenerativeAI #workloads
0
12
56
3
29
171
@robertnishihara
Robert Nishihara
11 months
I'm so proud of what we launched last week at Ray Summit 🤩🤩🤩🤩🤩🤩🤩 🎉 Anyscale Endpoints supports fine-tuning 🎉 Our LLM fine-tuning API. Fine-tuning is essential for cost reduction. Fine-tuning can enable task-specific performance superior to GPT-4 at 1/300th the cost 🤯
13
30
167
@robertnishihara
Robert Nishihara
9 months
Today we're releasing new @anyscalecompute features. ☑️ JSON mode ☑️ function calling ☑️ new models This has been a massive gap in the open source LLM ecosystem. 📜 JSON mode: Outputs valid JSON based on your schema requirements. 📞 Function calling: Lets the LLM choose a
6
19
141
@robertnishihara
Robert Nishihara
1 year
Amazon released LightGPT on May 24. We deployed it in Aviary in 5 minutes. So what? Productionizing AI requires models + serving infra. Progress on open source *models* has been astonishing, but open source *serving infrastructure* has lagged behind.
4
30
145
@robertnishihara
Robert Nishihara
1 year
We build a lot of our own RAG-based LLM applications internally. One is a chatbot designed to answer questions about @raydistributed (which we've actually used to improve our own documentation 😲). This is a thorough guide to building and productionizing RAG applications that
@GokuMohandas
Goku Mohandas
1 year
Excited to share our production guide for building RAG-based LLM applications where we bridge the gap between OSS and closed-source LLMs. - 💻 Develop a retrieval augmented generation (RAG) based LLM application from scratch. - 🚀 Scale the major workloads (load, chunk, embed,
33
266
1K
3
18
147
@robertnishihara
Robert Nishihara
9 months
Thrilled to announce some new features on Anyscale Endpoints! @anyscalecompute 🕸️Embedding endpoints: Use gte-large via an OpenAI compatible API to build your RAG applications at $0.05/M tokens - HALF the cost of OpenAI & Cohere. 🎛️ Llama2 70B fine-tuning: Now you can also fine
5
17
134
@robertnishihara
Robert Nishihara
9 months
If you want to compare OpenAI side by side with open models (Llama 2, Mistral, Zephyr, ...), check out Anyscale Endpoints. We provide an OpenAI-compatible API (for inference and fine-tuning).
@YirenLu
Yiren Lu
9 months
One consequence of the OpenAI drama is that companies will be taking a much closer look at fine-tuning open-source models as a replacement for third-party APIs. It used to be a cost/performance argument, but if you can't trust the API provider to have employees in a week...
3
4
37
2
19
120
@robertnishihara
Robert Nishihara
10 months
One of the most common asks we get is for public (and reproducible) performance benchmarks. LLM inference performance benchmarks are subtle, and this is a rapidly evolving space, so numbers quickly become stale. But to make comparisons, we need to be talking about the same
4
32
121
@robertnishihara
Robert Nishihara
7 months
This is the first hands-on, intensive, two-day bootcamp for learning to build RAG applications. Cohosted by @pinecone and @anyscalecompute (also featuring lessons from experts at @LangChainAI , @vercel , and others). Nearly every AI application will be a RAG application, and
@anyscalecompute
Anyscale
7 months
Ready to hear from #RAG experts at @LangChainAI @vercel @pinecone @anyscalecompute and get hands-on with intensive guided trainings? The 2-day RAG Developer Bootcamp is for you! Learn more & register now 👉 #llm #ml #rag #ai #vectordatabase #ray #pinecone
4
16
56
10
24
116
@robertnishihara
Robert Nishihara
1 month
This migration began 4 years ago 🫢 Not our typical Ray use case, but so impressive and it illustrates Ray's versatility. Also, it was worth it because they're saving over $100 million annually 😇 Some fascinating excerpts. 2016: Amazon aims to remove all dependencies on
We don't hear the term *exabyte* too frequently. This is an impressive use case.
0
12
52
5
15
117
@robertnishihara
Robert Nishihara
29 days
This is a HUGE update for us! I’ve spent a ton of time with @KeertiMelkote over the past few months and the energy is through the roof. Very very few founders have done what he’s done (starting a business in his garage and scaling it to over $5B in revenue). AI is in its
@anyscalecompute
Anyscale
29 days
Today, we’re welcoming @KeertiMelkote as CEO of Anyscale!
0
2
46
12
7
103
@robertnishihara
Robert Nishihara
10 months
Fine-tuning is here to stay. Chatbots are the most common LLM application today, but we are going to inject AI into every nook and cranny, and this will mean *many* LLM calls working in concert to power applications (some of our debugging features on Anyscale involve composing
@adithyan_ai
Adithyan
10 months
I burned 🔥 $2000 in fine-tuning so you don't have to. I fine-tuned models with @OpenAI and @anyscalecompute API endpoints on 50 million tokens. Here are the results I wish I knew before getting into fine-tuning. If you just want a quick snapshot, look at the figure. A longer
32
78
691
1
14
94
@robertnishihara
Robert Nishihara
3 months
One of @vllm_project 's strengths is that it exposes the ability to trade off latency and throughput. However, higher qps regimes cause significant latency degradation. The underlying reason has to do with inference taking place in two stages: prefilling (processing the input
@anyscalecompute
Anyscale
3 months
Recently, we’ve contributed chunked prefill to @vllm_project , leading to up to 2x speedup for higher QPS regimes! In vLLM, prefilling, which fills the KV cache, and decoding, which outputs new tokens, can interfere with each other, resulting in latency degradation. 1/n
4
23
97
2
14
89
@robertnishihara
Robert Nishihara
4 months
Ray originally started with just the "task" API for executing Python functions asynchronously (with some resemblance to systems like Dask, Celery, PySpark, etc). Actually, the system most closely resembling Ray's task API is CIEL (built by @mrry ). That
@raydistributed
ray
4 months
Ray operates at two levels: Ray Core, which scales Python functions and classes with tasks and actors, and its libraries, offering easy-to-use abstractions tailored for ML workloads. #Ray #ML #DistributedComputing
0
9
53
2
7
84
@robertnishihara
Robert Nishihara
9 months
Open models have made astounding progress in 2023. Llama, Mistral, Zephyr, ... much more to come in 2024. Try them side by side with OpenAI.
@martin_casado
martin_casado
9 months
OS AI has never been more important. Ever.
33
102
647
0
15
78
@robertnishihara
Robert Nishihara
10 months
If you are investing in LLM infrastructure and wondering what the most cost efficient way to run open LLMs is, check out Anyscale Private Endpoints! It's like OpenAI, but 🎯 for open models (Llama-2, Mistral, ...) 🎯 it runs in your cloud, private for your business 🎯 it's
5
10
76
@robertnishihara
Robert Nishihara
11 months
Great LLM inference survey from @huggingface and I'm delighted to see that Anyscale Endpoints is the lowest price point on the market for Llama-2-70B. 🤗
@_philschmid
Philipp Schmid
11 months
Yesterday, @awscloud released Bedrock as GA! Amazon Bedrock is a new AWS service that gives you access to Foundation Models ( @Anthropic , @cohere ,…) with a token-based pricing. 🆕 Lets compare the pricing to @OpenAI @Google and others. 🧶
8
62
226
2
16
73
@robertnishihara
Robert Nishihara
2 years
We are *beyond excited* about this and are so so lucky to be working with @DynamicWebPaige . This is an amazing development for @raydistributed ! Not many people understand ML and developers like she does.
@DynamicWebPaige
👩‍💻 Paige Bailey
2 years
❤️ Am beyond excited to share that this week begins a new adventure, leading developer experience for @RayDistributed at @AnyscaleCompute . Ray is an open-source project that gives users the ability to scale, and serve, *any* compute-intensive #Python workload.
30
13
265
5
6
71
@robertnishihara
Robert Nishihara
9 months
The pace of progress in the open source community is astounding. Impressive work from the Mistral team. The smaller 7B Mistral model surprised us in a lot of ways (e.g., matching GPT-3.5 in function calling quality). I'm excited to see what people build with the new 8x7B model.
@anyscalecompute
Anyscale
9 months
We’re excited to announce the official @MistralAI Mixtral 8x7B model on Anyscale Endpoints, offering the best price on the market with an OpenAI compatible API. 💸 Pricing: $0.5 / million tokens 📆 Coming soon: JSON mode and function calling Try out Mixtral on Anyscale
32
75
684
1
13
69
@robertnishihara
Robert Nishihara
10 months
Nice to see @raydistributed in Amazon's (JARK) stack for generative model serving. - Jupyter - Argo - Ray - Kubernetes
0
15
67
@robertnishihara
Robert Nishihara
9 months
The Llama Guard model is now available on Anyscale Endpoints. Get started here: Example:
@AIatMeta
AI at Meta
9 months
At release, Purple Llama includes: - CyberSecEval - Llama Guard model - Tools for insecure code detection & testing for cyber attack compliance We're also publishing two new whitepapers outlining this work. Get Purple Llama ➡️
5
17
63
2
22
58
@robertnishihara
Robert Nishihara
2 years
Pretty astounding talk from @DhruvMadeka at @NeurIPSConf about how @amazon was able to use @raydistributed to optimize inventory. This appears to enable a 12% reduction in inventory across Amazon 😮😮🤯
4
9
56
@robertnishihara
Robert Nishihara
3 months
One of the worst bugs that @pcmoritz and I debugged was a deadlock involving multiple processes across multiple machines each waiting on each other. We sat in a conference room and shared our screen on a massive projector. We had 6 terminal windows open, each one ssh'ed to some
@raydistributed
ray
3 months
🚀 Announcing the Ray Distributed Debugger! 🚀 An integrated debugging experience within VSCode. 1⃣ Set breakpoints to pause tasks and inspect variables. 2⃣ Post-mortem debugging: Analyze state after an error. More:
0
10
39
3
5
54
@robertnishihara
Robert Nishihara
1 year
This is very impressive. "Introducing Ray Serve dramatically improved our production ML pipeline performance, equating to a ~50% reduction in total ML inferencing cost." 💰💰 @Samsara will be speaking in depth about how they scale AI in a cost-efficient manner at #RaySummit .
1
9
51
@robertnishihara
Robert Nishihara
4 months
Ray is often compared to systems like Apache Spark, but Spark is more analogous to Ray Data, Ray's data processing library. Ray is architected as an ecosystem of libraries (for data, training, inference, etc) built on top of a flexible core system.
@raydistributed
ray
4 months
Ray is emerging as a standard for AI workloads, powering AI at companies like OpenAI, Uber, and Netflix. What sets Ray apart is its rich ecosystem of libraries tailored for various distributed computing tasks across the AI lifecycle.
0
8
31
1
8
51
@robertnishihara
Robert Nishihara
10 months
We updated our production RAG application guide with a number of new sections: ☑️ When to fine-tune embeddings ☑️ When to augment vector-based retrieval with traditional lexical search ☑️ When to rerank retrieved context ☑️ How to update & reindex as data changes Importantly,
@GokuMohandas
Goku Mohandas
10 months
Added some new components (fine-tuning embeddings, lexical search, reranking, etc.) to our production guide for building RAG-based LLM applications. Combination of these yielded significant retrieval and quality score boosts (evals included). Blog:
7
50
211
0
14
49
@robertnishihara
Robert Nishihara
1 year
A big roadblock with adopting LLMs is just "getting started". A lot of businesses are rethinking ML infrastructure that wasn't designed to support LLMs. Ray 2.4 makes it straightforward (copy & paste) to get started with LLMs - fine-tuning / training - serving - batch inference
Announcing Ray 2.4.0: Infrastructure for LLM training, tuning, inference, and serving. 🧠 LLM features 💽 Ray data for ease of use & stability 📊 Serve observability 🤖 RLlib’s module for custom reinforcement learning 🏢Ray scalability for large clusters
0
40
166
2
6
47
@robertnishihara
Robert Nishihara
4 months
This is a fantastic read on Uber's 8-year AI journey. From (1) predictive ML on tabular data, to (2) adopting deep learning, to (3) venturing into generative AI. It's amazing to see that @raydistributed has played a role in enabling deep learning and LLM training at Uber.
@UberEng
Uber Engineering
4 months
Learn about @Uber 's journey from predictive to generative AI, all while supporting 10 million real-time predictions per second at peak. Read more: #UberEngineering #UberEng
1
10
32
2
9
46
@robertnishihara
Robert Nishihara
4 months
Data ingest is a bottleneck for training (as you scale). @metaai wrote a great blog a while back outlining these challenges and how they solved them (by scaling data ingestion and training independently). As far as I know, Ray is the only open source framework that enables
@raydistributed
ray
4 months
Here is part 2, zooming way in on data preparation (along with runnable code).
2
6
19
0
10
45
@robertnishihara
Robert Nishihara
11 months
Open source models will dominate. We've been betting on open source infrastructure at @anyscalecompute from the start.
@ylecun
Yann LeCun
11 months
Open source AI models will soon become unbeatable. Period.
146
500
3K
0
3
45
@robertnishihara
Robert Nishihara
11 months
Ray started out primarily with training workloads. Since then, serving workloads have taken off, especially as cost efficiency for LLM inference is top of mind for everyone 🪙🪙
@raydistributed
ray
11 months
🎉 Announcing Ray Serve and Anyscale Services general availability! Teams at @LinkedIn , @Samsara , @AntGroup + many more have been using Ray to serve LLMs & multi-modal applications in a flexible, performant and scalable way. Read more about the GA release and how companies have
1
17
55
1
4
44
@robertnishihara
Robert Nishihara
1 year
Aviary is an open source project that makes it easy to deploy and manage multiple LLMs, especially any @huggingface LLM. Adding a new model (like LightGPT) takes 5 minutes. And new models can be contributed by anyone in the open source community!
1
9
42
@robertnishihara
Robert Nishihara
4 months
This blog covers some of the low level details of optimizing training performance (in this case for stable diffusion models, though the lessons are broader). 💽 Mixed hardware setup (A100 and A10g GPUs) 💰 Decouple encoders from U-Net 🏗️ EFA? torch.compile? FSDP? NCCL plugins?
@raydistributed
ray
4 months
We pretrained a stable diffusion model on 2 billion images for under $40K. Here's what we learned.
1
16
51
1
6
43
@robertnishihara
Robert Nishihara
3 months
The core Ray API has remained the same since we first built it (with roughly two major API additions), but the library ecosystem on top has pivoted wildly. The vision was always somewhat analogous to the Python ecosystem: (1) build flexible lower level primitives (like functions
@anyscalecompute
Anyscale
3 months
Ray was originally envisioned as a scalable version of Python. The analogy applies at two levels. 1⃣ Python’s core primitives are functions and classes. Ray’s core primitives are tasks and actors, which map these concepts into the distributed setting. 2⃣ Python’s strength is in
0
5
19
0
18
40
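The Python analogy above can be sketched on a single machine with just the standard library. To be clear, this is not Ray code: `ThreadPoolExecutor` futures stand in for Ray tasks, and an object owned by a one-worker executor stands in for an actor, whose method calls run serially against private state. Ray's `@ray.remote` generalizes exactly this pattern to a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def square(x):          # "task": a stateless function
    return x * x

class Counter:          # "actor": a stateful class
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

# Submit tasks and gather results (analogous to square.remote(i) / ray.get).
futures = [pool.submit(square, i) for i in range(4)]
results = [f.result() for f in futures]

# A single-worker executor serializes calls, like an actor's mailbox.
actor_pool = ThreadPoolExecutor(max_workers=1)
counter = Counter()
counts = [actor_pool.submit(counter.incr).result() for _ in range(3)]
```

Here `results` is `[0, 1, 4, 9]` and `counts` is `[1, 2, 3]`; the actor's state persists across calls, which is the property tasks alone don't give you.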
@robertnishihara
Robert Nishihara
6 months
Saturday morning is the best time to learn about RAG.
@anyscalecompute
Anyscale
6 months
🚀Join @Marwan1112 for a training on evaluation-driven development! ✅ Develop a basic #RAG app w/ Python ✅ Create an evaluation dataset ✅ Evaluate retrieval & overall quality ✅ Explore trade-offs between quality & cost Space is limited - sign up now
0
1
2
2
3
40
@robertnishihara
Robert Nishihara
8 months
More function calling! This time with Mixtral.
@anyscalecompute
Anyscale
8 months
🔥 Mixtral-8x7B JSON Mode and Function Calling API is now available on Anyscale Endpoints! Empirically, we observed noticeable improvements in response to tool messages by Mixtral MoE, compared @MistralAI 7B. 🚀 👇 Try it out:
5
12
104
1
3
41
@robertnishihara
Robert Nishihara
8 months
Curious how LLM providers compare on performance (e.g., AWS Bedrock, Fireworks, Replicate, Together, Anyscale)? Two key metrics: 🚅 Time to first token 🚢 Inter-token latency And of course, end-to-end latency can be derived from these two numbers. Importantly, the code and
@anyscalecompute
Anyscale
8 months
📈We’re excited to introduce the LLMPerf leaderboard: the first public and open source leaderboard for benchmarking performance of various LLM inference providers in the market. Our goal with this leaderboard is to equip users and developers with a clear understanding of the
10
39
161
3
13
38
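The derivation mentioned in that tweet is simple arithmetic. Under the assumption that, after the first token, output tokens arrive at a constant inter-token latency, end-to-end latency falls out of the two leaderboard metrics directly:

```python
def end_to_end_latency(ttft_s, itl_s, output_tokens):
    """End-to-end latency from the two benchmark metrics.

    Simple model: the first token arrives after the time-to-first-token
    (TTFT), and each subsequent token after one inter-token latency (ITL).
    """
    return ttft_s + itl_s * (output_tokens - 1)

# Illustrative numbers: 0.5 s TTFT, 20 ms/token ITL, 151 output tokens
# -> 0.5 + 0.02 * 150 = 3.5 s end to end.
latency = end_to_end_latency(0.5, 0.02, 151)
```

This is why the leaderboard only needs to publish TTFT and ITL: any end-to-end figure for a given output length is derivable from them.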
@robertnishihara
Robert Nishihara
3 months
For people who started doing ML over ten years ago, some of these trends are big shifts. One of the most interesting is the role of AI in dataset preparation. In training, the quality of the resulting model often hinges on the quality of the training data, which is why a ton of
@robertnishihara
Robert Nishihara
3 months
The core Ray API has remained the same since we first built it (with roughly two major API additions), but the library ecosystem on top has pivoted wildly. The vision was always somewhat analogous to the Python ecosystem: (1) build flexible lower level primitives (like functions
0
18
40
2
5
39
@robertnishihara
Robert Nishihara
5 years
Our startup, @anyscalecompute , recently announced our initial round of funding. We're working in the ML and distributed systems space and looking for a wide variety of roles! If you're at all interested in working together, please send me a message!
4
2
39
@robertnishihara
Robert Nishihara
3 months
We're adding a new vLLM track to Ray Summit. The contributor community around vLLM has exploded recently, and vLLM is one of the frameworks most commonly used along with @raydistributed . Submit talk proposals here:
@anyscalecompute
Anyscale
3 months
There has been so much excitement and activity around this topic, that we are adding a vLLM track to the Ray Summit! If you contribute to or use @vllm_project , we want to hear from you.
1
12
32
0
11
38
@robertnishihara
Robert Nishihara
10 months
@HamelHusain It's an impressive launch, but this is just the start 😀 One year from now, all of the LLMs we have today will appear to have been prohibitively expensive (and slow). From our view at @anyscalecompute , there's a massive amount of energy going into the open source ecosystem, and
3
0
36
@robertnishihara
Robert Nishihara
1 year
I still can't believe how lucky we are to be working with @GokuMohandas 😍🥳 He's one of the best educators out there, and @MadeWithML has been the entry point for so many into AI 🧑‍🏫🎗️ If you're looking to get started with AI, this course is perhaps the only one out there that
@GokuMohandas
Goku Mohandas
1 year
Beyond excited to share the biggest update yet to @MadeWithML -- based on 8+ years of helping machine learning teams get to production: - 📈 Scaling ML (+ LLMs) - 🔗 MLOps integrations - 🚀 Dev → Prod (fast) - ✅ open-source 🧵 Detailed thread below 👇
14
89
318
2
6
37
@robertnishihara
Robert Nishihara
2 months
FP8 for @vllm_project cuts latency nearly in half.
@anyscalecompute
Anyscale
2 months
We’ve recently contributed FP8 support to the @vllm_project in collaboration with @neuralmagic . With this feature, you can see up to a 1.8x reduction in inter-token latency, with >99% accuracy preservation! 1/n
2
35
105
0
9
38
@robertnishihara
Robert Nishihara
11 months
The typical path is - S3 -> disk (streamed via CPU memory) - disk -> CPU memory - CPU memory -> GPU memory There are a couple of performance issues here. 1. The detour through disk is an unnecessary hop. 2. These stages are typically done sequentially (one completes before the next
2
6
38
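The fix implied above, skipping the disk hop and overlapping the remaining stages, can be sketched with a bounded queue between a producer and a consumer thread. `download_chunks` and `upload_to_gpu` are hypothetical stand-ins for the real S3 read and host-to-device copy; only the pipeline shape is the point.

```python
import queue
import threading

def download_chunks(n_chunks, out_q):
    # S3 -> CPU memory directly, no disk detour.
    for i in range(n_chunks):
        out_q.put(f"chunk-{i}")
    out_q.put(None)                    # sentinel: download finished

def upload_to_gpu(in_q, loaded):
    # CPU memory -> GPU memory, running concurrently with the download.
    while True:
        chunk = in_q.get()
        if chunk is None:
            break
        loaded.append(chunk)

q = queue.Queue(maxsize=4)             # bounds CPU-side buffering
loaded = []
t1 = threading.Thread(target=download_chunks, args=(8, q))
t2 = threading.Thread(target=upload_to_gpu, args=(q, loaded))
t1.start(); t2.start(); t1.join(); t2.join()
```

With the stages overlapped, total load time approaches the slower of the two stages rather than their sum, which is where speedups of the magnitude quoted above come from.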
@robertnishihara
Robert Nishihara
5 months
The vast majority of businesses are in the "exploration" phase of adopting generative AI. @canva is one of a few companies that have shipped and scaled multiple generative AI products. Amazing to see the work they are doing 🤩
@raydistributed
ray
5 months
Very impressive to see how @canva is using LLMs and image generation to transform the design world.
1
7
20
0
7
37
@robertnishihara
Robert Nishihara
11 months
Congratulations to the @MistralAI team! This is a huge win for open source LLMs, especially given the Apache 2 license 🥰 For anyone looking to do inference or fine-tuning with open LLMs, check out Anyscale Endpoints (we serve Llama-2-70B for $1 / million tokens).
@GuillaumeLample
Guillaume Lample @ ICLR 2024
11 months
Mistral 7B is out. It outperforms Llama 2 13B on every benchmark we tried. It is also superior to LLaMA 1 34B in code, math, and reasoning, and is released under the Apache 2.0 licence.
52
481
3K
0
2
35
@robertnishihara
Robert Nishihara
6 months
The future of marine transit 😍 When electric hydrofoiling boats take off, eventually people will move closer to the water. Imagine commuting from Sausalito to Alameda in 10 min. @navierboat
3
3
34
@robertnishihara
Robert Nishihara
1 year
LLM throughput improvements translate directly to cost reductions, so 23x is huge (and we have a lot more in the pipeline). The key ingredients here are continuous batching and PagedAttention.
@cdnamz
Cade Daniel 🇺🇸
1 year
I wrote about a 23x improvement (!) in LLM live-inference throughput, measured on OPT-13B on A100. There are 2 new innovations which make this possible: Continuous batching & PagedAttention. Short thread below; see writeup, experiments, and results at
2
52
244
0
6
35
@robertnishihara
Robert Nishihara
2 months
Lots of deep lessons from the AI infra journey at Pinterest going back to Q1 2023. Some impressive stats - 5000+ training jobs / month - 300+ batch inference jobs / month
@PinterestEng
Pinterest Engineering
2 months
Part 2 is here! 🥳 Learn about ‘Ray Infrastructure at Pinterest’ written by Chia-Wei Chen, Raymond Lee, Alex Wang, Saurabh Vishwas Joshi, Karthik Anantha Padmanabhan and Se Won Jang. 📌
0
19
53
0
6
33
@robertnishihara
Robert Nishihara
1 year
Did you know you can fine-tune and deploy the 70B parameter Llama-2 model *without thinking about compute infrastructure* 🤯🤯🤯 Would be 100x harder without the incredible platform we've been building with @raydistributed and @anyscalecompute 😍
@CyrusHakha
kourosh hakhamaneshi
1 year
My typical schedule when a new kick-ass OSS LLM gets released ( #LLama -2): Drop everything non-customer related and go all in playing with it. Here is the result of me playing with it: On this PR, I have a single script that uses #Ray Train and #Deepspeed
3
20
74
2
5
34
@robertnishihara
Robert Nishihara
1 year
Very very excited that @RichardSocher will be speaking at #RaySummit next week! AI search is moving incredibly quickly, and @YouSearchEngine is a leader in this area. This will be a good place to learn about some of the technology powering 💡 how AI
@robertnishihara
Robert Nishihara
1 year
Ray Summit this month will be 🔥🔥 🤯 ChatGPT creator @johnschulman2 🧙‍♀️ @bhorowitz on the AI landscape 🦹‍♂️ @hwchase17 on LangChain 🧑‍🚀 @jerryjliu0 on LlamaIndex 👨‍🎤 @zhuohan123 and @woosuk_k on vLLM 🧜 @zongheng_yang on SkyPilot 🧑‍🔧 @MetaAI on Llama-2 🧚‍♂️ @Adobe on Generative AI in
8
45
207
0
8
33
@robertnishihara
Robert Nishihara
8 months
One of the lesser known challenges with building RAG applications is figuring out how to compute the embeddings. Embeddings of text, images, and other data modalities are at the heart of a RAG app. Producing ~1B embeddings can take weeks and cost tens of thousands of dollars
@anyscalecompute
Anyscale
8 months
Producing ~1B embeddings can take weeks and cost tens of thousands of dollars ($60K with OpenAI in the example below). We are thrilled to partner with @pinecone on the launch of their new serverless offering! Anyscale + Pinecone reduce the cost of computing these embeddings by
3
8
59
2
6
32
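For intuition on the price tag: a back-of-envelope calculation, with assumed numbers (roughly $0.10 per 1M tokens and ~600 tokens per document, neither taken from the thread), lands in the same ballpark as the $60K figure:

```python
# Back-of-envelope cost of embedding ~1B documents with a paid API.
# Assumed inputs (not from the tweet): $0.10 per 1M tokens and
# ~600 tokens per document; adjust both for your own workload.

def embedding_cost_usd(n_docs, tokens_per_doc, usd_per_million_tokens):
    total_tokens = n_docs * tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens

cost = embedding_cost_usd(1_000_000_000, 600, 0.10)
print(f"${cost:,.0f}")  # $60,000
```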
@robertnishihara
Robert Nishihara
2 months
I've come across very few open source projects which ship weekly releases (though they do exist). Ironically, while it takes a lot of work to get to a constantly releasable state, it's less work to maintain. Previously, we were following a six-week release cycle. Every release
@raydistributed
ray
2 months
We recently moved to weekly Ray releases to ship features to our community faster. 🚀 Doing so required us to fix flaky tests and completely revamp our release process. 👊 Read more here:
1
3
10
0
7
32
@robertnishihara
Robert Nishihara
16 days
Something we're doing differently this time around: we added a #vLLM track to #RaySummit ! @vllm_project is one of the most popular inference engines, and it is often used together with @raydistributed for scaling LLM inference. Can't wait to hear from these companies about how
1
8
32
@robertnishihara
Robert Nishihara
1 year
Our team has worked incredibly hard to make Ray the easiest way to scale LLMs and generative AI. A sample of recent developments:
✅ Streaming support for real-time LLM inference
✅ Distributed checkpointing for large model training
✅ Tutorials for LLM fine-tuning and serving
Ray 2.6.1 was released with:
🎏 Streaming responses in Serve for real-time capabilities
📀🏃‍♀️ Ray Data streaming integration w/ Train
🏃‍♀️☁️ Distributed training & tuning sync with cloud storage persistence
🤖 Alpha release of the Multi-GPU Learner API
📙 Ray Gallery examples 👇
1
3
22
2
7
32
@robertnishihara
Robert Nishihara
1 year
Curious how LoRA compares to full-parameter fine-tuning? The principal trade-off with LoRA is straightforward: you may give up some model quality, but you gain the ability to serve many models more efficiently. The question is how much quality. The answer is nuanced. 👇
@CyrusHakha
kourosh hakhamaneshi
1 year
🤔 Fine-tuning LLMs: LoRA or Full-Parameter? Which should you choose? Uncover the insights in our latest technical blog. 🔗 Link: 🧵 Thread (1/N) 👇
4
44
212
1
1
32
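For readers unfamiliar with the mechanics behind that trade-off: LoRA freezes the pretrained weight W and learns a low-rank update, which is why many fine-tunes can be served cheaply — only the small factors differ per model. A minimal, framework-free sketch (illustrative sizes and values, not from the blog post):

```python
# Minimal LoRA sketch (pure Python, no frameworks): instead of updating a
# full d_out x d_in weight matrix W, train two low-rank factors
# B (d_out x r) and A (r x d_in), and use W_eff = W + (alpha / r) * B @ A.
# Serving many fine-tunes then means swapping only A and B per task.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha):
    r = len(A)                      # rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)               # low-rank update, shape of W
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]

# Tiny 2x2 example with rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]        # frozen pretrained weight
A = [[0.5, 0.5]]                    # trainable, shape (r, d_in)
B = [[0.0], [2.0]]                  # trainable, shape (d_out, r)
print(lora_effective_weight(W, A, B, alpha=1))  # [[1.0, 0.0], [1.0, 2.0]]

# The efficiency side of the trade-off, e.g. one 4096x4096 layer at rank 8:
d, r = 4096, 8
print(d * d, "->", r * (d + d))     # 16777216 -> 65536 trainable parameters
```

The quality side of the trade-off (how much accuracy the low-rank constraint costs) is exactly what the linked blog post measures.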
@robertnishihara
Robert Nishihara
9 days
1000 contributors!!! 🤯🤯 One of the earliest contributors to Ray was @AntGroup / @Alipay . They were the first serious Ray user, and they contributed a lot to the hardening of Ray in production. Today, they use Ray for a huge range of workloads, from batch inference to model
🚀Ray just hit 1000 contributors!  Our community is thriving 🙌—all thanks to you! Let’s keep pushing the boundaries of AI together!🤝🌟 #opensource #AI
1
5
17
1
5
32
@robertnishihara
Robert Nishihara
1 year
@Instacart just revamped their ML infrastructure. 🏆 The primary use cases are:
1⃣ Training thousands of small to mid-sized models
2⃣ Data parallel deep learning with large datasets
3⃣ Scalable batch inference
1
9
32
@robertnishihara
Robert Nishihara
3 months
We published a 3-part series on Stable Diffusion:
Part 1: Cost reduction
Part 2: Preprocessing billions of images
Part 3: Training at scale
@raydistributed
ray
3 months
See how we optimized large-scale ML training in Part 3 of our Stable Diffusion series! We used Ray Train, Ray Data, and PyTorch Lightning to train on 2B images with fault tolerance, data streaming, and advanced strategies like FSDP and DDP. Read more:
1
10
30
0
9
30
@robertnishihara
Robert Nishihara
2 years
I hate watching myself talk, but I'm immensely proud of this demo that our team put together at the #RaySummit showing how to build an ML application with @raydistributed and @anyscalecompute . 😍
1
3
28
@robertnishihara
Robert Nishihara
2 years
Step-by-step instructions for building an ML platform using @raydistributed and @kubeflow on @googlecloud . Ray scales compute for data ingest / preprocessing, training, serving, etc. Kubeflow provides notebooks, orchestration, authentication, and more.
1
3
30
@robertnishihara
Robert Nishihara
8 months
@ocolegro HF is a great choice to host the data. As for running the embedding computations (or training or inference), we do this regularly with @raydistributed and @anyscalecompute . Here's an example where ByteDance runs embedding computations and inference on 200TB of data.
1
7
30
@robertnishihara
Robert Nishihara
1 year
The Llama team has changed the AI landscape in a very short period of time. It's remarkable work! 🤯
The team @MetaAI has done a tremendous amount to move the field forward with the Llama models. We're thrilled to collaborate to help grow the Llama ecosystem.
2
18
87
1
4
29
@robertnishihara
Robert Nishihara
8 months
@soumithchintala @anyscalecompute Thanks @soumithchintala , I agree with a lot of your feedback, and we're going to address it! A few concrete things:
⚫️ We will be adding cost as a metric (that's extremely important).
⚫️ We will be measuring latency and reliability over time. This one is a big deal. As you
1
0
28
@robertnishihara
Robert Nishihara
25 days
Many of the companies you'll hear from at #RaySummit have gone through a 5-10 year AI infrastructure journey (even longer in some cases). They've managed the migration from classical ML to deep learning, then from deep learning to generative AI, and they are gearing up for the
2
9
28
@robertnishihara
Robert Nishihara
3 years
This is a nice walkthrough of how to scale applications on Kubernetes with Ray. Blog post by Vishnu Deva from @MadStreetDen .
1
9
27
@robertnishihara
Robert Nishihara
2 months
Gen-3 Alpha is mind-blowing. Absolutely cannot wait to hear from @agermanidis about the process of building this model and the hard-earned lessons at Ray Summit 2024.
@runwayml
Runway
2 months
Introducing Gen-3 Alpha: Runway’s new base model for video generation. Gen-3 Alpha can create highly detailed videos with complex scene changes, a wide range of cinematic choices, and detailed art directions. (1/10)
254
930
4K
0
6
28
@robertnishihara
Robert Nishihara
1 month
GPU fragmentation is common when deploying multiple models on a shared pool of resources (in particular, when the number of replicas of each model scale up and down independently).
@anyscalecompute
Anyscale
1 month
4/ ↪️ With Replica Compaction, Anyscale will automatically migrate replicas into fewer nodes in order to optimize resource use and reduce costs. It does this with zero downtime to ensure applications have no interruption in traffic.
1
1
4
0
7
27
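The core of compaction is a bin-packing problem: given the GPU fraction each replica needs, how few nodes can hold them all? A toy first-fit-decreasing sketch (illustrative only; the actual Anyscale feature also live-migrates replicas with zero downtime):

```python
# Toy replica compaction: after replicas scale up and down independently,
# they end up spread thinly across nodes. A greedy first-fit-decreasing
# bin-pack shows how many nodes are actually needed.

def compact(replica_gpu_fractions, node_gpus=1.0):
    """Pack replica GPU requests onto the fewest nodes (first-fit decreasing)."""
    nodes = []  # remaining free capacity per node
    for need in sorted(replica_gpu_fractions, reverse=True):
        for i, free in enumerate(nodes):
            if need <= free + 1e-9:     # fits on an existing node
                nodes[i] = free - need
                break
        else:                           # no node had room: open a new one
            nodes.append(node_gpus - need)
    return len(nodes)

# Fragmented state after independent autoscaling: six 0.3-GPU replicas,
# possibly sitting on six different nodes.
replicas = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
print(compact(replicas))  # 2 (three 0.3-GPU replicas fit per 1-GPU node)
```

First-fit decreasing is not optimal in general, but it is a standard, simple heuristic for exactly this kind of fragmentation cleanup.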
@robertnishihara
Robert Nishihara
9 months
In addition to today's launch of @AIatMeta 's new Llama model (Llama Guard), Anyscale Endpoints also recently launched open embeddings (gte-large). Lots more in the coming weeks...
0
3
25
@robertnishihara
Robert Nishihara
1 year
1⃣ Cost-efficiency at scale means using smaller specialized models (versus massive general-purpose models).
2⃣ High response quality with smaller models means customization (e.g., fine-tuning).
3⃣ Customization is best done with open models.
4⃣ The best open models are the Llama2
1
8
26
@robertnishihara
Robert Nishihara
4 months
When scaling training on GPUs, it's common to become bottlenecked by data ingest and preprocessing. When training stable diffusion models, that "preprocessing" may include running other models to embed the text & image inputs (a pre-trained VAE and a text encoder, OpenCLIP-ViT/H)
@raydistributed
ray
4 months
We pretrained a stable diffusion model on 2 billion images for under $40K. Here's what we learned.
1
16
51
1
10
26
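The standard fix for this bottleneck is to stream: overlap preprocessing on CPU workers with training so the accelerator never waits for a full preprocessing pass. A single-machine toy sketch with a bounded queue (illustrative only; Ray Data does this across a cluster):

```python
import queue
import threading
import time

# Producer/consumer sketch of streaming preprocessing: a CPU worker prepares
# batches while the trainer consumes them, so the "GPU" never waits for a
# full preprocessing pass over the dataset.

def preprocess(batch_id):
    time.sleep(0.001)           # stand-in for tokenization / VAE / text encoder
    return f"batch-{batch_id}"

def producer(q, n_batches):
    for i in range(n_batches):
        q.put(preprocess(i))    # blocks when the buffer is full (backpressure)
    q.put(None)                 # sentinel: no more data

def train(q):
    seen = []
    while (batch := q.get()) is not None:
        seen.append(batch)      # stand-in for a training step
    return seen

buffer = queue.Queue(maxsize=4)  # bounded buffer caps memory use
t = threading.Thread(target=producer, args=(buffer, 8))
t.start()
batches = train(buffer)
t.join()
print(len(batches))  # 8
```

The bounded queue is the key design choice: it lets preprocessing run ahead of training without ever buffering more than a few batches in memory.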
@robertnishihara
Robert Nishihara
15 days
Running hands-on training sessions is something we started in the very early days of Ray. We got this idea, of course, from the @ApacheSpark community. Here are two photos from the early days:
1. The first Ray meetup, hosted at @OpenAI .
2. A tutorial we ran at an @OReillyMedia
🚨The Ray Summit Training Guide is here!🚨 Check it out and make the most out of your experience. 🎓 Explore all the classes available, 🍎 Discover personalized learning paths (LLMs, Ray Core, and more), 🎯 Reserve your spot early to avoid missing out! 🚀 #RaySummit Training
0
7
11
2
7
26
@robertnishihara
Robert Nishihara
1 year
Please submit questions you’d like to ask @bhorowitz ! We’re doing a fireside chat at #RaySummit . Topics include AI, the chip shortage, the role of open source, management, and policy / regulation. Just reply to this tweet!
5
9
26
@robertnishihara
Robert Nishihara
4 years
Congrats to the team! We're hiring *across the board*:
- engineering (including ML, infrastructure, frontend, etc.)
- site reliability engineering
- product management
- product design
- marketing
- engineering management
- solutions architect
More details at .
@anyscalecompute
Anyscale
4 years
We’re thrilled to announce our $40M Series B, led by @NEA to continue growing the ecosystem around @raydistributed . This wouldn’t have been possible without the Ray community!
2
12
45
2
6
26
@robertnishihara
Robert Nishihara
3 years
Really excited to announce GA for Anyscale as well as our Series C (with @a16z and Addition)! Our goal is to enable every developer and every team to succeed with AI without needing to worry about building and managing infrastructure.
@anyscalecompute
Anyscale
3 years
🎉 We’re thrilled to announce our $100M Series C led by @a16z & Addition + general availability of the Anyscale managed @raydistributed platform! Both are big steps forward in our mission to accelerate the scaling and productionization of #AI apps.
4
16
89
3
3
26
@robertnishihara
Robert Nishihara
29 days
Amazing work! We've been thrilled to partner with @lmsysorg and @Meta to host these models for the leaderboard on @anyscalecompute 😀
@lmsysorg
lmsys.org
30 days
Exciting news! @metaai 's Llama-3.1 results are here🔥 The Llama-3.1 series, extensively tested over the past week, has gathered over 10K community votes. Now, Llama-3.1-405B has climbed to #3 on the Overall Arena leaderboard, marking the first time an open model has ranked in
32
117
704
3
6
25
@robertnishihara
Robert Nishihara
10 months
Exactly our approach with Anyscale Endpoints. Go deep on cost and performance. Squeeze performance out of every layer of the stack.
@Suhail
Suhail
10 months
Super simple startup idea: take open source models, speed them up relentlessly, make an api, have the cheapest possible price, great uptime. In the long-run, you’ll build economies of scale w GPUs + have a process power of optimization.
70
43
800
0
3
24
@robertnishihara
Robert Nishihara
2 months
Collaboration with @lmsysorg . Step-by-step instructions for building your own model router. Key steps:
1. Generate labeled data
2. Fine-tune an LLM-based classifier
3. Run offline evals
The whole thing takes about 120 minutes. The overall goal is to direct "simple" queries to
@anyscalecompute
Anyscale
2 months
1/ 🚀 Introducing RouteLLM: a routing framework based on human preference data for routing queries between powerful proprietary LLMs and cost-effective LLMs, developed in collaboration with @lmsysorg . By intelligently selecting the best model for each query, our router models
1
34
136
0
11
25
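The shape of the resulting router is simple: a scorer estimates whether a query needs the strong model, and a threshold trades cost against quality. A toy sketch where the classifier is a crude stand-in heuristic (in RouteLLM the scorer is a model fine-tuned on human preference data, not a keyword check):

```python
# Minimal LLM-router sketch: score a query, then pick a model based on a
# cost/quality threshold. The scorer below is a hypothetical stand-in for
# the learned classifier described in the RouteLLM blog post.

def win_rate_score(query: str) -> float:
    """Stand-in for a learned classifier: crude proxy for query difficulty."""
    hard_markers = ("prove", "derive", "optimize", "why")
    score = 0.2 + 0.2 * sum(m in query.lower() for m in hard_markers)
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send likely-hard queries to the strong model, the rest to the cheap one."""
    return "strong-model" if win_rate_score(query) >= threshold else "cheap-model"

print(route("What's the capital of France?"))         # cheap-model
print(route("Prove why this algorithm is optimal."))  # strong-model
```

Lowering the threshold shifts traffic toward the strong model (higher quality, higher cost); raising it does the opposite, which is the knob the offline evals in step 3 are meant to calibrate.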
@robertnishihara
Robert Nishihara
1 year
Building AI applications is hard (but getting easier). @jerryjliu0 and @llama_index have done a huge amount to move AI tooling forward by solving some of the core challenges around interfacing LLMs with your data.
@llama_index
LlamaIndex 🦙
1 year
Building a production-ready LLM app is hard:
📄 How to load, parse, embed thousands of docs?
⚙️ How to deploy to prod?
We're incredibly excited to collab with @anyscalecompute : Ray can make LlamaIndex 10x faster + easily deployable to a prod server ⚡️
6
52
193
0
4
25