Fireworks AI

@FireworksAI_HQ

6,316 Followers · 72 Following · 141 Media · 488 Statuses

🎆 Generative AI Platform built for developers

Joined September 2022
Pinned Tweet
@FireworksAI_HQ
Fireworks AI
6 days
🚀Llama 3.2 is now live on Fireworks! @AIatMeta ’s Llama 3.2 models (1B, 3B, 11B, and 90B) are accessible now! 🧵 📖 Read more
@FireworksAI_HQ
Fireworks AI
11 months
We are opening our public platform to all developers at no cost for a two-week period, specifically for API usage. Please visit to begin. In exchange, we would love your valuable feedback on how we can improve our services to assist you in creating
@FireworksAI_HQ
Fireworks AI
10 months
We released our tuned Mixtral chat a few hours ago. Play with it through our app or API: . Big thanks to @MistralAI’s new addition of this MoE model. We are very excited about it.
@FireworksAI_HQ
Fireworks AI
10 months
@MistralAI’s latest Mixtral MoE 8x7B model has been on Fireworks since Saturday (2 days before it was officially released!). 🏆 Quality: beats GPT-3.5 on most benchmarks. 🏎 Speed: fastest inference engine, reaching 175 tokens/sec. 💸 Pricing: $0.45-$0.60/million tokens for
@FireworksAI_HQ
Fireworks AI
1 year
Fireworks' blazing-fast LLM inference is now available on Poe! Today, we’re excited to bring the power of the new Mistral 7B Instruct model to the @poe_platform powered by our lightning-fast Fireworks inference platform. You can now have conversations with the Mistral 7B bot and
@FireworksAI_HQ
Fireworks AI
10 months
As thanks to Fireworks users, we give you a few gifts: 🐎 llama-70b-chat is now 1.5-2x faster running on the same hardware. More to come. 🎇 is FREE for API usage for two weeks. 🐪 Yi and Zephyr models are enabled. Happy Thanksgiving and
@FireworksAI_HQ
Fireworks AI
9 months
What's the most performant way to serve Mixtral and other open-source MoE models? Fireworks investigated this topic and came up with our proprietary serving stack with 4x the speed compared to vLLM and negligible quality impact! Read about our findings here
@FireworksAI_HQ
Fireworks AI
10 months
Fireworks is excited to raise the quality bar by launching our function calling model and API as an alpha launch! We’ve fine-tuned a model specifically to reliably call APIs, even when provided with multi-turn context and numerous functions! In our evals, we achieved accuracy on
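Function calling, as described above, means the model is given JSON schemas for the available functions and returns a structured call rather than free text. A minimal sketch of the request/response shape (this follows the common OpenAI-style convention; the function name and fields are illustrative, not Fireworks' actual API):

```python
import json

# Illustrative tool schema handed to the model alongside the conversation.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # hypothetical function
        "description": "Get the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

# Instead of prose, the model replies with a structured call like this,
# which the application parses and executes.
call = {"name": "get_stock_price", "arguments": json.dumps({"ticker": "NVDA"})}
args = json.loads(call["arguments"])
print(args["ticker"])  # NVDA
```

The multi-turn robustness the tweet mentions matters because the model must keep emitting valid, schema-conformant calls even after several rounds of conversation.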
@FireworksAI_HQ
Fireworks AI
10 months
@MistralAI 8x7B is live on Fireworks. Try it now at Warning: this is not an official implementation, as the model code hasn’t been released yet. Results might vary. We will update it once Mistral does the release. More perf improvements are landing soon
@FireworksAI_HQ
Fireworks AI
1 year
🎉Mistral 7B Instruct is now available on the platform! Try out the model here:
@FireworksAI_HQ
Fireworks AI
1 year
🚀 Launching the GenAI Platform: bringing fast, affordable, and customizable Large Language Models (LLMs) to developers. Use open-source foundation models and deploy your own LoRA adapters with up to 20–120x cost reduction. 1/9
@FireworksAI_HQ
Fireworks AI
10 months
Mixtral: one more expert to break the tie. Mixtral has 8 experts, but only 2 are active for each token. Does activating more than 2 help? Surprisingly, it helps in fp8, but not in the original 16-bit precision. With this trick, fp8 can almost match fp16 on MMLU! Why is that? 1/5
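The "one more expert" trick can be illustrated numerically: when the 2nd- and 3rd-ranked gating scores are nearly tied, fp8 quantization noise can flip which expert makes the top-2 cut, while routing to a 3rd expert absorbs the tie. A toy sketch with made-up numbers (not the real Mixtral router):

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest-scoring experts."""
    return set(np.argsort(scores)[-k:])

gates = np.array([0.40, 0.2501, 0.2499, 0.10])  # experts 1 and 2 nearly tied
noise = np.array([0.0, -0.001, 0.001, 0.0])     # small quantization perturbation

print(top_k(gates, 2) == top_k(gates + noise, 2))  # False: top-2 pick flipped
print(top_k(gates, 3) == top_k(gates + noise, 3))  # True: top-3 absorbs the tie
```

With top-2 routing the perturbed gate sends the token to a different expert than fp16 would, degrading quality; including the tied 3rd expert makes the active set stable under the noise.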
@FireworksAI_HQ
Fireworks AI
6 months
Fireworks now offers the first (to our knowledge) hosted, instruct variant of Mixtral 8x22B! Try it at … or download it at … Thanks @Teknium1 and @NousResearch for the great dataset!
@ArtificialAnlys
Artificial Analysis
6 months
Mixtral 8x22B is an exciting launch but is not yet ready for production use for most use-cases The version of the model released by Mistral is the base model and is not instruct/chat fine-tuned. This means that it isn’t designed for the prompt & answer style that most
@FireworksAI_HQ
Fireworks AI
1 year
ChatFireworks is live on LangChain! You can now use open-source chat models like Mistral 7B and Llama 2 Chat in your @LangChainAI applications. LLM developers can combine Fireworks chat models with system prompts and memory to build fast and high-performing conversational AI
@FireworksAI_HQ
Fireworks AI
8 months
Fireworks is now the fastest provider for Mixtral 8x7b Instruct at 200 tokens/s! Graph courtesy of @ArtificialAnlys
@FireworksAI_HQ
Fireworks AI
2 months
🚀 Exciting news! Fireworks AI is one of the first platforms to offer Llama 3.1 for production use from day one in partnership with @AIatMeta . With expanded context length, multilingual support, and the powerful Llama 3.1 405B model, developers can now leverage unmatched AI
@FireworksAI_HQ
Fireworks AI
10 months
We released prompt caching for LLM inference with 5-10x faster time to the first token on long repeated prompts. LLMs are often invoked with repeated prompts: system prompt, few-shot examples, entire previous conversation for multi-turn chat, Q&A for a single document, etc. 1/4
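The prompt-caching win comes from reusing the attention (KV) state computed for a shared prompt prefix. A toy sketch of why multi-turn chat benefits so much, where each request repeats the entire prior conversation (conceptual only, not Fireworks' implementation):

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Length of the common token prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Multi-turn chat: each request resends the whole prior conversation.
prev = ["<sys>", "You", "are", "helpful", "<user>", "Hi", "<asst>", "Hello"]
curr = prev + ["<user>", "What's", "new?"]

reused = shared_prefix_len(prev, curr)
print(f"{reused}/{len(curr)} tokens can reuse cached KV state")
```

Since the time to first token is dominated by processing the prompt, skipping the already-computed prefix yields the 5-10x speedups quoted above on long repeated prompts.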
@FireworksAI_HQ
Fireworks AI
11 months
For performant LLM inference, no deployment is one-size-fits-all. A low-latency bot has different requirements from a high-throughput summarization model. Learn about Fireworks’ many deployment configurations in our new blog 1/8
@FireworksAI_HQ
Fireworks AI
1 year
Finetuned Mistral is now supported on the Fireworks inference platform! You can now upload finetuned Mistral PEFT add-ons to the Fireworks platform for fast inference. Check out our cookbook for how to finetune and upload a custom Mistral model to Fireworks: Finetune the model:
@FireworksAI_HQ
Fireworks AI
1 year
We are excited to announce new updates available on the Fireworks Mistral-7B Poe Bot: 1/ Sliding Window Attention - process sequences up to 32K efficiently thanks to sliding window optimization. 2/ Proper BOS Handling - Mistral-instruct now follows the conversation template it
@FireworksAI_HQ
Fireworks AI
1 year
We are excited to team up with @LangChainAI to bring access to open-source LLMs like Mistral 7B via the LangChain Prompt Playground. This enables developers to efficiently explore and optimize their prompts on open-source LLMs. Here is how it works:
@FireworksAI_HQ
Fireworks AI
6 months
We are pleased to announce the availability of the open-source Llama 3 8B and 70B models with 8k context, served from our blazing-fast inference stack.
@FireworksAI_HQ
Fireworks AI
1 year
You can get 11x higher throughput and 30% lower latency with #Falcon #LLM, thanks to Multi-Query Attention (MQA). Learn how co-designing model and system architecture boosts efficiency. More in the Fireworks Gen AI Platform blog: #AI 1/8
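The MQA gains above follow from shrinking the KV cache: all query heads share a single key/value head, cutting cache size (and the memory bandwidth spent reading it) by roughly the head count. A back-of-the-envelope sketch with illustrative numbers (not Falcon's exact configuration):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per=2):
    """KV cache size: K and V each store layers*kv_heads*head_dim per position."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per

mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=2048, batch=8)
mqa = kv_cache_bytes(layers=32, kv_heads=1,  head_dim=128, seq_len=2048, batch=8)
print(f"MHA: {mha/2**30:.1f} GiB, MQA: {mqa/2**30:.2f} GiB, ratio {mha//mqa}x")
# MHA: 8.0 GiB, MQA: 0.25 GiB, ratio 32x
```

Because decoding is memory-bound, a 32x smaller cache frees bandwidth and GPU memory for much larger batches, which is where the throughput multiplier comes from.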
@FireworksAI_HQ
Fireworks AI
11 months
⚡️StableDiffusion XL on Fireworks is faster than ever! Our SDXL APIs support generating images of 1024x1024 in 30 steps in 2 seconds. Try it out in the console:
@FireworksAI_HQ
Fireworks AI
1 year
Code infilling enables the use of LLMs for code completion or docstring generation. But using SoTA LLMs (e.g., Code Llama) for infilling is tricky - you need proper whitespace formatting, and not all model variants can do accurate infilling. 1/4
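Infilling-capable models expect the prompt rearranged around sentinel tokens: the code before and after the cursor is marked as prefix and suffix, and the model generates the middle. A sketch of assembling such a prompt (the sentinel spellings below follow the Code Llama style but vary by model and tokenizer, so check the model card):

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt; the model completes after <MID>.

    Sentinel spellings are model-specific; these follow Code Llama's scheme.
    Whitespace around the cursor must be preserved exactly, since the model
    was trained on verbatim code splits.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

before = "def add(a, b):\n    "
after = "\n    return result"
prompt = fim_prompt(before, after)
print(prompt.startswith("<PRE>"))
```

This is why the tweet flags whitespace handling: an extra or missing space next to the sentinels shifts the model off the token distribution it saw in training, and only the variants trained with these sentinels can infill at all.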
@FireworksAI_HQ
Fireworks AI
7 months
We’re removing our waitlist and providing general access to dedicated deployments! As part of the launch, we’re adding support for 42 new models in dedicated deployments, including Nous Hermes models and Deepseek Coder! Dedicated deployments use the ultra-efficient Fireworks
@FireworksAI_HQ
Fireworks AI
13 days
🔥Meet Multi-LoRA, a FireOptimizer capability that lets you personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency. Serve hundreds of fine-tuned models on a single base model at the same cost as a single base model.
@FireworksAI_HQ
Fireworks AI
1 year
Code Llama 🦙is out! Try it on the FREE and FAST Fireworks AI #LLM Inference platform at
@FireworksAI_HQ
Fireworks AI
3 months
Announcing Yi-Large on @fireworksai_hq ! We're excited to be among the first providers of Yi-Large, joining Nvidia in offering this state-of-the-art model. Yi-Large ranks among the top LLMs, closely trailing GPT-4, Gemini 1.5 Pro, and Claude 3 Opus on the LMSYS benchmark
@FireworksAI_HQ
Fireworks AI
4 months
We are excited to partner with @HamelHusain and @dan_s_becker on the conference "Mastering LLMs: A Conference For Developers & Data Scientists" and offer ALL students $250 in credit on the @FireworksAI_HQ platform. Registrations for the course are closing soon!
@FireworksAI_HQ
Fireworks AI
7 months
Happy March! Fireworks is announcing our Spring 2024 platform updates - designed for improved production usage at scale. 🧵 💨Faster, more prod-ready serverless models - Mixtral Instruct and Llama 70B have become even faster, with speeds up to 300
@FireworksAI_HQ
Fireworks AI
1 year
We are excited to bring fast inference for open-source LLMs to the Vercel AI SDK. We can't wait to see how developers use this. You can try the Llama 2 models here:
@jaredpalmer
Jared Palmer
1 year
Just switched 's Llama 2 provider to @thefireworksai . Huge perf improvement.
@FireworksAI_HQ
Fireworks AI
1 year
Fireworks achieves 3.5x better latency for blazing-fast code completion! Faster and more accurate code completion is essential for building production-grade AI-powered coding assistants. We are excited to publish our new use case showing how our fast LLM inference platform
@FireworksAI_HQ
Fireworks AI
1 year
The Fireworks Inference Platform is fast, but how? An important technique is CUDA graphs, which can achieve a 2.3x speedup for LLaMA-7B inference. Learn about CUDA graphs, the complexity of applying them to #LLM inference, and more in our new deep dive. 1/6
@FireworksAI_HQ
Fireworks AI
2 months
Llama 3.1 8B, 70B Instruct are now available for fine-tuning. Fine-tuning guide -
@FireworksAI_HQ
Fireworks AI
3 months
Gemma 2 9B is now on Fireworks! We're the first hosted provider (to our knowledge) to offer Google's latest open-source LLM. Try Gemma 2 in our UI playground or via our OpenAI-compatible API!
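An "OpenAI-compatible API" means existing OpenAI-client code can target Fireworks by swapping the base URL and model name. A sketch of the request payload (the base URL and model identifier below are assumptions based on Fireworks' public naming convention, not taken from this page):

```python
import json

FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"  # assumed endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat.completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_chat_request(
    "accounts/fireworks/models/gemma2-9b-it",  # hypothetical model id
    "Explain mixture-of-experts in one sentence.",
)
# With the official openai SDK, the same payload would be sent by creating
# OpenAI(base_url=FIREWORKS_BASE_URL, api_key=...) and calling
# client.chat.completions.create(**payload).
print(json.dumps(payload, indent=2))
```

Compatibility at this layer is the whole point of the announcement: no new SDK to learn, just a different base URL and model string.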
@FireworksAI_HQ
Fireworks AI
10 months
Fireworks is committed to bringing the best model quality and performance to the community. We’ve launched the Qwen-72B model on the Fireworks platform and @poe_platform , to provide outstanding performance on English and Chinese tasks! Qwen-72B outperformed LLaMA2-70B on all
@FireworksAI_HQ
Fireworks AI
3 months
Fireworks is raising $52M, led by @sequoia ! "We’re using the funding to make a shift towards compound AI systems that can orchestrate across multiple models with different modalities and tools” Learn more from our live Bloomberg interview:
@FireworksAI_HQ
Fireworks AI
10 months
@MistralAI Try out the new Mixtral 8x7B mixture-of-experts model fine-tuned by @thefireworksai for chat on @poe_platform !
@FireworksAI_HQ
Fireworks AI
4 months
Excited to announce custom model import, on-demand H100s and auto-scaling to and from 0 on Fireworks! Use thousands of HuggingFace models with 60% faster speeds and 53% lower costs on your own GPU 🧵
@FireworksAI_HQ
Fireworks AI
6 months
We are super excited to partner with @StabilityAI in bringing the state-of-the-art image generation models Stable Diffusion 3 (SD3) and Stable Diffusion 3 Turbo (SD3-turbo) to developers with the @FireworksAI_HQ enterprise-grade distributed inference service. Read more about the
@FireworksAI_HQ
Fireworks AI
2 months
DeepSeek Coder V2 is now available on Fireworks! Try either the full model at 128K context length or the Lite model at 168K context length! These models are served with DeepSeek's multi-head latent attention (MLA) to drastically reduce the KV cache footprint.
@FireworksAI_HQ
Fireworks AI
7 months
Fireworks features top serving speeds: per OpenRouter, Mixtral Nitro tokens/sec are even faster than Groq!
@OpenRouterAI
OpenRouter
7 months
ICYMI: you can now track model performance over time. Here's the early data for Mixtral Nitro 🚀
@FireworksAI_HQ
Fireworks AI
2 months
Fireworks is serving Llama 3.1 405B (fp8 quantized) at $3 per 1M input/output tokens! We're offering the model ~3x cheaper than competing platforms thanks to the unparalleled efficiency of the Fireworks inference stack. Try it now at
@FireworksAI_HQ
Fireworks AI
3 months
. @cursor_ai has trained a specialized model on the "fast apply" task, which involves planning and applying code changes. The fast-apply model surpasses the performance of GPT-4 and GPT-4o, achieving speeds of ~1000 tokens/s (approximately 3500 char/s) on a 70B model.
@FireworksAI_HQ
Fireworks AI
11 months
Segmind Stable Diffusion 1B (SSD-1B) is now available on Fireworks! It’s now possible to generate 1024x1024 images in 30 steps in just 1 second using SSD-1B on the Fireworks inference platform. SSD-1B is one of the most impressive and high-performing diffusion models available
@FireworksAI_HQ
Fireworks AI
7 months
We hear you and we're committed to creating the best experience for rapid, iterative fine-tuning. We've removed the minimum price for fine-tuning! Create smaller tuning jobs for < $1, effective immediately!
@yar_vol
floating point
7 months
@lqiao @FireworksAI_HQ This is awesome, but do you really need to charge a $3 minimum for EVERY finetune? I thought we'd be able to do some iterative fine-tuning, but this obviously quickly adds up. OpenAI does not charge a minimum fixed price. It is maybe irrational, but I hesitate to try :)
@FireworksAI_HQ
Fireworks AI
2 months
Excited to announce our judges for this Sunday's agents and compound AI hackathon with @LangChainAI and @FactoryAI ! Join us to push the boundaries of agents and compound AI and to chat with folks like @matanSF , @srochiramani , @dzhulgakov , @swyx , @hwchase17 , @rolandgvc , @dvendrow
@FireworksAI_HQ
Fireworks AI
3 months
Compound AI Systems combine specialized models, retrievers, and external tools for specific tasks, offering greater flexibility and performance compared to single, mega models that can be less efficient and harder to specialize. Join us at CampFire Connect on July 11th at 10 AM
@FireworksAI_HQ
Fireworks AI
10 months
Introducing the CreativeQR bot on @poe_platform , powered by @thefireworksai image generation! Generate beautiful images with scannable QR codes seamlessly embedded. Try it now at
@FireworksAI_HQ
Fireworks AI
6 months
We’re excited to work with @MongoDB to make it easier, faster, and safer for developers to build #GenAI applications. By leveraging our highly curated and optimized open-source models and MongoDB, developers can now build faster, lower their TCO, and improve quality. Learn more about
@FireworksAI_HQ
Fireworks AI
6 months
Mixtral MoE 8x22b is now available on Fireworks! Try out the base model here and look out for an instruct model soon!
@FireworksAI_HQ
Fireworks AI
1 year
🎇 Fireworks Generative AI Platform 🔝 Top OSS models behind a stable API 🐎 Optimized performance with low cost 🥠 Fine-tuned models for specific use cases 🦜 Native LangChain integration 💸 Try it for free Incredibly excited to share this initial release. More details in 🧵
@FireworksAI_HQ
Fireworks AI
7 months
We’re now serving Gemma 7B Instruct on the Fireworks platform! Try out Google’s latest model on Fireworks to enjoy fast inference speeds, token-based pricing, and an OpenAI-compatible, user-friendly API. Get started on our playground at or through our API
@FireworksAI_HQ
Fireworks AI
10 months
@MistralAI You can try it live on Fireworks. We did our best to reverse-engineer the implementation and will update it once the official model code is out.
@FireworksAI_HQ
Fireworks AI
7 months
Reduced pricing: we’ve switched from separate pricing for input and output tokens to one flat price. We estimate that pricing should be ~20% cheaper for the median user, and all queries will be cheaper except those with an input:output token ratio greater than ~10:1.
@FireworksAI_HQ
Fireworks AI
5 months
Llama 3 8B and 70B are now available for fine-tuning. Fireworks lets you deploy 100 fine-tuned models for fast, serverless inference at no extra cost! Get started here 👇
@FireworksAI_HQ
Fireworks AI
1 month
👀 Check out the latest RAG tutorial using Llama 3.1, built with @astrodotbuild, @SurrealDB and @FastAPI, where you can update the embeddings live and retrieve and add context via the Llama 3.1 405B model
@FireworksAI_HQ
Fireworks AI
4 months
Excited to announce that our SDXL, Playground v2.5 and Segmind SSD image generation models are faster and less expensive! 💨 0.8 seconds to generate a 30-step image in our tests. Independently benchmarked to be the fastest! 💰 Prices reduced ~35% from $0.006 to $0.0039 for a
@ArtificialAnlys
Artificial Analysis
4 months
Congratulations @FireworksAI_HQ on improving the speed of their Text to Image model APIs! Fireworks has reduced generation time by ~40%, from ~2.8s to ~1.7s for Playground v2.5 and ~1.9s to ~1.2s for SDXL. Fireworks has also reduced prices ~35%, positioning it amongst the
@FireworksAI_HQ
Fireworks AI
2 months
Thrilled to see Fireworks on the Forbes Next Billion-Dollar Startups list! No surprise that AI companies are leading the charge—AI is reshaping industries, and we're at the forefront. Proud of our unstoppable team, but we're just getting started. Join us and help shape the future
@Forbes
Forbes
2 months
#BillionDollarStartups : Artificial intelligence dominates this year’s list of 25 venture-backed startups we think most likely to reach a $1 billion valuation.
@FireworksAI_HQ
Fireworks AI
2 months
Looking to run inference on state-of-the-art infrastructure? Fireworks AI is the first to offer Llama 3.1 inference using both Nvidia and AMD GPUs. We’re committed to providing the best hardware for unmatched performance and cost efficiency. With Nvidia H100 and AMD Instinct
@KarimBhalwani
Karim
2 months
Thrilled to share that @Meta 's Llama 3.1 family of models, including 8B, 70B and 405B, runs seamlessly on @AMD 's AI GPUs, empowering pioneers like @FireworksAI_HQ to offer one of the fastest and most efficient inference engines from the start. We are grateful for the opportunity
@FireworksAI_HQ
Fireworks AI
6 months
We’re getting a new look! Check out our new logo and color scheme. Keep an eye out for more changes at !
@FireworksAI_HQ
Fireworks AI
5 months
🚨 New Blog Alert 🚨 Find out how a group of Gen AI enthusiasts used @FireworksAI_HQ at the Mistral SF Hackathon to make an LLM play DOOM, a video game created in 1993 that has gained cult status among hackers! Special thanks to Bhav Ashok ( @SammieAtman ), @hopingtocode , and Paul
@FireworksAI_HQ
Fireworks AI
3 months
Join us next week at CampFire Connect to discover the latest from @FireworksAI_HQ , see new demos, and explore compound AI systems with Fireworks AI CEO @lqiao and @LangChainAI CEO @hwchase17 . 📅 11th July, 10 AM PST 📌 Online RSVP now  👇
@FireworksAI_HQ
Fireworks AI
7 months
Building function-calling apps with open-source models has never been easier! We’ve built an example function calling app with FireFunction-v1 for generating images and getting/plotting stock prices. Try it here or build off of our open-source code here
@FireworksAI_HQ
Fireworks AI
7 months
Excited to announce the Firefunction Playground in our redesigned UI! It's never been faster to get started with open-source function calling. Add one of our example functions or your own function to see how FireFunction-v1 will make decisions. Get code with your functions and
@FireworksAI_HQ
Fireworks AI
10 months
The official Mixtral 8x7b instruct model released earlier today is now live on Fireworks. Give it a try at !
@FireworksAI_HQ
Fireworks AI
10 months
It was a wild day! @MistralAI dropped just the model weights as a torrent this morning. Together with the community, we reverse-engineered the architecture from the parameter names! The model went live on our inference platform just a few hours later, before the official code release!
@FireworksAI_HQ
Fireworks AI
1 year
Exciting preview of what's to come! Stay tuned for more details about what we're building at @thefireworksai
@dzhulgakov
Dmytro Dzhulgakov
1 year
@marktenenholtz @huggingface And TGI doesn't run Multi Query Attention yet, just broadcasts. Falcon is unique among open models to have MQA today btw. With special optimizations for MQA the numbers look even better for the inference service we're building at
@FireworksAI_HQ
Fireworks AI
9 months
🔥 Fire up inference with Mixtral + LoRA on Fireworks platform!
@FireworksAI_HQ
Fireworks AI
6 months
DBRX Instruct ( @DbrxMosaicAI ) is now available for serverless inference! DBRX is hosted as an experimental model and will be hosted serverless at least through April 2024.
@FireworksAI_HQ
Fireworks AI
4 months
Are you a developer who loves the Fireworks platform? We’re hiring a developer advocate! You’d be writing guides/content, shepherding our developer community, hosting live events and influencing the direction of our developer products. Apply below or email kanika@fireworks.ai
@FireworksAI_HQ
Fireworks AI
1 month
🔥 Join Fireworks AI and Shape the Future of AI! At Fireworks AI, we’re building the fastest, most scalable AI infrastructure, backed by top investors like Benchmark and Sequoia. Our team of industry veterans from Meta's PyTorch, Google, and more, is on a mission to empower AI
@FireworksAI_HQ
Fireworks AI
5 months
> We've worked with @FireworksAI_HQ to deploy our fast-apply model with strong speculative edits support. They have a fantastic inference engine and built out api support for our custom speculation logic. Glad to play a part in bringing a better experience to @cursor_ai
@amanrsanger
Aman Sanger
5 months
And, speculative edits give us a 13x speedup for this task over vanilla generation with llama-3-70b. (5/7)
@FireworksAI_HQ
Fireworks AI
10 months
@MistralAI And we just made Mixtral 8x7Bi faster. More speed-ups are on the way
@FireworksAI_HQ
Fireworks AI
10 months
Llama Guard 7B is also now available at Fireworks, hot off the press!
@FireworksAI_HQ
Fireworks AI
2 months
We're collaborating with @helicone_ai to bring LLM observability features to Fireworks users! Now you can track costs, usage, time to first token, and other metrics to optimize your AI apps. To get started:
@FireworksAI_HQ
Fireworks AI
11 months
New in Fireworks Image Generation: SSD-1B, image2image, ControlNet, and more! Read our blog: Take your image-generation apps to the next level with the following new image generation features on our fast inference platform:
@FireworksAI_HQ
Fireworks AI
4 months
Incredibly excited to announce the SD3-Medium API powered by @FireworksAI_HQ . Access the newest state-of-the-art image model from @StabilityAI with unprecedented sub-second latency from our model optimizations
@StabilityAI
Stability AI
4 months
Today, we’re thrilled to announce the open weights for Stable Diffusion 3 Medium, the latest and most advanced text-to-image AI model in our Stable Diffusion 3 series! This new release represents a major milestone in the evolution of generative AI and continues our commitment to
@FireworksAI_HQ
Fireworks AI
5 months
“Programming is the art of telling another human being what one wants the computer to do. We should continually strive to transform every art into a science: in the process, we advance the art” - Donald Knuth in "The Art of Computer Programming" Today, we are heading into a leap forward
@FireworksAI_HQ
Fireworks AI
5 months
> It's cost, speed and quality! Thank you, @beyang and @sourcegraph , for a great customer testimonial. @FireworksAI_HQ is excited to be the backbone for delivering several Gen AI experiences at low latency and optimal cost without compromising on the quality of responses.
@FireworksAI_HQ
Fireworks AI
3 months
Excited to be co-hosting an Agents and Compound AI Hackathon! Join us in SF on August 11 to build AI systems that can utilize multiple models, tools and knowledge bases!
@LangChainAI
LangChain
3 months
💻 Join us for an Agents and Compound AI Hackathon in San Francisco on Sunday, August 11th, hosted by @FireworksAI_HQ , @FactoryAI , and @LangChainAI . Apply here ➡ What exactly defines an agent? An agent is a system that uses an LLM to determine the
@FireworksAI_HQ
Fireworks AI
7 months
We’re now serving Playground v2.5 text-to-image model on Fireworks! To our knowledge, we’re the fastest available Playground v2.5 provider, with inference speeds of ~1.2 seconds for a 1024 x 1024 image. Playground v2.5 (from @playground_ai ) offers dramatically improved quality
@FireworksAI_HQ
Fireworks AI
9 months
We’re partnering with @awscloud and @MongoDB to sponsor a DevOps for GenAI Hackathon on Wednesday, January 24th in NYC! Join us to learn more about building innovative applications with techniques like RAG and function-calling! RSVP at for more details
@FireworksAI_HQ
Fireworks AI
6 months
Awesome usage of Fireworks' new fine-tuning service!
@DynamicWebPaige
👩‍💻 Paige Bailey
6 months
🔫 Badass! A team at the @MistralAI hackathon in SF trained the 7B open-source model to play DOOM, based on an ASCII representation of the current frame in the game. 🤯 @ID_AA_Carmack
@FireworksAI_HQ
Fireworks AI
10 months
We are proud to sponsor #NeurIPS2023 ! Our team will be at booth 802 from Monday (12/11) to Thursday (12/14) in New Orleans. We hope to see you there!
@FireworksAI_HQ
Fireworks AI
6 months
Awesome application of FireFunction-v1 and Fireworks' Mixtral model: automatically generated pull request descriptions!
@vladblagoje
Vladimir Blagojevic
6 months
When you’re not stuck writing glue code for every LLM tool integration, innovative ideas start to appear everywhere you look. 😉 Introducing two GitHub Actions based on Haystack 2.0: PR Auto and Reno Auto
@FireworksAI_HQ
Fireworks AI
3 months
Live now! Join us to learn about our latest features, watch new demos, and discuss Compound AI Systems.
@FireworksAI_HQ
Fireworks AI
7 months
Faster Mixtral speeds from our spring update are starting to register in benchmarks! Check out Mixtral on Fireworks for the fastest widely available speed, the best consistency and newly reduced pricing!
@ArtificialAnlys
Artificial Analysis
7 months
Fireworks AI has supercharged their Mixtral 8x7B offering impacting 3 critical metrics ‣ Optimized throughput and are now achieving up to 200 tokens/second, second only to Groq ‣ Reduced output token pricing to 1/3 of previous pricing. Now charging $0.5/M input & output
@FireworksAI_HQ
Fireworks AI
5 months
@mag_pl @GroqInc Of course, with total response times (time to first 100 tokens) of just 1 second and Latency (seconds to first chunk) of 0.26 seconds, our platform delivers unrivalled speed without sacrificing an ounce of quality! Excited to keep pushing the envelope on fast, cost-effective Gen
@FireworksAI_HQ
Fireworks AI
10 months
@rauchg @vercel It’s our pleasure serving this model for the Vercel playground.
@FireworksAI_HQ
Fireworks AI
11 months
Fireworks is SOC 2 Type II and HIPAA compliant! We are pleased to report that the Fireworks AI inference platform is both SOC 2 Type II and HIPAA compliant. Achieving both SOC 2 Type II and HIPAA compliance is a testament to our proactive approach to safety and data security.
@FireworksAI_HQ
Fireworks AI
10 months
We hypothesize that for some tokens, the gating scores of the 2nd and 3rd experts are almost identical in the original model. Quantization adds noise that causes the gating to pick the wrong expert. 2/5
@FireworksAI_HQ
Fireworks AI
7 months
Awesome application of FireFunction-v1! It's especially cool to see Vexa call image generation, text embedding and text generation through Fireworks!
@n4ze3m
Nazeem
7 months
Introducing Vexasearch v2-beta, a small side project built around Function Call. Vexa can generate images, search for information on the internet, and ask questions on specific URLs using the amazing @FireworksAI_HQ function call. Powered by @LangChainAI
@FireworksAI_HQ
Fireworks AI
1 year
Want to have fun creating stunning images with a fast and easy-to-use API? We've now made it easy to generate images using StableDiffusion XL via the Fireworks generative AI platform. You can now integrate the best image-generation capabilities into your applications.
@FireworksAI_HQ
Fireworks AI
10 months
Interestingly, running an extra expert is almost free because there’s room in per-expert batches. The speed and throughput of FP8 models (including Mixtral) are much higher than FP16. This makes FP8 a good performance-vs-accuracy trade-off. 4/5
@dzhulgakov
Dmytro Dzhulgakov
10 months
@MistralAI model is hot: with mixture-of-experts, like GPT-4! It promises faster speed and lower cost than a model of the same size and quality. Fascinatingly, the speed-up is unevenly distributed: running on a laptop or the biggest GPU server benefits the most. Here’s why 🧵 1/7