virat @virattt Twitter profile | Pikagi

virat

@virattt

12,968

Followers

84

Following

501

Media

5,528

Statuses

playing with gen ai + finance

New York, NY

https://t.co/ZTELBmcX06

Joined April 2010

Pinned Tweet

@virattt

virat

2 months

I built a stock market API in 42 days. All during @_buildspace s5. What is the API? Today, it lets you pull financials for 16,000 tickers going back 30+ years. Why did I build the API? Because the big providers have: • poor API design • poor documentation • expensive sales

30

59

608

@virattt

virat

3 months

It's happening. Last night, I started downloading financial data from the SEC. • income statements • balance sheets • cash flow statements 10,000+ public companies. 3 million rows in total. I’m using multiple worker nodes to pull, parse, and clean the data. The

105

145

2K

@virattt

virat

2 months

It's live. I launched my stock market API for AI financial agents today. • 16,857 tickers • 30+ years of fundamental data Everything is usage-based, meaning no contracts or subscriptions. We have 3 endpoints today: • GET income statements • GET balance sheets • GET cash

49

74

931

@virattt

virat

2 months

I’ve been building in public for 1 year now. Last year, I promised to build and tweet every day for a year. Today, I fulfilled that promise. Year 1 review: • grew twitter from 110 to 11.7K • built 400 member discord community • got 350 stars on a github project • earned

49

18

741

@virattt

virat

4 months

I just trained a 124M param LLM from scratch. It took ~26 seconds in @GoogleColab Training details: • 5145 tokens in training set • 1024 tokens in context window • 256 tokens per batch • 10 epochs total The model went from generating gibberish to full sentences. Extremely

21

128

871

@virattt

virat

1 year

I've seen multiple questions about how to build a Chatbot that: • Retrieves data from PDFs and • Has conversational memory Turns out, it's really simple to do with @LangChainAI . So, I wrote a quick tutorial with a real-world example for you all. Code below.

27

91

772

@virattt

virat

3 months

I finally looked at the Apple Intelligence architecture today. Main thing that stands out is the orchestration of multiple models. Especially between on-device and server models. 1 • Small LM (SLM) that is 3B param and runs on your device. Vocab size of 49K. 2 • Large LM

15

133

767

@virattt

virat

3 months

It's going live. My stock market API now has coverage for all S&P 500 tickers. • income statements • balance sheets • cash flow statements 30+ years of data. No API limits. You can connect your AI financial agents to this data. This open beta will run for ~1 week. Main

36

60

764

@virattt

virat

4 months

I finally understand how GPT generates text. Really helps to code it from scratch in Python. There are 5 components: • token embeddings • positional embeddings • transformer blocks • layer normalization • output head It sounds complex, but grokking GPT is simple. Token

8

130

736

@virattt

virat

6 months

Fine-tuning a Warren Buffett LLM 🧠 I started this workstream today. Overall goal: fine-tune an LLM to analyze companies like Mr. Buffett does. My initial setup: • use mistral 7b instruct • use single gpu in colab • use QLoRA for fast fine-tune • use small dataset to prove

25

93

737

@virattt

virat

6 months

It’s finally Friday. Time for another LLM cost vs. performance showdown. The results from today’s tests indicate an emergence of 3 distinct LLM tiers: • throughput tier • workhorse tier • intelligence tier Throughput tier: Unreal tokens / sec. Only groq mistral 8x7b at the

19

122

683

@virattt

virat

7 months

Open Source SEC Filing Reader 📊 A cool and exciting update today. I finally extracted income statements from a 10-K using Mistral-7B. The output was cleanly JSON formatted. High-level implementation: • download and chunk SEC filing • store chunks in a vector db •

34

78

672

@virattt

virat

3 months

I finetuned an LLM for spam detection. It reuses gpt-2 weights and has 124M params. Total times: • finetuning took 59 seconds • predictions took 0.02 seconds Everything was done in @GoogleColab for free. Since it's a tiny model, finetuning and inference are extremely fast.

25

86

662

@virattt

virat

1 year

Alright, I finally understand @LangChainAI agents and tools. I can now create a custom: 1. Tool that "reads" annual reports 2. Agent that answers queries via the tool For my example, I am using $META's 2022 annual report. Code is below. Happy learning 🙂

14

57

657

@virattt

virat

1 month

I fixed up our Warren Buffett financial agent. • added few-shot prompting • use sonnet 3.5 as main LLM This increased answer correctness from 53.3% to 86.6% Small tweaks, big gains.

15

86

657

@virattt

virat

5 months

Llama 3 crushed my financial metrics tests I tested both the 70B and 8B models. Both aced the metric calculation tasks. The results from today’s tests indicate an emergence of 4 distinct LLM tiers: • throughput tier • workhorse tier • intelligence tier • groq tier Groq

14

129

628

@virattt

virat

6 months

My current RAG stack 🥞 • cohere embeddings • cohere command r+ tool calling • cohere rerank 3 • weaviate vector db • opus final output layer Why @cohere : intuitive api, fast inference with high quality for cheap. Why @weaviate_io : easy setup, solid retrieval, helpful

33

53

628

@virattt

virat

1 month

I am building a financial agent from scratch. Inspired by my investing hero, Warren Buffett. All of my code will be open source. Current tools: • get financials • calculate owner earnings • calculate intrinsic value • calculate ROE, ROIC, etc. I am creating the

27

76

622

@virattt

virat

6 months

LLM Pricing vs. Speed 💰 I ran experiments comparing inference cost vs. speed Task: Text Generation. Given Item 1 (Business) from latest 10-K, explain Nvidia's business model. Experiment setup: • 10 runs per model • 1000 max output tokens • calculate cost per run •

15

113

487

@virattt

virat

4 months

I finetuned my first LLM today. It was super easy using the code from @DeepLearningAI My initial setup: • 70M param model • 900 row dataset from SEC filings • 1000 training steps • run on google colab This was a test run to see how it all works. My finetuned LLM is far

16

49

495

@virattt

virat

5 months

Llama 3 on @GroqInc is incredible The 70b model beat opus on my financial RAG tests. Llama 3 RAG results: • speed: 2.59s • correctness: 81.33% This is the highest score I have seen on financial RAG. • 7 secs faster than opus • 4% more correct than opus With insane

15

59

488

@virattt

virat

5 months

Can LLMs understand long documents? A microsoft team just tackled this question. To answer, they fine-tuned Mistral-7B-Instruct with a synthetic dataset. How they created dataset: • used realnewslike corpus • split corpus by 128 token segments • used gpt-4 to generate

8

71

481

@virattt

virat

2 years

Playing around with @LangChainAI this morning. It's a step-function change in how we'll "fine tune" ChatGPT moving forward. To test it out, I uploaded Airbnb's latest 10K via LangChain's document uploader. Then, I asked questions about the report

12

40

474

@virattt

virat

8 months

Faster RAG re-ranking with ColBERT After re-ranking using GPT-4 yesterday, I tested out ColBERT for re-ranking today. Test: • Re-ranking Airbnb's 10-K, like before. Results: • ColBERT and GPT-4 were identical in ranking quality However, ColBERT was lightning-fast.

8

66

464

@virattt

virat

3 months

If you are looking to understand: • how to build an LLM • how to do pretraining • how to do finetuning …all from scratch, then @rasbt book is the best resource I have found. Each chapter is hands-on and written in an easy-to-follow style. Truly a masterpiece on technical

@rasbt

Sebastian Raschka

3 months

If you are looking for something to read this weekend, I am happy to share that Chapter 7 on instruction finetuning LLMs is now finally live on the Manning website: This is the longest chapter in the book and takes a from-scratch approach to implementing

22

271

2K

5

80

443

@virattt

virat

4 months

I’m building an open source financial agent for fun. Goal is to explore generative UI. Under the hood: • uses @LangChainAI agent + tools • uses @vercel ai sdk • uses @polygon_io financials The code is live on github. Thanks to the great @SullyOmarr for the starter code.

13

34

387

@virattt

virat

5 months

I just migrated my financial RAG evals to LangSmith. Previously, I was doing evals by hand. Now, LangSmith takes care of: • managing datasets • evaluating correctness • measuring latency • visualizing prediction vs. answer These features come built-in. My financial RAG

16

50

426

@virattt

virat

7 months

I am blown away by RAGAS With 10 lines of code, I created a question + answer dataset of Airbnb's latest annual report (10-K). The dataset has 3 parts: • questions • contexts • ground truth answers Next step: Evaluate how well various LLMs perform RAG on financial

12

62

425

@virattt

virat

6 months

Can LLMs have infinite context? Researchers from Google say yes. A new paper proposed Infini-attention, which lets LLMs have infinite context. How Infini-attention works: • has local attention like any transformer • has global attention via compression • combines local and

21

56

417

@virattt

virat

7 months

Reading SEC filings with Instructor Can an LLM read an SEC filing and output structured data? Yes and it is really easy with Instructor. Entire Setup: • store 10-K in vector DB • define @pydantic model • pass 10-K and model to instructor • call LLM using instructor With

11

45

417

@virattt

virat

3 months

The beta version of my stock market API is live. To start, you can fetch: • income statements • balance sheets • cash flow statements In the beta, you can access up to 10 tickers. Once I set up auth and API keys, I'll expand coverage to 10,000+ stocks. All data goes back

35

31

417

@virattt

virat

3 months

I finetuned a 124M param LLM for sentiment classification today. Given financial news article, detect if sentiment is positive or negative. Model setup: • gpt-2 pretrained weights • 12 transformer blocks • 124M trainable params • 1024 context window I only used 1,208 rows

32

47

414

@virattt

virat

5 months

Exploring LLM Pricing 💰 I updated my table to include llama 3. The table now has 4 pricing tiers: • tier 1 starts at $0.25 • tier 2 starts at $12.00 • tier 3 starts at $24.00 • tier 4 starts at $42.00 To get "total cost", I combine input cost and output cost. The

21

71

404

@virattt

virat

6 months

Open source financial agent 🤖 The github repo is live. You can now run the agent in your browser via LangServe. Things the agent can do: • Get prices for stocks • Get financials for stocks • Get market news for stocks I will be adding more features to the repo over the

12

50

401

@virattt

virat

7 months

Exploring LLM Pricing With so many new LLMs, how do API costs compare? I delved into cost comparisons of models that I would use in production. Main takeaways: • cohere leads with cost-effective model • gpt-3.5 remains excellent value • mistral cost higher than anticipated

23

74

397

@virattt

virat

3 months

I launched API docs for my stock market API today. If you are building: • ai financial agents • stock analysis tools • quant trading models ..then this API is for you. The API offers: • income statements • balance sheets • cash flow statements You can actually call the

33

28

389

@virattt

virat

8 months

Cohere reranking is seriously good Today, I expanded the RAG reranking tests that I'm running to include Cohere. Overall Test: • Reranking Airbnb's 10-K, as before Reranking Speeds: • 0.24 secs for Cohere • 1.04 secs for ColBERT • 25.47 secs for GPT-4 Turbo • 50.94

19

60

388

@virattt

virat

8 months

Corrective RAG (CRAG) What happens when the RAG retrieval step performs poorly? A recent paper proposed CRAG, which improves the robustness of RAG systems. CRAG uses T5 to calculate the relevance score of retrieved documents. Relevance scores: • Correct • Incorrect •

8

77

387

@virattt

virat

9 months

Perplexity AI's team is brilliant. I've been impressed with how fast and helpful their Copilot is. The Copilot is fast because Perplexity was using a fine-tuned GPT-3.5 as of Aug 2023. The Copilot is helpful because it's constantly fine-tuned and has real-world knowledge. In

13

31

375

@virattt

virat

6 months

Friday is LLM battle day. I added DBRX to the financial metrics challenge. Overall, very impressed with DBRX. Main takeaways: • correctly calculated metrics • ranked top 4 fastest models • competitive pricing DBRX was +50% cheaper and +100% faster than models in its tier.

14

76

375

@virattt

virat

3 months

I am exploring LLM finetuning this week Two approaches I'm interested in: 1 • finetuning the entire LLM 2 • finetuning only part of the LLM Finetuning the entire LLM is exactly what you think. You update all of the weights in the transformer blocks. Finetuning only part of

11

46

374

@virattt

virat

9 months

Earlier today, @LangChainAI announced LangGraph. LangGraph lets us build language agents as graphs. The interface is pretty clean. And I just used LangGraph to build a financial agent graph. My graph has two tools: • extract ticker from user query • get latest price for the

10

49

373

@virattt

virat

1 year

This morning, I spent some more time playing around with @LangChainAI and @pinecone . This time, I did question answering over Airbnb's last 3 annual reports (PDFs). • Less than 50 lines of code • All in Python • Code linked below Exploration was inspired by @mayowaoshin

7

30

362

@virattt

virat

7 months

Exploring LLM Pricing New models have come out since I last shared my pricing table. New models: • command-r (cohere) • mixtral 8x7B (groq) • claude 3 (anthropic) Main takeaways: • cohere offers excellent value for cost • groq mixtral cheaper than mistral mixtral • opus

12

61

365

@virattt

virat

4 months

I am finetuning llama 3 (8b) on SEC filings Goal: ace financial Q&A and launch in production. Step 1 is data collection: • pick a ticker (eg $NVDA) • grab its SEC filings • generate question + answer dataset • upload dataset to huggingface for reuse So far, I have created

14

54

359

@virattt

virat

6 months

I never enjoyed parsing SEC filings for LLM use. Now, I never need to manually parse them again. I just came across edgartools, which is an open source library for easily accessing SEC EDGAR. With edgartools, we can: • query filings for a ticker • extract items from filings

9

36

342

@virattt

virat

5 months

I found a new tool calling champion Llama3 70b on @GroqInc Challenge: given user query, extract financial quarters and years. Example: "How did revenue change between Q4 2023 and year before that?" The 70b model: • passed the task • was very fast • had best pricing I

22

34

334

@virattt

virat

4 months

I'm training a small LLM this weekend. Found some cool Llama 2 facts while researching. Time to train in GPU hours: • 7B param took 184,320 GPU hours • 13B param took 368,640 GPU hours • 34B param took 1,038,336 GPU hours • 70B param took 1,720,320 GPU hours How about in

11

59

329

@virattt

virat

8 months

Cohere is excellent at query rewriting Today, I added Cohere and Mistral to my RAG query rewriting explorations. Four models in total: • Cohere • GPT-3.5 • GPT-4 • mistral-medium The input query was ambiguous: "What's up with Airbnb's numbers"? Rewritten query results

12

41

319

@virattt

virat

6 months

Open Source SEC Filing Reader 📊 The code is now more production-ready. Implementation updates: • used XBRL instead of raw 10-K text • used gpt-4 for better extractions I first downloaded XBRL financial data from EDGAR. Then extracted financial statements from XBRL using

12

34

310

@virattt

virat

2 months

I am rebuilding my AI financial agent. Fully open source. Runnable locally. Today's change adds stock price charts. How it works: • ask a question • agent selects best tool • agent renders UI components • agent answers question @LangChainAI is perfect for this project:

24

29

315

@virattt

virat

7 months

Financial RAG Evaluation 🕵️ Which LLM can answer financial questions quickly and correctly in a RAG pipeline? This morning, I compared 3 models: • haiku (anthropic) • gpt-3.5 turbo (openai) • command-r (cohere) Overall, haiku was fastest while command-r was most correct.

15

58

309

@virattt

virat

6 months

Introducing financial-datasets In 5 lines of code, generate datasets from SEC filings. The financial datasets are useful for: • LLM evaluation • LLM fine-tuning • and more The repo is live and fully open source. pip install financial-datasets to get started.

5

35

301

@virattt

virat

6 months

My financial RAG dataset is live 🗃️ You all asked me to share the 100 question dataset. So, here it is. Dataset details: • 100 questions on Airbnb 2023 10-K • synthetically generated via opus The dataset is tiny right now, but I will continue expanding it. Eventually, I

10

30

296

@virattt

virat

6 months

Open Source SEC Filing Reader 📊 Another fun update today. I used mistral-7b to extract all financial statements from a 10-K. Extracted statements: • income statement • balance sheet • cash flow statement This was really cool since mistral-7b is free and open source. I

12

38

276

@virattt

virat

1 year

So, I'm learning how to build LLM-powered chat apps that are more production-ready, from scratch. I've created an open-source repo that contains my ongoing explorations. The stack: • Django backend • React frontend • @LangChainAI agents • Websocket protocol GitHub below.

15

30

284

@virattt

virat

5 months

I am diving into LLM fine-tuning. There is a lack of deep tech content on fine-tuning: • how it works • why it works • what it does to an LLM, etc. There is a ton of high-level stuff, however. I want to grok the first principles of fine-tuning. If you have an excellent

18

24

287

@virattt

virat

5 months

LLM Pricing Tiers 💰 I just updated my pricing table. The table now indicates 3 pricing tiers: • tier 1 is $1 to $7 • tier 2 is $12 to $24 • tier 3 is $42 to $120 To calculate "total cost", I combined input cost and output cost. I use a 3:1 ratio, assuming there are 3

7

46

284

@virattt

virat

3 months

I rewrote my open source + gen ui financial agent today. The new stack: • python backend • nextjs frontend All powered by LangGraph from @LangChainAI . You can run the full-stack application locally on your machine. So far, I've added a tool for charting stock prices. Next

14

32

276

@virattt

virat

6 months

Open source financial agent 🤖 Our agent can now do basic valuation. I added a tool for calculating intrinsic value via discounted cash flow analysis. We now have 5 tools: • get intrinsic value for ticker • get latest price for ticker • get latest news for ticker • get

16

35

279

@virattt

virat

6 months

LLM Pricing + Speed + Quality 💰 I ran tests comparing inference cost, quality, and speed today. Task: Financial Metrics Calculation. Given JSON of financial statements, calculate financial metrics. Experiment setup: • 3 financial calculations • 10 iterations per model •

5

39

276

@virattt

virat

8 months

I have found my RAG rerankers I spent the past few days testing rerankers and there are two that I'll use going forward. • Cohere • ColBERT Both performed as well as GPT-4 in reranking quality and are lightning fast. Cohere's avg inference time was ~200ms. It's

18

29

274

@virattt

virat

5 months

I added arena elo to my LLM pricing table The score is pulled from @huggingface Initial takeaways: • llama 3 70b is game changing • haiku remains excellent value • gemini 1.5 pro is exceptional • gpt-4 turbo reigns supreme My table is sorted by arena elo, desc. Happy to

15

39

270

@virattt

virat

6 months

Financial RAG eval just got spicier 🌶️ Cohere launched Command R+ today. I tested the LLM and it scored a 70.12% on financial RAG. That is the highest score I have seen on this eval to date. Excellent work from the @cohere team. Command R+ has over 100B parameters, so

10

37

270

@virattt

virat

7 months

Open source financial agent 🤖 I just added a new tool to @LangChainAI and am super excited for it. This tool fetches daily prices for a ticker. We now have 4 tools in total: • get latest price • get latest news • get financials • get historical prices The possibilities

12

33

255

@virattt

virat

8 months

ColBERT reranking continues to impress Previously, I compared reranking speeds of ColBERT and GPT-4 Turbo. Today, I added mistral-medium to the mix. Overall Test: • Reranking Airbnb's 10-K, like before. Reranking Speeds: • 1.04 secs for ColBERT • 25.47 secs for GPT-4 Turbo

10

44

264

@virattt

virat

6 months

I am fine-tuning my Warren Buffett LLM The toughest part is creating datasets. Not anymore. In 1 line of code, financial-datasets now creates datasets from Buffett’s letters. This works for any PDF. How to use: • set PDF url • set max questions • generate dataset I can

8

34

250

@virattt

virat

8 months

Exploring Corrective RAG in code A few days ago, @LangChainAI released an excellent cookbook on implementing CRAG. I reused the cookbook to implement a simple financial assistant. My setup: • use vector db for SEC filings • use Tavily for web search To test the CRAG flow, I

4

48

253

@virattt

virat

7 months

My open source financial agent 🤖 This is a new side project that I'm building for fun. It'll begin in colab, so you can run all of my code as I implement it. Two tools to start: • latest price for ticker • latest news for ticker Right now, it can answer: "What is the

17

36

249

@virattt

virat

3 months

I finally read up on LoRA last night. LoRA can reduce finetuning params by 10,000 times. High-level implementation: • we freeze original LLM weights • we create small, low-rank matrices • we only train small matrices • we adjust LLM output w/ small matrices Instead of

5

43

252
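
The four LoRA steps listed in the tweet above can be sketched in plain Python. This is a rough illustration only, assuming a single linear layer with frozen weight matrix `W` and two small trainable matrices `A` and `B` of rank `rank`; the helper names (`matvec`, `lora_forward`, `param_counts`) and the layer sizes are made up for the example and do not come from the author's code. The per-layer ratio computed here is illustrative and is not the source of the 10,000x figure quoted in the tweet.

```python
def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Frozen layer output plus the low-rank correction.

    W: d_out x d_in, frozen original weights (never updated)
    A: rank x d_in,  small trainable matrix
    B: d_out x rank, small trainable matrix
    """
    base = matvec(W, x)               # output of the frozen layer
    update = matvec(B, matvec(A, x))  # low-rank adjustment B(Ax)
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_in, d_out, rank):
    """Trainable parameters: full finetuning vs. LoRA for one layer."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical sizes, chosen only to show the arithmetic.
full, lora = param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full: {full:,} trainable params, LoRA: {lora:,} trainable params")
```

With `B` initialised to zeros, `lora_forward` returns exactly the frozen layer's output, so training starts from the original model's behaviour.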

@virattt

virat

10 days

@sakethkotamraju Missed opportunity to call it Scammers will love this

1

3

251

@virattt

virat

7 months

Cohere's command-r is solid. The model launched today and is optimized for RAG. I ran it through my financial RAG evaluation pipeline vs. gpt-3.5 turbo. command-r won. Financial RAG Eval Setup: • naive RAG (no reranking, etc.) • 100 questions on Airbnb's 2023 10-K •

7

40

250

@virattt

virat

5 months

I studied word embeddings today. Mainly, how LLMs like GPT-4 convert input text into input embeddings. It’s simpler than I expected. There are five key steps: 1. Convert input text to input tokens. 2. Map tokens to token IDs. Common vocab size is ~50K tokens. 3. Create

6

27

247

@virattt

virat

7 months

Open source financial agent update 🤖 I just added a tool that lets us retrieve financials. Our agent can now get: • income statements • balance sheets • cash flow statements We can ask questions like: "What is $ABNB's latest net income?" Upcoming tools: • get

7

32

245

@virattt

virat

6 months

Cmd R+ beats Sonnet at financial RAG I initially assumed these models were equivalent due to pricing. However, command r+ was both faster and 5% more correct than Claude Sonnet on financial RAG evals. Financial RAG pipeline: • openai embeddings • cosine similarity retrieval

10

37

244

@virattt

virat

6 months

Financial RAG Evaluation 🕵️ I added reranking to the pipeline today. As expected, command-r performed even better. Main takeaways: • command-r excels at RAG • cohere reranking is seriously fast • gpt-3.5 slow at reranking, fine without Experiment setup: • included

9

31

245

@virattt

virat

1 year

@burrytracker “This makes up 93% of his portfolio.” It actually doesn’t. His 13-F holdings, which are what you’re looking at, don’t contain his cash and non-US positions.

12

1

235

@virattt

virat

3 months

Our OSS financial agent now has a python backend. I migrated the agent code to use the latest @LangChainAI gen ui framework • set up langgraph • set up langserve • set up fastapi • set up agent tools Super excited about this project. We'll have a true client + server app

8

28

239

@virattt

virat

3 months

My stock market API landing page is live. Initial focus is fundamentals data. • starting with 10,000 stocks • optimized for LLMs and AI agents • no subscriptions or contracts • simple and clean API The waitlist is now live 🙏 I am setting aggressive goals for myself.

25

21

237

@virattt

virat

4 months

My fine-tuning journey begins today I am training llama 3 8b to create high quality datasets for financial Q&A. Fine-tuning approach: • create high quality datasets via gpt-4o • fine-tune llama 3 on datasets • evaluate performance I am using my financial-datasets library to

13

21

236

@virattt

virat

7 months

Using an LLM to evaluate an LLM Yesterday, I shared initial thoughts on LLM evaluation. One method was LLM-as-judge. Turns out, there is an excellent paper on it from Jan 2024: "Leveraging Large Language Models for NLG Evaluation: A Survey" My 3 favorite techniques: •

14

41

231

@virattt

virat

6 months

I am pumped about today's update 🧪 In 1 line of code, financial-datasets lets you create Q&A datasets from a 10-K. Just specify: • ticker • year • max questions And financial-datasets takes care of the rest. No need to manually download, parse, and chunk SEC filings ever

6

24

226

@virattt

virat

6 months

Fine-tuning a Warren Buffett LLM 🧠 Exciting update today. I generated a question + answer dataset using Berkshire's 2023 annual letter. Dataset schema: • question • answer • context The synthetic dataset contains 110 generated questions. Next step is to generate

15

31

214

@virattt

virat

4 months

I’m learning how to build an LLM from scratch. Found some fun facts today. As we know, GPT-3 has 175B params. To train GPT-3 from scratch: • takes 355 years with single V100 • takes 665 years with single RTX 8000 The V100 is a data center GPU and would cost ~$4.6M. The

8

25

206

@virattt

virat

6 months

Financial RAG Evaluation🕵️ Haiku got lots of excitement yesterday. I ran it through my financial RAG eval pipeline today. Haiku was fast, but struggled on correctness versus similar models. Cmd-r remained financial RAG champ. Main takeaways: • haiku faster than gpt-3.5 •

7

34

202

@virattt

virat

1 year

Lots of excitement around BabyAGI and AutoGPT. Meanwhile, I’m still trying to understand how @LangChainAI Agents and Tools work on a deeper level. Creating a simple tutorial on Agents this weekend. I’m curious as to what use cases folks would find helpful in the tutorial.

21

8

198

@virattt

virat

4 months

Our open source financial agent can now show price charts for multiple stocks 📈 On the fly. Using generative UI. Only took 10 mins to add thanks to @LangChainAI tools and @vercel ai sdk. Current agent tools: • show price charts • show latest news • show current price One

14

24

204

@virattt

virat

5 months

Understanding LLM attention is tough. I will simplify how it works. The attention mechanism has 3 steps: 1 • compute attention scores 2 • compute attention weights 3 • compute context vectors Main goal of self-attention is step 3, computing context vectors. What are

5

37

198
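
The three attention steps named in the tweet above can be sketched in plain Python. This is a minimal illustration, assuming the simplest form of self-attention with no trainable weights: scores are dot products between token vectors, weights are a softmax over those scores, and each context vector is the weighted sum of all input vectors. The names `softmax` and `simple_self_attention` and the toy inputs are illustrative, not taken from the author's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def simple_self_attention(inputs):
    """Return one context vector per input token.

    inputs: list of token vectors (lists of floats, all the same length).
    """
    context = []
    for query in inputs:
        # Step 1: attention scores (dot product of this token with every token)
        scores = [sum(q * k for q, k in zip(query, key)) for key in inputs]
        # Step 2: attention weights (scores normalised to sum to 1)
        weights = softmax(scores)
        # Step 3: context vector (weighted sum of all token vectors)
        dim = len(query)
        context.append(
            [sum(w * vec[d] for w, vec in zip(weights, inputs)) for d in range(dim)]
        )
    return context


# Two toy 2-dimensional token vectors.
for row in simple_self_attention([[1.0, 0.0], [0.0, 1.0]]):
    print(row)
```

Each token attends most strongly to the vectors it is most similar to, which is why the first context vector leans toward the first input.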

@virattt

virat

5 months

How GPT-4o predicts the next token 🎭 I have covered: • simple attention (linked) • trainable attention (linked) Next is causal attention. Causal attention is a fancy term for masking future tokens. It builds on top of trainable attention. The main change is applying a

5

26

197
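
Masking future tokens, as described in the tweet above, can be sketched as follows. This is one common way to do it, assuming attention scores have already been computed as a square matrix where row `i` holds token `i`'s scores against every token; the function name `causal_attention_weights` and the sample scores are hypothetical. Positions after the current token are dropped before the softmax, so they end up with a weight of exactly zero.

```python
import math


def causal_attention_weights(scores):
    """Turn raw attention scores into causal attention weights.

    scores: square list-of-lists; scores[i][j] is token i's score for token j.
    Token i may only attend to tokens 0..i, so later positions get weight 0.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                  # current and earlier tokens only
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        masked = [0.0] * (len(row) - i - 1)     # future tokens are masked out
        weights.append([e / total for e in exps] + masked)
    return weights


sample_scores = [
    [1.0, 2.0, 3.0],
    [1.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
]
for row in causal_attention_weights(sample_scores):
    print(row)
```

The first token can only see itself, so its whole weight goes to position 0 regardless of the other scores in its row.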

@virattt

virat

5 months

I tried LangSmith evaluation for financial RAG today. Pleased to report it does a bunch of heavy lifting. • loading dataset • creating RAG pipeline • running evaluator My favorite is that eval results are automatically displayed in real-time UI. Before, I was tracking

6

26

193

@virattt

virat

3 months

Anthropic launched claude 3.5 sonnet today. In the release, agentic coding evals caught my attention. How agentic coding eval works: • claude reads an open source codebase • claude gets instruction (fix bug, etc.) • claude creates action plan • claude implements required

3

18

168

@virattt

virat

6 months

Open source financial agent 🤖 We are running on LangServe. This means that we can chat with the agent in our browser. Once I have access to Hosted LangServe, I will deploy the agent to production. Current agent tools: • get latest price for ticker • get latest news for

14

24

171

@virattt

virat

5 months

@gaganbiyani “The app is fairly useless for language learning” Disagree. Learning is what you make of it. Duolingo is a great starting point for grokking initial conversation + dialogue, which can very well lead to further language learning.

5

3

176

@virattt

virat

1 year

Code is here: I'm not using @OpenAI 's GPT-4 for this, but an older model. Please let me know if you have any feedback! My goal is to make learning about LLMs as accessible as possible for everyone 🙂

chatbot_memory_pdfs.ipynb

GitHub Gist: instantly share code, notes, and snippets.

gist.github.com

5

14

173

@virattt

virat

4 months

I just loaded pretrained GPT-2 weights into my own custom LLM. Why I think this is cool: • our model has a great starting point • we can run the model for free • we can finetune the model easily • we can customize the model architecture • we ultimately own the model By

4

26

170

@virattt

virat

6 months

I've been mesmerized by generative UI So, I decided to figure out how it works. Turns out, rendering agent output in UI components is easier than expected. Main steps: • define tools that your agent can use • map each tool to a UI component • maintain agent state (eg.

6

22

169

@virattt

virat

5 months

Query transformation via tool calling I am trying to create perfect queries for my vector DB. Challenge: given user query, extract financial quarters and years. Query: "How did revenue change between Q4 2023 and year before that?" • years: [2023, 2022] • quarters: [4, 4]

8

24

166
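
The query-transformation challenge in the tweet above could be set up roughly like this. The sketch assumes a JSON-schema style tool definition of the kind most tool-calling LLM APIs accept; the tool name `extract_periods`, its description, and the helper `parse_tool_arguments` are hypothetical and are not the author's implementation. No model is called here: the code only shows the schema and how the returned arguments might be validated.

```python
import json

# Hypothetical tool definition handed to the LLM.
EXTRACT_PERIODS_TOOL = {
    "name": "extract_periods",
    "description": "Extract the fiscal years and quarters mentioned in a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "years": {"type": "array", "items": {"type": "integer"}},
            "quarters": {
                "type": "array",
                "items": {"type": "integer", "minimum": 1, "maximum": 4},
            },
        },
        "required": ["years", "quarters"],
    },
}


def parse_tool_arguments(raw_json):
    """Validate the model's tool-call arguments and pair years with quarters."""
    args = json.loads(raw_json)
    years, quarters = args["years"], args["quarters"]
    if len(years) != len(quarters):
        raise ValueError("years and quarters must have the same length")
    if any(q not in (1, 2, 3, 4) for q in quarters):
        raise ValueError("quarters must be between 1 and 4")
    return list(zip(years, quarters))


# Arguments matching the example in the tweet.
periods = parse_tool_arguments('{"years": [2023, 2022], "quarters": [4, 4]}')
print(periods)
```

The resulting (year, quarter) pairs can then be used as metadata filters when querying a vector DB.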

@virattt

virat

8 months

Listwise Reranking with LLMs I came across this paper that proposes Listwise reranking of retrieved documents for RAG. Two reranking approaches: • pointwise reranking • listwise reranking Pointwise reranking Given list of documents, we feed query + each document individually

3

26

159

@virattt

virat

3 months

I added a classification head to my 124M param LLM today. Goal is to finetune the LLM for binary classification. Current architecture: • 1024 context window • 12 transformer blocks • 124M parameters • 1 output head for binary classification How does this differ from a

11

20

159

@virattt

virat

1 month

I trained a 1.5B param LLM on 10-Ks All from scratch. It took ~60 seconds on an A100. Training details: • 50,000 tokens in data set • 1600 embedding dimensions • 1024 context window • 48 transformer blocks • 25 attention heads • 10 epochs total I previously

5

19

155