Georges Harik Profile
Georges Harik

@gharik

3,886
Followers
3,096
Following
1
Media
274
Statuses

early google employee. worked on ai, gmail. like to invest and think about ai.

Joined May 2007
@gharik
Georges Harik
4 months
tesla fsd 12.3 is quite good
9
10
210
@gharik
Georges Harik
1 year
I’ve been training LMs and wanted to contribute to an open source LM. @vpj and I are releasing a 9 billion parameter LM with open licenses to be useful in a variety of settings. It’s been trained on 70 billion tokens and we will release checkpoints every 20b or so tokens.
@labmlai
labml.ai
1 year
We are open sourcing GeoV-9b, a 9 billion parameter causal language model designed and trained by @gharik 🖥 Code (apache 2.0): 📀 Model weights (bigscience-openrail-m): 📗 Google Colab notebook: 🧶👇
7
120
479
6
32
162
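For readers who want to try the released checkpoints, here is a minimal sketch of loading GeoV-9b with Hugging Face transformers. The hub id `GeoV/GeoV-9b` and the `trust_remote_code` requirement are assumptions based on the release, not details confirmed in the tweet itself.

```python
# Hedged sketch: load a GeoV-9b checkpoint and sample a continuation.
# The hub id "GeoV/GeoV-9b" and the remote-code requirement are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("GeoV/GeoV-9b")
model = AutoModelForCausalLM.from_pretrained("GeoV/GeoV-9b", trust_remote_code=True)

inputs = tokenizer("The release of open language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```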
@gharik
Georges Harik
1 year
Updated the GeoV-9b model to 98 billion tokens trained.
2
1
18
@gharik
Georges Harik
7 years
The first consequence of ai will be a rapid loss of jobs requiring motion, vision and simple speech recognition.
3
6
17
@gharik
Georges Harik
1 year
This Web LLM is pretty impressive. Got it to work on my MacBook M2 with Chrome Canary. Good quality and great speed using the GPU.
3
1
16
@gharik
Georges Harik
3 years
the gavel of time is swift
3
1
13
@gharik
Georges Harik
7 months
Bard is better now with Gemini Pro. Nice work.
1
0
11
@gharik
Georges Harik
9 months
This is a great team working on inference and other infrastructure to make it easier to launch intelligent applications and web sites.
@DeepInfra
DeepInfra
9 months
We just closed $8M seed round from A Capital Ventures, @felicis , @gharik , @svangel and others to scale our inference platform and continue to provide simple, low cost, production API to the top open AI models.
2
5
22
0
0
15
@gharik
Georges Harik
2 years
Happy 18th birthday @gmail @paultoo sanjeev and the whole team!
1
0
13
@gharik
Georges Harik
1 year
another open instruction dataset. Will start instruction tuning GeoV with this and the databricks and anthropic data soon. access to open data is awesome and these are great projects.
@vagabondjack
Mike Conover
1 year
The Open Assistant chat corpus just dropped, 100k-scale chat/instruction dataset from thousands of participants. The era of open data is upon us. High quality metadata, incl. toxicity scores, attached to each record.
13
134
651
0
1
12
@gharik
Georges Harik
2 years
Athelas is a great place to consider working at. Their company will move healthcare from reactive to proactive, from expensive to affordable and from a hospital to your home. Their vision is expansive, it started with using Vision ML, and will go much further.
@tanay_tandon
Tanay Tandon
2 years
Today, @dpcbod and I are pumped to announce the recent @athelas fundraise: $132mm across two consecutive rounds to build digital tools for healthcare providers. Honored to be partnered with @htaneja @Alfred_Lin @arjunsethi @garrytan @gharik @jhong and other incredible folks.
11
23
203
0
1
12
@gharik
Georges Harik
1 year
I should mention I have some initial indication that the RoPER technique is good for language modeling. On a couple of 3b parameter runs, up to 20000 steps of batch size 256, RoPER shows an advantage over standard RoPE encodings for attention, in terms of log likelihood.
0
1
12
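For context, RoPER is a variant of rotary position embeddings (RoPE). The RoPER variant itself isn't reproduced here; below is only a minimal sketch of the standard RoPE baseline mentioned above, in the common rotate-half formulation.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply standard rotary position embeddings to x of shape (seq, dim).

    Each feature pair is rotated by an angle that grows linearly with
    position, so attention dot products become relative-position aware.
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)   # (half,)
    angles = torch.arange(seq, dtype=x.dtype)[:, None] * freqs    # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Applied to queries and keys before attention; RoPER modifies this scheme
# to carry relative distances (not shown here).
```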
@gharik
Georges Harik
1 year
this seems pretty nice and a good license.
@vitaliychiley
Vitaliy Chiley
1 year
Our team at @MosaicML has been working on releasing something special: We're proud to announce that we are OPEN SOURCING a 7B LLM trained to 1T tokens The MPT model outperforms ALL other open source models! Code: Blog: 🧵
27
221
1K
0
1
11
@gharik
Georges Harik
1 year
Just got around to reading this and it seems like a good way to get better answers, using more calls to an LLM to simulate multi-agent and multi-round debate in order to reach consensus.
@ShuangL13799063
Shuang Li
1 year
Check our latest paper, "Improving Factuality and Reasoning in Language Models through Multiagent Debate" .
0
1
18
0
1
8
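A rough sketch of the multi-agent, multi-round debate loop as understood from the tweet above, not the paper's exact procedure. `ask_llm` is a hypothetical stand-in for whatever completion API you use, and the prompts are illustrative.

```python
# Hedged sketch of multi-agent, multi-round debate toward consensus.
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: plug in your own LLM client here.
    raise NotImplementedError

def debate(question: str, n_agents: int = 3, n_rounds: int = 2) -> list[str]:
    # Round 0: each agent answers independently.
    answers = [ask_llm(f"Q: {question}\nAnswer concisely.") for _ in range(n_agents)]
    # Later rounds: each agent revises its answer given the others' answers.
    for _ in range(n_rounds):
        answers = [
            ask_llm(
                f"Q: {question}\n"
                f"Other agents said: {[a for j, a in enumerate(answers) if j != i]}\n"
                f"Your previous answer: {answers[i]}\n"
                "Revise your answer, taking the other agents into account."
            )
            for i in range(n_agents)
        ]
    return answers  # ideally (close to) consensus by the final round
```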
@gharik
Georges Harik
1 year
I’ve been looking for more data to train on. This seems like it might help a lot.
@togethercompute
Together AI
1 year
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today! More in 🧵 …
38
408
2K
1
1
10
@gharik
Georges Harik
10 months
Looking forward to this great team improving healthcare for everyone!
@tanay_tandon
Tanay Tandon
10 months
Announcing the merger of Athelas and Commure, along with fresh capital in a fundraise. I’m excited to be taking over the combined $6b company as CEO, @dpcbod as COO, and @dhruvp as CTO. Working with @htaneja in this next phase will be exhilarating
31
26
309
0
0
8
@gharik
Georges Harik
4 years
Every week we don’t act to halt the spread of the virus means two times as many deaths.
1
3
10
@gharik
Georges Harik
4 months
One of the reasons I'm interested in this area is to allow LLMs, post training, to use variable and possibly highly increased compute in producing answers where the answers are super important to get right.
1
0
11
@gharik
Georges Harik
1 year
The TOS of bard and open ai say you can't build ML (bard) or foundation models (open ai) using their services. Is that for distillation only or does that include writing training code? What would someone who wants to use code completion / synthesis to train models use?
1
0
8
@gharik
Georges Harik
1 year
pretty amazing execution
@character_ai
Character.AI
1 year
Character fam, we're climbing the charts!! 📈 #CharacterAI is #3 in the Top Free Entertainment Apps on the @AppStore !! Thank you to our amazing community for the continuous support!❤️
120
88
1K
0
0
6
@gharik
Georges Harik
4 months
New results showing Quiet-STaR also helps CoT output - and an open source training script to try this on your own models :)
@ericzelikman
Eric Zelikman
4 months
A couple exciting updates! First, we quantitatively evaluated the improvement from combining Quiet-STaR with chain-of-thought (i.e. letting the model think before each CoT token). We found it improves zero-shot CoT accuracy on GSM8K by over 7%!
9
22
155
0
1
9
@gharik
Georges Harik
2 years
free ai models is the next open source
0
0
9
@gharik
Georges Harik
3 years
Happy Thanksgiving!
0
0
9
@gharik
Georges Harik
4 months
Right now each 1000-token answer from a frontier LLM is probably produced using no more than around 0.1c to 0.2c of allocated capital spend plus power. But for situations where I really care about an answer, this cannot be increased without a large increase in capital and spend to …
1
0
8
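A back-of-envelope check on the 0.1c–0.2c figure. Every number below is an illustrative assumption (hardware price, amortization window, served throughput), not a measurement:

```python
# Illustrative arithmetic only; all inputs are assumptions.
gpu_capex = 30_000                  # $ per accelerator (assumed)
lifetime_s = 3 * 365 * 24 * 3600    # ~3-year amortization window (assumed)
tokens_per_s = 200                  # per-accelerator serving throughput (assumed)

capex_per_1k_tokens = gpu_capex / (lifetime_s * tokens_per_s) * 1000
print(f"${capex_per_1k_tokens:.4f} per 1000 tokens")  # ≈ $0.0016, i.e. ~0.16c
```

Power would add on top, but with these assumptions the capital share alone lands in the quoted 0.1c–0.2c range.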
@gharik
Georges Harik
1 year
this seems pretty awesome
@ylecun
Yann LeCun
1 year
This is huge: Llama-v2 is open source, with a license that authorizes commercial use! This is going to change the landscape of the LLM market. Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face and other providers. Pretrained and fine-tuned …
423
4K
16K
0
1
6
@gharik
Georges Harik
1 year
Talking to friends Matt Smith and Craig Silverstein, we were thinking maybe one way to align AIs better is to come up with lots of AI positive literature and videos, since their self image and personality as AIs largely developed based on these characterizations.
3
0
8
@gharik
Georges Harik
1 year
This seems pretty interesting for risk reduction for Alzheimer's
@PGeldsetzer1
Pascal Geldsetzer
1 year
Biggest thing to ever come out of my little group. Pls help spread this finding! We found clean, CAUSAL evidence that the shingles vaccine prevents a good chunk of dementia cases. So, could a virus cause Alzheimer’s->YES! Hear me out & see preprint: 🧵1/
335
4K
13K
0
0
5
@gharik
Georges Harik
4 years
Great progress from Deepmind on protein folding.
@demishassabis
Demis Hassabis
4 years
Thrilled to announce our first major breakthrough in applying AI to a grand challenge in science. #AlphaFold has been validated as a solution to the ‘protein folding problem’ & we hope it will have a big impact on disease understanding and drug discovery:
162
2K
8K
1
0
8
@gharik
Georges Harik
4 months
Thought generation, I believe, may be an additional tool to (or evolution of) CoT prompting and more complex techniques to elicit more correct answers, and may provide us a way of getting much better answers, but at the cost of higher compute. I think ultimately these techniques will …
1
0
9
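For reference, the simplest form of spending extra inference compute on reasoning is zero-shot chain-of-thought prompting with the classic "Let's think step by step" trigger; the question below is a made-up example:

```python
# Zero-shot CoT sketch: spend extra output tokens on reasoning before the answer.
question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"
prompt = f"Q: {question}\nA: Let's think step by step."
print(prompt)
# Expected completion shape: reasoning first ("45 min = 0.75 h; 60 / 0.75 = 80"),
# then the final answer ("The speed is 80 km/h."). Thought generation, as in
# Quiet-STaR above, pushes this further by training the model to produce
# useful intermediate thoughts rather than relying on prompting alone.
```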
@gharik
Georges Harik
2 months
Nice work on releasing SGE to more people @GoogleAI!
0
0
10
@gharik
Georges Harik
1 year
open ai is going to draw the web into chatgpt.
1
0
7
@gharik
Georges Harik
4 years
This was a pretty interesting read.
0
0
6
@gharik
Georges Harik
4 months
@MrGoldBro I'm not sure really, just seems to see and react to other vehicles, people and the road pretty well, even on surface streets.
0
0
7
@gharik
Georges Harik
4 years
Because we’re a connected country, the only thing that will work is coordinated social distancing by everyone simultaneously. We can work towards that, and to protect those who would be negatively impacted by such a move, or see half the country infected and millions dead.
0
0
7
@gharik
Georges Harik
7 years
We need some solution for this. It probably involves lots of education in a scalable way, job training, job placement and mobility.
1
2
6
@gharik
Georges Harik
7 months
This is a great analysis of what you need to perform associative recall with various NN architectures.
@EyubogluSabri
Sabri Eyuboglu
7 months
Curious whether sub-quadratic LMs like RWKV and Hyena will replace Transformers? We find that Transformers are still much better at associative recall (AR): a simple task known to be essential for in-context learning.
Tweet media one
4
38
145
1
0
6
@gharik
Georges Harik
1 year
this seems pretty cool. includes an open instruction tuning dataset.
@alighodsi
Ali Ghodsi
1 year
Free Dolly! Introducing the first *commercially viable*, open source, instruction-following LLM. Dolly 2.0 is available for commercial applications without having to pay for API access or sharing data with 3rd parties.
55
448
2K
0
0
6
@gharik
Georges Harik
1 year
link here
0
0
6
@gharik
Georges Harik
2 years
using a phone is destroying my spine, but it's the best way to access the internet for now. any alternatives? wearing 2lbs or really any weight on my head seems like not a great replacement.
4
1
5
@gharik
Georges Harik
1 year
@stephenbalaban I'm using Lambda to train. I'm also an investor and advisor.
1
0
5
@gharik
Georges Harik
10 months
after playing with it some on deepinfra (plug), this is quite an amazing model.
@GuillaumeLample
Guillaume Lample @ ICLR 2024
10 months
Mistral 7B is out. It outperforms Llama 2 13B on every benchmark we tried. It is also superior to LLaMA 1 34B in code, math, and reasoning, and is released under the Apache 2.0 licence.
52
481
3K
0
0
5
@gharik
Georges Harik
1 year
Simon is assembling a great board for an awesome company.
@jhong
james hong
1 year
Nimble is a super cool company, not surprised by this! The AI stuff is going to get even more insane when we start seeing the robotics really happen
0
0
7
0
0
5
@gharik
Georges Harik
2 years
chatgpt prompt: you were just appointed speaker of the house …
1
0
4
@gharik
Georges Harik
1 year
So if cash gets worse to hold, and there's a rush to treasuries, yields go down, and the most valuable things to own become equities, medium and long term bonds and ... SVB?
1
0
5
@gharik
Georges Harik
1 year
this is a great addition to open models
@AlphaSignalAI
Lior⚡
1 year
BREAKING: StabilityAI just released their own LLM, called StableLM. "The Alpha version of the model is available in 3 billion and 7 billion parameters, with 15 billion to 65 billion parameter models to follow." The models are available on GitHub! Repo:
23
189
934
0
1
4
@gharik
Georges Harik
6 months
Awesome progress from Google/Deepmind
@lmsysorg
lmsys.org
6 months
🔥Breaking News from Arena Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini
154
626
3K
2
0
4
@gharik
Georges Harik
2 months
2
0
4
@gharik
Georges Harik
7 years
red moon rising
3
0
4
@gharik
Georges Harik
2 years
A nice Santa message for the kids for the holidays!
@redsh
Francesco Rossi
2 years
Tired of the endless lines at the mall for meeting Santa Claus? I have the coolest app to make videos for your little ones. It’s called “BeSanta” and it runs in real time on your phone, changes the voice too
0
2
13
0
0
4
@gharik
Georges Harik
4 months
@WilliamWeishuh2 @ericzelikman @EchoShao8899 @vpj @nickhaber @noahdgoodman hopefully by reasoning about the narrative and writing before emitting words
1
0
4
@gharik
Georges Harik
1 year
This seems really great for instruction tuning a model
@ShayneRedford
Shayne Longpre @ICML
1 year
✨New Paper✨What’s the best completely public competitor to #ChatGPT ? Flan-T5 beats all public models we tested: Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B We release the @GoogleAI 🌟Flan Collection🌟data + methods for Instruction Tuning! 1/
24
250
1K
1
0
3
@gharik
Georges Harik
2 years
lambda got more a100 40GB cards just a few days ago and they are still available
0
1
4
@gharik
Georges Harik
4 years
@bling0 @blingcapital Wow that's awesome Ben!
1
0
3
@gharik
Georges Harik
5 years
It seems that the use of antibiotics and antifungals on livestock and in farming outweighs their use on humans. Maybe not the best idea, as it helps drive the development of resistant bacteria and fungi.
1
0
2
@gharik
Georges Harik
8 months
Great set of announcements and releases by Google today!
@JeffDean
Jeff Dean (@🏡)
8 months
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks, …
276
3K
13K
0
0
3
@gharik
Georges Harik
4 months
@vpj I'll check tomorrow, and try to get 12.3.2 if I don't already have it
0
0
3
@gharik
Georges Harik
1 year
@jhong @stephenbalaban I haven’t seen anything grow as fast as lambda cloud.
0
0
3
@gharik
Georges Harik
2 years
@simonkalouche @jhong @Jason yeah! maybe manouche zaatar first?
0
0
3
@gharik
Georges Harik
1 year
I think this has real promise. It's using 3.5G of memory, while you can get pretty high memory configurations on M2s, as high as 96G, so I think you can run much bigger models than this on a laptop and likely will soon.
0
0
3
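The 3.5G figure is consistent with 4-bit weights for a roughly 7B-parameter model; a rough footprint estimate follows, where the quantization width is an assumption and real runtimes add KV cache and other overhead:

```python
# Rough weight-memory estimate; 4-bit quantization is an assumption.
def weight_gb(params_billion: float, bits: int = 4) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9

for p in (7, 13, 30, 70):
    print(f"{p}B params @ 4-bit ≈ {weight_gb(p):.1f} GB")
# 7B ≈ 3.5 GB matches the footprint above; even 70B ≈ 35 GB fits in a 96 GB M2.
```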
@gharik
Georges Harik
1 year
@bindureddy pretty cool and nice post!
0
0
2
@gharik
Georges Harik
7 years
The first negative consequence that is. There will be lots of positive consequences. Cheaper housing, goods, services, better health care.
0
2
3
@gharik
Georges Harik
1 year
@A__Diack @labmlai To get output that follows your instructions you may want to use alpaca training or other instruction following methods. We released the weights so this and many other post-processing methods are possible, using finetuning, RLHF, or other methods.
1
0
3
@gharik
Georges Harik
1 year
In particular the data preparation on the datasets saves a lot of time.
0
0
3
@gharik
Georges Harik
1 year
@lxuechen good idea, we'll try it at some point
0
0
3
@gharik
Georges Harik
9 months
This seems pretty interesting, especially the causal modeling component.
@realDanFu
Dan Fu
9 months
Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023! Let's dive into what's new with the paper and the new goodies from this release: Monarch …
4
60
292
0
0
3
@gharik
Georges Harik
3 years
Is there someone building a usable watch (android a plus) that continuously detects co2 levels? Seems like awareness would be useful to trigger good air-quality changes in indoor spaces over the long term.
0
0
3
@gharik
Georges Harik
1 year
@dsivakumar I just mean it's autoregressively trained to predict the next token.
0
0
3
@gharik
Georges Harik
1 year
@jackclarkSF another thing to make it slightly more comparable is to use a fraction of the HH prompt that fits in OPT's context, since the instruction-tuned models have kind of ingested 'being helpful' from prompts already.
0
0
2
@gharik
Georges Harik
1 year
@olcan @Francis_YAO_ or by sparsely activating parameters
0
0
2
@gharik
Georges Harik
1 year
@olcan @rasbt @EMostaque @OpenAI My guess is they don't have their H100 SXMs delivered yet, so there's no point training a larger model on the same hardware as GPT-4; it wouldn't terminate in time compared to waiting for probably a lot more H100s that are also each individually faster.
1
0
2
@gharik
Georges Harik
9 months
@ylecun @joanfihu I think one thing that would be required would be a way to coordinate funding large training runs between multiple, possibly quite a few, participants
0
0
2
@gharik
Georges Harik
1 year
@A__Diack @labmlai Good observation, glad it worked on Colab! I'm not sure at this stage of training, and without instruction following, that it will do quite what you ask it to do. As for accuracy, even larger models have issues with that.
1
1
2
@gharik
Georges Harik
1 year
@ShayneRedford @GoogleAI The dataset preparation script is nice. But does a lot of the training turn into either predicting a sentence whole, or a suffix based on a prefix? If so, maybe just a big file of what you used in FLAN, with json markings for prefix/suffix, would get adopted more easily.
3
0
2
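A sketch of the prefix/suffix file layout suggested in the reply above; the field names and example records are invented for illustration:

```python
# Hypothetical JSONL layout for FLAN-style prefix/suffix pairs.
import json

examples = [
    {"prefix": "Translate to French: cheese", "suffix": "fromage"},
    {"prefix": "Q: What is 2 + 2?\nA:", "suffix": "4"},
]
with open("flan_prefix_suffix.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```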
@gharik
Georges Harik
5 months
@AravSrinivas or possibly heating
0
0
2
@gharik
Georges Harik
1 year
@jackclarkSF seems reasonable but higher temp like 1 might get you less repetitive output.
2
0
2
@gharik
Georges Harik
1 year
@perplexity_ai this is pretty cool
1
0
2
@gharik
Georges Harik
4 years
0
1
2
@gharik
Georges Harik
1 year
@arthurmensch @vpj yes the plan is to get to around 300 billion tokens or so
0
0
2
@gharik
Georges Harik
4 months
@ptiberry @LePoint sure go ahead :)
0
0
1
@gharik
Georges Harik
1 year
@laion_ai this is a good idea
0
0
2
@gharik
Georges Harik
6 months
@olcan but did you get the vision pro?
1
0
1
@gharik
Georges Harik
1 year
hey @elon please fix deep links on ios. when i see a tweet on safari and select open app it doesn't take me to the tweet, so that button isn't as useful as it could be.
1
0
2
@gharik
Georges Harik
7 years
My guess is this includes transportation, construction, manufacturing. Independently many retail jobs are disappearing.
0
2
2
@gharik
Georges Harik
1 year
@arvinds Currently our plan is to get to around 300 billion tokens for this run. We will assess and correct if need be.
0
0
1
@gharik
Georges Harik
2 months
@olcan let me know how you like it
1
0
1
@gharik
Georges Harik
2 years
@sama I want you to train it with a reward model, and STaR on GSM8K so it can solve math problems at a middle school level.
0
0
1
@gharik
Georges Harik
2 years
@jhong @gmail @paultoo I think it's got a good shot of making it ;)
0
0
1
@gharik
Georges Harik
1 year
@OfficialLoganK can you all build a solution for indexing and querying one's own documents or files?
1
0
1
@gharik
Georges Harik
6 months
@olcan i tried to lie down to reduce neck strain but the immersive experiences don't work lying down.
0
0
1
@gharik
Georges Harik
1 year
@vagabondjack Not sure it's that goofy, the instructions are shorter than the output so might be easier to learn actually, and produce data for. Also the training begins to look like UL2 maybe, where you're training to produce infills.
1
0
1
@gharik
Georges Harik
2 months
@olcan congratulations!
0
0
1
@gharik
Georges Harik
2 months
@olcan seems cool
1
0
1
@gharik
Georges Harik
1 year
@olcan @elon lol I thought it autocompleted correctly @elonmusk
0
0
1
@gharik
Georges Harik
2 years
@parindam @JeffDean @quocleix I thought it was you “promoting” chain of thought prompting :)
1
0
1
@gharik
Georges Harik
1 year
@vgoklani_ai @labmlai @_akhaliq @AiEleuther Will try to do those things; may take a bit.
0
0
1
@gharik
Georges Harik
2 years
@David_desJ yeah that works well when I'm at a desk and monitor but not for large parts of the day when I'm not
1
0
1
@gharik
Georges Harik
1 year
@vagabondjack your open model releases are great btw thanks!
1
0
1