New paper from
@Berkeley_AI
on Autonomous Evaluation and Refinement of Digital Agents!
We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing the state of the art by 29% to 75%.
[🧵]
🎉Release day!
We develop RL techniques / infra to post-train VLM agents for device control.
Our 2B VLM, when post-trained with an autonomous evaluator (reward model), improves its success rate on Android device-control tasks from 17% to 67%.
🚨 New paper: we trained a SOTA (> GPT4, Gemini) VLM agent, DigiRL, that can do tasks on an Android phone in real time, in the wild, via autonomous offline + online RL
Web:
Paper:
🧵 ⬇️ / little gif of learning progress👇:
Just finished all my grad school applications yesterday. It was such a great opportunity to reflect on my research, career goals, and future. Excited to take a break from the pressure of "greedy optimization" and focus on the bigger picture.
Introducing ArCHer, our latest effort to develop better RL algorithms for LM agents.
This multi-turn RL algorithm significantly outperforms all baselines and achieves up to 100x greater sample efficiency compared to PPO. It was a pleasure to be part of the team.
How can we train LLM agents to learn from their own experience autonomously?
Introducing ArCHer, a simple (i.e., small change on top of standard RLHF) and effective way of doing so with multi-turn RL 🧵⬇️
Paper:
Website:
OpenDevin is more than just a reproduction of Devin—it's a vibrant community of researchers and engineers with an exciting, ambitious roadmap ahead. This can potentially provide lasting value to the community
Don't miss out!
Introducing OpenDevin CodeAct 1.0 - a new state-of-the-art open coding agent! It achieves a 21% unassisted resolve rate on SWE-Bench Lite, a 17% relative improvement over the previous SOTA set by SWE-Agent.
Check out our blog or the thread 🧵for more details:
I will be at CVPR next week! If you are interested in:
• Building (real-time) VLMs
• Post-training (multi-modal) generalist agents
We should talk! My DM is open :)
Thanks Aran for sharing!
AI feedback will enable autonomous evaluation and improvement of language agents at scale. We have a thread here if you wanna learn more :)
Autonomous Evaluation and Refinement of Digital Agents
Improves WebArena's GPT-4 SOTA agent by 30%+ and CogAgent on iOS by 75%, with no extra supervision beyond a VLM-based evaluator
repo:
abs:
🎉So excited to see our work recognized at
#ACL2023NLP
! Our work, bridging grounding capabilities in Vision-Language Models, serves both practical and scientific purposes.
Extremely grateful to have been on this journey with my awesome mentor and advisor at
@SLED_AI
.
🎉Thrilled to share that our paper "World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models" was selected for the outstanding paper award at
#ACL2023NLP
! Thanks
@aclmeeting
:-)
Let's take grounding seriously in VLMs because...
🧵[1/n]
Excited to present our projects—Autonomous Evaluation & Refinement + ArCHer—at next week's CMU agents workshop!
I’ll also be in Ann Arbor over the weekend catching up with friends.
DM if you’re up for a chat or boba🥤!
GPT-4V with an assistive tool (Vimium) can be a decent web agent. Happy to share a proof-of-concept project I built last night. It's under 300 lines of code
This seems like the most principled approach to parallel decoding for LMs so far + has interesting connections to the original consistency models popular in accelerating diffusion models.
I'll try to share one paper I particularly like every week on Twitter starting with this one :)
Check out Consistency LLMs (to appear at ICML'24)!
We found that we can easily adapt an LLM into a parallel decoder by training it on auto-generated Jacobi decoding trajectories using a consistency loss -- just like how we train consistency models in diffusion.
The model quickly learns
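For intuition, here is a minimal sketch of the consistency-training idea as I read it (assuming a HuggingFace-style causal LM interface; this is not the official CLLM implementation): run Jacobi decoding to collect intermediate states, then train the model so that every intermediate state maps straight to the trajectory's fixed point.

```python
import torch
import torch.nn.functional as F

def jacobi_trajectory(model, prompt_ids, n_tokens, max_iters=32):
    """Run Jacobi (parallel) decoding and record the intermediate guesses."""
    guess = torch.randint(0, model.config.vocab_size, (1, n_tokens))
    states = [guess]
    for _ in range(max_iters):
        logits = model(torch.cat([prompt_ids, guess], dim=1)).logits
        # Every guess position is re-predicted in parallel from the current guess.
        new_guess = logits[:, prompt_ids.shape[1] - 1 : -1].argmax(-1)
        if torch.equal(new_guess, guess):  # fixed point: matches greedy AR decoding
            break
        guess = new_guess
        states.append(guess)
    return states  # states[-1] is the fixed point y*

def consistency_loss(model, prompt_ids, states):
    """Train the model to jump from any intermediate state straight to y*."""
    y_star = states[-1]
    loss = 0.0
    for s in states[:-1]:
        logits = model(torch.cat([prompt_ids, s], dim=1)).logits
        pred = logits[:, prompt_ids.shape[1] - 1 : -1]  # (1, n_tokens, vocab)
        loss = loss + F.cross_entropy(pred.transpose(1, 2), y_star)
    return loss / max(len(states) - 1, 1)
```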
@ArmenAgha
Maybe they just aren't that different, a pretty sketchy chain of thought:
1. prompting is constrained / manually optimized prefix tuning
2. full-weight fine-tuning ≈ fine-tuning w/ LoRA
3. prefix tuning ≈ LoRA:
We get prompting ≈ fine-tuning when "done right"
@nishuang
For a while, it was very popular in NLP to name models after Sesame Street characters: "To date, this new breed of language AIs includes an ELMo, a BERT, a Grover, a Big BIRD, a Rosita, a RoBERTa, at least two ERNIEs (three if you include ERNIE 2.0), and a KERMIT." So it's pretty safe to say Baidu was deliberately leaning into this naming scheme 🤣
Excited to share InfEdit, which delivers the 𝐛𝐞𝐬𝐭 𝐞𝐝𝐢𝐭𝐢𝐧𝐠 𝐪𝐮𝐚𝐥𝐢𝐭𝐲, 𝐬𝐩𝐞𝐞𝐝, 𝐚𝐧𝐝 𝐜𝐨𝐧𝐬𝐢𝐬𝐭𝐞𝐧𝐜𝐲 among training-free algorithms. It was also my first project on diffusion models, where I learned so much from my awesome collaborators. Check out the paper/demo below👇
Want to edit your image with language descriptions in less than 3s? Ever questioned the need for prolonged inversion in text-guided editing? We are happy to release ♾ InfEdit (with demo), a flexible framework for fast, faithful and consistent editing.
🔗
@xwang_lk
Despite missing the third and fourth columns, my reproduction results seem relatively positive. Our recent work on visual illusions also found something similar, where the model excels at standard test images found online but struggles to generalize to novel ones
I switched from Obsidian to Notion for the same reason I switched from Emacs to VSCode:
I found myself spending more time tweaking the system than gaining efficiency from it.
Which app do you use for note-taking? 📝
I've just made the switch from Notion to Obsidian and I find it's much faster, less cluttered and makes it easier to focus on the content itself.
Research thrives on faith. I think there will be greater open progress in improving LLMs' reasoning capabilities soon, simply because someone (OpenAI) made people aware it is achievable.
Achievement unlocked: 🥳
Our research analyzing VLMs' perception under visual illusions is covered by Scientific American.
It was one of my favorite magazines back in high school!
We hope our results convey to you the potential of using open-ended model-based evaluators in evaluating and improving language agents.
All code is available at:
Paper:
Work w/
@594zyc
@NickATomlin
@YifeiZhou02
@svlevine
@alsuhr
Are you using Figma to create figures for your papers? Be aware that Safari (or Figma?) has a bug that sometimes prevents Figma images from rendering😅
Make sure to double-check this before it’s too late
For this week, I am sharing
@dwarkesh_sp
podcast with John Schulman
@johnschulman2
, which is particularly informative
• How to enable LLMs for long-horizon tasks -- just train them to do so
• Insights into OpenAI’s post-training stack
• His prediction on future progress
Honored to be part of this exhilarating journey alongside other team members. The knowledge we've garnered from this competition will fuel our excitement for the next stage of advancing embodied AI.
Do VLMs perceive visual illusions like humans or faithfully represent reality? Our
#EMNLP2023
paper analyzed this question systematically across 4 models. Come and check it out!
1/ Excited to share our latest research "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?" at
#EMNLP2023
🎉 Discover how VLMs fare against tricky visual illusions👀➡️
@OfirPress
I think this is a special case of offline/online RL, and the most general statement should be that doing RL on the agent really works.
In both recent papers, the filtered BC technique (the most trivial offline RL algorithm?) almost doubles the agent's success rate: see the iOS
New paper from
@Berkeley_AI
on Autonomous Evaluation and Refinement of Digital Agents!
We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing the state of the art by 29% to 75%.
[🧵]
LLM agents have demonstrated promise in their ability to automate computer tasks, but face challenges with multi-step reasoning and planning. Towards addressing this, we propose an inference-time tree search algorithm for LLM agents to explicitly perform exploration and
We begin by developing two types of evaluators: one that directly queries GPT-4V and another that employs an open-weight solution. Our best model shows 82% / 93% agreement with oracle evaluations in web browsing and Android device control settings, respectively.
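For a concrete picture, here is a rough sketch of the "directly query GPT-4V" evaluator idea (my paraphrase, not the paper's exact prompt or code). `query_vlm` is a hypothetical helper wrapping whichever VLM API you use; it takes a text prompt plus an image and returns text.

```python
EVAL_PROMPT = """You are evaluating a digital agent.
User instruction: {instruction}
Agent action history: {actions}
Based on the attached final screenshot, did the agent complete the task?
Answer with exactly one word: SUCCESS or FAILURE."""

def autonomous_eval(instruction, actions, final_screenshot) -> bool:
    # Ask the VLM judge for a verdict; no ground-truth labels are involved.
    answer = query_vlm(
        prompt=EVAL_PROMPT.format(instruction=instruction, actions=actions),
        image=final_screenshot,
    )
    return answer.strip().upper().startswith("SUCCESS")
```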
There's a lot of redundant info in vision; MAE shows that 25% of the patches are enough for encoder training. CrossMAE goes further - no need to decode the whole image either!
Open question: how can we transfer this success to generative models and make them efficient as well?
UC Berkeley presents CrossMAE
CrossMAE matches MAE in performance with 2.5 to 3.7x less decoding compute via independent partial patch reconstruction
proj:
abs:
Some additional ✨speculation✨
Our preliminary results showed that inference-time improvement w/ Reflexion was very dependent on the performance of the critic model. A bad critic often tanks model performance
Lastly, we experiment with improving CogAgent on iOS, for which there is no existing benchmark environment or training data.
By using the evaluator to filter sampled trajectories for behavior cloning, we significantly improve CogAgent's success rate by a relative 75%.
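A minimal sketch of what evaluator-filtered behavior cloning looks like (the interfaces agent.rollout / evaluator.is_success / agent.finetune are placeholders, not the paper's code): sample trajectories, keep only those the autonomous evaluator judges successful, and fine-tune on the filtered set.

```python
def filtered_bc(agent, evaluator, tasks, n_samples=16):
    kept = []
    for task in tasks:
        for _ in range(n_samples):
            traj = agent.rollout(task)            # screenshots + actions, no labels
            if evaluator.is_success(task, traj):  # VLM evaluator stands in for reward
                kept.append(traj)
    agent.finetune(kept)                          # standard behavior cloning / SFT
    return agent
```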
🚨 New paper on RL, synthetic data, LLM math reasoning (MATH / GSM8K)
TL;DR: RL on wrong responses (yes, "proper" RL, not filtered SFT or STaR / RFT) scales the utility of synthetic data by **8x**,
❌spurious correlations
✅stitching, credit assignment
🧵⬇️
We see that the improvement our evaluators provide scales favorably with evaluator capability, with the best evaluator achieving a 29% improvement over the previous SOTA.
Fine-tuning LM agents with a success signal can be really juicy, and SFT on good trajectories would be a good first step.
The finetuning part in Ofir's post is quite similar to the iOS filtered-BC experiment in our work, where we see a 75% relative improvement. I'm curious about the results we
Predictions:
>=2 orgs will get 35% on SWE-bench by Aug 1, 2024.
A fully open source system will reach 35% by Nov 1, 2024. Probably based on SWE-agent + ACI improvements: debugger, better code retrieval, lang. server protocol. The LM will be finetuned on ~500 good trajectories
This looks cool:
• Predict downstream performance directly from training FLOPs
• Generalize across different model families
• Can be derived from widely available data - training FLOPs and benchmark results
Looking forward to the code release so we can try it out firsthand
Will LM agents continue to scale? Which LM post-training methods work at scale?
To answer these questions, we built Observational Scaling Laws: a generalization of scaling laws that makes accurate predictions without model training, using existing public LMs.
GPT-4o can also generate any combination of audio, text, and image outputs, which leads to interesting new capabilities we are still exploring.
See e.g. the "Explorations of capabilities" section in our launch blog post (), or these generated images:
@xhluca
A shameless plug for our Agent-Eval-Refine paper.
Autonomous evaluators turn any in-the-wild digital environment into an effective benchmark/training environment.
New paper from
@Berkeley_AI
on Autonomous Evaluation and Refinement of Digital Agents!
We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing the state of the art by 29% to 75%.
[🧵]
Next, we show how they could be used for improving agents, either through inference-time guidance or fine-tuning.
We start with WebArena, a popular web agent benchmark. We experiment with integrating the SOTA agent with the Reflexion algorithm, using our evaluators as the reward function.
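A toy sketch of how an autonomous evaluator can drive Reflexion-style retries (the interfaces here are placeholders, not the paper's implementation): the evaluator replaces the benchmark's ground-truth reward, and failed attempts produce verbal reflections that are fed back into the next attempt.

```python
def reflexion_with_evaluator(agent, evaluator, task, max_trials=3):
    reflections = []
    traj = None
    for _ in range(max_trials):
        traj = agent.rollout(task, reflections=reflections)
        if evaluator.is_success(task, traj):           # reward signal from the evaluator
            break
        reflections.append(agent.reflect(task, traj)) # self-critique of the failure
    return traj
```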
In hindsight, it's not sci-fi but today's tech leveraged well.
Previous video models weren't scaled up like LLMs. VideoPoet already creates good video at a cost akin to an 8B LLM. Dramatic improvement with further scaling and a good recipe is certain.
But seeing the outputs personally? Wow
Excited to share what
@billpeeb
@_tim_brooks
and my team have been working on for the past year! Our text-to-video model Sora can generate videos of complex scenes up to a minute long. We're excited about making this step toward AI that can reason about the world like we do.
OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework
As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However,
@alex_lacoste_
This looks great! I wish we had this infrastructure during our previous project on auto-eval-refinement.
One quick question: Does your WebArena agent receive images as input? And do you have any intuition on how much benefit this offers?
This makes so much sense.
When the output y is actually a distribution given the input x, regression only gives you the weighted average, which isn't optimal at all.
With the expressiveness of a classification loss, we can simply model the distribution p(y|x). (A toy sketch follows the quoted paper below.)
Stop Regressing
Training Value Functions via Classification for Scalable Deep RL
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to
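Here is that toy sketch of one such construction (a "two-hot" categorical target over value bins; the bin range, shapes, and names are illustrative assumptions, not taken from the paper): replace the MSE regression head with a distribution over bins and train with cross-entropy.

```python
import torch
import torch.nn.functional as F

NUM_BINS, V_MIN, V_MAX = 51, -10.0, 10.0
bin_centers = torch.linspace(V_MIN, V_MAX, NUM_BINS)

def two_hot(returns):
    """Project scalar returns (shape [B]) onto the two nearest value bins."""
    returns = returns.clamp(V_MIN, V_MAX)
    idx = torch.bucketize(returns, bin_centers).clamp(1, NUM_BINS - 1)
    lo, hi = bin_centers[idx - 1], bin_centers[idx]
    w_hi = (returns - lo) / (hi - lo)
    dist = torch.zeros(returns.shape[0], NUM_BINS)
    dist.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    dist.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return dist

def value_loss(logits, returns):
    """Cross-entropy against the two-hot target instead of MSE on a scalar head."""
    return F.cross_entropy(logits, two_hot(returns))

def value_estimate(logits):
    """Read out a scalar value as the expectation over bin centers."""
    return (logits.softmax(-1) * bin_centers).sum(-1)
```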
In our new work on socially compliant navigation, we show how real-world RL-based finetuning can enable mobile robots to adapt on the fly to the behavior of humans, to obstacles, and other challenges associated with real-world navigation:
@frankxu2004
@berkeley_ai
Good question! Evaluators should work just fine, whether on live websites or in other domains, for tasks of similar complexity
In fact, each evaluator shares the same weights across all experiments, with only a change in the prompt. And WebArena isn't part of its training data
@AhmadMustafaAn1
@xiao_ted
They do, but they are in fact from Microsoft Research Asia, based in Beijing, which could have a slightly different culture from the US-based teams.
@tomosman
@garrytan
@ylecun
If high-bandwidth visual input were really the key to human intelligence, people without vision would be at a significant disadvantage. However, this is clearly not the case.
No LLM is secure! A year ago, we unveiled the first of many automated jailbreaks capable of cracking all major LLMs. 🚨
But there is hope?!
We introduce Short Circuiting: the first alignment technique that is adversarially robust. 🧵
📄 Paper:
@dwarkesh_sp
@johnschulman2
Week 7:
There are many intuitive reasons why "transcendence" would happen, but having some rigorous empirical evidence / theoretical support is fantastic.
This paper seems very interesting: say you train an LLM to play chess using only transcripts of games of players up to 1000 elo. Is it possible that the model plays better than 1000 elo? (i.e. "transcends" the training data performance?). It seems you get something from nothing,
@mrdrozdov
It actually can. Sorry that I didn't keep the screenshot, but in my case I successfully invoked insert mode, wrote something, exited with :wq, and used cat to inspect the edited result
@shuyanzhxyc
@sreecharan93
Probably because TACL and ACL/EMNLP/NAACL/etc. are all sponsored by the ACL? I imagine doing so for conferences that don't share the same root would be much harder.