Xing Han Lu

@xhluca

1,636
Followers
227
Following
174
Media
1,841
Statuses

🔨 Web Agents @Mila_Quebec

๐Ÿ–ฅ๏ธ+๐Ÿฆ™=๐Ÿš€
Joined December 2017
Pinned Tweet
@xhluca
Xing Han Lu
2 months
Delighted to release ✨Llama-3-8B-Web✨, the most capable agent built for web navigation by following instructions and replying💬. It surpasses GPT-4V* by 18% on WebLINX, a benchmark for web navigation with dialogue. Model: Code:
Tweet media one
Tweet media two
19
187
1K
@xhluca
Xing Han Lu
12 days
Announcing ⚡BM25S, a fast lexical retrieval library! 🏎️ Up to 500x faster than the most popular Python lib, matches @Elastic search (BM25 default) 🤗 First BM25 library that is directly integrated with @huggingface hub: load or save in 1 line! GitHub:
Tweet media one
Tweet media two
Tweet media three
17
106
688
@xhluca
Xing Han Lu
3 years
Now that @huggingface 's transformers 4.4.0 is out, I'm happy to release dl-translate, a library for text translation between 50 languages, built on top of 🤗 transformers and @facebookai 's mBART. Repo: Docs:
Tweet media one
2
36
222
@xhluca
Xing Han Lu
23 days
Another solid work by the Android agents team at @GoogleDeepMind . 800+ app interactions is massive. It would probably cover most of the use cases we can see on Android (so one can spend less time thinking about generalization and focus on doing well on validation).
Tweet media one
6
45
220
@xhluca
Xing Han Lu
1 year
CTRL () by Keskar+ is such an under-appreciated paper, considering it introduced the idea of control codes/prompts for controllable generation (based on GPT-2) almost 3 years before InstructGPT.
4
29
151
@xhluca
Xing Han Lu
3 months
Is this the DSPy moment of text-to-image generation? Congratulations @oscmansan @Piovrasca et al!
Tweet media one
@_akhaliq
AK
3 months
Improving Text-to-Image Consistency via Automatic Prompt Optimization Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress,
Tweet media one
4
43
153
0
16
97
@xhluca
Xing Han Lu
3 months
I'm seriously surprised I am not seeing more hype around @valence_ai @PrescientDesign @IsomorphicLabs Those groups are clearly working on the most impactful applications of ML and are at the forefront of biotech breakthroughs.
4
7
75
@xhluca
Xing Han Lu
3 years
I think this @huggingface blog post by @PatrickPlaten is by far my favorite resource on post-hoc text generation
0
15
77
@xhluca
Xing Han Lu
3 years
The @huggingface hub is still new but it's already super convenient for downloading transformers. I was looking for a way to download libraries + models so they could be used offline. In total it took 6 lines (4 in terminal + 2 in python) thanks to the hub ⬇️
Tweet media one
0
17
77
@xhluca
Xing Han Lu
5 months
I'm going to make a thread later on why I think WebLINX () is really exciting, but I want to take a moment to say how amazing the @huggingface ecosystem has become. You can host the crucial parts of a project all at the same place. Let's take a look 👇
@sivareddyg
Siva Reddy
5 months
Introducing WebLINX 🍯, a large benchmark for AI agents navigating real websites with multi-turn dialogue. 100K interactions across 2300 demonstrations on 150 real-world websites. Includes HTML, screenshots and videos. Tests unseen sites, tasks, blind users
Tweet media one
Tweet media two
Tweet media three
7
66
252
1
16
77
@xhluca
Xing Han Lu
1 year
This is probably the nicest looking poster at #ACL2023NLP (and also very interesting + insightful!)
Tweet media one
3
6
74
@xhluca
Xing Han Lu
5 months
@srush_nlp @BlancheMinerva The S6's only weakness was: The model still has a quadratic memory requirement during training like Transformers. Except it doesn't...
2
0
68
@xhluca
Xing Han Lu
2 months
@AIatMeta Not only is Llama-3 better than GPT-4V (*in a zero-shot setting), it also surpasses all other finetuned models by a large margin, including the Flan-based MindAct-3B and GPT-3.5-Turbo (trained for same # epochs). We even observe a 15% relative improvement wrt last-gen Llama-7B.
Tweet media one
2
5
64
@xhluca
Xing Han Lu
2 months
The most incredible thing about the new LLM2Vec Llama-3 is that it is top-10 on MTEB with ONLY 168MB extra usage over Llama-3, since it uses LoRA. That's 0.168GB! This, with batching, makes RAG super fast and low-memory. In comparison, GTE-Large is 1.5GB and gte-Qwen is 30GB.
Tweet media one
@vaibhav_adlakha
Vaibhav Adlakha
2 months
LLM2Vec meets Meta-Llama-3 → new SOTA among models trained on public data 🥇. We applied our LLM2Vec approach to Meta-Llama-3-8B and it works like a charm. Each step of the LLM2Vec pipeline improves the model's performance on embedding tasks 👇. 1/N Models:
3
22
156
3
5
57
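The ~168MB figure can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming a rank-16 LoRA adapter on all seven linear projections of each Llama-3-8B layer (hidden size 4096, KV dim 1024, MLP dim 14336, 32 layers) stored in fp32; the actual LLM2Vec configuration may differ:

```python
# Hypothetical LoRA size estimate (rank and target modules are assumptions,
# not the published LLM2Vec config). LoRA replaces a full weight update with
# two small factors A (r x d_in) and B (d_out x r).
def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

d, kv, mlp, layers, r = 4096, 1024, 14336, 32, 16  # Llama-3-8B shapes, assumed rank
per_layer = (
    lora_params(d, d, r) * 2        # q_proj, o_proj
    + lora_params(d, kv, r) * 2     # k_proj, v_proj (grouped-query attention)
    + lora_params(d, mlp, r) * 2    # gate_proj, up_proj
    + lora_params(mlp, d, r)        # down_proj
)
total = per_layer * layers
print(f"{total/1e6:.1f}M params -> {total*4/1e6:.0f} MB in fp32")  # -> 41.9M params -> 168 MB in fp32
```

Under these assumptions the adapter comes out to roughly 42M parameters, i.e. about 168 MB in fp32, consistent with the figure in the tweet.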
@xhluca
Xing Han Lu
1 year
There are great UIs for @metaai 's LLaMA-2, e.g.: @yvrjsharma -> @a16z -> However, they rely on web APIs like TGI and @replicatehq , so I built a @gradio UI, running @huggingface Transformers locally. Code:
4
10
52
@xhluca
Xing Han Lu
10 months
The new @huggingface Collections feature is pretty neat - it allows you to gather everything about a project in the same place - manuscript, dataset, models, etc.
Tweet media one
2
10
47
@xhluca
Xing Han Lu
2 months
To create Llama-3-8B-Web, we finetuned @AIatMeta 's Llama-3-8B-Instruct (released last Thursday) on 24K web interactions from the WebLINX training set, including clicking, typing, submitting forms, and replies. The dataset covers 150 websites from 15 geographic locations.
Tweet media one
Tweet media two
2
0
42
@xhluca
Xing Han Lu
1 year
There are quite a few vector DB startups ( @pinecone @weaviate_io etc.), but which startups focus on producing retrievers or embeddings-as-a-service? @OpenAI does not offer fine-tuning, whereas @CohereAI embeddings do not seem to be top-10 on MTEB:
9
8
42
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface Is that all? Of course not! We are also launching the 🖥️WebLlama project (), with the goal of making it easy for you to train, evaluate, and deploy Llama-3 agents! We want to build agents that won't replace users, but equip them with powerful assistants.
Tweet media one
1
5
41
@xhluca
Xing Han Lu
1 year
Happy to share our #eacl2023 paper, which introduces a new dataset for Table Retrieval in Conversations: the ꜱᴛᴀᴛᴄᴀɴ ᴅɪᴀʟᴏɢᴜᴇ ᴅᴀᴛᴀꜱᴇᴛ Paper: Homepage: A work with @sivareddyg and @harmdevries77 🧵👇
2
16
38
@xhluca
Xing Han Lu
1 year
@soumithchintala Switch Transformers-style sparse mixture, or Kaggle-style mixture? Former is more than just "little trick".
2
0
35
@xhluca
Xing Han Lu
9 months
@ylecun @ClementDelangue Scikit-Learn --> Inria (Paris) Torch --> Idiap/EPFL (Switzerland) Theano --> Lisa/UdeM (Montreal) Keras --> François Chollet FAISS, DINO, DETR, LLAMA --> FAIR Paris Tokenizers, Optimum, Accelerate --> Huggingface Wonder if there's something different about speaking French..
0
1
32
@xhluca
Xing Han Lu
19 days
Any good web/navigation agents survey out there? I know there's "A Survey on Large Language Model based Autonomous Agents" by Wang+ but there are a few important works that came out after v1 (Android in the Wild, AndroidWorld, OSWorld, VisualWebArena, WorkArena, etc.)
Tweet media one
Tweet media two
1
3
26
@xhluca
Xing Han Lu
3 years
Turns out running Jupyter Dash on @kaggle is a piece of cake thanks to pyngrok. Check out the notebook: cc @plotlygraphs
3
2
26
@xhluca
Xing Han Lu
2 months
@AIatMeta Llama-3-Web is tightly integrated with the @huggingface ecosystem: you can load the dataset with 🤗Datasets and the agent from the 🤗Hub with pipeline, then predict actions in <10 lines. For that, I'm really thankful for the hard work by the team, especially today!
Tweet media one
@ClementDelangue
clem ๐Ÿค—
2 months
The GPT4 of datasets took down Hugging Face, sorry all 😅😅😅
25
43
907
1
2
26
@xhluca
Xing Han Lu
1 year
@amilios I think this shows the importance of decoupling modelling from evaluation, and standardizing the latter.
1
1
26
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface To train a good model, great data is everything! That's why we are focusing on adding more data to our training mix to create even better agents. Our next target is @osunlp 's Mind2Web (see video), a dataset for autonomous navigation covering 137 websites
@ysu_nlp
Yu Su
1 year
What would be the most wild environment for grounding & empowering LLMs? 👉The entire Internet! 📢 Mind2Web: Towards a Generalist Agent for the Web () Led by amazing @osunlp student @XiangDeng1 #NLProc
Tweet media one
4
27
89
1
3
24
@xhluca
Xing Han Lu
3 years
You can easily create datasets from @kaggle notebooks, but they will be limited by the output size (~20GB). If you want to create/update larger datasets (up to 100GB), you will need the Kaggle #API . I created this short tutorial to show you how:
Tweet media one
1
5
23
@xhluca
Xing Han Lu
2 months
@AIatMeta At this point, you probably expect a cool video showing Llama-3-Web in action! Well, you'll need to be patient😉 But it's crucial to remember that demo ≠ good systematic performance! That's why Llama-3-Web is evaluated on 4 OOD test splits covering 1000+ real-world demos.
Tweet media one
1
1
20
@xhluca
Xing Han Lu
2 years
Cool feature by @SemanticScholar : Integrated PDF reader where the hyperlinks bring you directly to a paper's Semantic Scholar page.
Tweet media one
1
2
19
@xhluca
Xing Han Lu
5 months
Thank you for making @huggingface the amazing open research platform it is today @julien_c @Thom_Wolf @LysandreJik @qlhoest @mmitchell_ai @SanhEstPasMoi @_akhaliq @moi_anthony @mervenoyann @_philschmid (and many members of the team + alumni)
@xhluca
Xing Han Lu
5 months
I'm going to make a thread later on why I think WebLINX () is really exciting, but I want to take a moment to say how amazing the @huggingface ecosystem has become. You can host the crucial parts of a project all at the same place. Let's take a look 👇
1
16
77
2
1
18
@xhluca
Xing Han Lu
3 months
Two weeks ago I shared about the industry groups working on agents, but there are many academic/non-profit labs also working on web agents recently! A quick thread below of works released/updated this year (not exhaustive, so please comment if I missed anything)
@xhluca
Xing Han Lu
3 months
So many groups in industry working on web agents/action models/benchmarks now! @GoogleDeepMind -> Pix2Act & WebGUM @ServiceNowRSRCH -> WorkArena + Case Study @MithrilSecurity -> LaVague Salesforce Research -> AgentOhana
2
1
10
2
2
18
@xhluca
Xing Han Lu
1 year
@karpathy I think it depends on the use case. Fast prototyping, data displaying, notebook-to-ui ML demos that fit common patterns, building apps in just a few lines Enterprise-ready, REST-based, scalable apps
1
1
16
@xhluca
Xing Han Lu
2 years
Starting to feel @weights_biases is the @NotionHQ of model training.
0
3
17
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface @osunlp Evaluation is very important! A goal of WebLlama will be to provide reliable evaluation for many benchmarks (incl. M2W). Dynamic benchmarks are on our mind, and a few exciting ones include @shuyanzhxyc 's WebArena, @kohjingyu 's VisualWebArena and @alexandredrouin 's WorkArena.
Tweet media one
Tweet media two
Tweet media three
2
1
17
@xhluca
Xing Han Lu
2 years
Nice easter egg by @weights_biases for 🥧day (those are auto-generated names when you create a new run).
Tweet media one
0
3
17
@xhluca
Xing Han Lu
3 years
@karpathy I'm curious how much of the improvement is in the architecture vs the improved training tools thanks to timm, considering Resnet-50 can achieve 80%+ without any architecture change:
2
0
16
@xhluca
Xing Han Lu
12 days
With Python-based implementations like BM25S and Rank-BM25, you can tokenize your text, index, and retrieve in ~10 lines. However, a straightforward NumPy implementation may not achieve the same speed as the Java-based ones. BM25S is different: it uses SciPy to store eagerly computed scores.
Tweet media one
1
0
16
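To make the "~10 lines" claim concrete, here is a self-contained toy sketch of the tokenize → index → retrieve loop in plain Python. This is the classic Okapi BM25 formula, not the actual BM25S API; the corpus and parameters are made up:

```python
import math
from collections import Counter

# Toy sketch of lexical retrieval (illustrative Okapi BM25, NOT the BM25S API).
corpus = ["a cat is a feline animal", "a dog is a canine animal", "fish live in water"]
docs = [doc.split() for doc in corpus]             # whitespace "tokenizer"
avgdl = sum(len(d) for d in docs) / len(docs)      # average document length
df = Counter(t for d in docs for t in set(d))      # document frequency per token

def bm25(query, k1=1.5, b=0.75):
    """Return the index of the best-scoring document for a query string."""
    scores = []
    for d in docs:
        tf = Counter(d)
        idf = lambda t: math.log((len(docs) - df[t] + 0.5) / (df[t] + 0.5) + 1)
        scores.append(sum(
            idf(t) * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            for t in query.split()
        ))
    return max(range(len(docs)), key=scores.__getitem__)

print(corpus[bm25("does the cat purr")])  # -> a cat is a feline animal
```

Real libraries add smarter tokenization (stemming, stopwords) and vectorized scoring, but the core loop is this small.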
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface @osunlp @shuyanzhxyc @kohjingyu @alexandredrouin Finally, it doesn't matter how good an agent is if we can't use it! We want WebLlama agents to be easily deployed for end users, so we are planning integrations with deployment platforms, including @ServiceNowRSRCH 's BrowserGym (demo below), LaVague, Playwright, and more!
4
1
16
@xhluca
Xing Han Lu
2 years
I only choose my hyperparameters from an Ouija board.
@ChrSzegedy
Christian Szegedy
2 years
New jobs in the 21st century: Model restart specialist Hyperparameter psychic Prompt engineer Model janitor Tensor shape mediator Quantum state observer Model footprint accountant
20
100
723
0
3
15
@xhluca
Xing Han Lu
2 months
Lots of quality work happening at @uwaterloo CS in @WenhuChen 's lab!
@AdeenaY8
Adina Yakup
2 months
虎头帮 TIGER-LAB🐯 The name caught my attention first, then I realized they were behind all these cool works! ⚔️ GenAI-Arena ⚔️ : Benchmarking Visual Generative Models in the Wild ✨Mantis: Optimized for multi-image reasoning with text/image format
1
7
41
1
1
14
@xhluca
Xing Han Lu
3 months
@jackclarkSF Princeton-NLP has 22 graduate students - 13.6 H100s/student Meta has 67K employees - 5.2 H100s/employee Meta would need <38% of their workforce being in engineering to match Princeton NLP.
4
0
14
@xhluca
Xing Han Lu
12 days
More precisely, it computes all possible token scores for every document in a corpus and stores them in a sparse matrix (an idea inspired by @jxmnop 's bm25-pt). Then, given a query, you can sum up the relevant tokens to get a score for each document.
@jxmnop
jack morris
4 months
implemented a fast, GPU-enabled BM25 in pytorch! BM25 is a simple search algorithm from the 70s that works as well as neural networks for most search problems; for all the advances we've made in neural text retrieval, it's still around got near SOTA on stanford LoCO benchmark
Tweet media one
9
62
650
3
1
14
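The eager-scoring mechanism described above can be sketched with SciPy directly. Toy numbers throughout; this illustrates the idea, not the BM25S implementation:

```python
import numpy as np
from scipy import sparse

# Illustrative sketch of "eager" BM25 scoring: precompute a (vocab x docs)
# matrix of per-token scores once, so a query is answered by summing a
# handful of sparse rows. The scores below are made up.
vocab = {"cat": 0, "dog": 1, "fish": 2}
token_doc_scores = sparse.csr_matrix(np.array([
    [1.2, 0.0, 0.0],   # "cat" only contributes to doc 0
    [0.0, 1.1, 0.0],   # "dog" only contributes to doc 1
    [0.0, 0.0, 0.9],   # "fish" only contributes to doc 2
]))

def retrieve(query_tokens, k=1):
    ids = [vocab[t] for t in query_tokens if t in vocab]   # ignore OOV tokens
    doc_scores = np.asarray(token_doc_scores[ids].sum(axis=0)).ravel()
    return np.argsort(doc_scores)[::-1][:k]               # top-k doc indices

print(retrieve(["cat", "purr"]))  # -> [0]
```

Because scoring happens at index time, query time is just a sparse row-sum plus a top-k selection, which is why this approach can be so fast.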
@xhluca
Xing Han Lu
2 months
Adding instructions for finetuning llama-3 on weblinx. We'll find out tomorrow how good of a web agent @AIatMeta 's LLaMA-3 is
Tweet media one
2
2
14
@xhluca
Xing Han Lu
12 days
More benchmarks available in the repo: Here's a collection of indices for public BEIR datasets: BM25S stands on the shoulders of giants: Rank-bm25 (1st python implementation), Pyserini and bm25-pt (which inspired this project).
3
0
13
@xhluca
Xing Han Lu
3 months
WebLINX is not just about making a large benchmark available to researchers. We wanted it to be easy to use and avoid wasting days preprocessing complex web data, so we built a library: You can load+run models in minutes on Colab:
Tweet media one
Tweet media two
@xhluca
Xing Han Lu
5 months
I'm going to make a thread later on why I think WebLINX () is really exciting, but I want to take a moment to say how amazing the @huggingface ecosystem has become. You can host the crucial parts of a project all at the same place. Let's take a look 👇
1
16
77
1
3
12
@xhluca
Xing Han Lu
3 months
@danfei_xu Neurips should make this into an opportunity to promote CS/ML to local students, e.g. through talks across high schools in the host city before/during/after the conference
0
0
13
@xhluca
Xing Han Lu
11 months
Extremely proud to have worked on this project🎉 I hope the findings will be useful for the community to build better QA systems using retrieval-augmented LLMs like @metaai LLaMA-2. Surprised that recall & precision are so much better than F1? Check out sections 4.4 and 5.1!
@vaibhav_adlakha
Vaibhav Adlakha
11 months
🚨 Traditional question-answering metrics under-report the performance of instruction-following models like ChatGPT and Llama2. In fact, they are better than finetuned models but prone to hallucinations. Ditch F1 & use holistic metrics: Recall, K-Precision, answer abstinence 1/n
Tweet media one
Tweet media two
Tweet media three
2
18
119
1
1
13
@xhluca
Xing Han Lu
1 year
Announcing 𝐝𝐥-𝐭𝐫𝐚𝐧𝐬𝐥𝐚𝐭𝐞 v0.3 🎉 It now supports translations across 200 languages via @MetaAI 's NLLB and uses @huggingface 's AutoModelForSeq2SeqLM behind the scenes. Notebook: Release: New Docs:
Tweet media one
1
0
13
@xhluca
Xing Han Lu
17 days
I am surprised the web agents community has not been paying attention to the many excellent new environments that came out recently. The most recent one, AndroidWorld, includes 20 real Android apps; in comparison, prev. benchmarks tend to have 5-6 sites/apps.
@thecrawles
Chris Rawles
1 month
1/ 🤖 Personal assistants of the future will be able to operate computers just like humans — by controlling user interfaces. To help make this vision a reality, we are excited to introduce AndroidWorld, a new benchmark for building and evaluating computer control agents. 🌐📱
5
13
38
1
0
13
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface The training and evaluation code for the current model is all available on our GitHub Repository: We even include the exact YAML configs so you can perfectly run our training pipeline and improve upon them!
Tweet media one
1
0
13
@xhluca
Xing Han Lu
3 months
BAGEL, led by @ShikharMurty at @stanfordnlp , which is an industry collaboration with @GoogleDeepMind . Really cool exploration of human-less agent supervision!
Tweet media one
@ShikharMurty
Shikhar
4 months
Want scalable LLM agents for websites and APIs, without human-labeled data? We propose BAGEL, a method where agents synthesize their own data by exploring the environment first, leading to up to 13% improvement over zero-shot agents, & automated discovery of use-cases in envs!
Tweet media one
2
34
188
1
4
13
@xhluca
Xing Han Lu
2 years
The @SemanticScholar API is simply amazing. You can query all papers, co-authors, and citation info for an author in just one or two queries. You can even get the arxiv link, tl;dr, abstract, and SPECTER embedding; all for free. And it only takes a few minutes to understand.
1
4
13
@xhluca
Xing Han Lu
1 year
I love the trend of training smaller LLMs for longer. When combined with half precision (or even 8-bit/4-bit), software optimizations and cheaper/better hardware, those models will run on consumer/accessible hardware while achieving a good performance on many real-world tasks.
@harmdevries77
Harm de Vries
1 year
Surprised by the loss of LLaMA-7B still going down after 1 trillion tokens? In a new blogpost, I explain why you shouldn't be and argue we haven't reached the limit of the recent trend of training smaller LLMs for longer: Analysis in 🧵👇
Tweet media one
16
126
675
0
0
12
@xhluca
Xing Han Lu
3 years
@abhi1thakur I'd like to but LogisticRegression.from_pretrained('google/sota') returns me an error :/
0
1
12
@xhluca
Xing Han Lu
3 months
Many vector search companies will tweet about how incredible their proprietary tech is, which big name might be using it, how you can start a free trial worth $300 @tomaarsen at @huggingface will tweet about 200+ retrieval models you can start using right away (free+open-source)
@tomaarsen
tomaarsen
3 months
Big update for the Massive Text Embedding Benchmark (MTEB) intended to simplify finding a good embedding model! Model filtering, search, memory usage, model size in parameters. The updated leaderboard: Details in 🧵:
Tweet media one
2
22
91
1
2
12
@xhluca
Xing Han Lu
3 months
I suggest computer vision folks start using this high-resolution, CC-licensed image of the "best city in North America" It has detail, flat regions, shading, and texture. Perfect for Super-res research :)
Tweet media one
1
1
11
@xhluca
Xing Han Lu
4 months
Thank you for making WebLINX () trending on @huggingface Datasets ( #1 in conversational and #11 overall)! Not sure where to start? You only need load_dataset, snapshot_download & pipeline to get started:
Tweet media one
Tweet media two
@xhluca
Xing Han Lu
5 months
Dataset: Our data is large (150GB) and highly heterogeneous (HTML, PNG, MP4, JSON). With 🤗 Datasets, making it available was straightforward (LFS + CLI), and with the data card you can give instructions on using it with load_dataset and snapshot_download.
2
0
5
0
0
11
@xhluca
Xing Han Lu
12 days
Many popular BM25 libraries are built on top of Lucene in Java. Although they are fast, they are not straightforward to use from Python, since they need a Java runtime. For example, to use @Elastic , you need to host a web server and connect to it via a Python client.
Tweet media one
Tweet media two
1
0
10
@xhluca
Xing Han Lu
3 months
So many groups in industry working on web agents/action models/benchmarks now! @GoogleDeepMind -> Pix2Act & WebGUM @ServiceNowRSRCH -> WorkArena + Case Study @MithrilSecurity -> LaVague Salesforce Research -> AgentOhana
2
1
10
@xhluca
Xing Han Lu
9 months
@prajjwal_1 Do you know what will be the timeline for releasing it on Huggingface?
0
0
10
@xhluca
Xing Han Lu
4 months
@ericjang11 @GoogleColab Also @kaggle ! They were the OG platform for allowing up to 100GB datasets
1
0
10
@xhluca
Xing Han Lu
26 days
@m2saxon @RubberDucky_AI @WilliamWangNLP @PMinervini I think it's probably simpler to set up projects at this point - especially since API keys are tied to projects now.
1
0
10
@xhluca
Xing Han Lu
2 months
Incredible results on the WebArena benchmark! 25% marginal improvement vs the GPT-4-based method released last week (great work by @pan_jiayipan from @alsuhr 's group at @berkeley_ai btw). How you use a model matters much more than what you use. Looking forward to browsergym+webllama
Tweet media one
@alex_lacoste_
Alexandre Lacoste
2 months
🧵) We unexpectedly reach 🥇 on the leaderboard of #WebArena . While 25% is still far from human performance, it is a large jump compared to the next best result. The performance gain is largely attributed to #BrowserGym leaderboard:
5
19
54
0
0
10
@xhluca
Xing Han Lu
11 days
Since 2009, over 16,300 papers have mentioned "BM25". Yet, Robertson & Zaragoza (2009)'s "The Probabilistic Relevance Framework: BM25 and Beyond" only has ~3K citations. Obviously some might have cited other variants (2nd most cited is at 900), but it still seems like a big discrepancy.
2
0
10
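For reference, the scoring function from that framework, in one common variant (exact parameterization differs across implementations):

```latex
\operatorname{score}(D,Q) = \sum_{t \in Q} \operatorname{IDF}(t)\cdot
\frac{f(t,D)\,(k_1+1)}{f(t,D) + k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\operatorname{IDF}(t) = \ln\!\left(\frac{N-n(t)+0.5}{n(t)+0.5}+1\right)
```

where f(t,D) is the frequency of term t in document D, |D| the document length, avgdl the average document length over the corpus, N the number of documents, n(t) the number of documents containing t, and k1, b are free parameters (commonly k1 ≈ 1.2–2.0 and b = 0.75).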
@xhluca
Xing Han Lu
4 months
@natfriedman Just one? I can count at least two. VisualWebArena: SeeAct: Code is on GH for both, so one can easily improve upon them given sufficient engineering skills.
@ysu_nlp
Yu Su
6 months
Generalist web agents may get here sooner than we thought---introducing SeeAct, a multimodal web agent built on GPT-4V(ision). What's this all about? > Back in June 2023, when we released Mind2Web () and envisioned generalist web agent, a language agent
18
149
648
1
2
10
@xhluca
Xing Han Lu
1 month
@willkurt @pfau What's interesting is that some senior ML/NLP researchers will write (occasionally or frequently) code. Off the top of my head, I can think of Chris Manning & Graham Neubig publicly contributing to open-source, and Kyunghyun Cho has tweeted about his JAX/Torch experiments. I wonder if...
1
1
4
@xhluca
Xing Han Lu
3 months
SeeAct, a multimodal web agent built on GPT-4V, led by @boyuan__zheng @BoyuGouNLP in @ysu_nlp 's lab at @osunlp . Really cool experiments on grounding!
Tweet media one
@ysu_nlp
Yu Su
6 months
Generalist web agents may get here sooner than we thought---introducing SeeAct, a multimodal web agent built on GPT-4V(ision). What's this all about? > Back in June 2023, when we released Mind2Web () and envisioned generalist web agent, a language agent
18
149
648
1
2
9
@xhluca
Xing Han Lu
2 months
@YifanJiang17 OSU has 100+ H100s and Princeton (PLI) has 300 H100s. PLI has fewer PIs than Stanford NLP
1
0
9
@xhluca
Xing Han Lu
2 years
Reproducibility in #ML is great and I'm glad to see it gain traction. But can we talk about sharing reusable/extensible code? Reproducible: git clone my-repo; pip install .; python train_and_evaluate_all.py Reusable:
2
2
8
@xhluca
Xing Han Lu
1 year
Very insightful work on the positional encoding in the context of length generalization. With larger contexts being made available (GPT-4 at 32k and Claude at 100k), the conclusion will likely be very useful in designing the next generation of long-context models.
@a_kazemnejad
Amirhossein Kazemnejad
1 year
🚨Stop using positional encoding (PE) in Transformer decoders (e.g. GPTs). Our work shows 𝗡𝗼𝗣𝗘 (no positional encoding) outperforms all variants like absolute, relative, ALiBi, Rotary. A decoder can learn PE in its representation (see proof). Time for 𝗡𝗼𝗣𝗘 𝗟𝗟𝗠𝘀🧵[1/n]
Tweet media one
Tweet media two
44
247
1K
0
2
9
@xhluca
Xing Han Lu
4 months
Glad to see more work in conversational web navigation! This time, WorkArena proposes an environment-based approach, which complements the observation-based approach of WebLINX. It is also more specialized on professional tasks, which could help improve worker productivity.
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
4 months
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? Introduces benchmarks for evaluating LMs in automating knowledge work tasks, revealing a gap in full task automation and differences between open & closed-source models
Tweet media one
2
24
110
1
0
9
@xhluca
Xing Han Lu
3 months
@giffmana -> Has published at CV conferences (CVPR, ICCV, ECCV) -> Has worked extensively with language models Recruiter: Why is no one qualified for this position???
3
0
9
@xhluca
Xing Han Lu
4 months
@DynamicWebPaige 1: Asst Prof actually advising the project 2-4: tenure profs attending bimonthly meetings and LGTM'ing the overleaf 5: grad student doing 90% of the work 6: ugrad intern, trying hard to be helpful but keeps breaking the codebase and merging to main 7: Sr. PhD candidate who pro...
2
0
7
@xhluca
Xing Han Lu
4 months
As an individual dev, I find AMD/ROCm really user-unfriendly, borderline unusable: rocm-smi gets deprecated and there are no straightforward instructions on how to install it; the official PyTorch instructions do not work for ROCm, so you need to use an obscure guide deep in the ROCm docs: docker...
1
0
7
@xhluca
Xing Han Lu
1 year
@oscmansan @CVPR 🛑 stop right there! Here's 💯new AI 🤖 tools that came out in the past 5 minutes ⏰ that you absolutely need to learn if you don't wanna fall behind 🏃 Let's start 1/102 👇
1
0
8
@xhluca
Xing Han Lu
2 months
Should I go to #icml2024 in a lynx costume to stand out?
@jxmnop
jack morris
2 months
ok so i'm at ICLR; this is my first machine learning conference. as you might imagine, it's all very fun and exciting. but these poster sessions are absolutely INSANE this is an airplane hanger crammed with hundreds of posters, each with dozens of people talking over each other
Tweet media one
Tweet media two
14
7
216
3
0
8
@xhluca
Xing Han Lu
2 months
@teortaxesTex The reported score for GPT-4V includes turn-level screenshots from the dataset. So even with a vision advantage, 4V struggles compared to much smaller finetuned models. Vision -> action works well! We finetuned based on pix2act ( @ptshaw2 et al) and it was better than 4V 0-shot.
1
0
8
@xhluca
Xing Han Lu
12 days
Beyond that, it supports memory-mapping instead of loading everything into memory, which dramatically reduces RAM usage. This lets you query millions of documents in real time on a single CPU thread. Here's a side-by-side comparison (BM25S starts at ~10s)
1
0
9
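The memory-mapping idea can be sketched with NumPy. Illustrative only, not the BM25S code: a hypothetical on-disk score matrix is read lazily, so a query only touches the rows it needs:

```python
import os
import tempfile
import numpy as np

# Build a stand-in "index": a (vocab_size x n_docs) score matrix saved to disk.
path = os.path.join(tempfile.mkdtemp(), "scores.npy")
scores = np.arange(10_000 * 512, dtype=np.float32).reshape(10_000, 512)
np.save(path, scores)

# mmap_mode="r" does NOT load the array into RAM; pages are read on demand,
# so summing a few rows touches only a tiny slice of the file.
index = np.load(path, mmap_mode="r")
query_token_ids = [3, 17, 42]                 # tokens appearing in a query
doc_scores = index[query_token_ids].sum(axis=0)
print(doc_scores.shape)  # -> (512,)
```

The same pattern scales to matrices far larger than RAM: resident memory stays proportional to the rows a query touches, not to the size of the index.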
@xhluca
Xing Han Lu
3 months
Tweet media one
0
0
8
@xhluca
Xing Han Lu
5 months
Couldn't have answered better than @SemanticScholar 's AI summarizer🙃
Tweet media one
0
0
8
@xhluca
Xing Han Lu
12 days
I'm sad to see that Reddit is no longer a good place to share ML/NLP projects these days (apart from /r/localllama which is amazing). Issues with copy/pasting/formatting, posts getting removed again and again for many reasons, very little engagement (100 users online out of 3M).
1
0
8
@xhluca
Xing Han Lu
3 years
@abhi1thakur @kaggle I never put images in my notebooks. The time spent picking an image could've been used to stack more layers on my Roberta models
2
0
7
@xhluca
Xing Han Lu
8 months
@YiTayML What % of Google Brain was non-PhD (that did not go through the residency program)?
1
0
6
@xhluca
Xing Han Lu
3 years
I find it interesting how libraries like @huggingface 's have not only enabled better reproducibility, but also better model/task transferability in NLP. A few years ago, you might have been able to smoothly reproduce the results of a paper by just following the readme, but... (1/n)
1
0
7
@xhluca
Xing Han Lu
3 months
New Google Scholar Plugin on Chrome is pretty neat! I especially like the night mode (perfect for reading papers at night without burning my eyes)
Tweet media one
Tweet media two
Tweet media three
1
0
7
@xhluca
Xing Han Lu
11 months
The advantage of attending a conference at McGill/Montreal is that you can pay 1/4 of big conference registration fees + you get to meet people working on the same subject as you. Really great for everyone that CoLLAs and TMLR decided to partner this year!
@thegautamkamath
Gautam Kamath
11 months
10/12 papers in the journal track are published at @TmlrOrg ! A great way to submit to TMLR but still have an opportunity to present your work at a conference.
0
3
18
1
0
7
@xhluca
Xing Han Lu
11 months
I think SILO will be extremely important for domains with asymmetric public-private data availability. It's already clear how it'll be useful for medical, enterprise and personalized use cases, but I can imagine its impact will reach much further than this.
@ssgrn
Suchin Gururangan
11 months
Feel risky to train your language model on copyrighted data? Check out our new LM called SILO✨, with co-lead @sewon__min Recipe: collect public domain & permissively licensed text data, fit parameters on it, and use the rest of the data in an inference-time-only datastore.
2
55
241
1
3
7
@xhluca
Xing Han Lu
3 months
An exception is @weaviate_io which is fully open source and @_jphwang has been doing an incredible job with tons of talks/guides/workshops, e.g.
@_jphwang
JP Hwang
8 months
I had a blast talking about AI tech and web apps at @devreach (Tbh the best part was the amazing people, but this was a close 2nd!) 😉 The talk is beginner friendly. Check it out if interested in #AI #search or #LLMs for web apps!
1
2
7
1
2
7
@xhluca
Xing Han Lu
3 months
Tweet media one
0
0
7
@xhluca
Xing Han Lu
11 months
@TaliaRinger I wish there were a unique fediverse ID that could be used in any instance, so that if an account gets banned or someone tries to impersonate you, you could easily move to a new account with your subscribers/subscriptions.
6
2
7
@xhluca
Xing Han Lu
3 months
Very interesting ideas explored in this paper! Also really like that the iterative improvement works so well.
Tweet media one
@ShikharMurty
Shikhar
4 months
Want scalable LLM agents for websites and APIs, without human-labeled data? We propose BAGEL, a method where agents synthesize their own data by exploring the environment first, leading to up to 13% improvement over zero-shot agents, & automated discovery of use-cases in envs!
Tweet media one
2
34
188
0
1
7
@xhluca
Xing Han Lu
4 months
@dhuynh95 Great release! Would love to see how well WebLINX models () perform inside the framework :)
2
0
7
@xhluca
Xing Han Lu
1 year
Can't wait for 0.5-bit inference with a 65B param model trained with O(1/n) attention and running on a Mx iPad.
@drjwrae
Jack Rae
1 year
By this point I'm expecting Tri Dao to derive an O(1/n) attention implementation
0
8
152
0
0
7
@xhluca
Xing Han Lu
3 years
I'm really impressed by @pyodide 's support for py<>js proxies. You can easily create @reactjs function components with useState hooks and render them with React DOM - all with pure Python code.
Tweet media one
2
0
6
@xhluca
Xing Han Lu
10 months
I think anyone training ML models should spend 30-45min reading through the Hydra docs (). Even if you don't need it now, you will likely need it one day.
1
0
5
@xhluca
Xing Han Lu
7 months
@benno_krojer @NeurIPSConf For #1 , maybe NeurIPS could partner with Semantic Scholar to create a "Related papers" section on the website? Something similar to the existing feature on Semantic Scholar, except it's filtered to only show NeurIPS papers linking to the schedule.
Tweet media one
1
0
6
@xhluca
Xing Han Lu
3 months
Converting any LLM decoder into a retrieval model is pretty neat, since you won't need to train a separate encoder. Also really good to see a method solely using open data. Too many retrievers achieve good performance but are impossible to reproduce due to private data.
@vaibhav_adlakha
Vaibhav Adlakha
3 months
We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in the unsupervised and supervised category (among the models trained only on publicly available data). 🧵1/N Paper:
Tweet media one
14
163
858
0
0
6
@xhluca
Xing Han Lu
3 months
@EugeneVinitsky Would you take a (fully-funded) student with 0 programming/optimization/ML experience today?
1
0
6