Xing Han Lu

@xhluca

1,636
Followers
227
Following
174
Media
1,841
Statuses

🔨 Web Agents @Mila_Quebec

๐Ÿ–ฅ๏ธ+๐Ÿฆ™=๐Ÿš€
Joined December 2017
Pinned Tweet
@xhluca
Xing Han Lu
2 months
Delighted to release ✨Llama-3-8B-Web✨, the most capable agent built for web navigation by following instructions and replying💬. It surpasses GPT-4V* by 18% on WebLINX, a benchmark for web navigation with dialogue. Model: Code:
Tweet media one
Tweet media two
19
187
1K
@xhluca
Xing Han Lu
12 days
Announcing ⚡BM25S, a fast lexical retrieval library! 🏎️ Up to 500x faster than the most popular Python lib, matches @Elastic search (BM25 default) 🤗 First BM25 library that is directly integrated with @huggingface hub: load or save in 1 line! GitHub:
Tweet media one
Tweet media two
Tweet media three
17
106
688
@xhluca
Xing Han Lu
3 years
Now that @huggingface 's transformers 4.4.0 is out, I'm happy to release dl-translate, a library for text translation between 50 languages, built on top of 🤗 transformers and @facebookai 's mBART. Repo: Docs:
Tweet media one
2
36
222
@xhluca
Xing Han Lu
23 days
Another solid work by the Android agents team at @GoogleDeepMind . 800+ app interactions is massive. It would probably cover most of the use cases we can see on Android (so one can spend less time thinking about generalization and focus on doing well on validation).
Tweet media one
6
45
220
@xhluca
Xing Han Lu
1 year
CTRL () by Keskar+ is such an under-appreciated paper, considering it introduced the idea of control codes/prompts for controllable generation (based on GPT-2) almost 3 years before InstructGPT.
4
29
151
@xhluca
Xing Han Lu
3 months
Is this the DSPy moment of text-to-image generation? Congratulations @oscmansan @Piovrasca et al!
Tweet media one
@_akhaliq
AK
3 months
Improving Text-to-Image Consistency via Automatic Prompt Optimization Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress,
Tweet media one
4
43
153
0
16
97
@xhluca
Xing Han Lu
3 months
I'm seriously surprised I am not seeing more hype around @valence_ai @PrescientDesign @IsomorphicLabs Those groups are clearly working on the most impactful applications of ML and are at the forefront of biotech breakthroughs.
4
7
75
@xhluca
Xing Han Lu
3 years
I think this @huggingface blog post by @PatrickPlaten is by far my favorite resource on post-hoc text generation
0
15
77
@xhluca
Xing Han Lu
3 years
The @huggingface hub is still new but it's already super convenient for downloading transformers. I was looking for a way to download libraries + models so they could be used offline. In total it took 6 lines (4 in terminal + 2 in python) thanks to the hub ⬇️
Tweet media one
0
17
77
@xhluca
Xing Han Lu
5 months
I'm going to make a thread later on why I think WebLINX () is really exciting, but I want to take a moment to say how amazing the @huggingface ecosystem has become. You can host the crucial parts of a project all at the same place. Let's take a look 👇
@sivareddyg
Siva Reddy
5 months
Introducing WebLINX 🍯, a large benchmark for AI agents navigating real websites with multi-turn dialogue. 100K interactions across 2300 demonstrations on 150 real-world websites. Includes HTML, screenshots and videos. Tests unseen sites, tasks, blind users
Tweet media one
Tweet media two
Tweet media three
7
66
252
1
16
77
@xhluca
Xing Han Lu
1 year
This is probably the nicest looking poster at #ACL2023NLP (and also very interesting + insightful!)
Tweet media one
3
6
74
@xhluca
Xing Han Lu
5 months
@srush_nlp @BlancheMinerva The S6's only weakness was: The model still has a quadratic memory requirement during training like Transformers. Except it doesn't...
2
0
68
@xhluca
Xing Han Lu
2 months
@AIatMeta Not only is Llama-3 better than GPT-4V (*in a zero-shot setting), it also surpasses all other finetuned models by a large margin, including the Flan-based MindAct-3B and GPT-3.5-Turbo (trained for same # epochs). We even observe a 15% relative improvement wrt last-gen Llama-7B.
Tweet media one
2
5
64
@xhluca
Xing Han Lu
2 months
The most incredible thing about the new LLM2Vec Llama-3 is that it is top-10 on MTEB with ONLY 168MB extra usage over Llama-3, since it uses LoRA. That's 0.168GB! This, with batching, makes RAG super fast and low-memory. In comparison, GTE-Large is 1.5GB and gte-Qwen is 30GB.
Tweet media one
@vaibhav_adlakha
Vaibhav Adlakha
2 months
LLM2Vec meets Meta-Llama-3 → new SOTA among models trained on public data 🥇. We applied our LLM2Vec approach to Meta-Llama-3-8B and it works like a charm. Each step of the LLM2Vec pipeline improves the model's performance on embedding tasks 👇. 1/N Models:
3
22
156
3
5
57
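The ~168MB figure can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming a rank-16 LoRA adapter on all seven linear projections of each Llama-3-8B layer (hidden size 4096, KV dim 1024, MLP dim 14336, 32 layers) stored in fp32; the actual LLM2Vec configuration may differ:

```python
# Hypothetical LoRA size estimate (rank and target modules are assumptions,
# not the published LLM2Vec config). LoRA replaces a full weight update with
# two small factors A (r x d_in) and B (d_out x r).
def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

d, kv, mlp, layers, r = 4096, 1024, 14336, 32, 16  # Llama-3-8B shapes, assumed rank
per_layer = (
    lora_params(d, d, r) * 2        # q_proj, o_proj
    + lora_params(d, kv, r) * 2     # k_proj, v_proj (grouped-query attention)
    + lora_params(d, mlp, r) * 2    # gate_proj, up_proj
    + lora_params(mlp, d, r)        # down_proj
)
total = per_layer * layers
print(f"{total/1e6:.1f}M params -> {total*4/1e6:.0f} MB in fp32")  # -> 41.9M params -> 168 MB in fp32
```

Under these assumptions the adapter comes out to roughly 42M parameters, i.e. about 168 MB in fp32, consistent with the figure in the tweet.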
@xhluca
Xing Han Lu
1 year
There are great UIs for @metaai 's LLaMA-2, e.g.: @yvrjsharma -> @a16z -> However, they rely on web APIs like TGI and @replicatehq , so I built a @gradio UI, running @huggingface Transformers locally. Code:
4
10
52
@xhluca
Xing Han Lu
10 months
The new @huggingface Collections feature is pretty neat - it allows you to gather everything about a project in the same place - manuscript, dataset, models, etc.
Tweet media one
2
10
47
@xhluca
Xing Han Lu
2 months
To create Llama-3-8B-Web, we finetuned @AIatMeta 's Llama-3-8B-Instruct (released last Thursday) on 24K web interactions from the WebLINX training set, including clicking, typing, submitting forms, and replies. The dataset covers 150 websites from 15 geographic locations.
Tweet media one
Tweet media two
2
0
42
@xhluca
Xing Han Lu
1 year
There are quite a few vector DB startups ( @pinecone @weaviate_io etc.), but which startups focus on producing retrievers or embeddings-as-a-service? @OpenAI does not offer fine-tuning, whereas @CohereAI embeddings do not seem to be top-10 on MTEB:
9
8
42
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface Is that all? Of course not! We are also launching the 🖥️WebLlama project (), with the goal of making it easy for you to train, evaluate, and deploy Llama-3 agents! We want to build agents that won't replace users, but equip them with powerful assistants.
Tweet media one
1
5
41
@xhluca
Xing Han Lu
1 year
Happy to share our #eacl2023 paper, which introduces a new dataset for Table Retrieval in Conversations: the ꜱᴛᴀᴛᴄᴀɴ ᴅɪᴀʟᴏɢᴜᴇ ᴅᴀᴛᴀꜱᴇᴛ Paper: Homepage: A work with @sivareddyg and @harmdevries77 🧵👇
2
16
38
@xhluca
Xing Han Lu
1 year
@soumithchintala Switch Transformers-style sparse mixture, or Kaggle-style mixture? Former is more than just "little trick".
2
0
35
@xhluca
Xing Han Lu
9 months
@ylecun @ClementDelangue Scikit-Learn --> Inria (Paris) Torch --> Idiap/EPFL (Switzerland) Theano --> Lisa/UdeM (Montreal) Keras --> François Chollet FAISS, DINO, DETR, LLAMA --> FAIR Paris Tokenizers, Optimum, Accelerate --> Huggingface Wonder if there's something different about speaking French..
0
1
32
@xhluca
Xing Han Lu
19 days
Any good web/navigation agents survey out there? I know there's "A Survey on Large Language Model based Autonomous Agents" by Wang+ but there are a few important works that came out after v1 (Android in the Wild, AndroidWorld, OSWorld, VisualWebArena, WorkArena, etc.)
Tweet media one
Tweet media two
1
3
26
@xhluca
Xing Han Lu
3 years
Turns out running Jupyter Dash on @kaggle is a piece of cake thanks to pyngrok. Check out the notebook: cc @plotlygraphs
3
2
26
@xhluca
Xing Han Lu
2 months
@AIatMeta Llama-3-Web is tightly integrated with the @huggingface ecosystem: you can load the dataset with 🤗Datasets and the agent from the 🤗Hub with pipeline, then predict actions in <10 lines. For that, I'm really thankful for the hard work by the team, especially today!
Tweet media one
@ClementDelangue
clem ๐Ÿค—
2 months
The GPT4 of datasets took down Hugging Face, sorry all 😅😅😅
25
43
907
1
2
26
@xhluca
Xing Han Lu
1 year
@amilios I think this shows the importance of decoupling modelling from evaluation, and standardizing the latter.
1
1
26
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface To train a good model, great data is everything! That's why we are focusing on adding more data to our training mix to create even better agents. Our next target is @osunlp 's Mind2Web (see video), a dataset for autonomous navigation covering 137 websites
@ysu_nlp
Yu Su
1 year
What would be the most wild environment for grounding & empowering LLMs? 👉The entire Internet! 📢 Mind2Web: Towards a Generalist Agent for the Web () Led by amazing @osunlp student @XiangDeng1 #NLProc
Tweet media one
4
27
89
1
3
24
@xhluca
Xing Han Lu
3 years
You can easily create datasets from @kaggle notebooks, but they will be limited by the output size (~20GB). If you want to create/update larger datasets (up to 100GB), you will need the Kaggle #API . I created this short tutorial to show you how:
Tweet media one
1
5
23
@xhluca
Xing Han Lu
2 months
@AIatMeta At this point, you probably expect a cool video showing Llama-3-Web in action! Well, you'll need to be patient😉 But it's crucial to remember that demo ≠ good systematic performance! That's why Llama-3-Web is evaluated on 4 OOD test splits covering 1000+ real-world demos.
Tweet media one
1
1
20
@xhluca
Xing Han Lu
2 years
Cool feature by @SemanticScholar : Integrated PDF reader where the hyperlinks bring you directly to a paper's Semantic Scholar page.
Tweet media one
1
2
19
@xhluca
Xing Han Lu
5 months
Thank you for making @huggingface the amazing open research platform it is today @julien_c @Thom_Wolf @LysandreJik @qlhoest @mmitchell_ai @SanhEstPasMoi @_akhaliq @moi_anthony @mervenoyann @_philschmid (and many members of the team + alumni)
@xhluca
Xing Han Lu
5 months
I'm going to make a thread later on why I think WebLINX () is really exciting, but I want to take a moment to say how amazing the @huggingface ecosystem has become. You can host the crucial parts of a project all at the same place. Let's take a look 👇
1
16
77
2
1
18
@xhluca
Xing Han Lu
3 months
Two weeks ago I shared about the industry groups working on agents, but there are many academic/non-profit labs also working on web agents recently! A quick thread below of works released/updated this year (not exhaustive, so please comment if I missed anything)
@xhluca
Xing Han Lu
3 months
So many groups in industry working on web agents/action models/benchmarks now! @GoogleDeepMind -> Pix2Act & WebGUM @ServiceNowRSRCH -> WorkArena + Case Study @MithrilSecurity -> LaVague Salesforce Research -> AgentOhana
2
1
10
2
2
18
@xhluca
Xing Han Lu
1 year
@karpathy I think it depends on the use case. Fast prototyping, data displaying, notebook-to-ui ML demos that fit common patterns, building apps in just a few lines Enterprise-ready, REST-based, scalable apps
1
1
16
@xhluca
Xing Han Lu
2 years
Starting to feel @weights_biases is the @NotionHQ of model training.
0
3
17
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface @osunlp Evaluation is very important! A goal of WebLlama will be to provide reliable evaluation for many benchmarks (incl. M2W). Dynamic benchmarks are on our mind, and a few exciting ones include @shuyanzhxyc 's WebArena, @kohjingyu 's VisualWebArena and @alexandredrouin 's WorkArena.
Tweet media one
Tweet media two
Tweet media three
2
1
17
@xhluca
Xing Han Lu
2 years
Nice easter egg by @weights_biases for 🥧day (those are auto-generated names when you create a new run).
Tweet media one
0
3
17
@xhluca
Xing Han Lu
3 years
@karpathy I'm curious how much of the improvement is in the architecture vs the improved training tools thanks to timm, considering Resnet-50 can achieve 80%+ without any architecture change:
2
0
16
@xhluca
Xing Han Lu
12 days
With Python-based implementations like BM25S and Rank-BM25, you can tokenize your text, index, and retrieve in ~10 lines. However, a straightforward NumPy implementation may not achieve the same speed as the Java-based ones. BM25S is different: it uses SciPy to store eagerly computed scores.
Tweet media one
1
0
16
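To make the "~10 lines" claim concrete, here is a self-contained toy sketch of the tokenize → index → retrieve loop in plain Python. This is the classic Okapi BM25 formula, not the actual BM25S API; the corpus and parameters are made up:

```python
import math
from collections import Counter

# Toy sketch of lexical retrieval (illustrative Okapi BM25, NOT the BM25S API).
corpus = ["a cat is a feline animal", "a dog is a canine animal", "fish live in water"]
docs = [doc.split() for doc in corpus]             # whitespace "tokenizer"
avgdl = sum(len(d) for d in docs) / len(docs)      # average document length
df = Counter(t for d in docs for t in set(d))      # document frequency per token

def bm25(query, k1=1.5, b=0.75):
    """Return the index of the best-scoring document for a query string."""
    scores = []
    for d in docs:
        tf = Counter(d)
        idf = lambda t: math.log((len(docs) - df[t] + 0.5) / (df[t] + 0.5) + 1)
        scores.append(sum(
            idf(t) * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            for t in query.split()
        ))
    return max(range(len(docs)), key=scores.__getitem__)

print(corpus[bm25("does the cat purr")])  # -> a cat is a feline animal
```

Real libraries add smarter tokenization (stemming, stopwords) and vectorized scoring, but the core loop is this small.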
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface @osunlp @shuyanzhxyc @kohjingyu @alexandredrouin Finally, it doesn't matter how good an agent is if we can't use it! We want WebLlama agents to be easily deployed for end users, so we are planning integrations with deployment platforms, including @ServiceNowRSRCH 's BrowserGym (demo below), LaVague, Playwright, and more!
4
1
16
@xhluca
Xing Han Lu
2 years
I only choose my hyperparameters from an Ouija board.
@ChrSzegedy
Christian Szegedy
2 years
New jobs in the 21st century: Model restart specialist Hyperparameter psychic Prompt engineer Model janitor Tensor shape mediator Quantum state observer Model footprint accountant
20
100
723
0
3
15
@xhluca
Xing Han Lu
2 months
Lots of quality work happening at @uwaterloo CS in @WenhuChen 's lab!
@AdeenaY8
Adina Yakup
2 months
虎头帮 TIGER-LAB🐯 The name caught my attention first, then I realized they were behind all these cool works! ⚔️ GenAI-Arena ⚔️ : Benchmarking Visual Generative Models in the Wild ✨Mantis: Optimized for multi-image reasoning with text/image format
1
7
41
1
1
14
@xhluca
Xing Han Lu
3 months
@jackclarkSF Princeton-NLP has 22 graduate students - 13.6 H100s/student Meta has 67K employees - 5.2 H100s/employee Meta would need <38% of their workforce being in engineering to match Princeton NLP.
4
0
14
@xhluca
Xing Han Lu
12 days
More precisely, it computes all possible token scores for every document in a corpus and stores them in a sparse matrix (an idea inspired by @jxmnop 's bm25-pt). Then, given a query, you can sum up the relevant tokens to get a score for each document.
@jxmnop
jack morris
4 months
implemented a fast, GPU-enabled BM25 in pytorch! BM25 is a simple search algorithm from the 70s that works as well as neural networks for most search problems; for all the advances we've made in neural text retrieval, it's still around got near SOTA on stanford LoCO benchmark
Tweet media one
9
62
650
3
1
14
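The eager-scoring mechanism described above can be sketched with SciPy directly. Toy numbers throughout; this illustrates the idea, not the BM25S implementation:

```python
import numpy as np
from scipy import sparse

# Illustrative sketch of "eager" BM25 scoring: precompute a (vocab x docs)
# matrix of per-token scores once, so a query is answered by summing a
# handful of sparse rows. The scores below are made up.
vocab = {"cat": 0, "dog": 1, "fish": 2}
token_doc_scores = sparse.csr_matrix(np.array([
    [1.2, 0.0, 0.0],   # "cat" only contributes to doc 0
    [0.0, 1.1, 0.0],   # "dog" only contributes to doc 1
    [0.0, 0.0, 0.9],   # "fish" only contributes to doc 2
]))

def retrieve(query_tokens, k=1):
    ids = [vocab[t] for t in query_tokens if t in vocab]   # ignore OOV tokens
    doc_scores = np.asarray(token_doc_scores[ids].sum(axis=0)).ravel()
    return np.argsort(doc_scores)[::-1][:k]               # top-k doc indices

print(retrieve(["cat", "purr"]))  # -> [0]
```

Because scoring happens at index time, query time is just a sparse row-sum plus a top-k selection, which is why this approach can be so fast.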
@xhluca
Xing Han Lu
2 months
Adding instructions for finetuning llama-3 on weblinx. We'll find out tomorrow how good of a web agent @AIatMeta 's LLaMA-3 is
Tweet media one
2
2
14
@xhluca
Xing Han Lu
12 days
More benchmarks available in the repo: Here's a collection of indices for public BEIR datasets: BM25S stands on the shoulders of giants: Rank-bm25 (1st python implementation), Pyserini and bm25-pt (which inspired this project).
3
0
13
@xhluca
Xing Han Lu
3 months
WebLINX is not just about making a large benchmark available to researchers. We wanted it to be easy to use and avoid wasting days preprocessing complex web data, so we built a library: You can load+run models in minutes on Colab:
Tweet media one
Tweet media two
@xhluca
Xing Han Lu
5 months
I'm going to make a thread later on why I think WebLINX () is really exciting, but I want to take a moment to say how amazing the @huggingface ecosystem has become. You can host the crucial parts of a project all at the same place. Let's take a look 👇
1
16
77
1
3
12
@xhluca
Xing Han Lu
3 months
@danfei_xu Neurips should make this into an opportunity to promote CS/ML to local students, e.g. through talks across high schools in the host city before/during/after the conference
0
0
13
@xhluca
Xing Han Lu
11 months
Extremely proud to have worked on this project🎉 I hope the findings will be useful for the community to build better QA systems using retrieval-augmented LLMs like @metaai LLaMA-2. Surprised that recall & precision are so much better than F1? Check out sections 4.4 and 5.1!
@vaibhav_adlakha
Vaibhav Adlakha
11 months
🚨 Traditional question-answering metrics under-report the performance of instruction-following models like ChatGPT and Llama2. In fact, they are better than finetuned models but prone to hallucinations. Ditch F1 & use holistic metrics: Recall, K-Precision, answer abstinence 1/n
Tweet media one
Tweet media two
Tweet media three
2
18
119
1
1
13
@xhluca
Xing Han Lu
1 year
Announcing 𝐝𝐥-𝐭𝐫𝐚𝐧𝐬𝐥𝐚𝐭𝐞 v0.3 🎉 It now supports translations across 200 languages via @MetaAI 's NLLB and uses @huggingface 's AutoModelForSeq2SeqLM behind the scenes. Notebook: Release: New Docs:
Tweet media one
1
0
13
@xhluca
Xing Han Lu
17 days
I am surprised the web agents community has not been paying attention to the many excellent new environments that came out recently. The most recent one, AndroidWorld, includes 20 real Android apps; in comparison, prev. benchmarks tend to have 5-6 sites/apps.
@thecrawles
Chris Rawles
1 month
1/ 🤖 Personal assistants of the future will be able to operate computers just like humans — by controlling user interfaces. To help make this vision a reality, we are excited to introduce AndroidWorld, a new benchmark for building and evaluating computer control agents. 🌐📱
5
13
38
1
0
13
@xhluca
Xing Han Lu
2 months
@AIatMeta @huggingface The training and evaluation code for the current model is all available on our GitHub Repository: We even include the exact YAML configs so you can perfectly run our training pipeline and improve upon them!
Tweet media one
1
0
13
@xhluca
Xing Han Lu
3 months
BAGEL, led by @ShikharMurty at @stanfordnlp , which is an industry collaboration with @GoogleDeepMind . Really cool exploration of human-less agent supervision!
Tweet media one
@ShikharMurty
Shikhar
4 months
Want scalable LLM agents for websites and APIs, without human-labeled data? We propose BAGEL, a method where agents synthesize their own data by exploring the environment first, leading to up to 13% improvement over zero-shot agents, & automated discovery of use-cases in envs!
Tweet media one
2
34
188
1
4
13
@xhluca
Xing Han Lu
2 years
The @SemanticScholar API is simply amazing. You can query all papers, co-authors, and citation info for an author in just one or two queries. You can even get the arxiv link, tl;dr, abstract, and SPECTER embedding; all for free. And it only takes a few minutes to understand.
1
4
13
@xhluca
Xing Han Lu
1 year
I love the trend of training smaller LLMs for longer. When combined with half precision (or even 8-bit/4-bit), software optimizations and cheaper/better hardware, those models will run on consumer/accessible hardware while achieving a good performance on many real-world tasks.
@harmdevries77
Harm de Vries
1 year
Surprised by the loss of LLaMA-7B still going down after 1 trillion tokens? In a new blogpost, I explain why you shouldn't be and argue we haven't reached the limit of the recent trend of training smaller LLMs for longer: Analysis in 🧵👇
Tweet media one
16
126
675
0
0
12
@xhluca
Xing Han Lu
3 years
@abhi1thakur I'd like to but LogisticRegression.from_pretrained('google/sota') returns me an error :/
0
1
12
@xhluca
Xing Han Lu
3 months
Many vector search companies will tweet about how incredible their proprietary tech is, which big name might be using it, how you can start a free trial worth $300 @tomaarsen at @huggingface will tweet about 200+ retrieval models you can start using right away (free+open-source)
@tomaarsen
tomaarsen
3 months
Big update for the Massive Text Embedding Benchmark (MTEB) intended to simplify finding a good embedding model! Model filtering, search, memory usage, model size in parameters. The updated leaderboard: Details in 🧵:
Tweet media one
2
22
91
1
2
12
@xhluca
Xing Han Lu
3 months
I suggest computer vision folks start using this high-resolution, CC-licensed image of the "best city in North America" It has detail, flat regions, shading, and texture. Perfect for Super-res research :)
Tweet media one
1
1
11
@xhluca
Xing Han Lu
4 months
Thank you for making WebLINX () trending on @huggingface Datasets ( #1 in conversational and #11 overall)! Not sure where to start? You only need load_dataset, snapshot_download & pipeline to get started:
Tweet media one
Tweet media two
@xhluca
Xing Han Lu
5 months
Dataset: Our data is large (150GB) and highly heterogeneous (HTML, PNG, MP4, JSON). With 🤗 Datasets, making it available was straightforward (LFS + CLI), and with the data card you can give instructions on using it with load_dataset and snapshot_download.
2
0
5
0
0
11
@xhluca
Xing Han Lu
12 days
Many popular BM25 libraries are built on top of Lucene in Java. Although they are fast, they are not straightforward to use from Python, since they need a Java runtime. For example, to use @Elastic , you need to host a web server and connect to it via a Python client.
Tweet media one
Tweet media two
1
0
10
@xhluca
Xing Han Lu
3 months
So many groups in industry working on web agents/action models/benchmarks now! @GoogleDeepMind -> Pix2Act & WebGUM @ServiceNowRSRCH -> WorkArena + Case Study @MithrilSecurity -> LaVague Salesforce Research -> AgentOhana
2
1
10
@xhluca
Xing Han Lu
9 months
@prajjwal_1 Do you know what will be the timeline for releasing it on Huggingface?
0
0
10
@xhluca
Xing Han Lu
4 months
@ericjang11 @GoogleColab Also @kaggle ! They were the OG platform for allowing up to 100GB datasets
1
0
10
@xhluca
Xing Han Lu
26 days
@m2saxon @RubberDucky_AI @WilliamWangNLP @PMinervini I think it's probably simpler to set up projects at this point - especially since API keys are tied to projects now.
1
0
10
@xhluca
Xing Han Lu
2 months
Incredible results on the WebArena benchmark! 25% marginal improvement vs the GPT-4-based method released last week (great work by @pan_jiayipan from @alsuhr 's group at @berkeley_ai btw). How you use a model matters much more than what you use. Looking forward to browsergym+webllama
Tweet media one
@alex_lacoste_
Alexandre Lacoste
2 months
🧵) We unexpectedly reach 🥇 on the leaderboard of #WebArena . While 25% is still far from human performance, it is a large jump compared to the next best result. The performance gain is largely attributed to #BrowserGym leaderboard:
5
19
54
0
0
10
@xhluca
Xing Han Lu
11 days
Since 2009, over 16,300 papers have mentioned "BM25". Yet, Robertson & Zaragoza (2009)'s "The Probabilistic Relevance Framework: BM25 and Beyond" only has ~3K citations. Obviously some might have cited other variants (2nd most cited is at 900), but it still seems like a big discrepancy.
2
0
10
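For reference, the scoring function from that framework, in one common variant (exact parameterization differs across implementations):

```latex
\operatorname{score}(D,Q) = \sum_{t \in Q} \operatorname{IDF}(t)\cdot
\frac{f(t,D)\,(k_1+1)}{f(t,D) + k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\operatorname{IDF}(t) = \ln\!\left(\frac{N-n(t)+0.5}{n(t)+0.5}+1\right)
```

where f(t,D) is the frequency of term t in document D, |D| the document length, avgdl the average document length over the corpus, N the number of documents, n(t) the number of documents containing t, and k1, b are free parameters (commonly k1 ≈ 1.2–2.0 and b = 0.75).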
@xhluca
Xing Han Lu
4 months
@natfriedman Just one? I can count at least two. VisualWebArena: SeeAct: Code is on GH for both, so one can easily improve upon them given sufficient engineering skills.
@ysu_nlp
Yu Su
6 months
Generalist web agents may get here sooner than we thought---introducing SeeAct, a multimodal web agent built on GPT-4V(ision). What's this all about? > Back in June 2023, when we released Mind2Web () and envisioned generalist web agent, a language agent
18
149
648
1
2
10
@xhluca
Xing Han Lu
1 month
@willkurt @pfau What's interesting is that some senior ML/NLP researchers will write (occasionally or frequently) code. Off the top of my head, I can think of Chris Manning & Graham Neubig publicly contributing to open-source, and Kyunghyun Cho has tweeted about his JAX/Torch experiments. I wonder if...
1
1
4
@xhluca
Xing Han Lu
3 months
SeeAct, a multimodal web agent built on GPT-4V, led by @boyuan__zheng @BoyuGouNLP in @ysu_nlp 's lab at @osunlp . Really cool experiments on grounding!
Tweet media one
@ysu_nlp
Yu Su
6 months
Generalist web agents may get here sooner than we thought---introducing SeeAct, a multimodal web agent built on GPT-4V(ision). What's this all about? > Back in June 2023, when we released Mind2Web () and envisioned generalist web agent, a language agent
18
149
648
1
2
9
@xhluca
Xing Han Lu
2 months
@YifanJiang17 OSU has 100+ H100s and Princeton (PLI) has 300 H100s. PLI has fewer PIs than Stanford NLP
1
0
9
@xhluca
Xing Han Lu
2 years
Reproducibility in #ML is great and I'm glad to see it gain traction. But can we talk about sharing reusable/extensible code? Reproducible: git clone my-repo; pip install .; python train_and_evaluate_all.py Reusable:
2
2
8
@xhluca
Xing Han Lu
1 year
Very insightful work on the positional encoding in the context of length generalization. With larger contexts being made available (GPT-4 at 32k and Claude at 100k), the conclusion will likely be very useful in designing the next generation of long-context models.
@a_kazemnejad
Amirhossein Kazemnejad
1 year
🚨Stop using positional encoding (PE) in Transformer decoders (e.g. GPTs). Our work shows 𝗡𝗼𝗣𝗘 (no positional encoding) outperforms all variants like absolute, relative, ALiBi, Rotary. A decoder can learn PE in its representation (see proof). Time for 𝗡𝗼𝗣𝗘 𝗟𝗟𝗠𝘀🧵[1/n]
Tweet media one
Tweet media two
44
247
1K
0
2
9
@xhluca
Xing Han Lu
4 months
Glad to see more work in conversational web navigation! This time, WorkArena proposes an environment-based approach, which complements the observation-based approach of WebLINX. It is also more specialized on professional tasks, which could help improve worker productivity.
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
4 months
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? Introduces benchmarks for evaluating LMs in automating knowledge work tasks, revealing a gap in full task automation and differences between open & closed-source models
Tweet media one
2
24
110
1
0
9
@xhluca
Xing Han Lu
3 months
@giffmana -> Has published at CV conferences (CVPR, ICCV, ECCV) -> Has worked extensively with language models Recruiter: Why is no one qualified for this position???
3
0
9
@xhluca
Xing Han Lu
4 months
@DynamicWebPaige 1: Asst Prof actually advising the project 2-4: tenure profs attending bimonthly meetings and LGTM'ing the overleaf 5: grad student doing 90% of the work 6: ugrad intern, trying hard to be helpful but keeps breaking the codebase and merging to main 7: Sr. PhD candidate who pro...
2
0
7
@xhluca
Xing Han Lu
4 months
As an individual dev, I find AMD/ROCm really user-unfriendly, borderline unusable: rocm-smi gets deprecated and there are no straightforward instructions on how to install it; the official PyTorch instructions do not work for ROCm, so you need to use an obscure guide deep in the ROCm docs: docker...
1
0
7
@xhluca
Xing Han Lu
1 year
@oscmansan @CVPR 🛑 stop right there! Here's 💯new AI 🤖 tools that came out in the past 5 minutes ⏰ that you absolutely need to learn if you don't wanna fall behind 🏃 Let's start 1/102 👇
1
0
8
@xhluca
Xing Han Lu
2 months
Should I go to #icml2024 in a lynx costume to stand out?
@jxmnop
jack morris
2 months
ok so i'm at ICLR; this is my first machine learning conference. as you might imagine, it's all very fun and exciting. but these poster sessions are absolutely INSANE this is an airplane hanger crammed with hundreds of posters, each with dozens of people talking over each other
Tweet media one
Tweet media two
14
7
216
3
0
8
@xhluca
Xing Han Lu
2 months
@teortaxesTex The reported score for GPT-4V includes turn-level screenshots from the dataset. So even with a vision advantage, 4V struggles compared to much smaller finetuned models. Vision -> action works well! We finetuned based on pix2act ( @ptshaw2 et al) and it was better than 4V 0-shot.
1
0
8
@xhluca
Xing Han Lu
12 days
Beyond that, it supports memory-mapping instead of loading everything into memory, which dramatically reduces RAM usage. This lets you query millions of documents in real time on a single CPU thread. Here's a side-by-side comparison (BM25S starts at ~10s)
1
0
9
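The memory-mapping idea can be sketched with NumPy. Illustrative only, not the BM25S code: a hypothetical on-disk score matrix is read lazily, so a query only touches the rows it needs:

```python
import os
import tempfile
import numpy as np

# Build a stand-in "index": a (vocab_size x n_docs) score matrix saved to disk.
path = os.path.join(tempfile.mkdtemp(), "scores.npy")
scores = np.arange(10_000 * 512, dtype=np.float32).reshape(10_000, 512)
np.save(path, scores)

# mmap_mode="r" does NOT load the array into RAM; pages are read on demand,
# so summing a few rows touches only a tiny slice of the file.
index = np.load(path, mmap_mode="r")
query_token_ids = [3, 17, 42]                 # tokens appearing in a query
doc_scores = index[query_token_ids].sum(axis=0)
print(doc_scores.shape)  # -> (512,)
```

The same pattern scales to matrices far larger than RAM: resident memory stays proportional to the rows a query touches, not to the size of the index.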
@xhluca
Xing Han Lu
3 months
Tweet media one
0
0
8
@xhluca
Xing Han Lu
5 months
Couldn't have answered better than @SemanticScholar 's AI summarizer🙃
Tweet media one
0
0
8
@xhluca
Xing Han Lu
12 days
I'm sad to see that Reddit is no longer a good place to share ML/NLP projects these days (apart from /r/localllama which is amazing). Issues with copy/pasting/formatting, posts getting removed again and again for many reasons, very little engagement (100 users online out of 3M).
1
0
8
@xhluca
Xing Han Lu
3 years
@abhi1thakur @kaggle I never put images in my notebooks. The time spent picking an image could've been used to stack more layers on my Roberta models
2
0
7
@xhluca
Xing Han Lu
8 months
@YiTayML What % of Google Brain was non-PhD (that did not go through the residency program)?
1
0
6
@xhluca
Xing Han Lu
3 years
I find it interesting how libraries like @huggingface 's have not only enabled better reproducibility, but also better model/task transferability in NLP. A few years ago, you might have been able to smoothly reproduce the results of a paper by just following the readme, but... (1/n)
1
0
7
@xhluca
Xing Han Lu
3 months
New Google Scholar Plugin on Chrome is pretty neat! I especially like the night mode (perfect for reading papers at night without burning my eyes)
Tweet media one
Tweet media two
Tweet media three
1
0
7
@xhluca
Xing Han Lu
11 months
The advantage of attending a conference at McGill/Montreal is that you can pay 1/4 of big conference registration fees + you get to meet people working on the same subject as you. Really great for everyone that CoLLAs and TMLR decided to partner this year!
@thegautamkamath
Gautam Kamath
11 months
10/12 papers in the journal track are published at @TmlrOrg ! A great way to submit to TMLR but still have an opportunity to present your work at a conference.
0
3
18
1
0
7
@xhluca
Xing Han Lu
11 months
I think SILO will be extremely important for domains with asymmetric public-private data availability. It's already clear how it'll be useful for medical, enterprise and personalized use cases, but I can imagine its impact will reach much further than this.
@ssgrn
Suchin Gururangan
11 months
Feel risky to train your language model on copyrighted data? Check out our new LM called SILO✨, with co-lead @sewon__min Recipe: collect public domain & permissively licensed text data, fit parameters on it, and use the rest of the data in an inference-time-only datastore.
2
55
241
1
3
7
@xhluca
Xing Han Lu
3 months
An exception is @weaviate_io which is fully open source and @_jphwang has been doing an incredible job with tons of talks/guides/workshops, e.g.
@_jphwang
JP Hwang
8 months
I had a blast talking about AI tech and web apps at @devreach (Tbh the best part was the amazing people, but this was a close 2nd!) 😉 The talk is beginner friendly. Check it out if interested in #AI #search or #LLMs for web apps!
1
2
7
1
2
7
@xhluca
Xing Han Lu
3 months
Tweet media one
0
0
7
@xhluca
Xing Han Lu
11 months
@TaliaRinger I wish there were a unique fediverse ID that could be used in any instance, so that if an account gets banned or someone tries to impersonate you, you could easily move to a new account with your subscribers/subscriptions.
6
2
7
@xhluca
Xing Han Lu
3 months
Very interesting ideas explored in this paper! Also really like that the iterative improvement works so well.
Tweet media one
@ShikharMurty
Shikhar
4 months
Want scalable LLM agents for websites and APIs, without human-labeled data? We propose BAGEL, a method where agents synthesize their own data by exploring the environment first, leading to up to 13% improvement over zero-shot agents, & automated discovery of use-cases in envs!
Tweet media one
2
34
188
0
1
7
@xhluca
Xing Han Lu
4 months
@dhuynh95 Great release! Would love to see how well WebLINX models () perform inside the framework :)
2
0
7
@xhluca
Xing Han Lu
1 year
Can't wait for 0.5-bit inference with a 65B param model trained with O(1/n) attention and running on a Mx iPad.
@drjwrae
Jack Rae
1 year
By this point I'm expecting Tri Dao to derive an O(1/n) attention implementation
0
8
152
0
0
7
@xhluca
Xing Han Lu
3 years
I'm really impressed by @pyodide 's support for py<>js proxies. You can easily create @reactjs function components with useState hooks and render them with React DOM - all with pure Python code.
Tweet media one
2
0
6
@xhluca
Xing Han Lu
10 months
I think anyone training ML models should spend 30-45min reading through the Hydra docs (). Even if you don't need it now, you will likely need it one day.
1
0
5
@xhluca
Xing Han Lu
7 months
@benno_krojer @NeurIPSConf For #1 , maybe NeurIPS could partner with Semantic Scholar to create a "Related papers" section on the website? Something similar to the existing feature on Semantic Scholar, except it's filtered to only show NeurIPS papers linking to the schedule.
Tweet media one
1
0
6
@xhluca
Xing Han Lu
3 months
Converting any LLM decoder into a retrieval model is pretty neat, since you won't need to train a separate encoder. Also really good to see a method solely using open data. Too many retrievers achieve good performance but are impossible to reproduce due to private data.
@vaibhav_adlakha
Vaibhav Adlakha
3 months
We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in the unsupervised and supervised category (among the models trained only on publicly available data). 🧵1/N Paper:
Tweet media one
14
163
858
0
0
6
@xhluca
Xing Han Lu
3 months
@EugeneVinitsky Would you take a (fully-funded) student with 0 programming/optimization/ML experience today?
1
0
6