Introducing EAGLE, a new method for fast LLM decoding based on compression:
- 3x🚀 over vanilla decoding
- 2x🚀 over Lookahead (on its benchmark)
- 1.6x🚀 over Medusa (on its benchmark)
- provably maintains the text distribution
- trainable (in 1~2 days) and testable on RTX 3090s
Playground:
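The "provably maintains text distribution" claim rests on the standard speculative-sampling acceptance rule: accept a draft token with probability min(1, p/q), where p and q are the target and draft model probabilities. A minimal sketch of that rule (illustrative only, not EAGLE's actual code; the function name and inputs are hypothetical):

```python
import random

def speculative_accept(draft_probs, target_probs, draft_tokens):
    """Accept or reject draft tokens so the output matches the target
    model's distribution. draft_probs[i][t] / target_probs[i][t] are the
    probabilities the draft / target model assigns to token t at step i."""
    accepted = []
    for i, tok in enumerate(draft_tokens):
        q = draft_probs[i][tok]   # draft model's probability for this token
        p = target_probs[i][tok]  # target model's probability for this token
        if random.random() < min(1.0, p / q):
            accepted.append(tok)  # keep the draft token
        else:
            break  # reject; resample from the residual distribution (omitted)
    return accepted
```

When the target model agrees with the draft (p >= q), the token is always kept; rejected steps fall back to the target model, which is what preserves the distribution.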
[Students Hiring] Looking for multiple funded Master's and PhD students at UWaterloo CS, working on 1) foundations of ML; 2) AI security; 3) trustworthy ML. The deadline is Dec 15. Please help spread the word or contact hongyang.zhang@uwaterloo.ca
I am excited to announce that I will be joining
@UWaterloo
CS
@UWCheritonCS
as a tenure-track assistant professor, affiliated with
@VectorInst
in Fall 2021. I am very grateful to my advisors (Avrim, Nina, David, Greg, and
@zicokolter
), friends, and colleagues. New Adventure!
Just finished summer teaching of "Intro to ML" in
@UWCheritonCS
Updated the course with the latest topics: transformers, LLMs, alignment, AI safety, self-supervised learning, etc. Student feedback has been positive. I hope it is also helpful to the public.
Syllabus:
Though some of my papers got accepted, my favorite submission was rejected by ICML simply because "The reviews are not very insightful unfortunately," quoted from the first sentence of the meta-review. Why should authors pay for low review quality? A ridiculous review system!
🔥EAGLE v1.1 now supports gpt-fast, with 6.5x🚀 LLM inference
What's new in v1.1:
- ~2x🚀 over gpt-fast (55.1 tok/s -> 100.2 tok/s)
- Supports Mixtral-8x7B with a 1.5x🚀
- All done in <10 lines of code
- Supports bs>1
⚒️Code:
🏅Benchmark: (on a single RTX 3090 for
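As a quick sanity check, the "~2x over gpt-fast" figure follows directly from the throughputs quoted above (a back-of-the-envelope sketch, not EAGLE code):

```python
# Sanity-check the speedup quoted in the tweet from its own numbers.
def speedup(after_tok_s, before_tok_s):
    """Ratio of decoding throughputs, in tokens per second."""
    return after_tok_s / before_tok_s

# EAGLE v1.1 on gpt-fast: 55.1 tok/s -> 100.2 tok/s
gain = speedup(100.2, 55.1)
print(f"{gain:.2f}x")  # ~1.82x, i.e. the "~2x" in the tweet
```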
Super excited that our team has won the First Place Award (out of 1,558 teams) in the
@CVPR
2021 Security AI Challenger (and $22,000🤑). Congratulations to my student Fangcheng and to Chao
@PKU1898
. Stay tuned for our new methodology!
An
#ICML
reviewer changed their score 7->6->5 within 2 days of the scores being released, before we even submitted a rebuttal. Quote: "I initially gave a relatively high score, but after seeing other reviewers' comments, I think my score was a bit too high." Shouldn't reviews be independent? Poisonous!
Since everyone is showing off their NeurIPS acceptances, I am just curious about one question: given that there are ~2,000 accepted papers, what is the impact advantage of a NeurIPS-accepted paper over an arXiv post?
Though not at
#ICML
, it seems like I am attending it for real, as everyone is tweeting about their papers. It might be more time- and money-efficient to attend a conference on social media😅 Anyone interested in organizing an ML conference on Twitter?
"Theoretically Principled Trade-off between Robustness and Accuracy" accepted to
@icmlconf
with high scores. The paper contains the winning strategies of the NeurIPS'18 Adversarial Vision Challenge, out of all 400 teams. Don't miss it; check it out here
EAGLE is accepted to
#ICML2024
, together with:
1. AnyTool:
2. Dipmark:
and an ACM CCS work:
3. zkLLM:
Heading to Vienna
#iclr
:
4. RAIN:
5.
See you all there!
New work with Avrim Blum, Travis Dick, Naren Manoj.
Paper: …
Code: …
Comments welcome!
We show that random smoothing---a SOTA defense with certified L_2 robustness---might be unable to certify L_p robustness for p>2. Intuition:
Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images. Avrim Blum, Travis Dick, Naren Manoj, and Hongyang Zhang
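A rough numeric illustration of the intuition (not the paper's actual bound): a Cohen-et-al.-style certified L_2 radius, when naively converted to L_inf by inscribing the L_2 ball, shrinks by a factor of sqrt(d), which is tiny for high-dimensional images. The function below is an illustrative sketch under that naive conversion:

```python
from statistics import NormalDist

def certified_radii(sigma, p_a, d):
    """Cohen-et-al.-style certified L_2 radius under Gaussian noise sigma
    (taking the runner-up class probability as 1 - p_a), plus the naive
    L_inf radius from inscribing that L_2 ball in dimension d."""
    r_l2 = sigma * NormalDist().inv_cdf(p_a)  # certified L_2 radius
    r_linf = r_l2 / d ** 0.5                  # shrinks by sqrt(d)
    return r_l2, r_linf

# A 100x100 image (d = 10,000): a healthy L_2 radius, a tiny L_inf one.
r2, rinf = certified_radii(sigma=1.0, p_a=0.9, d=10_000)
print(f"L_2: {r2:.3f}, L_inf: {rinf:.5f}")  # L_2 ~ 1.28, L_inf ~ 0.013
```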
EAGLE achieves SOTA results on various benchmarks. The method is also combinable with other parallel techniques such as vLLM, Mamba, FlashAttention, quantization, and hardware optimization.
2 presentations at
#ICML
:
RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval:
Building Robust Ensembles via Margin Boosting:
I cannot attend in person. Happy to introduce my collaborators or chat online. Looking forward to it!
We've just open-sourced the code for AnyTool! AnyTool is a self-evolving, hierarchical GPT-4 multi-agent system that can call as many as 16K+ APIs with ~60% accuracy.🚀
Code:
Paper:
Many thanks to Yu Du and Fangyun Wei for
LLM Agent for Large-Scale API Calls
Cool research paper presenting AnyTool, an LLM-based agent that can utilize 16000+ APIs from Rapid API.
Proposes a simple framework consisting of
- a hierarchical API-retriever to identify relevant API candidates to a query
- a solver to
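The hierarchical API retriever described in the summary can be sketched as a coarse-to-fine search over a category -> tool -> API tree. Everything below (the tree layout, the `score` callable, the toy data) is a hypothetical stand-in, not AnyTool's actual code:

```python
def hierarchical_retrieve(query, api_tree, score, k=5):
    """Coarse-to-fine search: pick the best-scoring category, then the
    best tool within it, then rank that tool's APIs. `api_tree` maps
    category -> {tool -> [api, ...]}; `score` is any relevance function."""
    best_cat = max(api_tree, key=lambda c: score(query, c))
    best_tool = max(api_tree[best_cat], key=lambda t: score(query, t))
    apis = api_tree[best_cat][best_tool]
    return sorted(apis, key=lambda a: score(query, a), reverse=True)[:k]

# Toy example with word-containment as the relevance score.
tree = {
    "weather": {"openweather": ["get_forecast", "get_current_weather"]},
    "finance": {"stocks": ["get_price"]},
}
overlap = lambda q, name: sum(w in name for w in q.split())
print(hierarchical_retrieve("weather forecast", tree, overlap))
```

Pruning whole categories before scoring individual APIs is what keeps a 16K-item search tractable; the candidates found here would then be handed to the solver.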
Honored to serve as the AE for the very first accepted DMLR paper (congrats to the authors). Please consider submitting your good work to DMLR (with only a 3-month review window)!
'Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift'
by Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li
Action Editor: Hongyang Zhang
#Multimodal
#Robustness
#DistributionShift
Though we faculty might not care whether 1 or 2 papers get accepted or rejected, students do care. It is very frustrating for them. There are 1,200+ accepted papers. I can imagine that at ICML 2052 there will be 10,000+ papers. Perhaps by then, arXiv will be the best place to submit to.
Heading to
#ICML2023
. My first in-person conference since COVID.
Thu:
1. A Law of Robustness beyond Isoperimetry
2. Understanding the Impact of Adv Robustness on Accuracy Disparity
Sat:
3. PoT: Securely Proving Legitimacy of Training Data and Logic for AI Regulation
See you all then, friends!
UWaterloo CS is hiring new faculty this year! Please consider applying and help spread the word. Feel free to let me know if there is anything I can help with.
We’re hiring CS faculty!
Join the Cheriton School of Computer Science, the top-ranked CS program in Canada for the second year in a row according to the recently released Maclean’s 2022 university rankings. See thread for the various academic appointments being sought. 1/5
Thanks
@omarsar0
for sharing. Excited to introduce AnyTool, a self-improving, hierarchical multi-agent system for 16K+ 🛠️ uses.
AnyTool improves vanilla GPT-4's tool-use pass rate by 2x~8x.
📢More details on this exciting project are coming soon.
RLHF? No! Releasing code: . RAIN is an inference method (based on MCTS) for harmless output that allows LLMs to self-align *without finetuning*. The time overhead is ~4x that of vanilla inference. It works well on HH, TruthfulQA, and AdvBench (+15%↑). Comments are welcome!
LLMs Can Align Themselves without Finetuning?
This paper discovers "that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting"
Does seem to have benefits in terms of generating
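The "self-evaluation and rewind" loop in the quote can be caricatured as generate, score, and rewind if the score is too low (RAIN actually searches over token sets with MCTS; the callables and threshold below are hypothetical stand-ins, not RAIN's API):

```python
def evaluate_and_rewind(generate, evaluate, threshold=0.9, max_rewinds=4):
    """Keep generating until a candidate's self-evaluation score clears
    `threshold`, rewinding otherwise; return the best candidate seen."""
    best, best_score = None, float("-inf")
    for _ in range(max_rewinds):
        candidate = generate()       # draft a response
        score = evaluate(candidate)  # self-evaluate, e.g. for harmlessness
        if score > best_score:
            best, best_score = candidate, score
        if score >= threshold:       # good enough: stop rewinding
            break
    return best

# Toy run with scripted candidates and a lookup-table evaluator.
drafts = iter(["rude reply", "polite reply"])
scores = {"rude reply": 0.2, "polite reply": 0.95}
print(evaluate_and_rewind(lambda: next(drafts), scores.__getitem__))
```

No weights are updated anywhere in the loop, which is the sense in which the alignment happens at inference time rather than via finetuning.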
Thrilled to share our new work on the recovery problem for non-decomposable distances! – Check it out on arXiv: This is joint work with super talented colleagues Zhuangfei Hu and Xinda Li, under the advisory of
@hongyangzh
and David Woodruff. (1/n)
@matloff
That is why, by design, reviewers are not allowed to see others' reviews before submitting their own initial review. Reviewers should not be influenced by others, at least in the initial review.
Boosting is well studied in natural training. What about its role in robustness? We show in
#ICML2022
that boosting helps with robustness too, both in theory and in practice. Joint work with
@zdhnarsil
, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar and Arun Sai Suggala.
Check our new ICML 2022 work on (adversarial) robust boosting and online learning! 🎉
Joint work w/
@hongyangzh
,
@AaronCourville
, Yoshua, Pradeep and Arun.
@thegautamkamath
Why would faculty and senior PhD students accept a review invitation? They have no incentive, and it takes them a lot of time without learning anything. I know more and more highly qualified reviewers who decline invitations. Our community needs to offer more reasons for them to serve.
@roydanroy
A more severe issue is that reviewers also bid on each other's papers. If so, there should be no bidding process at the reviewer level either.
🚨What is currently the best Speculative Decoding method for accelerating LLM inference?🔍
We introduce Spec-Bench📖: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding!🚀
Project page:
🧵1/n
#CallForPapers
📢:
#ICLR2022
Workshop on Socially Responsible Machine Learning (➡️). We welcome all submissions on the fairness, privacy, security, equity and ethics of machine learning. Deadline🗓️Feb 25. Submit your paper📜at CMT: 🎉
@winglian
Not necessarily. Right now, we are training the head on the ShareGPT dataset (of course, you can use others), while ShareGPT is not the training data of LLaMA.
Interesting Paper - "EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"
📌 Using gpt-fast, EAGLE attains on average 160 tokens/s with LLaMA2-Chat 13B on a single RTX 3090 GPU, compared to 24 tokens/s for Huggingface's implementation.
📌 Within the speculative
@junrushao
On EAGLE, multi-round speculation increases the speedup over vanilla by ~0.5x. We will do a more exact ablation study in the upcoming paper. Thanks for the question.
@cvondrick
@djhsu
@ChengzhiM
Very interesting work! Intuitively, is it because more tasks bring more training data, so the adversarial generalization is better and the robustness is strengthened?
@RICEric22
@_leslierice
@zicokolter
Nice work! The overfitting issue might come from empirical risk minimization, which can be alleviated by a new training scheme (see Figure 4 in )
@peter_ljq
OpenReview is too formal, is designed for reviews, and doesn't allow one to attach pictures. Twitter is perhaps more casual, just like a poster session.
@CShorten30
Thanks. You are right: if the rotation is adversarial, traditional information-retrieval algorithms will return totally different top-100 neighbors. Our goal is to make the algorithm robust. In this version, we focus on perturbation attacks; it would be interesting to study rotations as well.
@alorebube
Of course. Everyone has an equal opportunity; selection depends only on background (e.g., research direction, taste, previous publications, reference letters, etc.) and whether that background matches mine.
@CyrusRashtchian
Tough question! My conjecture: the current PAC-like ML framework cannot solve this issue unless we have an enormous amount of data. Maybe what we need is a more data-efficient framework, e.g., one built on knowledge, causal inference, or new modes of perception. Hopefully we will see in a few decades.
@rustyryan
@roydanroy
The rebuttal period begins only when authors submit their response. But what I described happened before the authors did anything, right after the scores were released.