📢 New paper alert!
Introducing STIC (Self-Training on Image Comprehension) that enhances the understanding and reasoning capabilities of LVLMs through self-generated data 🌟
📄 Read the paper:
🔗 Project page:
💻 GitHub Repo:
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Significantly improves the LLM’s performance across a variety of benchmarks and even outperforms models trained through DPO with extra GPT-4 preference data
Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations.
Joint work with an amazing team
@linxizhao4
@WeitongZhang
and
@QuanquanGu
!
(1/2) Ever wondered why CLIP shines in zero-shot transfer?🌟Our new arXiv paper dives into the theoretical mechanism behind it. We explain how CLIP's contrastive objective enables transferable representation learning in multi-modal data. 🔍📄
arXiv:
Codes and project page are released for our
#NeurIPS23
paper on spurious correlations in robust learning! 🚀
🔗 Project:
🔗 Code:
Key Insights:
📊 We discovered in theoretical analysis that spurious features overtake initial
Happy year of loong! Small milestones for me this past year🥹
- Two of our papers are accepted to
#ICLR2024
!
- We released two arXiv papers on LLMs: RaR and SPIN. Check them out 😄
- ⭐️The code for SPIN is now open-sourced! Dive into it here:
Happy to share that our work "Robust Learning with Progressive Data Expansion Against Spurious Correlation" has been accepted to
#NeurIPS2023
! 🎉
arXiv:
📢New LLM Agents Benchmark!
Introducing 🌟MIRAI🌟: A groundbreaking benchmark crafted for evaluating LLM agents in temporal forecasting of international events with tool use and complex reasoning!
📜 arXiv:
🔗 Project page:
🧵1/N
📢Excited to be attending
#icml2023
soon and thrilled to meet and connect with everyone!
I'll be presenting our work "Robust Learning with Progressive Data Expansion Against Spurious Correlation" at the SCIS workshop, Sat 29th, 2:30-3:30 p.m., Meeting Room 316 AB.
Curious about when accelerated SGD (ASGD) outperforms SGD? Our latest work provides new insights about the comparison between ASGD and SGD in overparameterized linear regression. (1/3)
🔗
Joint work with
@Yihe__Deng
@uuujingfeng
@DongruoZ
and
@QuanquanGu
@QuanquanGu
Thanks for sharing. It’s a nice short Custom Instruction for ChatGPT, too.
With this CI, it correctly handled the "10 sentences that end with Apple" test … and of course the examples from your paper.
Don't miss our poster session today at
#NeurIPS2023
!
🤗
@Yihe__Deng
will be presenting our work on "Robust Learning with Progressive Data Expansion Against Spurious Correlation."
📍 Great Hall & Hall B1+B2 (level 1)
#707
⏰ 5:15 p.m. - 7:15 p.m. CST
🔗
Check out this great work on benign overfitting in two-layer ReLU CNNs at
#ICML2023
🙌by
@YiwenKou666
,
@_zxchen_
, Yuanzhou and
@QuanquanGu
!
📍 Exhibit Hall 1, Poster
#603
📅 Thurs, 27th July (Today!), 10:30 a.m. - 12:00 p.m. HST
Join us at ICML2023 for insights on "Benign Overfitting in Two-layer ReLU Convolutional Neural Networks." Our research reveals a sharp transition between benign and harmful overfitting in two-layer ReLU CNNs with label-flipping noise. (1/3)
(2/2) We also analyze the effect of temperature parameter in CLIP on representation learning.🌡️Our theory suggests a regularization term that provably improves CLIP's original objective.
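For readers unfamiliar with the temperature parameter mentioned above, here is a minimal numpy sketch of a CLIP-style symmetric contrastive loss; the function name and the regularization-free form are illustrative, not the paper's exact objective.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Sketch only: `temperature` rescales the cosine-similarity logits, so a
    smaller value makes the softmax concentrate more sharply on the
    matched (diagonal) pair.
    """
    # L2-normalize so inner products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (B, B); matched pairs on diagonal
    labels = np.arange(len(img))

    def xent(l):
        # numerically stable row-wise cross-entropy against the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned orthonormal embeddings, lowering the temperature drives the loss toward zero, which is one way to see why the temperature matters for representation quality.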
Joint work with
@_zxchen_
, Yuanzhi Li and
@QuanquanGu
👏
@linxizhao4
@WeitongZhang
@QuanquanGu
3/ 📈 POPE: the POPE metric similarly validates the superior performance of MARINE
against existing baselines on different question formats, achieving improvements of up to +21.4% on accuracy and +12.0% on F1 score.
Refer to more experiment details and examples in our paper!
@linxizhao4
@WeitongZhang
@QuanquanGu
1/ 🎯 Our Study: MARINE vs. Baselines. We used CHAIR and POPE metrics, focusing not just on initial LVLM versions (like LLaVA) but also on advanced ones (e.g., LLaVA-v1.5). Overall, MARINE achieves superior performance across different LVLM architectures and evaluation metrics,
@linxizhao4
@WeitongZhang
@QuanquanGu
Current mitigation methods, while becoming more cost-efficient, still rely on fine-tuning models or using GPT-3.5 for post-generation corrections. Effective? Yes, but they tend to overwrite original model outputs, losing inherent diversity and instruction adherence.
See this
👏Check out this amazing work on the interplay between misspecification and the sub-optimality gap in linear bandits by my labmate
@WeitongZhang
and team, to be presented at
#ICML2023
! 👇
At
#ICML2023
: How will the function approximation error affect the performance of online sequential decision making?
Find us in Poster Session 1 @ Exhibit Hall 1
#627
from 11 a.m. HST to 1:30 p.m. HST. Looking forward to discussing with you all! (1/3)
We conduct extensive experiments on seven vision-language benchmarks, encompassing scientific reasoning, math reasoning, OCR, and conversation capabilities based on vision inputs, spanning image sources such as natural, chart, and text-rich images. With LLaVA-v1.6 as our primary
@linxizhao4
@WeitongZhang
@QuanquanGu
1/ 🌟 MARINE's Core: it's all about combining object grounding features with CFG to control text generation, aiming to reduce object hallucinations in LVLMs.
2/ 🤖 How it works: we integrate visual features from an object grounding encoder. Using 'direct alignment', we map
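The CFG blending step could be sketched roughly as below; both logit sources and the scale value are illustrative stand-ins, not MARINE's exact implementation.

```python
import numpy as np

def cfg_logits(logits_uncond, logits_cond, guidance_scale=1.5):
    """Classifier-free-guidance blend of next-token logits.

    Assumed setup: `logits_cond` comes from a decoding pass that sees the
    object-grounding features, `logits_uncond` from a pass without them.
    A scale > 1 extrapolates toward the grounded distribution, suppressing
    tokens the visual grounding does not support.
    """
    return logits_uncond + guidance_scale * (logits_cond - logits_uncond)
```

Note the endpoints: a scale of 0 recovers the ungrounded logits and a scale of 1 recovers the grounded ones, so the scale trades off fidelity to the original model against grounding strength.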
Details for the self-constructed preference dataset:
Based on a collection of unlabeled images, we focus on the image description task and let the model self-construct a preference dataset via the following:
1. Preferred response is generated from a “good” prompt, which is
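The self-construction described above can be sketched as follows; `describe_fn`, the toy `corrupt` helper, and the list-of-patches image representation are all hypothetical stand-ins for the actual LVLM pipeline.

```python
import random

def corrupt(image, drop_prob=0.5, seed=0):
    """Toy stand-in for image corruption: randomly drop patches.
    Here an 'image' is just a list of patch features."""
    rng = random.Random(seed)
    return [patch for patch in image if rng.random() > drop_prob]

def build_preference_pair(describe_fn, image, good_prompt, bad_prompt):
    """Hypothetical sketch of one self-constructed preference pair:
    the preferred description comes from a detailed prompt on the clean
    image, the dispreferred one from a misleading prompt on a corrupted
    image. `describe_fn(image, prompt)` stands in for querying the LVLM."""
    return {
        "chosen": describe_fn(image, good_prompt),
        "rejected": describe_fn(corrupt(image), bad_prompt),
    }
```

The resulting chosen/rejected pairs have the standard format expected by DPO-style preference fine-tuning.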
@winglian
@arankomatsuzaki
Thank you for the comment. Our codebase is based on the Alignment Handbook library by HuggingFace, and we're planning to release our code in the near future. Stay tuned for updates!
Outline of our method:
1️⃣ Stage 1: The model self-constructs a preference dataset using unlabeled images, generating high-quality image descriptions through a detailed “good” prompt and generating less accurate responses via corrupted images or misleading “bad” prompts.
2️⃣
Specific examples show that STIC enables the model to understand the given image first before reasoning with the input query! Find more examples and ablation studies in our paper:
[5/N]
Excited to share our method called 𝐒𝐞𝐥𝐟-𝐏𝐥𝐚𝐲 𝐟𝐈𝐧𝐞-𝐭𝐮𝐍𝐢𝐧𝐠 (SPIN)! 🌟Without acquiring additional human-annotated data, a supervised fine-tuned LLM can get stronger by SPIN. Check out how SPIN unleashes the full power of human-annotated data.
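One self-play round, as I read the SPIN idea, can be sketched like this; `generate_fn` and the dict format are illustrative, and the DPO-style update that consumes the pairs is omitted.

```python
def spin_round(generate_fn, sft_data):
    """Sketch of one SPIN self-play round: the current model acts as the
    'opponent' by generating responses to the SFT prompts, and each
    human-annotated completion is preferred over the model's own output.
    The resulting pairs would then drive a preference-style update of the
    next iterate. `generate_fn` stands in for sampling from the current
    model; `sft_data` is a list of {"prompt", "response"} dicts.
    """
    return [
        {
            "prompt": ex["prompt"],
            "chosen": ex["response"],               # human-annotated target
            "rejected": generate_fn(ex["prompt"]),  # model's own response
        }
        for ex in sft_data
    ]
```

Iterating this loop is what lets the same SFT dataset keep improving the model without any new human annotation.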
Joint work with
@Yuancheng_Xu0
@arankomatsuzaki
Thanks! We observed that even after fine-tuning the LLM on a given SFT dataset, there is still a noticeable quality gap between its response to a training prompt and the ground truth completion. At a high level, SPIN aims to let the LLM itself discern the gap and
@arnicas
@arankomatsuzaki
Hi and thanks for your interest. Our codebase is based on the Alignment Handbook library by HuggingFace, and we're planning to release our code in the near future. Stay tuned for updates!
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy
@Thewimo
@QuanquanGu
@abacaj
Hi, thanks for your interest! We indeed tested on Vicuna-13b-v1.5, which is based on Llama-2, and observed improvements using RaR.
We build upon theoretical insights and propose a new training algorithm to efficiently improve robustness against spurious correlations in deep learning models. Looking forward to the insightful discussions and valuable networking opportunities at ICML!