🔍Hallucination or informativeness? 🤔Our latest research unveils a multi-dimensional benchmark and an LLM-based metric for measuring faithfulness and coverage in LVLMs. Explore our new method for a more reliable understanding of model outputs! 📣
🔥 Unlocking the power of Abstract Meaning Representations, AMRFact generates coherent, factually inconsistent summaries with high error-type coverage to improve factuality evaluation for abstractive summarization!
📣 Check out our new #NAACL2024 🇲🇽 work:
🏆 Thrilled to share that, in my first time serving as a paper reviewer, I have been honored with the Best Reviewer Award for the Ethics in NLP track. Heartfelt thanks to the Area Chairs and EMNLP PC Chairs for this recognition. It has been a fantastic learning experience! 🙏
🌟 Exciting times at EMNLP! Just wrapped up a fantastic experience presenting a poster. A big shoutout to my amazing advisor @VioletNPeng for the incredible support and guidance. Also thanks to everyone from @uclanlp 🩷
🚨Model-based evaluation metrics like CLIPScore can unintentionally favor gender-biased captions in image captioning tasks!
📣 Check out our new #EMNLP2023 work: A joint effort with @ZiYiDou, @Tianlu_Wang, @real_asli, and my amazing advisor @VioletNPeng.
Welcome to CDMX 🇲🇽! 🎉 Excited to connect with everyone @ NAACL !!! 🫶🏻
⏰ I’ll be presenting AMRFact on Monday at 16:00. If you’re into summarization and factuality evaluation, don’t miss it! 🤩 See you there! #NAACL2024
📢 Excited to share our latest work: a comprehensive survey on chart understanding! We dive into the evolution of datasets, vision-language models, challenges, and future directions in this vibrant field 📊.
📝:
💻:
1/n
1️⃣ We propose AMRFact, which uses AMR-based perturbations to generate factually inconsistent summaries, allowing for more coherent generation with high error-type coverage.
Happy to share some “slow research” - it's been 15 months and it's now (finally) on arXiv! Human language development is different from that of LLMs. We're asking: how do you interactively babysit a language model from scratch, and would it help? 🤔
🔗
@Michigan_AI
Officially Dr. Huang! 🎓 Thrilled to share that I've successfully defended my PhD thesis. Immensely grateful for my inspiring advisor @hengjinlp, and the guidance of my thesis committee: ChengXiang Zhai, Kathleen McKeown, @haopeng_nlp, @hanzhao_ml, and @JotyShafiq #PhDDone
1️⃣ We propose VALOR-EVAL, an LLM-based two-stage evaluation framework that generalizes previous methods by introducing semantic matching and incorporates both faithfulness and coverage into the evaluation.
1️⃣ (Cont.) VALOR-EVAL can handle complex hallucination types across objects, attributes, and relations in open-vocabulary captions generated by large vision-language models.
2️⃣ We introduce VALOR-BENCH, a comprehensive multi-dimensional benchmark dedicated to the evaluation of LVLMs, with a particular focus on measuring hallucinations in generative tasks.
2️⃣ (Cont.) Our benchmark categorizes hallucinations into three distinct types (object, attribute, and relation), offering a detailed understanding of model inaccuracies.
3️⃣ (Cont.) Our benchmark highlights the critical balance between faithfulness and coverage in model outputs, and encourages future work to address hallucinations in LVLMs while keeping their outputs informative.
4️⃣ To compare our LLM-based framework with LLM-free evaluation, we measured the average accuracy of hallucinated and covered objects detected by each metric. Results demonstrate that VALOR-EVAL significantly outperforms in both faithfulness and coverage accuracy.