Weidi Xie Profile
Weidi Xie

@WeidiXie

2,388
Followers
586
Following
85
Media
448
Statuses

Computer Vision Researcher. Associate Professor at SJTU, previously @Oxford_VGG . Chinese name: 谢伟迪. Personal Webpage:

Oxford, England
Joined May 2018
@WeidiXie
Weidi Xie
5 years
We are excited to share the code and model for Self-supervised Correspondence Flow (BMVC 2019 Oral) @bmvc2019 . State-of-the-art performance on video segmentation and pose tracking. @Oxford_VGG
2
110
423
@WeidiXie
Weidi Xie
2 years
Personal update: After spending seven wonderful years at Oxford, I've decided to take on a new adventure. I'm joining Shanghai Jiao Tong University this year 🐯.
23
1
295
@WeidiXie
Weidi Xie
4 years
Tracking objects is among the first skills human infants learn, so surely this must be a task that needs no semantic understanding. We present a SOTA self-supervised tracking approach: all you need is 10 minutes of raw video, with zero annotations required. @Oxford_VGG
1
50
195
@WeidiXie
Weidi Xie
2 months
A tiny milestone in my academic journey. I know these metrics do not carry much significance in today's academic landscape. Nevertheless, they serve as a personal gauge, allowing me to assess the papers' impact and reflect on if I've contributed something meaningful.
12
0
143
@WeidiXie
Weidi Xie
3 years
Code: Self-supervised Video Object Segmentation by Motion Grouping: We show that self-supervised segmentation can be done purely from motion.
0
24
107
@WeidiXie
Weidi Xie
11 months
ICCV23 work on Open-vocabulary Object Segmentation with Diffusion Models - we do visual instruction tuning on a pre-trained diffusion model, to simultaneously generate images and open-vocabulary masks. - it can create synthetic datasets for training discriminative models for free.
1
19
101
@WeidiXie
Weidi Xie
9 months
Can GPT-4V(vision) serve medical applications? We present recent efforts on assessing GPT-4V for multimodal medical diagnosis, by case studies, covering 17 human body systems, across 8 clinical imaging modalities, e.g., radiology, pathology. 🔥Report:
6
23
101
@WeidiXie
Weidi Xie
1 year
Just read Med-PaLM 2; the progress of LLMs in medical question answering is incredible! But I think multimodal medical question answering is quite far behind, so here I present PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering:
4
17
91
@WeidiXie
Weidi Xie
2 years
Happy to share our work, "Visual-Language Models for Efficient Video Understanding", at ECCV2022. We benchmark on 10 different datasets across various tasks; it turns out that simply prompting CLIP already achieves comparable or SOTA results on many video tasks. #ECCV2022
1
13
93
@WeidiXie
Weidi Xie
4 years
We are releasing the code and model for #VGGSound, a new large-scale audio-visual dataset collected with audio-visual correspondence. Accessible via: Code & model:
2
26
75
@WeidiXie
Weidi Xie
5 years
We investigate self-supervised learning on video correspondence flow. If done properly, self-supervised learning can be surprisingly powerful (closing the gap to supervised learning). We demonstrate state-of-the-art results on video segmentation.
0
24
71
@WeidiXie
Weidi Xie
4 years
We are presenting our new paper at the LUV2020 workshop today at 16:15-16:30. MAST: A Memory-Augmented Self-Supervised Tracker, by @LaiZihang , @erika_lu_ , @Oxford_VGG . A strong tracking model trained with no manual annotation. Code: #VGGatCVPR2020
1
19
69
@WeidiXie
Weidi Xie
3 years
Also best paper at the CVPR RVSU Workshop. TL;DR: We propose a self-supervised learning approach for segmentation based on motion, i.e., the Gestalt principle. It achieves performance competitive with strong supervision on several popular benchmarks, e.g. DAVIS2016, MoCA (camouflage detection).
@chaaarig
charig yang
3 years
Check out our paper at @ICCV_2021 ! Self-supervised Video Object Segmentation by Motion Grouping (w/ @hala_lamdouar , @erika_lu_ , Andrew Zisserman & @WeidiXie ) Project page (paper+video+code):
1
23
122
0
6
60
@WeidiXie
Weidi Xie
1 year
Happy to share our paper "Self-supervised Tumor Segmentation with Sim2Real Adaptation", published in the IEEE Journal of Biomedical and Health Informatics. The model enables zero-shot tumor segmentation with Sim2Real training, requiring zero or few annotations from physicians.
1
7
57
@WeidiXie
Weidi Xie
1 year
Sharing the work "PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents": - A large-scale image-caption dataset collected from biomedical papers, - A CLIP-style model that can be transferred to various downstream tasks with comparable or SOTA results.
2
8
54
@WeidiXie
Weidi Xie
1 year
I won't be able to attend CVPR due to visa reasons, though I applied for it over two months ago. Found an interesting workshop, at around 4 AM China time, and it's promised to be NOT recorded. Honestly, I don't understand the point. Have we decided to go CloseAI ? #CVPR2023
4
1
54
@WeidiXie
Weidi Xie
11 months
We present our new effort on building medical generalist foundation models for radiology: Arxiv: Website: Hope this can promote the development of medical foundation models ! (1/5)
5
10
54
@WeidiXie
Weidi Xie
6 years
VGGFace2 is a large-scale face recognition dataset. Over 9,000 identities and 3M images were downloaded from Google Image Search, with large variations in pose, age, illumination, ethnicity and profession. Dataset: Github:
1
30
51
@WeidiXie
Weidi Xie
10 months
Our recent work to initiate open-vocab video instance segmentation (ICCV23 Oral): - we collect a large-vocabulary video instance dataset (LV-VIS), with 1,196 categories. - we propose a Transformer-based architecture, OV2Seg, that proposes and segments objects through time.
1
14
50
@WeidiXie
Weidi Xie
3 years
Great, finally some work tries to solve the trivial-solution problem. We didn't manage to make it work and had to take another path .....
@_akhaliq
AK
3 years
Dense Unsupervised Learning for Video Segmentation abs: github:
3
100
498
0
0
48
@WeidiXie
Weidi Xie
4 months
We present our recent work on developing an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience from different regions. All code and models are available at @huggingface
2
10
49
@WeidiXie
Weidi Xie
1 year
Models for perceptual understanding are developing rapidly, thanks to the SAM model. However, can such a model infer visual attributes in an open-vocabulary setting? Here, we develop a model for open-vocabulary object detection and attribute recognition. #CVPR2023
2
11
44
@WeidiXie
Weidi Xie
5 years
Self-supervised Video Representation Learning with Dense Predictive Coding. State-of-the-art RGB-stream action classification accuracy, better than ImageNet pretrained weights! @Oxford_VGG Paper: ; Code: ( @PyTorch )
0
19
47
@WeidiXie
Weidi Xie
10 months
Our recent work on AI4Medicine: visual-language representation learning in radiology, to be presented at ICCV2023. MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology: Our contributions include (1/4):
1
9
46
@WeidiXie
Weidi Xie
6 months
Excited to share our new effort on large-scale long-tailed disease diagnosis on radiology images. This is a more feasible playground for academic labs to explore sophisticated algorithms, which is impractical when developing generalist foundation models due to computational costs.
2
9
45
@WeidiXie
Weidi Xie
2 years
The human visual system is amazing at many tasks; however, it is particularly weak at counting objects. In fact, one can only make a rapid, accurate and confident judgement if the number of items is below five. We can augment this ability with CounTR:
0
7
45
@WeidiXie
Weidi Xie
3 years
I feel extremely disappointed while reading papers with incomplete literature review. To me, this should be the MOST important thing, as it clearly shows you understand what has been done and what is the remaining challenge and your contribution, instead of over-claiming.
3
1
43
@WeidiXie
Weidi Xie
1 year
Recent advances in AI, e.g. NLP and visual perception, have revealed the power of supervised training on massive data, e.g. ChatGPT, SAM. From a product perspective, this is great. However, from a research view, the dream remains training models with zero/cheap annotations.
2
5
41
@WeidiXie
Weidi Xie
4 years
Check out our #BMVC2020 paper: “Inducing Predictive Uncertainty Estimation for Face Recognition” @Oxford_VGG A simple approach for estimating the predictive confidence for face recognition systems. Q&A session Tuesday at 10:00-11:00 and 16:00-17:00 UK
2
12
39
@WeidiXie
Weidi Xie
5 months
I do think training with synthetic data will become the next thing in computer vision !! Thanks for the repost @_akhaliq 😁
@_akhaliq
AK
5 months
InstaGen Enhancing Object Detection by Training on Synthetic Dataset paper page: introduce a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance, by training on synthetic dataset
2
33
114
0
0
38
@WeidiXie
Weidi Xie
11 months
Can we stop over-claiming, and be realistic ? Processing X-ray images doesn't make it generalist biomedical AI .......
@AziziShekoofeh
Shek Azizi
11 months
Meet Med-PaLM Multimodal (Med-PaLM M), the first demonstration of "Generalist Biomedical AI"!
15
130
703
2
1
36
@WeidiXie
Weidi Xie
1 year
Yet another medical-related report: - we fine-tune LLaMA on 4.8 million biomedical papers from PubMed; after several epochs, it already shows enhanced capabilities in the medical domain - the proposed model, PMC-LLaMA, achieves high performance on biomedical QA benchmarks.
2
7
36
@WeidiXie
Weidi Xie
3 months
I'd like to share the latest work from the group, accepted at #CVPR2024 ! Grounded Question-Answering in Long Egocentric Videos, by Shangzhe Di. It explores grounded question-answering in long, egocentric videos, enabling individuals to inquire about their past visual experiences.
1
7
36
@WeidiXie
Weidi Xie
5 months
I recently saw many Mamba models for medical segmentation, generally inserted at the bottleneck of a UNet and claiming to model long-term dependency at a resolution of 9x9 or 7x7... This is such an odd atmosphere in AI: whenever something new comes out, people use it for EVERYTHING !!
3
1
33
@WeidiXie
Weidi Xie
2 months
🚀 Excited to introduce RadGenome-Chest CT, a comprehensive, large-scale, and fine-grained visual-language dataset for 3D CT scans. It includes: - Organ-level segmentation for 197 categories; - 665K multi-granularity grounded reports; - 1.3M grounded VQA pairs.
2
6
35
@WeidiXie
Weidi Xie
5 years
The PyTorch models for VGGFace2 are fully available online now: @Oxford_VGG
0
13
31
@WeidiXie
Weidi Xie
10 months
We have updated the PMC-LLaMA model: Compared with the former versions, we have: (i) upscaled the model size to 13B; (ii) added 30K medical books into the knowledge injection stage; (iii) done instruction tuning on a large-scale dataset with 202M tokens.
1
7
33
@WeidiXie
Weidi Xie
11 months
New paper on open-vocabulary detection. We aim to tackle two problems in existing work: (1) lexical ambiguity, (2) visual granularity. We demonstrate great results with classifiers generated from text, visuals, or both. Please check the thread from @pranna for details.
@PrannayKaul
Prannay Kaul @ CVPR
11 months
[1/8] In Hawaii(!) for #ICML2023 to present "Multi-Modal Classifiers for Open-Vocabulary Object Detection" Joint with @WeidiXie and Andrew Zisserman 🕸️🏗️ 📑 🖥️ Poster #413 , Thursday 1:30-3pm Exhibit Hall 1
1
10
28
0
6
31
@WeidiXie
Weidi Xie
1 year
Recently, I've become quite interested in deploying AI tools for medical applications. Here is a series of our recent work; the list is growing continuously, so please stay tuned.
2
0
30
@WeidiXie
Weidi Xie
4 months
This originally aimed to generate comic books for my daughter with diffusion models, i.e., continuous image sequences with consistent characters, storylines, etc. Well, I guess now we have #SORA 🤣..... Still, it is good to get it accepted at #CVPR2024 , congrats to the team !!
@HaoningWu_
HaoningWu___
4 months
Thrilled to share that our exciting project, [StoryGen] (with @liu_chang666 and @WeidiXie ), has been accepted by #CVPR2024 !!!🥳 Check out our paper, code and dataset at: Title: Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
1
4
11
0
2
29
@WeidiXie
Weidi Xie
6 years
@Oxford_VGG Our new ACCV2018 paper on Class-Agnostic Counting is now online:
1
8
27
@WeidiXie
Weidi Xie
7 months
Would like to share a fun piece of work, which has proven effective at entertaining my daughter! It can generate coherent image sequences based on the stories you give, or those written by GPTs! Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models.
2
0
27
@WeidiXie
Weidi Xie
2 years
@TansuYegen We actually collected a dataset of these amazing camouflage......
1
6
27
@WeidiXie
Weidi Xie
5 years
Our work on self-supervised learning for the problem of Geometric Alignment based on noisy annotation. Geometric consistency is applied; that is to say, all possible perturbed labels must be transformed back to the unique ground-truth position. @Oxford_VGG
0
13
26
@WeidiXie
Weidi Xie
5 years
State-of-the-art Speaker Recognition with VLAD and GhostVLAD aggregation. "Utterance-level Aggregation For Speaker Recognition In The Wild", to appear in @icassp2019 , for Oral Presentation. Project page (models & code):
1
11
25
@WeidiXie
Weidi Xie
2 years
Reminds me of this paper, actually the first time I realised videos are not what I used to imagine ...... Analysing Gait with Spatiotemporal Surfaces,
@kitasenjudesign
Kitasenju Design
2 years
structure of slit-scan
55
3K
18K
1
3
23
@WeidiXie
Weidi Xie
10 months
We have updated the manuscript : - more recent models as baselines, - better model performance, - more comprehensive evaluation, including both machine and human scoring. Arxiv: Website:
1
2
23
@WeidiXie
Weidi Xie
6 months
We have made the first release of SAT-Nano, with model and inference code. The SAT-Ultra will be released soon as well. Stay tuned ! Webpage: Code & Model:
0
5
23
@WeidiXie
Weidi Xie
7 months
I'm not going to NeurIPS, but Fatma is; please go chat with her about the great work done by @gorkaydemir . 😎
@ftm_guney
F. Güney
7 months
I’m presenting our paper with @gorkaydemir and @WeidiXie tomorrow at Poster Session 2* at #NeurIPS2023 : with SOLV; “Self-supervised Object Centric Learning for Videos”, we can discover multiple objects in real-world video sequences without using additional modalities like depth
1
15
123
0
1
17
@WeidiXie
Weidi Xie
5 years
@Oxford_VGG Code for the paper on "Class-Agnostic Counting" by Erika Lu, Weidi Xie, Andrew Zisserman :
0
7
17
@WeidiXie
Weidi Xie
1 year
🚀 The source code and pre-trained model for PMC-VQA is officially released at: We will update the model continuously. Stay tuned !
0
4
16
@WeidiXie
Weidi Xie
1 year
📢 Our #PMC-VQA dataset: is now live on @huggingface datasets 🤗 , and officially benchmarked on @paperswithcode Looking forward to progress in this domain !!!
0
2
17
@WeidiXie
Weidi Xie
5 years
I will present the recent work @Oxford_VGG with @NagraniArsha , Joon Son Chung, and Andrew Zisserman on Speaker Recognition @icassp2019 on Thursday (May 16), 09:20 - 09:40. Project page (models & code):
0
6
17
@WeidiXie
Weidi Xie
1 year
PMC-CLIP accepted by #MICCAI2023 !!! Congrats Weixiong, Ziheng, @XiaomanZhang99 , @chaoyiwu4 Webpage:
@chaoyiwu4
chaoyi-wu
1 year
Our paper on the medical foundation model, PMC-CLIP, has been accepted by MICCAI2023. Congratulations to all co-authors.😆 In PMC-CLIP, we collected 1.6M medical image-text caption pairs. All meta reviewers rank it first. Thanks for their recognition.👏
1
1
3
0
0
16
@WeidiXie
Weidi Xie
5 months
Check this out ! Nice work on audio-visual synchronisation, which also won the challenge at WACV.
@_iashin
vladimir iashin
5 months
📢 Thrilled to share that our paper on "Synchformer: Efficient Synchronization from Sparse Cues" got accepted at #ICASSP24 ! 🎉 Huge shoutout to the amazing team: @WeidiXie , Esa Rahtu, and Andrew Zisserman! Code: arXiv:
0
1
21
0
0
15
@WeidiXie
Weidi Xie
3 years
Using motion to train a segmentation model, again based on the common fate principle, similar to the motion grouping paper. Simple simulation in the flow field turns out to generalise extremely well.
@hala_lamdouar
Hala
3 years
Check out our latest work on “Segmenting Invisible Moving Objects” at @BMVCconf (w/ @WeidiXie and Andrew Zisserman) Project page:
0
14
58
0
0
14
@WeidiXie
Weidi Xie
2 years
Existing super-resolution (SR) models are often specialized for one scale, limiting their use in practice. We develop a general plugin module that can be injected into any existing SR model to augment its ability for arbitrary-scale super-resolution. Webpage:
0
2
14
@WeidiXie
Weidi Xie
9 months
This will be presented this afternoon (Friday 6th) @ICCVConference Work by @chaoyiwu4 , @XiaomanZhang99
@WeidiXie
Weidi Xie
10 months
Our recent work on AI4Medicine: visual-language representation learning in radiology, to be presented at ICCV2023. MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology: Our contributions include (1/4):
1
9
46
0
0
14
@WeidiXie
Weidi Xie
6 years
Recent work on Multi-view Cardiac MR Detection, Orientation, and Segmentation has been published at Medical Image Analysis, "Ω-Net (Omega-Net): Fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks", .
0
4
11
@WeidiXie
Weidi Xie
1 year
thanks for the tweet.
@_akhaliq
AK
1 year
Multi-Modal Classifiers for Open-Vocabulary Object Detection paper page: The goal of this paper is open-vocabulary object detection (OVOD): building a model that can detect objects beyond the set of categories seen at training, thus enabling the
5
63
279
0
0
11
@WeidiXie
Weidi Xie
9 months
Now also available on arxiv 🔥:
@WeidiXie
Weidi Xie
9 months
Can GPT-4V(vision) serve medical applications? We present recent efforts on assessing GPT-4V for multimodal medical diagnosis, by case studies, covering 17 human body systems, across 8 clinical imaging modalities, e.g., radiology, pathology. 🔥Report:
6
23
101
0
2
11
@WeidiXie
Weidi Xie
3 years
Will present in this workshop on self-supervised video representation learning.
@JonathonLuiten
Jonathon Luiten
3 years
Starting in 5 minutes!
0
0
1
0
1
10
@WeidiXie
Weidi Xie
1 year
Webpage: Datasets: Code:
0
2
10
@WeidiXie
Weidi Xie
5 months
Following my previous tweet, - Our team has decided to run a thorough comparison between nnUNet and Mamba-based models for medical segmentation. - We will conduct experiments on over 60 public segmentation datasets, and provide a complete comparison. (1/n)
1
0
10
@WeidiXie
Weidi Xie
5 months
I think the key problem is that this community is not tolerant of failure !! So whenever something comes out, people can always do hyper-parameter tuning or compare to low baselines to show its effectiveness. This misleads the entire community and wastes a huge amount of resources.
2
0
10
@WeidiXie
Weidi Xie
11 months
New paper in Nature Communications, where we investigate knowledge-enhanced multimodal representation learning with chest X-rays and radiology reports.
1
1
9
@WeidiXie
Weidi Xie
6 years
#ECCV2018 Comparator Networks.
0
0
9
@WeidiXie
Weidi Xie
1 year
Please volunteer to be a reviewer for #NeurIPS2023 ! Message me if you are not invited yet.
@NeurIPSConf
NeurIPS Conference
1 year
Our past conferences wouldn't have been possible without our many reviewers. If you have at least 2 papers in top peer-reviewed confs or journals, with at least one in an ML venue (eg NeurIPS ICML ICLR), we’d be v grateful if you reviewed for #NeurIPS2023
5
66
160
0
0
9
@WeidiXie
Weidi Xie
7 months
Thank you for the invitation. It's an honour to attend this event. Will present some recent work on Developing Foundation Models for Healthcare.
@jeyamariajose
Jeya Maria Jose
7 months
Excited for the session and panel discussion with @vivnat , @pranavrajpurkar , and @WeidiXie on "Building and Evaluating Foundation Models for Healthcare" at AI+Health 2023 conference organized by Stanford University @StanfordAIMI @StanfordHAI
1
1
23
0
0
9
@WeidiXie
Weidi Xie
1 year
Will cover ideas for discovering objects without manual annotations from images, videos ! Looking forward to the tutorial.
@valeoai
valeo.ai
1 year
Don't miss the "Object localization for free: Going beyond self-supervised learning" @CVPR tutorial (by @oriane_simeoni @WeidiXie @tkipf P. Pérez) for an in-depth coverage of different angles on object localization with no human supervision #cvpr2023
1
13
46
0
0
8
@WeidiXie
Weidi Xie
2 years
When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual-language tasks. It turns out that we can adapt pre-trained foundation models to open-vocabulary semantic segmentation, by training very few parameters.
1
0
8
@WeidiXie
Weidi Xie
9 months
Using computer vision tools to create a large-scale audio-language dataset.
@_akhaliq
AK
9 months
A Large-scale Dataset for Audio-Language Representation Learning paper page: The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning
1
41
174
1
0
7
@WeidiXie
Weidi Xie
1 year
- a strong generative model for medVQA - a scalable pipeline for collecting large-scale datasets - a significantly more challenging benchmark than all existing ones, on which strong visual-language models, e.g. BLIP2 and Open-Flamingo, fail miserably. Would like to see progress on medVQA!
1
1
7
@WeidiXie
Weidi Xie
1 year
CVPR work on automatically generating audio descriptions.
@ImagineEnpc
Imagine-ENPC
1 year
#CVPR2023 June 22, Thu AM (Highlight, Poster 234) AutoAD: Movie Description in Context @TengdaHan , @maxhbain , @NagraniArsha @gulvarol , @WeidiXie , and Andrew Zisserman ( @Oxford_VGG ) pdf: web: code:
1
5
18
1
0
7
@WeidiXie
Weidi Xie
10 months
Our final model outperforms ChatGPT and LLaMA-2 on multiple Medical QA benchmarks ! Open-source materials: Code: Model: Data: Hope this can promote the development of open-source LLMs for healthcare.
1
1
6
@WeidiXie
Weidi Xie
5 years
Model and Code:
0
1
5
@WeidiXie
Weidi Xie
1 year
"Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision." #CVPR2023 We aim to train an open-vocab segmentation model with image caption, explicitly exploiting the visual invariance between images. Project Page:
1
1
6
@WeidiXie
Weidi Xie
2 years
Collaboration with Chen Ju, @TengdaHan , @KunhaoZ , Ya Zhang. Project page: GitHub: We are gradually releasing the code for reproducing all the benchmark results.
0
1
6
@WeidiXie
Weidi Xie
1 year
Same.......
@hardmaru
hardmaru
1 year
I’m so tired of AI hype tweets. They’re all over my Twitter feed these days.
40
141
1K
0
0
6
@WeidiXie
Weidi Xie
4 years
I was asked for the code of an old paper. It may be too late 😂, but in case someone else is interested: "Multicolumn Networks for Face Recognition", BMVC2018. The idea is to compute a set representation by aggregating images based on their importance.
0
0
5
@WeidiXie
Weidi Xie
6 months
- We present a novel architecture that can process an arbitrary number of input scans from various imaging modalities, and is trained by leveraging rich domain knowledge.
1
0
5
@WeidiXie
Weidi Xie
11 months
This is a work questioned by the reviewer for its value....... To be honest, I'm not sure if the idea will be adopted by the community or not, but I think it's COOL ! Webpage: Arxiv: Code & Model:
0
0
5
@WeidiXie
Weidi Xie
1 year
Looking forward to it😎
@oriane_simeoni
Oriane Siméoni
1 year
Our @CVPR tutorial about "object localization for free" is today in room East 11, starting at 8:30am PDT (with @WeidiXie , @tkipf and P. Pérez). Come and join us if you want to hear about and discuss different successful approaches to object localization with no annotation!
1
6
18
0
0
5
@WeidiXie
Weidi Xie
4 years
This is today !
@WeidiXie
Weidi Xie
4 years
Check out our #BMVC2020 paper: “Inducing Predictive Uncertainty Estimation for Face Recognition” @Oxford_VGG A simple approach for estimating the predictive confidence for face recognition systems. Q&A session Tuesday at 10:00-11:00 and 16:00-17:00 UK
2
12
39
0
0
5
@WeidiXie
Weidi Xie
11 months
Check out our recent work in diagnosing human-object interaction detectors.
@HuaizuJiang
Huaizu Jiang
11 months
Overwhelmed by the progress of human-object interaction (HOI) detection? Ever wondered why one HOI model performs better than another? Check out our recent work in diagnosing human-object interaction detectors. Paper: Code: 🛢️ 1/N
1
5
28
0
0
4
@WeidiXie
Weidi Xie
1 year
Thanks for the invitation, I really enjoyed the workshop and learnt a lot.😁
@ftm_guney
F. Güney
1 year
Videos of the invited talks from our ECCV22 workshop on "What is motion for?" are available on YouTube (thanks to Deqing):
0
7
27
0
0
4
@WeidiXie
Weidi Xie
9 months
joint work with a wonderful team, @XiaomanZhang99 , @chaoyiwu4 , and more.....
0
0
4
@WeidiXie
Weidi Xie
1 year
1) multimodal biomedical dataset: PMC-OA, 1.6M image-caption pairs collected from PubMedCentral, covering diverse modalities or diseases, with the majority of the image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption.
1
0
4
@WeidiXie
Weidi Xie
4 years
@strangecosmos @ylecun maybe this is related ? 😂
0
0
4
@WeidiXie
Weidi Xie
6 months
@gdb Right. I think this is exactly what we would expect an AI4Health model to have: some 'emerging abilities' to discover hidden factors behind the disease itself, and to make diagnoses by combining all information sources, with the ability of top-level clinicians.
1
0
4
@WeidiXie
Weidi Xie
7 months
cool stuff !
@pika_labs
Pika
7 months
Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life. Create and edit your videos with AI. Rolling out to new users on web and discord, starting today. Sign up at
1K
5K
26K
0
0
3
@WeidiXie
Weidi Xie
9 months
We assess GPT-4V's capabilities in tasks like anatomy recognition, disease diagnosis, report generation, and disease localization.
1
0
2
@WeidiXie
Weidi Xie
7 months
Fantastic !
@eric_zemingchen
Zeming Chen
7 months
We present MEDITRON, a set of new open-access #LLMs (70B & 7B) adapted to the medical domain, achieving new SoTA open-source performance on common medical benchmarks, outperforming #GPT-3.5 and Med-PaLM, and coming within 5% of #GPT4 Find out how we did this ⬇️
25
136
610
0
0
3
@WeidiXie
Weidi Xie
6 months
- The cases cover 9 diverse imaging modalities (CT, MRI, X-ray, Ultrasound, Fluoroscopy, Nuclear medicine, Mammography, DSA, Barium Enema) and 7 human anatomy regions (head and neck, spine, chest, breast, abdomen and pelvis, upper limb, lower limb).
1
0
3
@WeidiXie
Weidi Xie
6 months
- We build a large-scale diagnostic dataset that encompasses 5568 disorders linked with 930 unique ICD-10-CM codes, containing 39,026 cases (192,675 scans).
1
0
3