Polish TV is using computer vision to enhance the viewer experience for sports broadcasts:
- FIFA-like radar overlays
- player recognition
- pass distance measurement
- ball speed and trajectory tracking during shots
here is the final version of my vehicle speed estimation demo
read the thread below to learn how I built it.
I will cover:
- detection
- tracking
- perspective transformation
- speed calculation
- some bonus ideas
↓
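the core of the speed math is simpler than it looks. here's a minimal numpy sketch of the idea (my reconstruction, not the tutorial code — the calibration points are made up): map four pixel corners of the road onto a metric top-down plane via a homography, then derive speed from how far a tracked point moves between frames.

```python
import numpy as np

def homography(src, dst):
    """Solve the 3x3 homography mapping 4 src points to 4 dst points
    (the same math cv2.getPerspectiveTransform performs)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_world(M, point):
    """Project one pixel point into metric top-down coordinates."""
    x, y, w = M @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

# hypothetical calibration: pixel corners of the road vs. a 25 m x 250 m plane
SRC = np.array([[100, 200], [1180, 200], [1280, 720], [0, 720]])
DST = np.array([[0, 0], [25, 0], [25, 250], [0, 250]])
M = homography(SRC, DST)

def speed_kmh(p_old, p_new, dt):
    """Speed from two consecutive metric positions, dt seconds apart."""
    meters = float(np.linalg.norm(to_world(M, p_new) - to_world(M, p_old)))
    return meters / dt * 3.6  # m/s -> km/h
```

in practice you'd average over a short window of frames instead of a single frame pair, because per-frame pixel jitter translates into large metric jumps near the top of the image.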
REAL-TIME object detection WITHOUT TRAINING
YOLO-World is a new SOTA open-vocabulary object detector that outperforms previous models in terms of both accuracy and speed. 35.4 AP with 52.0 FPS on V100.
↓ read more
supervision, the open-source library I created a year ago, has crossed 10,000 stars on GitHub this weekend!
thank you to everyone who helped me build this project!
it took us 2,000+ commits, 500+ PRs and 50+ contributors to do it.
repository:
almost fully functional version of my football AI project
today, I added player tracking using ByteTrack and projection of players onto the map
code coming soon:
I'm starting to get more and more serious with YOLO-World; trying to solve real-life problems.
I wanted to see if YOLO-World could recognize that the holes had been filled in.
It was pretty tricky, but I learned a little about prompting.
↓ read more
The traffic analysis project is growing! The YouTube tutorial will be out this week.
Progress: I can now identify that the car is in a specified zone.
Next: Match entrance and exit zones for every tracker ID to analyze the traffic flow.
GitHub repo:
I'm taking my football/soccer project to the next level
today, I worked on detecting players, referees, and the ball and mapping their positions from video frames to positions on the field.
↓ read more
I fine-tuned my first vision-language model
PaliGemma is an open-source VLM released by
@GoogleAI
last week. I fine-tuned it to detect bone fractures in X-ray images.
thanks to
@mervenoyann
and
@__kolesnikov__
for all the help!
↓ read more
YOLOv9
Learning What You Want to Learn Using Programmable Gradient Information
Today's deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate...
manual data labeling is (almost) dead
1,500,000 images auto-annotated within 2 weeks of release.
now, we also support automatic segmentation labeling.
↓ read more about open-source models that power this feature
I need to take a break from football AI for a while.
I plan to experiment with PaliGemma, Google's new open-source VLM, over the next few days.
but don't worry, I'll be back. In the meantime, the football AI code is slowly making its way to this repo.
train YOLOv9 on your dataset tutorial
- run inference with a pre-trained COCO model
- fine-tune model on custom dataset
- evaluate the trained model
- run inference with a fine-tuned model
blogpost:
↓ read more
taking my football/soccer AI to the next level
- image embeddings
- dimension reduction
- player clustering
- awesome visualizations
code: (code migration in progress...)
↓ read more
looking for GPT-4V alternatives?
- LLaVA
- BakLLaVA
- CogVLM
- Fuyu-8B
- Qwen-VL
I am working on a short blog post discussing some GPT-4V alternatives. It will probably come out today.
links to all resources:
Automated
@NBA
match commentary using
@OpenAI
vision and TTS (with code!)
Everyone is bragging about projects that generate automatic video commentary, but no one is showing the code. I did it while waiting for the plane.
code:
manual data labeling is almost dead
define prompts, tweak the confidence threshold, and make manual adjustments if necessary.
this feature is now available to all users, even on free accounts.
read more:
I spent most of today preparing for CVPR 2024
"Matching Anything by Segmenting Anything" particularly caught my attention.
Here are the fast open-vocabulary tracking examples (MASA + YOLO-World).
link:
↓ read more
how to calculate the TIME objects spend IN THE ZONE? - that's the topic of my next tutorial.
here's a short (and a bit creepy) demo I built a few months ago.
do you have ideas for a less creepy use case for this tech?
github repository:
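the dwell-time logic itself is tiny. a minimal sketch of how I'd assume it works (not the tutorial code): per tracker ID, count the frames during which the object is inside the zone, then convert frames to seconds with the stream FPS.

```python
from collections import defaultdict

FPS = 30  # assumed frame rate of the video
frames_in_zone: dict[int, int] = defaultdict(int)

def update(tracker_ids: list[int], in_zone: list[bool]) -> None:
    """Call once per frame with each detection's zone-membership flag."""
    for tid, inside in zip(tracker_ids, in_zone):
        if inside:
            frames_in_zone[tid] += 1

def dwell_seconds(tracker_id: int) -> float:
    """Total time the given tracker ID has spent in the zone."""
    return frames_in_zone[tracker_id] / FPS
```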
supervision 0.21.0 is launching tomorrow
this update includes VertexLabelAnnotator, allowing you to annotate skeleton vertices with custom text and color
link:
analyzing store traffic to find the most frequently visited areas
super demo created by
@Hine__Po
- member of Supervision community
link to repo if you want to build something over the weekend:
The YOLO-World YouTube tutorial is out!
please let us know what you think!
- model architecture
- processing images and video in Colab
- prompt engineering and detection refinement
- pros and cons of the model
watch here:
↓ more resources
YOLOv9 tutorial: train model on custom dataset
- running inference with pre-trained COCO weights
- fine-tuning the model on a custom dataset
- model evaluation
- model deployment
sorry it took me so long; hope you like it
it took us a while, but the supervision-0.20.0 release will finally add support for key points.
what are your thoughts on annotators? so far, we only have EdgeAnnotator and VertexAnnotator.
supervision repo:
supervision-0.15.0 is out! This time, we bring highly customizable annotators.
We added eight annotators - box, mask, ellipse, label, circle, corner, trace, and blur. But the best part is... you can freely mix them!
GitHub repository:
YOLO is the craziest model family. Each version is created by a different organization.
"Compared with YOLOv9-C, YOLOv10-B has 46% less latency and 25% fewer parameters for the same performance."
I'll try to test it today.
↓ links
improving object counting logic
today I solved an interesting bug that has existed in my library for a loooooong time
repository:
↓ WARNING: lots of math in the thread below
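here's the flavor of that math, hand-rolled as an illustration (my own sketch, not supervision's actual implementation): the sign of the 2D cross product tells you which side of the counting line a point is on, and a count is registered when a tracked point's sign flips between frames.

```python
def side(line_start, line_end, point) -> int:
    """+1 / -1 for the two half-planes, 0 exactly on the line."""
    (x1, y1), (x2, y2), (px, py) = line_start, line_end, point
    cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    return (cross > 0) - (cross < 0)

def crossed(line_start, line_end, prev_point, curr_point) -> bool:
    """True when the tracked point switched sides between two frames."""
    a = side(line_start, line_end, prev_point)
    b = side(line_start, line_end, curr_point)
    return a != 0 and b != 0 and a != b
```

the subtle bugs live in the edge cases: points landing exactly on the line, and objects whose box touches the line for several frames in a row.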
Easily one of the most exciting projects built with Supervision!
Our community member Vriza Wahyu Saputra built this fantastic ball juggling counting demo using the moving LineZone available in our API.
support for pose estimation and key point detection is coming soon to supervision
you can expect connectors for the most popular models and the first annotators in the next supervision release
can't wait to build demos like this with supervision
parking occupancy analysis
calculation of percentage occupancy in individual parking zones
all this was done with supervision:
btw,
@UenoLeo
is cooking a blog post covering this project, so stay tuned!
↓ read more
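the per-zone percentage boils down to a point-in-polygon test. a rough, self-contained sketch of my assumption of the approach (supervision's `PolygonZone` does this kind of test for you):

```python
def point_in_polygon(point, polygon) -> bool:
    """Even-odd ray-casting test for a point against a polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def occupancy_percent(car_points, zone_polygon, total_spots: int) -> float:
    """Share of parking spots taken in one zone, as a percentage."""
    occupied = sum(point_in_polygon(p, zone_polygon) for p in car_points)
    return 100.0 * occupied / total_spots
```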
I love watching other people build cool demos with the supervision library; traffic analysis examples built by Anant Jaiswal
- object tracking
- zone counting
- heat-map analysis
link:
smart self-service checkout powered by YOLOv9
the value of the basket is updated live based on its changing content; what else should I add?
demo built with supervision:
What papers should I read to expand my knowledge of Transformers?
Please send links in the comments and write why this paper is worth reading. Thanks for your help!
new YouTube tutorial: compute dwell time using computer vision in live streams
(seems easy, yet tricky)
- static file vs stream processing
- preventing growing latency and frame buffer overflow
- efficient stream processing
full tutorial:
↓ read more
speed estimation tutorial is finally out!
- object detection
- multi-object tracking
- filtering detections with polygon zone
- perspective transformation and speed estimation
link:
below are some interesting visualizations I created for this video
↓
Qwen-VL-Plus is SCARY good! (better than GPT-4V)
here it is casually solving Recaptcha!
- You don't have to give any additional instructions other than 'Solve it.'
- It can even mark the exact position of the objects it is looking for.
↓ it can do so much more
Sports Analytics with GPT-4 Vision
I wondered whether GPT-4V had the capability to automatically separate players into teams based on the color of their uniforms.
It took me a ridiculously long time to create this image, but in the meantime, I learned a lot about GPT-4V.
- Object detection over HTTP?
- Easy!
We just open-sourced our inference server under Apache 2.0
Left terminal:
@roboflow
inference
Right terminal: video client
I'm experimenting with PaliGemma tonight
a single open-source model allowing you to:
- detect cars (detection)
- answer questions about their color and brand (VQA)
- read license plate numbers (OCR)
all that on a single consumer-grade GPU
is there any other model that can do it?
It took me ONE HOUR to craft this demo using supervision-0.18.0
- Three new annotators: PercentageBar, RoundedBox, and OrientedBox
- Enhanced LineZone feature for improved counting
- OBB (oriented bounding boxes) integration
↓ read more
repo:
YOLO-World + EfficientSAM + StableDiffusion for language-guided inpainting
I was inspired yesterday by the work of
@MrDravcan
(see attached), and I decided to try to replicate it.
SPOILER ALERT: it didn't quite work out for me.
↓ read more
awesome example of using Supervision for the detection, annotation, and counting of coffee seedlings
kudos to community member Eric Kimwatan
supervision repo:
↓ youtube tutorial and colab
time-in-zone (dwell time) tutorial is coming
this is the third time I'm trying to make this video; hopefully, the last one
I finally have a good use case - waiting time for service.
here is the first iteration. what do you think?
link:
always triple-check the correctness of your datasets and data augmentations.
today, I found two separate errors that ruined my model training.
but finally, we are on the right track
↓ here's where I messed up
detecting small objects is hard
I spent some time today writing a short how-to guide on using supervision (in combination with the most popular CV libraries) to detect small objects.
btw is that a good idea for a video tutorial?
link:
↓ read more
supervision-0.18.0 is almost here!
we had planned to release it tomorrow, but we're still putting the finishing touches on the OBB (oriented bounding box) support
repository:
Manually annotate ONE image and let GPT-4V annotate ALL of them.
1. generate boxes for all images with GroundingDINO
2. provide categories for the reference image
3. prompt GPT-4V to map generated boxes to reference categories
I'm experimenting with a new annotator that zooms in on small detections
do you think it is something useful? or am I just wasting my time here?
more cool annotators:
processing documents with Claude 3
- Good OCR capabilities
- Process up to 20 images with a single API call
- API seems slow and a bit unstable; expect a lot of variance in call execution time
- ~2x cheaper than GPT4-V (please check my math)
↓ read more
time analysis with computer vision
- blurring faces
- detection and tracking
- smoothing detections
- filtering detections by zone
- calculating time
let me know if you want me to explain anything else. ;)
code:
↓ read more
finally had a little bit of time to work on my upcoming vehicle speed estimation tutorial
any improvement ideas?
the demo was built with supervision
code will soon land on GitHub:
Is that demo too creepy?
Ignore the one lady who has been sitting in the zone since the beginning and remains undetected. I am still trying to figure out why...
But zone timers work!
GitHub repository:
🔴 stream: YOLO-World Q&A + coding
in less than 15 minutes, I start my first YT stream; I'll be talking about YOLO-World and answering the questions you left under my last YT video
stop by to say hello
link:
↓ some of the topics we will cover
Two months ago, I created a
@github
repository where I gathered links to the best free AI courses. 🔥
I started with five links, and now there are almost 20. 🚀 The entire repository already has 1200+ ⭐
⮑ 🔗 GitHub repository:
↓🧵some of the courses
YOLOv10 is fast and light but is NOT the best choice for detecting small objects in the distance.
- YOLOv8 - top-right
- YOLOv9 - bottom-left
- YOLOv10 - bottom-right
YOLOv10 performs worse.
using GPT-4V to split players into teams
blending detections with the same tracker ID allows you to significantly reduce the number of GPT-4V API calls when you process video
1 call / 25 frames
kudos to
@ikuma_uchida18
for coming up with this strategy
read more, it's cool ↓
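the call-saving strategy is easy to sketch (my reading of the idea, not the exact code — `ask_model` is a hypothetical callable wrapping the GPT-4V request): cache the team assignment per tracker ID and only query the API every N frames, for IDs that still lack a label.

```python
team_of: dict[int, str] = {}  # tracker ID -> cached team label
CALL_EVERY = 25  # "1 call / 25 frames"

def classify_frame(frame_index, tracker_ids, ask_model):
    """Return a team label per tracker ID, batching API calls.

    ask_model(ids) is assumed to make ONE vision-API call and return
    a {tracker_id: team} mapping for the crops of those IDs.
    """
    unknown = [tid for tid in tracker_ids if tid not in team_of]
    if unknown and frame_index % CALL_EVERY == 0:
        team_of.update(ask_model(unknown))  # one call labels the whole batch
    return {tid: team_of.get(tid, "unknown") for tid in tracker_ids}
```

because tracker IDs are stable across frames, every player is classified once and the label is reused for free on every subsequent frame.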
The second day of work on my SAM + MetaCLIP + ProPainter HF Space
- Automated object masking [done]
- Automated inpainting using ProPainter [in progress]
I know my football AI isn't quite there yet, but this motivates me to add some advanced features after I get back from CVPR 2024.
I'll keep you guys posted!
link:
I just added the polygon annotator to the supervision package
you can now use masks or polygons to visualize the result of the instance segmentation model
polygon annotator will be available in supervision-0.17.0
code:
processing this one-second video exhausted my entire daily quota of 500 GPT-4V requests
but if you were wondering,
@OpenAI
GPT-4V can automatically divide players into teams based on the color of their uniforms
detecting small and distant objects is a major weakness of YOLOv10.
here is the comparison of YOLOv8l at 640x640 and YOLOv10l at 640x640:
- green: detected by YOLOv8 and YOLOv10
- red: detected only by YOLOv8
- blue: detected only by YOLOv10
estimating traffic density based on the live feed from NYC street cameras.
you can find out in real-time which streets are congested.
shoutout to
@UenoLeo
for creating this cool project!
CLIP by
@OpenAI
was revolutionary, but its data curation pipeline was never detailed nor open-sourced.
@Meta
has now released MetaCLIP, a fully open-source replication.
Models are on the hub:
YOLO (unofficial and incomplete) history
who made what?
while I wait for my first YOLOv9 custom-dataset fine-tuning to finish, I decided to share with you an incomplete YOLO history
with links to papers and code
YOLO (2016) Joseph Redmon et al.
- paper:
obviously, torchvision is a lot more important than supervision :)
still, it's an awesome feeling to see my tiny library overtaking this freaking giant on GitHub
link:
supervision-0.15.0 will be out tomorrow! This time we bring highly customizable annotators. Just plug in your model and we'll take care of the rest.
GitHub repository:
zone analysis is awesome;
you can use it to calculate an object's precise position in space, determine its movement path, or measure its distance traveled.
air traffic monitoring demo by
@carlos_melo_py
supervision repo:
↓ youtube tutorial and code
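the distance-traveled part is a one-liner once you have a tracker's position history in metric coordinates (my assumption of the approach, not the demo code):

```python
import numpy as np

def distance_traveled(path: np.ndarray) -> float:
    """Sum of Euclidean steps along a tracker's (N, 2) position history,
    assuming positions are already in meters (e.g. after a perspective
    transform to a top-down metric plane)."""
    if len(path) < 2:
        return 0.0
    steps = np.diff(path, axis=0)           # per-frame displacement vectors
    return float(np.linalg.norm(steps, axis=1).sum())
```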
whenever I show zone analysis in my tutorials, people ask me how I designed the polygons
I decided to spend a few hours and build a small tool you can fire up locally to draw zones
code: