Exploring and enhancing the visual robustness of Vision-Language Models (VLMs) within complex, real-world physical environments
Bridging Image Quality Assessment and Visual Question Answering for robust embodied intelligence
Vision-Language Models (VLMs) act as the "eyes" and "brains" of modern embodied agents. However, the physical world is filled with uncertainties: motion blur, poor lighting, sensor noise, and transmission loss.
An agent's ability to maintain stable visual understanding under these harsh conditions is critical for achieving safe and reliable embodied intelligence. This challenge bridges Image Quality Assessment (IQA) and Visual Question Answering (VQA) to measure the performance degradation of models under various levels of visual distortion.
Evolution from controlled simulation environments to real-world robotic deployment
Correlate quality metrics with ground-truth VLM performance scores
30 distinct distortion types across 5 severity levels
Absolute Score, Relative Score & Real-Robot evaluation tracks
The competition is organized into three distinct tracks, spanning from synthetic robustness evaluation to real-world robotic deployment
Participants must predict the absolute accuracy (0.0 to 1.0) of VLMs on a large-scale dataset containing 12,400 images with controlled distortions. The goal is to accurately estimate a model's task-solving capability under various distortions.
Participants are tasked with predicting the relative performance drop of VLMs compared to their baseline on the original, clean images. This track focuses on the sensitivity of metrics to increasing levels of image degradation.
Starting May 1, 2026, top-performing models will be evaluated on physical robot platforms.
Prizes will be awarded to the Top 3 winners of each individual track (Track 1, Track 2, and Track 3).
The final evaluation for paper acceptance and official competition reports will be based on the comprehensive performance across all three tracks.
12,400 images with controlled distortions for robust evaluation
| Split | Original Images | Distorted Images | Total | Role |
|---|---|---|---|---|
| Train | 280 (img_000 - img_279) | 8,400 (img_000_noise01 - img_279_noise30) | 8,680 | Model Training & Fine-tuning |
| Val | 40 (img_280 - img_319) | 1,200 (img_280_noise01 - img_319_noise30) | 1,240 | Model Selection & Debugging |
| Test | 80 (img_320 - img_399) | 2,400 (img_320_noise01 - img_399_noise30) | 2,480 | Competition Evaluation |
| Total | 400 | 12,000 | 12,400 | - |
30 distinct types of distortions simulating real-world visual degradation
5 severity levels (Level 4 to Level 8) for each distortion type
Each original image has 30 distorted variants (noise01 - noise30) for comprehensive analysis; a sketch of the naming convention follows below
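The split boundaries and file naming in the table above are regular enough to enumerate programmatically. Below is a minimal Python sketch; the zero-padded IDs come from the table, while the `.png` extension and flat directory layout are assumptions to be checked against the released archives:

```python
# Enumerate the dataset filenames implied by the split table above.
# Assumption (not specified by the organizers): .png extension, flat layout.
SPLITS = {
    "train": range(0, 280),    # img_000 - img_279
    "val":   range(280, 320),  # img_280 - img_319
    "test":  range(320, 400),  # img_320 - img_399
}

def filenames(split: str):
    """Yield each original image followed by its 30 distorted variants."""
    for i in SPLITS[split]:
        yield f"img_{i:03d}.png"                       # clean original
        for noise in range(1, 31):                     # noise01 - noise30
            yield f"img_{i:03d}_noise{noise:02d}.png"  # distorted variant

# Sanity check against the table: 80 originals + 2,400 distorted = 2,480.
assert sum(1 for _ in filenames("test")) == 2480
```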
Submissions are evaluated based on correlation with Ground Truth scores
Pearson Linear Correlation Coefficient (PLCC): measures the linear relationship between the predicted scores and the ground truth.
Spearman Rank Correlation Coefficient (SRCC): measures the monotonic relationship (ranking consistency) between the predicted scores and the ground truth.
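Both criteria can be sanity-checked locally with SciPy, as in the minimal sketch below; the organizers' official evaluation script may differ in details such as tie handling:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Toy example: four predicted quality scores vs. ground-truth VLM scores.
predicted    = np.array([0.91, 0.55, 0.32, 0.78])
ground_truth = np.array([0.88, 0.60, 0.25, 0.80])

plcc, _ = pearsonr(predicted, ground_truth)   # linear agreement
srcc, _ = spearmanr(predicted, ground_truth)  # rank (monotonic) consistency
print(f"PLCC = {plcc:.4f}, SRCC = {srcc:.4f}")
```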
Predicting the mean performance level (Mean Opinion Score) of models on a specific distorted image.
Measuring the ratio of performance on a distorted image compared to its original, clean counterpart (Ratio of Means).
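To make the two targets concrete, here is a hedged sketch of how they relate, assuming per-question binary correctness records for a VLM on one image; the organizers' exact aggregation over questions and models is not spelled out here:

```python
import numpy as np

# Hypothetical per-question correctness (1 = correct, 0 = wrong).
clean_correct     = np.array([1, 1, 0, 1, 1])  # answers on a clean image
distorted_correct = np.array([1, 0, 0, 1, 0])  # same questions, distorted image

# Track 1 target: mean accuracy on the distorted image (MOS-style).
absolute_score = distorted_correct.mean()

# Track 2 target: performance on the distorted image relative to the
# clean counterpart (ratio of means).
relative_score = distorted_correct.mean() / clean_correct.mean()

print(f"absolute = {absolute_score:.4f}, relative = {relative_score:.4f}")
```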
Ensure the index column matches the Test Set exactly. Any missing rows or mismatched indices will result in a submission error.
Do not include Base64 image data in your submission; only the predicted scores are required.
We recommend providing scores with at least 4 decimal places for better ranking granularity.
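A minimal sketch of assembling a submission that satisfies this checklist is shown below. The column names `index` and `score`, the CSV format, and the placeholder predictor are assumptions for illustration; defer to the format specification on the CodaBench page:

```python
import pandas as pd

def my_model_predict(ids):
    # Placeholder: replace with your quality metric's predicted scores.
    return [0.5 for _ in ids]

# Build the test index exactly as released (img_320 - img_399).
# Whether clean originals are scored may depend on the track.
test_ids = []
for i in range(320, 400):
    test_ids.append(f"img_{i:03d}")  # clean original
    test_ids.extend(f"img_{i:03d}_noise{n:02d}" for n in range(1, 31))

submission = pd.DataFrame({"index": test_ids, "score": my_model_predict(test_ids)})
submission["score"] = submission["score"].map(lambda s: f"{s:.4f}")  # >= 4 decimals
submission.to_csv("submission.csv", index=False)  # scores only, no image data
```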
All submissions are handled through the CodaBench platform
All competition submissions are managed through the CodaBench platform. Please visit the competition page for detailed submission instructions and format requirements.
Go to CodaBench
Training and validation datasets become available for all participants
Test dataset becomes available for all participants
Beginning of the real-world robot competition phase
Announcement of final competition results
Deadline for workshop paper submissions
Notification of paper acceptance decisions
Deadline for camera-ready paper submissions