Machine-oriented Visual Media Quality Assessment

Exploring and enhancing the visual robustness of Vision-Language Models (VLMs) within complex, real-world physical environments

12,400 Images
12 VLMs
30 Distortion Types

Welcome to the M-VQA Challenge

Bridging Image Quality Assessment and Visual Question Answering for robust embodied intelligence

Background

With the development of Embodied AI, machines have replaced humans as the primary consumers of visual media, yet existing Image Quality Assessment (IQA) metrics remain focused on human perception and overlook machine task utility. To address this gap, we introduce the Machine-oriented Visual Media Quality Assessment (M-VQA) Challenge, emphasizing both simulation comprehension and real-world execution. The challenge aims to advance reliable image understanding aligned with machine preference and to support robust Embodied AI applications.

Synthetic to Real

Evolution from controlled simulation environments to real-world robotic deployment

Performance Metrics

Correlate quality metrics with ground-truth VLM performance scores

Robustness Testing

30 distinct distortion types across 5 severity levels

Three Tracks

Absolute Score, Relative Score & Real-Robot evaluation tracks

Competition Structure & Core Tasks

The competition is organized into three distinct tracks, spanning from synthetic robustness evaluation to real-world robotic deployment

01

Track 1: Absolute Performance Prediction (In-Silico)

Participants must predict the absolute accuracy (0.0 to 1.0) of VLMs on a large-scale dataset containing 12,400 images with controlled distortions. The goal is to accurately estimate a model's task-solving capability under each distortion.
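
As a rough illustration of the prediction target, the sketch below assumes the ground-truth absolute score of an image is the mean accuracy of the 12 evaluated VLMs on that image; the exact aggregation protocol is defined by the organizers, and all values are illustrative.

```python
# A minimal sketch of the assumed Track 1 target: the mean accuracy of the
# 12 evaluated VLMs on one distorted image. All values are hypothetical.
import numpy as np

vlm_outcomes = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1])  # per-VLM results (illustrative)
absolute_score = vlm_outcomes.mean()  # falls in [0.0, 1.0]
print(f"absolute score = {absolute_score:.4f}")  # -> 0.7500
```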

02

Track 2: Relative Degradation Assessment (In-Silico)

Participants are tasked with predicting the relative performance drop of VLMs compared to their baseline performance on the original, clean images. This track focuses on the sensitivity of metrics to increasing levels of image degradation.

03

Track 3: Real-World Robot Deployment (Real-Robot)

Starting May 30, 2026, top-performing models will undergo rigorous evaluation on physical robot platforms. Specifically, we will conduct systematic assessments on data collected from 100 robotic arms operating within a standardized experimental setup.

  • Eligibility: Only the top 10 teams from the combined rankings of Track 1 and Track 2 are eligible to participate.
  • Evaluation Protocol: Direct testing on real-world sensor data collected from laboratory robot platforms, with no prior training phase. Performance will be measured by assessing the consistency between predicted quality scores and binary success/failure labels (0/1) derived from task execution outcomes.
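
The protocol above does not name a specific consistency statistic. As a non-authoritative sketch, one plausible choice is the point-biserial correlation, i.e., the Pearson correlation between continuous predictions and binary outcomes; the arrays below are hypothetical placeholders.

```python
# One plausible consistency check between predicted quality scores and
# binary success/failure labels; the official Track 3 protocol is defined
# by the organizers, and these arrays are illustrative only.
import numpy as np
from scipy import stats

pred_quality = np.array([0.91, 0.42, 0.77, 0.15, 0.63])  # hypothetical predictions
task_success = np.array([1, 0, 1, 0, 1])                 # hypothetical 0/1 outcomes

# Point-biserial correlation: Pearson correlation against a binary variable
r, p_value = stats.pointbiserialr(task_success, pred_quality)
print(f"point-biserial r = {r:.4f} (p = {p_value:.4f})")
```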

Phase Details

Preliminary Round

Development & Challenge

  • Status: Open
  • Objective: Evaluation for Track 1 and Track 2 based on the released Training and Validation datasets. Participants must submit their predictions to the CodaBench platform for ranking.

Final Round

Sim-to-Real

  • Status: Invitation Only
  • Objective: Evaluation for Track 3. Qualified teams will deploy their algorithms on private, real-world robotic data to test generalization in physical environments.

Awards & Publications

Awards

Prizes will be awarded to the Top 3 winners of each individual track (Track 1, Track 2, and Track 3).

Publications

The final evaluation for paper acceptance and official competition reports will be based on the comprehensive performance across all three tracks.

Dataset Statistics

12,400 images with controlled distortions for robust evaluation

12,400
Total Images
400
Original Images
30
Distortion Types
5
Severity Levels
| Split | Original Images | Distorted Images | Total | Role |
|-------|-----------------|------------------|-------|------|
| Train | 280 (img_000 - img_279) | 8,400 (img_000_noise01 - img_279_noise30) | 8,680 | Model Training & Fine-tuning |
| Val | 40 (img_280 - img_319) | 1,200 (img_280_noise01 - img_319_noise30) | 1,240 | Model Selection & Debugging |
| Test | 80 (img_320 - img_399) | 2,400 (img_320_noise01 - img_399_noise30) | 2,480 | Competition Evaluation |
| Total | 400 | 12,000 | 12,400 | - |
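
For reference, here is a minimal sketch of parsing the naming convention shown in the table above (originals img_000 through img_399; distorted variants appending _noise01 through _noise30) and mapping an original index to its split. Severity level is not encoded in the names shown, so it is not parsed here.

```python
# A minimal sketch, assuming the filename convention from the table above.
import re

PATTERN = re.compile(r"^img_(\d{3})(?:_noise(\d{2}))?$")

def parse_image_id(stem: str):
    """Return (original_index, distortion_type); distortion_type is None
    for clean images. Severity is not part of the names shown above."""
    match = PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected image name: {stem}")
    original = int(match.group(1))
    distortion = int(match.group(2)) if match.group(2) else None
    return original, distortion

def split_of(original_index: int) -> str:
    """Map an original index to its split per the table above."""
    if original_index <= 279:
        return "train"
    if original_index <= 319:
        return "val"
    return "test"

print(parse_image_id("img_280_noise17"), split_of(280))  # (280, 17) val
```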

Distortion Types

30 distinct types of distortions simulating real-world visual degradation

Severity Levels

5 severity levels (Level 4 to Level 8) for each distortion type

Image Pairs

Each original image has 30 distorted variations for comprehensive analysis

Additional Training Data

Simulation Track

https://github.com/lcysyzxdxc/MPD

Real-Robot Track

https://github.com/aiben-ch/EmbodiedIQA

Citation Format

@inproceedings{simdata,
  author = {Li, Chunyi and Tian, Yuan and Ling, Xiaoyue and Zhang, Zicheng and Duan, Haodong and Wu, Haoning and Jia, Ziheng and Liu, Xiaohong and Min, Xiongkuo and Lu, Guo and Lin, Weisi and Zhai, Guangtao},
  title = {Image Quality Assessment: From Human to Machine Preference},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2025},
  pages = {7570-7581}
}

@inproceedings{realdata,
  author = {Li, Chunyi and Xiao, Jiahao and Zhang, Jianbo and Wen, Farong and Zhang, Zicheng and Tian, Yuan and Zhu, Xiangyang and Liu, Xiaohong and Cheng, Zhengxue and Lin, Weisi and Zhai, Guangtao},
  title = {Image Quality Assessment for Embodied {AI}},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
  url = {https://openreview.net/forum?id=azj53PLJRL}
}

Evaluation Metrics

Submissions are evaluated based on correlation with Ground Truth scores

PLCC

Pearson Linear Correlation Coefficient

Measures the linear relationship between the predicted scores and the ground truth.

SRCC

Spearman Rank-Order Correlation Coefficient

Measures the monotonic relationship (ranking consistency) between the predicted scores and the ground truth.
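
For concreteness, here is a minimal sketch of both metrics using SciPy; the score vectors are illustrative placeholders. Both in-silico tracks are ranked by the mean of the two coefficients.

```python
# A minimal sketch of PLCC and SRCC computation with SciPy.
import numpy as np
from scipy import stats

ground_truth = np.array([0.80, 0.55, 0.90, 0.30, 0.65])  # illustrative scores
predicted    = np.array([0.75, 0.60, 0.85, 0.20, 0.70])  # illustrative predictions

plcc, _ = stats.pearsonr(ground_truth, predicted)   # linear agreement
srcc, _ = stats.spearmanr(ground_truth, predicted)  # ranking consistency

print(f"PLCC = {plcc:.4f}, SRCC = {srcc:.4f}")
print(f"mean = {(plcc + srcc) / 2:.4f}")  # tracks are ranked by this mean
```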

Competition Tracks

01

Absolute Score

Robustness Level

Predicting the mean performance level (Mean Opinion Score) of models on a specific distorted image.

Evaluated using mean of PLCC and SRCC
02

Relative Score

Performance Degradation

Measuring the ratio of performance on a distorted image compared to its original, clean counterpart (Ratio of Means).

Evaluated using mean of PLCC and SRCC
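
Below is a minimal sketch of the Track 2 target, assuming "Ratio of Means" divides the mean VLM score on a distorted image by the mean score on its clean counterpart; the numbers are illustrative, not released ground truth.

```python
# A minimal sketch of the assumed "Ratio of Means" relative score.
import numpy as np

clean_scores     = np.array([0.92, 0.88, 0.95])  # per-VLM accuracy on the clean image
distorted_scores = np.array([0.61, 0.70, 0.66])  # per-VLM accuracy on a distorted copy

relative_score = distorted_scores.mean() / clean_scores.mean()
print(f"relative score = {relative_score:.4f}")  # < 1.0 indicates degradation
```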

Important Notes

Consistency

Ensure the index column matches the Test Set exactly. Any missing rows or mismatched indices will result in a submission error.

Encoding

Do not include Base64 image data in your submission; only the predicted scores are required.

Precision

We recommend providing scores with at least 4 decimal places for better ranking granularity.
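
Putting these notes together, here is a minimal sketch of writing a prediction file with pandas. The column names ("index" and "score") and the CSV container are assumptions for illustration only; follow the exact format specified on the CodaBench competition page.

```python
# A minimal sketch of preparing a prediction file; column names and file
# format are assumptions, not the official submission specification.
import pandas as pd

predictions = {
    "img_320_noise01": 0.7312,
    "img_320_noise02": 0.5548,
    # ... one row per test image; every test index must be present
}

df = pd.DataFrame(
    {"index": list(predictions.keys()), "score": list(predictions.values())}
)
df["score"] = df["score"].round(4)        # at least 4 decimal places recommended
df.to_csv("submission.csv", index=False)  # scores only; no Base64 image data
```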

Submit Your Results

All submissions are handled through the CodaBench platform

Submit via CodaBench

All competition submissions are managed through the CodaBench platform. Please visit the competition page for detailed submission instructions and format requirements.

Go to CodaBench

Submission Steps

  1. Create an account on CodaBench if you haven't already
  2. Join the M-VQA Challenge competition
  3. Prepare your prediction file following the required format
  4. Upload your submission and wait for evaluation
  5. Check your ranking on the leaderboard

Competition Timeline

February 22, 2026

Training & Validation Datasets Release

May 15, 2026

Test Dataset Release

May 30, 2026

Real-Robot Competition Starts

June 10, 2026

Final Competition Results Announced

June 25, 2026

Deadline for Paper Submission

July 16, 2026

Paper Acceptance Notification

August 6, 2026

Deadline for Camera-Ready Papers

The timeline will be adjusted in accordance with the progress of the ACMMM conference.

Organizers & Advisors

Challenge Organizing Committee

Jianbo Zhang

Rui Qing

Yufei Han

Shiyu Liu

Yuanzhong Zhang

Advisory Committee

Chunyi Li

Zicheng Zhang

Chris Wei Zhou

Guangtao Zhai

Weisi Lin

Roger Zimmermann

Contact Us

For any inquiries, please reach out to:

lcysyzxdxc@sjtu.edu.cn