OpenDataArena

https://opendataarena.github.io

AI & ML interests

Data-centric AI, LLM, MLLM

Papers

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Organization Card

🌐 About OpenDataArena

OpenDataArena (ODA) is an open research initiative devoted to evaluating, benchmarking, and creating high-value datasets for the post-training era of large language models (LLMs).
We believe that data quality defines model capability, and that open, reproducible evaluation is key to accelerating progress in AI.

🚀 Our Mission

To make data evaluation scientific, transparent, and community-driven, while continuously producing high-value, openly available datasets that enhance model alignment and reasoning ability.

🔑 Key Features

🏆 Dataset Leaderboard: ranks the most valuable datasets across multiple domains, based on diverse benchmarks.
📊 Comprehensive Scoring System: measures dataset quality, diversity, and learning value using reproducible pipelines.
🧰 Open-Source Toolkit: OpenDataArena-Tool supports dataset evaluation and scoring through a standardized, community-driven workflow.
🌱 High-Value Data Generation: beyond evaluation, ODA continuously produces and shares new, high-quality datasets for fine-tuning and alignment research.

If you find our work helpful, please consider ⭐ starring and subscribing to support open, data-driven AI research. Learn more at opendataarena.github.io.

OpenDataArena is part of OpenDataLab.