What is RoadSocial

RoadSocial is a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets, it captures the global complexity of road events across varied geographies, camera viewpoints (CCTV, handheld, drone) and rich social discourse. RoadSocial highlights:

  • 14M frames, 414K social comments
  • 13.2K videos (7.9K minutes)
  • 674 unique video tags (total 100K+)
  • 260K high-quality socially-informed QA pairs
  • Scalable QA generation pipeline using social video narratives
  • 12 challenging video QA tasks for generic road event understanding
  • New tasks to test robustness of Video LLMs to hallucination
  • Improves generic road event understanding capability of Video LLMs
  • Critical insights into zero-shot capabilities of 18 Video LLMs
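
To give a feel for the scale behind these highlights, the headline figures can be turned into rough per-video averages. The sketch below is illustrative only: the `STATS` dictionary and the `per_video_averages` helper are hypothetical names, not part of any RoadSocial tooling, and the inputs are the rounded values quoted in the list above, so the results are approximate.

```python
# Headline figures as quoted in the highlights above.
# They are rounded, so every derived average is only approximate.
STATS = {
    "videos": 13_200,
    "minutes": 7_900,
    "frames": 14_000_000,
    "comments": 414_000,
    "qa_pairs": 260_000,
}


def per_video_averages(stats):
    """Return rough per-video averages from the rounded headline figures."""
    videos = stats["videos"]
    return {
        "seconds_per_video": stats["minutes"] * 60 / videos,
        "frames_per_video": stats["frames"] / videos,
        "comments_per_video": stats["comments"] / videos,
        "qa_pairs_per_video": stats["qa_pairs"] / videos,
    }


averages = per_video_averages(STATS)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

Under these rounded inputs, a typical video works out to roughly 36 seconds long, with about 31 comments and about 20 QA pairs attached to it.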

Dataset Examples

Dataset Statistics

Video Presentation

Citation

@misc{parikh2025roadsocialdiversevideoqadataset,
  title={RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives},
  author={Chirag Parikh and Deepti Rawat and Rakshitha R. T. and Tathagata Ghosh and Ravi Kiran Sarvadevabhatla},
  year={2025},
  eprint={2503.21459},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.21459}
}