M3SGG Documentation

Welcome to the documentation for M3SGG (Modular, multi-modal Scene Graph Generation), a framework for video scene graph generation and analysis.

Overview

M3SGG builds on established scene graph generation (SGG) research and extends it with modular components and broader dataset support. It implements multiple SGG approaches and provides utilities for training, evaluating, and analyzing video scene graphs.

Key Features

  • Multiple SGG Models: STTran, DSG-DETR, STKET, Tempura, SceneLLM, OED, VLM

  • Dataset Support: Action Genome, EASG, and Visual Genome datasets

  • Language Integration: Summarization and language modeling capabilities

  • GUI Application: Interactive demo application for visualization and testing

  • Evaluation: Predicate classification (PredCLS), scene graph classification (SGCLS), and scene graph detection (SGDET) modes
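The three evaluation modes differ in how much of the scene graph is taken from ground-truth annotations and how much the model must predict. The sketch below summarizes that difference. The names EvalMode and MODES and the lowercase mode keys are hypothetical and chosen for illustration only; they are not M3SGG's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalMode:
    """Hypothetical record: which scene graph parts come from ground truth."""
    name: str
    gt_boxes: bool   # object bounding boxes supplied from annotations
    gt_labels: bool  # object category labels supplied from annotations

    def predicted(self) -> List[str]:
        """Return the components the model must predict under this mode."""
        out = []
        if not self.gt_boxes:
            out.append("boxes")
        if not self.gt_labels:
            out.append("object labels")
        out.append("predicates")  # relationships are predicted in every mode
        return out


# Illustrative mapping; the keys are not M3SGG configuration values.
MODES = {
    "predcls": EvalMode("PredCLS", gt_boxes=True, gt_labels=True),
    "sgcls": EvalMode("SGCLS", gt_boxes=True, gt_labels=False),
    "sgdet": EvalMode("SGDET", gt_boxes=False, gt_labels=False),
}

for mode in MODES.values():
    print(f"{mode.name}: predicts {', '.join(mode.predicted())}")
```

PredCLS is therefore the easiest setting and SGDET the hardest, because errors in detection and object classification propagate into relationship prediction.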

Quick Start

To get started, follow the Installation guide, then work through the examples in the Usage Guide.
