M3SGG Documentation

Welcome to the documentation for M3SGG (Modular, multi-modal Scene Graph Generation), a framework for generating and analyzing video scene graphs.

Overview

M3SGG builds on established scene graph generation (SGG) research and extends it with modular components and dataset support. It supports multiple modeling approaches and provides utilities for training, evaluating, and analyzing video scene graphs.

Key Features

Multiple SGG Models: STTran, DSG-DETR, STKET, Tempura, SceneLLM, OED, VLM
Dataset Support: Action Genome, EASG, and Visual Genome datasets
Language Integration: Summarization and language modeling capabilities
GUI Application: Interactive demo application for visualization and testing
Comprehensive Evaluation: Multiple evaluation modes (PredCLS, SGCLS, SGDET)
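
The three evaluation modes listed above differ in how much ground truth the model receives at test time. The following is a minimal sketch of that distinction as it is conventionally defined in the SGG literature. The names EVAL_MODES and must_predict are hypothetical and are not part of the M3SGG API.

```python
# Illustrative only: these names are assumptions, not M3SGG identifiers.
# For each standard SGG evaluation mode, record which ground-truth
# annotations are supplied to the model at test time.
EVAL_MODES = {
    "predcls": {"boxes": True, "labels": True},    # predicate classification
    "sgcls": {"boxes": True, "labels": False},     # scene graph classification
    "sgdet": {"boxes": False, "labels": False},    # scene graph detection
}


def must_predict(mode):
    """Return the list of outputs the model has to predict in `mode`."""
    given = EVAL_MODES[mode.lower()]
    # Anything not provided as ground truth must be predicted,
    # and predicates are predicted in every mode.
    targets = [name for name, provided in given.items() if not provided]
    return targets + ["predicates"]


for mode in ("PredCLS", "SGCLS", "SGDET"):
    print(mode, "->", must_predict(mode))
```

Under this convention, PredCLS isolates relationship prediction, SGCLS adds object classification, and SGDET evaluates the full pipeline including detection.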

Quick Start

To get started, follow the Installation guide, then work through the examples in the Usage Guide.

User Guide:

API Reference:

Additional Information:

Indices and Tables