M3SGG Documentation
Welcome to the documentation for M3SGG (Modular, multi-modal Scene Graph Generation), a framework for video scene graph generation and analysis.
Overview
M3SGG builds on established scene graph generation (SGG) research and extends it with modular components, dataset support, and tooling for training, evaluating, and analyzing video scene graphs. It supports multiple SGG approaches within a single framework.
Key Features
Multiple SGG Models: STTran, DSG-DETR, STKET, Tempura, SceneLLM, OED, VLM
Dataset Support: Action Genome, EASG, and Visual Genome datasets
Language Integration: Summarization and language modeling capabilities
GUI Application: Interactive demo application for visualization and testing
Comprehensive Evaluation: Multiple evaluation modes (PredCLS, SGCLS, SGDET)
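The three evaluation modes differ in how much ground truth the model receives. The sketch below illustrates the standard SGG convention: PredCLS supplies ground-truth boxes and object labels, SGCLS supplies only boxes, and SGDET supplies neither. The names EvalMode, ModeInputs, and must_predict are hypothetical and chosen for illustration; they are not taken from the M3SGG API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EvalMode(Enum):
    """Hypothetical enum naming the three standard SGG evaluation modes."""
    PREDCLS = "predcls"
    SGCLS = "sgcls"
    SGDET = "sgdet"


@dataclass(frozen=True)
class ModeInputs:
    gt_boxes: bool   # ground-truth bounding boxes are given to the model
    gt_labels: bool  # ground-truth object labels are given to the model


# Ground truth supplied in each mode; predicates are predicted in all of them.
MODE_INPUTS = {
    EvalMode.PREDCLS: ModeInputs(gt_boxes=True, gt_labels=True),
    EvalMode.SGCLS: ModeInputs(gt_boxes=True, gt_labels=False),
    EvalMode.SGDET: ModeInputs(gt_boxes=False, gt_labels=False),
}


def must_predict(mode: EvalMode) -> List[str]:
    """Return what the model has to predict itself in the given mode."""
    inputs = MODE_INPUTS[mode]
    targets = []
    if not inputs.gt_boxes:
        targets.append("boxes")
    if not inputs.gt_labels:
        targets.append("labels")
    targets.append("predicates")
    return targets


if __name__ == "__main__":
    for mode in EvalMode:
        print(mode.name, "->", must_predict(mode))
```

Difficulty increases from PredCLS to SGDET, because each step removes one source of ground truth and the model has to predict more on its own.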
Quick Start
To get started, follow the Installation guide, then work through the examples in the Usage Guide.