M3SGG Documentation
===================

Welcome to the documentation for M3SGG (Modular, Multi-Modal Scene Graph Generation), a framework for generating and analyzing scene graphs from video.

Overview
--------

M3SGG builds on established scene graph generation (SGG) research and extends it with modular model components, dataset support, and tooling for training, evaluating, and analyzing video scene graphs.

Key Features
------------

* **Multiple SGG Models**: STTran, DSG-DETR, STKET, Tempura, SceneLLM, OED, VLM
* **Dataset Support**: Action Genome, EASG, and Visual Genome datasets
* **Language Integration**: Summarization and language modeling capabilities
* **GUI Application**: Interactive demo application for visualization and testing
* **Comprehensive Evaluation**: Three evaluation modes: predicate classification (PredCLS), scene graph classification (SGCLS), and scene graph detection (SGDET)

Quick Start
-----------

To get started, follow the :doc:`installation` guide, then work through the :doc:`usage` examples.

.. toctree::
   :maxdepth: 2
   :caption: User Guide:
   
   installation
   usage
   datasets
   models
   training
   evaluation

.. toctree::
   :maxdepth: 2
   :caption: API Reference:
   
   api

.. toctree::
   :maxdepth: 1
   :caption: Additional Information:
   
   contributing
   changelog
   license

Indices and Tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`