Usage Guide

This comprehensive guide covers all usage patterns for M3SGG, including the new configuration system, different calling methods, applications, and examples.

Quick Start

M3SGG provides multiple ways to interact with the framework:

  1. CLI Commands - Direct command-line interface

  2. Configuration Files - YAML-based configuration system

  3. Python API - Programmatic interface

  4. Applications - GUI and web interfaces

  5. Jupyter Notebooks - Interactive examples

Configuration System

M3SGG features a modern, flexible configuration system with multiple approaches:

YAML Configuration Files

Use structured YAML files for reproducible experiments:

# configs/sttran_predcls.yaml
mode: predcls
model_type: sttran
dataset: action_genome
data_path: data/action_genome
datasize: large

# Training parameters
lr: 1e-4
nepoch: 100
batch_size: 1
optimizer: adamw

# Model architecture
enc_layer: 1
dec_layer: 3

# System settings
device: cuda:0
seed: 42
num_workers: 4

Structured Configuration Classes

Use type-safe configuration classes for programmatic access:

from m3sgg.core.config.structured.sttran import STTranConfig

config = STTranConfig(
    mode="predcls",
    lr=1e-4,
    nepoch=100,
    enc_layer=1,
    dec_layer=3
)

Unified Configuration Interface

The unified interface supports both legacy and modern systems:

from m3sgg.core.config.unified import UnifiedConfig

# Modern configuration
config = UnifiedConfig(
    config_path="configs/sttran.yaml",
    model_type="sttran",
    use_modern=True
)

# Legacy configuration
config = UnifiedConfig(
    cli_args=["-mode", "predcls", "-model", "sttran"],
    use_modern=False
)

Command Line Interface

M3SGG CLI

The modern CLI provides a clean interface:

# Install CLI
pip install -e .

# Basic usage
m3sgg train --config configs/sttran.yaml
m3sgg eval --model-path output/model.pth
m3sgg app  # Launch Streamlit app

CLI Options

m3sgg train [OPTIONS]

Options:
  --config, -c PATH     Path to configuration file
  --model, -m TEXT      Model type (sttran/tempura/scenellm/stket/oed/vlm)
  --dataset, -d TEXT    Dataset (action_genome/EASG)
  --mode TEXT           Training mode (predcls/sgcls/sgdet)
  --epochs, -e INT      Number of training epochs
  --lr FLOAT            Learning rate
  --batch-size, -b INT  Batch size
  --device TEXT         Device (cuda:0/cpu)
  --output, -o PATH     Output directory
  --checkpoint PATH     Path to checkpoint file
  --verbose, -v         Enable verbose logging

Legacy Training Scripts

For backward compatibility, legacy scripts are still supported:

# Training
python scripts/training/training.py -mode predcls -model sttran -data_path data/action_genome

# Evaluation
python scripts/evaluation/test.py -m predcls -model_path output/model.pth

# EASG training
python scripts/training/easg/train_with_EASG.py -mode easgcls -model sttran

Applications

Streamlit Web Application

Interactive web interface for video scene graph generation:

# Launch Streamlit app
python scripts/apps/streamlit.py

# Or use CLI
m3sgg app

Features: * Video Upload: Upload and process custom videos * Model Selection: Choose from available trained models * Real-time Processing: Generate scene graphs on-the-fly * Interactive Visualization: Explore results with interactive plots * Export Options: Save results in multiple formats * Chat Interface: Natural language interaction with results

PyQt Desktop Application

Desktop GUI for advanced users:

python scripts/apps/pyqt.py

Features: * Native Performance: Full desktop application experience * Advanced Controls: Fine-grained parameter adjustment * Batch Processing: Process multiple videos efficiently * Custom Visualizations: Advanced plotting and analysis tools * Model Management: Easy model switching and comparison

Jupyter Notebook Examples

Interactive examples in the examples/ directory:

  1. Basic Video Scene Graph Generation (01_basic_video_scene_graph_generation.ipynb) - Complete pipeline from video to scene graph - Error handling and troubleshooting - Configurable parameters and results analysis

  2. Scene Graph to Text Summarization (02_scene_graph_to_text_summarization.ipynb) - Convert scene graphs to natural language - Multiple summarization models (T5, Pegasus) - Advanced prompting strategies

  3. End-to-End Video to Summary Pipeline (03_end_to_end_video_to_summary.ipynb) - Integrated VideoToSummaryPipeline class - Modular design with error handling - Combined visualization and export

  4. Advanced VLM Scene Graph Generation (04_advanced_vlm_scene_graph_generation.ipynb) - Vision-Language Model integration - Few-shot learning and reasoning - Chain-of-thought prompting

  5. Model Comparison and Evaluation (05_model_comparison_and_evaluation.ipynb) - Comprehensive evaluation framework - Model comparison and ranking - Performance analysis and visualization

Running Examples

# Start Jupyter Lab
jupyter lab

# Or Jupyter Notebook
jupyter notebook

# Navigate to examples/ directory and open notebooks

Python API

Programmatic Interface

Use M3SGG as a Python library:

from m3sgg.core.config.unified import UnifiedConfig
from m3sgg.core.training.trainer import Trainer
from m3sgg.datasets.action_genome import ActionGenomeDataset

# Load configuration
config = UnifiedConfig(config_path="configs/sttran.yaml")

# Create dataset
dataset = ActionGenomeDataset(
    data_path=config.data_path,
    split="train",
    mode=config.mode
)

# Initialize trainer
trainer = Trainer(config)

# Train model
trainer.train(dataset)

Model Factory

Create models programmatically:

from m3sgg.core.training.model_factory import create_model

# Create STTran model
model = create_model("sttran", config)

# Create Tempura model
model = create_model("tempura", config)

Dataset Factory

Load datasets dynamically:

from m3sgg.datasets.factory import create_dataset

# Create Action Genome dataset
dataset = create_dataset("action_genome", config)

# Create EASG dataset
dataset = create_dataset("easg", config)

Training Modes

PredCLS (Predicate Classification)

Predict relationships given ground truth objects:

# CLI
m3sgg train --mode predcls --model sttran

# Legacy
python scripts/training/training.py -mode predcls -model sttran

SGCLS (Scene Graph Classification)

Predict both objects and relationships given bounding boxes:

# CLI
m3sgg train --mode sgcls --model sttran

# Legacy
python scripts/training/training.py -mode sgcls -model sttran

SGDET (Scene Graph Detection)

End-to-end object detection and relationship prediction:

# CLI
m3sgg train --mode sgdet --model sttran

# Legacy
python scripts/training/training.py -mode sgdet -model sttran

Model-Specific Usage

STTran

Spatial-Temporal Transformer baseline:

# configs/sttran.yaml
model_type: sttran
enc_layer: 1
dec_layer: 3
lr: 1e-4

Tempura

Uncertainty-aware temporal modeling:

# configs/tempura.yaml
model_type: tempura
obj_head: gmm
rel_head: gmm
K: 3
obj_mem_compute: true
rel_mem_compute: true

SceneLLM

Large language model integration:

# configs/scenellm.yaml
model_type: scenellm
scenellm_training_stage: stage1
llm_model: gemma3-270M
fusion_layers: 3

STKET

Knowledge-enhanced transformer:

# configs/stket.yaml
model_type: stket
N_layer: 1
enc_layer_num: 1
dec_layer_num: 1
use_spatial_prior: true
use_temporal_prior: true

OED

Object-Entity Disentanglement:

# configs/oed.yaml
model_type: oed
oed_variant: multi
num_queries: 100

VLM

Vision-Language Model:

# configs/vlm.yaml
model_type: vlm
vlm_model: blip2
reasoning_type: chain_of_thought

Configuration Presets

Use predefined configuration presets:

# Quick test configuration
m3sgg train --config configs/presets/quick_test.yaml

# Production configuration
m3sgg train --config configs/presets/production.yaml

Available Presets

  • Quick Test: Fast training for testing

  • Production: Optimized for best performance

  • Debug: Verbose logging and error checking

  • Research: Full feature set for experimentation

Advanced Usage

Custom Training Loops

from m3sgg.core.training.trainer import Trainer

class CustomTrainer(Trainer):
    def train_epoch(self, dataloader):
        # Custom training logic
        pass

trainer = CustomTrainer(config)
trainer.train()

Model Evaluation

from m3sgg.core.training.evaluation import Evaluator

evaluator = Evaluator(config)
results = evaluator.evaluate(model, dataloader)
print(f"Recall@20: {results['recall@20']:.2f}")

Batch Processing

from m3sgg.utils.batch_processor import BatchProcessor

processor = BatchProcessor(config)
results = processor.process_videos(video_paths)

Performance Optimization

GPU Memory Management

# configs/optimized.yaml
batch_size: 1
gradient_accumulation_steps: 4
mixed_precision: true
gradient_checkpointing: true

Data Loading Optimization

# configs/optimized.yaml
num_workers: 8
pin_memory: true
persistent_workers: true
prefetch_factor: 2

Troubleshooting

Common Issues

Configuration Errors
  • Verify YAML syntax and indentation

  • Check required parameters are present

  • Validate parameter types and ranges

Import Errors
  • Ensure m3sgg package is installed: pip install -e .

  • Check Python path includes src directory

  • Verify all dependencies are installed

CUDA Issues
  • Check GPU availability: torch.cuda.is_available()

  • Verify CUDA version compatibility

  • Use CPU fallback: –device cpu

Memory Issues
  • Reduce batch size

  • Use gradient accumulation

  • Enable gradient checkpointing

  • Use mixed precision training

Data Loading Issues
  • Verify dataset paths and structure

  • Check file permissions

  • Ensure sufficient disk space

  • Validate data format and annotations

Getting Help

  • Documentation: Check the comprehensive API documentation

  • Examples: Run through the Jupyter notebook examples

  • Issues: Report bugs and ask questions on GitHub

  • Community: Join discussions and get help from the community

Next Steps

  • Training Guide - Detailed training procedures and best practices

  • Evaluation Guide - Comprehensive evaluation metrics and analysis

  • Models - Deep dive into model architectures and implementations

  • API Reference - Complete API reference documentation