Installation
This guide explains how to install and set up M3SGG on your system.
Requirements
System Requirements
Operating System: Linux, macOS, or Windows
Python: 3.10.0 or higher
CUDA: Compatible GPU with CUDA support (recommended for training)
Memory: At least 16GB RAM (32GB recommended for large datasets)
Storage: At least 50GB free space for datasets and models
Installation Methods
Using pip (editable):
pip install -e .
Optional extras:
# summarization extras
pip install -e .[summarization]
PyTorch is not pinned in dependencies. Install the variant that fits your system:
# CPU
pip install torch torchvision
# CUDA (example for cu121)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
GUI Dependencies (Optional)
For the demo application:
PyQt5>=5.15.0
opencv-python>=4.5.0
matplotlib>=3.5.0
Installation Steps
1. Clone the repository
git clone <repository-url>
cd m3sgg
2. Create virtual environment (recommended)
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
3. Install dependencies
pip install -e .
4. Install DGL (if using SceneLLM)
Visit https://www.dgl.ai/pages/start.html and choose the appropriate DGL version with CUDA support for your system.
pip install dgl-cu117 # Example for CUDA 11.7
5. Setup FasterRCNN backbone
Follow compilation instructions from https://github.com/jwyang/faster-rcnn.pytorch.
Download the pretrained FasterRCNN model:
# Download from: https://drive.google.com/file/d/1-u930Pk0JYz3ivS6V_HNTM1D5AxmN5Bs/view?usp=sharing
# Place at: fasterRCNN/models/faster_rcnn_ag.pth
Dataset Setup
Action Genome dataset
Download the Action Genome dataset from https://www.actiongenome.org/#download
Process using the ActionGenome Toolkit: https://github.com/JingweiJ/ActionGenome
Organize the dataset structure:
data/
└── action_genome/
├── annotations/ # GT annotations
├── frames/ # Sampled frames
└── videos/ # Original videos
Download the additional filter file:
# Download object_bbox_and_relationship_filtersmall.pkl
# From: https://drive.google.com/file/d/19BkAwjCw5ByyGyZjFo174Oc3Ud56fkaT/view?usp=sharing
# Place in: src/m3sgg/utils/
EASG dataset
TODO: Add EASG dataset setup instructions
Visual Genome dataset
TODO: Add Visual Genome dataset setup instructions
Verification
Test Your Installation
Run a simple test to verify the installation:
python -c "import torch; print(\"PyTorch version:\", torch.__version__)"
python -c "import m3sgg; print(\"M3SGG imported successfully\")"
Test GPU Support (if available):
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
Run Tests
Execute the test suite:
python -m pytest tests/
Troubleshooting
Common Issues
- ImportError: No module named ‘dgl’
Install DGL with appropriate CUDA version from https://www.dgl.ai/
- CUDA out of memory
Reduce batch size in configuration or use CPU-only mode
- FasterRCNN compilation errors
Ensure proper C++ compiler is installed and CUDA paths are set correctly
- Missing dataset files
Verify dataset download and placement according to the directory structure
Hardware Requirements
For optimal performance:
Training: NVIDIA RTX 3090 or equivalent (24GB VRAM recommended)
Inference: NVIDIA GTX 1080 or equivalent (8GB VRAM minimum)
CPU-only: Possible but significantly slower
Next Steps
After successful installation, proceed to:
Usage Guide - Learn basic usage patterns
Datasets - Understand dataset formats and preparation
Training Guide - Start training your first model