Datasets

This section provides detailed information about the datasets supported by M3SGG.

Supported Datasets

Action Genome

The Action Genome dataset is the primary dataset for video scene graph generation.

Overview

  • Type: Video Scene Graph Dataset

  • Domain: Human activities and object interactions

  • Size: ~10,000 videos with dense annotations

  • Format: MP4 videos with JSON annotations

Dataset Structure

action_genome/
├── annotations/     # Ground truth scene graph annotations
│   ├── train/
│   ├── val/
│   └── test/
├── frames/         # Extracted video frames
│   ├── video_001/
│   ├── video_002/
│   └── ...
└── videos/         # Original video files
    ├── video_001.mp4
    ├── video_002.mp4
    └── ...

Annotation Format

Each annotation file contains:

{
  "video_id": "video_001",
  "frame_annotations": [
    {
      "frame_id": 1,
      "objects": [
        {
          "object_id": 1,
          "bbox": [100, 50, 200, 150],
          "class": "person",
          "attributes": ["adult", "standing"]
        }
      ],
      "relationships": [
        {
          "subject_id": 1,
          "object_id": 2,
          "predicate": "holding"
        }
      ]
    }
  ]
}
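To make the schema concrete, here is a minimal sketch that parses the fragment above and prints each relationship as a subject-predicate-object triple. Note that the fragment lists only object 1, so the relationship's `object_id` 2 resolves to a placeholder here.

```python
import json

# The annotation fragment from above, embedded inline for illustration
sample = """
{
  "video_id": "video_001",
  "frame_annotations": [
    {
      "frame_id": 1,
      "objects": [
        {"object_id": 1, "bbox": [100, 50, 200, 150],
         "class": "person", "attributes": ["adult", "standing"]}
      ],
      "relationships": [
        {"subject_id": 1, "object_id": 2, "predicate": "holding"}
      ]
    }
  ]
}
"""

annotation = json.loads(sample)
for frame in annotation["frame_annotations"]:
    # Map object IDs to class names for this frame
    names = {o["object_id"]: o["class"] for o in frame["objects"]}
    for rel in frame["relationships"]:
        subj = names.get(rel["subject_id"], "?")
        obj = names.get(rel["object_id"], "?")
        print(f"frame {frame['frame_id']}: {subj} --{rel['predicate']}--> {obj}")
```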

Download and Setup

  1. Visit https://www.actiongenome.org/#download

  2. Download the complete dataset

  3. Process using the ActionGenome Toolkit

  4. Place in data/action_genome/ directory

EASG Dataset

The EASG (Enhanced Action Scene Graph) dataset provides additional annotations and features.

Overview

  • Type: Enhanced Video Scene Graph Dataset

  • Domain: Extended human activities with fine-grained annotations

  • Features: Additional semantic features and temporal annotations

Dataset Structure

EASG/
├── EASG/
│   ├── annotations/
│   └── features/
├── frames/
├── features_verb.pt
├── verb_features.pt
└── model_final.pth

Setup Instructions

TODO: Add detailed EASG setup instructions

Visual Genome Dataset

Visual Genome provides static image scene graphs that can be used for pretraining.

Overview

  • Type: Static Image Scene Graph Dataset

  • Domain: General object relationships in images

  • Size: ~108,000 images with scene graph annotations

Setup Instructions

TODO: Add Visual Genome integration details

Dataset Processing

Data Preprocessing

The framework includes several preprocessing utilities:

Frame Extraction

from m3sgg.datasets.action_genome import ActionGenomeDataset

# Initialize the dataset; frames are loaded from the extracted frames/ directory
dataset = ActionGenomeDataset(
    data_path="data/action_genome",
    split="train",
    mode="predcls"
)

Annotation Processing

# Load and process annotations
annotations = dataset.load_annotations()
processed_data = dataset.preprocess_annotations(annotations)

Feature Extraction

# Extract visual features for a single video; the path follows the layout above
video_path = "data/action_genome/videos/video_001.mp4"
features = dataset.extract_features(video_path)

Data Loading

Basic Usage

from torch.utils.data import DataLoader
from m3sgg.datasets.action_genome import ActionGenomeDataset

# Create dataset
dataset = ActionGenomeDataset(
    data_path="data/action_genome",
    split="train",
    mode="predcls"
)

# Create data loader
dataloader = DataLoader(
    dataset,
    batch_size=1,
    shuffle=True,
    num_workers=4
)

# Iterate through data
for batch in dataloader:
    frames, annotations, metadata = batch
    # Process batch...

Advanced Configuration

# Custom dataset configuration
dataset = ActionGenomeDataset(
    data_path="data/action_genome",
    split="train",
    mode="predcls",
    filter_duplicate_relations=True,
    filter_multiple_preds=False,
    frame_sample_rate=1
)

Dataset Statistics

Action Genome Statistics

Split        Videos   Frames    Relationships
-----------  -------  --------  -------------
Train        7,842    476,583   1,752,524
Validation   1,960    119,145   438,131
Test         1,960    119,170   438,384

Object Classes

The dataset includes 35 object categories:

person, chair, table, cup, plate, food, bag, bed, book, laptop,
phone, tv, remote, mouse, keyboard, bottle, wine_glass, fork,
knife, spoon, bowl, banana, apple, sandwich, orange, broccoli,
carrot, hot_dog, pizza, donut, cake, refrigerator, oven,
microwave, toaster
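Models typically map these names to integer indices in list order. A minimal sketch follows; the constant names are illustrative, not part of the M3SGG API.

```python
# Object categories in the order listed above (illustrative constant)
OBJECT_CLASSES = [
    "person", "chair", "table", "cup", "plate", "food", "bag", "bed",
    "book", "laptop", "phone", "tv", "remote", "mouse", "keyboard",
    "bottle", "wine_glass", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot_dog",
    "pizza", "donut", "cake", "refrigerator", "oven", "microwave",
    "toaster",
]

# Map each class name to its integer index
CLASS_TO_IDX = {name: i for i, name in enumerate(OBJECT_CLASSES)}

print(len(OBJECT_CLASSES))     # 35
print(CLASS_TO_IDX["person"])  # 0
```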

Relationship Predicates

The dataset includes 25 relationship types:

looking_at, not_looking_at, unsure, above, beneath, in_front_of,
behind, on_the_side_of, in, carrying, covered_by, drinking_from,
eating, have_it_on_the_back, holding, leaning_on, lying_on,
not_contacting, other_relationship, sitting_on, standing_on,
touching, twisting, wearing, wiping

Quality Assurance

Data Validation

The framework includes validation utilities:

from utils.validation import validate_dataset

# Validate dataset integrity
validation_report = validate_dataset("data/action_genome")
print(validation_report)

Common Validation Checks

  • File existence and accessibility

  • Annotation format consistency

  • Bounding box validity

  • Frame-annotation alignment

  • Missing or corrupted files
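As one example, the bounding-box check can be sketched as follows. This is a simplified stand-in for the framework's `validate_dataset`, assuming boxes in `[x1, y1, x2, y2]` pixel coordinates.

```python
def bbox_is_valid(bbox, frame_width, frame_height):
    """Check that an [x1, y1, x2, y2] box lies inside the frame with positive area."""
    x1, y1, x2, y2 = bbox
    return (
        0 <= x1 < x2 <= frame_width
        and 0 <= y1 < y2 <= frame_height
    )

# Box from the annotation example above, checked against a 640x480 frame
print(bbox_is_valid([100, 50, 200, 150], 640, 480))  # True
print(bbox_is_valid([200, 50, 100, 150], 640, 480))  # False (x1 > x2)
```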

Performance Considerations

Loading Optimization

  • Caching: Enable feature caching for faster loading

  • Parallel Loading: Use multiple workers for data loading

  • Memory Management: Monitor memory usage with large datasets

# Optimized data loading
dataloader = DataLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
    persistent_workers=True
)

Storage Requirements

Dataset         Raw Size   Processed Size
-------------   --------   --------------
Action Genome   ~500GB     ~200GB
EASG            ~100GB     ~50GB
Visual Genome   ~15GB      ~10GB

Custom Datasets

Adding New Datasets

To add support for a new dataset:

  1. Create a new dataloader class inheriting from base dataset

  2. Implement required methods: __init__, __len__, __getitem__

  3. Add dataset-specific preprocessing functions

  4. Update configuration files

from dataloader.base import BaseDataset

class CustomDataset(BaseDataset):
    def __init__(self, data_path, split, mode):
        super().__init__(data_path, split, mode)
        # Custom initialization, e.g. build an index of annotation files
        self.samples = []

    def __getitem__(self, idx):
        # Load and return one data sample (frames, annotations, metadata)
        return self.samples[idx]

    def __len__(self):
        # Return the number of samples in this split
        return len(self.samples)

Dataset Conversion

Utilities for converting between dataset formats:

# Convert from custom format to Action Genome format
python scripts/datasets/convert_dataset.py --input custom_data --output action_genome_format

Next Steps

  • Training Guide - Learn how to train models on these datasets

  • Dataloader API (api/dataloader) - Detailed API documentation for data loading

  • Evaluation Guide - Understand evaluation metrics and procedures