Language Module
The language module provides summarization and evaluation tools for video scene graph generation.
Summarization
- m3sgg.language.summarization.summarize.linearize_triples(triples, mode='flat')[source]
Convert scene graph triples into natural language sentences.
Transforms subject-predicate-object triples into human-readable sentences using predefined relationship patterns for visual attention, spatial relationships, and physical interactions.
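A minimal usage sketch (the triple format shown, plain (subject, predicate, object) tuples, is an assumption for illustration):

```python
from m3sgg.language.summarization.summarize import linearize_triples

# Hypothetical triples in (subject, predicate, object) form; the exact
# input schema expected by linearize_triples is an assumption here.
triples = [
    ("person", "looking_at", "laptop"),
    ("person", "sitting_on", "chair"),
]

# 'flat' mode linearizes each triple into a standalone sentence.
sentences = linearize_triples(triples, mode="flat")
print(sentences)
```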
- m3sgg.language.summarization.summarize.summarize_sentences(sentences, model_name='google-t5/t5-base', model_type='t5')[source]
Summarize a list of sentences with a pretrained sequence-to-sequence model (T5 by default).
- m3sgg.language.summarization.summarize.summarize_with_pegasus_separate(sentences, model_name='google/pegasus-xsum')[source]
Summarize sentences with a Pegasus tokenizer and model that are loaded separately.
- m3sgg.language.summarization.summarize.summarize_with_pegasus_custom(sentences, model_name='google/pegasus-xsum', **kwargs)[source]
Summarize sentences with Pegasus using custom configuration options.
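The entry points share the same shape; a short sketch using the documented defaults (the generation kwarg passed to the custom variant is an illustrative assumption):

```python
from m3sgg.language.summarization.summarize import (
    summarize_sentences,
    summarize_with_pegasus_custom,
)

sentences = [
    "The person is looking at the laptop.",
    "The person is sitting on the chair.",
]

# T5 path, using the documented defaults.
summary = summarize_sentences(sentences, model_name="google-t5/t5-base", model_type="t5")

# Pegasus path with passthrough kwargs; num_beams is an assumed example
# of a generation parameter, not a documented default.
summary_pegasus = summarize_with_pegasus_custom(sentences, num_beams=4)
print(summary, summary_pegasus)
```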
- class m3sgg.language.summarization.wrappers.BaseSummarizationWrapper(model_name: str, device: str | None = None)[source]
Bases: ABC
Abstract base class for summarization model wrappers.
Provides a unified interface for different summarization models including T5 and Pegasus variants. Handles model loading, input preparation, and text summarization with configurable parameters.
- Parameters:
model_name (str) – Name or path of the pretrained summarization model to load
device (Optional[str]) – Device to load the model on; selected automatically if None
- __init__(model_name: str, device: str | None = None)[source]
Initialize the summarization wrapper.
Sets up the model name and device, then loads the tokenizer and model.
- class m3sgg.language.summarization.wrappers.T5SummarizationWrapper(model_name: str, device: str | None = None)[source]
Bases: BaseSummarizationWrapper
Wrapper for T5-based summarization models.
- class m3sgg.language.summarization.wrappers.PegasusSummarizationWrapper(model_name: str, device: str | None = None)[source]
Bases: BaseSummarizationWrapper
Wrapper for Pegasus-based summarization models.
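Both wrappers use the constructor documented on the base class; a sketch (the summarize method name is an assumption, since the public API beyond __init__ is not listed here):

```python
from m3sgg.language.summarization.wrappers import (
    T5SummarizationWrapper,
    PegasusSummarizationWrapper,
)

# device=None lets the wrapper choose a default device.
t5 = T5SummarizationWrapper("google-t5/t5-base")
pegasus = PegasusSummarizationWrapper("google/pegasus-xsum", device="cpu")

# The base class handles "input preparation and text summarization";
# a summarize(text) method is assumed here for illustration.
print(t5.summarize("The person picks up the cup and drinks from it."))
```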
- class m3sgg.language.summarization.wrappers.PegasusSeparateLoader(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]
Bases: object
Extension class that loads Pegasus tokenizer and model separately. Useful for custom loading strategies or when you need more control.
- __init__(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]
Initialize with separate tokenizer and model loading.
- load_tokenizer(**kwargs) → PegasusTokenizer [source]
Load the Pegasus tokenizer separately.
- Parameters:
**kwargs – Additional arguments for tokenizer loading
- Returns:
Loaded tokenizer
- Return type:
PegasusTokenizer
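A sketch of the separate-loading flow (only load_tokenizer is documented above; a matching model-loading step is assumed to follow the same pattern):

```python
from m3sgg.language.summarization.wrappers import PegasusSeparateLoader

loader = PegasusSeparateLoader(model_name="google/pegasus-xsum")

# kwargs are forwarded to the underlying tokenizer loader.
tokenizer = loader.load_tokenizer()
print(type(tokenizer).__name__)  # expected: PegasusTokenizer
```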
- class m3sgg.language.summarization.wrappers.PegasusCustomConfig(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]
Bases: object
Extension class for Pegasus with custom configuration options. Allows for more granular control over model behavior.
- __init__(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]
Initialize with custom configuration options.
- load_with_config(config_kwargs: Dict[str, Any] | None = None, model_kwargs: Dict[str, Any] | None = None) → None [source]
Load model with custom configuration.
- set_generation_config(**kwargs) → Dict[str, Any] [source]
Set custom generation configuration.
- Parameters:
**kwargs – Generation parameters
- Returns:
Updated generation config
- Return type:
Dict[str, Any]
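A sketch combining both documented methods (the specific config and generation keys shown are illustrative assumptions):

```python
from m3sgg.language.summarization.wrappers import PegasusCustomConfig

cfg = PegasusCustomConfig(model_name="google/pegasus-xsum")

# Custom config/model kwargs; the keys are assumptions chosen to
# illustrate the two passthrough dictionaries.
cfg.load_with_config(
    config_kwargs={"max_length": 64},
    model_kwargs={"low_cpu_mem_usage": True},
)

# Returns the updated generation config as a dict.
gen_cfg = cfg.set_generation_config(num_beams=4, length_penalty=0.8)
print(gen_cfg)
```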
Evaluation
Benchmark execution and result management for language module evaluation.
This module provides the main benchmark class for evaluating summarization models on video caption generation tasks using scene graph data.
- class m3sgg.language.evaluation.benchmark.SummarizationBenchmark(checkpoint_path: str, device: str = 'cuda:0', cache_dir: str = 'data/msr_vtt', video_root: str = 'data/msr_vtt/videos', sg_cache_dir: str = 'data/summarization/cache', frames_per_clip: int = 8, linearizer: str = 'flat', variant: str = 'sg', linearizers: List[str] | None = None, variants: List[str] | None = None)[source]
Bases: object
Main benchmark class for summarization evaluation.
Provides functionality to run comprehensive benchmarks on summarization models using scene graph generation and text summarization pipelines.
- Parameters:
checkpoint_path (str) – Path to the scene graph model checkpoint
device (str, optional) – Device to run models on, defaults to 'cuda:0'
cache_dir (str, optional) – Directory for cached dataset files, defaults to 'data/msr_vtt'
video_root (str, optional) – Root directory containing the videos, defaults to 'data/msr_vtt/videos'
sg_cache_dir (str, optional) – Directory for cached scene graphs, defaults to 'data/summarization/cache'
frames_per_clip (int, optional) – Number of frames sampled per clip, defaults to 8
linearizer (str, optional) – Scene graph linearization mode, defaults to 'flat'
variant (str, optional) – Benchmark variant, defaults to 'sg'
linearizers (Optional[List[str]]) – Optional list of linearization modes to evaluate
variants (Optional[List[str]]) – Optional list of variants to evaluate
- __init__(checkpoint_path: str, device: str = 'cuda:0', cache_dir: str = 'data/msr_vtt', video_root: str = 'data/msr_vtt/videos', sg_cache_dir: str = 'data/summarization/cache', frames_per_clip: int = 8, linearizer: str = 'flat', variant: str = 'sg', linearizers: List[str] | None = None, variants: List[str] | None = None)[source]
Initialize summarization benchmark.
- load_models(config_path: str | None = None)[source]
Load all required models for evaluation.
- Parameters:
config_path (Optional[str]) – Path to config file, if None uses default
- scene_graph_to_text(scene_graph: Dict[str, Any]) → str [source]
Convert scene graph to text description.
- generate_summary(text: str, model_name: str = 't5_base') → str [source]
Generate summary using specified model.
- run_scenario1_benchmark(subset_size: int = 100, models: List[str] | None = None) → Dict[str, Any] [source]
Run Scenario 1: Video Caption Generation benchmark.
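An end-to-end sketch (the checkpoint path is a placeholder; 't5_base' matches the default model name used by generate_summary):

```python
from m3sgg.language.evaluation.benchmark import SummarizationBenchmark

bench = SummarizationBenchmark(
    checkpoint_path="output/model_best.tar",  # hypothetical checkpoint path
    device="cuda:0",
    linearizer="flat",
    variant="sg",
)

# Loads the scene graph and summarization models; config_path=None
# falls back to the default config.
bench.load_models()

# Scenario 1: video caption generation on a 100-clip subset.
results = bench.run_scenario1_benchmark(subset_size=100, models=["t5_base"])
print(sorted(results.keys()))
```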
Dataset loading and preprocessing utilities for language module evaluation.
This module provides functionality to download, load, and preprocess datasets for summarization evaluation, with a focus on MSR-VTT dataset.
- class m3sgg.language.evaluation.dataset_loader.MSRVTTLoader(cache_dir: str = 'data/msr_vtt', subset_size: int = 500)[source]
Bases: object
Loader for MSR-VTT dataset with subset creation capabilities.
Provides functionality to download MSR-VTT dataset from Hugging Face and create train/test subsets for evaluation.
- Parameters:
cache_dir (str, optional) – Directory to cache the dataset, defaults to 'data/msr_vtt'
subset_size (int, optional) – Size of the subset to create (train + test), defaults to 500
- __init__(cache_dir: str = 'data/msr_vtt', subset_size: int = 500)[source]
Initialize MSR-VTT loader.
- download_dataset() → Dict [source]
Download MSR-VTT dataset from Hugging Face.
- Returns:
Dictionary containing train and test splits
- Return type:
Dict
- create_subset(dataset: Dict | None = None, train_size: int = 400, test_size: int = 100, random_seed: int = 42) → Dict [source]
Create a subset of MSR-VTT dataset for evaluation.
- m3sgg.language.evaluation.dataset_loader.create_subset(train_size: int = 400, test_size: int = 100, cache_dir: str = 'data/msr_vtt', random_seed: int = 42) → Dict [source]
Convenience function to create MSR-VTT subset.
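A sketch of both the class-based and convenience flows (the exact split keys of the returned dict are assumed):

```python
from m3sgg.language.evaluation.dataset_loader import MSRVTTLoader, create_subset

# Class-based: download once, then carve out a reproducible subset.
loader = MSRVTTLoader(cache_dir="data/msr_vtt", subset_size=500)
dataset = loader.download_dataset()
subset = loader.create_subset(dataset, train_size=400, test_size=100, random_seed=42)

# Equivalent one-shot convenience function.
subset = create_subset(train_size=400, test_size=100, cache_dir="data/msr_vtt")
print(sorted(subset.keys()))  # expected to contain train/test splits
```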
Simple dataset loading utilities for language module evaluation.
This module provides a simplified approach to dataset loading that works around a conflict with the local datasets directory by using mock data for testing.
- class m3sgg.language.evaluation.dataset_loader_simple.SimpleDatasetLoader(cache_dir: str = 'data/mock_dataset', subset_size: int = 500)[source]
Bases: object
Simple dataset loader that creates mock data for testing.
This loader creates synthetic video caption data for testing the evaluation framework without requiring external dataset downloads.
- Parameters:
cache_dir (str, optional) – Directory to cache data, defaults to 'data/mock_dataset'
subset_size (int, optional) – Size of subset to create (train + test), defaults to 500
- __init__(cache_dir: str = 'data/mock_dataset', subset_size: int = 500)[source]
Initialize simple dataset loader.
- create_mock_dataset(train_size: int = 400, test_size: int = 100, random_seed: int = 42) → Dict [source]
Create a mock dataset for testing.
- load_mock_dataset() → Dict | None [source]
Load previously saved mock dataset.
- Returns:
Mock dataset if available, None otherwise
- Return type:
Optional[Dict]
- m3sgg.language.evaluation.dataset_loader_simple.create_mock_subset(train_size: int = 400, test_size: int = 100, cache_dir: str = 'data/mock_dataset', random_seed: int = 42) → Dict [source]
Convenience function to create mock dataset subset.
- m3sgg.language.evaluation.dataset_loader_simple.main()[source]
Example usage of SimpleDatasetLoader.
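A sketch of the mock-data flow used for testing the evaluation framework:

```python
from m3sgg.language.evaluation.dataset_loader_simple import (
    SimpleDatasetLoader,
    create_mock_subset,
)

loader = SimpleDatasetLoader(cache_dir="data/mock_dataset")
mock = loader.create_mock_dataset(train_size=400, test_size=100, random_seed=42)

# Later runs can reload the saved mock data; None means nothing cached.
cached = loader.load_mock_dataset()
if cached is None:
    cached = create_mock_subset(cache_dir="data/mock_dataset")
```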
Evaluation metrics for summarization quality assessment.
This module provides comprehensive metrics for evaluating summarization models including ROUGE, BLEU, METEOR, and semantic similarity metrics.
- class m3sgg.language.evaluation.metrics.SummarizationMetrics(rouge_types: List[str] | None = None, use_stemmer: bool = True, sentence_model: str = 'all-MiniLM-L6-v2')[source]
Bases: object
Comprehensive metrics for summarization evaluation.
Provides ROUGE, BLEU, METEOR, and semantic similarity metrics for evaluating summarization quality.
- Parameters:
rouge_types (Optional[List[str]]) – ROUGE variants to compute; a default set is used when None
use_stemmer (bool, optional) – Whether to apply stemming when computing ROUGE, defaults to True
sentence_model (str, optional) – Sentence-transformer model used for semantic similarity, defaults to 'all-MiniLM-L6-v2'
- __init__(rouge_types: List[str] | None = None, use_stemmer: bool = True, sentence_model: str = 'all-MiniLM-L6-v2')[source]
Initialize summarization metrics.
- compute_rouge(predictions: List[str], references: List[str]) → Dict[str, float] [source]
Compute ROUGE scores.
- compute_bleu(predictions: List[str], references: List[str]) → Dict[str, float] [source]
Compute BLEU scores.
- compute_semantic_similarity(predictions: List[str], references: List[str]) → float [source]
Compute semantic similarity using sentence transformers.
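A sketch evaluating a single prediction/reference pair with the three documented metric methods:

```python
from m3sgg.language.evaluation.metrics import SummarizationMetrics

metrics = SummarizationMetrics(use_stemmer=True, sentence_model="all-MiniLM-L6-v2")

predictions = ["A person drinks coffee at a desk."]
references = ["Someone is drinking coffee while sitting at a desk."]

rouge = metrics.compute_rouge(predictions, references)  # dict of ROUGE scores
bleu = metrics.compute_bleu(predictions, references)    # dict of BLEU scores
similarity = metrics.compute_semantic_similarity(predictions, references)  # float
print(rouge, bleu, similarity)
```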