Language Module

The language module provides summarization and evaluation tools for video scene graph generation.

Summarization

m3sgg.language.summarization.summarize.linearize_triples(triples, mode='flat')[source]

Convert scene graph triples into natural language sentences.

Transforms subject-predicate-object triples into human-readable sentences using predefined relationship patterns for visual attention, spatial relationships, and physical interactions.

Parameters:
  • triples (list) – List of (subject, predicate, object) tuples

  • mode (str) – Linearization mode: 'flat', 'majority', or 'time'

Returns:

List of natural language sentences

Return type:

list

m3sgg.language.summarization.summarize.summarize_sentences(sentences, model_name='google-t5/t5-base', model_type='t5')[source]
m3sgg.language.summarization.summarize.summarize_with_pegasus_separate(sentences, model_name='google/pegasus-xsum')[source]
m3sgg.language.summarization.summarize.summarize_with_pegasus_custom(sentences, model_name='google/pegasus-xsum', **kwargs)[source]
m3sgg.language.summarization.summarize.main()[source]
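
Example: a minimal sketch of the triple-to-summary pipeline. The triples below are illustrative, and it is assumed (not guaranteed by the signatures above) that summarize_sentences returns a single summary string:

    from m3sgg.language.summarization.summarize import (
        linearize_triples,
        summarize_sentences,
    )

    # Scene graph triples as (subject, predicate, object) tuples
    triples = [
        ("person", "looking_at", "laptop"),
        ("person", "sitting_on", "chair"),
        ("laptop", "on", "table"),
    ]

    # Convert triples to natural language sentences
    sentences = linearize_triples(triples, mode="flat")

    # Condense the sentences into a summary with a T5 model
    summary = summarize_sentences(
        sentences, model_name="google-t5/t5-base", model_type="t5"
    )
    print(summary)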
class m3sgg.language.summarization.wrappers.BaseSummarizationWrapper(model_name: str, device: str | None = None)[source]

Bases: ABC

Abstract base class for summarization model wrappers.

Provides a unified interface for different summarization models including T5 and Pegasus variants. Handles model loading, input preparation, and text summarization with configurable parameters.

Parameters:
  • model_name (str) – Name of the pretrained model

  • device (str, optional) – Device to load model on ('cpu', 'cuda', etc.), defaults to None

__init__(model_name: str, device: str | None = None)[source]

Initialize the summarization wrapper.

Sets up the model name and device, then loads the tokenizer and model.

Parameters:
  • model_name (str) – Name of the pretrained model

  • device (str, optional) – Device to load model on (‘cpu’, ‘cuda’, etc.), defaults to None

Returns:

None

Return type:

None

summarize(text: str, **kwargs) → str[source]

Summarize the given text.

Parameters:
  • text (str) – Text to summarize

  • **kwargs – Additional generation parameters

Returns:

Generated summary

Return type:

str

summarize_batch(texts: List[str], **kwargs) → List[str][source]

Summarize a batch of texts.

Parameters:
  • texts (List[str]) – List of texts to summarize

  • **kwargs – Additional generation parameters

Returns:

List of generated summaries

Return type:

List[str]

class m3sgg.language.summarization.wrappers.T5SummarizationWrapper(model_name: str, device: str | None = None)[source]

Bases: BaseSummarizationWrapper

Wrapper for T5-based summarization models.

class m3sgg.language.summarization.wrappers.PegasusSummarizationWrapper(model_name: str, device: str | None = None)[source]

Bases: BaseSummarizationWrapper

Wrapper for Pegasus-based summarization models.
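
Example: a minimal sketch of the shared wrapper interface, using only the constructor and methods documented above. The input texts are illustrative, and max_length is assumed to be forwarded to the underlying generation call via **kwargs:

    from m3sgg.language.summarization.wrappers import (
        T5SummarizationWrapper,
        PegasusSummarizationWrapper,
    )

    # Both wrappers share the BaseSummarizationWrapper interface
    t5 = T5SummarizationWrapper("google-t5/t5-base", device="cpu")
    print(t5.summarize("A person sits on a chair and looks at a laptop on a table."))

    # Batch summarization follows the same pattern
    pegasus = PegasusSummarizationWrapper("google/pegasus-xsum")
    summaries = pegasus.summarize_batch(
        ["First scene description.", "Second scene description."],
        max_length=60,  # assumed generation parameter passed through **kwargs
    )
    print(summaries)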

class m3sgg.language.summarization.wrappers.PegasusSeparateLoader(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Bases: object

Extension class that loads the Pegasus tokenizer and model separately. Useful for custom loading strategies or when finer control over initialization is required.

__init__(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Initialize with separate tokenizer and model loading.

Parameters:
  • model_name (str) – Name of the Pegasus model

  • device (Optional[str]) – Device to load model on

load_tokenizer(**kwargs) → PegasusTokenizer[source]

Load the Pegasus tokenizer separately.

Parameters:

**kwargs – Additional arguments for tokenizer loading

Returns:

Loaded tokenizer

Return type:

PegasusTokenizer

load_model(**kwargs) → PegasusForConditionalGeneration[source]

Load the Pegasus model separately.

Parameters:

**kwargs – Additional arguments for model loading

Returns:

Loaded model

Return type:

PegasusForConditionalGeneration

is_loaded() → bool[source]

Check if both tokenizer and model are loaded.

summarize(text: str, **kwargs) → str[source]

Summarize text using the separately loaded components.

Parameters:
  • text (str) – Text to summarize

  • **kwargs – Generation parameters

Returns:

Generated summary

Return type:

str
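
Example: a minimal sketch of the separate-loading workflow, based on the methods documented above (the input text is illustrative):

    from m3sgg.language.summarization.wrappers import PegasusSeparateLoader

    loader = PegasusSeparateLoader(model_name="google/pegasus-xsum", device="cpu")

    # Load components separately, e.g. to pass custom loading arguments
    tokenizer = loader.load_tokenizer()
    model = loader.load_model()
    assert loader.is_loaded()  # both components are now available

    print(loader.summarize("A person opens a door and walks into a room."))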

class m3sgg.language.summarization.wrappers.PegasusCustomConfig(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Bases: object

Extension class for Pegasus with custom configuration options. Allows for more granular control over model behavior.

__init__(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Initialize with custom configuration options.

Parameters:
  • model_name (str) – Name of the Pegasus model

  • device (Optional[str]) – Device to load model on

load_with_config(config_kwargs: Dict[str, Any] | None = None, model_kwargs: Dict[str, Any] | None = None) → None[source]

Load model with custom configuration.

Parameters:
  • config_kwargs (Dict[str, Any]) – Configuration parameters

  • model_kwargs (Dict[str, Any]) – Model loading parameters

set_generation_config(**kwargs) → Dict[str, Any][source]

Set custom generation configuration.

Parameters:

**kwargs – Generation parameters

Returns:

Updated generation config

Return type:

Dict[str, Any]

summarize(text: str, **kwargs) → str[source]

Summarize text with custom configuration.

Parameters:
  • text (str) – Text to summarize

  • **kwargs – Generation parameters

Returns:

Generated summary

Return type:

str

is_loaded() → bool[source]

Check if model is loaded.
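
Example: a minimal sketch of custom-configuration loading. The config_kwargs and model_kwargs keys shown are assumed to be valid Hugging Face Pegasus options, not values prescribed by this API:

    from m3sgg.language.summarization.wrappers import PegasusCustomConfig

    cfg = PegasusCustomConfig(model_name="google/pegasus-xsum", device="cpu")

    # Load with custom model configuration (keys are illustrative)
    cfg.load_with_config(
        config_kwargs={"max_length": 64},
        model_kwargs={"low_cpu_mem_usage": True},
    )

    # Persist generation defaults for subsequent summarize() calls
    gen_cfg = cfg.set_generation_config(num_beams=4, length_penalty=0.8)

    if cfg.is_loaded():
        print(cfg.summarize("A person picks up a cup and drinks from it."))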

Evaluation

Benchmark execution and result management for language module evaluation.

This module provides the main benchmark class for evaluating summarization models on video caption generation tasks using scene graph data.

class m3sgg.language.evaluation.benchmark.SummarizationBenchmark(checkpoint_path: str, device: str = 'cuda:0', cache_dir: str = 'data/msr_vtt', video_root: str = 'data/msr_vtt/videos', sg_cache_dir: str = 'data/summarization/cache', frames_per_clip: int = 8, linearizer: str = 'flat', variant: str = 'sg', linearizers: List[str] | None = None, variants: List[str] | None = None)[source]

Bases: object

Main benchmark class for summarization evaluation.

Provides functionality to run comprehensive benchmarks on summarization models using scene graph generation and text summarization pipelines.

Parameters:
  • checkpoint_path (str) – Path to STTran checkpoint

  • device (str, optional) – Device to run inference on

  • cache_dir (str, optional) – Directory to cache datasets

__init__(checkpoint_path: str, device: str = 'cuda:0', cache_dir: str = 'data/msr_vtt', video_root: str = 'data/msr_vtt/videos', sg_cache_dir: str = 'data/summarization/cache', frames_per_clip: int = 8, linearizer: str = 'flat', variant: str = 'sg', linearizers: List[str] | None = None, variants: List[str] | None = None)[source]

Initialize summarization benchmark.

Parameters:
  • checkpoint_path (str) – Path to STTran checkpoint

  • device (str) – Device to run inference on

  • cache_dir (str) – Directory to cache datasets

load_models(config_path: str | None = None)[source]

Load all required models for evaluation.

Parameters:

config_path (Optional[str]) – Path to config file; if None, the default configuration is used

generate_scene_graph(video_path: str) → Dict[str, Any][source]

Generate scene graph for a video.

Parameters:

video_path (str) – Path to video file

Returns:

Scene graph data

Return type:

Dict[str, Any]

scene_graph_to_text(scene_graph: Dict[str, Any]) → str[source]

Convert scene graph to text description.

Parameters:

scene_graph (Dict[str, Any]) – Scene graph data

Returns:

Text description

Return type:

str

generate_summary(text: str, model_name: str = 't5_base') → str[source]

Generate summary using specified model.

Parameters:
  • text (str) – Input text to summarize

  • model_name (str) – Name of summarization model

Returns:

Generated summary

Return type:

str

run_scenario1_benchmark(subset_size: int = 100, models: List[str] | None = None) → Dict[str, Any][source]

Run Scenario 1: Video Caption Generation benchmark.

Parameters:
  • subset_size (int) – Number of test samples to use

  • models (List[str], optional) – List of model names to evaluate

Returns:

Benchmark results

Return type:

Dict[str, Any]

save_results(results: Dict[str, Any], output_path: str)[source]

Save benchmark results to file.

Parameters:
  • results (Dict[str, Any]) – Benchmark results

  • output_path (str) – Path to save results

print_results(results: Dict[str, Any])[source]

Print formatted benchmark results.

Parameters:

results (Dict[str, Any]) – Benchmark results
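
Example: a minimal sketch of the benchmark pipeline using the methods documented above. The checkpoint, video, and output paths are illustrative:

    from m3sgg.language.evaluation.benchmark import SummarizationBenchmark

    bench = SummarizationBenchmark(
        checkpoint_path="output/sttran/model_best.tar",  # illustrative path
        device="cuda:0",
    )
    bench.load_models()

    # End-to-end: video -> scene graph -> text -> summary
    sg = bench.generate_scene_graph("data/msr_vtt/videos/video0.mp4")
    text = bench.scene_graph_to_text(sg)
    summary = bench.generate_summary(text, model_name="t5_base")

    # Or run the full Scenario 1 benchmark over a small test subset
    results = bench.run_scenario1_benchmark(subset_size=10, models=["t5_base"])
    bench.print_results(results)
    bench.save_results(results, "output/benchmark_results.json")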

m3sgg.language.evaluation.benchmark.main()[source]

Example usage of SummarizationBenchmark.

Dataset loading and preprocessing utilities for language module evaluation.

This module provides functionality to download, load, and preprocess datasets for summarization evaluation, with a focus on MSR-VTT dataset.

class m3sgg.language.evaluation.dataset_loader.MSRVTTLoader(cache_dir: str = 'data/msr_vtt', subset_size: int = 500)[source]

Bases: object

Loader for MSR-VTT dataset with subset creation capabilities.

Provides functionality to download MSR-VTT dataset from Hugging Face and create train/test subsets for evaluation.

Parameters:
  • cache_dir (str, optional) – Directory to cache downloaded datasets

  • subset_size (int, optional) – Size of subset to create (train + test)

__init__(cache_dir: str = 'data/msr_vtt', subset_size: int = 500)[source]

Initialize MSR-VTT loader.

Parameters:
  • cache_dir (str) – Directory to cache downloaded datasets

  • subset_size (int) – Size of subset to create (train + test)

download_dataset() → Dict[source]

Download MSR-VTT dataset from Hugging Face.

Returns:

Dictionary containing train and test splits

Return type:

Dict

create_subset(dataset: Dict | None = None, train_size: int = 400, test_size: int = 100, random_seed: int = 42) → Dict[source]

Create a subset of MSR-VTT dataset for evaluation.

Parameters:
  • dataset (Dict, optional) – Pre-loaded dataset; if None, the dataset is downloaded first

  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • random_seed (int) – Random seed for reproducibility

Returns:

Dictionary containing train and test subsets

Return type:

Dict

load_subset_metadata() → Dict | None[source]

Load previously saved subset metadata.

Returns:

Subset metadata if available, None otherwise

Return type:

Optional[Dict]

get_sample_info(subset: Dict, split: str = 'test', sample_idx: int = 0) → Dict[source]

Get information about a specific sample.

Parameters:
  • subset (Dict) – Dataset subset

  • split (str) – Split name (‘train’ or ‘test’)

  • sample_idx (int) – Sample index

Returns:

Sample information

Return type:

Dict
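
Example: a minimal sketch of downloading MSR-VTT and creating a reproducible subset, using only the methods documented above:

    from m3sgg.language.evaluation.dataset_loader import MSRVTTLoader

    loader = MSRVTTLoader(cache_dir="data/msr_vtt", subset_size=500)

    # Download from Hugging Face, then carve out a 400/100 train/test subset
    dataset = loader.download_dataset()
    subset = loader.create_subset(dataset, train_size=400, test_size=100, random_seed=42)

    # Inspect a single test sample
    print(loader.get_sample_info(subset, split="test", sample_idx=0))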

m3sgg.language.evaluation.dataset_loader.create_subset(train_size: int = 400, test_size: int = 100, cache_dir: str = 'data/msr_vtt', random_seed: int = 42) → Dict[source]

Convenience function to create MSR-VTT subset.

Parameters:
  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • cache_dir (str) – Directory to cache dataset

  • random_seed (int) – Random seed for reproducibility

Returns:

Dataset subset

Return type:

Dict

m3sgg.language.evaluation.dataset_loader.main()[source]

Example usage of MSRVTTLoader.

Simple dataset loading utilities for language module evaluation.

This module provides a simplified approach to dataset loading that works around the local datasets directory conflict by using mock data for testing.

class m3sgg.language.evaluation.dataset_loader_simple.SimpleDatasetLoader(cache_dir: str = 'data/mock_dataset', subset_size: int = 500)[source]

Bases: object

Simple dataset loader that creates mock data for testing.

This loader creates synthetic video caption data for testing the evaluation framework without requiring external dataset downloads.

Parameters:
  • cache_dir (str, optional) – Directory to cache data

  • subset_size (int, optional) – Size of subset to create (train + test)

__init__(cache_dir: str = 'data/mock_dataset', subset_size: int = 500)[source]

Initialize simple dataset loader.

Parameters:
  • cache_dir (str) – Directory to cache data

  • subset_size (int) – Size of subset to create (train + test)

create_mock_dataset(train_size: int = 400, test_size: int = 100, random_seed: int = 42) → Dict[source]

Create a mock dataset for testing.

Parameters:
  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • random_seed (int) – Random seed for reproducibility

Returns:

Dictionary containing train and test subsets

Return type:

Dict

load_mock_dataset() → Dict | None[source]

Load previously saved mock dataset.

Returns:

Mock dataset if available, None otherwise

Return type:

Optional[Dict]

get_sample_info(dataset: Dict, split: str = 'test', sample_idx: int = 0) → Dict[source]

Get information about a specific sample.

Parameters:
  • dataset (Dict) – Dataset

  • split (str) – Split name (‘train’ or ‘test’)

  • sample_idx (int) – Sample index

Returns:

Sample information

Return type:

Dict

get_captions(dataset: Dict, split: str = 'test') → List[str][source]

Get all captions from a dataset split.

Parameters:
  • dataset (Dict) – Dataset

  • split (str) – Split name (‘train’ or ‘test’)

Returns:

List of captions

Return type:

List[str]
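
Example: a minimal sketch of generating and querying mock data for smoke tests, based on the methods documented above:

    from m3sgg.language.evaluation.dataset_loader_simple import SimpleDatasetLoader

    loader = SimpleDatasetLoader(cache_dir="data/mock_dataset")

    # Synthetic data: no external download required
    dataset = loader.create_mock_dataset(train_size=40, test_size=10, random_seed=42)

    # Reference captions for the test split feed directly into the metrics below
    references = loader.get_captions(dataset, split="test")
    print(len(references), references[0])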

m3sgg.language.evaluation.dataset_loader_simple.create_mock_subset(train_size: int = 400, test_size: int = 100, cache_dir: str = 'data/mock_dataset', random_seed: int = 42) → Dict[source]

Convenience function to create mock dataset subset.

Parameters:
  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • cache_dir (str) – Directory to cache dataset

  • random_seed (int) – Random seed for reproducibility

Returns:

Mock dataset

Return type:

Dict

m3sgg.language.evaluation.dataset_loader_simple.main()[source]

Example usage of SimpleDatasetLoader.

Evaluation metrics for summarization quality assessment.

This module provides comprehensive metrics for evaluating summarization models, including ROUGE, BLEU, METEOR, and semantic similarity.

class m3sgg.language.evaluation.metrics.SummarizationMetrics(rouge_types: List[str] | None = None, use_stemmer: bool = True, sentence_model: str = 'all-MiniLM-L6-v2')[source]

Bases: object

Comprehensive metrics for summarization evaluation.

Provides ROUGE, BLEU, METEOR, and semantic similarity metrics for evaluating summarization quality.

Parameters:
  • rouge_types (List[str], optional) – List of ROUGE types to compute

  • use_stemmer (bool, optional) – Whether to use stemming for ROUGE

  • sentence_model (str, optional) – Sentence transformer model for semantic similarity

__init__(rouge_types: List[str] | None = None, use_stemmer: bool = True, sentence_model: str = 'all-MiniLM-L6-v2')[source]

Initialize summarization metrics.

Parameters:
  • rouge_types (List[str], optional) – List of ROUGE types to compute

  • use_stemmer (bool, optional) – Whether to use stemming for ROUGE

  • sentence_model (str, optional) – Sentence transformer model for semantic similarity

compute_rouge(predictions: List[str], references: List[str]) → Dict[str, float][source]

Compute ROUGE scores.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Dictionary of ROUGE scores

Return type:

Dict[str, float]

compute_bleu(predictions: List[str], references: List[str]) → Dict[str, float][source]

Compute BLEU scores.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Dictionary of BLEU scores

Return type:

Dict[str, float]

compute_meteor(predictions: List[str], references: List[str]) → float[source]

Compute METEOR score.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

METEOR score

Return type:

float

compute_semantic_similarity(predictions: List[str], references: List[str]) → float[source]

Compute semantic similarity using sentence transformers.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Average semantic similarity score

Return type:

float

compute_all_metrics(predictions: List[str], references: List[str]) → Dict[str, float][source]

Compute all available metrics.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Dictionary of all computed metrics

Return type:

Dict[str, float]

format_results(metrics: Dict[str, float], precision: int = 4) → str[source]

Format metrics results for display.

Parameters:
  • metrics (Dict[str, float]) – Dictionary of metrics

  • precision (int) – Number of decimal places

Returns:

Formatted results string

Return type:

str
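
Example: a minimal sketch of computing and formatting all metrics for a prediction/reference pair. The texts are illustrative, and the rouge_types values follow the common rouge-score naming convention, assumed here:

    from m3sgg.language.evaluation.metrics import SummarizationMetrics

    metrics = SummarizationMetrics(
        rouge_types=["rouge1", "rouge2", "rougeL"],
        use_stemmer=True,
        sentence_model="all-MiniLM-L6-v2",
    )

    predictions = ["a person sits at a table using a laptop"]
    references = ["someone is working on a laptop at a desk"]

    # ROUGE, BLEU, METEOR, and embedding similarity in one call
    scores = metrics.compute_all_metrics(predictions, references)
    print(metrics.format_results(scores, precision=4))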

m3sgg.language.evaluation.metrics.main()[source]

Example usage of SummarizationMetrics.
