Language Module

The language module provides summarization and evaluation tools for video scene graph generation.

Summarization

m3sgg.language.summarization.summarize.linearize_triples(triples, mode='flat')[source]

Convert scene graph triples into natural language sentences.

Transforms subject-predicate-object triples into human-readable sentences using predefined relationship patterns for visual attention, spatial relationships, and physical interactions.

Parameters:
  • triples (list) – List of (subject, predicate, object) tuples

  • mode (str) – Linearization mode: 'flat', 'majority', or 'time'

Returns:

List of natural language sentences

Return type:

list

m3sgg.language.summarization.summarize.summarize_sentences(sentences, model_name='google-t5/t5-base', model_type='t5')[source]
m3sgg.language.summarization.summarize.summarize_with_pegasus_separate(sentences, model_name='google/pegasus-xsum')[source]
m3sgg.language.summarization.summarize.summarize_with_pegasus_custom(sentences, model_name='google/pegasus-xsum', **kwargs)[source]
m3sgg.language.summarization.summarize.main()[source]
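
Example: a minimal sketch of the triple-to-summary pipeline. The triples below are illustrative, and it is assumed (not guaranteed by the signatures above) that summarize_sentences returns a single summary string:

    from m3sgg.language.summarization.summarize import (
        linearize_triples,
        summarize_sentences,
    )

    # Scene graph triples as (subject, predicate, object) tuples
    triples = [
        ("person", "looking_at", "laptop"),
        ("person", "sitting_on", "chair"),
        ("laptop", "on", "table"),
    ]

    # Convert triples to natural language sentences
    sentences = linearize_triples(triples, mode="flat")

    # Condense the sentences into a summary with a T5 model
    summary = summarize_sentences(
        sentences, model_name="google-t5/t5-base", model_type="t5"
    )
    print(summary)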
class m3sgg.language.summarization.wrappers.BaseSummarizationWrapper(model_name: str, device: str | None = None)[source]

Bases: ABC

Abstract base class for summarization model wrappers.

Provides a unified interface for different summarization models including T5 and Pegasus variants. Handles model loading, input preparation, and text summarization with configurable parameters.

Parameters:
  • model_name (str) – Name of the pretrained model

  • device (str, optional) – Device to load model on ('cpu', 'cuda', etc.), defaults to None

__init__(model_name: str, device: str | None = None)[source]

Initialize the summarization wrapper.

Sets up the model name and device, then loads the tokenizer and model.

Parameters:
  • model_name (str) – Name of the pretrained model

  • device (str, optional) – Device to load model on (‘cpu’, ‘cuda’, etc.), defaults to None

Returns:

None

Return type:

None

summarize(text: str, **kwargs) → str[source]

Summarize the given text.

Parameters:
  • text (str) – Text to summarize

  • **kwargs – Additional generation parameters

Returns:

Generated summary

Return type:

str

summarize_batch(texts: List[str], **kwargs) → List[str][source]

Summarize a batch of texts.

Parameters:
  • texts (List[str]) – List of texts to summarize

  • **kwargs – Additional generation parameters

Returns:

List of generated summaries

Return type:

List[str]

class m3sgg.language.summarization.wrappers.T5SummarizationWrapper(model_name: str, device: str | None = None)[source]

Bases: BaseSummarizationWrapper

Wrapper for T5-based summarization models.

class m3sgg.language.summarization.wrappers.PegasusSummarizationWrapper(model_name: str, device: str | None = None)[source]

Bases: BaseSummarizationWrapper

Wrapper for Pegasus-based summarization models.
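
Example: a minimal sketch of the shared wrapper interface, using only the constructor and methods documented above. The input texts are illustrative, and max_length is assumed to be forwarded to the underlying generation call via **kwargs:

    from m3sgg.language.summarization.wrappers import (
        T5SummarizationWrapper,
        PegasusSummarizationWrapper,
    )

    # Both wrappers share the BaseSummarizationWrapper interface
    t5 = T5SummarizationWrapper("google-t5/t5-base", device="cpu")
    print(t5.summarize("A person sits on a chair and looks at a laptop on a table."))

    # Batch summarization follows the same pattern
    pegasus = PegasusSummarizationWrapper("google/pegasus-xsum")
    summaries = pegasus.summarize_batch(
        ["First scene description.", "Second scene description."],
        max_length=60,  # assumed generation parameter passed through **kwargs
    )
    print(summaries)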

class m3sgg.language.summarization.wrappers.PegasusSeparateLoader(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Bases: object

Extension class that loads the Pegasus tokenizer and model separately. Useful for custom loading strategies or when finer control over initialization is required.

__init__(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Initialize with separate tokenizer and model loading.

Parameters:
  • model_name (str) – Name of the Pegasus model

  • device (Optional[str]) – Device to load model on

load_tokenizer(**kwargs) → PegasusTokenizer[source]

Load the Pegasus tokenizer separately.

Parameters:

**kwargs – Additional arguments for tokenizer loading

Returns:

Loaded tokenizer

Return type:

PegasusTokenizer

load_model(**kwargs) → PegasusForConditionalGeneration[source]

Load the Pegasus model separately.

Parameters:

**kwargs – Additional arguments for model loading

Returns:

Loaded model

Return type:

PegasusForConditionalGeneration

is_loaded() → bool[source]

Check if both tokenizer and model are loaded.

summarize(text: str, **kwargs) → str[source]

Summarize text using the separately loaded components.

Parameters:
  • text (str) – Text to summarize

  • **kwargs – Generation parameters

Returns:

Generated summary

Return type:

str
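
Example: a minimal sketch of the separate-loading workflow, based on the methods documented above (the input text is illustrative):

    from m3sgg.language.summarization.wrappers import PegasusSeparateLoader

    loader = PegasusSeparateLoader(model_name="google/pegasus-xsum", device="cpu")

    # Load components separately, e.g. to pass custom loading arguments
    tokenizer = loader.load_tokenizer()
    model = loader.load_model()
    assert loader.is_loaded()  # both components are now available

    print(loader.summarize("A person opens a door and walks into a room."))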

class m3sgg.language.summarization.wrappers.PegasusCustomConfig(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Bases: object

Extension class for Pegasus with custom configuration options. Allows for more granular control over model behavior.

__init__(model_name: str = 'google/pegasus-xsum', device: str | None = None)[source]

Initialize with custom configuration options.

Parameters:
  • model_name (str) – Name of the Pegasus model

  • device (Optional[str]) – Device to load model on

load_with_config(config_kwargs: Dict[str, Any] | None = None, model_kwargs: Dict[str, Any] | None = None) → None[source]

Load model with custom configuration.

Parameters:
  • config_kwargs (Dict[str, Any]) – Configuration parameters

  • model_kwargs (Dict[str, Any]) – Model loading parameters

set_generation_config(**kwargs) → Dict[str, Any][source]

Set custom generation configuration.

Parameters:

**kwargs – Generation parameters

Returns:

Updated generation config

Return type:

Dict[str, Any]

summarize(text: str, **kwargs) → str[source]

Summarize text with custom configuration.

Parameters:
  • text (str) – Text to summarize

  • **kwargs – Generation parameters

Returns:

Generated summary

Return type:

str

is_loaded() → bool[source]

Check if model is loaded.
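
Example: a minimal sketch of custom-configuration loading. The config_kwargs and model_kwargs keys shown are assumed to be valid Hugging Face Pegasus options, not values prescribed by this API:

    from m3sgg.language.summarization.wrappers import PegasusCustomConfig

    cfg = PegasusCustomConfig(model_name="google/pegasus-xsum", device="cpu")

    # Load with custom model configuration (keys are illustrative)
    cfg.load_with_config(
        config_kwargs={"max_length": 64},
        model_kwargs={"low_cpu_mem_usage": True},
    )

    # Persist generation defaults for subsequent summarize() calls
    gen_cfg = cfg.set_generation_config(num_beams=4, length_penalty=0.8)

    if cfg.is_loaded():
        print(cfg.summarize("A person picks up a cup and drinks from it."))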

Evaluation

Benchmark execution and result management for language module evaluation.

This module provides the main benchmark class for evaluating summarization models on video caption generation tasks using scene graph data.

class m3sgg.language.evaluation.benchmark.SummarizationBenchmark(checkpoint_path: str, device: str = 'cuda:0', cache_dir: str = 'data/msr_vtt', video_root: str = 'data/msr_vtt/videos', sg_cache_dir: str = 'data/summarization/cache', frames_per_clip: int = 8, linearizer: str = 'flat', variant: str = 'sg', linearizers: List[str] | None = None, variants: List[str] | None = None)[source]

Bases: object

Main benchmark class for summarization evaluation.

Provides functionality to run comprehensive benchmarks on summarization models using scene graph generation and text summarization pipelines.

Parameters:
  • checkpoint_path (str) – Path to STTran checkpoint

  • device (str, optional) – Device to run inference on

  • cache_dir (str, optional) – Directory to cache datasets

__init__(checkpoint_path: str, device: str = 'cuda:0', cache_dir: str = 'data/msr_vtt', video_root: str = 'data/msr_vtt/videos', sg_cache_dir: str = 'data/summarization/cache', frames_per_clip: int = 8, linearizer: str = 'flat', variant: str = 'sg', linearizers: List[str] | None = None, variants: List[str] | None = None)[source]

Initialize summarization benchmark.

Parameters:
  • checkpoint_path (str) – Path to STTran checkpoint

  • device (str) – Device to run inference on

  • cache_dir (str) – Directory to cache datasets

load_models(config_path: str | None = None)[source]

Load all required models for evaluation.

Parameters:

config_path (Optional[str]) – Path to config file; if None, the default configuration is used

generate_scene_graph(video_path: str) → Dict[str, Any][source]

Generate scene graph for a video.

Parameters:

video_path (str) – Path to video file

Returns:

Scene graph data

Return type:

Dict[str, Any]

scene_graph_to_text(scene_graph: Dict[str, Any]) → str[source]

Convert scene graph to text description.

Parameters:

scene_graph (Dict[str, Any]) – Scene graph data

Returns:

Text description

Return type:

str

generate_summary(text: str, model_name: str = 't5_base') → str[source]

Generate summary using specified model.

Parameters:
  • text (str) – Input text to summarize

  • model_name (str) – Name of summarization model

Returns:

Generated summary

Return type:

str

run_scenario1_benchmark(subset_size: int = 100, models: List[str] | None = None) → Dict[str, Any][source]

Run Scenario 1: Video Caption Generation benchmark.

Parameters:
  • subset_size (int) – Number of test samples to use

  • models (List[str], optional) – List of model names to evaluate

Returns:

Benchmark results

Return type:

Dict[str, Any]

save_results(results: Dict[str, Any], output_path: str)[source]

Save benchmark results to file.

Parameters:
  • results (Dict[str, Any]) – Benchmark results

  • output_path (str) – Path to save results

print_results(results: Dict[str, Any])[source]

Print formatted benchmark results.

Parameters:

results (Dict[str, Any]) – Benchmark results
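
Example: a minimal sketch of the benchmark pipeline using the methods documented above. The checkpoint, video, and output paths are illustrative:

    from m3sgg.language.evaluation.benchmark import SummarizationBenchmark

    bench = SummarizationBenchmark(
        checkpoint_path="output/sttran/model_best.tar",  # illustrative path
        device="cuda:0",
    )
    bench.load_models()

    # End-to-end: video -> scene graph -> text -> summary
    sg = bench.generate_scene_graph("data/msr_vtt/videos/video0.mp4")
    text = bench.scene_graph_to_text(sg)
    summary = bench.generate_summary(text, model_name="t5_base")

    # Or run the full Scenario 1 benchmark over a small test subset
    results = bench.run_scenario1_benchmark(subset_size=10, models=["t5_base"])
    bench.print_results(results)
    bench.save_results(results, "output/benchmark_results.json")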

m3sgg.language.evaluation.benchmark.main()[source]

Example usage of SummarizationBenchmark.

Dataset loading and preprocessing utilities for language module evaluation.

This module provides functionality to download, load, and preprocess datasets for summarization evaluation, with a focus on MSR-VTT dataset.

class m3sgg.language.evaluation.dataset_loader.MSRVTTLoader(cache_dir: str = 'data/msr_vtt', subset_size: int = 500)[source]

Bases: object

Loader for MSR-VTT dataset with subset creation capabilities.

Provides functionality to download MSR-VTT dataset from Hugging Face and create train/test subsets for evaluation.

Parameters:
  • cache_dir (str, optional) – Directory to cache downloaded datasets

  • subset_size (int, optional) – Size of subset to create (train + test)

__init__(cache_dir: str = 'data/msr_vtt', subset_size: int = 500)[source]

Initialize MSR-VTT loader.

Parameters:
  • cache_dir (str) – Directory to cache downloaded datasets

  • subset_size (int) – Size of subset to create (train + test)

download_dataset() → Dict[source]

Download MSR-VTT dataset from Hugging Face.

Returns:

Dictionary containing train and test splits

Return type:

Dict

create_subset(dataset: Dict | None = None, train_size: int = 400, test_size: int = 100, random_seed: int = 42) → Dict[source]

Create a subset of MSR-VTT dataset for evaluation.

Parameters:
  • dataset (Dict, optional) – Pre-loaded dataset; if None, the dataset is downloaded first

  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • random_seed (int) – Random seed for reproducibility

Returns:

Dictionary containing train and test subsets

Return type:

Dict

load_subset_metadata() → Dict | None[source]

Load previously saved subset metadata.

Returns:

Subset metadata if available, None otherwise

Return type:

Optional[Dict]

get_sample_info(subset: Dict, split: str = 'test', sample_idx: int = 0) → Dict[source]

Get information about a specific sample.

Parameters:
  • subset (Dict) – Dataset subset

  • split (str) – Split name (‘train’ or ‘test’)

  • sample_idx (int) – Sample index

Returns:

Sample information

Return type:

Dict
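
Example: a minimal sketch of downloading MSR-VTT and creating a reproducible subset, using only the methods documented above:

    from m3sgg.language.evaluation.dataset_loader import MSRVTTLoader

    loader = MSRVTTLoader(cache_dir="data/msr_vtt", subset_size=500)

    # Download from Hugging Face, then carve out a 400/100 train/test subset
    dataset = loader.download_dataset()
    subset = loader.create_subset(dataset, train_size=400, test_size=100, random_seed=42)

    # Inspect a single test sample
    print(loader.get_sample_info(subset, split="test", sample_idx=0))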

m3sgg.language.evaluation.dataset_loader.create_subset(train_size: int = 400, test_size: int = 100, cache_dir: str = 'data/msr_vtt', random_seed: int = 42) → Dict[source]

Convenience function to create MSR-VTT subset.

Parameters:
  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • cache_dir (str) – Directory to cache dataset

  • random_seed (int) – Random seed for reproducibility

Returns:

Dataset subset

Return type:

Dict

m3sgg.language.evaluation.dataset_loader.main()[source]

Example usage of MSRVTTLoader.

Simple dataset loading utilities for language module evaluation.

This module provides a simplified approach to dataset loading that works around the local datasets directory conflict by using mock data for testing.

class m3sgg.language.evaluation.dataset_loader_simple.SimpleDatasetLoader(cache_dir: str = 'data/mock_dataset', subset_size: int = 500)[source]

Bases: object

Simple dataset loader that creates mock data for testing.

This loader creates synthetic video caption data for testing the evaluation framework without requiring external dataset downloads.

Parameters:
  • cache_dir (str, optional) – Directory to cache data

  • subset_size (int, optional) – Size of subset to create (train + test)

__init__(cache_dir: str = 'data/mock_dataset', subset_size: int = 500)[source]

Initialize simple dataset loader.

Parameters:
  • cache_dir (str) – Directory to cache data

  • subset_size (int) – Size of subset to create (train + test)

create_mock_dataset(train_size: int = 400, test_size: int = 100, random_seed: int = 42) → Dict[source]

Create a mock dataset for testing.

Parameters:
  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • random_seed (int) – Random seed for reproducibility

Returns:

Dictionary containing train and test subsets

Return type:

Dict

load_mock_dataset() → Dict | None[source]

Load previously saved mock dataset.

Returns:

Mock dataset if available, None otherwise

Return type:

Optional[Dict]

get_sample_info(dataset: Dict, split: str = 'test', sample_idx: int = 0) → Dict[source]

Get information about a specific sample.

Parameters:
  • dataset (Dict) – Dataset

  • split (str) – Split name (‘train’ or ‘test’)

  • sample_idx (int) – Sample index

Returns:

Sample information

Return type:

Dict

get_captions(dataset: Dict, split: str = 'test') → List[str][source]

Get all captions from a dataset split.

Parameters:
  • dataset (Dict) – Dataset

  • split (str) – Split name (‘train’ or ‘test’)

Returns:

List of captions

Return type:

List[str]
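
Example: a minimal sketch of generating and querying mock data for smoke tests, based on the methods documented above:

    from m3sgg.language.evaluation.dataset_loader_simple import SimpleDatasetLoader

    loader = SimpleDatasetLoader(cache_dir="data/mock_dataset")

    # Synthetic data: no external download required
    dataset = loader.create_mock_dataset(train_size=40, test_size=10, random_seed=42)

    # Reference captions for the test split feed directly into the metrics below
    references = loader.get_captions(dataset, split="test")
    print(len(references), references[0])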

m3sgg.language.evaluation.dataset_loader_simple.create_mock_subset(train_size: int = 400, test_size: int = 100, cache_dir: str = 'data/mock_dataset', random_seed: int = 42) → Dict[source]

Convenience function to create mock dataset subset.

Parameters:
  • train_size (int) – Number of training samples

  • test_size (int) – Number of test samples

  • cache_dir (str) – Directory to cache dataset

  • random_seed (int) – Random seed for reproducibility

Returns:

Mock dataset

Return type:

Dict

m3sgg.language.evaluation.dataset_loader_simple.main()[source]

Example usage of SimpleDatasetLoader.

Evaluation metrics for summarization quality assessment.

This module provides comprehensive metrics for evaluating summarization models, including ROUGE, BLEU, METEOR, and semantic similarity.

class m3sgg.language.evaluation.metrics.SummarizationMetrics(rouge_types: List[str] | None = None, use_stemmer: bool = True, sentence_model: str = 'all-MiniLM-L6-v2')[source]

Bases: object

Comprehensive metrics for summarization evaluation.

Provides ROUGE, BLEU, METEOR, and semantic similarity metrics for evaluating summarization quality.

Parameters:
  • rouge_types (List[str], optional) – List of ROUGE types to compute

  • use_stemmer (bool, optional) – Whether to use stemming for ROUGE

  • sentence_model (str, optional) – Sentence transformer model for semantic similarity

__init__(rouge_types: List[str] | None = None, use_stemmer: bool = True, sentence_model: str = 'all-MiniLM-L6-v2')[source]

Initialize summarization metrics.

Parameters:
  • rouge_types (List[str], optional) – List of ROUGE types to compute

  • use_stemmer (bool, optional) – Whether to use stemming for ROUGE

  • sentence_model (str, optional) – Sentence transformer model for semantic similarity

compute_rouge(predictions: List[str], references: List[str]) → Dict[str, float][source]

Compute ROUGE scores.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Dictionary of ROUGE scores

Return type:

Dict[str, float]

compute_bleu(predictions: List[str], references: List[str]) → Dict[str, float][source]

Compute BLEU scores.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Dictionary of BLEU scores

Return type:

Dict[str, float]

compute_meteor(predictions: List[str], references: List[str]) → float[source]

Compute METEOR score.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

METEOR score

Return type:

float

compute_semantic_similarity(predictions: List[str], references: List[str]) → float[source]

Compute semantic similarity using sentence transformers.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Average semantic similarity score

Return type:

float

compute_all_metrics(predictions: List[str], references: List[str]) → Dict[str, float][source]

Compute all available metrics.

Parameters:
  • predictions (List[str]) – List of predicted summaries

  • references (List[str]) – List of reference summaries

Returns:

Dictionary of all computed metrics

Return type:

Dict[str, float]

format_results(metrics: Dict[str, float], precision: int = 4) → str[source]

Format metrics results for display.

Parameters:
  • metrics (Dict[str, float]) – Dictionary of metrics

  • precision (int) – Number of decimal places

Returns:

Formatted results string

Return type:

str
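
Example: a minimal sketch of computing and formatting all metrics for a prediction/reference pair. The texts are illustrative, and the rouge_types values follow the common rouge-score naming convention, assumed here:

    from m3sgg.language.evaluation.metrics import SummarizationMetrics

    metrics = SummarizationMetrics(
        rouge_types=["rouge1", "rouge2", "rougeL"],
        use_stemmer=True,
        sentence_model="all-MiniLM-L6-v2",
    )

    predictions = ["a person sits at a table using a laptop"]
    references = ["someone is working on a laptop at a desk"]

    # ROUGE, BLEU, METEOR, and embedding similarity in one call
    scores = metrics.compute_all_metrics(predictions, references)
    print(metrics.format_results(scores, precision=4))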

m3sgg.language.evaluation.metrics.main()[source]

Example usage of SummarizationMetrics.
