Utils Module

The utils module provides helper functions and utilities for the M3SGG framework.

Core Utilities

m3sgg.utils.funcs.assign_relations(prediction, gt_annotations, assign_IOU_threshold)[source]

Assign relations between predicted detections and ground truth annotations.

Matches predicted bounding boxes with ground truth annotations based on IoU threshold and prepares relation data for scene graph generation training.

Parameters:
  • prediction (dict) – Results from FasterRCNN containing predicted boxes, labels, scores, and features

  • gt_annotations (list) – Ground-truth annotations with person info and objects

  • assign_IOU_threshold (float) – IoU threshold for assignment (typically 0.5 for SGDET)

Returns:

Tuple containing detector found indices, ground truth relations, and supply relations

Return type:

tuple

m3sgg.utils.funcs.im_list_to_blob(ims)[source]

Convert a list of images into a network input.

Assumes images are already prepared (means subtracted, BGR order, …).

m3sgg.utils.funcs.enumerate_by_image(im_inds)[source]
m3sgg.utils.funcs.transpose_packed_sequence_inds(lengths)[source]

Goes from a TxB packed sequence to a BxT one, or vice versa. Assumes that nothing is a Variable.

Parameters:

lengths (list) – Lengths of each sequence in the batch

Returns:

Indices that perform the transposition

m3sgg.utils.funcs.pad_sequence(frame_idx)[source]

The following utilities come from m3sgg.utils.pytorch_misc, a collection of miscellaneous functions that may be useful for PyTorch.

m3sgg.utils.pytorch_misc.optimistic_restore(network, state_dict)[source]

Optimistically restore network weights from state dictionary.

Attempts to load weights from state_dict into network, handling size mismatches gracefully by skipping incompatible parameters.

Parameters:
  • network (torch.nn.Module) – Neural network to restore weights to

  • state_dict (dict) – State dictionary containing weights

Returns:

Whether any mismatches were found

Return type:

bool

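For example, a minimal sketch of resuming from a partially compatible checkpoint (the layer sizes here are illustrative):

    import torch
    import torch.nn as nn
    from m3sgg.utils.pytorch_misc import optimistic_restore

    model = nn.Sequential(nn.Linear(10, 5))
    # Checkpoint with one compatible tensor and one size mismatch;
    # the mismatched entry is skipped instead of raising an error
    state_dict = {"0.weight": torch.randn(5, 10), "0.bias": torch.randn(7)}
    had_mismatch = optimistic_restore(model, state_dict)
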
m3sgg.utils.pytorch_misc.pairwise(iterable)[source]

s -> (s0,s1), (s1,s2), (s2, s3), …

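A quick illustration (a sketch; pairwise follows the standard itertools recipe shown above):

    from m3sgg.utils.pytorch_misc import pairwise

    # Consecutive overlapping pairs of a sequence
    list(pairwise([1, 2, 3, 4]))  # [(1, 2), (2, 3), (3, 4)]
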
m3sgg.utils.pytorch_misc.get_ranking(predictions, labels, num_guesses=5)[source]

Given a matrix of predictions and labels for the correct ones, get the number of guesses required to get the prediction right per example.

Parameters:
  • predictions – [batch_size, range_size] predictions

  • labels – [batch_size] array of labels

  • num_guesses – Number of guesses to return

Returns:

Number of guesses required per example

m3sgg.utils.pytorch_misc.cache(f)[source]

Caches a computation

class m3sgg.utils.pytorch_misc.Flattener[source]

Bases: Module

__init__()[source]

Flattens the last 3 dimensions so the output has shape (batch_size, -1).

forward(x)[source]

Forward pass that flattens the input tensor to shape (batch_size, -1).

m3sgg.utils.pytorch_misc.to_variable(f)[source]

Decorator that pushes all the outputs of the wrapped function to Variables.

Parameters:

f (callable) – Function to wrap

m3sgg.utils.pytorch_misc.arange(base_tensor, n=None)[source]
m3sgg.utils.pytorch_misc.to_onehot(vec, num_classes, fill=1000)[source]

Creates a [size, num_classes] torch FloatTensor where one_hot[i, vec[i]] = fill

Parameters:
  • vec – 1d torch tensor

  • num_classes – int

  • fill – Value assigned at the hot positions: one_hot[i, vec[i]] = fill, while the remaining entries receive -fill

Returns:

The resulting [size, num_classes] FloatTensor

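A small sketch, assuming the ±fill convention described above:

    import torch
    from m3sgg.utils.pytorch_misc import to_onehot

    vec = torch.tensor([0, 2, 1])
    one_hot = to_onehot(vec, num_classes=3, fill=1000)
    # one_hot[i, vec[i]] == 1000; the remaining entries are assumed to be -1000
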
m3sgg.utils.pytorch_misc.save_net(fname, net)[source]
m3sgg.utils.pytorch_misc.load_net(fname, net)[source]
m3sgg.utils.pytorch_misc.batch_index_iterator(len_l, batch_size, skip_end=True)[source]

Provides indices that iterate over a list in batches.

Creates a generator that yields (start, end) tuples for batch processing.

Parameters:
  • len_l (int) – Size of the list to iterate over

  • batch_size (int) – Size of each batch

  • skip_end (bool, optional) – Whether to skip the last incomplete batch, defaults to True

Returns:

Generator yielding (start, end) tuples for each batch

Return type:

Generator[tuple, None, None]

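For example:

    from m3sgg.utils.pytorch_misc import batch_index_iterator

    for start, end in batch_index_iterator(10, batch_size=4, skip_end=True):
        print(start, end)
    # 0 4
    # 4 8  (the incomplete tail (8, 10) is skipped because skip_end=True)
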
m3sgg.utils.pytorch_misc.batch_map(f, a, batch_size)[source]

Maps a function over an array in chunks of specified batch size.

Applies function f to array a in batches to manage memory usage.

Parameters:
  • f (callable) – Function to apply, must take (batch_size, dim_a) and return (batch_size, something)

  • a (torch.Tensor) – Array to process of shape (num_rows, dim_a)

  • batch_size (int) – Size of each processing batch

Returns:

Processed array of shape (num_rows, something)

Return type:

torch.Tensor

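A minimal sketch of batching a matrix multiplication to bound memory use:

    import torch
    from m3sgg.utils.pytorch_misc import batch_map

    W = torch.randn(64, 16)
    a = torch.randn(1000, 64)
    out = batch_map(lambda chunk: chunk @ W, a, batch_size=128)
    print(out.shape)  # torch.Size([1000, 16])
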
m3sgg.utils.pytorch_misc.const_row(fill, l, volatile=False)[source]
m3sgg.utils.pytorch_misc.print_para(model)[source]

Prints the parameters of a model.

Parameters:

model (torch.nn.Module) – Model whose parameters are printed

m3sgg.utils.pytorch_misc.accuracy(output, target, topk=(1,))[source]

Computes the precision@k for the specified values of k

m3sgg.utils.pytorch_misc.nonintersecting_2d_inds(x)[source]

Returns np.array([(a, b) for a in range(x) for b in range(x) if a != b]) efficiently.

Parameters:

x (int) – Size

Returns:

An x*(x-1) array: [(0,1), (0,2), …, (0,x-1), (1,0), (1,2), …, (x-1,x-2)]

m3sgg.utils.pytorch_misc.intersect_2d(x1, x2)[source]

Given two arrays [m1, n] and [m2, n], returns a [m1, m2] array where each entry is True if those rows match.

Parameters:
  • x1 (numpy.ndarray) – [m1, n] numpy array

  • x2 (numpy.ndarray) – [m2, n] numpy array

Returns:

[m1, m2] bool array of the intersections

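For example:

    import numpy as np
    from m3sgg.utils.pytorch_misc import intersect_2d

    x1 = np.array([[1, 2], [3, 4]])
    x2 = np.array([[3, 4], [1, 2], [5, 6]])
    intersect_2d(x1, x2)
    # array([[False,  True, False],
    #        [ True, False, False]])
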
m3sgg.utils.pytorch_misc.np_to_variable(x, is_cuda=True, dtype=<class 'torch.FloatTensor'>)[source]
m3sgg.utils.pytorch_misc.gather_nd(x, index)[source]
Parameters:
  • x – n-dimensional tensor of shape [x0, x1, x2, … x{n-1}, dim]

  • index – [num, n-1] tensor where each row contains the indices we'll use

Returns:

[num, dim]

m3sgg.utils.pytorch_misc.enumerate_by_image(im_inds)[source]
m3sgg.utils.pytorch_misc.diagonal_inds(tensor)[source]

Returns the indices required to go along the first 2 dims of a tensor in diagonal fashion.

Parameters:

tensor (torch.Tensor) – Input tensor

Returns:

Diagonal indices

m3sgg.utils.pytorch_misc.enumerate_imsize(im_sizes)[source]
m3sgg.utils.pytorch_misc.argsort_desc(scores)[source]

Returns indices that sort scores in descending order.

Computes indices for descending sort across arbitrary dimensional arrays.

Parameters:

scores (numpy.ndarray) – Array of arbitrary size to sort

Returns:

Array of indices for descending sort, shape [numel(scores), dim(scores)]

Return type:

numpy.ndarray

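For example, on a 2D array the result lists (row, col) pairs from the highest score to the lowest:

    import numpy as np
    from m3sgg.utils.pytorch_misc import argsort_desc

    scores = np.array([[0.1, 0.9], [0.5, 0.3]])
    argsort_desc(scores)
    # array([[0, 1], [1, 0], [1, 1], [0, 0]])
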
m3sgg.utils.pytorch_misc.unravel_index(index, dims)[source]
m3sgg.utils.pytorch_misc.de_chunkize(tensor, chunks)[source]
m3sgg.utils.pytorch_misc.random_choose(tensor, num)[source]

Randomly choose num indices from the tensor.

m3sgg.utils.pytorch_misc.transpose_packed_sequence_inds(lengths)[source]

Goes from a TxB packed sequence to a BxT one, or vice versa. Assumes that nothing is a Variable.

Parameters:

lengths (list) – Lengths of each sequence in the batch

Returns:

Indices that perform the transposition

m3sgg.utils.pytorch_misc.right_shift_packed_sequence_inds(lengths)[source]

Right shift packed sequence indices to accommodate BOS tokens.

Parameters:

lengths (list) – List of sequence lengths, e.g. [2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1]

Returns:

Permutation indices for shifting sequences right to accommodate BOS tokens

Return type:

list

Visual example with lengths = [4, 3, 2, 1]:

Before:

    a (0)  b (4)  c (7)  d (8)
    a (1)  b (5)
    a (2)  b (6)
    a (3)

After:

    bos  a (0)  b (4)  c (7)
    bos  a (1)
    bos  a (2)
    bos

m3sgg.utils.pytorch_misc.clip_grad_norm(named_parameters, max_norm, clip=False, verbose=False)[source]

Clips gradient norm of an iterable of parameters.

The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

Parameters:
  • named_parameters (Iterable[Tuple[str, Variable]]) – Iterable of (name, parameter) pairs whose gradients will be normalized

  • max_norm (float or int) – max norm of the gradients

Returns:

Total norm of the parameters (viewed as a single vector).

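A typical training-loop usage (a sketch; clip=True is assumed to enable the actual clipping rather than only reporting the norm):

    import torch
    import torch.nn as nn
    from m3sgg.utils.pytorch_misc import clip_grad_norm

    model = nn.Linear(4, 2)
    loss = model(torch.randn(3, 4)).sum()
    loss.backward()
    total_norm = clip_grad_norm(model.named_parameters(), max_norm=5.0, clip=True)
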
m3sgg.utils.pytorch_misc.update_lr(optimizer, lr=0.0001)[source]
m3sgg.utils.word_vectors.create_ssl_context()[source]

Create an unverified SSL context to bypass certificate verification.

Creates an SSL context with disabled hostname and certificate verification to handle SSL certificate issues when downloading word vectors from external sources.

Returns:

SSL context with verification disabled

Return type:

ssl.SSLContext

m3sgg.utils.word_vectors.download_word_vectors(wv_type, wv_dir, wv_dim)[source]

Download word vectors if they don’t exist.

Downloads pre-trained word vectors from Stanford NLP resources and extracts the specific dimension file needed for the model.

Parameters:
  • wv_type (str) – Type of word vectors to download (e.g., ‘glove.6B’)

  • wv_dir (str) – Directory to save the word vectors

  • wv_dim (int) – Dimension of word vectors to extract

Returns:

None

Return type:

None

m3sgg.utils.word_vectors.get_cache_path(wv_type, wv_dir, wv_dim)[source]

Get the path for the cached word vectors.

Constructs the file path for cached word vector pickle files.

Parameters:
  • wv_type (str) – Type of word vectors (e.g., ‘glove.6B’)

  • wv_dir (str) – Directory containing word vectors

  • wv_dim (int) – Dimension of word vectors

Returns:

Path to cached pickle file

Return type:

str

m3sgg.utils.word_vectors.get_cache_status()[source]

Get the status of word vector caches.

Returns information about both memory and disk caches for word vectors, including cache sizes and available cached files.

Returns:

Dictionary containing cache status information

Return type:

dict

m3sgg.utils.word_vectors.clear_word_vector_cache(wv_type=None, wv_dir='data', wv_dim=None)[source]

Clear word vector cache (both memory and disk cache).

Clears cached word vectors from memory and optionally from disk. Can target specific word vector types and dimensions or clear all caches.

Parameters:
  • wv_type (str, optional) – Type of word vectors to clear, defaults to None

  • wv_dir (str, optional) – Directory containing cached files, defaults to “data”

  • wv_dim (int, optional) – Dimension of word vectors to clear, defaults to None

Returns:

None

Return type:

None

m3sgg.utils.word_vectors.load_word_vectors(wv_type='glove.6B', wv_dir='data', wv_dim=200)[source]

Load word vectors from file or download if not present.

Loads pre-trained word vectors with caching support. Checks memory cache first, then disk cache, and finally downloads from external source if needed.

Parameters:
  • wv_type (str, optional) – Type of word vectors to load, defaults to “glove.6B”

  • wv_dir (str, optional) – Directory containing word vectors, defaults to “data”

  • wv_dim (int, optional) – Dimension of word vectors, defaults to 200

Returns:

Dictionary mapping words to their vector representations

Return type:

dict

m3sgg.utils.word_vectors.obj_edge_vectors(names, wv_type='glove.6B', wv_dir='data', wv_dim=200)[source]

Create word vectors for object classes.

Generates word vector embeddings for a list of object class names using pre-trained word vectors. Returns zero vectors for unknown words.

Parameters:
  • names (list) – List of object class names

  • wv_type (str, optional) – Type of word vectors to use, defaults to “glove.6B”

  • wv_dir (str, optional) – Directory containing word vectors, defaults to “data”

  • wv_dim (int, optional) – Dimension of word vectors, defaults to 200

Returns:

Tensor containing word vectors for object classes

Return type:

torch.Tensor

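For example (note that the first call may download GloVe vectors into wv_dir, as described for download_word_vectors above):

    from m3sgg.utils.word_vectors import obj_edge_vectors

    classes = ["person", "cup", "table"]
    emb = obj_edge_vectors(classes, wv_type="glove.6B", wv_dir="data", wv_dim=200)
    print(emb.shape)  # torch.Size([3, 200]); unknown words receive zero vectors
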
m3sgg.utils.word_vectors.verb_edge_vectors(names, wv_type='glove.6B', wv_dir=None, wv_dim=300)[source]

Create word vectors for verb classes.

Generates word vector embeddings for a list of verb class names using pre-trained word vectors. Currently uses the same logic as obj_edge_vectors.

Parameters:
  • names (list) – List of verb class names

  • wv_type (str, optional) – Type of word vectors to use, defaults to “glove.6B”

  • wv_dir (str, optional) – Directory containing word vectors, defaults to None

  • wv_dim (int, optional) – Dimension of word vectors, defaults to 300

Returns:

Tensor containing word vectors for verb classes

Return type:

torch.Tensor

m3sgg.utils.word_vectors.reporthook(t)[source]

Wrap a tqdm progress bar as a reporthook for urllib downloads (adapted from https://github.com/tqdm/tqdm).

class m3sgg.utils.transformer.TransformerEncoderLayer(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]

Bases: Module

Transformer encoder layer with multi-head attention and feed-forward network.

Implements a single layer of the transformer encoder with self-attention mechanism, layer normalization, and position-wise feed-forward network.

__init__(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]

Initialize the transformer encoder layer.

Parameters:
  • embed_dim (int, optional) – Embedding dimension, defaults to 1936

  • nhead (int, optional) – Number of attention heads, defaults to 4

  • dim_feedforward (int, optional) – Dimension of feed-forward network, defaults to 2048

  • dropout (float, optional) – Dropout probability, defaults to 0.1

Returns:

None

Return type:

None

forward(src, input_key_padding_mask)[source]

Forward pass through the transformer encoder layer.

Parameters:
  • src (torch.Tensor) – Input sequence features

  • input_key_padding_mask (torch.Tensor) – Mask marking padded positions in the input

Returns:

Transformed sequence and attention weights

Return type:

tuple

class m3sgg.utils.transformer.TransformerDecoderLayer(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]

Bases: Module

Transformer decoder layer with masked self-attention and cross-attention.

Implements a single layer of the transformer decoder with masked self-attention, encoder-decoder attention, and position-wise feed-forward network.

__init__(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]

Initialize the transformer decoder layer.

Parameters:
  • embed_dim (int, optional) – Embedding dimension, defaults to 1936

  • nhead (int, optional) – Number of attention heads, defaults to 4

  • dim_feedforward (int, optional) – Dimension of feed-forward network, defaults to 2048

  • dropout (float, optional) – Dropout probability, defaults to 0.1

Returns:

None

Return type:

None

forward(global_input, input_key_padding_mask, position_embed)[source]

Forward pass through the transformer decoder layer.

class m3sgg.utils.transformer.TransformerEncoder(encoder_layer, num_layers)[source]

Bases: Module

__init__(encoder_layer, num_layers)[source]

Initialize the encoder as a stack of num_layers copies of encoder_layer.

forward(input, input_key_padding_mask)[source]

Forward pass through the stacked encoder layers.

class m3sgg.utils.transformer.TransformerDecoder(decoder_layer, num_layers, embed_dim)[source]

Bases: Module

__init__(decoder_layer, num_layers, embed_dim)[source]

Initialize the decoder as a stack of num_layers copies of decoder_layer.

forward(global_input, input_key_padding_mask, position_embed)[source]

Forward pass through the stacked decoder layers.

class m3sgg.utils.transformer.transformer(enc_layer_num=1, dec_layer_num=3, embed_dim=1936, nhead=8, dim_feedforward=2048, dropout=0.1, mode=None)[source]

Bases: Module

Spatial Temporal Transformer.

Parameters:
  • local_attention (object) – spatial encoder

  • global_attention (object) – temporal decoder

  • position_embedding (object) – frame encoding (window_size*dim)

  • mode (str) – 'both' uses the features from both frames in the window; 'latter' uses only the features from the latter frame

__init__(enc_layer_num=1, dec_layer_num=3, embed_dim=1936, nhead=8, dim_feedforward=2048, dropout=0.1, mode=None)[source]

Initialize the spatial-temporal transformer with the given encoder and decoder depths, embedding size, and attention configuration.

forward(features, im_idx)[source]

Forward pass through the spatial-temporal transformer.

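A minimal construction sketch (the feature and index shapes are assumptions for illustration; embed_dim must match the feature dimension):

    import torch
    from m3sgg.utils.transformer import transformer

    model = transformer(enc_layer_num=1, dec_layer_num=3, embed_dim=1936,
                        nhead=8, dim_feedforward=2048, dropout=0.1, mode="latter")
    features = torch.randn(20, 1936)            # hypothetical per-relation features
    im_idx = torch.tensor([0] * 10 + [1] * 10)  # frame index of each relation
    # output = model(features, im_idx)
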
Model type detection utility for VidSgg checkpoints.

This module provides functionality to automatically detect the model type from a checkpoint’s state_dict without requiring explicit model specification.

author: VidSgg Team

version: 0.1.0

m3sgg.utils.model_detector.detect_model_type_from_checkpoint(checkpoint_path: str) → str | None[source]

Detect the model type from a checkpoint file by analyzing state_dict keys.

This function examines the layer names in the checkpoint’s state_dict to determine which model architecture was used. Each model has unique layer names that serve as fingerprints for identification.

Parameters:

checkpoint_path (str) – Path to the checkpoint file

Returns:

Detected model type or None if detection fails

Return type:

Optional[str]

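For example (the checkpoint path is hypothetical):

    from m3sgg.utils.model_detector import (
        detect_model_type_from_checkpoint,
        get_model_class_from_type,
    )

    model_type = detect_model_type_from_checkpoint("output/model_best.tar")
    if model_type is not None:
        model_cls = get_model_class_from_type(model_type)
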
m3sgg.utils.model_detector.get_model_class_from_type(model_type: str)[source]

Get the model class from the detected model type.

Parameters:

model_type (str) – Detected model type string

Returns:

Model class or None if not found

Return type:

class or None

m3sgg.utils.model_detector.detect_dataset_from_checkpoint(checkpoint_path: str) → str | None[source]

Detect the dataset type from a checkpoint file.

Parameters:

checkpoint_path (str) – Path to the checkpoint file

Returns:

Detected dataset type or None

Return type:

Optional[str]

m3sgg.utils.model_detector.get_model_info_from_checkpoint(checkpoint_path: str) → Dict[str, Any][source]

Get comprehensive model information from a checkpoint.

Parameters:

checkpoint_path (str) – Path to the checkpoint file

Returns:

Dictionary containing model information

Return type:

Dict[str, Any]

m3sgg.utils.model_detector.save_checkpoint_with_metadata(model, save_path: str, model_type: str, dataset: str | None = None, additional_metadata: Dict[str, Any] | None = None) → None[source]

Save model checkpoint with metadata for future identification.

Parameters:
  • model (nn.Module) – The model to save

  • save_path (str) – Path where to save the checkpoint

  • model_type (str) – Type of the model (e.g., ‘sttran’, ‘tempura’, ‘scenellm’)

  • dataset (str, optional) – Dataset used for training (e.g., ‘action_genome’, ‘EASG’)

  • additional_metadata (Dict[str, Any], optional) – Additional metadata to store

Checkpoint utility functions for VidSgg training.

This module provides utilities for safe checkpoint saving with disk space validation and checkpoint configuration based on available storage.

author: VidSgg Team

version: 0.1.0

m3sgg.utils.checkpoint_utils.check_disk_space_and_configure_checkpointing(save_path: str, logger: Logger, conf) → Tuple[bool, str][source]

Check available disk space and configure checkpoint saving strategy.

Parameters:
  • save_path (str) – Path where checkpoints will be saved

  • logger (logging.Logger) – Logger instance for output

  • conf (Config) – Configuration object

Returns:

Tuple of (checkpoint_enabled, checkpoint_strategy)

Return type:

tuple

m3sgg.utils.checkpoint_utils.safe_save_checkpoint(model: Module, checkpoint_path: str, model_type: str, dataset: str, additional_metadata: Dict[str, Any] | None = None, logger: Logger | None = None) → bool[source]

Safely save checkpoint with disk space validation.

Parameters:
  • model (torch.nn.Module) – Model to save

  • checkpoint_path (str) – Path to save checkpoint

  • model_type (str) – Type of model being saved

  • dataset (str) – Dataset name

  • additional_metadata (dict, optional) – Additional metadata to save

  • logger (logging.Logger, optional) – Logger instance for output

Returns:

True if save was successful, False otherwise

Return type:

bool

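A sketch of the save/validate round trip (the model and path are placeholders):

    import logging
    import torch.nn as nn
    from m3sgg.utils.checkpoint_utils import (
        safe_save_checkpoint,
        validate_checkpoint_file,
    )

    logger = logging.getLogger(__name__)
    model = nn.Linear(4, 2)  # stand-in for a trained model
    ok = safe_save_checkpoint(model, "output/sttran_ag.pth", model_type="sttran",
                              dataset="action_genome", logger=logger)
    if ok:
        assert validate_checkpoint_file("output/sttran_ag.pth", logger=logger)
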
m3sgg.utils.checkpoint_utils.validate_checkpoint_file(checkpoint_path: str, logger: Logger | None = None) → bool[source]

Validate a checkpoint file before loading.

Parameters:
  • checkpoint_path (str) – Path to checkpoint file

  • logger (logging.Logger, optional) – Logger instance for output

Returns:

True if checkpoint is valid, False otherwise

Return type:

bool

m3sgg.utils.memory.memory_computation(unc_vals, output_dir, rel_class_num, obj_class_num, obj_feature_dim=1024, rel_feature_dim=1936, obj_weight_type='both', rel_weight_type='both', obj_mem=False, obj_unc=False, include_bg_mem=False)[source]

Compute memory embeddings for scene graph generation.

Generates memory embeddings for objects and relations based on uncertainty values and statistical computations for improved scene graph generation performance.

Parameters:
  • unc_vals (uncertainty_values) – Uncertainty values object containing statistics

  • output_dir (str) – Output directory for saving embeddings

  • rel_class_num (int) – Number of relation classes

  • obj_class_num (int) – Number of object classes

  • obj_feature_dim (int, optional) – Object feature dimension, defaults to 1024

  • rel_feature_dim (int, optional) – Relation feature dimension, defaults to 1936

  • obj_weight_type (str, optional) – Object weight type (‘both’, ‘al’, ‘ep’, ‘simple’), defaults to “both”

  • rel_weight_type (str, optional) – Relation weight type (‘both’, ‘al’, ‘ep’, ‘simple’), defaults to “both”

  • obj_mem (bool, optional) – Whether to compute object memory, defaults to False

  • obj_unc (bool, optional) – Whether to use object uncertainty, defaults to False

  • include_bg_mem (bool, optional) – Whether to include background class in memory, defaults to False

Returns:

None

Return type:

None

class m3sgg.utils.uncertainty.uncertainty_values(obj_classes, attention_class_num, spatial_class_num, contact_class_num)[source]

Bases: object

Class for managing and computing uncertainty values in scene graph generation.

Handles uncertainty computation for both objects and relations, including epistemic and aleatoric uncertainty types with statistical analysis capabilities.

__init__(obj_classes, attention_class_num, spatial_class_num, contact_class_num)[source]

Initialize the uncertainty values storage.

Parameters:
  • obj_classes (int) – Number of object classes

  • attention_class_num (int) – Number of attention relation classes

  • spatial_class_num (int) – Number of spatial relation classes

  • contact_class_num (int) – Number of contact relation classes

Returns:

None

Return type:

None

stats()[source]
stats2()[source]
m3sgg.utils.uncertainty.uncertainty_computation(data, dataset, object_detector, model, unc_vals, device, output_dir, obj_mem=False, obj_unc=True, background_mem=True, rel_unc=True, tracking=None)[source]
m3sgg.utils.uncertainty.get_cls_rel_uncertainty(pred_unc, labels, rel_type)[source]
m3sgg.utils.uncertainty.normalize_batch_uncertainty(unc_list_rel, cls_rel_uc, unc_list_obj, cls_obj_uc, obj_unc=False, background_mem=False, weight_type=['both'])[source]

FPN Utilities

m3sgg.utils.fpn.box_utils.bbox_loss(prior_boxes, deltas, gt_boxes, eps=0.0001, scale_before=1)[source]

Compute bounding box regression loss.

Computes smooth L1 loss for predicting ground truth boxes from prior boxes using delta transformations.

Parameters:
  • prior_boxes (torch.Tensor) – Prior bounding boxes of shape [num_boxes, 4] (x1, y1, x2, y2)

  • deltas (torch.Tensor) – Predicted box deltas of shape [num_boxes, 4] (tx, ty, th, tw)

  • gt_boxes (torch.Tensor) – Ground truth boxes of shape [num_boxes, 4] (x1, y1, x2, y2)

  • eps (float, optional) – Small epsilon value for numerical stability, defaults to 1e-4

  • scale_before (int, optional) – Scaling factor, defaults to 1

Returns:

Computed bounding box loss

Return type:

torch.Tensor

m3sgg.utils.fpn.box_utils.bbox_preds(boxes, deltas)[source]

Convert predicted deltas to bounding box coordinates.

Transforms predicted deltas along with prior boxes into (x1, y1, x2, y2) coordinate representation.

Parameters:
  • boxes (torch.Tensor) – Prior boxes in (x1, y1, x2, y2) format

  • deltas (torch.Tensor) – Predicted offsets (tx, ty, tw, th)

Returns:

Transformed bounding boxes

Return type:

torch.Tensor

m3sgg.utils.fpn.box_utils.center_size(boxes)[source]

Convert prior_boxes to (cx, cy, w, h) representation for comparison to center-size form ground truth data.

Parameters:

boxes (torch.Tensor) – Point-form boxes

Returns:

Converted (cx, cy, w, h) form of boxes

Return type:

torch.Tensor

m3sgg.utils.fpn.box_utils.point_form(boxes)[source]

Convert prior_boxes to (xmin, ymin, xmax, ymax) representation for comparison to point-form ground truth data.

Parameters:

boxes (torch.Tensor) – Center-size default boxes from priorbox layers

Returns:

Converted (xmin, ymin, xmax, ymax) form of boxes

Return type:

torch.Tensor

m3sgg.utils.fpn.box_utils.bbox_intersections(box_a, box_b)[source]

Compute the area of intersection between box_a and box_b. We resize both tensors to [A, B, 2] without new malloc: [A, 2] -> [A, 1, 2] -> [A, B, 2] and [B, 2] -> [1, B, 2] -> [A, B, 2], then compute the intersection area.

Parameters:
  • box_a (torch.Tensor) – Bounding boxes, shape [A, 4]

  • box_b (torch.Tensor) – Bounding boxes, shape [B, 4]

Returns:

Intersection areas, shape [A, B]

Return type:

torch.Tensor

m3sgg.utils.fpn.box_utils.bbox_overlaps(box_a, box_b)[source]

Compute Jaccard overlap (IoU) between two sets of bounding boxes.

Calculates intersection over union (IoU) for all pairs of boxes between two sets. IoU = A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)

Parameters:
  • box_a (torch.Tensor) – First set of bounding boxes, shape [num_objects, 4]

  • box_b (torch.Tensor) – Second set of bounding boxes, shape [num_priors, 4]

Returns:

Jaccard overlap matrix, shape [box_a.size(0), box_b.size(0)]

Return type:

torch.Tensor

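For example (the numeric comment assumes plain areas without a +1 pixel convention):

    import torch
    from m3sgg.utils.fpn.box_utils import bbox_overlaps

    box_a = torch.tensor([[0., 0., 10., 10.]])
    box_b = torch.tensor([[5., 5., 15., 15.], [0., 0., 10., 10.]])
    iou = bbox_overlaps(box_a, box_b)  # shape [1, 2]
    # First pair: 25 / (100 + 100 - 25) ≈ 0.143; second pair: 1.0
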
m3sgg.utils.fpn.box_utils.nms_overlaps(boxes)[source]

Get the pairwise box overlaps for each channel.

Visualization

m3sgg.utils.draw_rectangles.draw_rectangles.draw_union_boxes(pair_rois, spatial_scale=27)[source]

Draw union boxes for pairs of ROIs to create spatial masks.

Creates spatial masks for subject-object pairs by drawing their bounding boxes and union boxes on a grid. Used for spatial relationship modeling in scene graph generation.

Parameters:
  • pair_rois (numpy.ndarray) – Array of ROI pairs, shape [N, 8] with format [x1_subj, y1_subj, x2_subj, y2_subj, x1_obj, y1_obj, x2_obj, y2_obj]

  • spatial_scale (int, optional) – Scale for spatial masks, defaults to 27

Returns:

Spatial masks for each pair, shape [num_pairs, 2, spatial_scale, spatial_scale]

Return type:

numpy.ndarray

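A minimal sketch (the coordinate scale and dtype are assumptions):

    import numpy as np
    from m3sgg.utils.draw_rectangles.draw_rectangles import draw_union_boxes

    # One subject-object pair: [x1_s, y1_s, x2_s, y2_s, x1_o, y1_o, x2_o, y2_o]
    pair_rois = np.array([[10, 10, 50, 50, 30, 30, 80, 80]], dtype=np.float32)
    masks = draw_union_boxes(pair_rois, spatial_scale=27)
    print(masks.shape)  # (1, 2, 27, 27)
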
m3sgg.utils.draw_rectangles.draw_rectangles.draw_union_boxes_cython(pair_rois, spatial_scale=27)[source]

Alias for draw_union_boxes for compatibility.

Optimization

class m3sgg.utils.AdamW.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)[source]

Bases: Optimizer

Implements AdamW algorithm with decoupled weight decay.

The original Adam algorithm was proposed in Adam: A Method for Stochastic Optimization. The AdamW variant was proposed in Decoupled Weight Decay Regularization.

Parameters:
  • params (iterable) – Iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – Learning rate, defaults to 1e-3

  • betas (tuple, optional) – Coefficients used for computing running averages of gradient and its square, defaults to (0.9, 0.999)

  • eps (float, optional) – Term added to the denominator to improve numerical stability, defaults to 1e-8

  • weight_decay (float, optional) – Weight decay coefficient, defaults to 1e-2

  • amsgrad (bool, optional) – Whether to use the AMSGrad variant, defaults to False

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)[source]

Initialize the AdamW optimizer.

Parameters:
  • params (iterable) – Iterable of parameters to optimize

  • lr (float, optional) – Learning rate, defaults to 1e-3

  • betas (tuple, optional) – Coefficients for computing running averages, defaults to (0.9, 0.999)

  • eps (float, optional) – Term added to denominator for numerical stability, defaults to 1e-8

  • weight_decay (float, optional) – Weight decay coefficient, defaults to 1e-2

  • amsgrad (bool, optional) – Whether to use AMSGrad variant, defaults to False

Returns:

None

Return type:

None

__setstate__(state)[source]

Set the state of the optimizer.

Parameters:

state (dict) – State dictionary to restore

Returns:

None

Return type:

None

step(closure=None)[source]

Perform a single optimization step.

Parameters:

closure (callable, optional) – A closure that reevaluates the model and returns the loss, defaults to None

Returns:

Loss value if closure is provided

Return type:

float or None

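Usage mirrors the standard torch.optim optimizers:

    import torch
    import torch.nn as nn
    from m3sgg.utils.AdamW import AdamW

    model = nn.Linear(10, 2)
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

    loss = model(torch.randn(8, 10)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
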
class m3sgg.utils.infoNCE.SupConLoss(temperature=0.1, contrast_mode='all', base_temperature=0.07)[source]

Bases: Module

Supervised Contrastive Learning loss implementation.

Based on the paper https://arxiv.org/pdf/2004.11362.pdf. Also supports the unsupervised contrastive loss used in SimCLR.

__init__(temperature=0.1, contrast_mode='all', base_temperature=0.07)[source]

Initialize the supervised contrastive loss.

Parameters:
  • temperature (float, optional) – Temperature parameter for scaling, defaults to 0.1

  • contrast_mode (str, optional) – Contrast mode (‘all’ or ‘one’), defaults to “all”

  • base_temperature (float, optional) – Base temperature for normalization, defaults to 0.07

Returns:

None

Return type:

None

forward(features, labels=None, mask=None)[source]

Compute contrastive loss for the model.

If both labels and mask are None, it degenerates to SimCLR unsupervised loss. Reference: https://arxiv.org/pdf/2002.05709.pdf

Parameters:
  • features (torch.Tensor) – Hidden vector of shape [bsz, n_views, …]

  • labels (torch.Tensor, optional) – Ground truth labels of shape [bsz], defaults to None

  • mask (torch.Tensor, optional) – Contrastive mask of shape [bsz, bsz], defaults to None

Returns:

Scalar loss value

Return type:

torch.Tensor

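A small sketch (L2-normalizing the features is conventional for contrastive losses, not something this class is documented to do internally):

    import torch
    import torch.nn.functional as F
    from m3sgg.utils.infoNCE import SupConLoss

    criterion = SupConLoss(temperature=0.1)
    features = F.normalize(torch.randn(8, 2, 128), dim=-1)  # [bsz, n_views, dim]
    labels = torch.randint(0, 4, (8,))
    loss = criterion(features, labels=labels)  # supervised mode
    # loss = criterion(features)               # SimCLR-style unsupervised mode
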
class m3sgg.utils.infoNCE.EucNormLoss[source]

Bases: Module

__init__()[source]

Initialize the Euclidean norm loss module.

forward(features, labels)[source]

Compute the Euclidean norm loss for the given features and labels.

class m3sgg.utils.matcher.HungarianMatcher(cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1, cost_iou: float = 1)[source]

Bases: Module

Hungarian algorithm-based matcher for object detection.

Computes an assignment between targets and network predictions using the Hungarian algorithm. For efficiency, targets don’t include no-object class. When there are more predictions than targets, performs 1-to-1 matching of best predictions while treating others as no-object.

__init__(cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1, cost_iou: float = 1)[source]

Initialize the Hungarian matcher.

Parameters:
  • cost_class (float, optional) – Relative weight of classification error in matching cost, defaults to 1

  • cost_bbox (float, optional) – Relative weight of L1 error of bounding box coordinates, defaults to 1

  • cost_giou (float, optional) – Relative weight of GIoU loss of bounding box, defaults to 1

  • cost_iou (float, optional) – Relative weight of IoU loss of bounding box, defaults to 1

Returns:

None

Return type:

None

forward(outputs, targets)[source]

Perform the matching between predictions and targets.

Parameters:
  • outputs (dict) – Dictionary containing model predictions

  • targets (list) – List of targets (ground truth)

Returns:

List of tuples (index_i, index_j) where index_i is indices of selected predictions and index_j is indices of corresponding selected targets

Return type:

list

m3sgg.utils.matcher.build_matcher(args=None)[source]

Tracking

m3sgg.utils.track.generalized_box_iou(boxes1, boxes2)[source]

Compute Generalized Intersection over Union (GIoU) between two sets of boxes.

Based on the paper from https://giou.stanford.edu/ The boxes should be in [x0, y0, x1, y1] format.

Parameters:
  • boxes1 (torch.Tensor) – First set of boxes in [x0, y0, x1, y1] format

  • boxes2 (torch.Tensor) – Second set of boxes in [x0, y0, x1, y1] format

Returns:

Pairwise GIoU matrix where N = len(boxes1) and M = len(boxes2)

Return type:

torch.Tensor

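For example:

    import torch
    from m3sgg.utils.track import generalized_box_iou

    boxes1 = torch.tensor([[0., 0., 10., 10.]])
    boxes2 = torch.tensor([[0., 0., 10., 10.], [20., 20., 30., 30.]])
    giou = generalized_box_iou(boxes1, boxes2)  # shape [1, 2]; 1.0 for the exact match
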
m3sgg.utils.track.box_iou(boxes1, boxes2)[source]

Compute Intersection over Union (IoU) between two sets of boxes.

Parameters:
  • boxes1 (torch.Tensor) – First set of boxes in [x0, y0, x1, y1] format

  • boxes2 (torch.Tensor) – Second set of boxes in [x0, y0, x1, y1] format

Returns:

Tuple containing IoU values and union areas

Return type:

tuple

m3sgg.utils.track.box_area(boxes)[source]

Compute the area of bounding boxes.

Parameters:

boxes (torch.Tensor) – Bounding boxes in [x0, y0, x1, y1] format

Returns:

Areas of the bounding boxes

Return type:

torch.Tensor

m3sgg.utils.track.get_sequence(entry, gt_annotation, matcher, im_size, mode='predcls')[source]

Process detection results and ground truth annotations with tracking and matching.

Parameters:
  • entry – Dictionary containing detection results with keys like ‘boxes’, ‘features’, etc.

  • gt_annotation – Ground truth annotations for the current sequence

  • matcher – Hungarian matcher for assignment

  • im_size – Image size information

  • mode – Processing mode (‘predcls’, ‘sgdet’, ‘sgcls’)

m3sgg.utils.ds_track.box_xyxy_to_xywh(x)[source]

Convert bounding box from xyxy format to xywh format.

Converts bounding box coordinates from (x0, y0, x1, y1) format to (x, y, width, height) format.

Parameters:

x (torch.Tensor) – Bounding box tensor in xyxy format

Returns:

Bounding box tensor in xywh format

Return type:

torch.Tensor

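For example (assuming x, y denote the top-left corner):

    import torch
    from m3sgg.utils.ds_track import box_xyxy_to_xywh

    boxes = torch.tensor([[10., 20., 50., 80.]])
    box_xyxy_to_xywh(boxes)  # tensor([[10., 20., 40., 60.]])
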
m3sgg.utils.ds_track.get_sequence(entry, gt_annotation, shape, task='sgcls')[source]

Get sequence information for scene graph generation tasks.

Processes detection results and ground truth annotations to prepare sequence data for different scene graph generation tasks.

Parameters:
  • entry (dict) – Detection results containing bboxes and distributions

  • gt_annotation (list) – Ground truth annotations

  • shape (tuple) – Image shape information

  • task (str, optional) – Scene graph generation task type, defaults to “sgcls”

Returns:

None (modifies entry in-place)

Return type:

None