Utils Module
The utils module provides helper functions and utilities for the M3SGG framework.
Core Utilities
- m3sgg.utils.funcs.assign_relations(prediction, gt_annotations, assign_IOU_threshold)[source]
Assign relations between predicted detections and ground truth annotations.
Matches predicted bounding boxes with ground truth annotations based on IoU threshold and prepares relation data for scene graph generation training.
- Parameters:
prediction – Predicted detections (bounding boxes, labels, scores)
gt_annotations – Ground truth annotations per frame
assign_IOU_threshold – IoU threshold for matching predictions to ground truth
- Returns:
Tuple containing detector found indices, ground truth relations, and supply relations
- Return type:
tuple
- m3sgg.utils.funcs.im_list_to_blob(ims)[source]
Convert a list of images into a network input.
Assumes images are already prepared (means subtracted, BGR order, …).
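A minimal sketch of the padding-and-stacking this helper performs (hedged: this mirrors the classic Fast R-CNN utility of the same name, not necessarily this exact implementation):

    import numpy as np

    def im_list_to_blob_sketch(ims):
        # Pad each image to the max height/width in the list, then
        # stack into one (N, H_max, W_max, 3) float32 blob.
        max_shape = np.array([im.shape for im in ims]).max(axis=0)
        blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3), dtype=np.float32)
        for i, im in enumerate(ims):
            blob[i, :im.shape[0], :im.shape[1], :] = im
        return blob

    # Two differently sized BGR images become one padded batch.
    ims = [np.zeros((480, 640, 3), np.float32), np.zeros((360, 500, 3), np.float32)]
    print(im_list_to_blob_sketch(ims).shape)  # (2, 480, 640, 3)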
- m3sgg.utils.funcs.transpose_packed_sequence_inds(lengths)[source]
Convert indices between a TxB packed-sequence layout and BxT (or vice versa). Assumes that nothing is a variable.
- Parameters:
lengths – List of sequence lengths
- Returns:
Permutation indices for the transposed layout
Miscellaneous utility functions for PyTorch.
- m3sgg.utils.pytorch_misc.optimistic_restore(network, state_dict)[source]
Optimistically restore network weights from state dictionary.
Attempts to load weights from state_dict into network, handling size mismatches gracefully by skipping incompatible parameters.
- Parameters:
network (torch.nn.Module) – Neural network to restore weights to
state_dict (dict) – State dictionary containing weights
- Returns:
Whether any mismatches were found
- Return type:
bool
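A hedged usage sketch (the model and checkpoint path are stand-ins; the point is that size-mismatched tensors are skipped rather than raising):

    import torch
    from m3sgg.utils.pytorch_misc import optimistic_restore

    model = torch.nn.Linear(10, 2)  # stand-in for a real network
    state_dict = torch.load("checkpoint.pth", map_location="cpu")  # hypothetical path
    # Parameters whose shapes do not match are skipped instead of
    # aborting the load, so partially compatible checkpoints still work.
    mismatch = optimistic_restore(model, state_dict)
    if mismatch:
        print("some parameters were skipped due to size mismatches")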
- m3sgg.utils.pytorch_misc.get_ranking(predictions, labels, num_guesses=5)[source]
Given a matrix of predictions and the correct labels, compute the number of guesses required to get the prediction right per example.
- Parameters:
predictions – [batch_size, range_size] predictions
labels – [batch_size] array of labels
num_guesses – Number of guesses to return
- Returns:
Number of guesses required per example
- class m3sgg.utils.pytorch_misc.Flattener[source]
Bases:
Module
- forward(x)[source]
Flatten the input tensor, collapsing all dimensions after the batch dimension into one.
- m3sgg.utils.pytorch_misc.to_variable(f)[source]
Decorator that converts all outputs of the wrapped function to torch Variables.
- Parameters:
f – Function to wrap
- m3sgg.utils.pytorch_misc.to_onehot(vec, num_classes, fill=1000)[source]
Creates a [size, num_classes] torch FloatTensor where one_hot[i, vec[i]] = fill.
- Parameters:
vec – 1d torch tensor of class indices
num_classes (int) – Number of classes
fill – Magnitude of the encoding: the target entry is set to +fill and all other entries to -fill
- Returns:
One-hot encoded FloatTensor of shape [size, num_classes]
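A self-contained sketch of the described ±fill encoding (this mirrors the docstring, not necessarily the module's exact implementation):

    import torch

    def to_onehot_sketch(vec, num_classes, fill=1000):
        # All entries start at -fill; each row's target class gets +fill.
        out = torch.full((vec.size(0), num_classes), -float(fill))
        out[torch.arange(vec.size(0)), vec] = float(fill)
        return out

    print(to_onehot_sketch(torch.tensor([0, 2]), 3))
    # tensor([[ 1000., -1000., -1000.],
    #         [-1000., -1000.,  1000.]])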
- m3sgg.utils.pytorch_misc.batch_index_iterator(len_l, batch_size, skip_end=True)[source]
Provides indices that iterate over a list in batches.
Creates a generator that yields (start, end) tuples for batch processing.
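A sketch of the generator's contract (the skip_end handling shown is an assumption consistent with the parameter name):

    def batch_index_iterator_sketch(len_l, batch_size, skip_end=True):
        # Yield (start, end) pairs covering [0, len_l) in steps of
        # batch_size; skip_end=True drops a trailing partial batch.
        end = (len_l // batch_size) * batch_size if skip_end else len_l
        for start in range(0, end, batch_size):
            yield start, min(start + batch_size, len_l)

    print(list(batch_index_iterator_sketch(10, 4)))                  # [(0, 4), (4, 8)]
    print(list(batch_index_iterator_sketch(10, 4, skip_end=False)))  # [(0, 4), (4, 8), (8, 10)]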
- m3sgg.utils.pytorch_misc.batch_map(f, a, batch_size)[source]
Maps a function over an array in chunks of specified batch size.
Applies function f to array a in batches to manage memory usage.
- Parameters:
f (callable) – Function to apply, must take (batch_size, dim_a) and return (batch_size, something)
a (torch.Tensor) – Array to process of shape (num_rows, dim_a)
batch_size (int) – Size of each processing batch
- Returns:
Processed array of shape (num_rows, something)
- Return type:
torch.Tensor
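For instance (hedged usage; f must map a (batch, dim_a) chunk to a (batch, something) result as documented):

    import torch
    from m3sgg.utils.pytorch_misc import batch_map

    W = torch.randn(64, 8)
    a = torch.randn(1000, 64)
    out = batch_map(lambda chunk: chunk @ W, a, batch_size=128)
    print(out.shape)  # expected: torch.Size([1000, 8])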
- m3sgg.utils.pytorch_misc.print_para(model)[source]
Prints a summary of a model's parameters.
- Parameters:
model – Model whose parameters to print
- m3sgg.utils.pytorch_misc.accuracy(output, target, topk=(1,))[source]
Computes the precision@k for the specified values of k.
- m3sgg.utils.pytorch_misc.nonintersecting_2d_inds(x)[source]
Efficiently returns np.array([(a, b) for a in range(x) for b in range(x) if a != b]).
- Parameters:
x – Size
- Returns:
An x*(x-1) array of index pairs: [(0,1), (0,2), …, (0,x-1), (1,0), (1,2), …, (x-1,x-2)]
- m3sgg.utils.pytorch_misc.intersect_2d(x1, x2)[source]
Given two arrays [m1, n] and [m2, n], returns an [m1, m2] boolean array where each entry is True if the corresponding rows match.
- Parameters:
x1 – [m1, n] numpy array
x2 – [m2, n] numpy array
- Returns:
[m1, m2] bool array of the intersections
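A NumPy sketch of the pairwise row-match semantics:

    import numpy as np

    def intersect_2d_sketch(x1, x2):
        # True where an entire row of x1 equals a row of x2.
        return (x1[:, None, :] == x2[None, :, :]).all(axis=2)

    x1 = np.array([[1, 2], [3, 4]])
    x2 = np.array([[3, 4], [5, 6], [1, 2]])
    print(intersect_2d_sketch(x1, x2))
    # [[False False  True]
    #  [ True False False]]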
- m3sgg.utils.pytorch_misc.np_to_variable(x, is_cuda=True, dtype=torch.FloatTensor)[source]
Convert a NumPy array to a torch Variable, optionally placed on CUDA.
- m3sgg.utils.pytorch_misc.gather_nd(x, index)[source]
- Parameters:
x – n-dimensional tensor [x0, x1, x2, … x{n-1}, dim]
index – [num, n-1] where each row contains the indices we’ll use
- Returns:
[num, dim]
- m3sgg.utils.pytorch_misc.diagonal_inds(tensor)[source]
Returns the indices required to traverse the first 2 dims of a tensor in diagonal fashion.
- Parameters:
tensor – Input tensor
- Returns:
Diagonal indices
- m3sgg.utils.pytorch_misc.argsort_desc(scores)[source]
Returns indices that sort scores in descending order.
Computes indices for descending sort across arbitrary dimensional arrays.
- Parameters:
scores (numpy.ndarray) – Array of arbitrary size to sort
- Returns:
Array of indices for descending sort, shape [numel(scores), dim(scores)]
- Return type:
numpy.ndarray
- m3sgg.utils.pytorch_misc.transpose_packed_sequence_inds(lengths)[source]
Convert indices between a TxB packed-sequence layout and BxT (or vice versa). Assumes that nothing is a variable.
- Parameters:
lengths – List of sequence lengths
- Returns:
Permutation indices for the transposed layout
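A simplified illustration of the permutation being computed (assuming batch-major storage with per-sequence lengths; the real function's exact return format may differ):

    import numpy as np

    def transpose_inds_sketch(lengths):
        # Index of each (time, batch) element in a batch-major layout,
        # emitted in time-major order, i.e. a BxT -> TxB permutation.
        offsets = np.cumsum([0] + list(lengths[:-1]))
        return np.array([offsets[b] + t
                         for t in range(max(lengths))
                         for b in range(len(lengths))
                         if t < lengths[b]])

    print(transpose_inds_sketch([4, 3, 2, 1]))  # [0 4 7 9 1 5 8 2 6 3]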
- m3sgg.utils.pytorch_misc.right_shift_packed_sequence_inds(lengths)[source]
Right shift packed sequence indices to accommodate BOS tokens.
- Parameters:
lengths (list) – List of sequence lengths, e.g. [2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1]
- Returns:
Permutation indices for shifting sequences right to accommodate BOS tokens
- Return type:
Visual example with lengths = [4, 3, 2, 1]:
- Before:
a (0) b (4) c (7) d (8)
a (1) b (5)
a (2) b (6)
a (3)
- After:
bos a (0) b (4) c (7)
bos a (1)
bos a (2)
bos
- m3sgg.utils.pytorch_misc.clip_grad_norm(named_parameters, max_norm, clip=False, verbose=False)[source]
Clips gradient norm of an iterable of parameters.
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
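A hedged training-step sketch (model and optimizer are stand-ins; clip=True is assumed to enable the actual rescaling, since the default clip=False suggests norm monitoring only):

    import torch
    from m3sgg.utils.pytorch_misc import clip_grad_norm

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    # Joint norm over all gradients; rescaled in-place if it exceeds max_norm.
    clip_grad_norm(model.named_parameters(), max_norm=5.0, clip=True)
    optimizer.step()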
- m3sgg.utils.word_vectors.create_ssl_context()[source]
Create an unverified SSL context to bypass certificate verification.
Creates an SSL context with disabled hostname and certificate verification to handle SSL certificate issues when downloading word vectors from external sources.
- Returns:
SSL context with verification disabled
- Return type:
ssl.SSLContext
- m3sgg.utils.word_vectors.download_word_vectors(wv_type, wv_dir, wv_dim)[source]
Download word vectors if they don’t exist.
Downloads pre-trained word vectors from Stanford NLP resources and extracts the specific dimension file needed for the model.
- m3sgg.utils.word_vectors.get_cache_path(wv_type, wv_dir, wv_dim)[source]
Get the path for the cached word vectors.
Constructs the file path for cached word vector pickle files.
- m3sgg.utils.word_vectors.get_cache_status()[source]
Get the status of word vector caches.
Returns information about both memory and disk caches for word vectors, including cache sizes and available cached files.
- Returns:
Dictionary containing cache status information
- Return type:
dict
- m3sgg.utils.word_vectors.clear_word_vector_cache(wv_type=None, wv_dir='data', wv_dim=None)[source]
Clear word vector cache (both memory and disk cache).
Clears cached word vectors from memory and optionally from disk. Can target specific word vector types and dimensions or clear all caches.
- m3sgg.utils.word_vectors.load_word_vectors(wv_type='glove.6B', wv_dir='data', wv_dim=200)[source]
Load word vectors from file or download if not present.
Loads pre-trained word vectors with caching support. Checks memory cache first, then disk cache, and finally downloads from external source if needed.
- Parameters:
wv_type (str, optional) – Word vector type, defaults to “glove.6B”
wv_dir (str, optional) – Directory for cached word vector files, defaults to “data”
wv_dim (int, optional) – Word vector dimension, defaults to 200
- Returns:
Dictionary mapping words to their vector representations
- Return type:
dict
- m3sgg.utils.word_vectors.obj_edge_vectors(names, wv_type='glove.6B', wv_dir='data', wv_dim=200)[source]
Create word vectors for object classes.
Generates word vector embeddings for a list of object class names using pre-trained word vectors. Returns zero vectors for unknown words.
- Parameters:
names (list) – Object class names to embed
wv_type (str, optional) – Word vector type, defaults to “glove.6B”
wv_dir (str, optional) – Directory for cached word vector files, defaults to “data”
wv_dim (int, optional) – Word vector dimension, defaults to 200
- Returns:
Tensor containing word vectors for object classes
- Return type:
torch.Tensor
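A hedged usage sketch (the class list is hypothetical; unknown words come back as zero vectors per the description):

    from m3sgg.utils.word_vectors import obj_edge_vectors

    classes = ["person", "cup", "table"]  # hypothetical object classes
    embed = obj_edge_vectors(classes, wv_type="glove.6B", wv_dir="data", wv_dim=200)
    print(embed.shape)  # expected: torch.Size([3, 200])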
- m3sgg.utils.word_vectors.verb_edge_vectors(names, wv_type='glove.6B', wv_dir=None, wv_dim=300)[source]
Create word vectors for verb classes.
Generates word vector embeddings for a list of verb class names using pre-trained word vectors. Currently uses the same logic as obj_edge_vectors.
- Parameters:
names (list) – Verb class names to embed
wv_type (str, optional) – Word vector type, defaults to “glove.6B”
wv_dir (str, optional) – Directory for cached word vector files, defaults to None
wv_dim (int, optional) – Word vector dimension, defaults to 300
- Returns:
Tensor containing word vectors for verb classes
- Return type:
torch.Tensor
- class m3sgg.utils.transformer.TransformerEncoderLayer(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]
Bases:
Module
Transformer encoder layer with multi-head attention and feed-forward network.
Implements a single layer of the transformer encoder with self-attention mechanism, layer normalization, and position-wise feed-forward network.
- __init__(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]
Initialize the transformer encoder layer.
- Parameters:
embed_dim (int, optional) – Embedding dimension, defaults to 1936
nhead (int, optional) – Number of attention heads, defaults to 4
dim_feedforward (int, optional) – Feed-forward dimension, defaults to 2048
dropout (float, optional) – Dropout probability, defaults to 0.1
- Returns:
None
- Return type:
None
- forward(src, input_key_padding_mask)[source]
Forward pass through the transformer encoder layer.
- Parameters:
src (torch.Tensor) – Source sequence tensor
input_key_padding_mask (torch.Tensor) – Mask for padding tokens
- Returns:
Transformed sequence and attention weights
- Return type:
tuple
- class m3sgg.utils.transformer.TransformerDecoderLayer(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]
Bases:
Module
Transformer decoder layer with masked self-attention and cross-attention.
Implements a single layer of the transformer decoder with masked self-attention, encoder-decoder attention, and position-wise feed-forward network.
- __init__(embed_dim=1936, nhead=4, dim_feedforward=2048, dropout=0.1)[source]
Initialize the transformer decoder layer.
- Parameters:
embed_dim (int, optional) – Embedding dimension, defaults to 1936
nhead (int, optional) – Number of attention heads, defaults to 4
dim_feedforward (int, optional) – Feed-forward dimension, defaults to 2048
dropout (float, optional) – Dropout probability, defaults to 0.1
- Returns:
None
- Return type:
None
- forward(global_input, input_key_padding_mask, position_embed)[source]
Forward pass through the transformer decoder layer.
- class m3sgg.utils.transformer.TransformerEncoder(encoder_layer, num_layers)[source]
Bases:
Module
- __init__(encoder_layer, num_layers)[source]
Initialize the transformer encoder from the given encoder layer and number of layers.
- forward(input, input_key_padding_mask)[source]
Forward pass through the stacked encoder layers.
- class m3sgg.utils.transformer.TransformerDecoder(decoder_layer, num_layers, embed_dim)[source]
Bases:
Module
- __init__(decoder_layer, num_layers, embed_dim)[source]
Initialize the transformer decoder from the given decoder layer, number of layers, and embedding dimension.
- forward(global_input, input_key_padding_mask, position_embed)[source]
Forward pass through the stacked decoder layers.
- class m3sgg.utils.transformer.transformer(enc_layer_num=1, dec_layer_num=3, embed_dim=1936, nhead=8, dim_feedforward=2048, dropout=0.1, mode=None)[source]
Bases:
Module
Spatial Temporal Transformer.
- Parameters:
enc_layer_num (int, optional) – Number of encoder layers, defaults to 1
dec_layer_num (int, optional) – Number of decoder layers, defaults to 3
embed_dim (int, optional) – Embedding dimension, defaults to 1936
nhead (int, optional) – Number of attention heads, defaults to 8
dim_feedforward (int, optional) – Feed-forward dimension, defaults to 2048
dropout (float, optional) – Dropout probability, defaults to 0.1
mode (str, optional) – Operating mode, defaults to None
- __init__(enc_layer_num=1, dec_layer_num=3, embed_dim=1936, nhead=8, dim_feedforward=2048, dropout=0.1, mode=None)[source]
Initialize the spatial-temporal transformer with the configured encoder and decoder stacks.
- forward(features, im_idx)[source]
Forward pass through the spatial-temporal transformer; im_idx assigns each feature row to its frame.
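A hedged construction sketch from the documented signature (the feature layout and the im_idx convention are assumptions):

    import torch
    from m3sgg.utils.transformer import transformer

    model = transformer(enc_layer_num=1, dec_layer_num=3, embed_dim=1936,
                        nhead=8, dim_feedforward=2048, dropout=0.1)
    features = torch.randn(20, 1936)                    # one row per relation feature
    im_idx = torch.tensor([0] * 8 + [1] * 7 + [2] * 5)  # frame index of each row
    out = model(features, im_idx)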
Model type detection utility for VidSgg checkpoints.
This module provides functionality to automatically detect the model type from a checkpoint’s state_dict without requiring explicit model specification.
- author:
VidSgg Team
- version:
0.1.0
- m3sgg.utils.model_detector.detect_model_type_from_checkpoint(checkpoint_path: str) → str | None[source]
Detect the model type from a checkpoint file by analyzing state_dict keys.
This function examines the layer names in the checkpoint’s state_dict to determine which model architecture was used. Each model has unique layer names that serve as fingerprints for identification.
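A hedged usage sketch (the checkpoint path is hypothetical):

    from m3sgg.utils.model_detector import (
        detect_model_type_from_checkpoint,
        get_model_class_from_type,
    )

    model_type = detect_model_type_from_checkpoint("output/model_best.tar")  # hypothetical path
    if model_type is not None:
        model_cls = get_model_class_from_type(model_type)
        print(model_type, model_cls)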
- m3sgg.utils.model_detector.get_model_class_from_type(model_type: str)[source]
Get the model class from the detected model type.
- Parameters:
model_type (str) – Detected model type string
- Returns:
Model class or None if not found
- Return type:
class or None
- m3sgg.utils.model_detector.detect_dataset_from_checkpoint(checkpoint_path: str) → str | None[source]
Detect the dataset type from a checkpoint file.
- m3sgg.utils.model_detector.get_model_info_from_checkpoint(checkpoint_path: str) → Dict[str, Any][source]
Get comprehensive model information from a checkpoint.
- m3sgg.utils.model_detector.save_checkpoint_with_metadata(model, save_path: str, model_type: str, dataset: str | None = None, additional_metadata: Dict[str, Any] | None = None) → None[source]
Save model checkpoint with metadata for future identification.
- Parameters:
model (nn.Module) – The model to save
save_path (str) – Path where to save the checkpoint
model_type (str) – Type of the model (e.g., ‘sttran’, ‘tempura’, ‘scenellm’)
dataset (str, optional) – Dataset used for training (e.g., ‘action_genome’, ‘EASG’)
additional_metadata (Dict[str, Any], optional) – Additional metadata to store
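For example (hedged; the model and paths are stand-ins):

    from m3sgg.utils.model_detector import save_checkpoint_with_metadata

    save_checkpoint_with_metadata(
        model,                             # a trained nn.Module (stand-in)
        save_path="output/sttran_ag.tar",  # hypothetical path
        model_type="sttran",
        dataset="action_genome",
        additional_metadata={"epoch": 10},
    )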
Checkpoint utility functions for VidSgg training.
This module provides utilities for safe checkpoint saving with disk space validation and checkpoint configuration based on available storage.
- author:
VidSgg Team
- version:
0.1.0
- m3sgg.utils.checkpoint_utils.check_disk_space_and_configure_checkpointing(save_path: str, logger: Logger, conf) → Tuple[bool, str][source]
Check available disk space and configure checkpoint saving strategy.
- Parameters:
save_path (str) – Path where checkpoints will be saved
logger (logging.Logger) – Logger instance for output
conf (Config) – Configuration object
- Returns:
Tuple of (checkpoint_enabled, checkpoint_strategy)
- Return type:
Tuple[bool, str]
- m3sgg.utils.checkpoint_utils.safe_save_checkpoint(model: Module, checkpoint_path: str, model_type: str, dataset: str, additional_metadata: Dict[str, Any] | None = None, logger: Logger | None = None) → bool[source]
Safely save checkpoint with disk space validation.
- Parameters:
model (torch.nn.Module) – Model to save
checkpoint_path (str) – Path to save checkpoint
model_type (str) – Type of model being saved
dataset (str) – Dataset name
additional_metadata (dict, optional) – Additional metadata to save
logger (logging.Logger, optional) – Logger instance for output
- Returns:
True if save was successful, False otherwise
- Return type:
bool
- m3sgg.utils.checkpoint_utils.validate_checkpoint_file(checkpoint_path: str, logger: Logger | None = None) → bool[source]
Validate a checkpoint file before loading.
- Parameters:
checkpoint_path (str) – Path to checkpoint file
logger (logging.Logger, optional) – Logger instance for output
- Returns:
True if checkpoint is valid, False otherwise
- Return type:
bool
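A hedged end-to-end sketch combining the two utilities above (model and path are stand-ins):

    from m3sgg.utils.checkpoint_utils import safe_save_checkpoint, validate_checkpoint_file

    ok = safe_save_checkpoint(model, "output/ckpt.tar",  # hypothetical path
                              model_type="sttran", dataset="action_genome")
    if ok and validate_checkpoint_file("output/ckpt.tar"):
        print("checkpoint written and verified")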
- m3sgg.utils.memory.memory_computation(unc_vals, output_dir, rel_class_num, obj_class_num, obj_feature_dim=1024, rel_feature_dim=1936, obj_weight_type='both', rel_weight_type='both', obj_mem=False, obj_unc=False, include_bg_mem=False)[source]
Compute memory embeddings for scene graph generation.
Generates memory embeddings for objects and relations based on uncertainty values and statistical computations for improved scene graph generation performance.
- Parameters:
unc_vals (uncertainty_values) – Uncertainty values object containing statistics
output_dir (str) – Output directory for saving embeddings
rel_class_num (int) – Number of relation classes
obj_class_num (int) – Number of object classes
obj_feature_dim (int, optional) – Object feature dimension, defaults to 1024
rel_feature_dim (int, optional) – Relation feature dimension, defaults to 1936
obj_weight_type (str, optional) – Object weight type (‘both’, ‘al’, ‘ep’, ‘simple’), defaults to “both”
rel_weight_type (str, optional) – Relation weight type (‘both’, ‘al’, ‘ep’, ‘simple’), defaults to “both”
obj_mem (bool, optional) – Whether to compute object memory, defaults to False
obj_unc (bool, optional) – Whether to use object uncertainty, defaults to False
include_bg_mem (bool, optional) – Whether to include background class in memory, defaults to False
- Returns:
None
- Return type:
None
- class m3sgg.utils.uncertainty.uncertainty_values(obj_classes, attention_class_num, spatial_class_num, contact_class_num)[source]
Bases:
object
Class for managing and computing uncertainty values in scene graph generation.
Handles uncertainty computation for both objects and relations, including epistemic and aleatoric uncertainty types with statistical analysis capabilities.
FPN Utilities
- m3sgg.utils.fpn.box_utils.bbox_loss(prior_boxes, deltas, gt_boxes, eps=0.0001, scale_before=1)[source]
Compute bounding box regression loss.
Computes smooth L1 loss for predicting ground truth boxes from prior boxes using delta transformations.
- Parameters:
prior_boxes (torch.Tensor) – Prior bounding boxes of shape [num_boxes, 4] (x1, y1, x2, y2)
deltas (torch.Tensor) – Predicted box deltas of shape [num_boxes, 4] (tx, ty, th, tw)
gt_boxes (torch.Tensor) – Ground truth boxes of shape [num_boxes, 4] (x1, y1, x2, y2)
eps (float, optional) – Small epsilon value for numerical stability, defaults to 1e-4
scale_before (int, optional) – Scaling factor, defaults to 1
- Returns:
Computed bounding box loss
- Return type:
torch.Tensor
- m3sgg.utils.fpn.box_utils.bbox_preds(boxes, deltas)[source]
Convert predicted deltas to bounding box coordinates.
Transforms predicted deltas along with prior boxes into (x1, y1, x2, y2) coordinate representation.
- Parameters:
boxes (torch.Tensor) – Prior boxes in (x1, y1, x2, y2) format
deltas (torch.Tensor) – Predicted offsets (tx, ty, tw, th)
- Returns:
Transformed bounding boxes
- Return type:
torch.Tensor
- m3sgg.utils.fpn.box_utils.center_size(boxes)[source]
Convert boxes from point form (xmin, ymin, xmax, ymax) to center-size (cx, cy, w, h) representation for comparison to center-size ground truth data.
- Parameters:
boxes (tensor) – Point-form boxes
- Returns:
(tensor) Converted (cx, cy, w, h) form of boxes
- Return type:
boxes
- m3sgg.utils.fpn.box_utils.point_form(boxes)[source]
Convert prior boxes from center-size form to point form (xmin, ymin, xmax, ymax) for comparison to point-form ground truth data.
- Parameters:
boxes (tensor) – Center-size default boxes from priorbox layers
- Returns:
(tensor) Converted (xmin, ymin, xmax, ymax) form of boxes
- Return type:
boxes
- m3sgg.utils.fpn.box_utils.bbox_intersections(box_a, box_b)[source]
Compute the intersection area between box_a and box_b. Both tensors are broadcast to [A,B,2] without new allocation: [A,2] -> [A,1,2] -> [A,B,2] and [B,2] -> [1,B,2] -> [A,B,2].
- Parameters:
box_a (tensor) – Bounding boxes, shape [A,4]
box_b (tensor) – Bounding boxes, shape [B,4]
- Returns:
Intersection areas, shape [A,B]
- Return type:
tensor
- m3sgg.utils.fpn.box_utils.bbox_overlaps(box_a, box_b)[source]
Compute Jaccard overlap (IoU) between two sets of bounding boxes.
Calculates intersection over union (IoU) for all pairs of boxes between two sets. IoU = A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
- Parameters:
box_a (torch.Tensor) – First set of bounding boxes, shape [num_objects, 4]
box_b (torch.Tensor) – Second set of bounding boxes, shape [num_priors, 4]
- Returns:
Jaccard overlap matrix, shape [box_a.size(0), box_b.size(0)]
- Return type:
torch.Tensor
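The IoU formula above as a self-contained sketch (pure PyTorch; independent of this module's exact broadcasting):

    import torch

    def iou_sketch(box_a, box_b):
        # Pairwise intersection from broadcast max/min of box corners.
        lt = torch.max(box_a[:, None, :2], box_b[None, :, :2])
        rb = torch.min(box_a[:, None, 2:], box_b[None, :, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[..., 0] * wh[..., 1]
        area_a = (box_a[:, 2] - box_a[:, 0]) * (box_a[:, 3] - box_a[:, 1])
        area_b = (box_b[:, 2] - box_b[:, 0]) * (box_b[:, 3] - box_b[:, 1])
        return inter / (area_a[:, None] + area_b[None, :] - inter)

    a = torch.tensor([[0., 0., 10., 10.]])
    b = torch.tensor([[5., 5., 15., 15.]])
    print(iou_sketch(a, b))  # tensor([[0.1429]])  (25 / 175)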
Visualization
- m3sgg.utils.draw_rectangles.draw_rectangles.draw_union_boxes(pair_rois, spatial_scale=27)[source]
Draw union boxes for pairs of ROIs to create spatial masks.
Creates spatial masks for subject-object pairs by drawing their bounding boxes and union boxes on a grid. Used for spatial relationship modeling in scene graph generation.
- Parameters:
pair_rois (numpy.ndarray) – Array of ROI pairs, shape [N, 8] with format [x1_subj, y1_subj, x2_subj, y2_subj, x1_obj, y1_obj, x2_obj, y2_obj]
spatial_scale (int, optional) – Scale for spatial masks, defaults to 27
- Returns:
Spatial masks for each pair, shape [num_pairs, 2, spatial_scale, spatial_scale]
- Return type:
numpy.ndarray
Optimization
- class m3sgg.utils.AdamW.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)[source]
Bases:
Optimizer
Implements AdamW algorithm with decoupled weight decay.
The original Adam algorithm was proposed in Adam: A Method for Stochastic Optimization. The AdamW variant was proposed in Decoupled Weight Decay Regularization.
- Parameters:
params (iterable) – Iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – Learning rate, defaults to 1e-3
betas (tuple, optional) – Coefficients used for computing running averages of gradient and its square, defaults to (0.9, 0.999)
eps (float, optional) – Term added to the denominator to improve numerical stability, defaults to 1e-8
weight_decay (float, optional) – Weight decay coefficient, defaults to 1e-2
amsgrad (bool, optional) – Whether to use the AMSGrad variant, defaults to False
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)[source]
Initialize the AdamW optimizer.
- Parameters:
params (iterable) – Iterable of parameters to optimize
lr (float, optional) – Learning rate, defaults to 1e-3
betas (tuple, optional) – Coefficients for computing running averages, defaults to (0.9, 0.999)
eps (float, optional) – Term added to denominator for numerical stability, defaults to 1e-8
weight_decay (float, optional) – Weight decay coefficient, defaults to 1e-2
amsgrad (bool, optional) – Whether to use AMSGrad variant, defaults to False
- Returns:
None
- Return type:
None
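A hedged usage sketch (standard optimizer loop; the decoupled decay is applied by the optimizer itself, not through the gradients):

    import torch
    from m3sgg.utils.AdamW import AdamW

    model = torch.nn.Linear(10, 2)
    opt = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    opt.step()
    opt.zero_grad()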
- class m3sgg.utils.infoNCE.SupConLoss(temperature=0.1, contrast_mode='all', base_temperature=0.07)[source]
Bases:
Module
Supervised Contrastive Learning loss implementation.
Based on the paper https://arxiv.org/pdf/2004.11362.pdf. Also supports the unsupervised contrastive loss used in SimCLR.
- __init__(temperature=0.1, contrast_mode='all', base_temperature=0.07)[source]
Initialize the supervised contrastive loss.
- forward(features, labels=None, mask=None)[source]
Compute contrastive loss for the model.
If both labels and mask are None, it degenerates to SimCLR unsupervised loss. Reference: https://arxiv.org/pdf/2002.05709.pdf
- Parameters:
features (torch.Tensor) – Hidden vector of shape [bsz, n_views, …]
labels (torch.Tensor, optional) – Ground truth labels of shape [bsz], defaults to None
mask (torch.Tensor, optional) – Contrastive mask of shape [bsz, bsz], defaults to None
- Returns:
Scalar loss value
- Return type:
torch.Tensor
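A hedged usage sketch (the L2 normalization of features is an assumption, standard for contrastive losses):

    import torch
    import torch.nn.functional as F
    from m3sgg.utils.infoNCE import SupConLoss

    criterion = SupConLoss(temperature=0.1)
    features = F.normalize(torch.randn(8, 2, 128), dim=-1)  # [bsz, n_views, dim]
    labels = torch.randint(0, 4, (8,))

    loss_sup = criterion(features, labels)  # supervised contrastive
    loss_unsup = criterion(features)        # labels=None degenerates to SimCLR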
- class m3sgg.utils.infoNCE.EucNormLoss[source]
Bases:
Module
- forward(features, labels)[source]
Compute the Euclidean norm loss for the given features and labels.
- class m3sgg.utils.matcher.HungarianMatcher(cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1, cost_iou: float = 1)[source]
Bases:
Module
Hungarian algorithm-based matcher for object detection.
Computes an assignment between targets and network predictions using the Hungarian algorithm. For efficiency, targets don’t include no-object class. When there are more predictions than targets, performs 1-to-1 matching of best predictions while treating others as no-object.
- __init__(cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1, cost_iou: float = 1)[source]
Initialize the Hungarian matcher.
- Parameters:
cost_class (float, optional) – Relative weight of classification error in matching cost, defaults to 1
cost_bbox (float, optional) – Relative weight of L1 error of bounding box coordinates, defaults to 1
cost_giou (float, optional) – Relative weight of GIoU loss of bounding box, defaults to 1
cost_iou (float, optional) – Relative weight of IoU loss of bounding box, defaults to 1
- Returns:
None
- Return type:
None
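The core assignment step, sketched with SciPy (the matcher combines the four weighted cost terms above; this only shows the Hungarian solve):

    import torch
    from scipy.optimize import linear_sum_assignment

    # cost[i, j]: combined matching cost of prediction i against target j
    cost = torch.rand(5, 3)  # 5 predictions, 3 targets
    pred_idx, tgt_idx = linear_sum_assignment(cost.numpy())
    # Each target gets its cost-minimizing prediction, 1-to-1;
    # the remaining predictions are treated as no-object.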
Tracking
- m3sgg.utils.track.generalized_box_iou(boxes1, boxes2)[source]
Compute Generalized Intersection over Union (GIoU) between two sets of boxes.
Based on the paper from https://giou.stanford.edu/. The boxes should be in [x0, y0, x1, y1] format.
- Parameters:
boxes1 (torch.Tensor) – First set of bounding boxes
boxes2 (torch.Tensor) – Second set of bounding boxes
- Returns:
Pairwise GIoU matrix where N = len(boxes1) and M = len(boxes2)
- Return type:
torch.Tensor
- m3sgg.utils.track.box_iou(boxes1, boxes2)[source]
Compute Intersection over Union (IoU) between two sets of boxes.
- Parameters:
boxes1 (torch.Tensor) – First set of bounding boxes
boxes2 (torch.Tensor) – Second set of bounding boxes
- Returns:
Tuple containing IoU values and union areas
- Return type:
tuple
- m3sgg.utils.track.box_area(boxes)[source]
Compute the area of bounding boxes.
- Parameters:
boxes (torch.Tensor) – Bounding boxes in [x0, y0, x1, y1] format
- Returns:
Areas of the bounding boxes
- Return type:
torch.Tensor
- m3sgg.utils.track.get_sequence(entry, gt_annotation, matcher, im_size, mode='predcls')[source]
Process detection results and ground truth annotations with tracking/matching.
- Parameters:
entry – Dictionary containing detection results with keys like ‘boxes’, ‘features’, etc.
gt_annotation – Ground truth annotations for the current sequence
matcher – Hungarian matcher for assignment
im_size – Image size information
mode – Processing mode (‘predcls’, ‘sgdet’, ‘sgcls’)
- m3sgg.utils.ds_track.box_xyxy_to_xywh(x)[source]
Convert bounding box from xyxy format to xywh format.
Converts bounding box coordinates from (x0, y0, x1, y1) format to (x, y, width, height) format.
- Parameters:
x (torch.Tensor) – Bounding box tensor in xyxy format
- Returns:
Bounding box tensor in xywh format
- Return type:
torch.Tensor
- m3sgg.utils.ds_track.get_sequence(entry, gt_annotation, shape, task='sgcls')[source]
Get sequence information for scene graph generation tasks.
Processes detection results and ground truth annotations to prepare sequence data for different scene graph generation tasks.
- Parameters:
entry – Dictionary containing detection results
gt_annotation – Ground truth annotations for the current sequence
shape – Image shape information
task (str, optional) – Scene graph task (‘predcls’, ‘sgcls’, ‘sgdet’), defaults to ‘sgcls’
- Returns:
None (modifies entry in-place)
- Return type:
None