soundmentations.Mask

class soundmentations.Mask(mask_ratio: float = 0.2, p: float = 1.0)

Bases: BaseMask

Mask a random contiguous segment of audio data with zeros.

This transform randomly selects a contiguous time segment of the audio and replaces it with silence (zeros), simulating audio dropouts, temporal masking effects, or packet loss in streaming audio.

Parameters:

mask_ratio (float, optional) – The ratio of audio length to mask (0.0 to 1.0), by default 0.2. For example, 0.2 means 20% of the audio duration will be masked.
p (float, optional) – Probability of applying the transform, by default 1.0. Must be between 0.0 and 1.0.

Examples

>>> import numpy as np
>>> from soundmentations import Mask
>>>
>>> # Create audio signal (1 second at 44.1kHz)
>>> sample_rate = 44100
>>> duration = 1.0
>>> t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
>>> audio = np.sin(2 * np.pi * 440 * t)  # 440Hz sine wave
>>>
>>> # Create a Mask that silences 10% of the audio
>>> mask = Mask(mask_ratio=0.1, p=1.0)
>>> masked_audio = mask(audio, sample_rate=sample_rate)
>>>
>>> # Verify that the length is unchanged and some portion is masked
>>> assert len(masked_audio) == len(audio)
>>> assert np.sum(masked_audio == 0) > 0  # Some samples are zero
>>>
>>> # Example with probability: the mask is applied about half the time
>>> probabilistic_mask = Mask(mask_ratio=0.2, p=0.5)
>>> maybe_masked = probabilistic_mask(audio, sample_rate=sample_rate)

Notes

The masking process:

1. Calculates the number of samples to mask based on mask_ratio
2. Randomly selects a starting position for the mask
3. Replaces the selected segment with zeros
4. Concatenates the unmasked portions with the masked segment
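
The steps above can be sketched in plain NumPy. This is an illustrative approximation, not the library's actual implementation: the helper name mask_segment, the rng argument, the truncating int() conversion, and the restriction to 1-D mono input are all assumptions made for the example.

```python
import numpy as np


def mask_segment(samples, mask_ratio=0.2, rng=None):
    """Hypothetical helper (not part of soundmentations): zero one random segment."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(samples)

    # Step 1: number of samples to mask (assumed to truncate toward zero)
    mask_len = int(n * mask_ratio)
    if mask_len == 0:
        return samples.copy()

    # Step 2: random start chosen so the whole segment fits inside the audio
    start = int(rng.integers(0, n - mask_len + 1))

    # Steps 3 and 4: build the silent segment and join it with the untouched parts
    silence = np.zeros(mask_len, dtype=samples.dtype)
    return np.concatenate([samples[:start], silence, samples[start + mask_len:]])


audio = np.ones(1000, dtype=np.float32)  # all-ones signal so zeros can only come from the mask
masked = mask_segment(audio, mask_ratio=0.2, rng=np.random.default_rng(0))
```

Using an all-ones input makes the effect easy to inspect: every zero in masked comes from the mask, and the output has the same length as the input.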

This transform is useful for:

- Simulating audio dropouts or glitches
- Creating training data robust to missing temporal information
- Augmenting datasets for speech recognition tasks
- Testing model robustness to temporal discontinuities

The mask location is chosen uniformly at random along the audio, so it is not biased toward the beginning or the end.
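
As a rough check of what a uniformly random location looks like, the sketch below draws many start positions and compares how often they fall in each quarter of the valid range. It assumes the start index is drawn uniformly from 0 to n_samples - mask_len inclusive; the variable names and the use of numpy.random.Generator are choices made for this example, not details confirmed by the library.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000   # length of a hypothetical audio clip
mask_len = 200     # corresponds to mask_ratio = 0.2

# Valid start indices run from 0 to n_samples - mask_len (inclusive),
# so the masked segment always fits inside the clip.
starts = rng.integers(0, n_samples - mask_len + 1, size=10_000)

# Split the valid range into four equal bins and compare their shares.
counts, _ = np.histogram(starts, bins=4, range=(0, n_samples - mask_len))
fractions = counts / counts.sum()
print(fractions)  # each share should be close to 0.25
```

Each quarter of the valid range receives roughly the same share of start positions, which is what the absence of a bias toward either end means in practice.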