soundmentations.Mask
- class soundmentations.Mask(mask_ratio: float = 0.2, p: float = 1.0)
Bases: BaseMask
Mask a random contiguous segment of audio data with zeros.
This transform randomly selects a contiguous time segment of the audio and replaces it with silence (zeros), simulating audio dropouts, temporal masking effects, or packet loss in streaming audio.
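- Parameters:
  - mask_ratio (float) – Fraction of the audio length to replace with zeros. Defaults to 0.2.
  - p (float) – Probability of applying the transform. Defaults to 1.0.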
Examples
>>> import numpy as np
>>> from soundmentations.transforms.time.mask import TimeMask
>>>
>>> # Create audio signal (1 second at 44.1kHz)
>>> sample_rate = 44100
>>> duration = 1.0
>>> t = np.linspace(0, duration, int(sample_rate * duration))
>>> audio = np.sin(2 * np.pi * 440 * t)  # 440Hz sine wave
>>>
>>> # Create TimeMask that masks 10% of the audio
>>> time_mask = TimeMask(mask_ratio=0.1, p=1.0)
>>> masked_audio = time_mask(audio, sample_rate=44100)
>>>
>>> # Verify that some portion is masked
>>> assert len(masked_audio) == len(audio)
>>> assert np.sum(masked_audio == 0) > 0  # Some samples are zero
>>>
>>> # Example with probability
>>> probabilistic_mask = TimeMask(mask_ratio=0.2, p=0.5)
>>> maybe_masked = probabilistic_mask(audio, sample_rate=44100)
Notes
The masking process:
1. Calculates the number of samples to mask based on mask_ratio
2. Randomly selects a starting position for the mask
3. Replaces the selected segment with zeros
4. Concatenates the unmasked portions with the masked segment
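The four steps above can be sketched in plain NumPy. This is an illustrative sketch, not the library's implementation: the function name `mask_segment`, its `rng` argument, and the rounding of the mask length with `int()` are assumptions made for the example.

```python
import numpy as np

def mask_segment(samples, mask_ratio=0.2, rng=None):
    """Zero out one random contiguous segment of a 1-D signal.

    Hypothetical helper for illustration; not part of soundmentations.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(samples)
    # 1. Number of samples to mask, derived from mask_ratio (assumed truncation).
    mask_len = int(n * mask_ratio)
    # 2. Random start, chosen so the whole segment fits inside the signal.
    start = int(rng.integers(0, n - mask_len + 1))
    # 3. Silence that replaces the selected segment.
    silence = np.zeros(mask_len, dtype=samples.dtype)
    # 4. Concatenate the unmasked head, the zeros, and the unmasked tail.
    return np.concatenate([samples[:start], silence, samples[start + mask_len:]])

audio = np.ones(1000, dtype=np.float32)
masked = mask_segment(audio, mask_ratio=0.1, rng=np.random.default_rng(0))
```

With an all-ones input, the output has the same length and dtype as the input, and the only zeros are the 100 contiguous masked samples.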
This transform is useful for:
- Simulating audio dropouts or glitches
- Creating training data robust to missing temporal information
- Augmenting datasets for speech recognition tasks
- Testing model robustness to temporal discontinuities
The mask location is chosen uniformly at random across the audio clip, so it is not biased toward the beginning or the end of the audio.
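A quick empirical check of what a uniformly random mask location looks like, assuming the start index is drawn uniformly from every position where the segment still fits. The variable names and the clip and mask lengths are chosen for illustration only and are not taken from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, mask_len = 1000, 200  # hypothetical clip and mask lengths

# Draw many start positions; the upper bound is exclusive, so the
# largest possible start is n_samples - mask_len.
starts = rng.integers(0, n_samples - mask_len + 1, size=20_000)

# Fraction of starts falling in each quarter of the valid range.
counts, _ = np.histogram(starts, bins=4, range=(0, n_samples - mask_len))
fractions = counts / counts.sum()
```

Under this assumption each quarter of the valid range receives roughly a quarter of the starts. Note that uniformity applies to the start position: samples at the very edges of the clip are covered by fewer possible segments than samples in the middle.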