soundmentations package

Subpackages

Module contents

class soundmentations.Compose(transforms)[source]

Bases: BaseCompose

Compose multiple audio transforms into a sequential pipeline.

This class allows you to chain multiple transforms together into a single callable object. Transforms are applied in the order they appear in the list, with each transform receiving the output of the previous one.

Parameters:

transforms (list) – List of transform objects to apply sequentially. Each transform must implement __call__(samples, sample_rate).

Examples

Create a basic augmentation pipeline:

>>> import soundmentations as S
>>>
>>> # Define individual transforms
>>> pipeline = S.Compose([
...     S.RandomTrim(duration=(1.0, 3.0), p=0.8),
...     S.Pad(pad_length=44100, p=0.6),
...     S.Gain(gain=6.0, p=0.5)
... ])
>>>
>>> # Apply to audio
>>> augmented = pipeline(audio_samples, sample_rate=44100)

Complex preprocessing pipeline:

>>> # ML training data preparation
>>> ml_pipeline = S.Compose([
...     S.CenterTrim(duration=2.0),              # Extract 2s from center
...     S.PadToLength(pad_length=88200),         # Normalize to exactly 2s
...     S.Gain(gain=3.0, p=0.7),                # Boost volume 70% of time
...     S.FadeIn(duration=0.1, p=0.5),          # Smooth start 50% of time
...     S.FadeOut(duration=0.1, p=0.5)          # Smooth end 50% of time
... ])
>>>
>>> # Process batch of audio files
>>> for audio in audio_batch:
...     processed = ml_pipeline(audio, sample_rate=44100)

Audio enhancement pipeline:

>>> # Clean up audio recordings
>>> enhance_pipeline = S.Compose([
...     S.StartTrim(start_time=0.5),            # Remove first 0.5s
...     S.EndTrim(end_time=10.0),               # Keep max 10s
...     S.Gain(gain=6.0),                       # Boost volume
...     S.FadeIn(duration=0.2),                 # Smooth fade-in
...     S.FadeOut(duration=0.2)                 # Smooth fade-out
... ])
>>>
>>> enhanced = enhance_pipeline(noisy_audio, sample_rate=44100)

Notes

  • Transforms are applied in order: first transform in list is applied first

  • Each transform receives the output of the previous transform

  • Probability parameters (p) in individual transforms are respected

  • The pipeline preserves mono audio format throughout

  • All transforms must accept (samples, sample_rate) parameters
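The sequential semantics described in these notes can be sketched as follows. This is a minimal illustration of the assumed behavior, not the library's actual implementation; the toy `double` and `add_one` transforms are hypothetical stand-ins for real ones:

```python
import numpy as np

def apply_pipeline(transforms, samples, sample_rate):
    # Each transform receives the output of the previous one, in list order.
    for transform in transforms:
        samples = transform(samples, sample_rate)
    return samples

# Hypothetical stand-in transforms with the (samples, sample_rate) call shape.
double = lambda s, sr: s * 2.0
add_one = lambda s, sr: s + 1.0

out = apply_pipeline([double, add_one], np.array([0.1, 0.2]), 44100)
# double is applied first, then add_one: [0.1, 0.2] -> [0.2, 0.4] -> [1.2, 1.4]
```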

See also

Individual transforms: Trim, Pad, RandomTrim, FadeIn, FadeOut

class soundmentations.OneOf(transforms, p=1.0)[source]

Bases: BaseCompose

Apply one transform from a list of transforms, chosen at random.

This class allows you to randomly select and apply one transform from a provided list. Each transform has an equal chance of being selected unless specified otherwise.

Parameters:
  • transforms (list) – List of transform objects to choose from. Each transform must implement __call__(samples, sample_rate).

  • p (float, optional) – Probability of applying one of the transforms, by default 1.0. If not applied, the input audio is returned unchanged.

Examples

Create a random augmentation pipeline:

>>> import soundmentations as S
>>>
>>> # Define individual transforms
>>> pipeline = S.OneOf([
...     S.Gain(gain=6.0),
...     S.FadeIn(duration=0.5),
...     S.FadeOut(duration=0.5)
... ], p=0.9)
>>>
>>> # Apply to audio
>>> augmented = pipeline(audio_samples, sample_rate=44100)

Randomized preprocessing pipeline:

>>> # ML training data preparation with randomness
>>> ml_pipeline = S.OneOf([
...     S.RandomTrim(duration=(1.0, 3.0)),
...     S.Pad(pad_length=44100),
...     S.Gain(gain=3.0)
... ], p=0.8)
>>>
>>> # Process batch of audio files
>>> for audio in audio_batch:
...     processed = ml_pipeline(audio, sample_rate=16000)

Audio enhancement with random effects:

>>> # Clean up audio recordings with variability
>>> enhance_pipeline = S.OneOf([
...     S.StartTrim(start_time=0.5),
...     S.EndTrim(end_time=10.0),
...     S.FadeIn(duration=0.2),
...     S.FadeOut(duration=0.2)
... ], p=0.7)
>>>
>>> enhanced = enhance_pipeline(noisy_audio, sample_rate=44100)

Notes

  • Only one transform is applied per call, chosen at random

  • Probability parameter (p) controls the likelihood of applying any transform

  • If no transform is applied, the input audio is returned unchanged
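A minimal sketch of this selection logic (assumed behavior based on the notes, not the library's actual code; the lambda transforms are hypothetical stand-ins):

```python
import random

import numpy as np

def one_of(transforms, samples, sample_rate, p=1.0):
    # With probability (1 - p), skip augmentation and return input unchanged.
    if random.random() > p:
        return samples
    # Otherwise apply exactly one transform, chosen uniformly at random.
    transform = random.choice(transforms)
    return transform(samples, sample_rate)

# With p=1.0 one of the two stand-in transforms is always applied.
out = one_of([lambda s, sr: s + 1.0, lambda s, sr: s + 2.0],
             np.zeros(3), 44100, p=1.0)
```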

class soundmentations.Trim(start_time: float = 0.0, end_time: float | None = None, p: float = 1.0)[source]

Bases: BaseTrim

Trim audio to keep only the portion between start_time and end_time.

This is the most basic trimming operation that allows specifying exact start and end times for the audio segment to keep.

Parameters:
  • start_time (float, optional) – Start time in seconds to begin keeping audio, by default 0.0. Must be non-negative.

  • end_time (float, optional) – End time in seconds to stop keeping audio, by default None. If None, keeps audio until the end. Must be greater than start_time.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Trim audio to specific time range:

>>> import numpy as np
>>> from soundmentations.transforms.time import Trim
>>>
>>> # Create 5 seconds of audio at 44.1kHz
>>> audio = np.random.randn(220500)
>>>
>>> # Keep audio from 1.5 to 3.0 seconds
>>> trim_transform = Trim(start_time=1.5, end_time=3.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 1.5 seconds

Use in a pipeline:

>>> import soundmentations as S
>>>
>>> # Extract middle portion and apply gain
>>> pipeline = S.Compose([
...     S.Trim(start_time=1.0, end_time=4.0, p=1.0),
...     S.Gain(gain=6.0, p=0.5)
... ])
>>>
>>> result = pipeline(audio, sample_rate=44100)

class soundmentations.RandomTrim(duration: float | Tuple[float, float], p: float = 1.0)[source]

Bases: BaseTrim

Randomly trim audio by selecting a random segment of specified duration.

This transform randomly selects a continuous segment from the audio, useful for data augmentation where you want random crops of fixed or variable duration.

Parameters:
  • duration (float or tuple of float) – If float, exact duration to keep in seconds. If tuple (min_duration, max_duration), random duration in range.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Fixed duration random trimming:

>>> import numpy as np
>>> from soundmentations.transforms.time import RandomTrim
>>>
>>> # Always keep 2 seconds randomly
>>> trim_transform = RandomTrim(duration=2.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 2.0 seconds

Variable duration random trimming:

>>> # Keep 1-3 seconds randomly
>>> variable_trim = RandomTrim(duration=(1.0, 3.0))
>>> result = variable_trim(audio, sample_rate=44100)

Use for data augmentation:

>>> import soundmentations as S
>>>
>>> # Random crop and normalize for training
>>> augment = S.Compose([
...     S.RandomTrim(duration=(0.5, 2.5), p=0.8),
...     S.PadToLength(pad_length=88200, p=1.0),  # 2 seconds
...     S.RandomGain(min_gain=-6.0, max_gain=6.0, p=0.5)
... ])
>>>
>>> augmented = augment(training_audio, sample_rate=44100)

class soundmentations.StartTrim(start_time: float = 0.0, p: float = 1.0)[source]

Bases: BaseTrim

Trim audio to keep only the portion starting from start_time to the end.

This removes the beginning of the audio up to start_time, keeping everything after that point.

Parameters:
  • start_time (float, optional) – Start time in seconds to begin keeping audio, by default 0.0. Must be non-negative.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Remove silence from beginning:

>>> import numpy as np
>>> from soundmentations.transforms.time import StartTrim
>>>
>>> # Remove first 2 seconds
>>> trim_transform = StartTrim(start_time=2.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)

Use in preprocessing pipeline:

>>> import soundmentations as S
>>>
>>> # Remove intro and normalize
>>> preprocess = S.Compose([
...     S.StartTrim(start_time=1.5, p=1.0),
...     S.PadToLength(pad_length=132300, p=1.0)  # 3 seconds
... ])
>>>
>>> processed = preprocess(raw_audio, sample_rate=44100)

class soundmentations.EndTrim(end_time: float, p: float = 1.0)[source]

Bases: BaseTrim

Trim audio to keep only the portion from the start to end_time.

This removes the end of the audio after end_time, keeping everything before that point.

Parameters:
  • end_time (float) – End time in seconds to stop keeping audio. Must be positive.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Keep only first part of audio:

>>> import numpy as np
>>> from soundmentations.transforms.time import EndTrim
>>>
>>> # Keep first 5 seconds only
>>> trim_transform = EndTrim(end_time=5.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)

Use for consistent audio lengths:

>>> import soundmentations as S
>>>
>>> # Ensure maximum 10 seconds
>>> limit_length = S.Compose([
...     S.EndTrim(end_time=10.0, p=1.0),
...     S.Gain(gain=3.0, p=0.3)
... ])
>>>
>>> limited = limit_length(long_audio, sample_rate=44100)

class soundmentations.CenterTrim(duration: float, p: float = 1.0)[source]

Bases: BaseTrim

Trim audio to keep only the center portion of specified duration.

This extracts a segment from the middle of the audio, useful for focusing on the main content while removing silence at the beginning and end.

Parameters:
  • duration (float) – Duration of the center portion to keep in seconds. Must be positive.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Extract center content:

>>> import numpy as np
>>> from soundmentations.transforms.time import CenterTrim
>>>
>>> # Keep 3 seconds from center
>>> trim_transform = CenterTrim(duration=3.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 3.0 seconds

Use for focusing on main content:

>>> import soundmentations as S
>>>
>>> # Extract center and enhance
>>> focus_pipeline = S.Compose([
...     S.CenterTrim(duration=4.0, p=1.0),
...     S.Gain(gain=6.0, p=0.6),
...     S.PadToLength(pad_length=176400, p=1.0)  # 4 seconds
... ])
>>>
>>> focused = focus_pipeline(noisy_audio, sample_rate=44100)

class soundmentations.Pad(pad_length: int, p: float = 1.0)[source]

Bases: BasePad

Pad audio to minimum length by adding zeros at the end.

If the input audio is shorter than pad_length, zeros are appended to reach the minimum length. If already longer or equal, returns unchanged.

Parameters:
  • pad_length (int) – Minimum length for the audio in samples.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Apply end padding to ensure minimum length:

>>> import numpy as np
>>> from soundmentations.transforms.time import Pad
>>>
>>> # Create short audio sample
>>> audio = np.array([0.1, 0.2, 0.3])
>>>
>>> # Pad to minimum 1000 samples
>>> pad_transform = Pad(pad_length=1000)
>>> padded = pad_transform(audio)
>>> print(len(padded))  # 1000

Use in a pipeline:

>>> import soundmentations as S
>>>
>>> # Ensure all audio is at least 2 seconds (44.1kHz)
>>> augment = S.Compose([
...     S.Pad(pad_length=88200, p=1.0),
...     S.Gain(gain=3.0, p=0.5)
... ])
>>>
>>> result = augment(audio)

class soundmentations.CenterPad(pad_length: int, p: float = 1.0)[source]

Bases: BasePad

Pad audio to minimum length by adding zeros symmetrically on both sides.

If the input audio is shorter than pad_length, zeros are added equally to both sides. For odd padding amounts, the extra zero goes to the right.

Parameters:
  • pad_length (int) – Minimum length for the audio in samples.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Apply symmetric padding:

>>> import numpy as np
>>> from soundmentations.transforms.time import CenterPad
>>>
>>> audio = np.array([1, 2, 3])
>>> pad_transform = CenterPad(pad_length=7)
>>> result = pad_transform(audio)
>>> print(result)  # [0 0 1 2 3 0 0]

Use for centering audio in fixed-length windows:

>>> # Center audio in 5-second windows (44.1kHz)
>>> center_pad = CenterPad(pad_length=220500)
>>> centered_audio = center_pad(audio_sample)

class soundmentations.StartPad(pad_length: int, p: float = 1.0)[source]

Bases: BasePad

Pad audio to minimum length by adding zeros at the beginning.

If the input audio is shorter than pad_length, zeros are prepended to reach the minimum length. If already longer or equal, returns unchanged.

Parameters:
  • pad_length (int) – Minimum length for the audio in samples.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Apply start padding:

>>> import numpy as np
>>> from soundmentations.transforms.time import StartPad
>>>
>>> audio = np.array([1, 2, 3])
>>> pad_transform = StartPad(pad_length=6)
>>> result = pad_transform(audio)
>>> print(result)  # [0 0 0 1 2 3]

Use for aligning audio to end of fixed windows:

>>> # Align audio to end of 3-second windows
>>> start_pad = StartPad(pad_length=132300)  # 3 seconds at 44.1kHz
>>> aligned_audio = start_pad(audio_sample)

class soundmentations.PadToLength(pad_length: int, p: float = 1.0)[source]

Bases: BasePad

Pad or trim audio to exact target length using end operations.

  • If shorter: adds zeros at the end to reach exact length

  • If longer: trims from the end to reach exact length

  • If equal: returns unchanged

Parameters:
  • pad_length (int) – Exact target length for the audio in samples.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Normalize all audio to exact length:

>>> import numpy as np
>>> from soundmentations.transforms.time import PadToLength
>>>
>>> # Short audio
>>> short_audio = np.array([1, 2, 3])
>>> # Long audio
>>> long_audio = np.arange(10)
>>>
>>> pad_transform = PadToLength(pad_length=5)
>>>
>>> result1 = pad_transform(short_audio)
>>> print(result1)  # [1 2 3 0 0]
>>>
>>> result2 = pad_transform(long_audio)
>>> print(result2)  # [0 1 2 3 4]

Use for fixed-length model inputs:

>>> # Ensure all audio is exactly 2 seconds for ML model
>>> normalize_length = PadToLength(pad_length=88200)  # 2s at 44.1kHz
>>> model_input = normalize_length(variable_length_audio)

class soundmentations.CenterPadToLength(pad_length: int, p: float = 1.0)[source]

Bases: BasePad

Pad or trim audio to exact target length using center operations.

  • If shorter: adds zeros symmetrically on both sides

  • If longer: trims symmetrically from both sides (keeps center)

  • If equal: returns unchanged

Parameters:
  • pad_length (int) – Exact target length for the audio in samples.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Center-normalize audio to exact length:

>>> import numpy as np
>>> from soundmentations.transforms.time import CenterPadToLength
>>>
>>> # Short audio - will be center-padded
>>> short_audio = np.array([1, 2, 3])
>>> # Long audio - will be center-trimmed
>>> long_audio = np.arange(9)
>>>
>>> pad_transform = CenterPadToLength(pad_length=7)
>>>
>>> result1 = pad_transform(short_audio)
>>> print(result1)  # [0 0 1 2 3 0 0]
>>>
>>> result2 = pad_transform(long_audio)
>>> print(result2)  # [1 2 3 4 5 6 7]

Use for preserving important audio content in center:

>>> # Keep center 3 seconds for speech processing
>>> center_normalize = CenterPadToLength(pad_length=132300)
>>> processed_audio = center_normalize(speech_audio)

class soundmentations.PadToMultiple(pad_length: int, p: float = 1.0)[source]

Bases: BasePad

Pad audio to make its length a multiple of the specified value.

This is useful for STFT operations where frame sizes must be multiples of certain values. Only adds padding at the end, never trims.

Parameters:
  • pad_length (int) – The multiple value. Audio length will be padded to next multiple of this value. Common values: 1024, 512, 256 for STFT operations.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

Pad for STFT-friendly lengths:

>>> import numpy as np
>>> from soundmentations.transforms.time import PadToMultiple
>>>
>>> # Audio with length 2050 samples
>>> audio = np.random.randn(2050)
>>>
>>> # Pad to multiple of 1024 (STFT frame size)
>>> pad_transform = PadToMultiple(pad_length=1024)
>>> result = pad_transform(audio)
>>> print(len(result))  # 3072 (3 * 1024)

Use in spectral processing pipeline:

>>> import soundmentations as S
>>>
>>> # Prepare audio for spectral analysis
>>> spectral_prep = S.Compose([
...     S.PadToMultiple(pad_length=512, p=1.0),  # STFT-friendly
...     S.RandomGain(min_gain=-3.0, max_gain=3.0, p=0.5)
... ])
>>>
>>> stft_ready_audio = spectral_prep(raw_audio)

class soundmentations.Mask(mask_ratio: float = 0.2, p: float = 1.0)[source]

Bases: BaseMask

Mask a random contiguous segment of audio data with zeros.

This transform randomly selects a contiguous time segment of the audio and replaces it with silence (zeros), simulating audio dropouts, temporal masking effects, or packet loss in streaming audio.

Parameters:
  • mask_ratio (float, optional) – The ratio of audio length to mask (0.0 to 1.0), by default 0.2. For example, 0.2 means 20% of the audio duration will be masked.

  • p (float, optional) – Probability of applying the transform, by default 1.0. Must be between 0.0 and 1.0.

Examples

>>> import numpy as np
>>> from soundmentations import Mask
>>>
>>> # Create audio signal (1 second at 44.1kHz)
>>> sample_rate = 44100
>>> duration = 1.0
>>> t = np.linspace(0, duration, int(sample_rate * duration))
>>> audio = np.sin(2 * np.pi * 440 * t)  # 440Hz sine wave
>>>
>>> # Create a Mask that masks 10% of the audio
>>> time_mask = Mask(mask_ratio=0.1, p=1.0)
>>> masked_audio = time_mask(audio, sample_rate=44100)
>>>
>>> # Verify that some portion is masked
>>> assert len(masked_audio) == len(audio)
>>> assert np.sum(masked_audio == 0) > 0  # Some samples are zero
>>>
>>> # Example with probability
>>> probabilistic_mask = Mask(mask_ratio=0.2, p=0.5)
>>> maybe_masked = probabilistic_mask(audio, sample_rate=44100)

Notes

The masking process:

  1. Calculates the number of samples to mask based on mask_ratio

  2. Randomly selects a starting position for the mask

  3. Replaces the selected segment with zeros

  4. Concatenates the unmasked portions with the masked segment

This transform is useful for:

  • Simulating audio dropouts or glitches

  • Creating training data robust to missing temporal information

  • Augmenting datasets for speech recognition tasks

  • Testing model robustness to temporal discontinuities

The mask location is uniformly random across the audio sample, ensuring no bias toward beginning or end of the audio.
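The steps above can be sketched with plain NumPy. This is a hedged illustration of the described process, not the library's code:

```python
import numpy as np

def time_mask(samples, mask_ratio=0.2, rng=None):
    rng = rng or np.random.default_rng()
    out = samples.copy()
    # 1. Number of samples to mask, derived from the ratio.
    mask_len = int(len(samples) * mask_ratio)
    # 2. Uniformly random start position (no bias toward either end).
    start = rng.integers(0, len(samples) - mask_len + 1)
    # 3./4. Zero out the selected contiguous segment; length is preserved.
    out[start:start + mask_len] = 0.0
    return out

masked = time_mask(np.ones(100), mask_ratio=0.2, rng=np.random.default_rng(0))
```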

class soundmentations.Gain(gain: float = 1.0, clip: bool = True, p: float = 1.0)[source]

Bases: BaseGain

Apply a fixed gain (in dB) to audio samples.

This transform multiplies the audio samples by a gain factor derived from the specified gain in decibels. Optionally clips the output to prevent values from exceeding the [-1, 1] range.

Parameters:
  • gain (float, optional) – Gain in decibels, by default 1.0. Positive values increase volume, negative values decrease volume.

  • clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True. Prevents audio distortion from excessive gain.

  • p (float, optional) – Probability of applying the gain transform, by default 1.0.

Examples

Apply a fixed gain to audio samples:

>>> import numpy as np
>>> from soundmentations.transforms.amplitude import Gain
>>>
>>> # Create audio samples
>>> samples = np.array([0.1, 0.2, -0.1, 0.3])
>>>
>>> # Apply +6dB gain
>>> gain_transform = Gain(gain=6.0)
>>> amplified = gain_transform(samples)
>>>
>>> # Apply -12dB gain with 50% probability
>>> quiet_transform = Gain(gain=-12.0, p=0.5)
>>> result = quiet_transform(samples)

Use in a pipeline with other transforms:

>>> import soundmentations as S
>>>
>>> # Create augmentation pipeline
>>> augment = S.Compose([
...     S.RandomTrim(duration=(1.0, 3.0), p=0.8),
...     S.Gain(gain=6.0, clip=True, p=0.7),
...     S.PadToLength(pad_length=44100, p=0.5)
... ])
>>>
>>> # Apply pipeline to audio
>>> audio_samples = np.random.randn(22050)  # 0.5 seconds at 44.1kHz
>>> augmented = augment(samples=audio_samples, sample_rate=44100)

Different gain scenarios:

>>> # Boost quiet audio
>>> boost = Gain(gain=12.0, clip=True)
>>>
>>> # Attenuate loud audio
>>> attenuate = Gain(gain=-6.0, clip=False)
>>>
>>> # Random gain sampled once at construction
>>> # (use RandomGain for a fresh value on every call)
>>> random_volume = Gain(gain=np.random.uniform(-10, 10), p=0.6)
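The dB-to-linear conversion described above follows the standard amplitude formula; a short sketch (an illustration, not the library's internals):

```python
import numpy as np

def apply_gain_db(samples, gain_db, clip=True):
    # 20 dB corresponds to a factor of 10 in amplitude.
    factor = 10.0 ** (gain_db / 20.0)
    out = samples * factor
    # Optional clipping keeps the result in the valid [-1, 1] float range.
    return np.clip(out, -1.0, 1.0) if clip else out

# +6 dB is roughly a factor of 2; 0.9 would exceed 1.0 and gets clipped.
boosted = apply_gain_db(np.array([0.1, 0.9]), 6.0)
```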

class soundmentations.RandomGain(min_gain: float, max_gain: float, clip: bool = True, p: float = 1.0)[source]

Bases: Gain

Apply a random gain to audio samples within a specified range.

This transform randomly selects a gain value from a uniform distribution between min_gain and max_gain, applying it to the audio samples.

Parameters:
  • min_gain (float) – Minimum gain in decibels.

  • max_gain (float) – Maximum gain in decibels.

  • clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True.

  • p (float, optional) – Probability of applying the random gain transform, by default 1.0.

Examples

>>> import numpy as np
>>> from soundmentations.transforms.amplitude import RandomGain
>>>
>>> # Create audio samples
>>> samples = np.array([0.1, 0.2, -0.1, 0.3])
>>>
>>> # Apply random gain between -6dB and +6dB
>>> random_gain_transform = RandomGain(min_gain=-6.0, max_gain=6.0)
>>> result = random_gain_transform(samples)

class soundmentations.PerSampleRandomGain(min_gain: float, max_gain: float, clip: bool = True, p: float = 1.0)[source]

Bases: Gain

Apply a different random gain to each audio sample in a batch.

This transform applies a unique random gain value, drawn from a uniform distribution between min_gain and max_gain, to each sample in the input batch. This is useful for batch processing where you want different gain variations for each audio sample in the batch, creating diverse augmentations.

Parameters:
  • min_gain (float) – Minimum gain in decibels for the random range.

  • max_gain (float) – Maximum gain in decibels for the random range.

  • clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True.

  • p (float, optional) – Probability of applying the per-sample random gain transform, by default 1.0.

Examples

Basic batch processing:

>>> import numpy as np
>>> from soundmentations.transforms.amplitude import PerSampleRandomGain
>>>
>>> # Create batch of audio samples (2 samples, each 1000 samples long)
>>> batch_samples = np.random.randn(2, 1000) * 0.1
>>>
>>> # Apply different random gain to each sample in batch
>>> per_sample_transform = PerSampleRandomGain(min_gain=-6.0, max_gain=6.0)
>>> result = per_sample_transform(batch_samples)
>>>
>>> # Each row now has a different gain applied
>>> print(f"Sample 1 max: {np.max(np.abs(result[0])):.3f}")
>>> print(f"Sample 2 max: {np.max(np.abs(result[1])):.3f}")

Machine learning data augmentation:

>>> # Training data preparation with varied gains
>>> ml_augment = PerSampleRandomGain(
...     min_gain=-12.0,
...     max_gain=12.0,
...     clip=True,
...     p=0.8
... )
>>>
>>> # Process batch for training
>>> training_batch = np.random.randn(32, 16000)  # 32 samples, 16k each
>>> augmented_batch = ml_augment(training_batch)

Different use cases:

>>> # Subtle variations for speech data
>>> speech_augment = PerSampleRandomGain(min_gain=-3.0, max_gain=3.0)
>>>
>>> # Dramatic variations for sound effects
>>> sfx_augment = PerSampleRandomGain(min_gain=-20.0, max_gain=10.0)
>>>
>>> # Conservative augmentation with low probability
>>> conservative_augment = PerSampleRandomGain(
...     min_gain=-1.5, max_gain=1.5, p=0.3
... )

Notes

  • Requires 2D input arrays where first dimension is batch size

  • Each sample in the batch gets an independent random gain

  • Useful for creating diverse training data in machine learning

  • The transform maintains the batch structure and sample lengths

  • Random gains are independently sampled for each batch item

Raises:

ValueError – If input is not a 2D array or if min_gain > max_gain

See also

RandomGain

Apply a single random gain to entire audio

Gain

Apply a fixed gain to audio samples

RandomGainEnvelope

Apply a smoothly varying gain envelope

class soundmentations.RandomGainEnvelope(min_gain: float, max_gain: float, n_control_points: int = 10, clip: bool = True, p: float = 1.0)[source]

Bases: Gain

Apply a smoothly varying random gain envelope to audio samples.

This transform creates a smooth gain envelope by generating random gain values at control points and interpolating between them. This results in gradual gain changes over time, useful for creating natural volume variations.

Parameters:
  • min_gain (float) – Minimum gain in decibels for the envelope.

  • max_gain (float) – Maximum gain in decibels for the envelope.

  • n_control_points (int, optional) – Number of control points for the gain envelope, by default 10. More points create more detailed envelope variations.

  • clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True.

  • p (float, optional) – Probability of applying the random gain envelope transform, by default 1.0.

Examples

Apply a smooth random gain envelope:

>>> import numpy as np
>>> from soundmentations.transforms.amplitude import RandomGainEnvelope
>>>
>>> # Create audio samples (1 second at 8kHz)
>>> samples = np.random.randn(8000) * 0.1
>>>
>>> # Apply smooth gain envelope with 5 control points
>>> envelope_transform = RandomGainEnvelope(
...     min_gain=-12.0,
...     max_gain=6.0,
...     n_control_points=5
... )
>>> result = envelope_transform(samples)

Use in audio processing pipeline:

>>> import soundmentations as S
>>>
>>> # Create dynamic volume processing
>>> dynamic_pipeline = S.Compose([
...     S.RandomGainEnvelope(min_gain=-9.0, max_gain=3.0, n_control_points=8, p=0.7),
...     S.Gain(gain=6.0, p=0.5)  # Additional boost
... ])
>>>
>>> # Process audio with natural volume variations
>>> processed = dynamic_pipeline(samples, sample_rate=44100)

Different envelope scenarios:

>>> # Subtle volume variations for music
>>> subtle_envelope = RandomGainEnvelope(
...     min_gain=-3.0, max_gain=3.0, n_control_points=15
... )
>>>
>>> # Dramatic variations for sound effects
>>> dramatic_envelope = RandomGainEnvelope(
...     min_gain=-20.0, max_gain=10.0, n_control_points=5
... )
>>>
>>> # High-resolution envelope for detailed control
>>> detailed_envelope = RandomGainEnvelope(
...     min_gain=-6.0, max_gain=6.0, n_control_points=50
... )

Notes

  • The envelope is created by linearly interpolating between random gain values

  • More control points create more complex envelope shapes

  • The envelope affects the entire audio sample duration

  • Gain values are converted from dB to linear scale before application

  • The transform preserves audio sample length and format
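Under these notes, the envelope construction might look like the following sketch (evenly spaced control points are an assumption; the library's exact placement is not specified):

```python
import numpy as np

def random_gain_envelope(samples, min_gain, max_gain, n_control_points=10, rng=None):
    rng = rng or np.random.default_rng()
    # Random gains (in dB) at evenly spaced control points...
    control_db = rng.uniform(min_gain, max_gain, n_control_points)
    # ...linearly interpolated to one gain value per sample.
    positions = np.linspace(0, len(samples) - 1, n_control_points)
    envelope_db = np.interp(np.arange(len(samples)), positions, control_db)
    # Convert dB to linear scale before applying; length is preserved.
    return samples * 10.0 ** (envelope_db / 20.0)

shaped = random_gain_envelope(np.ones(1000), -6.0, 6.0,
                              n_control_points=5, rng=np.random.default_rng(1))
```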

See also

RandomGain

Apply a single random gain to entire audio

Gain

Apply a fixed gain to audio samples

class soundmentations.Limiter(threshold: float = 0.9, p: float = 1.0)[source]

Bases: BaseLimiter

Apply hard limiting to audio samples to prevent clipping.

This transform clips audio samples that exceed the specified threshold, preventing digital clipping and maintaining signal integrity within the specified dynamic range.

Parameters:
  • threshold (float, optional) – The threshold level for limiting, by default 0.9. Values above this threshold will be clipped. Must be between 0.0 and 1.0.

  • p (float, optional) – Probability of applying the transform, by default 1.0. Must be between 0.0 and 1.0.

Examples

Apply hard limiting to prevent clipping:

>>> import numpy as np
>>> from soundmentations.transforms.amplitude import Limiter
>>>
>>> # Create audio with some peaks above 0.9
>>> audio = np.array([0.5, 1.2, -1.5, 0.8, 0.95])
>>>
>>> # Apply limiting at 0.9 threshold
>>> limiter = Limiter(threshold=0.9)
>>> limited = limiter(audio, sample_rate=44100)
>>> print(limited)  # [0.5, 0.9, -0.9, 0.8, 0.9]

Use in audio processing pipeline:

>>> import soundmentations as S
>>>
>>> # Safe audio processing with limiting
>>> safe_pipeline = S.Compose([
...     S.Gain(gain=12.0, p=1.0),           # Boost signal
...     S.Limiter(threshold=0.95, p=1.0),   # Prevent clipping
...     S.FadeOut(duration=0.1, p=0.5)      # Smooth ending
... ])
>>>
>>> processed = safe_pipeline(audio, sample_rate=44100)

Protect against digital distortion:

>>> # Conservative limiting for pristine quality
>>> conservative_limiter = Limiter(threshold=0.8, p=1.0)
>>> clean_audio = conservative_limiter(loud_audio, sample_rate=44100)

class soundmentations.FadeIn(duration: float = 0.1, p: float = 1.0)[source]

Bases: BaseFade

Fade-in effect for audio samples.

This transform gradually increases the amplitude from silence (0) to full amplitude over the specified duration at the beginning of the audio, creating a smooth fade-in effect.

Parameters:
  • duration (float, optional) – Duration of the fade-in effect in seconds, by default 0.1. Must be positive and less than the audio duration.

  • p (float, optional) – Probability of applying the transform, by default 1.0.
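A standalone sketch of a fade-in (the linear ramp shape is an assumption; the class only specifies duration and p):

```python
import numpy as np

def fade_in(samples, sample_rate, duration=0.1):
    # Assumed linear ramp from 0 to full amplitude over `duration` seconds.
    n = min(int(duration * sample_rate), len(samples))
    ramp = np.linspace(0.0, 1.0, n, endpoint=False)
    out = samples.astype(float).copy()
    out[:n] *= ramp
    return out

faded = fade_in(np.ones(100), sample_rate=1000, duration=0.01)
```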

class soundmentations.FadeOut(duration: float = 0.1, p: float = 1.0)[source]

Bases: BaseFade

Apply a fade-out effect to the end of audio samples.

This transform gradually decreases the amplitude from full amplitude to silence (0) over the specified duration, creating a smooth fade-out effect.

Parameters:
  • duration (float, optional) – Duration of the fade-out effect in seconds, by default 0.1. Must be positive and less than the audio duration.

  • p (float, optional) – Probability of applying the transform, by default 1.0.
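A matching sketch for the fade-out (assuming a linear ramp, which "gradually decreases" leaves open):

```python
import numpy as np

def fade_out(samples, sample_rate, duration=0.1):
    # Assumed linear ramp from full amplitude down to silence over
    # the last `duration` seconds.
    n = min(int(duration * sample_rate), len(samples))
    ramp = np.linspace(1.0, 0.0, n)
    out = samples.astype(float).copy()
    out[-n:] *= ramp
    return out

faded = fade_out(np.ones(100), sample_rate=1000, duration=0.01)
```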

class soundmentations.Compressor(threshold: float, ratio: float, attack_time: float, release_time: float, p: float = 1.0)[source]

Bases: BaseCompressor

Apply dynamic range compression to the audio sample.

This compressor uses an envelope follower with configurable attack and release times to track the signal level, then applies gain reduction based on a threshold and compression ratio.

Parameters:
  • threshold (float) – The threshold above which compression is applied, in dB. Typical values range from -40 to -6 dB.

  • ratio (float) – The compression ratio to apply. Must be >= 1.0; 1.0 = no compression, 2.0 = 2:1 compression, 10.0 = 10:1 compression (heavy compression).

  • attack_time (float) – Attack time in milliseconds. How quickly the compressor responds to signals above the threshold. Typical values: 0.1 to 100 ms.

  • release_time (float) – Release time in milliseconds. How quickly the compressor stops compressing after the signal falls below threshold. Typical values: 10 to 1000 ms.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

>>> import numpy as np
>>> from soundmentations.transforms.amplitude.compressor import Compressor
>>>
>>> # Create a compressor with 4:1 ratio and -12dB threshold
>>> compressor = Compressor(threshold=-12.0, ratio=4.0,
...                        attack_time=5.0, release_time=50.0)
>>>
>>> # Apply to a sine wave
>>> sample_rate = 44100
>>> duration = 1.0
>>> t = np.linspace(0, duration, int(sample_rate * duration))
>>> audio = np.sin(2 * np.pi * 440 * t) * 0.8  # 440Hz sine wave
>>> compressed = compressor(audio, sample_rate)

Notes

The compressor implementation uses:

  • Linear threshold conversion from dB

  • An exponential envelope follower with separate attack/release coefficients

  • Logarithmic gain calculation for smooth compression curves

The envelope follower uses first-order low-pass filtering to smooth the absolute value of the input signal, with different time constants for attack (signal increasing) and release (signal decreasing).
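A rough sketch of the approach these notes describe. The coefficient formulas and gain curve below are common textbook choices, not necessarily the library's exact math:

```python
import numpy as np

def compress(samples, sample_rate, threshold=-12.0, ratio=4.0,
             attack_time=5.0, release_time=50.0):
    # Linear threshold from dB.
    thresh_lin = 10.0 ** (threshold / 20.0)
    # First-order smoothing coefficients from attack/release times (ms).
    attack_coef = np.exp(-1.0 / (sample_rate * attack_time / 1000.0))
    release_coef = np.exp(-1.0 / (sample_rate * release_time / 1000.0))

    out = np.empty_like(samples, dtype=float)
    env = 0.0
    for i, x in enumerate(samples):
        level = abs(x)
        # Envelope follower: attack coefficient when the level rises,
        # release coefficient when it falls.
        coef = attack_coef if level > env else release_coef
        env = coef * env + (1.0 - coef) * level
        # Gain reduction above threshold, softened by the ratio
        # (equivalent to out_db = thresh_db + (in_db - thresh_db) / ratio).
        if env > thresh_lin:
            gain = (thresh_lin / env) ** (1.0 - 1.0 / ratio)
        else:
            gain = 1.0
        out[i] = x * gain
    return out

loud = compress(np.full(5000, 0.8), 44100)
quiet = compress(np.full(100, 0.1), 44100)
```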

class soundmentations.PitchShift(semitones: float, p: float = 1.0)[source]

Bases: BasePitchShift

Shift the pitch of audio by a specified number of semitones.

Parameters:
  • semitones (float) – Number of semitones to shift (positive = pitch up, negative = pitch down); 12 semitones = 1 octave.

  • p (float, optional) – Probability of applying the transform, by default 1.0.
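The semitone-to-frequency-ratio relationship is fixed by the definition above (12 semitones per octave). The naive resampling below illustrates it, but unlike a real pitch shifter it also changes duration:

```python
import numpy as np

def semitones_to_ratio(semitones):
    # 12 semitones = 1 octave = a factor of 2 in frequency.
    return 2.0 ** (semitones / 12.0)

def naive_pitch_shift(samples, semitones):
    # Resampling by the ratio shifts pitch but shortens/lengthens the audio;
    # production pitch shifters keep duration constant (e.g. via phase vocoder).
    ratio = semitones_to_ratio(semitones)
    old_idx = np.arange(len(samples))
    new_idx = np.arange(0, len(samples), ratio)
    return np.interp(new_idx, old_idx, samples)

up_octave = naive_pitch_shift(np.zeros(100), 12)
```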

class soundmentations.RandomPitchShift(min_semitones: float = -2.0, max_semitones: float = 2.0, p: float = 1.0)[source]

Bases: BasePitchShift

Randomly shift the pitch within a specified semitone range.

This class wraps PitchShift to provide random pitch variations for data augmentation purposes.

Parameters:
  • min_semitones (float, optional) – Minimum semitone shift, by default -2.0.

  • max_semitones (float, optional) – Maximum semitone shift, by default 2.0.

  • p (float, optional) – Probability of applying the transform, by default 1.0.

Examples

>>> # Random pitch variation for training data
>>> random_pitch = RandomPitchShift(min_semitones=-1.0, max_semitones=1.0, p=0.8)
>>> augmented = random_pitch(audio, sample_rate=44100)

soundmentations.load_audio(file_path: str, sample_rate: int | None = None) → Tuple[ndarray, int][source]

Load an audio file and return the audio data as a mono numpy array.

Parameters:
  • file_path (str) – Path to the audio file.

  • sample_rate (int, optional) – Desired sample rate. If None, the file's original sample rate is used.

Returns:

Tuple[np.ndarray, int] – Mono audio data as a numpy array, and the sample rate.

Raises:
  • FileNotFoundError – If the audio file doesn't exist.

  • ValueError – If the audio file format is unsupported.

  • RuntimeError – If resampling fails.
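The mono conversion mentioned above is typically a channel average; a hedged sketch (`to_mono` is illustrative, not the package's internal helper):

```python
import numpy as np

def to_mono(audio):
    # (n_samples, n_channels) -> (n_samples,) by averaging channels;
    # already-mono input passes through unchanged.
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=1)

mono = to_mono(np.array([[0.0, 1.0], [1.0, 1.0]]))
```

In practice you would call `samples, sr = load_audio("speech.wav", sample_rate=16000)` and pass the returned mono array straight into a `Compose` pipeline.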