soundmentations package¶
Subpackages¶
Module contents¶
- class soundmentations.Compose(transforms)[source]¶
Bases: BaseCompose
Compose multiple audio transforms into a sequential pipeline.
This class allows you to chain multiple transforms together into a single callable object. Transforms are applied in the order they appear in the list, with each transform receiving the output of the previous one.
- Parameters:
transforms (list) – List of transform objects to apply sequentially. Each transform must implement __call__(samples, sample_rate).
Examples
Create a basic augmentation pipeline:
>>> import soundmentations as S
>>>
>>> # Define individual transforms
>>> pipeline = S.Compose([
...     S.RandomTrim(duration=(1.0, 3.0), p=0.8),
...     S.Pad(pad_length=44100, p=0.6),
...     S.Gain(gain=6.0, p=0.5)
... ])
>>>
>>> # Apply to audio
>>> augmented = pipeline(audio_samples, sample_rate=44100)
Complex preprocessing pipeline:
>>> # ML training data preparation
>>> ml_pipeline = S.Compose([
...     S.CenterTrim(duration=2.0),       # Extract 2s from center
...     S.PadToLength(pad_length=32000),  # Normalize to exactly 2s at 16kHz
...     S.Gain(gain=3.0, p=0.7),          # Boost volume 70% of the time
...     S.FadeIn(duration=0.1, p=0.5),    # Smooth start 50% of the time
...     S.FadeOut(duration=0.1, p=0.5)    # Smooth end 50% of the time
... ])
>>>
>>> # Process a batch of audio files
>>> for audio in audio_batch:
...     processed = ml_pipeline(audio, sample_rate=16000)
Audio enhancement pipeline:
>>> # Clean up audio recordings
>>> enhance_pipeline = S.Compose([
...     S.StartTrim(start_time=0.5),  # Remove first 0.5s
...     S.EndTrim(end_time=10.0),     # Keep at most 10s
...     S.Gain(gain=6.0),             # Boost volume
...     S.FadeIn(duration=0.2),       # Smooth fade-in
...     S.FadeOut(duration=0.2)       # Smooth fade-out
... ])
>>>
>>> enhanced = enhance_pipeline(noisy_audio, sample_rate=44100)
Notes
Transforms are applied in order: first transform in list is applied first
Each transform receives the output of the previous transform
Probability parameters (p) in individual transforms are respected
The pipeline preserves mono audio format throughout
All transforms must accept (samples, sample_rate) parameters
See also
Individual transforms: Trim, Pad, RandomTrim, FadeIn, FadeOut
- class soundmentations.OneOf(transforms, p=1.0)[source]¶
Bases: BaseCompose
Apply one transform from a list of transforms, chosen at random.
This class allows you to randomly select and apply one transform from a provided list. Each transform has an equal chance of being selected unless specified otherwise.
- Parameters:
Examples
Create a random augmentation pipeline:
>>> import soundmentations as S
>>>
>>> # Define candidate transforms
>>> pipeline = S.OneOf([
...     S.Gain(gain=6.0),
...     S.FadeIn(duration=0.5),
...     S.FadeOut(duration=0.5)
... ], p=0.9)
>>>
>>> # Apply to audio
>>> augmented = pipeline(audio_samples, sample_rate=44100)
Randomized preprocessing pipeline:
>>> # ML training data preparation with randomness
>>> ml_pipeline = S.OneOf([
...     S.RandomTrim(duration=(1.0, 3.0)),
...     S.Pad(pad_length=44100),
...     S.Gain(gain=3.0)
... ], p=0.8)
>>>
>>> # Process a batch of audio files
>>> for audio in audio_batch:
...     processed = ml_pipeline(audio, sample_rate=16000)
Audio enhancement with random effects:
>>> # Clean up audio recordings with variability
>>> enhance_pipeline = S.OneOf([
...     S.StartTrim(start_time=0.5),
...     S.EndTrim(end_time=10.0),
...     S.FadeIn(duration=0.2),
...     S.FadeOut(duration=0.2)
... ], p=0.7)
>>>
>>> enhanced = enhance_pipeline(noisy_audio, sample_rate=44100)
Notes
Only one transform is applied per call, chosen at random
Probability parameter (p) controls the likelihood of applying any transform
If no transform is applied, the input audio is returned unchanged
- class soundmentations.Trim(start_time: float = 0.0, end_time: float | None = None, p: float = 1.0)[source]¶
Bases: BaseTrim
Trim audio to keep only the portion between start_time and end_time.
This is the most basic trimming operation that allows specifying exact start and end times for the audio segment to keep.
- Parameters:
start_time (float, optional) – Start time in seconds to begin keeping audio, by default 0.0. Must be non-negative.
end_time (float, optional) – End time in seconds to stop keeping audio, by default None. If None, keeps audio until the end. Must be greater than start_time.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Trim audio to specific time range:
>>> import numpy as np
>>> from soundmentations.transforms.time import Trim
>>>
>>> # Create 5 seconds of audio at 44.1kHz
>>> audio = np.random.randn(220500)
>>>
>>> # Keep audio from 1.5 to 3.0 seconds
>>> trim_transform = Trim(start_time=1.5, end_time=3.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 1.5 seconds
Use in a pipeline:
>>> import soundmentations as S
>>>
>>> # Extract the middle portion and apply gain
>>> pipeline = S.Compose([
...     S.Trim(start_time=1.0, end_time=4.0, p=1.0),
...     S.Gain(gain=6.0, p=0.5)
... ])
>>>
>>> result = pipeline(audio, sample_rate=44100)
- class soundmentations.RandomTrim(duration: float | Tuple[float, float], p: float = 1.0)[source]¶
Bases: BaseTrim
Randomly trim audio by selecting a random segment of specified duration.
This transform randomly selects a continuous segment from the audio, useful for data augmentation where you want random crops of fixed or variable duration.
- Parameters:
Examples
Fixed duration random trimming:
>>> import numpy as np
>>> from soundmentations.transforms.time import RandomTrim
>>>
>>> # Always keep 2 seconds, chosen at a random position
>>> trim_transform = RandomTrim(duration=2.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 2.0 seconds
Variable duration random trimming:
>>> # Keep 1-3 seconds randomly
>>> variable_trim = RandomTrim(duration=(1.0, 3.0))
>>> result = variable_trim(audio, sample_rate=44100)
Use for data augmentation:
>>> import soundmentations as S
>>>
>>> # Random crop and normalize for training
>>> augment = S.Compose([
...     S.RandomTrim(duration=(0.5, 2.5), p=0.8),
...     S.PadToLength(pad_length=88200, p=1.0),  # 2 seconds
...     S.Gain(gain=(-6, 6), p=0.5)
... ])
>>>
>>> augmented = augment(training_audio, sample_rate=44100)
- class soundmentations.StartTrim(start_time: float = 0.0, p: float = 1.0)[source]¶
Bases: BaseTrim
Trim audio to keep only the portion starting from start_time to the end.
This removes the beginning of the audio up to start_time, keeping everything after that point.
- Parameters:
Examples
Remove silence from beginning:
>>> import numpy as np
>>> from soundmentations.transforms.time import StartTrim
>>>
>>> # Remove the first 2 seconds
>>> trim_transform = StartTrim(start_time=2.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
Use in preprocessing pipeline:
>>> import soundmentations as S
>>>
>>> # Remove the intro and normalize
>>> preprocess = S.Compose([
...     S.StartTrim(start_time=1.5, p=1.0),
...     S.PadToLength(pad_length=132300, p=1.0)  # 3 seconds
... ])
>>>
>>> processed = preprocess(raw_audio, sample_rate=44100)
- class soundmentations.EndTrim(end_time: float, p: float = 1.0)[source]¶
Bases: BaseTrim
Trim audio to keep only the portion from the start to end_time.
This removes the end of the audio after end_time, keeping everything before that point.
- Parameters:
Examples
Keep only first part of audio:
>>> import numpy as np
>>> from soundmentations.transforms.time import EndTrim
>>>
>>> # Keep the first 5 seconds only
>>> trim_transform = EndTrim(end_time=5.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
Use for consistent audio lengths:
>>> import soundmentations as S
>>>
>>> # Ensure a maximum of 10 seconds
>>> limit_length = S.Compose([
...     S.EndTrim(end_time=10.0, p=1.0),
...     S.Gain(gain=3.0, p=0.3)
... ])
>>>
>>> limited = limit_length(long_audio, sample_rate=44100)
- class soundmentations.CenterTrim(duration: float, p: float = 1.0)[source]¶
Bases: BaseTrim
Trim audio to keep only the center portion of specified duration.
This extracts a segment from the middle of the audio, useful for focusing on the main content while removing silence at the beginning and end.
- Parameters:
Examples
Extract center content:
>>> import numpy as np
>>> from soundmentations.transforms.time import CenterTrim
>>>
>>> # Keep 3 seconds from the center
>>> trim_transform = CenterTrim(duration=3.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 3.0 seconds
Use for focusing on main content:
>>> import soundmentations as S
>>>
>>> # Extract the center and enhance
>>> focus_pipeline = S.Compose([
...     S.CenterTrim(duration=4.0, p=1.0),
...     S.Gain(gain=6.0, p=0.6),
...     S.PadToLength(pad_length=176400, p=1.0)  # 4 seconds
... ])
>>>
>>> focused = focus_pipeline(noisy_audio, sample_rate=44100)
- class soundmentations.Pad(pad_length: int, p: float = 1.0)[source]¶
Bases: BasePad
Pad audio to minimum length by adding zeros at the end.
If the input audio is shorter than pad_length, zeros are appended to reach the minimum length. If already longer or equal, returns unchanged.
- Parameters:
Examples
Apply end padding to ensure minimum length:
>>> import numpy as np
>>> from soundmentations.transforms.time import Pad
>>>
>>> # Create a short audio sample
>>> audio = np.array([0.1, 0.2, 0.3])
>>>
>>> # Pad to a minimum of 1000 samples
>>> pad_transform = Pad(pad_length=1000)
>>> padded = pad_transform(audio)
>>> print(len(padded))  # 1000
Use in a pipeline:
>>> import soundmentations as S
>>>
>>> # Ensure all audio is at least 2 seconds (44.1kHz)
>>> augment = S.Compose([
...     S.Pad(pad_length=88200, p=1.0),
...     S.Gain(gain=3.0, p=0.5)
... ])
>>>
>>> result = augment(audio)
- class soundmentations.CenterPad(pad_length: int, p: float = 1.0)[source]¶
Bases: BasePad
Pad audio to minimum length by adding zeros symmetrically on both sides.
If the input audio is shorter than pad_length, zeros are added equally to both sides. For odd padding amounts, the extra zero goes to the right.
- Parameters:
Examples
Apply symmetric padding:
>>> import numpy as np
>>> from soundmentations.transforms.time import CenterPad
>>>
>>> audio = np.array([1, 2, 3])
>>> pad_transform = CenterPad(pad_length=7)
>>> result = pad_transform(audio)
>>> print(result)  # [0 0 1 2 3 0 0]
Use for centering audio in fixed-length windows:
>>> # Center audio in 5-second windows (44.1kHz)
>>> center_pad = CenterPad(pad_length=220500)
>>> centered_audio = center_pad(audio_sample)
- class soundmentations.StartPad(pad_length: int, p: float = 1.0)[source]¶
Bases: BasePad
Pad audio to minimum length by adding zeros at the beginning.
If the input audio is shorter than pad_length, zeros are prepended to reach the minimum length. If already longer or equal, returns unchanged.
- Parameters:
Examples
Apply start padding:
>>> import numpy as np
>>> from soundmentations.transforms.time import StartPad
>>>
>>> audio = np.array([1, 2, 3])
>>> pad_transform = StartPad(pad_length=6)
>>> result = pad_transform(audio)
>>> print(result)  # [0 0 0 1 2 3]
Use for aligning audio to end of fixed windows:
>>> # Align audio to the end of 3-second windows
>>> start_pad = StartPad(pad_length=132300)  # 3 seconds at 44.1kHz
>>> aligned_audio = start_pad(audio_sample)
- class soundmentations.PadToLength(pad_length: int, p: float = 1.0)[source]¶
Bases: BasePad
Pad or trim audio to exact target length using end operations.
If shorter: adds zeros at the end to reach exact length
If longer: trims from the end to reach exact length
If equal: returns unchanged
- Parameters:
Examples
Normalize all audio to exact length:
>>> import numpy as np
>>> from soundmentations.transforms.time import PadToLength
>>>
>>> # Short audio
>>> short_audio = np.array([1, 2, 3])
>>> # Long audio
>>> long_audio = np.arange(10)
>>>
>>> pad_transform = PadToLength(pad_length=5)
>>>
>>> result1 = pad_transform(short_audio)
>>> print(result1)  # [1 2 3 0 0]
>>>
>>> result2 = pad_transform(long_audio)
>>> print(result2)  # [0 1 2 3 4]
Use for fixed-length model inputs:
>>> # Ensure all audio is exactly 2 seconds for an ML model
>>> normalize_length = PadToLength(pad_length=88200)  # 2s at 44.1kHz
>>> model_input = normalize_length(variable_length_audio)
- class soundmentations.CenterPadToLength(pad_length: int, p: float = 1.0)[source]¶
Bases: BasePad
Pad or trim audio to exact target length using center operations.
If shorter: adds zeros symmetrically on both sides
If longer: trims symmetrically from both sides (keeps center)
If equal: returns unchanged
- Parameters:
Examples
Center-normalize audio to exact length:
>>> import numpy as np
>>> from soundmentations.transforms.time import CenterPadToLength
>>>
>>> # Short audio - will be center-padded
>>> short_audio = np.array([1, 2, 3])
>>> # Long audio - will be center-trimmed
>>> long_audio = np.arange(9)
>>>
>>> pad_transform = CenterPadToLength(pad_length=7)
>>>
>>> result1 = pad_transform(short_audio)
>>> print(result1)  # [0 0 1 2 3 0 0]
>>>
>>> result2 = pad_transform(long_audio)
>>> print(result2)  # [1 2 3 4 5 6 7]
Use for preserving important audio content in center:
>>> # Keep the center 3 seconds for speech processing
>>> center_normalize = CenterPadToLength(pad_length=132300)
>>> processed_audio = center_normalize(speech_audio)
- class soundmentations.PadToMultiple(pad_length: int, p: float = 1.0)[source]¶
Bases: BasePad
Pad audio to make its length a multiple of the specified value.
This is useful for STFT operations where frame sizes must be multiples of certain values. Only adds padding at the end, never trims.
- Parameters:
Examples
Pad for STFT-friendly lengths:
>>> import numpy as np
>>> from soundmentations.transforms.time import PadToMultiple
>>>
>>> # Audio with a length of 2050 samples
>>> audio = np.random.randn(2050)
>>>
>>> # Pad to a multiple of 1024 (STFT frame size)
>>> pad_transform = PadToMultiple(pad_length=1024)
>>> result = pad_transform(audio)
>>> print(len(result))  # 3072 (3 * 1024)
Use in spectral processing pipeline:
>>> import soundmentations as S
>>>
>>> # Prepare audio for spectral analysis
>>> spectral_prep = S.Compose([
...     S.PadToMultiple(pad_length=512, p=1.0),  # STFT-friendly
...     S.Gain(gain=(-3, 3), p=0.5)
... ])
>>>
>>> stft_ready_audio = spectral_prep(raw_audio)
- class soundmentations.Mask(mask_ratio: float = 0.2, p: float = 1.0)[source]¶
Bases: BaseMask
Mask a random contiguous segment of audio data with zeros.
This transform randomly selects a contiguous time segment of the audio and replaces it with silence (zeros), simulating audio dropouts, temporal masking effects, or packet loss in streaming audio.
- Parameters:
Examples
>>> import numpy as np
>>> from soundmentations import Mask
>>>
>>> # Create an audio signal (1 second at 44.1kHz)
>>> sample_rate = 44100
>>> duration = 1.0
>>> t = np.linspace(0, duration, int(sample_rate * duration))
>>> audio = np.sin(2 * np.pi * 440 * t)  # 440Hz sine wave
>>>
>>> # Create a Mask that masks 10% of the audio
>>> time_mask = Mask(mask_ratio=0.1, p=1.0)
>>> masked_audio = time_mask(audio, sample_rate=44100)
>>>
>>> # Verify that some portion is masked
>>> assert len(masked_audio) == len(audio)
>>> assert np.sum(masked_audio == 0) > 0  # Some samples are zero
>>>
>>> # Example with probability
>>> probabilistic_mask = Mask(mask_ratio=0.2, p=0.5)
>>> maybe_masked = probabilistic_mask(audio, sample_rate=44100)
Notes
The masking process:
1. Calculates the number of samples to mask based on mask_ratio
2. Randomly selects a starting position for the mask
3. Replaces the selected segment with zeros
4. Concatenates the unmasked portions with the masked segment
This transform is useful for:
- Simulating audio dropouts or glitches
- Creating training data robust to missing temporal information
- Augmenting datasets for speech recognition tasks
- Testing model robustness to temporal discontinuities
The mask location is uniformly random across the audio sample, ensuring no bias toward beginning or end of the audio.
- class soundmentations.Gain(gain: float = 1.0, clip: bool = True, p: float = 1.0)[source]¶
Bases: BaseGain
Apply a fixed gain (in dB) to audio samples.
This transform multiplies the audio samples by a gain factor derived from the specified gain in decibels. Optionally clips the output to prevent values from exceeding the [-1, 1] range.
- Parameters:
gain (float, optional) – Gain in decibels, by default 1.0. Positive values increase volume, negative values decrease volume.
clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True. Prevents audio distortion from excessive gain.
p (float, optional) – Probability of applying the gain transform, by default 1.0.
Examples
Apply a fixed gain to audio samples:
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import Gain
>>>
>>> # Create audio samples
>>> samples = np.array([0.1, 0.2, -0.1, 0.3])
>>>
>>> # Apply +6dB gain
>>> gain_transform = Gain(gain=6.0)
>>> amplified = gain_transform(samples)
>>>
>>> # Apply -12dB gain with 50% probability
>>> quiet_transform = Gain(gain=-12.0, p=0.5)
>>> result = quiet_transform(samples)
Use in a pipeline with other transforms:
>>> import soundmentations as S
>>>
>>> # Create an augmentation pipeline
>>> augment = S.Compose([
...     S.RandomTrim(duration=(1.0, 3.0), p=0.8),
...     S.Gain(gain=6.0, clip=True, p=0.7),
...     S.PadToLength(pad_length=44100, p=0.5)
... ])
>>>
>>> # Apply the pipeline to audio
>>> audio_samples = np.random.randn(22050)  # 0.5 seconds at 44.1kHz
>>> augmented = augment(samples=audio_samples, sample_rate=44100)
Different gain scenarios:
>>> # Boost quiet audio
>>> boost = Gain(gain=12.0, clip=True)
>>>
>>> # Attenuate loud audio
>>> attenuate = Gain(gain=-6.0, clip=False)
>>>
>>> # Random volume variation
>>> random_volume = Gain(gain=np.random.uniform(-10, 10), p=0.6)
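The dB-to-linear conversion behind this transform can be sketched in plain numpy, assuming the standard 20·log10 amplitude convention (the library's internals may differ):

```python
import numpy as np

def db_to_amplitude(db: float) -> float:
    # Amplitude (not power) convention: +20 dB multiplies the signal by 10,
    # and +6 dB is close to a doubling.
    return 10.0 ** (db / 20.0)

samples = np.array([0.1, 0.2, -0.1, 0.3])
# Apply +6 dB, then clip to [-1, 1] as clip=True would
boosted = np.clip(samples * db_to_amplitude(6.0), -1.0, 1.0)
print(db_to_amplitude(6.0))  # ~1.995
```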
- class soundmentations.RandomGain(min_gain: float, max_gain: float, clip: bool = True, p: float = 1.0)[source]¶
Bases: Gain
Apply a random gain to audio samples within a specified range.
This transform randomly selects a gain value from a uniform distribution between min_gain and max_gain, applying it to the audio samples.
- Parameters:
Examples
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import RandomGain
>>>
>>> # Create audio samples
>>> samples = np.array([0.1, 0.2, -0.1, 0.3])
>>>
>>> # Apply a random gain between -6dB and +6dB
>>> random_gain_transform = RandomGain(min_gain=-6.0, max_gain=6.0)
>>> result = random_gain_transform(samples)
- class soundmentations.PerSampleRandomGain(min_gain: float, max_gain: float, clip: bool = True, p: float = 1.0)[source]¶
Bases: Gain
Apply a different random gain to each audio sample in a batch.
This transform applies a unique random gain value, drawn from a uniform distribution between min_gain and max_gain, to each sample in the input batch. This is useful for batch processing where you want different gain variations for each audio sample in the batch, creating diverse augmentations.
- Parameters:
min_gain (float) – Minimum gain in decibels for the random range.
max_gain (float) – Maximum gain in decibels for the random range.
clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True.
p (float, optional) – Probability of applying the per-sample random gain transform, by default 1.0.
Examples
Basic batch processing:
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import PerSampleRandomGain
>>>
>>> # Create a batch of audio samples (2 samples, each 1000 samples long)
>>> batch_samples = np.random.randn(2, 1000) * 0.1
>>>
>>> # Apply a different random gain to each sample in the batch
>>> per_sample_transform = PerSampleRandomGain(min_gain=-6.0, max_gain=6.0)
>>> result = per_sample_transform(batch_samples)
>>>
>>> # Each row now has a different gain applied
>>> print(f"Sample 1 max: {np.max(np.abs(result[0])):.3f}")
>>> print(f"Sample 2 max: {np.max(np.abs(result[1])):.3f}")
Machine learning data augmentation:
>>> # Training data preparation with varied gains
>>> ml_augment = PerSampleRandomGain(
...     min_gain=-12.0,
...     max_gain=12.0,
...     clip=True,
...     p=0.8
... )
>>>
>>> # Process a batch for training
>>> training_batch = np.random.randn(32, 16000)  # 32 samples, 16k each
>>> augmented_batch = ml_augment(training_batch)
Different use cases:
>>> # Subtle variations for speech data
>>> speech_augment = PerSampleRandomGain(min_gain=-3.0, max_gain=3.0)
>>>
>>> # Dramatic variations for sound effects
>>> sfx_augment = PerSampleRandomGain(min_gain=-20.0, max_gain=10.0)
>>>
>>> # Conservative augmentation with low probability
>>> conservative_augment = PerSampleRandomGain(
...     min_gain=-1.5, max_gain=1.5, p=0.3
... )
Notes
Requires 2D input arrays where first dimension is batch size
Each sample in the batch gets an independent random gain
Useful for creating diverse training data in machine learning
The transform maintains the batch structure and sample lengths
Random gains are independently sampled for each batch item
- Raises:
ValueError – If input is not a 2D array or if min_gain > max_gain
See also
RandomGain – Apply a single random gain to the entire audio
Gain – Apply a fixed gain to audio samples
RandomGainEnvelope – Apply a smoothly varying gain envelope
- class soundmentations.RandomGainEnvelope(min_gain: float, max_gain: float, n_control_points: int = 10, clip: bool = True, p: float = 1.0)[source]¶
Bases: Gain
Apply a smoothly varying random gain envelope to audio samples.
This transform creates a smooth gain envelope by generating random gain values at control points and interpolating between them. This results in gradual gain changes over time, useful for creating natural volume variations.
- Parameters:
min_gain (float) – Minimum gain in decibels for the envelope.
max_gain (float) – Maximum gain in decibels for the envelope.
n_control_points (int, optional) – Number of control points for the gain envelope, by default 10. More points create more detailed envelope variations.
clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True.
p (float, optional) – Probability of applying the random gain envelope transform, by default 1.0.
Examples
Apply a smooth random gain envelope:
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import RandomGainEnvelope
>>>
>>> # Create audio samples (1 second at 8kHz)
>>> samples = np.random.randn(8000) * 0.1
>>>
>>> # Apply a smooth gain envelope with 5 control points
>>> envelope_transform = RandomGainEnvelope(
...     min_gain=-12.0,
...     max_gain=6.0,
...     n_control_points=5
... )
>>> result = envelope_transform(samples)
Use in audio processing pipeline:
>>> import soundmentations as S
>>>
>>> # Create dynamic volume processing
>>> dynamic_pipeline = S.Compose([
...     S.RandomGainEnvelope(min_gain=-9.0, max_gain=3.0, n_control_points=8, p=0.7),
...     S.Gain(gain=6.0, p=0.5)  # Additional boost
... ])
>>>
>>> # Process audio with natural volume variations
>>> processed = dynamic_pipeline(samples, sample_rate=44100)
Different envelope scenarios:
>>> # Subtle volume variations for music
>>> subtle_envelope = RandomGainEnvelope(
...     min_gain=-3.0, max_gain=3.0, n_control_points=15
... )
>>>
>>> # Dramatic variations for sound effects
>>> dramatic_envelope = RandomGainEnvelope(
...     min_gain=-20.0, max_gain=10.0, n_control_points=5
... )
>>>
>>> # High-resolution envelope for detailed control
>>> detailed_envelope = RandomGainEnvelope(
...     min_gain=-6.0, max_gain=6.0, n_control_points=50
... )
Notes
The envelope is created by linearly interpolating between random gain values
More control points create more complex envelope shapes
The envelope affects the entire audio sample duration
Gain values are converted from dB to linear scale before application
The transform preserves audio sample length and format
See also
RandomGain – Apply a single random gain to the entire audio
Gain – Apply a fixed gain to audio samples
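The control-point interpolation described in the notes can be sketched with np.interp. This is only an illustration of the idea: the even spacing of control points and the dB-to-linear conversion step are assumptions, not the library's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_control_points = 8000, 5

# Random dB gains at evenly spaced control points...
control_db = rng.uniform(-12.0, 6.0, size=n_control_points)
positions = np.linspace(0, n_samples - 1, n_control_points)

# ...linearly interpolated to one gain value per sample,
# then converted from dB to linear scale before applying.
envelope_db = np.interp(np.arange(n_samples), positions, control_db)
envelope = 10.0 ** (envelope_db / 20.0)

audio = rng.standard_normal(n_samples) * 0.1
shaped = audio * envelope  # smoothly varying volume
```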
- class soundmentations.Limiter(threshold: float = 0.9, p: float = 1.0)[source]¶
Bases: BaseLimiter
Apply hard limiting to audio samples to prevent clipping.
This transform clips audio samples that exceed the specified threshold, preventing digital clipping and maintaining signal integrity within the specified dynamic range.
- Parameters:
Examples
Apply hard limiting to prevent clipping:
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import Limiter
>>>
>>> # Create audio with some peaks above 0.9
>>> audio = np.array([0.5, 1.2, -1.5, 0.8, 0.95])
>>>
>>> # Apply limiting at a 0.9 threshold
>>> limiter = Limiter(threshold=0.9)
>>> limited = limiter(audio, sample_rate=44100)
>>> print(limited)  # [0.5, 0.9, -0.9, 0.8, 0.9]
Use in audio processing pipeline:
>>> import soundmentations as S
>>>
>>> # Safe audio processing with limiting
>>> safe_pipeline = S.Compose([
...     S.Gain(gain=12.0, p=1.0),          # Boost signal
...     S.Limiter(threshold=0.95, p=1.0),  # Prevent clipping
...     S.FadeOut(duration=0.1, p=0.5)     # Smooth ending
... ])
>>>
>>> processed = safe_pipeline(audio, sample_rate=44100)
Protect against digital distortion:
>>> # Conservative limiting for pristine quality
>>> conservative_limiter = Limiter(threshold=0.8, p=1.0)
>>> clean_audio = conservative_limiter(loud_audio, sample_rate=44100)
- class soundmentations.FadeIn(duration: float = 0.1, p: float = 1.0)[source]¶
Bases: BaseFade
Apply a fade-in effect to audio samples.
This transform applies a fade-in effect to the beginning of the audio samples.
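The effect is roughly equivalent to this numpy sketch. A linear ramp is assumed here; the library's actual fade curve may differ.

```python
import numpy as np

def fade_in(samples: np.ndarray, sample_rate: int, duration: float = 0.1) -> np.ndarray:
    # Ramp the first `duration` seconds from silence up to full amplitude.
    n = min(int(duration * sample_rate), len(samples))
    out = samples.astype(float).copy()
    out[:n] *= np.linspace(0.0, 1.0, n)
    return out

audio = np.ones(44100)                      # 1 s of full-scale audio
faded = fade_in(audio, 44100, duration=0.1)  # first 0.1 s ramps up from 0
```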
- class soundmentations.FadeOut(duration: float = 0.1, p: float = 1.0)[source]¶
Bases: BaseFade
Apply a fade-out effect to the end of audio samples.
This transform gradually decreases the amplitude from full amplitude to silence (0) over the specified duration, creating a smooth fade-out effect.
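As with the fade-in, the effect can be sketched with a linear ramp in numpy (the library's actual fade curve is an assumption):

```python
import numpy as np

def fade_out(samples: np.ndarray, sample_rate: int, duration: float = 0.1) -> np.ndarray:
    # Ramp the last `duration` seconds from full amplitude down to silence.
    n = min(int(duration * sample_rate), len(samples))
    out = samples.astype(float).copy()
    out[-n:] *= np.linspace(1.0, 0.0, n)
    return out

audio = np.ones(44100)                       # 1 s of full-scale audio
faded = fade_out(audio, 44100, duration=0.1)  # last 0.1 s ramps down to 0
```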
- class soundmentations.Compressor(threshold: float, ratio: float, attack_time: float, release_time: float, p: float = 1.0)[source]¶
Bases: BaseCompressor
Apply dynamic range compression to the audio sample.
This compressor uses an envelope follower with configurable attack and release times to track the signal level, then applies gain reduction based on a threshold and compression ratio.
- Parameters:
threshold (float) – The threshold above which compression is applied, in dB. Typical values range from -40 to -6 dB.
ratio (float) – The compression ratio to apply. Must be >= 1.0.
- 1.0 = no compression
- 2.0 = 2:1 compression
- 10.0 = 10:1 compression (heavy compression)
attack_time (float) – Attack time in milliseconds. How quickly the compressor responds to signals above the threshold. Typical values: 0.1 to 100 ms.
release_time (float) – Release time in milliseconds. How quickly the compressor stops compressing after the signal falls below threshold. Typical values: 10 to 1000 ms.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
>>> import numpy as np
>>> from soundmentations.transforms.amplitude.compressor import Compressor
>>>
>>> # Create a compressor with 4:1 ratio and -12dB threshold
>>> compressor = Compressor(threshold=-12.0, ratio=4.0,
...                         attack_time=5.0, release_time=50.0)
>>>
>>> # Apply to a sine wave
>>> sample_rate = 44100
>>> duration = 1.0
>>> t = np.linspace(0, duration, int(sample_rate * duration))
>>> audio = np.sin(2 * np.pi * 440 * t) * 0.8  # 440Hz sine wave
>>> compressed = compressor(audio, sample_rate)
Notes
The compressor implementation uses:
- Linear threshold conversion from dB
- Exponential envelope follower with separate attack/release coefficients
- Logarithmic gain calculation for smooth compression curves
The envelope follower uses first-order low-pass filtering to smooth the absolute value of the input signal, with different time constants for attack (signal increasing) and release (signal decreasing).
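The envelope follower described above can be sketched as a first-order low-pass of the rectified signal with two time constants. This is an illustrative sketch of the technique, not the library's exact code; the coefficient formula is a common convention and an assumption here.

```python
import numpy as np

def envelope_follower(x: np.ndarray, sample_rate: int,
                      attack_ms: float, release_ms: float) -> np.ndarray:
    # One-pole smoothing coefficients derived from the time constants
    a_attack = np.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    a_release = np.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = np.empty(len(x))
    level = 0.0
    for i, v in enumerate(np.abs(x)):
        # Use the attack coefficient while the signal rises, release while it falls
        coeff = a_attack if v > level else a_release
        level = coeff * level + (1.0 - coeff) * v
        env[i] = level
    return env

# A step input: the envelope rises smoothly with the attack time constant
step = np.ones(200)
env = envelope_follower(step, sample_rate=1000, attack_ms=5.0, release_ms=50.0)
```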
- class soundmentations.PitchShift(semitones: float, p: float = 1.0)[source]¶
Bases: BasePitchShift
Shift the pitch of audio by a specified number of semitones.
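The semitone-to-frequency relationship underlying a pitch shift follows equal temperament (assumed here; the resampling or phase-vocoder machinery the library uses is not shown):

```python
def semitone_ratio(semitones: float) -> float:
    # Equal temperament: each semitone scales frequency by 2**(1/12)
    return 2.0 ** (semitones / 12.0)

# A +12 semitone shift moves 440 Hz up one octave to 880 Hz
print(440.0 * semitone_ratio(12.0))  # 880.0
```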
- class soundmentations.RandomPitchShift(min_semitones: float = -2.0, max_semitones: float = 2.0, p: float = 1.0)[source]¶
Bases: BasePitchShift
Randomly shift the pitch within a specified semitone range.
This class wraps PitchShift to provide random pitch variations for data augmentation purposes.
- Parameters:
Examples
>>> # Random pitch variation for training data
>>> random_pitch = RandomPitchShift(min_semitones=-1.0, max_semitones=1.0, p=0.8)
>>> augmented = random_pitch(audio, sample_rate=44100)
- soundmentations.load_audio(file_path: str, sample_rate: int | None = None) Tuple[ndarray, int][source]¶
Load an audio file and return the audio data as a mono numpy array.
- Parameters:
file_path (str) – Path to the audio file.
sample_rate (int, optional) – Desired sample rate. If None, uses the original sample rate.
- Returns:
Tuple[np.ndarray, int] – Mono audio data as a numpy array, and the sample rate.
- Raises:
FileNotFoundError – If the audio file doesn’t exist.
ValueError – If the audio file format is unsupported.
RuntimeError – If resampling fails.
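The mono-conversion step that load_audio performs can be sketched with the standard library alone. The actual decoding backend and resampling behaviour are assumptions not shown here; this only illustrates reading a stereo WAV and averaging it down to a mono float array.

```python
import io
import struct
import wave

import numpy as np

# Write a tiny stereo 16-bit WAV into memory, then read it back as mono float
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"".join(struct.pack("<hh", i * 100, -i * 100) for i in range(100)))

buf.seek(0)
with wave.open(buf, "rb") as w:
    sr = w.getframerate()
    raw = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    stereo = raw.reshape(-1, 2).astype(np.float32) / 32768.0
    mono = stereo.mean(axis=1)  # average the channels down to mono

print(sr, mono.shape)  # 8000 (100,)
```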