soundmentations package¶
Subpackages¶
Module contents¶
- class soundmentations.BaseCompose(transforms)[source]¶
Bases:
object
Base class for composing multiple transforms into a sequential pipeline.
This class provides the fundamental functionality for chaining transforms together, where each transform is applied sequentially to the audio data.
- Parameters:
transforms (list) – List of transform objects to apply sequentially. Each transform must have a __call__ method that accepts (samples, sample_rate) parameters.
Notes
This is an internal base class. Use the Compose class instead.
- class soundmentations.Compose(transforms)[source]¶
Bases:
BaseCompose
Compose multiple audio transforms into a sequential pipeline.
This class allows you to chain multiple transforms together into a single callable object. Transforms are applied in the order they appear in the list, with each transform receiving the output of the previous one.
- Parameters:
transforms (list) – List of transform objects to apply sequentially. Each transform must implement __call__(samples, sample_rate).
Examples
Create a basic augmentation pipeline:
>>> import soundmentations as S
>>>
>>> # Define individual transforms
>>> pipeline = S.Compose([
...     S.RandomTrim(duration=(1.0, 3.0), p=0.8),
...     S.Pad(pad_length=44100, p=0.6),
...     S.Gain(gain=6.0, p=0.5)
... ])
>>>
>>> # Apply to audio
>>> augmented = pipeline(audio_samples, sample_rate=44100)
Complex preprocessing pipeline:
>>> # ML training data preparation
>>> ml_pipeline = S.Compose([
...     S.CenterTrim(duration=2.0),        # Extract 2s from center
...     S.PadToLength(pad_length=88200),   # Normalize length to 88200 samples
...     S.Gain(gain=3.0, p=0.7),           # Boost volume 70% of the time
...     S.FadeIn(duration=0.1, p=0.5),     # Smooth start 50% of the time
...     S.FadeOut(duration=0.1, p=0.5)     # Smooth end 50% of the time
... ])
>>>
>>> # Process a batch of audio files
>>> for audio in audio_batch:
...     processed = ml_pipeline(audio, sample_rate=16000)
Audio enhancement pipeline:
>>> # Clean up audio recordings
>>> enhance_pipeline = S.Compose([
...     S.StartTrim(start_time=0.5),  # Remove first 0.5s
...     S.EndTrim(end_time=10.0),     # Keep at most 10s
...     S.Gain(gain=6.0),             # Boost volume
...     S.FadeIn(duration=0.2),       # Smooth fade-in
...     S.FadeOut(duration=0.2)       # Smooth fade-out
... ])
>>>
>>> enhanced = enhance_pipeline(noisy_audio, sample_rate=44100)
Notes
- Transforms are applied in order: the first transform in the list is applied first
- Each transform receives the output of the previous transform
- Probability parameters (p) in individual transforms are respected
- The pipeline preserves mono audio format throughout
- All transforms must accept (samples, sample_rate) parameters
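The chaining behavior described in these notes can be sketched as a plain loop. This is an illustration of the contract (each callable takes samples and a sample rate, and returns new samples), not the library's actual implementation; the toy transforms below are hypothetical stand-ins:

```python
import numpy as np

def apply_pipeline(transforms, samples, sample_rate):
    """Sketch of sequential composition: each transform receives
    the output of the previous one."""
    for transform in transforms:
        samples = transform(samples, sample_rate)
    return samples

# Toy callables following the (samples, sample_rate) convention
halve = lambda s, sr: s * 0.5
pad_two = lambda s, sr: np.concatenate([s, np.zeros(2)])

audio = np.array([1.0, -1.0])
out = apply_pipeline([halve, pad_two], audio, sample_rate=44100)
print(out)  # [ 0.5 -0.5  0.   0. ]
```

Because the output of one stage becomes the input of the next, reordering the list changes the result: padding before halving would also halve the padding (harmless for zeros, but not for non-zero fills).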
See also
Trim, Pad, RandomTrim, FadeIn, FadeOut
Individual transform classes that can be composed.
- class soundmentations.Trim(start_time: float = 0.0, end_time: float | None = None, p: float = 1.0)[source]¶
Bases:
BaseTrim
Trim audio to keep only the portion between start_time and end_time.
This is the most basic trimming operation that allows specifying exact start and end times for the audio segment to keep.
- Parameters:
start_time (float, optional) – Start time in seconds to begin keeping audio, by default 0.0. Must be non-negative.
end_time (float, optional) – End time in seconds to stop keeping audio, by default None. If None, keeps audio until the end. Must be greater than start_time.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Trim audio to specific time range:
>>> import numpy as np
>>> from soundmentations.transforms.time import Trim
>>>
>>> # Create 5 seconds of audio at 44.1kHz
>>> audio = np.random.randn(220500)
>>>
>>> # Keep audio from 1.5 to 3.0 seconds
>>> trim_transform = Trim(start_time=1.5, end_time=3.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 1.5 seconds
Use in a pipeline:
>>> import soundmentations as S
>>>
>>> # Extract middle portion and apply gain
>>> pipeline = S.Compose([
...     S.Trim(start_time=1.0, end_time=4.0, p=1.0),
...     S.Gain(gain=6.0, p=0.5)
... ])
>>>
>>> result = pipeline(audio, sample_rate=44100)
- class soundmentations.RandomTrim(duration: float | Tuple[float, float], p: float = 1.0)[source]¶
Bases:
BaseTrim
Randomly trim audio by selecting a random segment of specified duration.
This transform randomly selects a continuous segment from the audio, useful for data augmentation where you want random crops of fixed or variable duration.
- Parameters:
duration (float or tuple of float) – Duration in seconds of the segment to keep. If a tuple (min, max), the duration is sampled uniformly from that range.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Fixed duration random trimming:
>>> import numpy as np
>>> from soundmentations.transforms.time import RandomTrim
>>>
>>> # Always keep 2 seconds, randomly positioned
>>> trim_transform = RandomTrim(duration=2.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 2.0 seconds
Variable duration random trimming:
>>> # Keep 1-3 seconds randomly
>>> variable_trim = RandomTrim(duration=(1.0, 3.0))
>>> result = variable_trim(audio, sample_rate=44100)
Use for data augmentation:
>>> import soundmentations as S
>>>
>>> # Random crop and normalize for training
>>> augment = S.Compose([
...     S.RandomTrim(duration=(0.5, 2.5), p=0.8),
...     S.PadToLength(pad_length=88200, p=1.0),  # 2 seconds at 44.1kHz
...     S.Gain(gain=(-6, 6), p=0.5)
... ])
>>>
>>> augmented = augment(training_audio, sample_rate=44100)
- class soundmentations.StartTrim(start_time: float = 0.0, p: float = 1.0)[source]¶
Bases:
BaseTrim
Trim audio to keep only the portion starting from start_time to the end.
This removes the beginning of the audio up to start_time, keeping everything after that point.
- Parameters:
start_time (float, optional) – Start time in seconds; audio before this point is removed, by default 0.0. Must be non-negative.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Remove silence from beginning:
>>> import numpy as np
>>> from soundmentations.transforms.time import StartTrim
>>>
>>> # Remove first 2 seconds
>>> trim_transform = StartTrim(start_time=2.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
Use in preprocessing pipeline:
>>> import soundmentations as S
>>>
>>> # Remove intro and normalize
>>> preprocess = S.Compose([
...     S.StartTrim(start_time=1.5, p=1.0),
...     S.PadToLength(pad_length=132300, p=1.0)  # 3 seconds at 44.1kHz
... ])
>>>
>>> processed = preprocess(raw_audio, sample_rate=44100)
- class soundmentations.EndTrim(end_time: float, p: float = 1.0)[source]¶
Bases:
BaseTrim
Trim audio to keep only the portion from the start to end_time.
This removes the end of the audio after end_time, keeping everything before that point.
- Parameters:
end_time (float) – End time in seconds; audio after this point is removed. Must be positive.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Keep only first part of audio:
>>> import numpy as np
>>> from soundmentations.transforms.time import EndTrim
>>>
>>> # Keep first 5 seconds only
>>> trim_transform = EndTrim(end_time=5.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
Use for consistent audio lengths:
>>> import soundmentations as S
>>>
>>> # Ensure a maximum of 10 seconds
>>> limit_length = S.Compose([
...     S.EndTrim(end_time=10.0, p=1.0),
...     S.Gain(gain=3.0, p=0.3)
... ])
>>>
>>> limited = limit_length(long_audio, sample_rate=44100)
- class soundmentations.CenterTrim(duration: float, p: float = 1.0)[source]¶
Bases:
BaseTrim
Trim audio to keep only the center portion of specified duration.
This extracts a segment from the middle of the audio, useful for focusing on the main content while removing silence at the beginning and end.
- Parameters:
duration (float) – Duration in seconds of the center segment to keep.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Extract center content:
>>> import numpy as np
>>> from soundmentations.transforms.time import CenterTrim
>>>
>>> # Keep 3 seconds from center
>>> trim_transform = CenterTrim(duration=3.0)
>>> trimmed = trim_transform(audio, sample_rate=44100)
>>> print(len(trimmed) / 44100)  # 3.0 seconds
Use for focusing on main content:
>>> import soundmentations as S
>>>
>>> # Extract center and enhance
>>> focus_pipeline = S.Compose([
...     S.CenterTrim(duration=4.0, p=1.0),
...     S.Gain(gain=6.0, p=0.6),
...     S.PadToLength(pad_length=176400, p=1.0)  # 4 seconds at 44.1kHz
... ])
>>>
>>> focused = focus_pipeline(noisy_audio, sample_rate=44100)
- class soundmentations.Pad(pad_length: int, p: float = 1.0)[source]¶
Bases:
BasePad
Pad audio to minimum length by adding zeros at the end.
If the input audio is shorter than pad_length, zeros are appended to reach the minimum length. If already longer or equal, returns unchanged.
- Parameters:
pad_length (int) – Minimum length in samples. Zeros are appended if the audio is shorter.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Apply end padding to ensure minimum length:
>>> import numpy as np
>>> from soundmentations.transforms.time import Pad
>>>
>>> # Create short audio sample
>>> audio = np.array([0.1, 0.2, 0.3])
>>>
>>> # Pad to minimum 1000 samples
>>> pad_transform = Pad(pad_length=1000)
>>> padded = pad_transform(audio)
>>> print(len(padded))  # 1000
Use in a pipeline:
>>> import soundmentations as S
>>>
>>> # Ensure all audio is at least 2 seconds (44.1kHz)
>>> augment = S.Compose([
...     S.Pad(pad_length=88200, p=1.0),
...     S.Gain(gain=3.0, p=0.5)
... ])
>>>
>>> result = augment(audio)
- class soundmentations.CenterPad(pad_length: int, p: float = 1.0)[source]¶
Bases:
BasePad
Pad audio to minimum length by adding zeros symmetrically on both sides.
If the input audio is shorter than pad_length, zeros are added equally to both sides. For odd padding amounts, the extra zero goes to the right.
- Parameters:
pad_length (int) – Minimum length in samples. Zeros are split evenly between both sides.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Apply symmetric padding:
>>> import numpy as np
>>> from soundmentations.transforms.time import CenterPad
>>>
>>> audio = np.array([1, 2, 3])
>>> pad_transform = CenterPad(pad_length=7)
>>> result = pad_transform(audio)
>>> print(result)  # [0 0 1 2 3 0 0]
Use for centering audio in fixed-length windows:
>>> # Center audio in 5-second windows (44.1kHz)
>>> center_pad = CenterPad(pad_length=220500)
>>> centered_audio = center_pad(audio_sample)
- class soundmentations.StartPad(pad_length: int, p: float = 1.0)[source]¶
Bases:
BasePad
Pad audio to minimum length by adding zeros at the beginning.
If the input audio is shorter than pad_length, zeros are prepended to reach the minimum length. If already longer or equal, returns unchanged.
- Parameters:
pad_length (int) – Minimum length in samples. Zeros are prepended if the audio is shorter.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Apply start padding:
>>> import numpy as np
>>> from soundmentations.transforms.time import StartPad
>>>
>>> audio = np.array([1, 2, 3])
>>> pad_transform = StartPad(pad_length=6)
>>> result = pad_transform(audio)
>>> print(result)  # [0 0 0 1 2 3]
Use for aligning audio to end of fixed windows:
>>> # Align audio to the end of 3-second windows
>>> start_pad = StartPad(pad_length=132300)  # 3 seconds at 44.1kHz
>>> aligned_audio = start_pad(audio_sample)
- class soundmentations.PadToLength(pad_length: int, p: float = 1.0)[source]¶
Bases:
BasePad
Pad or trim audio to exact target length using end operations.
- If shorter: adds zeros at the end to reach the exact length
- If longer: trims from the end to reach the exact length
- If equal: returns unchanged
- Parameters:
pad_length (int) – Exact target length in samples.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Normalize all audio to exact length:
>>> import numpy as np
>>> from soundmentations.transforms.time import PadToLength
>>>
>>> # Short audio
>>> short_audio = np.array([1, 2, 3])
>>> # Long audio
>>> long_audio = np.arange(10)
>>>
>>> pad_transform = PadToLength(pad_length=5)
>>>
>>> result1 = pad_transform(short_audio)
>>> print(result1)  # [1 2 3 0 0]
>>>
>>> result2 = pad_transform(long_audio)
>>> print(result2)  # [0 1 2 3 4]
Use for fixed-length model inputs:
>>> # Ensure all audio is exactly 2 seconds for an ML model
>>> normalize_length = PadToLength(pad_length=88200)  # 2s at 44.1kHz
>>> model_input = normalize_length(variable_length_audio)
- class soundmentations.CenterPadToLength(pad_length: int, p: float = 1.0)[source]¶
Bases:
BasePad
Pad or trim audio to exact target length using center operations.
- If shorter: adds zeros symmetrically on both sides
- If longer: trims symmetrically from both sides (keeps the center)
- If equal: returns unchanged
- Parameters:
pad_length (int) – Exact target length in samples.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Center-normalize audio to exact length:
>>> import numpy as np
>>> from soundmentations.transforms.time import CenterPadToLength
>>>
>>> # Short audio - will be center-padded
>>> short_audio = np.array([1, 2, 3])
>>> # Long audio - will be center-trimmed
>>> long_audio = np.arange(9)
>>>
>>> pad_transform = CenterPadToLength(pad_length=7)
>>>
>>> result1 = pad_transform(short_audio)
>>> print(result1)  # [0 0 1 2 3 0 0]
>>>
>>> result2 = pad_transform(long_audio)
>>> print(result2)  # [1 2 3 4 5 6 7]
Use for preserving important audio content in center:
>>> # Keep the center 3 seconds for speech processing
>>> center_normalize = CenterPadToLength(pad_length=132300)  # 3s at 44.1kHz
>>> processed_audio = center_normalize(speech_audio)
- class soundmentations.PadToMultiple(pad_length: int, p: float = 1.0)[source]¶
Bases:
BasePad
Pad audio to make its length a multiple of the specified value.
This is useful for STFT operations where frame sizes must be multiples of certain values. Only adds padding at the end, never trims.
- Parameters:
pad_length (int) – The output length is padded up to the nearest multiple of this value.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Pad for STFT-friendly lengths:
>>> import numpy as np
>>> from soundmentations.transforms.time import PadToMultiple
>>>
>>> # Audio with length 2050 samples
>>> audio = np.random.randn(2050)
>>>
>>> # Pad to a multiple of 1024 (STFT frame size)
>>> pad_transform = PadToMultiple(pad_length=1024)
>>> result = pad_transform(audio)
>>> print(len(result))  # 3072 (3 * 1024)
Use in spectral processing pipeline:
>>> import soundmentations as S
>>>
>>> # Prepare audio for spectral analysis
>>> spectral_prep = S.Compose([
...     S.PadToMultiple(pad_length=512, p=1.0),  # STFT-friendly
...     S.Gain(gain=(-3, 3), p=0.5)
... ])
>>>
>>> stft_ready_audio = spectral_prep(raw_audio)
- class soundmentations.Gain(gain: float = 1.0, clip: bool = True, p: float = 1.0)[source]¶
Bases:
BaseGain
Apply a fixed gain (in dB) to audio samples.
This transform multiplies the audio samples by a gain factor derived from the specified gain in decibels. Optionally clips the output to prevent values from exceeding the [-1, 1] range.
- Parameters:
gain (float, optional) – Gain in decibels, by default 1.0. Positive values increase volume, negative values decrease volume.
clip (bool, optional) – Whether to clip the output to [-1, 1] range, by default True. Prevents audio distortion from excessive gain.
p (float, optional) – Probability of applying the gain transform, by default 1.0.
Examples
Apply a fixed gain to audio samples:
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import Gain
>>>
>>> # Create audio samples
>>> samples = np.array([0.1, 0.2, -0.1, 0.3])
>>>
>>> # Apply +6dB gain
>>> gain_transform = Gain(gain=6.0)
>>> amplified = gain_transform(samples)
>>>
>>> # Apply -12dB gain with 50% probability
>>> quiet_transform = Gain(gain=-12.0, p=0.5)
>>> result = quiet_transform(samples)
Use in a pipeline with other transforms:
>>> import soundmentations as S
>>>
>>> # Create augmentation pipeline
>>> augment = S.Compose([
...     S.RandomTrim(duration=(1.0, 3.0), p=0.8),
...     S.Gain(gain=6.0, clip=True, p=0.7),
...     S.PadToLength(pad_length=44100, p=0.5)
... ])
>>>
>>> # Apply pipeline to audio
>>> audio_samples = np.random.randn(22050)  # 0.5 seconds at 44.1kHz
>>> augmented = augment(samples=audio_samples, sample_rate=44100)
Different gain scenarios:
>>> # Boost quiet audio
>>> boost = Gain(gain=12.0, clip=True)
>>>
>>> # Attenuate loud audio
>>> attenuate = Gain(gain=-6.0, clip=False)
>>>
>>> # Random volume variation
>>> random_volume = Gain(gain=np.random.uniform(-10, 10), p=0.6)
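The decibel-to-linear conversion behind this transform follows the standard audio convention (amplitude factor = 10^(dB/20), so +6 dB roughly doubles amplitude). A self-contained sketch, not the library's internal code; the function name here is illustrative:

```python
import numpy as np

def apply_gain_db(samples, gain_db, clip=True):
    # Standard amplitude conversion: +6.02 dB is approximately a 2x factor
    factor = 10.0 ** (gain_db / 20.0)
    out = samples * factor
    if clip:
        # Keep values inside the valid [-1, 1] audio range
        out = np.clip(out, -1.0, 1.0)
    return out

samples = np.array([0.1, 0.2, -0.1, 0.3])
louder = apply_gain_db(samples, 6.0)   # roughly doubles each sample
quieter = apply_gain_db(samples, -12.0)  # roughly quarters each sample
```

With clip=True, a sample of 0.9 boosted by +6 dB would land at about 1.8 and be clipped to 1.0, which is why the docs describe clipping as distortion protection.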
- class soundmentations.Limiter(threshold: float = 0.9, p: float = 1.0)[source]¶
Bases:
BaseLimiter
Apply hard limiting to audio samples to prevent clipping.
This transform clips audio samples that exceed the specified threshold, preventing digital clipping and maintaining signal integrity within the specified dynamic range.
- Parameters:
threshold (float, optional) – Maximum absolute amplitude; samples exceeding ±threshold are clipped to it, by default 0.9.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
Apply hard limiting to prevent clipping:
>>> import numpy as np
>>> from soundmentations.transforms.amplitude import Limiter
>>>
>>> # Create audio with some peaks above 0.9
>>> audio = np.array([0.5, 1.2, -1.5, 0.8, 0.95])
>>>
>>> # Apply limiting at 0.9 threshold
>>> limiter = Limiter(threshold=0.9)
>>> limited = limiter(audio, sample_rate=44100)
>>> print(limited)  # [0.5, 0.9, -0.9, 0.8, 0.9]
Use in audio processing pipeline:
>>> import soundmentations as S
>>>
>>> # Safe audio processing with limiting
>>> safe_pipeline = S.Compose([
...     S.Gain(gain=12.0, p=1.0),          # Boost signal
...     S.Limiter(threshold=0.95, p=1.0),  # Prevent clipping
...     S.FadeOut(duration=0.1, p=0.5)     # Smooth ending
... ])
>>>
>>> processed = safe_pipeline(audio, sample_rate=44100)
Protect against digital distortion:
>>> # Conservative limiting for pristine quality
>>> conservative_limiter = Limiter(threshold=0.8, p=1.0)
>>> clean_audio = conservative_limiter(loud_audio, sample_rate=44100)
- class soundmentations.FadeIn(duration: float = 0.1, p: float = 1.0)[source]¶
Bases:
BaseFade
Apply a fade-in effect to the beginning of audio samples.
This transform gradually increases the amplitude from silence (0) to full amplitude over the specified duration, creating a smooth fade-in effect.
- Parameters:
duration (float, optional) – Fade duration in seconds, by default 0.1.
p (float, optional) – Probability of applying the transform, by default 1.0.
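The FadeIn entry carries no example above. A minimal self-contained sketch of what a fade-in does to the signal, assuming a linear ramp (the class docs do not specify the ramp shape, and the function name here is illustrative):

```python
import numpy as np

def linear_fade_in(samples, duration, sample_rate):
    # Ramp the first `duration` seconds from 0 up to full amplitude
    n = min(int(duration * sample_rate), len(samples))
    ramp = np.linspace(0.0, 1.0, n)
    out = samples.astype(float).copy()
    out[:n] *= ramp
    return out

audio = np.ones(8)
faded = linear_fade_in(audio, duration=1.0, sample_rate=4)  # fade over 4 samples
print(faded)  # first sample 0.0, ramping to 1.0 by sample index 3
```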
- class soundmentations.FadeOut(duration: float = 0.1, p: float = 1.0)[source]¶
Bases:
BaseFade
Apply a fade-out effect to the end of audio samples.
This transform gradually decreases the amplitude from full amplitude to silence (0) over the specified duration, creating a smooth fade-out effect.
- Parameters:
duration (float, optional) – Fade duration in seconds, by default 0.1.
p (float, optional) – Probability of applying the transform, by default 1.0.
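As with the fade-in, a minimal sketch of the fade-out behavior, assuming a linear ramp (an illustration only, not the library's implementation):

```python
import numpy as np

def linear_fade_out(samples, duration, sample_rate):
    # Ramp the last `duration` seconds from full amplitude down to 0
    n = min(int(duration * sample_rate), len(samples))
    ramp = np.linspace(1.0, 0.0, n)
    out = samples.astype(float).copy()
    out[-n:] *= ramp
    return out

audio = np.ones(8)
faded = linear_fade_out(audio, duration=1.0, sample_rate=4)  # fade over 4 samples
print(faded)  # unchanged until the last 4 samples, ending at 0.0
```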
- class soundmentations.PitchShift(semitones: float, p: float = 1.0)[source]¶
Bases:
BasePitchShift
Shift the pitch of audio by a specified number of semitones.
- Parameters:
semitones (float) – Number of semitones to shift. Positive values raise the pitch, negative values lower it.
p (float, optional) – Probability of applying the transform, by default 1.0.
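The semitone parameter maps to a frequency ratio via the equal-temperament relation, which any pitch shifter relies on. A sketch of just this math; a real pitch shift additionally requires time-stretching and resampling, which is omitted here:

```python
def semitone_ratio(semitones):
    # Equal temperament: each semitone multiplies frequency by 2**(1/12)
    return 2.0 ** (semitones / 12.0)

print(semitone_ratio(12))             # 2.0  (one octave up)
print(semitone_ratio(-12))            # 0.5  (one octave down)
print(round(semitone_ratio(1), 4))    # 1.0595 (one semitone up)
```

So PitchShift(semitones=12) corresponds to doubling every frequency component, and negative values divide them.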
- class soundmentations.RandomPitchShift(min_semitones: float = -2.0, max_semitones: float = 2.0, p: float = 1.0)[source]¶
Bases:
BasePitchShift
Randomly shift the pitch within a specified semitone range.
This class wraps PitchShift to provide random pitch variations for data augmentation purposes.
- Parameters:
min_semitones (float, optional) – Lower bound of the random shift in semitones, by default -2.0.
max_semitones (float, optional) – Upper bound of the random shift in semitones, by default 2.0.
p (float, optional) – Probability of applying the transform, by default 1.0.
Examples
>>> # Random pitch variation for training data
>>> random_pitch = RandomPitchShift(min_semitones=-1.0, max_semitones=1.0, p=0.8)
>>> augmented = random_pitch(audio, sample_rate=44100)
- soundmentations.load_audio(file_path: str, sample_rate: int | None = None) Tuple[ndarray, int] [source]¶
Load an audio file and return the audio data as a mono numpy array.
- Parameters:
file_path (str) – Path to the audio file.
sample_rate (int, optional) – Desired sample rate. If None, uses the original sample rate.
- Returns:
Tuple[np.ndarray, int] – Mono audio data as a numpy array and the sample rate.
- Raises:
FileNotFoundError – If the audio file doesn't exist.
ValueError – If the audio file format is unsupported.
RuntimeError – If resampling fails.