Task 2: In the Car#
This tutorial walks through the process of running the CAD1 Task 2 baseline using the shell interface.
1. Cloning the Clarity Repository#
We first need to install the Clarity package.
# Clean directory
from pathlib import Path
import shutil
dirpath = Path('clarity')
if dirpath.exists() and dirpath.is_dir():
    shutil.rmtree(dirpath)
%%capture
!git clone --quiet https://github.com/claritychallenge/clarity.git
%cd clarity
!git checkout v0.3.3
%pip install -e .
%pip install seedir
%cd recipes/cad1/task2/baseline
2. Dataset#
We will be using music audio and listener metadata.
%%capture
import gdown
!gdown 1OWn0w39t_aeOOh514wRUgTw5uSwsoB17
!mv cadenza_task2_data_demo.tar.xz recipes/cad1/task2/baseline
!tar -xvf cadenza_task2_data_demo.tar.xz
!rm cadenza_task2_data_demo.tar.xz
import seedir as sd
sd.seedir('cad1/task2/', style='lines', depthlimit=4, exclude_folders='ipy*', regex=True)
task2/
├─metadata/
│ ├─listeners.valid.json
│ ├─eBrird_BRIR.json
│ ├─scenes_listeners.json
│ ├─music.valid.json
│ └─scenes.json
└─audio/
├─eBrird/
│ ├─Anechoic/
│ │ ├─Anechoic.txt
│ │ └─audio/
│ └─Car/
│ └─audio/
└─music/
└─validation/
├─Instrumental/
├─Pop/
├─Rock/
├─Orchestral/
├─Classical/
└─International/
3. Baseline#
Note
The baseline recipe is divided into two stages: enhancement and evaluation.
3.1 Enhancement#
The enhancement script, enhance.py, contains several key functions.
3.1.1 Enhance#
The main function that orchestrates the enhancement process.
It takes the config.yaml file as its input parameter.
@hydra.main(config_path="", config_name="config")
def enhance(config: DictConfig) -> None:
    """
    Run the music enhancement.
    The baseline system is a dummy processor that returns the input signal.

    Args:
        config (dict): Dictionary of configuration options for enhancing music.
    """
3.1.2 Enhance Song#
A function that processes a single song-listener pair.
It adjusts the level of the song according to the listener's average hearing loss; this prevents clipping.
def enhance_song(
    waveform: np.ndarray,
    listener_audiograms: dict,
    config: DictConfig,
) -> tuple[np.ndarray, np.ndarray]:
    """
    Enhance a single song for a listener.

    Baseline enhancement returns the signal with a loudness
    of -14 LUFS if the average hearing loss is below 50 dB HL,
    and -11 LUFS otherwise.

    Args:
        waveform (np.ndarray): The waveform of the song.
        listener_audiograms (dict): The audiograms of the listener.
        config (dict): Dictionary of configuration options for enhancing music.

    Returns:
        out_left (np.ndarray): The enhanced left channel.
        out_right (np.ndarray): The enhanced right channel.
    """
3.2 Evaluation#
The evaluation script assumes that the exp output directory exists. This is the output directory created by the enhance.py script.
Caution
The evaluation script must not be modified or altered in any way.
3.2.1 Run Calculate Audio Quality#
The main function that orchestrates the evaluation.
@hydra.main(config_path="", config_name="config")
def run_calculate_audio_quality(config: DictConfig) -> None:
    """Evaluate the enhanced signals using the HAAQI metric."""
3.2.2 Evaluate Scene#
A function that evaluates a single scene.
def evaluate_scene(
    ref_signal: np.ndarray,
    enh_signal: np.ndarray,
    sample_rate: int,
    scene_id: str,
    current_scene: dict,
    listener_audiogram: dict,
    car_scene_acoustic: CarSceneAcoustics,
    hrtf: dict,
    config: DictConfig,
) -> tuple[float, float]:
    """Evaluate a single scene and return HAAQI scores for left and right ears.

    Args:
        ref_signal (np.ndarray): A numpy array of shape (2, n_samples)
            containing the reference signal.
        enh_signal (np.ndarray): A numpy array of shape (2, n_samples)
            containing the enhanced signal.
        sample_rate (int): The sampling frequency of the reference and enhanced signals.
        scene_id (str): A string identifier for the scene being evaluated.
        current_scene (dict): A dictionary containing information about the scene being
            evaluated, including the song ID, the listener ID, the car noise type, and
            the split.
        listener_audiogram (dict): A dictionary containing the listener's audiogram
            data, including the center frequencies (cfs) and the hearing levels for both
            ears (audiogram_levels_l and audiogram_levels_r).
        car_scene_acoustic (CarSceneAcoustics): An instance of the CarSceneAcoustics
            class, which is used to generate car noise and add binaural room impulse
            responses (BRIRs) to the enhanced signal.
        hrtf (dict): A dictionary containing the head-related transfer functions (HRTFs)
            for the listener being evaluated. This includes the left and right HRTFs for
            the car and the anechoic room.
        config (DictConfig): A dictionary-like object containing various configuration
            parameters for the evaluation. This includes the path to the enhanced signal
            folder, the path to the music directory, and a flag indicating whether to set
            a random seed.

    Returns:
        Tuple[float, float]: A tuple containing HAAQI scores for left and right ears.
    """
3.3 Car Scene Acoustics#
A class containing all the logic for the car acoustics. It:

- generates the car noise,
- applies the head-related transfer function (HRTF), taking the head direction into account,
- applies the hearing aid (HA) amplification.
class CarSceneAcoustics:
    """
    A class for the car acoustic environment.
    """

    def apply_car_acoustics_to_signal(
        self,
        enh_signal: np.ndarray,
        scene: dict,
        listener: dict,
        hrtf: dict,
        audio_manager: AudioManager,
        config: DictConfig,
    ) -> np.ndarray:
        """Applies the car acoustics to the enhanced signal."""

    def apply_hearing_aid(
        self, signal: np.ndarray, audiogram: np.ndarray, center_frequencies: np.ndarray
    ) -> np.ndarray:
        """
        Applies the hearing aid:
        it consists of a NAL-R prescription and a compressor.
        """

    def get_car_noise(
        self,
        car_noise_params: dict,
    ) -> np.ndarray:
        """Generates car noise."""

    def add_hrtf_to_stereo_signal(
        self, signal: np.ndarray, hrir: dict, hrtf_type: str
    ) -> np.ndarray:
        """Add a head rotation transfer function using a binaural room impulse
        response (BRIR)."""

    def scale_signal_to_snr(
        self,
        signal: np.ndarray,
        reference_signal: np.ndarray = None,
        snr: float | None = 0,
    ) -> np.ndarray:
        """
        Scales the target signal to the desired SNR.
        We transpose the channels because pyloudnorm operates
        on arrays with shape [n_samples, n_channels].
        """
3.4 Audio Manager#
A simple utility class to manage the signals to be saved.
This class stores and saves intermediate signals to help understand the effect of each step in the evaluation.
class AudioManager:
    """A utility class for managing audio files."""

    def add_audios_to_save(self, file_name: str, waveform: np.ndarray) -> None:
        """Add a waveform to the list of audios to save."""

    def save_audios(self) -> None:
        """Save the audios to the given path."""

    def clip_audio(
        self, signal: np.ndarray, min_val: float = -1, max_val: float = 1
    ) -> tuple[int, np.ndarray]:
        """Clip a WAV file to the given range."""

    def get_lufs_level(self, signal: np.ndarray) -> float:
        """Get the LUFS level of the signal."""

    def scale_to_lufs(self, signal: np.ndarray, target_lufs: float) -> np.ndarray:
        """Scale the signal to the given LUFS level."""
See also
The car noise generation code is also available for you to explore: clarity.utils.car_noise_simulator
class CarNoiseParametersGenerator:
    """
    A class to generate noise parameters for a car.

    The constructor takes a boolean flag to indicate whether some
    parameters should be randomized or not.
    The method `gen_parameters` takes a speed in kilometers per hour
    and returns a dictionary of noise parameters.
    """

class CarNoiseSignalGenerator:
    """
    A class to generate a car noise signal.

    The constructor takes the sample_rate and duration of
    the generated signals.
    The method `generate_car_noise` takes parameters for the
    noise and generates the signal. These parameters
    can be generated by the CarNoiseParametersGenerator class.
    """
4. Inspecting Existing Configuration#
All of the included Python scripts take their configurable variables from YAML files located in the same directory as the script. Typically these are named config.yaml; however, other names may be used if more than one script is in a directory.
We can inspect the contents of the config file:
path:
  root: ../../cadenza_task2_data_demo/cad1/task2
  audio_dir: ${path.root}/audio
  metadata_dir: ${path.root}/metadata
  music_dir: ${path.audio_dir}/music
  hrtf_dir: ${path.audio_dir}/eBrird
  listeners_train_file: ${path.metadata_dir}/listeners.train.json
  listeners_valid_file: ${path.metadata_dir}/listeners.valid.json
  scenes_file: ${path.metadata_dir}/scenes.json
  scenes_listeners_file: ${path.metadata_dir}/scenes_listeners.json
  hrtf_file: ${path.metadata_dir}/eBrird_BRIR.json
  exp_folder: ./exp  # folder to store enhanced signals and final results

sample_rate: 44100  # sample rate of the input signal
enhanced_sample_rate: 32000  # sample rate for the enhanced output signal

nalr:
  nfir: 220
  fs: ${sample_rate}

compressor:
  threshold: 0.7
  attenuation: 0.1
  attack: 5
  release: 20
  rms_buffer_size: 0.064

soft_clip: False

enhance:
  average_level: -14  # Average level according to Spotify's levels
  min_level: -19

evaluate:
  set_random_seed: True
  small_test: False
  save_intermediate_wavs: False
  split: valid  # train, valid
  batch_size: 1  # Number of batches
  batch: 0  # Batch number to evaluate

# hydra config
hydra:
  run:
    dir: ${path.exp_folder}
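Because the values above use Hydra/OmegaConf interpolation, it can be helpful to see how they resolve. The snippet below is a small illustrative check run from the baseline directory; it is not part of the recipe.

# Illustrative check of how the ${...} interpolations resolve, and how a
# command-line override propagates to the derived paths.
from omegaconf import OmegaConf

cfg = OmegaConf.load("config.yaml")
print(cfg.path.music_dir)   # ../../cadenza_task2_data_demo/cad1/task2/audio/music

# Equivalent to passing `path.root=../cad1/task2` on the command line.
override = OmegaConf.from_dotlist(["path.root=../cad1/task2"])
cfg = OmegaConf.merge(cfg, override)
print(cfg.path.music_dir)   # ../cad1/task2/audio/music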
5. Run Demo#
Typically, all the work is done within Python, with configurable variables supplied by a YAML file that is parsed by Hydra inside the Python code.
The code is executed from the command line, and new configuration values are supplied as arguments to override the defaults.
We are now ready to run the prepared Python script. However, the standard configuration is designed to work with the full challenge dataset, so we redirect the script to the correct folders by overriding the appropriate configuration parameters.
!python enhance.py path.root=../cad1/task2
0%| | 0/20 [00:00<?, ?it/s]
5%|██▏ | 1/20 [00:01<00:19, 1.01s/it]
10%|████▍ | 2/20 [00:01<00:09, 1.99it/s]
15%|██████▌ | 3/20 [00:01<00:05, 2.96it/s]
20%|████████▊ | 4/20 [00:01<00:04, 3.84it/s]
25%|███████████ | 5/20 [00:01<00:03, 4.53it/s]
30%|█████████████▏ | 6/20 [00:01<00:02, 5.17it/s]
35%|███████████████▍ | 7/20 [00:01<00:02, 5.59it/s]
40%|█████████████████▌ | 8/20 [00:02<00:02, 5.86it/s]
45%|███████████████████▊ | 9/20 [00:02<00:01, 6.16it/s]
50%|█████████████████████▌ | 10/20 [00:02<00:01, 6.38it/s]
55%|███████████████████████▋ | 11/20 [00:02<00:01, 6.54it/s]
60%|█████████████████████████▊ | 12/20 [00:02<00:01, 6.70it/s]
65%|███████████████████████████▉ | 13/20 [00:02<00:01, 6.80it/s]
70%|██████████████████████████████ | 14/20 [00:02<00:00, 6.84it/s]
75%|████████████████████████████████▎ | 15/20 [00:03<00:00, 6.73it/s]
80%|██████████████████████████████████▍ | 16/20 [00:03<00:00, 6.79it/s]
85%|████████████████████████████████████▌ | 17/20 [00:03<00:00, 6.75it/s]
90%|██████████████████████████████████████▋ | 18/20 [00:03<00:00, 6.35it/s]
95%|████████████████████████████████████████▊ | 19/20 [00:03<00:00, 6.40it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00, 6.56it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00, 5.24it/s]
Next, we define some helper functions for loading and playing the audio.
%%capture
!pip install more_itertools
from os import listdir
from os.path import isfile, join
from pathlib import Path
from scipy.io import wavfile
from more_itertools import windowed
from clarity.utils.flac_encoder import read_flac_signal
import IPython.display as ipd
import pandas as pd
def audio_player_list(signals, rates, width=270, height=40, columns=None, column_align='center'):
    """Generate a list of HTML audio players tags for a given list of audio signals.

    Notebook: B/B_PythonAudio.ipynb

    Args:
        signals (list): List of audio signals
        rates (list): List of sample rates
        width (int): Width of player (either number or list) (Default value = 270)
        height (int): Height of player (either number or list) (Default value = 40)
        columns (list): Column headings (Default value = None)
        column_align (str): Left, center, right (Default value = 'center')
    """
    pd.set_option('display.max_colwidth', None)

    if isinstance(width, int):
        width = [width] * len(signals)
    if isinstance(height, int):
        height = [height] * len(signals)

    audio_list = []
    for cur_x, cur_Fs, cur_width, cur_height in zip(signals, rates, width, height):
        audio_html = ipd.Audio(data=cur_x.T, rate=cur_Fs)._repr_html_()
        audio_html = audio_html.replace('\n', '').strip()
        audio_html = audio_html.replace('<audio ', f'<audio style="width: {cur_width}px; height: {cur_height}px" ')
        audio_list.append([audio_html])

    df = pd.DataFrame(audio_list, index=columns).T
    table_html = df.to_html(escape=False, index=False, header=bool(columns))
    table_html = table_html.replace('<th>', f'<th style="text-align: {column_align}">')
    ipd.display(ipd.HTML(table_html))
def load_and_display(audio_path, number_audios=None, start=20, end=30):
    number_audios = -1 if number_audios is None else number_audios
    audio_files = [f for f in audio_path.glob('**/*') if f.suffix in ['.flac', '.wav']][:number_audios]

    signals = []
    rates = []
    columns = []

    # load signals
    for file_to_play in audio_files:
        if file_to_play.suffix == '.flac':
            signal, sample_rate = read_flac_signal(file_to_play)
        else:
            sample_rate, signal = wavfile.read(file_to_play)
        signals.append(signal[int(start * sample_rate):int(end * sample_rate)])
        rates.append(sample_rate)
        columns.append("/".join(list(file_to_play.parts[-3:-1]) + [file_to_play.stem]))

    # display signals
    for s, r, c in zip(
            windowed(signals, 2, step=2),
            windowed(rates, 2, step=2),
            windowed(columns, 2, step=2)):
        audio_player_list(s, r, columns=c)
Now we have the enhanced output. Below, we can load and play the audio to listen to examples of the results.
load_and_display(Path("exp/enhanced_signals"), 2, start=20, end=30)
Audio players (enhanced signals): valid/L5022/T133454_L5022_S2023_L5022_fma_133454 | valid/L5009/T142947_L5009_S2023_L5009_fma_064515
Now that we have the enhanced audio, we can use the evaluation recipe to generate HAAQI scores for the signals.
!python evaluate.py path.root=../cad1/task2 evaluate.small_test=True
[2023-05-22 14:28:07,157][__main__][INFO] - Evaluating from enhanced_signals directory
0%| | 0/2 [00:00<?, ?it/s]
[2023-05-22 14:29:11,265][__main__][INFO] - The combined score for scene T012526_L5076_S2023: 0.1489
50%|██████████████████████▌ | 1/2 [01:04<01:04, 64.10s/it]
[2023-05-22 14:29:20,979][recipes.cad1.task2.baseline.car_scene_acoustics][WARNING] - Scene T131979_L5007_S2023S500015: 10 samples clipped in evaluation signal.
[2023-05-22 14:30:14,952][__main__][INFO] - The combined score for scene T131979_L5007_S2023: 0.1092
100%|█████████████████████████████████████████████| 2/2 [02:07<00:00, 63.86s/it]
100%|█████████████████████████████████████████████| 2/2 [02:07<00:00, 63.89s/it]
load_and_display(Path("exp/evaluation_signals"), 2, start=20, end=30)
Audio players (evaluation signals): L5007/fma_131979/ref_signal_for_eval | L5007/fma_131979/ha_processed_signal
We hope that this tutorial has been useful and has explained how to run the recipe scripts with the Hydra configuration system. The same approach can be applied to all of the recipes included in the repository.