Segmentation¶

This module contains functions for syllable segmentation from .wav files.

avn.segmentation module¶

Created on Wed May 5 08:29:00 2021

@author: Therese

class avn.segmentation.MFCC¶

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on threshold crossing of the first mel frequency cepstral coefficient (MFCC).

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True)¶

Calculates the first MFCC component at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters

song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file
hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate MFCC values. The default is 512.
n_fft (int > 0, optional) – The length of the FFT window used to calculate the MFCC values. The default is 2048.
bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the MFCC. If False, the MFCC will be calculated on the song as-is. The default is True.
lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 200.
upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 9000.
rescale (bool, optional) – If True, the MFCC will be min-max rescaled so that all values fall between 0 and 1. This is meant to ensure consistency across recordings. If False, the raw MFCC values will be returned. The default is True.

Returns

mfcc – 1 dimensional numpy array containing the MFCC values for the input wave.

Return type

numpy ndarray, 1D

class avn.segmentation.MFCCDerivative¶

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on threshold crossing of an estimate of the first derivative of the first mel frequency cepstral coefficient (MFCC).

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True, deriv_width=3)¶

Calculates the first derivative of the MFCC at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters

song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file
hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate MFCC values. The default is 512.
n_fft (int > 0, optional) – The length of the FFT window used to calculate the MFCC values. The default is 2048.
bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the MFCC. If False, the MFCC will be calculated on the song as-is. The default is True.
lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 200.
upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 9000.
rescale (bool, optional) – If True, the MFCC will be min-max rescaled so that all values fall between 0 and 1 before the first derivative is taken. This is meant to ensure consistency across recordings. If False, the first derivative is calculated on the raw MFCC. The default is True.
deriv_width (int >=3, odd, optional) – Number of frames over which to compute the change in MFCC to estimate the first derivative. The default is 3.

Returns

mfcc_derivative – 1 dimensional numpy array containing an estimate of the first derivative of the MFCC of the input wave.

Return type

numpy ndarray, 1D
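
As a rough illustration of how deriv_width could enter such an estimate, the sketch below takes a plain centered difference across deriv_width frames. This is an assumption made for illustration only: the documentation does not state which estimator MFCCDerivative uses. The helper name centered_difference is hypothetical, and a Python list stands in for the 1D numpy array.

```python
def centered_difference(values, deriv_width=3):
    """Estimate the first derivative of a 1D sequence (hypothetical helper).

    The change is measured between the frames half a window before and
    after each frame; at the edges the window is clipped to the valid range.
    """
    if deriv_width < 3 or deriv_width % 2 == 0:
        raise ValueError("deriv_width must be an odd integer >= 3")
    half = deriv_width // 2
    last = len(values) - 1
    deriv = []
    for i in range(len(values)):
        lo = max(i - half, 0)
        hi = min(i + half, last)
        span = hi - lo
        # Change per frame across the (possibly clipped) window.
        deriv.append((values[hi] - values[lo]) / span if span else 0.0)
    return deriv


criteria = [0.0, 0.1, 0.4, 0.9, 1.0, 0.6, 0.2]
estimate = centered_difference(criteria, deriv_width=3)
```

A wider deriv_width smooths the estimate at the cost of blurring sharp onsets.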

class avn.segmentation.Metrics¶

Bases: object

Contains functions for calculating segmentation accuracy metrics by comparing automatically generated syllable segmentations to ‘ground truth’ segmentations.

calc_F1(max_gap=0.05, feature='onsets')¶

Calculates the F1 score, precision and recall of syllable onsets or offsets in seg_data.seg_table relative to seg_data.true_seg_table.

Parameters

seg_data (avn.segmentation.SegData instance) – Instance of a SegData object which must have valid .seg_table and .true_seg_table attributes.
max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.
feature (['onsets', 'offsets']) – Specifies whether you want to calculate the F1 score of syllable onsets or offsets. The default is ‘onsets’.

Returns

seg_data – This will be the same seg_data object as is passed as an argument, with an added .seg_metrics attribute which contains a DataFrame with columns F1, precision and recall and a single row with the value of each metric.

Return type

avn.segmentation.SegData instance
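
To make the role of max_gap concrete, here is a minimal sketch of how precision, recall and F1 can be computed from two lists of timestamps. The matching rule (each ground truth time claims the closest unused segmenter time within max_gap) is an assumption; the documentation does not specify how calc_F1 resolves competing matches. The function name onset_f1 is hypothetical.

```python
def onset_f1(seg_times, true_times, max_gap=0.05):
    """Return (F1, precision, recall) for two lists of timestamps in seconds.

    Assumption: each ground truth time may be matched by at most one
    segmenter time, chosen greedily as the closest unused one within max_gap.
    """
    unused = sorted(seg_times)
    matches = 0
    for true_t in sorted(true_times):
        candidates = [t for t in unused if abs(t - true_t) <= max_gap]
        if candidates:
            best = min(candidates, key=lambda t: abs(t - true_t))
            unused.remove(best)
            matches += 1
    precision = matches / len(seg_times) if seg_times else 0.0
    recall = matches / len(true_times) if true_times else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall


# Two of the three true onsets have a segmenter onset within 0.05 s.
f1, precision, recall = onset_f1([0.10, 0.52, 0.90, 1.40], [0.12, 0.50, 1.00])
```

Here precision is 2/4 and recall is 2/3, so widening max_gap to 0.1 would also count the 0.90 / 1.00 pair as a match.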

get_time_delta_df(max_gap=0.05, feature='onsets')¶

Creates a dataframe with the segmenter generated timestamps that align to ground truth timestamps for either syllable onsets or offsets within max_gap seconds.

Parameters

seg_data (avn.segmentation.SegData instance) – Instance of a SegData object which must have valid .seg_table and .true_seg_table attributes.
max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.
feature (['onsets', 'offsets']) – Specifies whether you want to get the matched timestamps of syllable onsets or offsets. The default is ‘onsets’.

Returns

all_matched_times – DataFrame with columns ‘True_feat_times’ and ‘Seg_matched_times’, containing the timestamps in seconds of a ground truth syllable onset or offset, and the matched automatically segmented syllable onset or offset, respectively. This can be used to look at the distribution of time differences between true and generated segmentations.

Return type

pandas DataFrame
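
The matched table lends itself to computing per-syllable time differences. The sketch below builds rows with the two documented column names from plain lists and then takes the differences. The nearest-neighbour pairing rule and the helper name matched_times are assumptions, and a list of dicts stands in for the pandas DataFrame.

```python
def matched_times(seg_times, true_times, max_gap=0.05):
    """Pair each ground truth time with the closest segmenter time.

    Assumption: a pair is kept only when the two times differ by at most
    max_gap seconds; unmatched ground truth times are dropped.
    """
    rows = []
    for true_t in true_times:
        if not seg_times:
            break
        nearest = min(seg_times, key=lambda t: abs(t - true_t))
        if abs(nearest - true_t) <= max_gap:
            rows.append({"True_feat_times": true_t, "Seg_matched_times": nearest})
    return rows


rows = matched_times([0.10, 0.52, 0.90], [0.12, 0.50, 1.00])
# Negative values mean the segmenter placed the boundary early.
deltas = [row["Seg_matched_times"] - row["True_feat_times"] for row in rows]
```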

class avn.segmentation.Plot¶

Bases: object

Contains functions for plotting automatically generated syllable segmentations, segmentation criteria and/or ground truth syllable segmentations over spectrograms.

plot_seg_criteria(segmenter, label, file_idx=0, figsize=(20, 5))¶

Plots a given segmentation criterion (e.g. MFCC, RMSE, RMSE Derivative) over the spectrogram of a given song file.

Parameters

seg_data (avn.segmentation.SegData object) – SegData object with valid .seg_table and .song_folder_path attributes.
segmenter (avn.segmentation.Segmenter daughter class object) – The type of the segmenter (ie MFCC, RMSE, RMSEDerivative) determines which segmentation criteria will be plotted
label (str) – Label of segmentation criteria to be displayed in plot legend.
file_idx (int >=0, <total number of files segmented in seg_data, optional) – The index of the single file within seg_data to be plotted. The default is 0.
figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).

Returns

Return type

None.

plot_segmentations(seg_label, plot_ground_truth=False, true_label='Ground Truth', file_idx=0, figsize=(20, 5), seg_attribute='onsets', plot_title='')¶

Plots the spectrogram of a given wave file with automatically generated syllable onsets or offsets plotted over top. Ground truth segmentations can be plotted in addition to automatically generated segmentations when available.

Parameters

seg_data (avn.segmentation.SegData object) – SegData object containing automatic syllable segmentations in its .seg_table attribute, and optionally also ground truth segmentations in its .true_seg_table attribute.
seg_label (str) – Label for automatic syllable segmentations to be displayed in legend.
plot_ground_truth (bool, optional) – If True, both the automatically generated and ground truth syllable segmentations will be plotted. If False, only automatically generated syllable segmentations will be plotted. The default is False.
true_label (str, optional) – Label for ground truth syllable segmentations to be displayed in legend. Only used if plot_ground_truth == True. The default is ‘Ground Truth’.
file_idx (int >=0, <total number of files segmented in seg_data, optional) – The index of the single file within seg_data to be plotted. The default is 0.
figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).
seg_attribute ({'onsets', 'offsets'}, optional) – Specifies whether syllable onset times or offset times should be displayed. The default is ‘onsets’.
plot_title (str, optional) – Title of the generated plot. The default is “”.

Returns

Return type

None.

class avn.segmentation.RMSE¶

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on root mean square energy (RMSE) threshold crossing.

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True)¶

Calculates the RMSE at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters

song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file
hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate RMSE values. The default is 512.
n_fft (int > 0, optional) – The length of the FFT window used to calculate the RMSE values. The default is 2048.
bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the RMSE. If False, the RMSE will be calculated on the song as-is. The default is True.
lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 200.
upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 9000.
rescale (bool, optional) – If True, the RMSE will be min-max rescaled so that all values fall between 0 and 1. This is meant to ensure consistency across recordings. If False, the raw RMSE values will be returned. The default is True.

Returns

rmse – 1 dimensional numpy array containing the RMSE values for the input wave.

Return type

numpy ndarray, 1D
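
For readers unfamiliar with frame-wise RMSE, the following sketch shows the basic computation on a toy signal. It is not the library's implementation: the framing rule (frames start every hop_length samples, with no padding or centering) is an assumption, frame_length plays the role that n_fft plays above, and frame_rmse is a hypothetical helper working on a Python list.

```python
import math


def frame_rmse(samples, frame_length=2048, hop_length=512):
    """Root mean square energy of each frame (hypothetical helper).

    Assumption: frames start every hop_length samples and the final frames
    are simply shorter; a library implementation may pad or center instead.
    """
    rmse = []
    for start in range(0, len(samples), hop_length):
        frame = samples[start:start + frame_length]
        rmse.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rmse


# Toy signal: silence followed by a constant-amplitude burst.
signal = [0.0] * 8 + [0.5] * 8
energy = frame_rmse(signal, frame_length=4, hop_length=4)
```

The energy rises from 0 to 0.5 at the burst, which is the kind of step an upper threshold is meant to catch.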

class avn.segmentation.RMSEDerivative¶

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on threshold crossing of the first derivative of the root mean square energy (RMSE).

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True, deriv_width=3)¶

Calculates the first derivative of the RMSE at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters

song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file
hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate RMSE values. The default is 512.
n_fft (int > 0, optional) – The length of the FFT window used to calculate the RMSE values. The default is 2048.
bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the RMSE. If False, the RMSE will be calculated on the song as-is. The default is True.
lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 200.
upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 9000.
rescale (bool, optional) – If True, the RMSE will be min-max rescaled so that all values fall between 0 and 1 before the first derivative is taken. This is meant to ensure consistency across recordings. If False, the first derivative is calculated on the raw RMSE. The default is True.
deriv_width (int >=3, odd, optional) – Number of frames over which to compute the change in RMSE to estimate the first derivative. The default is 3.

Returns

rmse_derivative – 1 dimensional numpy array containing an estimate of the first derivative of the RMSE of the input wave.

Return type

numpy ndarray, 1D

class avn.segmentation.SegData(Bird_ID, seg_table)¶

Bases: object

Syllable segmentation data for many files from a single bird.

Bird_ID¶

String containing a unique identifier for subject bird.

Type: str

seg_table¶

Dataframe with columns onsets, offsets and files, which contains the onset and offset times in seconds of every syllable, and the syllable’s corresponding file. This is generated by an avn.segmentation.Segmenter class object with the function make_segmentation_table.

Type: pandas DataFrame

true_seg_table¶

Dataframe with columns files, labels, onsets and offsets, which contains ground truth segmentation information for calculation of automatic segmentation metrics. labels contains only {‘n’, ‘s’}, to indicate whether a row reflects a true song syllable (‘s’) or cage noise (‘n’). This can be imported from evsonganaly with the avn.dataloading.add_ev_song_truth_table() function.

Type: pandas DataFrame

seg_metrics¶

Dataframe with columns F1, precision and recall containing a single row with each metric calculated by comparing segmentations in SegData.seg_table to SegData.true_seg_table. This is generated with the function segmentation.Metrics.calc_F1().

Type: pandas DataFrame

save_as_csv(out_folder_path)¶: Saves the contents of SegData.seg_table and SegData.seg_metrics as csv files in the out_folder_path directory.

save_as_csv(out_folder_path)¶

Saves SegData.seg_table and SegData.seg_metrics as csv files in the out_folder_path directory.

Parameters

out_folder_path (str) – Path to local directory in which to save csv files.

Returns

Return type

None.

Notes

The SegData.seg_table file will be called “[Bird_ID]_seg_table.csv”, and the SegData.seg_metrics file will be called “[Bird_ID]_seg_metrics.csv”.

If either the .seg_table or .seg_metrics attributes do not exist, the corresponding file will not be created.
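
The naming convention in the notes above can be sketched as follows. The helper seg_data_csv_paths and the example values "B123" and "output" are hypothetical; only the two file name patterns come from the notes.

```python
import os


def seg_data_csv_paths(bird_id, out_folder_path):
    """Build the two output paths named in the notes (hypothetical helper)."""
    return {
        "seg_table": os.path.join(out_folder_path, f"{bird_id}_seg_table.csv"),
        "seg_metrics": os.path.join(out_folder_path, f"{bird_id}_seg_metrics.csv"),
    }


# "B123" and "output" are placeholder values used only for illustration.
paths = seg_data_csv_paths("B123", "output")
```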

class avn.segmentation.Segmenter¶

Bases: object

Parent class for automated syllable segmentation

None¶

make_segmentation_table(Bird_ID, song_folder_path, upper_threshold, lower_threshold)¶: Generates a SegData object with syllable segmentation information for all .wav files in song_folder_path

get_seg_criteria(song)¶: For threshold based segmentation, this calculates the song feature on which to apply threshold segmentation (eg. RMSE, MFCC, MFCC Derivative)

rescale(seg_criteria)¶: Applies 0 to 1 min-max rescaling to a vector.

get_threshold(seg_criteria, thresh): Generates a flat threshold vector for comparison to seg_criteria for threshold based segmentation.

get_syll_onsets_offsets(seg_criteria, upper_thresh, lower_thresh, total_file_duration)¶: Returns onsets and offset timestamps of all syllables in a file based on seg_criteria threshold crossings.

Notes

An instance of this parent class cannot be used to generate syllable segmentations, as the get_seg_criteria() function is not implemented. Instead, please use one of the child classes, each of which uses a different segmentation criteria (e.g. RMSE, RMSEDerivative, MFCC, MFCCDerivative).
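
The parent/child arrangement described in these notes follows a common pattern, sketched below with stand-in classes. BaseSegmenter, AbsAmplitude and peak_criteria are hypothetical names, and raising NotImplementedError is an assumption about how an unimplemented criteria function might behave; the documentation only says it is not implemented.

```python
class BaseSegmenter:
    """Stand-in for a parent class whose criteria function is left unimplemented."""

    def get_seg_criteria(self, song):
        raise NotImplementedError("use a child class that defines get_seg_criteria")

    def peak_criteria(self, song):
        # Shared logic in the parent relies on the child's criteria function.
        return max(self.get_seg_criteria(song))


class AbsAmplitude(BaseSegmenter):
    """Hypothetical child class using absolute amplitude as its criteria."""

    def get_seg_criteria(self, song):
        return [abs(x) for x in song]


peak = AbsAmplitude().peak_criteria([0.0, -2.0, 1.0])

try:
    BaseSegmenter().peak_criteria([0.0, 1.0])
    parent_usable = True
except NotImplementedError:
    parent_usable = False
```

Because the shared methods live in the parent, each child only has to supply its own criteria calculation.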

get_seg_criteria(song)¶

make_segmentation_table(Bird_ID, song_folder_path, upper_threshold, lower_threshold, max_syll_duration=0.33, hop_length=512, n_fft=2048)¶

Parameters

Bird_ID (str) – String containing a unique identifier for subject bird.
song_folder_path (str) – Path to a local directory containing all .wav files to be segmented.
upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.
lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets when onset to onset segmentation results in a syllable longer than max_syll_duration.
max_syll_duration (float > 0, optional) – Maximum allowable syllable duration in seconds. The default is 0.33. If the gap between consecutive syllable onsets is longer than this value, the offset will be determined by lower threshold crossing. If the lower threshold crossing still results in a syllable longer than this value, the syllable offset will be set to the onset timestamp + max_syll_duration.
hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate segmentation criteria values. The default is 512.
n_fft (int > 0, optional) – The length of the FFT window used to calculate the segmentation criteria values. The default is 2048.

Returns

segmentation_data – SegData object with attributes .Bird_ID and .seg_table, where .seg_table is a pandas Dataframe with columns onsets, offsets and files, which contains the onset and offset times in seconds of every syllable, and the syllable’s corresponding file.

Return type

avn.segmentation.SegData object
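
The offset rules described for lower_threshold and max_syll_duration can be sketched as below. This is an interpretation of the parameter descriptions, not the library's code: the crossing definitions, the handling of the last syllable, and the conversion from frames to seconds (frame_duration, which would be hop_length divided by the sample rate) are all assumptions, and segment_frames is a hypothetical helper.

```python
def segment_frames(criteria, upper, lower, frame_duration, max_syll_duration=0.33):
    """Return (onsets, offsets) in seconds from a 1D criteria sequence.

    Assumptions: an onset is an upward crossing of `upper`, and a lower
    threshold offset is the first later frame that falls below `lower`.
    """
    onset_frames = [
        i for i in range(1, len(criteria))
        if criteria[i - 1] < upper <= criteria[i]
    ]
    total_duration = len(criteria) * frame_duration
    onsets, offsets = [], []
    for n, frame in enumerate(onset_frames):
        onset = frame * frame_duration
        # Default: the syllable runs until the next onset (or the file end).
        if n + 1 < len(onset_frames):
            offset = onset_frames[n + 1] * frame_duration
        else:
            offset = total_duration
        if offset - onset > max_syll_duration:
            # Fall back to the first drop below the lower threshold.
            below = [j for j in range(frame + 1, len(criteria)) if criteria[j] < lower]
            if below:
                offset = below[0] * frame_duration
        if offset - onset > max_syll_duration:
            # Last resort: cap the syllable at the maximum duration.
            offset = onset + max_syll_duration
        onsets.append(onset)
        offsets.append(offset)
    return onsets, offsets


# Toy criteria with 0.1 s frames: one short syllable, then one that never decays.
criteria = [0.0, 0.8, 0.9, 0.1, 0.0, 0.0, 0.7, 0.8, 0.8, 0.8, 0.8, 0.8]
onsets, offsets = segment_frames(criteria, upper=0.5, lower=0.2, frame_duration=0.1)
```

The first syllable ends at its lower threshold crossing, while the second never drops below the lower threshold and is capped at onset + max_syll_duration.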

rescale(seg_criteria)¶

Applies 0 to 1 min-max rescaling to a vector.

Parameters: seg_criteria (numpy ndarray, 1D) – 1 dimensional numpy array to be rescaled
Returns: seg_criteria – 1 dimensional numpy array rescaled between 0 and 1.
Return type: numpy ndarray, 1D
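
Min-max rescaling maps the smallest value to 0 and the largest to 1. A minimal sketch on a Python list follows; the handling of a constant input is an assumption, since the documentation does not cover that case, and min_max_rescale is a hypothetical name.

```python
def min_max_rescale(seg_criteria):
    """Rescale a 1D sequence so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(seg_criteria), max(seg_criteria)
    if hi == lo:
        # Assumption: a constant input maps to all zeros to avoid dividing by zero.
        return [0.0 for _ in seg_criteria]
    return [(v - lo) / (hi - lo) for v in seg_criteria]


rescaled = min_max_rescale([-20.0, -5.0, 10.0])
```

Rescaling in this way is what lets a single threshold value be compared across recordings with different absolute levels.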

class avn.segmentation.Utils¶

Bases: object

Contains syllable segmentation utilities

calc_F1_many_birds(Bird_IDs, folder_path, upper_threshold, lower_threshold, truth_table_suffix='_syll_table.csv', max_gap=0.05, feature='onsets')¶

Calculate the segmentation metrics for all birds in Bird_IDs with a given method and threshold.

Parameters

segmenter (avn.segmentation.Segmenter daughter class object.) – Determines the segmentation method.
Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.
folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.
upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.
lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.
truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within the Bird_ID subfolder of folder_path, and that the file name begin with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.
max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.
feature (['onsets', 'offsets']) – Specifies whether you want to calculate the F1 score of syllable onsets or offsets. The default is ‘onsets’.

Returns

segmentation_scores (pandas DataFrame) – DataFrame with columns F1, precision, recall, upper_threshold, lower_threshold, and Bird_ID, which contains the segmentation metrics for each bird.
segmentations_df (pandas DataFrame) – DataFrame with columns onsets, offsets, files, and Bird_ID which contains the onset and offset timestamps of every segmented syllable in each file for each bird.
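
The folder layout implied by folder_path, Bird_IDs and truth_table_suffix can be sketched like this. The helper truth_table_path and the names "song_data", "B123" and "B456" are hypothetical; the layout itself (one subfolder per bird, with a truth table named Bird_ID plus the suffix) is read from the parameter descriptions above.

```python
import os


def truth_table_path(folder_path, bird_id, truth_table_suffix="_syll_table.csv"):
    """Locate a bird's truth table as folder_path/Bird_ID/Bird_ID + suffix."""
    return os.path.join(folder_path, bird_id, bird_id + truth_table_suffix)


# "song_data", "B123" and "B456" are placeholder names used only for illustration.
paths = [truth_table_path("song_data", bird) for bird in ["B123", "B456"]]
```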

get_time_deltas_many_birds(Bird_IDs, folder_path, upper_threshold, lower_threshold, max_gap=0.05, feature='onsets', truth_table_suffix='_syll_table.csv')¶

Creates a dataframe with the segmenter generated timestamps that align to ground truth timestamps for either syllable onsets or offsets within max_gap seconds for all birds in Bird_IDs.

Parameters

segmenter (avn.segmentation.Segmenter daughter class object.) – Determines the segmentation method.
Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.
folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.
upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.
lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.
max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.
feature (['onsets', 'offsets']) – Specifies whether you want to get the matched timestamps of syllable onsets or offsets. The default is ‘onsets’.
truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within the Bird_ID subfolder of folder_path, and that the file name begin with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

Returns

all_time_deltas_df – DataFrame with columns ‘True_feat_times’ and ‘Seg_matched_times’, containing the timestamps in seconds of a ground truth syllable onset or offset, and the matched automatically segmented syllable onset or offset, respectively. This can be used to look at the distribution of time differences between true and generated segmentations.

Return type

Pandas DataFrame

make_segmentation_table_many_birds(Bird_IDs, folder_path, upper_threshold, lower_threshold, save_to_csv=False, out_file_dir=None)¶

Generates syllable segmentations for many files across many birds.

Parameters

segmenter (avn.segmentation.Segmenter child class type) – This determines the segmentation criteria used for threshold segmentation.
Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.
folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.
upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.
lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.
save_to_csv (bool, optional) – If True, segmentation table and metrics table .csv files will be saved for each bird in the out_file_dir directory. These will have the Bird_ID in the file name. The default is False.
out_file_dir (str, optional) – Path to a local directory in which to save segmentation table and metrics tables for each bird. This will only be used if save_to_csv == True. The default is None.

Returns

segmentations_df – DataFrame with columns onsets, offsets, files and Bird_ID which contains syllable onset and offset timestamps in seconds for every automatically segmented syllable in every file for every bird.

Return type

pandas DataFrame

plot_segmentations_many_birds(Bird_IDs, folder_path, seg_label, upper_threshold, lower_threshold, plot_ground_truth=False, files_per_bird=3, random_seed=2021, true_label='Ground Truth', figsize=(20, 5), seg_attribute='onsets', truth_table_suffix='_syll_table.csv')¶

Plots files_per_bird number of random example song spectrograms with automatically generated segmentations (and optionally ground truth segmentations) overlaid for each bird in Bird_IDs.

Parameters

segmenter (avn.segmentation.Segmenter child class type) – This determines the segmentation criteria used for threshold segmentation.
Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.
folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.
seg_label (str) – Label for automatic syllable segmentations to be displayed in legend.
upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.
lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.
plot_ground_truth (bool, optional) – If True, both the automatically generated and ground truth syllable segmentations will be plotted. If False, only automatically generated syllable segmentations will be plotted. The default is False.
files_per_bird (int>=1, optional) – The number of files to plot from each bird. The default is 3.
random_seed (optional) – Any object that can be converted to an integer. This ensures that the same set of randomly selected files will be plotted every time this function is run with the same random_seed value. The default is 2021.
true_label (str, optional) – Label for ground truth syllable segmentations to be displayed in legend. Only used if plot_ground_truth == True. The default is ‘Ground Truth’.
figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).
seg_attribute ({'onsets', 'offsets'}, optional) – Specifies whether syllable onset times or offset times should be displayed. The default is ‘onsets’.
truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within the Bird_ID subfolder of folder_path, and that the file name begin with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

Returns

Return type

None.

threshold_optimization(Bird_ID, song_folder_path, truth_table_path, threshold_range, threshold_step, lower_threshold)¶

Tests a range of upper threshold values for threshold-based segmentation to find the threshold which results in the best F1 score for one bird.

Parameters

segmenter (avn.segmentation.Segmenter daughter class object) – Determines the segmentation criteria that will be used for threshold segmentation (ie RMSE, MFCC, RMSEDerivative)
Bird_ID (str) – String containing a unique identifier for subject bird.
song_folder_path (str) – Path to the folder containing all .wav files to be segmented for the subject bird.
truth_table_path (str) – Path to the .csv file generated with evsonganaly which contains the ground truth syllable segmentations.
threshold_range (tuple of floats) – Specifies the range of thresholds to test.
threshold_step (float) – The size of the step between consecutive thresholds to be tested.
lower_threshold (float) – Lower segmentation threshold for determining syllable offsets (fixed).

Returns

optimal_threshold (float) – Value of the threshold which results in the best F1 score.
peak_F1 (float) – F1 score of segmentation using optimal threshold.
segmentation_scores (pandas DataFrame) – DataFrame with columns F1, precision, recall, upper_threshold, and lower_threshold which contain the metrics of segmentation at every segmentation threshold tested. This can be useful for plotting the relationship between the threshold value and metrics.

Notes

Selecting a very wide threshold_range and small threshold_step can make this quite slow to run. It is recommended to test a wide threshold_range with large threshold_step and plot F1 across threshold values using segmentation_scores initially, then once you have a better idea of the ballpark of the peak threshold try using a smaller threshold_range with a finer threshold_step.
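
The coarse-then-fine strategy recommended in these notes can be sketched as a two-pass grid search. Everything here is illustrative: best_threshold is a hypothetical helper, toy_f1 is a made-up stand-in for segmenting and scoring at one threshold, and treating both ends of threshold_range as included is an assumption.

```python
def best_threshold(score_fn, threshold_range, threshold_step):
    """Evaluate score_fn on a grid and return (best threshold, best score, all scores)."""
    start, stop = threshold_range
    n_steps = int(round((stop - start) / threshold_step))
    thresholds = [start + k * threshold_step for k in range(n_steps + 1)]
    scores = [(t, score_fn(t)) for t in thresholds]
    best_t, best_score = max(scores, key=lambda pair: pair[1])
    return best_t, best_score, scores


def toy_f1(threshold):
    """Hypothetical stand-in for segmenting and scoring at one threshold."""
    return 1.0 - abs(threshold - 0.23)


# Coarse pass over a wide range, then a fine pass around the coarse optimum.
coarse_t, _, _ = best_threshold(toy_f1, (0.0, 1.0), 0.1)
fine_t, fine_f1, _ = best_threshold(toy_f1, (coarse_t - 0.1, coarse_t + 0.1), 0.01)
```

The two passes evaluate 11 + 21 thresholds, whereas a single pass over 0 to 1 in steps of 0.01 would evaluate 101.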

threshold_optimization_many_birds(Bird_IDs, folder_path, threshold_range, threshold_step, lower_threshold, truth_table_suffix='_syll_table.csv')¶

Finds the optimal segmentation threshold across multiple birds at once.

Parameters

segmenter (avn.segmentation.Segmenter child class type) – This determines the segmentation criteria used for threshold segmentation.
Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.
folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.
threshold_range (tuple of floats) – Specifies the range of thresholds to test.
threshold_step (float) – The size of the step between consecutive thresholds to be tested.
lower_threshold (float) – Lower segmentation threshold for determining syllable offsets (fixed).
truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within the Bird_ID subfolder of folder_path, and that the file name begin with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

Returns

optimal_threshold (float) – Value of the threshold which results in the best mean F1 score across all birds.
peak_mean_F1 (float) – Mean F1 score of segmentation using optimal threshold across all birds.
segmentation_scores (pandas DataFrame) – DataFrame with columns F1, precision, recall, upper_threshold, lower_threshold, and Bird_ID which contain the metrics of segmentation at every segmentation threshold tested for every bird. This can be useful for plotting the relationship between the threshold value and metrics.

Notes

Selecting a very wide threshold_range and small threshold_step can make this quite slow to run. It is recommended to test a wide threshold_range with large threshold_step and plot F1 across threshold values using segmentation_scores initially, then once you have a better idea of the ballpark of the peak threshold try using a smaller threshold_range with a finer threshold_step.
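
Choosing the threshold by mean F1 across birds can give a different answer than optimizing each bird separately. The sketch below shows the aggregation on made-up scores; a list of dicts stands in for the segmentation_scores DataFrame, and optimal_mean_threshold is a hypothetical helper.

```python
def optimal_mean_threshold(scores):
    """Pick the threshold with the highest mean F1 across birds.

    `scores` is a list of dicts with keys Bird_ID, upper_threshold and F1,
    standing in for rows of a segmentation_scores table.
    """
    by_threshold = {}
    for row in scores:
        by_threshold.setdefault(row["upper_threshold"], []).append(row["F1"])
    means = {t: sum(vals) / len(vals) for t, vals in by_threshold.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Hypothetical scores for two birds at two thresholds.
scores = [
    {"Bird_ID": "B123", "upper_threshold": 0.1, "F1": 0.80},
    {"Bird_ID": "B123", "upper_threshold": 0.2, "F1": 0.90},
    {"Bird_ID": "B456", "upper_threshold": 0.1, "F1": 0.70},
    {"Bird_ID": "B456", "upper_threshold": 0.2, "F1": 0.50},
]
optimal_threshold, peak_mean_F1 = optimal_mean_threshold(scores)
```

In this toy data bird B123 scores best at 0.2, but the mean across both birds peaks at 0.1.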