avn package

avn.dataloading module

Created on Wed Apr 28 09:05:15 2021

@author: Therese

class avn.dataloading.SongFile(file_path)

Bases: object

Data and metadata pertaining to a single audio file.

data

Contains audio data of wavfile.

Type

ndarray

sample_rate

Sample rate of song data. Based on native sample rate of wavfile.

Type

int

duration

Duration of the audio file in seconds.

Type

float

file_path

Path to the local .wav file used to instantiate the SongFile object.

Type

str
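
As a rough illustration of how the attributes above relate to one another, the sketch below reads a 16-bit mono .wav file with Python's standard wave module and derives duration from the number of samples and sample_rate. It does not use avn. The helper names load_wav_summary and write_silent_wav are hypothetical, and avn may load audio differently.

```python
import struct
import tempfile
import wave
from pathlib import Path


def load_wav_summary(file_path):
    """Return (data, sample_rate, duration) for a 16-bit mono .wav file."""
    with wave.open(str(file_path), "rb") as wav:
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    # Unpack little-endian signed 16-bit samples into a plain list.
    data = list(struct.unpack("<" + "h" * n_frames, raw))
    duration = n_frames / sample_rate  # seconds
    return data, sample_rate, duration


def write_silent_wav(file_path, sample_rate=8000, n_frames=4000):
    """Write a short silent mono file so the example is self-contained."""
    with wave.open(str(file_path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<" + "h" * n_frames, *([0] * n_frames)))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.wav"
    write_silent_wav(path)
    data, sample_rate, duration = load_wav_summary(path)
    print(len(data), sample_rate, duration)
```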

bandpass_filter(lower_cutoff, upper_cutoff)

Applies a Hamming window bandpass filter to the audio data.

Parameters
  • lower_cutoff (int) – Lower cutoff frequency in Hz for the filter.

  • upper_cutoff (int) – Upper cutoff frequency in Hz for the filter.

Returns

Return type

None.
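
The source does not show how the filter is built. As a sketch of what a Hamming window bandpass filter generally means, the code below designs FIR taps with the windowed-sinc method in pure Python and checks the gain inside and outside the pass band. The function names, the num_taps parameter and the example cutoffs are assumptions made for illustration only; a library routine such as scipy.signal.firwin offers the same kind of design.

```python
import cmath
import math


def hamming_bandpass_taps(lower_cutoff, upper_cutoff, sample_rate, num_taps=101):
    """Design linear-phase FIR bandpass taps by the windowed-sinc method."""
    f1 = lower_cutoff / sample_rate  # cutoffs in cycles per sample
    f2 = upper_cutoff / sample_rate
    centre = (num_taps - 1) / 2

    def lowpass(fc, m):
        # Ideal low-pass impulse response with cutoff fc (cycles per sample).
        if m == 0:
            return 2 * fc
        return math.sin(2 * math.pi * fc * m) / (math.pi * m)

    taps = []
    for n in range(num_taps):
        m = n - centre
        ideal = lowpass(f2, m) - lowpass(f1, m)  # bandpass = difference of low-passes
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    return taps


def gain_at(taps, freq, sample_rate):
    """Magnitude of the filter's frequency response at freq (Hz)."""
    omega = 2 * math.pi * freq / sample_rate
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))


taps = hamming_bandpass_taps(1000, 3000, sample_rate=8000)
print(round(gain_at(taps, 2000, 8000), 2), round(gain_at(taps, 0, 8000), 2))
```

Filtering a signal then amounts to convolving the audio samples with taps.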

class avn.dataloading.Utils

Bases: object

Contains data loading utilities.

add_ev_song_truth_table(file_path)

Loads a ‘ground truth’ segmentation file generated in evsonganaly, and adds it as a .true_seg_table attribute to the provided seg_data object.

Parameters
  • seg_data (avn.segmentation.SegData object) – SegData object containing segmentations of files corresponding to the evsonganaly ground truth segmentation in the file indicated by file_path

  • file_path (str) – String containing the full file path to a ‘ground truth’ segmentation .csv file generated with evsonganaly.

Returns

seg_data – Same seg_data object as passed as input, but with the added .true_seg_table attribute, containing segmentation information from the file indicated by file_path

Return type

avn.segmentation.SegData object

clean_seg_table()

Reformats syllable data frames imported from evsonganaly, so that they have the same format as avn generated seg_tables.

Parameters

syll_table (pandas DataFrame) – Dataframe imported from a csv containing syllable segmentation and labeling generated in evsonganaly in MATLAB.

Returns

syll_table – seg_table style dataframe, with syllable onset and offset times, labels and file names.

Return type

pandas DataFrame

select_syll(onset, offset, padding=0)

Returns the portion of the song wave file between the onset and offset timestamps (in seconds), plus optional padding on either side.

Parameters
  • song (avn.SongFile instance) – Instance of an avn.SongFile object with .data, .sample_rate and .duration attributes.

  • onset (float, < offset) – Time in seconds to start selection in song.

  • offset (float, > onset) – Time in seconds to end selection in song.

  • padding (float, optional) – Time in seconds to pad before onset and after offset times when selecting subsection of song.

Returns

  • syll_data (numpy array, 1D) – One dimensional numpy array containing wave data corresponding to the period from (onset - padding) to (offset + padding) seconds in song.

  • onset_correction_diff (float) – If (onset - padding) results in a timestamp < 0, the selection will start at 0. This value gives the difference between (onset - padding) and the true onset used in cases of 0 crossing. This value is important for plotting the selected syllable appropriately.

  • offset_correction_diff (float) – If (offset + padding) results in a timestamp longer than song.duration, the selection will end at song.duration. This value gives the difference between (offset + padding) and the true offset used in cases where the padded offset is longer than the source file. This value is important for plotting the selected syllable appropriately.
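
The clipping behaviour described above can be sketched in pure Python as follows. This is not avn's code: select_window is a hypothetical helper, the audio is a plain list, and the sign convention of the two correction values (both non-negative) is an assumption, since the source only says they give the difference between the padded and the actual boundary.

```python
def select_window(data, sample_rate, onset, offset, padding=0.0):
    """Slice data between onset - padding and offset + padding (seconds),
    clipping the window to the limits of the recording."""
    duration = len(data) / sample_rate
    requested_start = onset - padding
    requested_end = offset + padding

    # Clip to [0, duration] so the slice never leaves the recording.
    start = max(requested_start, 0.0)
    end = min(requested_end, duration)

    # How far each boundary had to be moved; 0.0 when no clipping occurred.
    onset_correction_diff = start - requested_start
    offset_correction_diff = requested_end - end

    first = int(round(start * sample_rate))
    last = int(round(end * sample_rate))
    return data[first:last], onset_correction_diff, offset_correction_diff


data = list(range(1000))  # 10 seconds of fake audio at 100 Hz
syll, onset_diff, offset_diff = select_window(data, 100, 0.1, 0.5, padding=0.25)
print(len(syll), round(onset_diff, 2), round(offset_diff, 2))
```

A plotting routine can use the two correction values to place the true onset and offset correctly on the time axis of the clipped selection.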

avn.plotting module

Created on Tue May 4 13:39:10 2021

@author: Therese

class avn.plotting.Utils

Bases: object

Contains plotting utilities.

plot_syll_examples(syll_label, song_folder_path, n_examples=1, random_seed=2021, padding=0.25, figsize=(5, 5))

Plots n_examples examples of syllables with label syll_label from syll_df.

Parameters
  • syll_df (Pandas DataFrame) – pandas dataframe containing one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, files which contains the name of the .wav file in which the syllable is found, and labels which contains a categorical label for the syllable type. These can be generated through manual song annotation, or automated labeling methods.

  • syll_label (int, float or string) – Syllable label class in syll_df.labels to plot.

  • song_folder_path (string) – Path to folder containing a subfolder called Bird_ID, which contains .wav files of songs in syll_df. Should end with ‘/’.

  • n_examples (int, optional) – The number of random examples of syllable syll_label to plot. The default value is 1.

  • random_seed (int, optional) – Specifies the random state for selecting example syllables. The default value is 2021.

  • padding (float, optional) – The amount of time in seconds before and after syllable onset and offset which should be included in the spectrogram plot. The default value is 0.25.

  • figsize (tuple, optional) – Tuple specifying the dimensions of the figure(s) to be plotted. The default value is (5,5)
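
To illustrate what n_examples and random_seed control, the sketch below picks reproducible random examples of one label from a list of syllable records. It is an assumption about the general approach, not avn's implementation: pick_examples is a hypothetical helper, and plain dictionaries stand in for the rows of syll_df.

```python
import random


def pick_examples(rows, syll_label, n_examples=1, random_seed=2021):
    """Return up to n_examples randomly chosen rows whose label is syll_label."""
    candidates = [row for row in rows if row["labels"] == syll_label]
    rng = random.Random(random_seed)  # fixed seed makes the choice repeatable
    n = min(n_examples, len(candidates))
    return rng.sample(candidates, n)


rows = [
    {"files": "song_1.wav", "onsets": 0.10, "offsets": 0.18, "labels": "a"},
    {"files": "song_1.wav", "onsets": 0.25, "offsets": 0.40, "labels": "b"},
    {"files": "song_2.wav", "onsets": 0.05, "offsets": 0.12, "labels": "a"},
    {"files": "song_2.wav", "onsets": 0.30, "offsets": 0.39, "labels": "a"},
]
examples = pick_examples(rows, "a", n_examples=2)
print([(row["files"], row["onsets"]) for row in examples])
```

Calling the function again with the same random_seed returns the same examples, which keeps figures reproducible between runs.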

avn.plotting.make_spectrogram(song)

Generates spectrogram information for plotting

Parameters

song (avn.dataloading.SongFile object) – SongFile object corresponding to the file to be plotted

Returns

spectrogram_db – Array containing spectrogram data for plotting.

Return type

numpy ndarray, 2D

avn.plotting.plot_spectrogram(spectrogram, sample_rate, ax=None, figsize=(20, 5))

Plots a spectrogram of a song.

Parameters
  • spectrogram (numpy ndarray, 2D) – Array containing spectrogram data.

  • sample_rate (int) – Sample rate of audio. Necessary to determine time along the x-axis.

  • ax (matplotlib.axes._subplots.AxesSubplot object, optional) – Axis object must be specified if you want to plot the spectrogram as a subplot within a matplotlib.pyplot figure with other subplots as well. If plotting a spectrogram alone, ax doesn’t need to be specified.

  • figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).

Returns

Return type

None.

avn.plotting.plot_spectrogram_with_labels(syll_df, song_folder_path, Bird_ID, song_file=None, song_file_index=None, figsize=(80, 10), cmap='tab20', add_legend=True, fontsize=24)

Plots the spectrogram of a specified file, with syllable labels indicated by colored bars overlaid on the spectrogram.

Parameters
  • syll_df (Pandas DataFrame) – pandas dataframe containing one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, files which contains the name of the .wav file in which the syllable is found, and labels which contains a categorical label for the syllable type. These can be generated through manual song annotation, or automated labeling methods.

  • song_folder_path (string) – Path to folder containing a subfolder called Bird_ID, which contains .wav files of songs in syll_df. Should end with ‘/’.

  • Bird_ID (string) – String containing a unique identifier for the subject bird.

  • song_file (string, optional) – A value must be provided for song_file OR song_file_index, but not both. String containing the name of a .wav file in syll_df and song_folder_path/Bird_ID/ to be plotted.

  • song_file_index (int >= 0, optional) – A value must be provided for song_file or song_file_index, but not both. Denotes the index of the unique file in syll_df.files.unique() to be plotted.

  • figsize (tuple, optional) – Dimensions of figure to be plotted. The default is (80, 10).

  • cmap (matplotlib colormap, optional) – matplotlib color map for syllable labels. The colormap must contain more unique shades than syllable label types. The default value is ‘tab20’.

  • add_legend (boolean, optional) – If True, a legend mapping syllable labels to colors will be plotted over the spectrogram. If False, no legend will be plotted. The default is True.

  • fontsize (float, optional) – The size of the font for the legend. The default is 24.

Returns

Return type

None
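
The rule that exactly one of song_file and song_file_index must be given can be sketched as below. resolve_song_file is a hypothetical helper written for this illustration; it mimics the order-of-first-appearance behaviour of syll_df.files.unique() with a plain list, and the error messages are assumptions.

```python
def resolve_song_file(files, song_file=None, song_file_index=None):
    """Return the file name to plot, given exactly one of the two selectors."""
    if (song_file is None) == (song_file_index is None):
        raise ValueError("Provide exactly one of song_file or song_file_index.")

    # Unique file names in order of first appearance.
    unique_files = list(dict.fromkeys(files))

    if song_file_index is not None:
        return unique_files[song_file_index]
    if song_file not in unique_files:
        raise ValueError(f"{song_file!r} does not appear in the files column.")
    return song_file


files = ["song_1.wav", "song_1.wav", "song_2.wav"]
print(resolve_song_file(files, song_file_index=1))
```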

avn.plotting.plot_syll(song, onset, offset, padding=0, figsize=(5, 5), title=None)

Plots the spectrogram of a portion of a song (generally a single syllable).

Parameters
  • song (avn.dataloading.SongFile type object) – Instance of an avn.SongFile object with .data, .sample_rate and .duration attributes.

  • onset (float) – Start time in seconds of syllable to plot from song.

  • offset (float) – End time in seconds of syllable to plot from song.

  • padding (float, optional) – Time in seconds to pad before and after onset and offset times for plotting. The default value is 0.

  • figsize (tuple, optional) – Dimensions of the figure to plot. The default is (5,5)

  • title (string, optional) – Title of the figure to plot. The default is None.

Returns

Return type

None

avn.plotting.plot_syntax_raster(syntax_data, syntax_raster_df, figsize=(10, 10), title=None, palette='husl')

Plots a syntax_raster_df dataframe.

Parameters
  • syntax_data (avn.syntax.SyntaxData object) – An instance of avn.syntax.SyntaxData on which .make_syntax_raster() was called to generate syntax_raster_df.

  • syntax_raster_df (Pandas DataFrame) – Dataframe where each row reflects a song bout (a sequence of syllables flanked by file boundaries or long silent gaps), and each cell contains the label of the song syllable produced at that index in the song bout, based on syntax_data.syll_df.labels. This is returned by .make_syntax_raster() called on an instance of a syntax.SyntaxData object.

  • figsize (tuple, optional) – Tuple specifying dimensions of output figure. The default is (10, 10)

  • title (String, optional) – Title of the output figure. The default is None, which will result in a figure without a title.

  • palette (string or sequence, optional) – String corresponding to the name of a seaborn palette, matplotlib colormap or sequence of colors in any format matplotlib accepts. See seaborn.color_palette() documentation for more information. The default is ‘husl’.

Returns

Return type

None
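
To make the layout of syntax_raster_df concrete, the sketch below groups syllables into bouts and pads the rows to equal length. It only follows the description above (bouts are split at file boundaries or long silent gaps). The helper make_bout_rows, the max_gap value of 0.2 seconds and the use of None for empty cells are assumptions, not details taken from avn.

```python
def make_bout_rows(sylls, max_gap=0.2):
    """Group syllables (sorted by file, then onset) into bouts of labels.

    A new bout starts when the file changes or when the silence since the
    previous syllable's offset exceeds max_gap seconds.
    """
    bouts = []
    previous = None
    for syll in sylls:
        new_bout = (
            previous is None
            or syll["files"] != previous["files"]
            or syll["onsets"] - previous["offsets"] > max_gap
        )
        if new_bout:
            bouts.append([])
        bouts[-1].append(syll["labels"])
        previous = syll

    # Pad shorter bouts so every row has the same number of cells.
    width = max((len(bout) for bout in bouts), default=0)
    return [bout + [None] * (width - len(bout)) for bout in bouts]


sylls = [
    {"files": "a.wav", "onsets": 0.00, "offsets": 0.10, "labels": "i"},
    {"files": "a.wav", "onsets": 0.15, "offsets": 0.30, "labels": "a"},
    {"files": "a.wav", "onsets": 1.00, "offsets": 1.10, "labels": "i"},
    {"files": "b.wav", "onsets": 0.00, "offsets": 0.10, "labels": "b"},
]
for row in make_bout_rows(sylls):
    print(row)
```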

avn.segmentation module

Created on Wed May 5 08:29:00 2021

@author: Therese

class avn.segmentation.MFCC

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on threshold crossing of the first mel frequency cepstral coefficient (MFCC).

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True)

Calculates the first MFCC component at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters
  • song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file

  • hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate MFCC values. The default is 512.

  • n_fft (int > 0, optional) – The length of the FFT window used to calculate the MFCC values. The default is 2048.

  • bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the MFCC. If False, the MFCC will be calculated on the song as-is. The default is True.

  • lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 200.

  • upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 9000.

  • rescale (bool, optional) – If True, the MFCC will be min-max rescaled so that all values fall between 0 and 1. This is meant to ensure consistency across recordings. If False, the raw MFCC values will be returned. The default is True.

Returns

mfcc – 1 dimensional numpy array containing the MFCC values for the input wave.

Return type

numpy ndarray, 1D

class avn.segmentation.MFCCDerivative

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on threshold crossing of an estimate of the first derivative of the first mel frequency cepstral coefficient (MFCC).

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True, deriv_width=3)

Calculates the first derivative of the MFCC at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters
  • song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file

  • hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate MFCC values. The default is 512.

  • n_fft (int > 0, optional) – The length of the FFT window used to calculate the MFCC values. The default is 2048.

  • bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the MFCC. If False, the MFCC will be calculated on the song as-is. The default is True.

  • lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 200.

  • upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the MFCC. The default is 9000.

  • rescale (bool, optional) – If True, the MFCC will be min-max rescaled so that all values fall between 0 and 1 before the first derivative is taken. This is meant to ensure consistency across recordings. If False, the first derivative is calculated on the raw MFCC. The default is True.

  • deriv_width (int >=3, odd, optional) – Number of frames over which to compute the change in MFCC to estimate the first derivative. The default is 3.

Returns

mfcc_derivative – 1 dimensional numpy array containing an estimate of the first derivative of the MFCC of the input wave.

Return type

numpy ndarray, 1D
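
One simple way to estimate a first derivative over deriv_width frames is a centered difference, sketched below in pure Python. This is an illustrative assumption: the source does not state which estimator avn uses, and estimate_derivative is a hypothetical helper. The constraint that deriv_width be odd and at least 3 is the same one that librosa.feature.delta places on its width argument, which fits a smoothed estimate of this kind.

```python
def estimate_derivative(values, deriv_width=3):
    """Centered-difference estimate of the per-frame rate of change."""
    if deriv_width < 3 or deriv_width % 2 == 0:
        raise ValueError("deriv_width must be an odd integer >= 3")
    half = deriv_width // 2
    last = len(values) - 1
    result = []
    for i in range(len(values)):
        # Indices are clamped at the edges, so edge frames reuse the first
        # or last value instead of reading outside the sequence.
        ahead = values[min(i + half, last)]
        behind = values[max(i - half, 0)]
        result.append((ahead - behind) / (2 * half))
    return result


print(estimate_derivative([0, 1, 4, 9, 16]))  # → [0.5, 2.0, 4.0, 6.0, 3.5]
```

A wider deriv_width averages the change over more frames, which smooths the estimate at the cost of temporal precision.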

class avn.segmentation.Metrics

Bases: object

Contains functions for calculating segmentation accuracy metrics by comparing automatically generated syllable segmentations to ‘ground truth’ segmentations.

calc_F1(max_gap=0.05, feature='onsets')

Calculates the F1 score, precision and recall of syllable onsets or offsets in seg_data.seg_table relative to seg_data.true_seg_table.

Parameters
  • seg_data (avn.segmentation.SegData instance) – Instance of a SegData object which must have valid .seg_table and .true_seg_table attributes.

  • max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.

  • feature (['onsets', 'offsets']) – Specifies whether you want to calculate the F1 score of syllable onsets or offsets. The default is ‘onsets’

Returns

seg_data – This will be the same seg_data object as is passed as an argument, with an added .seg_metrics attribute which contains a DataFrame with columns F1, precision and recall and a single row with the value of each metric.

Return type

avn.segmentation.SegData instance
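
The metrics can be sketched in pure Python as follows. The matching rule is an assumption: each ground truth timestamp is paired with the nearest unused segmenter timestamp within max_gap seconds, and the source does not say whether avn matches in exactly this way. match_times and f1_precision_recall are hypothetical helpers.

```python
def match_times(true_times, seg_times, max_gap=0.05):
    """Greedy one-to-one matching of true and segmenter timestamps."""
    used = set()
    pairs = []
    for t in sorted(true_times):
        candidates = [
            (abs(s - t), i)
            for i, s in enumerate(seg_times)
            if i not in used and abs(s - t) <= max_gap
        ]
        if candidates:
            _, best = min(candidates)  # nearest unused segmenter timestamp
            used.add(best)
            pairs.append((t, seg_times[best]))
    return pairs


def f1_precision_recall(true_times, seg_times, max_gap=0.05):
    """Return (F1, precision, recall) for one set of timestamps."""
    true_positives = len(match_times(true_times, seg_times, max_gap))
    precision = true_positives / len(seg_times) if seg_times else 0.0
    recall = true_positives / len(true_times) if true_times else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall


true_onsets = [0.10, 0.50, 1.00]
seg_onsets = [0.12, 0.58, 1.01, 1.50]
f1, precision, recall = f1_precision_recall(true_onsets, seg_onsets)
print(round(f1, 3), round(precision, 3), round(recall, 3))
```

Here two of the four segmenter onsets fall within 0.05 seconds of a true onset, so precision is 2/4 and recall is 2/3.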

get_time_delta_df(max_gap=0.05, feature='onsets')

Creates a dataframe with the segmenter generated timestamps that align to ground truth timestamps for either syllable onsets or offsets within max_gap seconds.

Parameters
  • seg_data (avn.segmentation.SegData instance) – Instance of a SegData object which must have valid .seg_table and .true_seg_table attributes.

  • max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.

  • feature (['onsets', 'offsets']) – Specifies whether you want to get the matched timestamps of syllable onsets or offsets. The default is ‘onsets’

Returns

  • all_matched_times (pandas DataFrame) – DataFrame with columns ‘True_feat_times’ and ‘Seg_matched_times’, containing the timestamps in seconds of a ground truth syllable onset or offset, and the matched automatically segmented syllable onset or offset, respectively. This can be used to look at the distribution of time differences between true and generated segmentations.
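
As a sketch of how the matched timestamps might be used, the code below summarizes the distribution of time differences. Plain dictionaries keyed by the two column names stand in for the rows of the DataFrame, the example values are made up, and summarize_time_deltas is a hypothetical helper.

```python
import statistics


def summarize_time_deltas(rows):
    """Summarize segmenter-minus-truth time differences in seconds."""
    deltas = [row["Seg_matched_times"] - row["True_feat_times"] for row in rows]
    return {
        "n": len(deltas),
        "mean": statistics.mean(deltas),
        "median": statistics.median(deltas),
        "max_abs": max(abs(delta) for delta in deltas),
    }


matched = [
    {"True_feat_times": 0.10, "Seg_matched_times": 0.12},
    {"True_feat_times": 1.00, "Seg_matched_times": 0.99},
    {"True_feat_times": 2.00, "Seg_matched_times": 2.03},
]
summary = summarize_time_deltas(matched)
print({key: round(value, 3) for key, value in summary.items()})
```

A mean far from zero would suggest that the segmenter places boundaries systematically early or late.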

class avn.segmentation.Plot

Bases: object

Contains functions for plotting automatically generated syllable segmentations, segmentation criteria and/or ground truth syllable segmentations over spectrograms.

plot_seg_criteria(segmenter, label, file_idx=0, figsize=(20, 5))

Plots a given segmentation criterion (e.g. MFCC, RMSE, RMSE Derivative) over the spectrogram of a given song file.

Parameters
  • seg_data (avn.segmentation.SegData object) – SegData object with valid .seg_table and .song_folder_path attributes.

  • segmenter (avn.segmentation.Segmenter child class object) – The type of the segmenter (e.g. MFCC, RMSE, RMSEDerivative) determines which segmentation criterion will be plotted.

  • label (str) – Label of segmentation criteria to be displayed in plot legend.

  • file_idx (int >=0, <total number of files segmented in seg_data, optional) – The index of the single file within seg_data to be plotted. The default is 0.

  • figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).

Returns

Return type

None.

plot_segmentations(seg_label, plot_ground_truth=False, true_label='Ground Truth', file_idx=0, figsize=(20, 5), seg_attribute='onsets', plot_title='')

Plots the spectrogram of a given wave file with automatically generated syllable onsets or offsets plotted over top. Ground truth segmentations can be plotted in addition to automatically generated segmentations when available.

Parameters
  • seg_data (avn.segmentation.SegData object) – SegData object containing automatic syllable segmentations in its .seg_table attribute, and optionally also ground truth segmentations in its .true_seg_table attribute.

  • seg_label (str) – Label for automatic syllable segmentations to be displayed in legend.

  • plot_ground_truth (bool, optional) – If True, both the automatically generated and ground truth syllable segmentations will be plotted. If False, only automatically generated syllable segmentations will be plotted. The default is False.

  • true_label (str, optional) – Label for ground truth syllable segmentations to be displayed in legend. Only used if plot_ground_truth == True. The default is ‘Ground Truth’.

  • file_idx (int >=0, <total number of files segmented in seg_data, optional) – The index of the single file within seg_data to be plotted. The default is 0.

  • figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).

  • seg_attribute ({'onsets', 'offsets'}, optional) – Specifies whether syllable onset times or offset times should be displayed. The default is ‘onsets’.

  • plot_title (str, optional) – Title of the generated plot. The default is “”.

Returns

Return type

None.

class avn.segmentation.RMSE

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on root mean square energy (RMSE) threshold crossing.

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True)

Calculates the RMSE at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters
  • song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file

  • hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate RMSE values. The default is 512.

  • n_fft (int > 0, optional) – The length of the FFT window used to calculate the RMSE values. The default is 2048.

  • bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the RMSE. If False, the RMSE will be calculated on the song as-is. The default is True.

  • lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 200.

  • upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 9000.

  • rescale (bool, optional) – If True, the RMSE will be min-max rescaled so that all values fall between 0 and 1. This is meant to ensure consistency across recordings. If False, the raw RMSE values will be returned. The default is True.

Returns

rmse – 1 dimensional numpy array containing the RMSE values for the input wave.

Return type

numpy ndarray, 1D
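
Frame-wise root mean square energy can be sketched in pure Python as below. This is an illustration of the quantity, not avn's implementation: frame_rms is a hypothetical helper, frame_length plays the role of n_fft, frames are not centered or padded, and the optional bandpass filtering and rescaling steps are left out.

```python
import math


def frame_rms(samples, frame_length=2048, hop_length=512):
    """Root mean square energy of each full frame of samples."""
    values = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return values


# Four silent samples followed by four samples of amplitude 3.
print(frame_rms([0, 0, 0, 0, 3, 3, 3, 3], frame_length=4, hop_length=4))  # → [0.0, 3.0]
```

Each value describes one frame, so frame k starts at k * hop_length / sample_rate seconds into the recording.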

class avn.segmentation.RMSEDerivative

Bases: avn.segmentation.Segmenter

Child class of avn.segmentation.Segmenter(), which segments syllables based on threshold crossing of the first derivative of the root mean square energy (RMSE).

get_seg_criteria(song, hop_length=512, n_fft=2048, bandpass=True, lower_cutoff=200, upper_cutoff=9000, rescale=True, deriv_width=3)

Calculates the first derivative of the RMSE at every frame of a song file for later use in threshold crossing-based segmentation.

Parameters
  • song (avn.dataloading.SongFile class instance) – Contains audio data for a single song file

  • hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate RMSE values. The default is 512.

  • n_fft (int > 0, optional) – The length of the FFT window used to calculate the RMSE values. The default is 2048.

  • bandpass (bool, optional) – If True, the song will be bandpass filtered before calculating the RMSE. If False, the RMSE will be calculated on the song as-is. The default is True.

  • lower_cutoff (float >=0, optional) – The lower frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 200.

  • upper_cutoff (float > lower_cutoff, optional) – The upper frequency limit in Hz used to bandpass filter the input wave before calculating the RMSE. The default is 9000.

  • rescale (bool, optional) – If True, the RMSE will be min-max rescaled so that all values fall between 0 and 1 before the first derivative is taken. This is meant to ensure consistency across recordings. If False, the first derivative is calculated on the raw RMSE. The default is True.

  • deriv_width (int >=3, odd, optional) – Number of frames over which to compute the change in RMSE to estimate the first derivative. The default is 3.

Returns

rmse_derivative – 1 dimensional numpy array containing an estimate of the first derivative of the RMSE of the input wave.

Return type

numpy ndarray, 1D

class avn.segmentation.SegData(Bird_ID, seg_table)

Bases: object

Syllable segmentation data for many files from a single bird.

Bird_ID

String containing a unique identifier for subject bird.

Type

str

seg_table

Dataframe with columns onsets, offsets and files, which contains the onset and offset times in seconds of every syllable, and the syllable’s corresponding file. This is generated by an avn.segmentation.Segmenter class object with the function make_segmentation_table.

Type

pandas DataFrame

true_seg_table

Dataframe with columns files, labels, onsets and offsets, which contains ground truth segmentation information for calculation of automatic segmentation metrics. labels contains only {‘n’, ‘s’}, to indicate whether a row reflects a true song syllable (‘s’) or cage noise (‘n’). This can be imported from evsonganaly with the avn.dataloading.Utils.add_ev_song_truth_table() function.

Type

pandas DataFrame

seg_metrics

Dataframe with columns F1, precision and recall containing a single row with each metric calculated by comparing segmentations in SegData.seg_table to SegData.true_seg_table. This is generated with the function segmentation.Metrics.calc_F1().

Type

pandas DataFrame

save_as_csv(out_folder_path)

Saves the contents of SegData.seg_table and SegData.seg_metrics as csv files in the out_folder_path directory.

Parameters

out_folder_path (str) – Path to local directory in which to save csv files.

Returns

Return type

None.

Notes

The SegData.seg_table file will be called “[Bird_ID]_seg_table.csv”, and the SegData.seg_metrics file will be called “[Bird_ID]_seg_metrics.csv”.

If either the .seg_table or .seg_metrics attributes do not exist, the corresponding file will not be created.
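
The naming scheme in the notes above can be sketched as follows. planned_csv_paths is a hypothetical helper that only builds the expected file names without writing anything, and the bird identifier "B123" is made up for the example.

```python
from pathlib import PurePosixPath


def planned_csv_paths(bird_id, out_folder_path, has_seg_table=True, has_seg_metrics=True):
    """List the csv files expected for a bird, skipping missing attributes."""
    folder = PurePosixPath(out_folder_path)
    paths = []
    if has_seg_table:
        paths.append(str(folder / f"{bird_id}_seg_table.csv"))
    if has_seg_metrics:
        paths.append(str(folder / f"{bird_id}_seg_metrics.csv"))
    return paths


print(planned_csv_paths("B123", "output"))
print(planned_csv_paths("B123", "output", has_seg_metrics=False))
```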

class avn.segmentation.Segmenter

Bases: object

Parent class for automated syllable segmentation

make_segmentation_table(Bird_ID, song_folder_path, upper_threshold, lower_threshold)

Generates a SegData object with syllable segmentation information for all .wav files in song_folder_path

get_seg_criteria(song)

For threshold based segmentation, this calculates the song feature on which to apply threshold segmentation (e.g. RMSE, MFCC, MFCC Derivative).

rescale(seg_criteria)

Applies 0 to 1 min-max rescaling to a vector.

get_threshold(seg_criteria, thresh)

Generates a flat threshold vector for comparison to seg_criteria for threshold based segmentation.

get_syll_onsets_offsets(seg_criteria, upper_thresh, lower_thresh, total_file_duration)

Returns onsets and offset timestamps of all syllables in a file based on seg_criteria threshold crossings.

Notes

An instance of this parent class cannot be used to generate syllable segmentations, as the get_seg_criteria() function is not implemented. Instead, please use one of the child classes, each of which uses a different segmentation criterion (e.g. RMSE, RMSEDerivative, MFCC, MFCCDerivative).

get_seg_criteria(song)

make_segmentation_table(Bird_ID, song_folder_path, upper_threshold, lower_threshold, max_syll_duration=0.33, hop_length=512, n_fft=2048)

Parameters
  • Bird_ID (str) – String containing a unique identifier for subject bird.

  • song_folder_path (str) – Path to a local directory containing all .wav files to be segmented.

  • upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.

  • lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets when onset to onset segmentation results in a syllable longer than max_syll_duration.

  • max_syll_duration (float > 0, optional) – Maximum allowable syllable duration in seconds. The default is 0.33. If the gap between consecutive syllable onsets is longer than this value, the offset will be determined by lower threshold crossing. If the lower threshold crossing still results in a syllable longer than this value, the syllable offset will be set to the onset timestamp + max_syll_duration.

  • hop_length (int > 0, optional) – The number of samples between successive frames used in the short term fourier transform to generate segmentation criteria values. The default is 512.

  • n_fft (int > 0, optional) – The length of the FFT window used to calculate the segmentation criteria values. The default is 2048.

Returns

segmentation_data – SegData object with attributes .Bird_ID and .seg_table, where .seg_table is a pandas Dataframe with columns onsets, offsets and files, which contains the onset and offset times in seconds of every syllable, and the syllable’s corresponding file.

Return type

avn.segmentation.SegData object
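
The interplay of upper_threshold, lower_threshold and max_syll_duration can be sketched as below for a single file. This is one reading of the parameter descriptions above, not avn's code. The helper names and the frame_period argument (seconds per frame, i.e. hop_length divided by the sample rate) are assumptions, and the example passes its own max_syll_duration so that every timestamp is an exact binary fraction.

```python
def upward_crossings(values, threshold):
    """Frame indices where values rise from below threshold to at or above it."""
    return [i for i in range(1, len(values)) if values[i - 1] < threshold <= values[i]]


def downward_crossings(values, threshold):
    """Frame indices where values fall from at or above threshold to below it."""
    return [i for i in range(1, len(values)) if values[i - 1] >= threshold > values[i]]


def segment(values, frame_period, upper, lower, max_syll_duration=0.33):
    """Return (onset, offset) pairs in seconds for one file."""
    onsets = [i * frame_period for i in upward_crossings(values, upper)]
    lower_offsets = [i * frame_period for i in downward_crossings(values, lower)]
    file_end = len(values) * frame_period

    rows = []
    for k, onset in enumerate(onsets):
        # Step 1: onset to onset segmentation (or to the end of the file).
        offset = onsets[k + 1] if k + 1 < len(onsets) else file_end
        # Step 2: if too long, fall back on the next lower threshold crossing.
        if offset - onset > max_syll_duration:
            later = [t for t in lower_offsets if t > onset]
            if later:
                offset = min(offset, later[0])
        # Step 3: if still too long, cap the syllable at the maximum duration.
        if offset - onset > max_syll_duration:
            offset = onset + max_syll_duration
        rows.append((onset, offset))
    return rows


criteria = [0.0, 0.9, 0.9, 0.1] + [0.0] * 4 + [0.9] * 8
print(segment(criteria, frame_period=0.25, upper=0.5, lower=0.2, max_syll_duration=1.0))
# → [(0.25, 0.75), (2.0, 3.0)]
```

In the example, the first syllable ends at a lower threshold crossing, while the second never drops below the lower threshold and is therefore capped at onset + max_syll_duration.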

rescale(seg_criteria)

Applies 0 to 1 min-max rescaling to a vector.

Parameters

seg_criteria (numpy ndarray, 1D) – 1 dimensional numpy array to be rescaled

Returns

seg_criteria – 1 dimensional numpy array rescaled between 0 and 1.

Return type

numpy ndarray, 1D
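
Min-max rescaling maps the smallest value to 0 and the largest to 1. A minimal pure Python sketch is given below. min_max_rescale is a hypothetical stand-in for the method, and returning zeros for a constant input is an assumption, since the source does not describe that case.

```python
def min_max_rescale(values):
    """Linearly rescale values so that they span the range 0 to 1."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # A constant input has no range to stretch; return zeros (assumption).
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]


print(min_max_rescale([2, 4, 6]))  # → [0.0, 0.5, 1.0]
```

Because every recording is mapped to the same range, a single pair of thresholds can be applied across recordings made at different volumes.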

class avn.segmentation.Utils

Bases: object

Contains syllable segmentation utilities

calc_F1_many_birds(Bird_IDs, folder_path, upper_threshold, lower_threshold, truth_table_suffix='_syll_table.csv', max_gap=0.05, feature='onsets')

Calculate the segmentation metrics for all birds in Bird_IDs with a given method and threshold.

Parameters
  • segmenter (avn.segmentation.Segmenter daughter class object.) – Determines the segmentation method.

  • Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.

  • folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.

  • upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.

  • lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.

  • truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within folder_path/Bird_ID/ whose name begins with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

  • max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.

  • feature (['onsets', 'offsets']) – Specifies whether you want to calculate the F1 score of syllable onsets or offsets. The default is ‘onsets’

Returns

  • segmentation_scores (pandas DataFrame) – DataFrame with columns F1, precision, recall, upper_threshold, lower_threshold, and Bird_ID, which contains the segmentation metrics for each bird.

  • segmentations_df (pandas DataFrame) – DataFrame with columns onsets, offsets, files, and Bird_ID which contains the onset and offset timestamps of every segmented syllable in each file for each bird.

get_time_deltas_many_birds(Bird_IDs, folder_path, upper_threshold, lower_threshold, max_gap=0.05, feature='onsets', truth_table_suffix='_syll_table.csv')

Creates a dataframe with the segmenter generated timestamps that align to ground truth timestamps for either syllable onsets or offsets within max_gap seconds for all birds in Bird_IDs.

Parameters
  • segmenter (avn.segmentation.Segmenter daughter class object.) – Determines the segmentation method.

  • Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.

  • folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.

  • upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.

  • lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.

  • max_gap (float, optional) – The maximum allowable gap in seconds between a syllable onset or offset in seg_data.seg_table and in seg_data.true_seg_table that will be considered a match. The default is 0.05.

  • feature (['onsets', 'offsets']) – Specifies whether you want to get the matched timestamps of syllable onsets or offsets. The default is ‘onsets’

  • truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within folder_path/Bird_ID/ whose name begins with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

Returns

all_time_deltas_df – DataFrame with columns ‘True_feat_times’ and ‘Seg_matched_times’, containing the timestamps in seconds of a ground truth syllable onset or offset, and the matched automatically segmented syllable onset or offset, respectively. This can be used to look at the distribution of time differences between true and generated segmentations.

Return type

Pandas DataFrame
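
The matching described above can be illustrated with a minimal sketch. This is not avn's implementation: match_timestamps is a hypothetical helper that pairs each ground truth time with the nearest segmenter time and keeps the pair only if the two lie within max_gap seconds. It does not enforce one-to-one matching, which the package may handle differently.

```python
import numpy as np
import pandas as pd

def match_timestamps(true_times, seg_times, max_gap=0.05):
    """Pair each ground truth time with the nearest segmenter time within max_gap.

    Hypothetical helper for illustration only; not part of avn.
    """
    true_times = np.asarray(true_times, dtype=float)
    seg_times = np.asarray(seg_times, dtype=float)
    rows = []
    for t in true_times:
        if seg_times.size == 0:
            break
        # Nearest automatically segmented timestamp to this true timestamp
        nearest = seg_times[np.argmin(np.abs(seg_times - t))]
        if abs(nearest - t) <= max_gap:
            rows.append({"True_feat_times": t, "Seg_matched_times": nearest})
    return pd.DataFrame(rows, columns=["True_feat_times", "Seg_matched_times"])

matched = match_timestamps([0.10, 0.50, 1.20], [0.12, 0.58, 1.19])
# Distribution of differences between matched and true timestamps
deltas = matched["Seg_matched_times"] - matched["True_feat_times"]
```

In this toy input the second true onset has no segmenter onset within 0.05 s, so it is left out of the result.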

make_segmentation_table_many_birds(Bird_IDs, folder_path, upper_threshold, lower_threshold, save_to_csv=False, out_file_dir=None)

Generates syllable segmentations for many files across many birds.

Parameters
  • segmenter (avn.segmentation.Segmenter child class type) – This determines the segmentation criteria used for threshold segmentation.

  • Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.

  • folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.

  • upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.

  • lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.

  • save_to_csv (bool, optional) – If True, segmentation table and metrics table .csv files will be saved for each bird in the out_file_dir directory. These will have the Bird_ID in the file name. The default is False.

  • out_file_dir (str, optional) – Path to a local directory in which to save segmentation table and metrics tables for each bird. This will only be used if save_to_csv == True. The default is None.

Returns

segmentations_df – DataFrame with columns onsets, offsets, files and Bird_ID which contains syllable onset and offset timestamps in seconds for every automatically segmented syllable in every file for every bird.

Return type

pandas DataFrame
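
As a rough illustration of how an upper and a lower threshold can work together, the sketch below marks an onset when a segmentation criterion rises above upper_threshold and an offset when it next falls below lower_threshold. The function name, the toy criterion values, and the handling of an unfinished final syllable are all assumptions made for this example; the package's own segmenters may differ.

```python
def two_threshold_segment(criterion, times, upper_threshold, lower_threshold):
    """Return (onsets, offsets) using separate onset and offset thresholds.

    Hypothetical sketch for illustration; not avn's implementation.
    """
    onsets, offsets = [], []
    in_syllable = False
    last_time = None
    for value, t in zip(criterion, times):
        last_time = t
        if not in_syllable and value > upper_threshold:
            # Criterion crossed above the upper threshold: syllable onset
            onsets.append(t)
            in_syllable = True
        elif in_syllable and value < lower_threshold:
            # Criterion dropped below the lower threshold: syllable offset
            offsets.append(t)
            in_syllable = False
    if in_syllable:
        # Assumption: close an unfinished syllable at the last time point
        offsets.append(last_time)
    return onsets, offsets

criterion = [0.0, 0.2, 0.9, 0.6, 0.3, 0.05, 0.0, 0.8, 0.1, 0.0]
frames = list(range(len(criterion)))  # frame indices stand in for timestamps
onsets, offsets = two_threshold_segment(criterion, frames, 0.5, 0.1)
```

Because the offset threshold is lower than the onset threshold, brief dips in the criterion within a syllable do not split it in two.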

plot_segmentations_many_birds(Bird_IDs, folder_path, seg_label, upper_threshold, lower_threshold, plot_ground_truth=False, files_per_bird=3, random_seed=2021, true_label='Ground Truth', figsize=(20, 5), seg_attribute='onsets', truth_table_suffix='_syll_table.csv')

Plots files_per_bird number of random example song spectrograms with automatically generated segmentations (and optionally ground truth segmentations) overlaid for each bird in Bird_IDs.

Parameters
  • segmenter (avn.segmentation.Segmenter child class type) – This determines the segmentation criteria used for threshold segmentation.

  • Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.

  • folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.

  • seg_label (str) – Label for automatic syllable segmentations to be displayed in legend.

  • upper_threshold (float > lower_threshold) – Value of the upper segmentation criteria threshold for detecting syllable onsets.

  • lower_threshold (float < upper_threshold) – Value of the lower segmentation criteria threshold used for detecting syllable offsets.

  • plot_ground_truth (bool, optional) – If True, both the automatically generated and ground truth syllable segmentations will be plotted. If False, only automatically generated syllable segmentations will be plotted. The default is False.

  • files_per_bird (int>=1, optional) – The number of files to plot from each bird. The default is 3.

  • random_seed (optional) – Any object that can be converted to an integer. This ensures that the same set of randomly selected files will be plotted every time this function is run with the same random_seed value. The default is 2021.

  • true_label (str, optional) – Label for ground truth syllable segmentations to be displayed in legend. Only used if plot_ground_truth == True. The default is ‘Ground Truth’.

  • figsize (tuple of floats, optional) – Specifies the dimensions of the output plot. The default is (20, 5).

  • seg_attribute ({'onsets', 'offsets'}, optional) – Specifies whether syllable onset times or offset times should be displayed. The default is ‘onsets’.

  • truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within the Bird_ID subfolder of folder_path, with a file name that begins with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

Returns

Return type

None.

threshold_optimization(Bird_ID, song_folder_path, truth_table_path, threshold_range, threshold_step, lower_threshold)

Tests a range of upper threshold values for threshold-based segmentation to find the threshold which results in the best F1 score for one bird.

Parameters
  • segmenter (avn.segmentation.Segmenter daughter class object) – Determines the segmentation criteria that will be used for threshold segmentation (e.g. RMSE, MFCC, RMSEDerivative).

  • Bird_ID (str) – String containing a unique identifier for subject bird.

  • song_folder_path (str) – Path to the folder containing all .wav files to be segmented for the subject bird.

  • truth_table_path (str) – Path to the .csv file generated with evsonganaly which contains the ground truth syllable segmentations.

  • threshold_range (tuple of floats) – Specifies the range of thresholds to test.

  • threshold_step (float) – The size of the step between consecutive thresholds to be tested.

  • lower_threshold (float) – Lower segmentation threshold for determining syllable offsets (fixed).

Returns

  • optimal_threshold (float) – Value of the threshold which results in the best F1 score.

  • peak_F1 (float) – F1 score of segmentation using optimal threshold.

  • segmentation_scores (pandas DataFrame) – DataFrame with columns F1, precision, recall, upper_threshold, and lower_threshold which contain the metrics of segmentation at every segmentation threshold tested. This can be useful for plotting the relationship between the threshold value and metrics.

Notes

Selecting a very wide threshold_range with a small threshold_step can make this function slow to run. It is recommended to first test a wide threshold_range with a large threshold_step and plot F1 against threshold value using segmentation_scores. Once the approximate location of the peak is known, rerun with a narrower threshold_range and a finer threshold_step.
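
The coarse-then-fine strategy can be sketched generically as follows. Here sweep_thresholds and toy_f1 are hypothetical stand-ins: toy_f1 replaces the real (and much slower) step of segmenting every file and scoring it against the ground truth, and whether the upper end of the range is included is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

def sweep_thresholds(score_fn, threshold_range, threshold_step):
    """Score every threshold in the range and return the best one.

    score_fn is a hypothetical callable mapping a threshold to an F1 score.
    """
    thresholds = np.arange(threshold_range[0], threshold_range[1], threshold_step)
    scores = pd.DataFrame({
        "upper_threshold": thresholds,
        "F1": [score_fn(t) for t in thresholds],
    })
    best = scores.loc[scores["F1"].idxmax()]
    return float(best["upper_threshold"]), float(best["F1"]), scores

def toy_f1(threshold):
    # Made-up score curve with a single peak at 0.37, for illustration only
    return 1.0 - (threshold - 0.37) ** 2

# Pass 1: wide range, large step
coarse_best, _, coarse_scores = sweep_thresholds(toy_f1, (0.0, 1.0), 0.1)
# Pass 2: narrow range around the coarse peak, finer step
fine_range = (coarse_best - 0.1, coarse_best + 0.1)
fine_best, fine_f1, fine_scores = sweep_thresholds(toy_f1, fine_range, 0.01)
```

The two passes evaluate 10 + 20 thresholds, whereas a single sweep of the full range at the fine step would evaluate 100.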

threshold_optimization_many_birds(Bird_IDs, folder_path, threshold_range, threshold_step, lower_threshold, truth_table_suffix='_syll_table.csv')

Finds the optimal segmentation threshold across multiple birds at once.

Parameters
  • segmenter (avn.segmentation.Segmenter child class type) – This determines the segmentation criteria used for threshold segmentation.

  • Bird_IDs (List of strings) – List of unique bird identifiers. These should correspond to the names of subfolders within the folder_path directory.

  • folder_path (str) – Path to a local directory containing subdirectories named with the Bird IDs in Bird_IDs, which in turn contain the .wav files to be segmented.

  • threshold_range (tuple of floats) – Specifies the range of thresholds to test.

  • threshold_step (float) – The size of the step between consecutive thresholds to be tested.

  • lower_threshold (float) – Lower segmentation threshold for determining syllable offsets (fixed).

  • truth_table_suffix (str, optional) – This function requires that the truth table data be located in a .csv file within the Bird_ID subfolder of folder_path, with a file name that begins with the Bird_ID followed by some descriptor. This parameter specifies that final part of the file name. The default is “_syll_table.csv”.

Returns

  • optimal_threshold (float) – Value of the threshold which results in the best mean F1 score across all birds.

  • peak_mean_F1 (float) – Mean F1 score of segmentation using optimal threshold across all birds.

  • segmentation_scores (pandas DataFrame) – DataFrame with columns F1, precision, recall, upper_threshold, lower_threshold, and Bird_ID which contain the metrics of segmentation at every segmentation threshold tested for every bird. This can be useful for plotting the relationship between the threshold value and metrics.

Notes

Selecting a very wide threshold_range with a small threshold_step can make this function slow to run. It is recommended to first test a wide threshold_range with a large threshold_step and plot F1 against threshold value using segmentation_scores. Once the approximate location of the peak is known, rerun with a narrower threshold_range and a finer threshold_step.
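
Given a segmentation_scores table with the columns described above, the threshold with the best mean F1 across birds can be recovered with a pandas groupby. The values below are made up for illustration, and this is only a sketch of the idea, not necessarily how the function computes its return values.

```python
import pandas as pd

# Toy scores for two birds at three upper thresholds (made-up values)
segmentation_scores = pd.DataFrame({
    "Bird_ID": ["B1", "B1", "B1", "B2", "B2", "B2"],
    "upper_threshold": [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
    "F1": [0.70, 0.90, 0.80, 0.60, 0.85, 0.90],
})

# Average F1 over birds at each threshold, then take the peak
mean_f1 = segmentation_scores.groupby("upper_threshold")["F1"].mean()
optimal_threshold = mean_f1.idxmax()
peak_mean_F1 = mean_f1.max()
```

Note that the threshold with the best mean F1 (0.2 here) need not be the best threshold for every individual bird: B2 scores higher at 0.3.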

avn.syntax module

Created on Wed Oct 20 10:49:15 2021

@author: Therese

class avn.syntax.SyntaxData(Bird_ID, syll_df)

Bases: object

add_file_bounds(song_folder_path)

Add rows representing file boundaries to self.syll_df. A new row with label value ‘file_start’ and onset and offset values of 0 will be added before the first syllable of each file, and a new row with label value ‘file_end’ and onset and offset values equal to the duration of the file in question will be added after the last syllable of each file.

Parameters

song_folder_path (str) – Path to folder containing .wav files of songs in SyntaxData.syll_df. Should end with ‘/’.

Raises

RuntimeError – If file bounds have already been added to this SyntaxData object, this error is raised to inform the user that file bounds will not be added a second time. This is based on the value of the boolean self.file_bounds_added.

Returns

Return type

None.
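
The effect on the syllable table can be sketched with plain pandas. This is an illustration under assumptions, not the method's source: add_file_bounds_sketch is a hypothetical function, and file durations are passed in as a dict rather than read from the .wav files in song_folder_path.

```python
import pandas as pd

def add_file_bounds_sketch(syll_df, durations):
    """Add 'file_start' and 'file_end' rows around the syllables of each file.

    durations is a hypothetical dict mapping file name -> duration in seconds.
    """
    pieces = []
    for file_name, file_sylls in syll_df.groupby("files", sort=False):
        start = pd.DataFrame({"files": [file_name], "onsets": [0.0],
                              "offsets": [0.0], "labels": ["file_start"]})
        end = pd.DataFrame({"files": [file_name],
                            "onsets": [durations[file_name]],
                            "offsets": [durations[file_name]],
                            "labels": ["file_end"]})
        pieces.extend([start, file_sylls.sort_values("onsets"), end])
    return pd.concat(pieces, ignore_index=True)

syll_df = pd.DataFrame({
    "files": ["f1.wav", "f1.wav", "f2.wav"],
    "onsets": [0.10, 0.30, 0.20],
    "offsets": [0.20, 0.45, 0.35],
    "labels": ["a", "b", "a"],
})
bounded = add_file_bounds_sketch(syll_df, {"f1.wav": 2.5, "f2.wav": 1.0})
```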

add_gaps(min_gap=0.2)

Add rows representing silent gaps between syllables longer than min_gap to self.syll_df.

Parameters

min_gap (float, optional) – Minimum duration in seconds for a gap between syllables to be considered syntactically relevant. This value should be selected such that gaps between syllables in a bout are shorter than min_gap, but gaps between bouts are longer than min_gap. The default is 0.2.

Raises

RuntimeError – If file bounds have already been added to this SyntaxData object, this error is raised to inform the user that file bounds will not be added a second time. This is based on the value of the boolean self.file_bounds_added.

Returns

Return type

None.
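
A minimal sketch of the gap insertion, assuming a syllable table with columns files, onsets, offsets and labels. The function name is hypothetical and the real method may treat file boundaries and edge cases differently.

```python
import pandas as pd

def add_gaps_sketch(syll_df, min_gap=0.2):
    """Insert a 'silent_gap' row wherever consecutive syllables in the same
    file are separated by more than min_gap seconds. Illustration only."""
    rows = []
    for _, file_sylls in syll_df.groupby("files", sort=False):
        records = file_sylls.sort_values("onsets").to_dict("records")
        for current, following in zip(records, records[1:] + [None]):
            rows.append(current)
            if following is None:
                continue
            if following["onsets"] - current["offsets"] > min_gap:
                # The gap spans from the end of one syllable to the start of the next
                rows.append({"files": current["files"],
                             "onsets": current["offsets"],
                             "offsets": following["onsets"],
                             "labels": "silent_gap"})
    return pd.DataFrame(rows)

syll_df = pd.DataFrame({
    "files": ["f1.wav"] * 3,
    "onsets": [0.10, 0.25, 0.90],
    "offsets": [0.20, 0.35, 1.00],
    "labels": ["a", "b", "a"],
})
with_gaps = add_gaps_sketch(syll_df, min_gap=0.2)
```

The 0.05 s pause between the first two syllables is shorter than min_gap and is ignored, while the 0.55 s pause before the third syllable becomes a ‘silent_gap’ row.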

drop_calls()

This function drops any rows in self.syll_df reflecting syllables that are preceded and followed by silent gaps, as these likely reflect calls.

Raises

RuntimeError – Gaps must be added to self.syll_df before calls can be identified, so this function will raise an error if gaps have not been added. It will also raise an error if calls have already been dropped from self.syll_df, to avoid repeating this process unnecessarily.

Returns

Return type

None.

get_entropy_rate()

Calculates the entropy rate of the bird’s syntax based on transition matrices. For more information on entropy rate, refer to online documentation.

Returns

entropy_rate – Entropy rate of syntax summarised in self.trans_mat. Bounded by 0 and log_2(number of unique syllable states), where larger values reflect more variable / unpredictable syntax.

Return type

float
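
For orientation, the entropy rate of a first-order Markov chain is commonly written as H = -Σ_i p_i Σ_j P_ij log_2 P_ij, where P_ij is the probability of transitioning from syllable i to syllable j and p_i is the probability of being in state i. The sketch below computes this from a matrix of transition counts, using the empirical frequency of each preceding syllable as p_i. That weighting is an assumption of this example and may not match the package's exact computation.

```python
import numpy as np

def entropy_rate_sketch(trans_counts):
    """Entropy rate in bits from a square matrix of transition counts
    (rows = preceding syllable, columns = following syllable).
    Illustration only; state weights are empirical row frequencies."""
    counts = np.asarray(trans_counts, dtype=float)
    row_totals = counts.sum(axis=1)
    state_prob = row_totals / row_totals.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        # Conditional transition probabilities; empty rows become all zeros
        trans_prob = np.where(row_totals[:, None] > 0,
                              counts / row_totals[:, None], 0.0)
        # Treat 0 * log2(0) as 0
        log_prob = np.where(trans_prob > 0, np.log2(trans_prob), 0.0)
    row_entropy = -(trans_prob * log_prob).sum(axis=1)
    return float((state_prob * row_entropy).sum())

deterministic = entropy_rate_sketch([[0, 10], [10, 0]])  # 'abab...' syntax
uniform = entropy_rate_sketch([[5, 5], [5, 5]])          # fully random syntax
```

With two syllable states, the fully predictable sequence gives 0 bits and the fully random one gives log_2(2) = 1 bit, matching the bounds stated above.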

get_gaps_df()

Makes a dataframe with all the gaps between syllables in self.syll_df.

Returns

gaps_df – Dataframe with columns files, onsets, offsets, labels, and duration which represents each gap as a single row. Onsets and offsets give the timestamps in seconds at which the gap occurs, and duration gives the duration of the gap in seconds. The label for all gaps is ‘silent_gap’.

Return type

Pandas DataFrame

get_intro_notes_df()

Determines whether each syllable type in self.syll_df is likely to be an intro note. A syllable is considered a possible intro note if:

  1. the syllable is among the most common syllables transitioned to from silence

  2. AND the syllable makes one dominant transition, other than to itself, to a syllable that is not silence.

Returns

all_intro_notes – Dataframe with columns ‘syllable’, ‘Bird_ID’ and ‘intro_note’ where each row corresponds to a syllable type in self.syll_df, and ‘intro_note’ contains a boolean value reflecting whether or not the syllable meets criteria to be considered an intro note.

Return type

Pandas DataFrame

get_pair_repetition_stats()

Analogous to self.get_single_repetition_stats, but with repetitions of a syllable pair, rather than a syllable type. For example, the sequence ‘ababab’ reflects a repetition bout of duration 3 for the syllable pair ‘a’ and ‘b’.

Returns

  • rep_count_df (Pandas DataFrame) – Dataframe containing counts of syllable pair occurrences in repetition bouts of different durations for every syllable pair occurring in self.syll_df.labels. There is one row per syllable pair and column names refer to duration of repetition bout. 1 = syllable pair produced but not repeated (eg iabcd). 2 = syllable pair repeated twice in a row only (eg iababcd) etc.

  • rep_stats_df (Pandas DataFrame) – DataFrame containing statistics about repetition bout length. Columns contain the mean_bout_length, median_bout_length, and CV_bout_length, where bout refers to a repetition bout, ie an instance of the same syllable pair being repeated many times. These values can be used for identifying abnormally repeated song syllables.

get_prob_repetitions()

Find the probability of transition to self (ie repetition) for each syllable type, and return values in a dataframe.

Returns

prob_repetition_df – DataFrame with columns Bird_ID, syllable and prob_repetition containing the probability that each syllable type produced by this bird transitions to itself, based on the song data in self.syll_df.

Return type

pandas DataFrame

get_prop_sylls_in_short_bouts(max_short_bout_len=2)

Calculates the proportion of occurrences of each syllable type in self.syll_df that occur in a bout with length equal to or shorter than max_short_bout_len. This can be useful for identifying which syllable types reflect calls.

Parameters

max_short_bout_len (int, optional) – The maximum length of bout where the bout will be considered ‘short’ and occurrences of syllables within bouts of that length or shorter will contribute to the count of syllables occurring in short bouts. The default value is 2.

Returns

all_syll_counts_df – Dataframe with columns ‘syllable’, ‘full_count’, ‘short_bout_count’, ‘Bird_ID’ and ‘prop_short_bout’, where ‘syllable’ contains the label of a syllable type in self.syll_df, ‘full_count’ contains the total number of times that syllable occurs in self.syll_df, ‘short_bout_count’ contains the total number of times that syllable occurs in a bout of length max_short_bout_len or shorter, and ‘prop_short_bout’ contains the proportion of all occurrences of the syllable that are in short bouts. This proportion can be useful for identifying which syllable types represent calls.

Return type

Pandas DataFrame

get_single_repetition_stats()

Analyzes repetitions of single syllables. Specifically looks at occurrences of repetition bouts of different durations (ie 2 identical syllables in a row, 3 identical syllables in a row, etc.).

Returns

  • rep_count_df (Pandas DataFrame) – Dataframe containing counts of syllable occurrences in repetition bouts of different durations for every syllable type in self.syll_df.labels. There is one row per syllable and column names refer to duration of repetition bout. 1 = syllable produced but not repeated. 2 = syllable repeated twice in a row only etc.

  • rep_stats_df (Pandas DataFrame) – DataFrame containing statistics about repetition bout length. Columns contain the mean_bout_length, median_bout_length, and CV_bout_length, where bout refers to a repetition bout, ie an instance of the same syllable being repeated many times. These values can be used for identifying introductory notes and/or abnormally repeated song syllables.
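
The two tables described above can be approximated from a flat label sequence with itertools.groupby, which collapses runs of identical labels. This sketch ignores file boundaries and silent gaps, uses a made-up label sequence, and assumes a population standard deviation (ddof=0) for the coefficient of variation, so the package's numbers may differ.

```python
from itertools import groupby

import pandas as pd

labels = list("iiabbbcabbc")  # made-up syllable sequence

# One row per repetition bout: the syllable and how many times it repeated
bouts = pd.DataFrame(
    [(label, len(list(run))) for label, run in groupby(labels)],
    columns=["syllable", "bout_length"],
)

# Counts of bouts of each length, one row per syllable type
rep_count_df = pd.crosstab(bouts["syllable"], bouts["bout_length"])

# Summary statistics of bout length per syllable type
rep_stats_df = bouts.groupby("syllable")["bout_length"].agg(
    mean_bout_length="mean",
    median_bout_length="median",
    CV_bout_length=lambda x: x.std(ddof=0) / x.mean(),  # ddof=0 is an assumption
)
```

Here syllable ‘b’ occurs once in a bout of 3 and once in a bout of 2, so its mean bout length is 2.5, while ‘a’ and ‘c’ are never repeated.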

make_syntax_raster(alignment_syllable=None, sort_bouts=True)

Create a dataframe where each row reflects a song bout (a sequence of syllables flanked by file boundaries or long silent gaps), and each cell contains the label of the song syllable produced at that index in the song bout. This can be plotted using plot_syntax_raster() to get a view of song syntax variability from the subject bird.

Parameters
  • alignment_syllable (string, optional) – The alignment syllable should correspond to a syllable label in self.syll_df.labels. If provided, song bouts will be aligned such that the first occurrence of the alignment syllable happens at the same index across bouts. This can make it easier to detect patterns in syntax across bouts. It is generally best to set the alignment syllable to be the first syllable of the dominant song motif, following any intro notes.

  • sort_bouts (bool, optional) – If True, bouts will be sorted such that bouts with more similar sequences will occupy sequential rows in syntax_raster_df. This can make it easier to detect syntax patterns agnostic to the order in which bouts were produced. If False, the order of bouts in syntax_raster_df will be the order in which the bouts occur in self.syll_df. The default is True.

Returns

syntax_raster_df – Dataframe where each row reflects a song bout (a sequence of syllables flanked by file boundaries or long silent gaps), and each cell contains the label of the song syllable produced at that index in the song bout, based on self.syll_df.labels. The number of columns depends on the length of the longest song bout. This can be plotted using plot_syntax_raster() to get a view of song syntax variability from the subject bird.

Return type

Pandas DataFrame
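
The basic shape of such a raster can be sketched by splitting a label sequence at boundary labels and stacking the resulting bouts as rows. The function below is hypothetical and covers only this splitting step; alignment to an alignment syllable and sorting of bouts are not shown.

```python
import pandas as pd

def make_raster_sketch(labels, boundaries=("file_start", "file_end", "silent_gap")):
    """Split a label sequence into bouts at boundary labels and return one
    row per bout. Shorter bouts are padded with missing values."""
    bouts, current = [], []
    for label in labels:
        if label in boundaries:
            if current:
                bouts.append(current)
                current = []
        else:
            current.append(label)
    if current:
        bouts.append(current)
    return pd.DataFrame(bouts)

labels = ["file_start", "i", "a", "b", "c", "silent_gap", "a", "b", "file_end"]
raster = make_raster_sketch(labels)
```

The resulting frame has one row per bout and as many columns as the longest bout, in line with the description of syntax_raster_df above.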

make_transition_matrix()

This function calculates the first order transition matrix of syllables in syll_df. It adds 2 new attributes to self: self.trans_mat, which contains the raw counts of each transition between syllable types, and self.trans_mat_prob, which contains the conditional probability of a transition, given that a particular syllable was just produced.

Returns

Return type

None.
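
The relationship between the count matrix and the conditional probability matrix can be illustrated with pandas.crosstab on a toy label sequence. This is a sketch of the general technique rather than the method's source, and the variable names simply mirror the attribute names above.

```python
import pandas as pd

labels = list("abcabcabd")  # made-up syllable sequence

# Each row is one transition: the syllable just produced and the one that follows
transitions = pd.DataFrame({"from": labels[:-1], "to": labels[1:]})

# Raw counts of each transition (rows = preceding syllable)
trans_mat = pd.crosstab(transitions["from"], transitions["to"])

# Conditional probabilities: divide each row by its total so rows sum to 1
trans_mat_prob = trans_mat.div(trans_mat.sum(axis=1), axis=0)
```

In this sequence ‘b’ is followed by ‘c’ twice and by ‘d’ once, so the ‘b’ row of trans_mat_prob holds 2/3 and 1/3 in those columns.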

plot_syntax_raster(syntax_raster_df, figsize=(10, 10), title=None, palette='husl')

Plots a syntax_raster_df dataframe.

Parameters
  • syntax_raster_df (Pandas DataFrame) – Dataframe where each row reflects a song bout (a sequence of syllables flanked by file boundaries or long silent gaps), and each cell contains the label of the song syllable produced at that index in the song bout, based on self.syll_df.labels. This is returned by self.make_syntax_raster().

  • figsize (tuple, optional) – Tuple specifying dimensions of output figure. The default is (10, 10).

  • title (String, optional) – Title of the output figure. The default is None, which will result in a figure without a title.

  • palette (string or sequence, optional) – String corresponding to the name of a seaborn palette, matplotlib colormap or sequence of colors in any format matplotlib accepts. See seaborn.color_palette() documentation for more information. The default is ‘husl’.

Returns

Return type

None

save_syntax_data(output_directory)

Saves a copy of .syll_df as a .csv file in the output directory. Also saves CSVs of the transition matrices if they exist, and a syntax analysis metadata .csv file with information on the processes used to modify syll_df and create the transition matrices.

Parameters

output_directory (string) – Path to a folder in which to save [Bird_ID]_syll_df.csv, [Bird_ID]_syntax_analysis_metadata.csv, and [Bird_ID]_trans_mat.csv and [Bird_ID]_trans_mat_prob.csv, if they exist.

Returns

syntax_analysis_metadata – Dataframe containing information about the package version and processing steps used in creating the versions of .syll_df and transition matrices that were saved by the function.

Return type

Pandas DataFrame

class avn.syntax.Utils

Bases: object

Contains syntax analysis utilities.

calc_entropy_rate_all_birds(syll_df_folder_path, syll_df_file_name_suffix, song_folder_path, min_gap=0.2, label_column_name=None)

Creates a dataframe with the syntax entropy rate of each bird in Bird_IDs.

Parameters
  • Bird_IDs (list of strings) – List of Bird_IDs (as strings) for which the entropy rate should be calculated.

  • syll_df_folder_path (string) – Path to a folder containing a syll_df for each bird in Bird_IDs. The syll_df must be a dataframe with one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, files which contains the name of the .wav file in which the syllable is found, and labels which contains a categorical label for the syllable type. These can be generated through manual song annotation, or automated labeling methods. The syll_df files must be .csv files named Bird_ID_`syll_df_file_name_suffix`.

  • syll_df_file_name_suffix (string) – String that specifies the name of the file containing syll_df for each Bird_ID. For example, if syll_df files are named ‘Bird_ID_syll_df.csv’, syll_df_file_name_suffix should be ‘_syll_df.csv’.

  • song_folder_path (string) – Path to a folder containing subfolders named according to the Bird_IDs, where each subfolder contains the complete set of .wav files used to generate the syll_df loaded from syll_df_folder_path.

  • min_gap (float, optional) – Minimum duration in seconds for a gap between syllables to be considered syntactically relevant. This value should be selected such that gaps between syllables in a bout are shorter than min_gap, but gaps between bouts are longer than min_gap. The default is 0.2.

  • label_column_name (string, optional) – If the column of the syll_df containing syllable labels is not called ‘labels’, the name of that column should be specified here as a string. If no value is provided, an existing column called ‘labels’ in syll_df will be used as syllable labels.

Returns

all_entropy_rates – Dataframe with columns ‘Bird_ID’, ‘entropy_rate’, ‘num_unique_syll_types’ and ‘entropy_rate_norm’. ‘entropy_rate’ contains the raw syntax entropy rate, and ‘entropy_rate_norm’ contains an entropy rate value that is normalized to account for the number of unique syllable types.

Return type

Pandas DataFrame

get_syll_stats_all_birds(syll_df_folder_path, syll_df_file_name_suffix, song_folder_path, min_gap=0.2, label_column_name=None, max_short_bout_len=2)

Compile all per-syllable syntax statistics from all birds in Bird_IDs into a single dataframe. This dataframe can then be used to detect syllables with abnormal repetition patterns.

Parameters
  • Bird_IDs (list of strings) – List of Bird_IDs (as strings) for which per-syllable syntax statistics should be compiled.

  • syll_df_folder_path (string) – Path to a folder containing a syll_df for each bird in Bird_IDs. The syll_df must be a dataframe with one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, files which contains the name of the .wav file in which the syllable is found, and labels which contains a categorical label for the syllable type. These can be generated through manual song annotation, or automated labeling methods. The syll_df files must be .csv files named Bird_ID_`syll_df_file_name_suffix`.

  • syll_df_file_name_suffix (string) – String that specifies the name of the file containing syll_df for each Bird_ID. For example, if syll_df files are named ‘Bird_ID_syll_df.csv’, syll_df_file_name_suffix should be ‘_syll_df.csv’.

  • song_folder_path (string) – Path to a folder containing subfolders named according to the Bird_IDs, where each subfolder contains the complete set of .wav files used to generate the syll_df loaded from syll_df_folder_path.

  • min_gap (float, optional) – Minimum duration in seconds for a gap between syllables to be considered syntactically relevant. This value should be selected such that gaps between syllables in a bout are shorter than min_gap, but gaps between bouts are longer than min_gap. The default is 0.2.

  • label_column_name (string, optional) – If the column of the syll_df containing syllable labels is not called ‘labels’, the name of that column should be specified here as a string. If no value is provided, an existing column called ‘labels’ in syll_df will be used as syllable labels.

  • max_short_bout_len (int, optional) – The maximum length of bout where the bout will be considered ‘short’ and occurrences of syllables within bouts of that length or shorter will contribute to the count of syllables occurring in short bouts. This is used to identify calls. The default value is 2.

Returns

syll_stats_all_birds – Dataframe with one row for each unique syllable type produced by each bird in Bird_IDs containing information about the repetition and syntax patterns of each syllable. This can be used for detecting abnormal syllable types with Utils.identify_abnormal_syllables().

Return type

Pandas DataFrame

identify_abnormal_syllables(std_cutoff=2, exclude_calls=True, exclude_intro_notes=True, syll_labels_to_exclude=[-1], prop_short_bout_cutoff=0.5)

Identifies syllables that are over std_cutoff standard deviations from the mean in terms of mean_repetition_length or CV_repetition_length, and returns a version of syll_stats_all_birds with a new column ‘abnormal_repetition’ containing a boolean to indicate whether that syllable exhibits unusually high repetition or repetition variability.

Parameters
  • syll_stats_all_birds (Pandas DataFrame) – Dataframe with one row for each unique syllable type produced by each bird in Bird_IDs containing information about the repetition and syntax patterns of each syllable.

  • std_cutoff (float, optional) – The number of standard deviations from the mean a syllable feature must be for that syllable to be identified as ‘abnormal’. The default value is 2.

  • exclude_calls (bool, optional) – If True, syllables which occur in short bouts more than prop_short_bout_cutoff proportion of the time will be considered calls, and will not be considered when calculating the mean and std used to identify abnormal syllable types. These calls also cannot be identified as ‘abnormal’. If False, syllables occurring in short bouts at high rates will be treated like standard syllables. The default value is True.

  • exclude_intro_notes (bool, optional) – If True, syllables with intro_note == True will not be considered when calculating the mean and std used to identify abnormal syllable types. These intro notes also cannot be identified as ‘abnormal’. If False, intro notes will be treated like standard syllables. The default value is True.

  • syll_labels_to_exclude (list, optional) – List of syllable labels that should not be considered when calculating the mean and std used to identify abnormal syllables. For example, if syllables are labeled automatically with HDBSCAN, the label ‘-1’ doesn’t reflect a relevant grouping of syllables, and thus shouldn’t contribute to population statistics about syllable repetition patterns. The default value is [-1].

  • prop_short_bout_cutoff (float between 0 and 1, optional) – If exclude_calls == True, syllables which occur in short bouts with a proportion greater than this value will be considered calls and be excluded from analysis of abnormal syllables. The default is 0.5.

Returns

syll_stats_all_birds – Copy of input syll_stats_all_birds dataframe, with a column called ‘abnormal_repetition’ added, which contains a boolean value indicating whether that syllable has a mean_repetition_length or CV_repetition_length over std_cutoff standard deviations from the mean.

Return type

Pandas DataFrame
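
The core of the cutoff rule can be sketched as follows. This example flags only values above the mean (the description refers to unusually high repetition, but the one-sided test is an assumption), omits the call, intro note and label exclusions, and uses made-up statistics; flag_abnormal_sketch is a hypothetical name.

```python
import pandas as pd

def flag_abnormal_sketch(stats, std_cutoff=2,
                         columns=("mean_repetition_length", "CV_repetition_length")):
    """Flag rows whose value in any of the given columns exceeds
    mean + std_cutoff * std for that column. Illustration only."""
    flagged = stats.copy()
    abnormal = pd.Series(False, index=stats.index)
    for column in columns:
        cutoff = stats[column].mean() + std_cutoff * stats[column].std()
        abnormal |= stats[column] > cutoff
    flagged["abnormal_repetition"] = abnormal
    return flagged

stats = pd.DataFrame({
    "syllable": list("abcdefghij"),
    "mean_repetition_length": [1.0, 1.1, 1.2, 1.0, 1.3, 1.1, 1.0, 1.2, 1.1, 6.0],
    "CV_repetition_length": [0.2, 0.3, 0.25, 0.2, 0.3, 0.25, 0.2, 0.3, 0.25, 0.25],
})
flagged = flag_abnormal_sketch(stats, std_cutoff=2)
```

Only the last syllable, whose mean repetition length of 6.0 lies well above the rest, is flagged in this toy table.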

merge_per_syll_stats(short_bout_counts, intro_notes_df)

Merge 3 dataframes containing syntax related measures per syllable type into a single dataframe with all per syllable syntax stats.

Parameters
  • single_rep_stats (Pandas DataFrame) – Dataframe with columns ‘Bird_ID’ and ‘syllable’, as well as other columns with summary statistics, which contains one row per unique syllable type in the bird’s repertoire. This could be the single_rep_stats dataframe returned by .get_single_repetition_stats().

  • short_bout_counts (Pandas DataFrame) – Dataframe with columns ‘Bird_ID’ and ‘syllable’, as well as other columns with summary statistics, which contains one row per unique syllable type in the bird’s repertoire. This could be the ‘short_bout_counts’ dataframe returned by .get_prop_sylls_in_short_bouts().

  • intro_notes_df (Pandas DataFrame) – Dataframe with columns ‘Bird_ID’ and ‘syllable’, as well as other columns with summary statistics, which contains one row per unique syllable type in the bird’s repertoire. This could be the ‘intro_notes_df’ dataframe returned by .get_intro_notes_df().

Returns

syllable_syntax_stats – DataFrame resulting from merge of the 3 input dataframes on columns ‘Bird_ID’ and ‘syllable’.

Return type

Pandas DataFrame
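
A merge of three per-syllable tables on ‘Bird_ID’ and ‘syllable’ can be sketched with pandas as shown below. The statistic columns and values are made up, and the default inner join is an assumption; the package may use a different join type.

```python
from functools import reduce

import pandas as pd

keys = ["Bird_ID", "syllable"]

# Toy per-syllable tables sharing the two key columns (made-up values)
single_rep_stats = pd.DataFrame({"Bird_ID": ["B1", "B1"], "syllable": ["a", "b"],
                                 "mean_repetition_length": [1.0, 2.5]})
short_bout_counts = pd.DataFrame({"Bird_ID": ["B1", "B1"], "syllable": ["a", "b"],
                                  "prop_short_bout": [0.1, 0.0]})
intro_notes_df = pd.DataFrame({"Bird_ID": ["B1", "B1"], "syllable": ["a", "b"],
                               "intro_note": [False, False]})

# Merge the tables pairwise on the shared key columns
syllable_syntax_stats = reduce(
    lambda left, right: left.merge(right, on=keys),
    [single_rep_stats, short_bout_counts, intro_notes_df],
)
```

The result keeps one row per bird and syllable type, with the statistic columns of all three inputs side by side.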

plot_syntax_raster_all_birds(syll_df_folder_path, syll_df_file_name_suffix, song_folder_path, min_gap=0.2, label_column_name=None, figsize=(10, 10), sort_bouts=True, calc_entropy_rate=True)

Plots the syntax raster plot for each bird in Bird_IDs.

Parameters
  • Bird_IDs (list of strings) – List of Bird_IDs (as strings) for which the syntax raster should be plotted.

  • syll_df_folder_path (string) – Path to a folder containing a syll_df for each bird in Bird_IDs. The syll_df must be a dataframe with one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, files which contains the name of the .wav file in which the syllable is found, and labels which contains a categorical label for the syllable type. These can be generated through manual song annotation, or automated labeling methods. The syll_df files must be .csv files named Bird_ID_`syll_df_file_name_suffix`.

  • syll_df_file_name_suffix (string) – String that specifies the name of the file containing syll_df for each Bird_ID. For example, if syll_df files are named ‘Bird_ID_syll_df.csv’, syll_df_file_name_suffix should be ‘_syll_df.csv’.

  • song_folder_path (string) – Path to a folder containing subfolders named according to the Bird_IDs, where each subfolder contains the complete set of .wav files used to generate the syll_df loaded from syll_df_folder_path.

  • min_gap (float, optional) – Minimum duration in seconds for a gap between syllables to be considered syntactically relevant. This value should be selected such that gaps between syllables in a bout are shorter than min_gap, but gaps between bouts are longer than min_gap. The default is 0.2.

  • label_column_name (string, optional) – If the column of the syll_df containing syllable labels is not called ‘labels’, the name of that column should be specified here as a string. If no value is provided, an existing column called ‘labels’ in syll_df will be used as syllable labels.

  • figsize (tuple, optional) – Tuple to specify dimensions of each output syntax raster plot. The default is (10,10).

  • sort_bouts (bool, optional) – If True, bouts will be sorted such that bouts with more similar sequences will occupy sequential rows in the plot. This can make it easier to detect syntax patterns agnostic to the order in which bouts were produced. If False, the order of bouts in syntax_raster_df will be the order in which the bouts occur in self.syll_df. The default is True.

  • calc_entropy_rate (bool, optional) – Determines whether entropy rate is calculated for each bird. If True, entropy rate will be calculated and reported in the title of the syntax raster plot for each bird. The default is True.

Returns

Return type

None

plot_transition_matrix_all_birds(syll_df_folder_path, syll_df_file_name_suffix, song_folder_path, min_gap=0.2, calc_entropy_rate=True, label_column_name=None, trans_mat_version='prob', figsize=(10, 8))

Plots the transition matrices of all birds in Bird_IDs.

Parameters
  • Bird_IDs (list of strings) – List of Bird_IDs (as strings) for which the transition matrix should be plotted.

  • syll_df_folder_path (string) – Path to a folder containing a syll_df for each bird in Bird_IDs. The syll_df must be a dataframe with one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, files which contains the name of the .wav file in which the syllable is found, and labels which contains a categorical label for the syllable type. These can be generated through manual song annotation, or automated labeling methods. The syll_df files must be .csv files named Bird_ID_`syll_df_file_name_suffix`.

  • syll_df_file_name_suffix (string) – String that specifies the name of the file containing syll_df for each Bird_ID. For example, if syll_df files are named ‘Bird_ID_syll_df.csv’, syll_df_file_name_suffix should be ‘_syll_df.csv’.

  • song_folder_path (string) – Path to a folder containing subfolders named according to the Bird_IDs, where each subfolder contains the complete set of .wav files used to generate the syll_df loaded from syll_df_folder_path.

  • min_gap (float, optional) – Minimum duration in seconds for a gap between syllables to be considered syntactically relevant. This value should be selected such that gaps between syllables in a bout are shorter than min_gap, but gaps between bouts are longer than min_gap. The default is 0.2.

  • calc_entropy_rate (bool, optional) – Determines whether entropy rate is calculated for each bird. If True, entropy rate will be calculated and reported in the title of the transition matrix plot for each bird. The default is True.

  • label_column_name (string, optional) – If the column of the syll_df containing syllable labels is not called ‘labels’, the name of that column should be specified here as a string. If no value is provided, an existing column called ‘labels’ in syll_df will be used as syllable labels.

  • trans_mat_version ('prob' or 'count', optional) – Specifies whether to plot transition probabilities in the transition matrix or counts of transitions in the dataset. The default value is ‘prob’ which results in the plotting of transition probabilities between syllables.

  • figsize (tuple, optional) – Tuple which sets the dimensions of each output transition matrix plot. The default is (10, 8).

Returns

Return type

None
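
Example (illustrative only). The sketch below shows how the 'count' and 'prob' versions of a transition matrix differ, and how min_gap keeps transitions across bout boundaries out of the matrix. It is not avn's implementation: the helper name transition_matrix, the toy syll_df, and the rule of keeping only transitions whose silent gap is shorter than min_gap are assumptions made for this example.

```python
import pandas as pd


def transition_matrix(syll_df, min_gap=0.2, version="prob"):
    """Hypothetical helper: first-order syllable transitions within bouts."""
    df = syll_df.sort_values(["files", "onsets"]).reset_index(drop=True)
    nxt = df.shift(-1)  # the syllable that follows each row
    gap = nxt["onsets"] - df["offsets"]
    # Keep a transition only if both syllables are in the same file and the
    # silent gap between them is shorter than min_gap (same bout).
    same_bout = (nxt["files"] == df["files"]) & (gap < min_gap)
    counts = pd.crosstab(
        df.loc[same_bout, "labels"],
        nxt.loc[same_bout, "labels"],
        rownames=["from"],
        colnames=["to"],
    )
    if version == "count":
        return counts
    # Normalize each row so that it sums to 1 (transition probabilities).
    return counts.div(counts.sum(axis=1), axis=0)


# Toy data: two bouts in one file, separated by a 0.6 s gap.
syll_df = pd.DataFrame({
    "files": ["a.wav"] * 5,
    "onsets": [0.0, 0.15, 0.30, 1.0, 1.15],
    "offsets": [0.1, 0.25, 0.40, 1.1, 1.25],
    "labels": ["a", "b", "c", "a", "c"],
})

counts = transition_matrix(syll_df, version="count")
probs = transition_matrix(syll_df, version="prob")
```

In this toy data the c-to-a transition spans the long gap between bouts, so it is excluded from both versions of the matrix.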

avn.acoustics module

Created on Thu Mar 2 09:18:15 2023

@author: Therese

class avn.acoustics.AcousticData(Bird_ID, syll_df, song_folder_path, win_length=400, hop_length=40, n_fft=1024, max_F0=1830, min_frequency=380, freq_range=0.5, baseline_amp=70, fmax_yin=8000)

Bases: object

Acoustic Feature data pertaining to a set of syllables in syll_df.

Parameters
  • Bird_ID (str) – String containing a unique identifier for the subject bird.

  • syll_df (pd.DataFrame) – pandas dataframe containing one row for every syllable to be analyzed from the subject bird. It must contain columns onsets and offsets which contain the timestamp in seconds at which the syllable occurs within a file, and files which contains the name of the .wav file in which the syllable is found. These can be generated through manual song annotation, or automated segmentation methods.

  • song_folder_path (str) – Path to folder containing the .wav files of the songs in syll_df. Should end with ‘/’.

  • win_length (int, optional) – Length of window over which to calculate each feature in samples. Defaults to 400.

  • hop_length (int, optional) – Number of samples to advance between windows. Defaults to 40.

  • n_fft (int, optional) – Length of the transformed axis of the output. If n_fft is smaller than win_length, the input is cropped. If it is larger, the input is padded with zeros. Defaults to 1024.

  • max_F0 (int, optional) – Maximum allowable fundamental frequency of signal in Hz. Defaults to 1830.

  • min_frequency (int, optional) – Lower frequency cutoff in Hz. Only power at frequencies above this will contribute to feature calculation. Defaults to 380.

  • freq_range (float, optional) – Proportion of power spectrum frequency bins to consider. Defaults to 0.5, meaning we only consider the lower half of the frequency range. This is consistent with SAP.

  • baseline_amp (int, optional) – Baseline amplitude used to calculate amplitude in dB. Defaults to 70.

  • fmax_yin (int, optional) – Maximum frequency in Hz used to estimate fundamental frequency with the YIN algorithm. Defaults to 8000.

calc_all_feature_stats(features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Calculates summary statistics for all acoustic features specified for each song interval in the syll_df.

This method returns a DataFrame containing the mean, min, max, std, 25th percentile, 50th percentile and 75th percentile values for each specified acoustic feature, for each song interval in the syll_df. These values can be useful for clustering syllables, detecting unusual syllable types, or measuring song changes after a manipulation.

Parameters

features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

Returns

DataFrame with one row for each song interval in the syll_df, and one column for each summary statistic for each acoustic feature (organized with hierarchical column indexing).

Return type

pd.DataFrame
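
Example (illustrative only). The sketch below builds a table with the layout described above: one row per song interval and hierarchical columns of the form (feature, statistic). It does not call avn. The toy feature time series, the helper summarize, and the use of the population standard deviation are assumptions made for this example.

```python
import numpy as np
import pandas as pd

STAT_NAMES = ["mean", "min", "max", "std", "25%", "50%", "75%"]


def summarize(values):
    """Hypothetical helper: seven summary statistics of one time series."""
    values = np.asarray(values, dtype=float)
    q25, q50, q75 = np.percentile(values, [25, 50, 75])
    return {
        "mean": values.mean(),
        "min": values.min(),
        "max": values.max(),
        "std": values.std(),  # population std; an assumption for this sketch
        "25%": q25,
        "50%": q50,
        "75%": q75,
    }


# Toy time series for two features across two syllables.
rng = np.random.default_rng(0)
toy_features = {
    "Entropy": [rng.normal(-3, 0.5, 50), rng.normal(-1, 0.5, 80)],
    "Pitch": [rng.normal(600, 20, 50), rng.normal(900, 40, 80)],
}

# Concatenating a dict of DataFrames along axis=1 yields two-level columns:
# the outer level is the feature name, the inner level is the statistic.
stats_df = pd.concat(
    {
        name: pd.DataFrame([summarize(ts) for ts in per_syllable])
        for name, per_syllable in toy_features.items()
    },
    axis=1,
)

pitch_means = stats_df[("Pitch", "mean")]  # one value per syllable
```

Selecting stats_df["Pitch"] returns all seven statistics for that feature, while a tuple such as ("Pitch", "mean") selects a single column.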

calc_all_features(features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Calculates all specified acoustic features as time series for each song interval in the syllable table.

Returns a dataframe with one row for each syllable in the syll_df, and a column for each acoustic feature. Each cell contains a vector with the acoustic feature values for each short time window in the interval.

NOTE: It is generally more useful to instead have summary statistics for each feature for each syllable (i.e. the mean and std of the feature, rather than its value as a time series). For this, see .calc_all_feature_stats().

Parameters

features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

Returns

DataFrame with one row for each syllable in the syll_df, and a column for each acoustic feature. Each cell contains a vector with the acoustic feature values for each short time window in the interval.

Return type

pd.DataFrame

save_feature_stats(out_file_path, file_name, features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Save summary statistics for all features specified for each syllable in syll_df.

Saves a dataframe containing the mean, min, max, std, 25th percentile, 50th percentile and 75th percentile values for each specified acoustic feature, for each song interval in the syll_df, in a file called file_name_all_feature_stats.csv. These values can be useful for clustering syllables, detecting unusual syllable types, or detecting song changes after a manipulation. It also saves a .csv file called file_name_metadata.csv with all the hyperparameter values used to calculate the features, as well as the avn version.

Parameters
  • out_file_path (str) – Path to a folder in which to save the .csv files. Must end in ‘/’.

  • file_name (str) – name of the file to serve as the root name for the _all_feature_stats.csv and _metadata.csv files.

  • features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

save_features(out_file_path, file_name, features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Save acoustic feature time series for each song interval in syll_df and metadata as .csv files.

Saves a table with all specified acoustic features as time series for each song interval in syll_df as a .csv file called file_name_all_features.csv. It also saves a .csv file called file_name_metadata.csv with all the hyperparameter values used to calculate the features, as well as the avn version.

NOTE: Saving the full time series for all features will occupy considerable disk space and isn’t necessary in most cases. See .save_feature_stats() to save summary statistics for each feature for each syllable, rather than the full time series.

Parameters
  • out_file_path (str) – Path to a folder in which to save the .csv files. Must end in ‘/’.

  • file_name (str) – name of the file to serve as the root name for the _all_features.csv and _metadata.csv files.

  • features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

class avn.acoustics.SongInterval(song_file, onset=0, offset=None, win_length=400, hop_length=40, n_fft=1024, max_F0=1830, min_frequency=380, freq_range=0.5, baseline_amp=70, fmax_yin=8000)

Bases: object

Acoustic Feature data pertaining to a single interval of audio

Parameters
  • song_file (avn.dataloading.SongFile) – SongFile instance containing audio interval of interest.

  • onset (int, optional) – onset timestamp in seconds of the interval of interest within the SongFile. Defaults to 0.

  • offset (int, optional) – offset timestamp in seconds of the interval of interest within the SongFile. If not specified, this will correspond to the end of the SongFile.

  • win_length (int, optional) – Length of window over which to calculate each feature in samples. Defaults to 400.

  • hop_length (int, optional) – Number of samples to advance between windows. Defaults to 40.

  • n_fft (int, optional) – Length of the transformed axis of the output. If n_fft is smaller than win_length, the input is cropped. If it is larger, the input is padded with zeros. Defaults to 1024.

  • max_F0 (int, optional) – Maximum allowable fundamental frequency of signal in Hz. Defaults to 1830.

  • min_frequency (int, optional) – Lower frequency cutoff in Hz. Only power at frequencies above this will contribute to feature calculations. Defaults to 380.

  • freq_range (float, optional) – Proportion of power spectrum frequency bins to consider. Defaults to 0.5, meaning we only consider the lower half of the frequency range. This is consistent with SAP.

  • baseline_amp (int, optional) – Baseline amplitude used to calculate amplitude in dB. Defaults to 70.

  • fmax_yin (int, optional) – Maximum frequency in Hz used to estimate fundamental frequency with the YIN algorithm. Defaults to 8000.

calc_all_features(features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Calculate all acoustic features for each window in the song interval.

This method returns a dictionary containing the time series values for Goodness, Mean_frequency, Entropy, Amplitude, Amplitude_modulation, Frequency_modulation, and Pitch calculated for each window in the song interval.

Parameters

features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

Returns

Dictionary containing the time series values for Goodness, Mean_frequency, Entropy, Amplitude, Amplitude_modulation, Frequency_modulation, and Pitch (or the specified subset thereof) as np.arrays.

Return type

dict

calc_amplitude()

Calculates the amplitude of each window in a song interval.

Amplitude is the volume of a sound in decibels, considering only frequencies above min_frequency.

Returns

array containing the amplitude of each frame in the song interval in decibels

Return type

np.array

calc_amplitude_modulation()

Calculates the amplitude modulation of each window in a song interval.

Amplitude modulation is a measure of the rate of change of the amplitude of a signal. It will be positive at the beginning of a song syllable and negative at the end.

Returns

array containing the amplitude modulation of each frame in the song interval.

Return type

np.array
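
Example (illustrative only). The sign behaviour described above can be seen with a simple finite difference of an amplitude envelope. This is not how avn or SAP compute amplitude modulation. The toy envelope and the use of np.gradient are assumptions made for this example.

```python
import numpy as np

# Toy amplitude envelope (arbitrary dB-like units): 10 silent frames,
# a 41-frame syllable that rises and falls smoothly, then 10 silent frames.
amplitude = np.concatenate([np.zeros(10), 40 * np.hanning(41), np.zeros(10)])

# Finite-difference rate of change of amplitude from frame to frame.
amplitude_modulation = np.gradient(amplitude)

onset_slope = amplitude_modulation[10:30]   # rising half of the syllable
offset_slope = amplitude_modulation[31:51]  # falling half of the syllable
```

The rate of change is positive throughout the rising half of the toy syllable and negative throughout the falling half.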

calc_entropy()

Calculates the Wiener entropy of each window in a song interval.

Wiener entropy is a measure of the uniformity of power spread across frequency bands in a frame of audio. The output of this function is log-scaled Wiener entropy, which can range in value from 0 to negative infinity. A score close to 0 indicates broadly spread power across frequency bands, i.e. a less structured sound like white noise. A large negative score indicates low uniformity across frequency bands, i.e. a more structured sound like a harmonic stack or pure tone.

Returns

array containing the log-scaled Wiener entropy of each frame in the song interval.

Return type

np.array
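
Example (illustrative only). Log-scaled Wiener entropy is commonly defined as the log of the ratio between the geometric and arithmetic means of the power spectrum. The sketch below uses that textbook definition on a single frame. It omits avn's min_frequency and freq_range restrictions, and the helper name, the Hann window, and the eps floor are assumptions made for this example.

```python
import numpy as np


def log_wiener_entropy(frame, eps=1e-12):
    """Hypothetical helper: log(geometric mean / arithmetic mean) of power."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return float(np.log(geometric_mean / arithmetic_mean))


sample_rate = 44100
t = np.arange(400) / sample_rate
tone = np.sin(2 * np.pi * 2000 * t)                 # highly structured sound
noise = np.random.default_rng(0).normal(size=400)   # unstructured sound

tone_entropy = log_wiener_entropy(tone)
noise_entropy = log_wiener_entropy(noise)
```

Because the geometric mean never exceeds the arithmetic mean, the result is at most 0. The noise frame scores much closer to 0 than the pure tone.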

calc_feature_stats(features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Calculate summary statistics for acoustic features over a song interval.

This method returns a dataframe containing the mean, min, max, std, 25th percentile, 50th percentile and 75th percentile values for all acoustic features specified from among ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’ across the current song interval.

Parameters

features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

Returns

Dataframe with a column for each acoustic feature, with the different summary statistics in each row.

Return type

pd.DataFrame

calc_frequency_modulation()

Calculates the frequency modulation of each window in a song interval.

Frequency modulation can be thought of as the slope of frequency traces in a spectrogram. A high frequency modulation score is indicative of a sound whose pitch is changing rapidly, or which is noisy and has an unstable pitch. A low frequency modulation score indicates that the pitch of a sound is stable (like in a flat harmonic stack). This implementation is based on SAP.

Returns

array containing the frequency modulation of each frame in the song interval.

Return type

np.array

calc_goodness()

Calculates the goodness of pitch of each window in a song interval.

Goodness of pitch is an estimate of the harmonic periodicity of a signal. Higher values indicate a more periodic sound (like a harmonic stack), whereas lower values indicate less periodic sounds (like noise). Formally, it is the peak of the cepstrum of the signal for fundamental frequencies below max_F0.

Returns

array containing the goodness of pitch for each frame in the song interval.

Return type

np.array

calc_mean_frequency()

Calculates the mean frequency of each window in a song interval.

This is one way to estimate the pitch of a signal. It is the center of the distribution of power across frequencies in the signal. For another estimate of pitch, see SongInterval.calc_pitch().

Returns

array containing the mean frequency of each frame in the song interval in Hz.

Return type

np.array
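
Example (illustrative only). One simple reading of "the center of the distribution of power across frequencies" is a power-weighted average of frequency. The sketch below computes that for a single frame. It is not necessarily how avn computes mean frequency: the helper name, the Hann window, and the way min_frequency is applied are assumptions made for this example.

```python
import numpy as np


def mean_frequency(frame, sample_rate, n_fft=1024, min_frequency=380):
    """Hypothetical helper: power-weighted mean frequency of one frame."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2  # zero-padded to n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1 / sample_rate)
    keep = freqs >= min_frequency  # ignore power below the lower cutoff
    return float(np.sum(freqs[keep] * power[keep]) / np.sum(power[keep]))


sample_rate = 44100
t = np.arange(400) / sample_rate
tone = np.sin(2 * np.pi * 3000 * t)

tone_mean_frequency = mean_frequency(tone, sample_rate)
```

For a pure tone the power is concentrated around a single frequency, so the weighted mean lands close to the tone's frequency.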

calc_pitch()

Estimates the fundamental frequency (or pitch) of each window in a song interval using the YIN algorithm.

For more information on the YIN algorithm for fundamental frequency estimation, please refer to the documentation for librosa.yin().

Returns

array containing the YIN estimated fundamental frequency of each frame in the song interval in Hertz.

Return type

np.array

plot_feature(feature, figsize=(20, 5))

Plot specified acoustic feature over spectrogram of song interval.

Parameters
  • feature (str) – The acoustic feature that you want plotted. Options are ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’.

  • figsize (tuple, optional) – dimensions of figure. Defaults to (20, 5).

save_feature_stats(out_file_path, file_name, features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Save summary statistics for acoustic features and metadata as .csv files.

Saves a table with summary statistics (mean, max, min, std, 25th, 50th and 75th percentiles) for each acoustic feature across the song interval as a .csv file called file_name_feature_stats.csv. It also saves a .csv file called file_name_metadata.csv with all the hyperparameter values used to calculate the features, as well as the avn version, the original file name and the onset and offset timestamps.

Parameters
  • out_file_path (str) – Path to a folder in which to save the .csv files. Must end in ‘/’.

  • file_name (str) – name of the file to serve as the root name for the _feature_stats.csv and _metadata.csv files.

  • features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

save_features(out_file_path, file_name, features=['Goodness', 'Mean_frequency', 'Entropy', 'Amplitude', 'Amplitude_modulation', 'Frequency_modulation', 'Pitch'])

Save acoustic features and metadata as .csv files.

Saves a table with the acoustic features for each window in the song interval as a .csv file called file_name_features.csv. It also saves a .csv file called file_name_metadata.csv with all the hyperparameter values used to calculate the features, as well as the avn version, the original file name and the onset and offset timestamps of the interval within the file.

Parameters
  • out_file_path (str) – Path to a folder in which to save the .csv files. Must end in ‘/’.

  • file_name (str) – name of the file to serve as the root name for the _features.csv and _metadata.csv files.

  • features (list, optional) – This is a list of all acoustic features you want returned. By default, all available acoustic features will be returned. That consists of ‘Goodness’, ‘Mean_frequency’, ‘Entropy’, ‘Amplitude’, ‘Amplitude_modulation’, ‘Frequency_modulation’, and ‘Pitch’. If you don’t need all these features, pass a list of only those features you do want. Be sure to enter the feature names exactly as written above, otherwise the feature will not be calculated.

avn.timing module

Created on Thu Dec 14 10:30:22 2023

@author: Therese

class avn.timing.RhythmAnalysis(Bird_ID)

Bases: object

calc_peak_frequency_cv()

calculate the cv of the peak frequencies in the rhythm spectrum

Calculates the coefficient of variation (CV) of the frequency with the highest magnitude in the rhythm spectrum of a bird, as a measure of the rendition-to-rendition song timing variability. To calculate this you must first generate the table of peak frequencies using .get_refined_peak_frequencies().

Returns

CV of peak frequency

Return type

float
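
Example (illustrative only). The coefficient of variation is the standard deviation divided by the mean. The sketch below applies it to two toy sets of peak frequencies. The helper name, the toy values, and the use of the population standard deviation are assumptions made for this example, not a description of avn's internals.

```python
import numpy as np


def coefficient_of_variation(values):
    """Hypothetical helper: standard deviation divided by mean."""
    values = np.asarray(values, dtype=float)
    return float(values.std() / values.mean())


# Toy peak frequencies in Hz, one per song file.
stable_peaks = [15.0, 15.1, 14.9, 15.0]    # consistent song timing
variable_peaks = [12.0, 18.0, 15.0, 9.0]   # variable song timing

cv_stable = coefficient_of_variation(stable_peaks)
cv_variable = coefficient_of_variation(variable_peaks)
```

Because the CV is scaled by the mean, it allows timing variability to be compared between birds whose peak frequencies differ.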

calc_rhythm_spectrogram_entropy()

Calculate the Wiener entropy of the mean rhythm spectrum

Calculate the Wiener entropy of the mean rhythm power spectrum based on the RhythmAnalysis object’s .rhythm_spectrogram attribute. To create this attribute, please first run rhythm_analysis.make_rhythm_spectrogram(). This is calculated in the same way as the acoustic feature ‘entropy’. It is a measure of the uniformity of power spread across frequency bands. The output is log-scaled and can range from 0 to negative infinity. Scores closer to 0 indicate a broader spread of power across bands, and are typical of birds with very variable song timing, such as juvenile birds.

Returns

The log-scaled Wiener entropy of the mean rhythm power spectrum across all spectra included in rhythm_spectrogram.

Return type

float

get_refined_peak_frequencies(freq_range=3)

get frequencies with peak magnitude in rhythm spectrogram.

In addition to the entropy of the mean spectrum, we can investigate rendition-to-rendition timing variability by looking at the variability of the frequency with the highest magnitude in the rhythm spectrogram. To do this reliably, we first find the median of the peak frequency across all frames in the rhythm spectrogram. Then, we restrict the peak frequency search to the freq_range frequencies centered on the median. This helps reduce variability caused by the peak jumping to different harmonic bands, as this doesn’t reflect a meaningful difference in the timing structure. The resulting peaks can then be plotted with .plot_peak_frequencies() and their CV can be calculated with .calc_peak_frequency_cv().

Parameters

freq_range (int, optional) – Range of frequencies in Hz centered on the median peak frequency within which to search for the refined peak frequency. For example, if a bird’s median peak frequency is 15 Hz when we consider the full rhythm spectrum, and the freq_range is 3 Hz, then the refined peak frequency would be the peak frequency between 15-1.5 Hz and 15+1.5 Hz across all files in the rhythm spectrogram. This should be set to the largest value possible, while still being less than the distance between harmonic bands in a bird with clear harmonic structure in their rhythm spectrogram. Defaults to 3.

Returns

dataframe containing the refined peak frequency for each file in the rhythm spectrogram.

Return type

pd.DataFrame
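
Example (illustrative only). The two-step search described above (median of the raw peaks, then a restricted search around that median) can be sketched with NumPy on a toy rhythm spectrogram. This is not avn's implementation. The helper name, the array layout (frequencies along rows, files along columns), and the toy spectra are assumptions made for this example.

```python
import numpy as np


def refined_peak_frequencies(spectrogram, freqs, freq_range=3):
    """Hypothetical helper: peak frequency per file near the median peak."""
    raw_peaks = freqs[np.argmax(spectrogram, axis=0)]
    median_peak = np.median(raw_peaks)
    # Restrict the search to a band of width freq_range around the median.
    in_band = np.abs(freqs - median_peak) <= freq_range / 2
    band_freqs = freqs[in_band]
    return band_freqs[np.argmax(spectrogram[in_band, :], axis=0)]


freqs = np.arange(0, 30.5, 0.5)  # rhythm frequencies in Hz


def bump(center, height):
    """Toy spectral peak centered on `center` Hz."""
    return height * np.exp(-0.5 * ((freqs - center) / 0.4) ** 2)


# Five toy files with a peak near 15 Hz. In the last file a stronger peak
# at 7.5 Hz makes the raw peak jump to a different band.
columns = [bump(c, 1.0) for c in [15.0, 14.5, 15.5, 15.0, 15.5]]
columns[4] = columns[4] + bump(7.5, 2.0)
spectrogram = np.column_stack(columns)

raw_peaks = freqs[np.argmax(spectrogram, axis=0)]
refined_peaks = refined_peak_frequencies(spectrogram, freqs, freq_range=3)
```

In the last toy file the raw peak sits at 7.5 Hz, while the refined peak stays in the band around the 15 Hz median, which is the band-jumping effect the restriction is meant to remove.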

make_rhythm_spectrogram(song_files=None, song_folder_path=None, frame_length=3, derivative=True, padded_length=100000, max_frequency=30, hop_length=0.2, n_windows=3)

make rhythm spectrogram dataframe

Generate a rhythm spectrogram using all .wav files in song_folder_path, or all files in song_files. This rhythm spectrogram will have one column per file, where each column represents the rhythm spectrum of that file.

Files containing typical, mature zebra finch song will have prominent harmonic banding in their rhythm spectra, which will be consistent from file to file, resulting in a rhythm spectrogram with horizontal stripes and little jitter.

Immature birds, or birds with very inconsistent song timing will have power more evenly spread across the frequency bands in their rhythm spectrogram, and/or be less consistent from file to file.

For more detailed information on how the rhythm spectrum is calculated for each file, see RhythmAnalysis.rhythm_spectrum_single_file().

Parameters
  • song_files (list of strings, optional) – A value must be provided for song_files OR song_folder_path, but not both. List containing the paths to .wav files to include in the rhythm spectrogram.

  • song_folder_path (str, optional) – A value must be provided for song_files OR song_folder_path, but not both. Path to a folder containing .wav files to include in the rhythm spectrogram. All .wav files of sufficient duration will be included from this folder.

  • frame_length (int, optional) – length in seconds of each frame within the .wav file over which to compute a spectrum, defaults to 3

  • derivative (bool, optional) – If True, the rhythm spectrum will be calculated on the first derivative of the amplitude. If False, the rhythm spectrum will be calculated directly on the amplitude, which generally results in all energy being concentrated at very low frequencies, making it difficult to see harmonic structure. Defaults to True.

  • padded_length (int, optional) – size in frames to which to pad the amplitude before calculating its spectrum. Larger values will result in smoother interpolation of points in the spectrum. defaults to 100000

  • max_frequency (int, optional) – maximum frequency in Hz to include in the rhythm spectrum, defaults to 30

  • hop_length (float, optional) – time between windows in seconds when windowing .wav file, defaults to 0.2

  • n_windows (int, optional) – number of windows over which to average the rhythm spectra when calculating the rhythm spectrum of a single file, defaults to 3

Returns

rhythm spectrogram, with one column per file

Return type

pd.DataFrame

plot_peak_frequencies(figsize=(6, 6), cbar=False, title=None, cmap='rocket', s=15, color='aqua')

plot the peak frequencies over the rhythm spectrogram

Plots a point over the rhythm spectrogram at the frequency band with the highest magnitude in that frame. This can help illustrate rendition-to-rendition variability in the rhythm spectrogram. To plot this you must first generate the table of peak frequencies using .get_refined_peak_frequencies().

Parameters
  • figsize (tuple, optional) – width and height of figure in inches, defaults to (6, 6)

  • cbar (bool, optional) – If True, the colorbar will be included in the figure. If False, it will be omitted. Defaults to False.

  • title (string, optional) – Plot’s title. If not specified, the bird ID associated with the RhythmAnalysis object will be used as the title. defaults to None

  • cmap (str, optional) – matplotlib color map name, defaults to ‘rocket’

  • s (int, optional) – size of peak frequency marker in points, defaults to 15

  • color (matplotlib color specification, optional) – color of peak frequency marker, defaults to ‘aqua’

Returns

figure of the rhythm spectrogram overlaid with points indicating the peak frequencies for each frame.

Return type

matplotlib.figure

plot_rhythm_spectrogram(figsize=(6, 6), cbar=False, title=None, smoothing_window=1, cmap='rocket')

Plot rhythm spectrogram

Plots the rhythm spectrogram of the given RhythmAnalysis object. This requires that the object already have a .rhythm_spectrogram attribute, which is created by running rhythm_analysis.make_rhythm_spectrogram().

Parameters
  • figsize (tuple, optional) – width and height of figure in inches, defaults to (6, 6)

  • cbar (bool, optional) – If True, the colorbar will be included in the figure. If False, it will be omitted. Defaults to False.

  • title (string, optional) – Plot’s title. If not specified, the bird ID associated with the RhythmAnalysis object will be used as the title. defaults to None

  • smoothing_window (int, optional) – Size of smoothing window in song files to apply to the rhythm spectrogram. A value of 1 results in no smoothing. Values greater than one result in the mean of smoothing_window spectra being displayed. Higher smoothing_window values can obscure rendition-to-rendition variability, but make it easier to see consistent harmonic patterns in the rhythm spectrogram, when present. Defaults to 1.

  • cmap (str, optional) – matplotlib color map name, defaults to ‘rocket’

Returns

figure of the rhythm spectrogram

Return type

matplotlib.figure

rhythm_spectrum_single_file(song_file_path, frame_length=3, derivative=True, padded_length=100000, max_frequency=30, hop_length=0.2, n_windows=3)

Calculate rhythm spectrum of a single .wav file

This calculates the rhythm spectrum of a single song file. Birds with typical mature song will have prominent harmonic banding in their rhythm spectra, whereas immature or otherwise variable birds will have energy more evenly spread across frequency bands.

The rhythm spectrum of a signal is a Fourier transform of the derivative of the amplitude of that signal. However, doing this over the full .wav file won’t represent rhythms present within only smaller portions of the file. To overcome this, this function breaks the audio file into multiple windows, each frame_length in duration, with successive windows offset by hop_length. From among the resulting windows, the n_windows with the highest amplitude will be selected, as these likely contain the most song. The rhythm spectrum will be calculated for each of those windows, then averaged across them to produce a single rhythm spectrum representing the song contained in that file.

Parameters
  • song_file_path (str) – path to the .wav file to process.

  • frame_length (int, optional) – length in seconds of each frame within the .wav file over which to compute a spectrum, defaults to 3

  • derivative (bool, optional) – If True, the rhythm spectrum will be calculated on the first derivative of the amplitude. If False, the rhythm spectrum will be calculated directly on the amplitude, which generally results in all energy being concentrated at very low frequencies, making it difficult to see harmonic structure. Defaults to True.

  • padded_length (int, optional) – size in frames to which to pad the amplitude before calculating its spectrum. Larger values will result in smoother interpolation of points in the spectrum. defaults to 100000

  • max_frequency (int, optional) – maximum frequency in Hz to include in the rhythm spectrum, defaults to 30

  • hop_length (float, optional) – time between windows in seconds when windowing .wav file, defaults to 0.2

  • n_windows (int, optional) – number of windows over which to average the rhythm spectra when calculating the rhythm spectrum of a single file, defaults to 3

Returns

mean rhythm spectrum across windows in the .wav file. Index reflects the frequency in Hz.

Return type

pd.Series
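
Example (illustrative only). The windowing, selection and averaging steps described above can be sketched as follows. The sketch starts from an amplitude envelope rather than a .wav file and is not avn's implementation. The helper name, the envelope sampling rate env_rate, the use of summed amplitude to rank windows, and the smaller padded_length are assumptions made for this example.

```python
import numpy as np


def rhythm_spectrum(envelope, env_rate, frame_length=3, hop_length=0.2,
                    n_windows=3, padded_length=4096, max_frequency=30):
    """Hypothetical helper: mean spectrum of the loudest envelope windows."""
    frame = int(round(frame_length * env_rate))  # window size in samples
    hop = int(round(hop_length * env_rate))      # window offset in samples
    starts = np.arange(0, len(envelope) - frame + 1, hop)
    windows = np.stack([envelope[s:s + frame] for s in starts])
    # Select the n_windows windows with the highest total amplitude.
    loudest = np.argsort(windows.sum(axis=1))[-n_windows:]
    spectra = []
    for window in windows[loudest]:
        derivative = np.diff(window)  # first derivative of the amplitude
        # Zero-pad to padded_length for a finer frequency grid.
        spectra.append(np.abs(np.fft.rfft(derivative, n=padded_length)))
    freqs = np.fft.rfftfreq(padded_length, d=1 / env_rate)
    keep = freqs <= max_frequency
    return freqs[keep], np.mean(spectra, axis=0)[keep]


# Toy envelope: 5 s of silence, then 5 s of "song" pulsing 8 times per second.
env_rate = 200
t = np.arange(0, 10, 1 / env_rate)
envelope = np.where(t >= 5, 0.5 * (1 + np.sin(2 * np.pi * 8 * t)), 0.0)

freqs, spectrum = rhythm_spectrum(envelope, env_rate)
peak_frequency = freqs[np.argmax(spectrum)]
```

With this toy envelope the loudest windows all fall inside the pulsing half of the signal, and the averaged spectrum peaks near the 8 Hz pulse rate.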

class avn.timing.SegmentTiming(Bird_ID, syll_df, song_folder_path)

Bases: object

calc_gap_duration_entropy(max_gap=0.2)

Calculates entropy of gap duration distribution

Calculates the Shannon entropy of the bird’s gap duration distribution. Results range from 0 to 1, with higher scores indicating less predictable gap durations, consistent with the songs of immature birds. Based on Goldberg & Fee, 2011.

Parameters

max_gap (float, optional) – maximum gap duration in seconds. This should be set such that it is longer than all gaps between syllables in a bout, but is shorter than the gaps between bouts. Defaults to 0.2

Returns

shannon entropy of the gap duration distribution

Return type

float
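
Example (illustrative only). An entropy bounded between 0 and 1 can be obtained by dividing the Shannon entropy of a binned duration distribution by the log of the number of bins. The sketch below does this for toy gap durations. The helper name, the ten equal-width bins between 0 and max_gap, and the toy durations are assumptions made for this example, so it should not be read as avn's binning scheme.

```python
import numpy as np


def normalized_duration_entropy(durations, bins):
    """Hypothetical helper: Shannon entropy of binned durations, in [0, 1]."""
    counts, _ = np.histogram(durations, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing to the entropy
    entropy = -(p * np.log(p)).sum()
    return float(entropy / np.log(len(counts)))  # divide by maximum entropy


max_gap = 0.2
bins = np.linspace(0, max_gap, 11)  # ten equal-width bins

stereotyped_gaps = [0.050, 0.051, 0.049, 0.050]  # all fall in a single bin
spread_gaps = np.arange(0.01, 0.2, 0.02)         # one gap in every bin

stereotyped_entropy = normalized_duration_entropy(stereotyped_gaps, bins)
spread_entropy = normalized_duration_entropy(spread_gaps, bins)
```

The stereotyped gaps occupy one bin and give an entropy of 0, while gaps spread evenly over all bins give the maximum value of 1.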

calc_syll_duration_entropy()

Calculate entropy of syllable duration distribution

Calculates the Shannon entropy of the bird’s syllable duration distribution. Results range from 0 to 1, with higher scores indicating less predictable syllable durations, consistent with the songs of immature birds. Based on Goldberg & Fee, 2011.

Returns

shannon entropy of syllable duration distribution

Return type

float

get_gap_durations(max_gap=0.2)

Get gap durations in seconds

Adds a gap_df attribute to the SegmentTiming instance containing the onset, offset, and duration of all silent gaps in the segmentation provided by SegmentTiming.syll_df, excluding any gaps longer than max_gap.

Parameters

max_gap (float, optional) – maximum gap duration in seconds. This should be set such that it is longer than all gaps between syllables in a bout, but is shorter than the gaps between bouts. Defaults to 0.2

Returns

copy of SegmentTiming.gap_df, containing the durations of each gap in seconds

Return type

pd.DataFrame
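
Example (illustrative only). A gap table of the kind described above can be derived from a syll_df by pairing each syllable's offset with the next syllable's onset in the same file. This sketch is not avn's implementation. The helper name, the column names of the output, and the toy syll_df are assumptions made for this example.

```python
import pandas as pd


def gap_table(syll_df, max_gap=0.2):
    """Hypothetical helper: silent gaps between consecutive syllables."""
    df = syll_df.sort_values(["files", "onsets"]).reset_index(drop=True)
    nxt = df.shift(-1)  # the syllable that follows each row
    gaps = pd.DataFrame({
        "files": df["files"],
        "onsets": df["offsets"],   # a gap starts where a syllable ends
        "offsets": nxt["onsets"],  # and ends where the next one starts
    })
    # Only pair syllables that belong to the same file.
    gaps = gaps[nxt["files"] == df["files"]]
    gaps = gaps.assign(durations=gaps["offsets"] - gaps["onsets"])
    # Exclude gaps longer than max_gap (typically gaps between bouts).
    return gaps[gaps["durations"] <= max_gap].reset_index(drop=True)


syll_df = pd.DataFrame({
    "files": ["a.wav", "a.wav", "a.wav", "b.wav", "b.wav"],
    "onsets": [0.0, 0.15, 1.0, 0.0, 0.12],
    "offsets": [0.1, 0.25, 1.1, 0.1, 0.20],
})

gap_df = gap_table(syll_df, max_gap=0.2)
```

In the toy data the 0.75 s gap in a.wav exceeds max_gap and is dropped, and no gap is formed between the last syllable of a.wav and the first syllable of b.wav.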

get_syll_durations()

Get syllable durations in seconds

Adds a column to the SegmentTiming.syll_df dataframe with the duration of each syllable in seconds.

Returns

copy of SegmentTiming.syll_df with a new column, durations, containing the duration of each syllable in seconds.

Return type

pd.DataFrame

avn.similarity module