multimodal_fin.processing.multimodal.audio package

Submodules

multimodal_fin.processing.multimodal.audio.audio_emotion_analyzer module

class multimodal_fin.processing.multimodal.audio.audio_emotion_analyzer.AudioEmotionAnalyzer(mode='emotion2vec', device='cpu', model_name='iic/emotion2vec_plus_large')

Bases: object

Extracts emotion-based audio embeddings or emotion classifications using a specified recognizer.

classify_audio(audio_path)

Returns the top predicted emotion label for the audio file at audio_path.

Return type:

str

classify_dataframe(df)

Adds a ‘classification’ column to a DataFrame by predicting emotions from audio file paths.

Parameters:

df (DataFrame) – Must contain a ‘Path’ column with paths to audio files.

Returns:

The same DataFrame with a new ‘classification’ column.

Return type:

DataFrame
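The contract described for classify_dataframe (read the ‘Path’ column, write a ‘classification’ column, return the same object) can be sketched without the real model. This is only an illustration under stated assumptions: add_classification and fake_classify are hypothetical names that are not part of the package, and a plain dict of lists stands in for the DataFrame so the snippet runs without pandas or any audio files.

```python
from typing import Any, Callable


def add_classification(table: Any, classify: Callable[[str], str]) -> Any:
    """Add a 'classification' column computed from the 'Path' column.

    Hypothetical helper: it mirrors the documented behaviour of
    classify_dataframe, not the package's actual implementation.
    """
    if "Path" not in table:
        raise KeyError("expected a 'Path' column with audio file paths")
    # One prediction per path, stored in a new column on the same object.
    table["classification"] = [classify(path) for path in table["Path"]]
    return table


def fake_classify(audio_path: str) -> str:
    """Stand-in for classify_audio; returns made-up labels, loads no model."""
    return "happy" if "call_a" in audio_path else "neutral"


# A dict of lists stands in for a DataFrame with a 'Path' column.
rows = {"Path": ["clips/call_a.wav", "clips/call_b.wav"]}
result = add_classification(rows, fake_classify)
print(result["classification"])
```

The same column-style access (`table["Path"]`, `table["classification"] = ...`) is what a pandas DataFrame supports, which is why the stand-in behaves like the documented method for this purpose.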

device: str = 'cpu'

The computation device to use (‘cuda’ or ‘cpu’).
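A common way to choose between the two values is to ask PyTorch whether a GPU is usable and fall back to ‘cpu’ otherwise. The sketch below assumes PyTorch is the backend (the documentation only says that get_embeddings returns a Tensor); pick_device is a hypothetical helper, not part of the package.

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumption: PyTorch is the tensor backend
    except ImportError:
        # Without PyTorch there is nothing to query, so stay on the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Hypothetical use, following the constructor signature documented above:
# analyzer = AudioEmotionAnalyzer(device=device)
```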

get_embeddings(audio_path)

Returns a centered logits vector representing emotional content from the given audio file.

The vector is ordered as:

[‘happy’, ‘neutral’, ‘surprise’, ‘disgust’, ‘anger’, ‘sadness’, ‘fear’]

Parameters:

audio_path (str) – Path to the audio file.

Returns:

Centered logits vector of emotion scores.

Return type:

Tensor
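Because the vector carries no labels of its own, it is usually convenient to pair each score with its emotion name using the documented order. The following is a minimal sketch, assuming the vector has been converted to a plain list of seven floats (for a PyTorch tensor, for example with `.tolist()`). label_scores and top_emotion are hypothetical helpers and the example values are made up.

```python
from typing import Dict, Sequence

# Label order as documented for get_embeddings().
EMOTION_LABELS = ["happy", "neutral", "surprise", "disgust", "anger", "sadness", "fear"]


def label_scores(vector: Sequence[float]) -> Dict[str, float]:
    """Pair each score with its emotion label, preserving the documented order."""
    if len(vector) != len(EMOTION_LABELS):
        raise ValueError(
            f"expected {len(EMOTION_LABELS)} scores, got {len(vector)}"
        )
    return {label: float(score) for label, score in zip(EMOTION_LABELS, vector)}


def top_emotion(vector: Sequence[float]) -> str:
    """Return the label with the highest score in the vector."""
    scores = label_scores(vector)
    return max(scores, key=scores.get)


# Made-up values standing in for a real get_embeddings() result.
example = [0.9, 1.4, -0.3, -0.8, -0.2, -0.6, -0.4]
print(label_scores(example))
print(top_emotion(example))
```

Whether the highest score always matches the label returned by classify_audio is not stated in this reference, so treat top_emotion as a convenience for inspecting the vector rather than a replacement for that method.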

mode: str = 'emotion2vec'

The name of the recognition model type. Currently, only ‘emotion2vec’ is supported.

model_name: str = 'iic/emotion2vec_plus_large'

Name or path of the model to be loaded.

Module contents