multimodal_fin.processing.multimodal.audio package
Submodules
multimodal_fin.processing.multimodal.audio.audio_emotion_analyzer module
- class multimodal_fin.processing.multimodal.audio.audio_emotion_analyzer.AudioEmotionAnalyzer(mode='emotion2vec', device='cpu', model_name='iic/emotion2vec_plus_large')
Bases: object
Extracts emotion-based audio embeddings or emotion classifications using a specified recognizer.
- classify_audio(audio_path)
Returns the top predicted emotion for a given audio file.
- Return type:
str
- classify_dataframe(df)
Adds a ‘classification’ column to a DataFrame by predicting emotions from audio file paths.
- Parameters:
df (DataFrame) – Must contain a ‘Path’ column with paths to audio files.
- Returns:
The same DataFrame with a new ‘classification’ column.
- Return type:
DataFrame
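The behaviour described for classify_dataframe can be sketched as follows. This is a minimal illustration, not the package's actual implementation: `classify_paths` and `fake_classify` are hypothetical names, and `fake_classify` stands in for `AudioEmotionAnalyzer.classify_audio` so the example runs without loading a model. The file paths are invented for the example.

```python
import pandas as pd


def classify_paths(df, classify):
    """Add a 'classification' column by applying `classify` to each entry of 'Path'.

    `classify` is any callable that maps an audio path to an emotion label.
    The DataFrame is modified in place and also returned, matching the
    documented "same DataFrame" behaviour.
    """
    if "Path" not in df.columns:
        raise KeyError("DataFrame must contain a 'Path' column")
    df["classification"] = df["Path"].map(classify)
    return df


def fake_classify(audio_path):
    # Hypothetical stand-in for AudioEmotionAnalyzer.classify_audio;
    # a real call would run the emotion model on the audio file.
    return "happy" if "call_01" in audio_path else "neutral"


df = pd.DataFrame({"Path": ["clips/call_01.wav", "clips/call_02.wav"]})
result = classify_paths(df, fake_classify)
print(result)
```

With the real class, the equivalent call would presumably be `AudioEmotionAnalyzer().classify_dataframe(df)`, using the defaults listed in the class signature.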
- device: str = 'cpu'
The computation device to use (‘cuda’ or ‘cpu’).
- get_embeddings(audio_path)
Returns a centered logits vector representing emotional content from the given audio file.
- The vector is ordered as:
[‘happy’, ‘neutral’, ‘surprise’, ‘disgust’, ‘anger’, ‘sadness’, ‘fear’]
- Parameters:
audio_path (str) – Path to the audio file.
- Returns:
Centered logits vector of emotion scores.
- Return type:
Tensor
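To make the documented label order concrete, the sketch below pairs a score vector with the seven emotion names and picks the highest-scoring one. It uses plain Python lists rather than a Tensor, and it assumes that "centered" means the mean of the logits has been subtracted; the source does not spell this out. `center_logits`, `label_scores`, `top_emotion`, and the sample values are hypothetical.

```python
from statistics import fmean

# Label order as documented for get_embeddings.
EMOTION_LABELS = ["happy", "neutral", "surprise", "disgust", "anger", "sadness", "fear"]


def center_logits(logits):
    """Subtract the mean so the scores sum to zero (assumed meaning of 'centered')."""
    if len(logits) != len(EMOTION_LABELS):
        raise ValueError(f"expected {len(EMOTION_LABELS)} scores, got {len(logits)}")
    mean = fmean(logits)
    return [x - mean for x in logits]


def label_scores(vector):
    """Pair each score with its emotion name, following the documented order."""
    return dict(zip(EMOTION_LABELS, vector))


def top_emotion(vector):
    """Return the emotion name with the highest score."""
    scores = label_scores(vector)
    return max(scores, key=scores.get)


# Hypothetical raw logits, invented for illustration only.
raw_logits = [2.0, 1.0, 0.0, -1.0, 3.0, 0.5, 1.5]
centered = center_logits(raw_logits)
print(label_scores(centered))
print(top_emotion(centered))
```

If the Tensor returned by get_embeddings is one-dimensional, converting it with `.tolist()` should give a list that `label_scores` and `top_emotion` accept directly.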
- mode: str = 'emotion2vec'
The name of the recognition model type. Currently, only ‘emotion2vec’ is supported.
- model_name: str = 'iic/emotion2vec_plus_large'
Name or path of the model to be loaded.