multimodal_fin.processing.multimodal package

Subpackages

Submodules

multimodal_fin.processing.multimodal.embeddings_extractor module

class multimodal_fin.processing.multimodal.embeddings_extractor.EmbeddingsExtractor(audio_model_name=None, text_model_name=None, video_model_name=None, device='cpu', verbose=1)[source]

Bases: object

Extracts multimodal emotion embeddings from a CSV of conference interventions.

The extractor supports:

Audio emotion embeddings via AudioEmotionAnalyzer.
Text emotion embeddings via TextEmotionAnalyzer.
Video emotion embeddings via VideoEmotionAnalyzer.

audio_model_name: str | None = None: Name of the model used for audio emotion recognition.

device: str = 'cpu': Computation device (e.g., ‘cpu’, ‘cuda’).

extract(csv_path, original_dir)[source]

Loads the classified interventions CSV and computes multimodal embeddings.

Parameters:

csv_path (str) – Path to the CSV file containing interventions.
original_dir (str) – Directory with associated media files and metadata (LEVEL_3.json, audio.mp3).

Return type:

DataFrame

Returns:

A pandas DataFrame with added columns for each modality’s embeddings.
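
Before calling extract, it can help to confirm that the inputs it expects are in place. The following is a minimal sketch and not part of the package: missing_extract_inputs is a hypothetical helper, and it assumes that the two files named in the original_dir description (LEVEL_3.json and audio.mp3) sit directly inside that directory.

```python
from pathlib import Path

# File names taken from the original_dir parameter description; that they
# sit directly inside original_dir is an assumption.
REQUIRED_FILES = ("LEVEL_3.json", "audio.mp3")


def missing_extract_inputs(csv_path, original_dir):
    """Return the paths that extract() would need but that do not exist.

    Hypothetical helper, not part of multimodal_fin.
    """
    missing = []
    if not Path(csv_path).is_file():
        missing.append(str(csv_path))
    for name in REQUIRED_FILES:
        candidate = Path(original_dir) / name
        if not candidate.is_file():
            missing.append(str(candidate))
    return missing
```

If the returned list is empty, the same two paths can be passed on to extract(csv_path, original_dir).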

text_model_name: str | None = None: Name of the model used for text emotion recognition.

verbose: int = 1: Verbosity level for logging.

video_model_name: str | None = None: Name of the model used for video emotion recognition.

multimodal_fin.processing.multimodal.multimodal_embeddings module

class multimodal_fin.processing.multimodal.multimodal_embeddings.MultimodalEmbeddings(path_csv, path_json, audio_file_path, audio_emotion_analyzer=None, text_emotion_analyzer=None, video_emmotion_analyzer=None)[source]

Bases: object

Generates multimodal emotion-based embeddings (audio, text, video) from transcript data.

audio_emotion_analyzer: AudioEmotionAnalyzer | None = None: Audio emotion embedding model.

audio_file_path: str: Path to the full audio file.

cortar_audio_temporal(start_time, end_time)[source]

Cuts a segment of the full MP3 audio and returns it as a temporary WAV file.

Parameters:

start_time (int) – Start time in seconds.
end_time (int) – End time in seconds.

Return type:

Optional[NamedTemporaryFile]

Returns:

A temporary WAV file or None on error.
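
The cut-and-return pattern described here (slice by seconds, write to a temporary WAV, return None on failure) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual implementation: cut_wav_segment is a hypothetical name, and it reads a WAV source instead of MP3, since decoding MP3 needs a third-party library.

```python
import tempfile
import wave


def cut_wav_segment(wav_path, start_time, end_time):
    """Copy the frames between start_time and end_time (seconds) into a
    temporary WAV file; return the open file object, or None on error.

    Hypothetical stand-in: reads WAV, not MP3, to stay stdlib-only.
    """
    if start_time < 0 or end_time <= start_time:
        return None
    try:
        with wave.open(wav_path, "rb") as src:
            rate = src.getframerate()
            params = src.getparams()
            # Clamp the start so setpos() never points past the last frame.
            src.setpos(min(int(start_time * rate), src.getnframes()))
            frames = src.readframes(int((end_time - start_time) * rate))
        tmp = tempfile.NamedTemporaryFile(suffix=".wav")
        with wave.open(tmp, "wb") as dst:
            # The frame count in the header is corrected when dst closes.
            dst.setparams(params)
            dst.writeframes(frames)
        tmp.seek(0)
        return tmp
    except (OSError, EOFError, wave.Error):
        return None
```

The returned file is deleted when it is closed, so the caller should keep the object alive for as long as the segment is needed.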

cortar_video_temporal(start_time, end_time)[source]

Placeholder for future video cutting implementation.

generar_embeddings()[source]

Generates audio, text, and video embeddings for each sentence in the transcript.

Return type:

DataFrame

Returns:

A DataFrame with the columns audio_embedding, text_embedding, and video_embedding.
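
The per-sentence, per-modality structure of the result can be pictured with a small sketch. It is not the library's implementation: embed_sentences and its row layout are hypothetical, the analyzers are modeled as plain callables that take the sentence itself (the real audio analyzer presumably works on a cut audio segment), and leaving a column as None when its analyzer is absent is an assumption suggested only by the None defaults in the constructor signature.

```python
def embed_sentences(sentences, audio_analyzer=None, text_analyzer=None,
                    video_analyzer=None):
    """Build one row per sentence with one embedding column per modality.

    Each analyzer is any callable mapping a sentence to an embedding
    (hypothetical interface). A missing analyzer leaves its column as None.
    """
    analyzers = {
        "audio_embedding": audio_analyzer,
        "text_embedding": text_analyzer,
        "video_embedding": video_analyzer,
    }
    rows = []
    for sentence in sentences:
        row = {"sentence": sentence}
        for column, analyzer in analyzers.items():
            row[column] = analyzer(sentence) if analyzer is not None else None
        rows.append(row)
    return rows


# Toy text "analyzer": character count and word count as a 2-d embedding.
rows = embed_sentences(
    ["Revenue grew.", "Margins fell sharply."],
    text_analyzer=lambda s: [len(s), len(s.split())],
)
```

A list of rows in this shape can be turned into a table with pandas.DataFrame(rows), which gives one column per modality as in the return value described above.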

path_csv: str: Path to the classified CSV containing interventions.

path_json: str: Path to the LEVEL_3 JSON with temporal information.

text_emotion_analyzer: TextEmotionAnalyzer | None = None: Text emotion embedding model.

video_emmotion_analyzer: VideoEmotionAnalyzer | None = None: Video emotion embedding model.

multimodal_fin.processing.multimodal.multimodal_embeddings.dummy_npwarn_decorator_factory()[source]

Module contents