multimodal_fin.processing.multimodal package

Subpackages

Submodules

multimodal_fin.processing.multimodal.embeddings_extractor module

class multimodal_fin.processing.multimodal.embeddings_extractor.EmbeddingsExtractor(audio_model_name=None, text_model_name=None, video_model_name=None, device='cpu', verbose=1)

Bases: object

Extracts multimodal emotion embeddings from a CSV of conference interventions.

The extractor supports:
  • Audio emotion embeddings via AudioEmotionAnalyzer.

  • Text emotion embeddings via TextEmotionAnalyzer.

  • Video emotion embeddings via VideoEmotionAnalyzer.

audio_model_name: str | None = None

Name of the model used for audio emotion recognition.

device: str = 'cpu'

Computation device (e.g., 'cpu', 'cuda').

extract(csv_path, original_dir)

Loads the classified interventions CSV and computes multimodal embeddings.

Parameters:
  • csv_path (str) – Path to the CSV file containing interventions.

  • original_dir (str) – Directory with associated media files and metadata (LEVEL_3.json, audio.mp3).

Return type:

DataFrame

Returns:

A pandas DataFrame with added columns for each modality’s embeddings.
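Because extract depends on files inside original_dir, it can help to check that they exist before starting a long run. The following is a minimal sketch, not part of the package: missing_inputs and run_extraction are hypothetical helpers, and the only file names used are the ones listed in the original_dir description above. The extractor argument is assumed to be any object that exposes extract(csv_path, original_dir).

```python
from pathlib import Path

# File names taken from the description of `original_dir` above.
REQUIRED_FILES = ("LEVEL_3.json", "audio.mp3")


def missing_inputs(csv_path, original_dir):
    """Return the input paths that extract() would need but that do not exist."""
    missing = []
    if not Path(csv_path).is_file():
        missing.append(str(csv_path))
    for name in REQUIRED_FILES:
        candidate = Path(original_dir) / name
        if not candidate.is_file():
            missing.append(str(candidate))
    return missing


def run_extraction(extractor, csv_path, original_dir):
    """Call extractor.extract only when every expected input is present.

    `extractor` is assumed to expose extract(csv_path, original_dir),
    as EmbeddingsExtractor is documented to do.
    """
    missing = missing_inputs(csv_path, original_dir)
    if missing:
        raise FileNotFoundError(f"Missing inputs: {missing}")
    return extractor.extract(csv_path, original_dir)
```

Failing early with the list of missing paths is a design choice for this sketch; the package itself may report missing files differently.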

text_model_name: str | None = None

Name of the model used for text emotion recognition.

verbose: int = 1

Verbosity level for logging.

video_model_name: str | None = None

Name of the model used for video emotion recognition.

multimodal_fin.processing.multimodal.multimodal_embeddings module

class multimodal_fin.processing.multimodal.multimodal_embeddings.MultimodalEmbeddings(path_csv, path_json, audio_file_path, audio_emotion_analyzer=None, text_emotion_analyzer=None, video_emmotion_analyzer=None)

Bases: object

Generates multimodal emotion-based embeddings (audio, text, video) from transcript data.

audio_emotion_analyzer: AudioEmotionAnalyzer | None = None

Audio emotion embedding model.

audio_file_path: str

Path to the full audio file.

cortar_audio_temporal(start_time, end_time)

Cuts a segment of the full MP3 audio and returns it as a temporary WAV file.

Parameters:
  • start_time (int) – Start time in seconds.

  • end_time (int) – End time in seconds.

Return type:

Optional[NamedTemporaryFile]

Returns:

A temporary WAV file or None on error.
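The cut-to-temporary-file pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's implementation: the real method reads an MP3, which the standard library cannot decode, so this stand-in takes a WAV file as input. The function name cut_wav_segment is hypothetical.

```python
import tempfile
import wave


def cut_wav_segment(src_path, start_time, end_time):
    """Copy the frames between start_time and end_time (seconds) into a temporary WAV.

    Returns an open NamedTemporaryFile positioned at the start of the data,
    or None on error, mirroring the documented return contract.
    Assumption: the source is a WAV file, not an MP3.
    """
    if start_time < 0 or end_time <= start_time:
        return None
    try:
        with wave.open(src_path, "rb") as src:
            rate = src.getframerate()
            params = src.getparams()
            # Clamp the start position so setpos never exceeds the file length.
            src.setpos(min(start_time * rate, src.getnframes()))
            frames = src.readframes((end_time - start_time) * rate)
        tmp = tempfile.NamedTemporaryFile(suffix=".wav")
        with wave.open(tmp, "wb") as dst:
            dst.setparams(params)
            dst.writeframes(frames)  # the header is patched to the real length on close
        tmp.seek(0)
        return tmp
    except (OSError, EOFError, wave.Error):
        return None
```

The temporary file is removed when the returned object is closed, so a caller should keep the reference alive for as long as the segment is being analyzed.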

cortar_video_temporal(start_time, end_time)

Placeholder for a future video-cutting implementation.

generar_embeddings()

Generates audio, text, and video embeddings for each sentence in the transcript.

Return type:

DataFrame

Returns:

A DataFrame with the columns audio_embedding, text_embedding, and video_embedding.
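The per-sentence loop can be pictured roughly as follows. This is a hedged sketch, not the package's code: embed_sentences is a hypothetical function, rows are plain dictionaries rather than DataFrame rows, the keys text, start_time, and end_time are assumed, and the three callables stand in for the analyzer objects. Treating a missing analyzer as "store None for that modality" is also an assumption; only the three output column names come from the documentation above.

```python
def embed_sentences(rows, audio_fn=None, text_fn=None, video_fn=None):
    """Attach one embedding per modality to each transcript row.

    rows: list of dicts with hypothetical keys "text", "start_time", "end_time".
    audio_fn / video_fn: callables taking (start_time, end_time) in seconds.
    text_fn: callable taking the sentence text.
    A callable left as None yields None for that modality (assumed behavior).
    """
    enriched_rows = []
    for row in rows:
        start, end = row["start_time"], row["end_time"]
        enriched = dict(row)  # copy so the input rows are not mutated
        enriched["audio_embedding"] = audio_fn(start, end) if audio_fn else None
        enriched["text_embedding"] = text_fn(row["text"]) if text_fn else None
        enriched["video_embedding"] = video_fn(start, end) if video_fn else None
        enriched_rows.append(enriched)
    return enriched_rows
```

For example, calling embed_sentences(rows, text_fn=len) fills text_embedding with each sentence's length and leaves the audio and video columns as None.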

path_csv: str

Path to the classified CSV containing interventions.

path_json: str

Path to the LEVEL_3 JSON with temporal information.

text_emotion_analyzer: TextEmotionAnalyzer | None = None

Text emotion embedding model.

video_emmotion_analyzer: VideoEmotionAnalyzer | None = None

Video emotion embedding model.

multimodal_fin.processing.multimodal.multimodal_embeddings.dummy_npwarn_decorator_factory()

Module contents