multimodal_fin.processing.multimodal package
Subpackages
- multimodal_fin.processing.multimodal.audio package
- multimodal_fin.processing.multimodal.text package
- multimodal_fin.processing.multimodal.video package
- Submodules
- multimodal_fin.processing.multimodal.video.analyzer module
- multimodal_fin.processing.multimodal.video.face_detector module
- multimodal_fin.processing.multimodal.video.processor module
- multimodal_fin.processing.multimodal.video.video_emotion_analyzer module
- VideoEmotionAnalyzer: analyze_video(), classify_dataframe(), device, emotieff_model, get_aggregated_prediction(), get_embeddings(), method, mode, skips, swap_disgust_fear()
- Module contents
Submodules
multimodal_fin.processing.multimodal.embeddings_extractor module
- class multimodal_fin.processing.multimodal.embeddings_extractor.EmbeddingsExtractor(audio_model_name=None, text_model_name=None, video_model_name=None, device='cpu', verbose=1)
Bases: object
Extracts multimodal emotion embeddings from a CSV of conference interventions.
- The extractor supports:
  - Audio emotion embeddings via AudioEmotionAnalyzer.
  - Text emotion embeddings via TextEmotionAnalyzer.
  - Video emotion embeddings via VideoEmotionAnalyzer.
- audio_model_name: str | None = None
Name of the model used for audio emotion recognition.
- device: str = 'cpu'
Computation device (e.g., ‘cpu’, ‘cuda’).
- extract(csv_path, original_dir)
Loads the classified interventions CSV and computes multimodal embeddings.
- Parameters:
csv_path (str) – Path to the CSV file containing interventions.
original_dir (str) – Directory with the associated media files and metadata (LEVEL_3.json, audio.mp3).
- Return type:
DataFrame
- Returns:
A pandas DataFrame with added columns for each modality’s embeddings.
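The `original_dir` parameter bundles the media and metadata that `extract` depends on. Below is a minimal sketch of how those inputs might be located and validated before any embedding work starts, assuming both files are named exactly `LEVEL_3.json` and `audio.mp3` and sit directly inside `original_dir`. The helper `resolve_extract_inputs` and the `ExtractInputs` container are hypothetical and are not part of the package.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExtractInputs:
    """Hypothetical container for the paths one extract() call needs."""
    csv_path: Path
    json_path: Path
    audio_path: Path


def resolve_extract_inputs(csv_path: str, original_dir: str) -> ExtractInputs:
    """Hypothetical helper: locate LEVEL_3.json and audio.mp3 in original_dir.

    Assumes both files live at the top level of original_dir. Raises
    FileNotFoundError naming every missing input so callers fail fast.
    """
    base = Path(original_dir)
    inputs = ExtractInputs(
        csv_path=Path(csv_path),
        json_path=base / "LEVEL_3.json",
        audio_path=base / "audio.mp3",
    )
    # Collect all missing files instead of stopping at the first one.
    missing = [
        str(p)
        for p in (inputs.csv_path, inputs.json_path, inputs.audio_path)
        if not p.is_file()
    ]
    if missing:
        raise FileNotFoundError(f"missing inputs: {', '.join(missing)}")
    return inputs
```

With the documented API itself, the call is simply `EmbeddingsExtractor(device='cpu').extract(csv_path, original_dir)`; a pre-check like this only makes a missing metadata or audio file surface before any model is loaded.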
- text_model_name: str | None = None
Name of the model used for text emotion recognition.
- verbose: int = 1
Verbosity level for logging.
- video_model_name: str | None = None
Name of the model used for video emotion recognition.
multimodal_fin.processing.multimodal.multimodal_embeddings module
- class multimodal_fin.processing.multimodal.multimodal_embeddings.MultimodalEmbeddings(path_csv, path_json, audio_file_path, audio_emotion_analyzer=None, text_emotion_analyzer=None, video_emmotion_analyzer=None)
Bases: object
Generates multimodal emotion-based embeddings (audio, text, video) from transcript data.
- audio_emotion_analyzer: AudioEmotionAnalyzer | None = None
Audio emotion embedding model.
- audio_file_path: str
Path to the full audio file.
- cortar_audio_temporal(start_time, end_time)
Cuts a segment of the full MP3 audio and returns it as a temporary WAV file.
- Parameters:
start_time (int) – Start time in seconds.
end_time (int) – End time in seconds.
- Return type:
Optional[NamedTemporaryFile]
- Returns:
A temporary WAV file, or None on error.
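`cortar_audio_temporal` turns a time window into a standalone WAV file. The sketch below illustrates the same idea with the standard-library `wave` module, under one important assumption: the source is already a WAV file, because the standard library cannot decode MP3. The real method works on the full MP3 and presumably relies on a decoding library that is not shown here. The function name `cut_wav_segment` is hypothetical; it mirrors the documented contract by returning a `NamedTemporaryFile` handle, or `None` on error.

```python
import tempfile
import wave
from typing import IO, Optional


def cut_wav_segment(wav_path: str, start_time: int, end_time: int) -> Optional[IO[bytes]]:
    """Hypothetical stand-in for cortar_audio_temporal that reads WAV input.

    Copies the frames between start_time and end_time (in seconds) into a
    temporary .wav file and returns its open handle, or None on error.
    """
    if start_time < 0 or end_time <= start_time:
        return None
    try:
        with wave.open(wav_path, "rb") as src:
            rate = src.getframerate()
            params = src.getparams()
            first = start_time * rate
            if first >= src.getnframes():
                return None  # window starts past the end of the audio
            src.setpos(first)
            frames = src.readframes((end_time - start_time) * rate)
        # delete=False so the caller can reopen the segment by name later.
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        # wave patches the header's frame count to match what is written,
        # and does not close a file object it did not open itself.
        with wave.open(tmp, "wb") as dst:
            dst.setparams(params)
            dst.writeframes(frames)
        return tmp
    except (OSError, EOFError, wave.Error):
        return None
```

The returned handle is positioned at the end of the written data, so a consumer should either `seek(0)` or, as is common with audio libraries, reopen the file via `tmp.name` and remove it when done.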
- cortar_video_temporal(start_time, end_time)
Placeholder for future video cutting implementation.
- generar_embeddings()
Generates audio, text, and video embeddings for each sentence in the transcript.
- Returns:
A DataFrame with the added columns audio_embedding, text_embedding, and video_embedding.
- Return type:
DataFrame
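Conceptually, `generar_embeddings` walks the transcript sentence by sentence and attaches one embedding per modality. A minimal sketch of that loop follows, assuming each sentence already carries `start`/`end` times in seconds (in the real class these come from the LEVEL_3 JSON, whose schema is not documented here) and that each analyzer can be treated as a plain callable. `Sentence`, `embed_sentences`, and the callable signatures are hypothetical. It is also an assumption that a modality whose analyzer is `None` simply yields `None`. Rows are plain dicts so the sketch runs without pandas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Embedding = List[float]


@dataclass
class Sentence:
    """Hypothetical transcript row with timing taken from the JSON metadata."""
    text: str
    start: int  # seconds
    end: int    # seconds


def embed_sentences(
    sentences: Sequence[Sentence],
    audio_fn: Optional[Callable[[int, int], Embedding]] = None,
    text_fn: Optional[Callable[[str], Embedding]] = None,
    video_fn: Optional[Callable[[int, int], Embedding]] = None,
) -> List[Dict[str, object]]:
    """Return one row per sentence with audio/text/video embedding columns.

    A modality whose analyzer is None gets None (assumed skip behaviour).
    """
    rows: List[Dict[str, object]] = []
    for s in sentences:
        rows.append({
            "text": s.text,
            # Audio and video are addressed by time window, text by content.
            "audio_embedding": audio_fn(s.start, s.end) if audio_fn else None,
            "text_embedding": text_fn(s.text) if text_fn else None,
            "video_embedding": video_fn(s.start, s.end) if video_fn else None,
        })
    return rows
```

Passing the resulting list to `pandas.DataFrame(rows)` gives a frame with the same three embedding columns the method documents; keeping the modalities independent means a missing analyzer never blocks the other two.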
- path_csv: str
Path to the classified CSV containing interventions.
- path_json: str
Path to the LEVEL_3 JSON with temporal information.
- text_emotion_analyzer: TextEmotionAnalyzer | None = None
Text emotion embedding model.
- video_emmotion_analyzer: VideoEmotionAnalyzer | None = None
Video emotion embedding model.