multimodal_fin.processing.multimodal.video package

Submodules

multimodal_fin.processing.multimodal.video.analyzer module

class multimodal_fin.processing.multimodal.video.analyzer.EmotionVideoAnalyzer(recognizer, face_detector, processor)

Bases: object

Coordinates facial emotion analysis across video frames.

analyze_video(video_path)

Extracts frames and runs analysis on a given video file.

Parameters:: video_path (str) – Path to the video file.
Returns:: DataFrame of emotion probabilities.
Return type:: DataFrame

analyze_video_frames(frames)

Analyzes each frame to detect faces and classify their emotions.

Parameters:: frames (List) – A list of video frames (np.ndarray or PIL-compatible).
Returns:: Frame-wise emotion probabilities.
Return type:: DataFrame

face_detector: FaceDetector: Face detector to crop faces from frames.

processor: VideoProcessor: Frame processor for sampling frames from video.

recognizer: EmotionRecognizer: Model-specific emotion recognizer.
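
As documented, analyze_video samples frames from a file and analyze_video_frames detects faces in each frame and classifies every crop. That per-frame loop can be sketched as below, assuming the face detector and recognizer behave like plain callables (frame to list of face crops, face to a label-to-probability mapping). The name analyze_frames, the "frame"/"face" keys, and the list of dicts standing in for the returned DataFrame are hypothetical, not the package's API.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for one row of the frame-wise DataFrame:
# {"frame": frame index, "face": face index, <emotion>: probability, ...}
Row = Dict[str, float]


def analyze_frames(
    frames: Sequence[object],
    crop_faces: Callable[[object], List[object]],
    predict_probs: Callable[[object], Dict[str, float]],
) -> List[Row]:
    """Detect faces in each frame and classify every cropped face."""
    rows: List[Row] = []
    for frame_idx, frame in enumerate(frames):
        # Frames without a detected face contribute no rows at all.
        for face_idx, face in enumerate(crop_faces(frame)):
            row: Row = {"frame": frame_idx, "face": face_idx}
            # Merge the recognizer's label -> probability mapping into the row.
            row.update(predict_probs(face))
            rows.append(row)
    return rows
```

If pandas is available, pandas.DataFrame(rows) turns the result into a frame-wise table comparable to the documented return value.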

multimodal_fin.processing.multimodal.video.face_detector module

class multimodal_fin.processing.multimodal.video.face_detector.FaceDetector(device='cpu')

Bases: object

Detects and crops faces from input images or video frames using MTCNN.

detect_faces(image)

Detects a single face in the given PIL image.

Parameters:: image (Image) – Input PIL image.
Returns:: Cropped face image, or None if no face is detected.
Return type:: Optional[Image]
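
detect_faces returns at most one face. When a detector reports several boxes, some rule has to pick one; the sketch below keeps the largest box or, optionally, the most confident detection. This rule, the hypothetical select_face helper, and the (x1, y1, x2, y2) box layout are assumptions for illustration; the package's actual selection logic is not documented here.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2) in pixels


def select_face(
    boxes: Optional[Sequence[Box]],
    probs: Optional[Sequence[float]],
    select_largest: bool = True,
) -> Optional[int]:
    """Return the index of the single face to keep, or None if none was found."""
    if boxes is None or len(boxes) == 0:
        return None  # maps to detect_faces returning None
    if select_largest:
        # Keep the box with the largest area.
        areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
        return max(range(len(boxes)), key=areas.__getitem__)
    if probs is None:
        raise ValueError("probs are required when select_largest=False")
    # Otherwise keep the detection with the highest confidence.
    return max(range(len(boxes)), key=lambda i: probs[i])
```

The returned index would then select which crop to hand back to the caller.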

device: str = 'cpu'

mtcnn: MTCNN

recognize_faces(frame)

Detects multiple faces in a video frame.

Parameters:: frame (ndarray) – Input frame (BGR or RGB format).
Returns:: List of cropped face arrays.
Return type:: List[ndarray]
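
recognize_faces returns one crop per detected face and accepts frames in either BGR or RGB channel order (OpenCV, for example, decodes video frames as BGR). A dependency-free sketch of the cropping step follows, assuming detector boxes in (x1, y1, x2, y2) pixel coordinates that may extend past the frame edge. Frames are modeled as nested lists of pixel tuples; with a NumPy ndarray the same crop is frame[top:bottom, left:right]. The helpers crop_faces and bgr_to_rgb are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]                # frame[row][col] -> (c0, c1, c2)
Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def bgr_to_rgb(frame: Frame) -> Frame:
    """Reverse the channel order of every pixel (BGR <-> RGB)."""
    return [[(p[2], p[1], p[0]) for p in row] for row in frame]


def crop_faces(frame: Frame, boxes: Sequence[Box]) -> List[Frame]:
    """Crop each box out of the frame, clamping coordinates to the image."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    crops: List[Frame] = []
    for x1, y1, x2, y2 in boxes:
        # Detector boxes are floats and may spill past the frame border.
        left, top = max(0, int(x1)), max(0, int(y1))
        right, bottom = min(width, int(x2)), min(height, int(y2))
        if right <= left or bottom <= top:
            continue  # box lies entirely outside the frame
        crops.append([row[left:right] for row in frame[top:bottom]])
    return crops
```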

multimodal_fin.processing.multimodal.video.processor module

class multimodal_fin.processing.multimodal.video.processor.VideoProcessor(skips=0.1)

Bases: object

Handles video processing tasks like frame extraction and frame sampling.

extract_frames(video_path)

Extracts frames from a video file. The current implementation is a placeholder.

Parameters:: video_path (str) – Path to the video file.
Returns:: List of video frames (to be implemented).
Return type:: list

reduce_video_frames(video_data)

Reduces the number of frames in a video by uniform sampling.

Parameters:: video_data (list) – List of all video frames as numpy arrays.
Returns:: Sampled list of frames.
Return type:: list
Raises:: ValueError – If skips is outside the valid range (0 < skips <= 1) or would retain zero frames.

skips: float = 0.1: Proportion of frames to retain from the video (0 < skips <= 1).

multimodal_fin.processing.multimodal.video.video_emotion_analyzer module

class multimodal_fin.processing.multimodal.video.video_emotion_analyzer.VideoEmotionAnalyzer(mode, skips=0.1, method='mode', device='cpu', emotieff_model='enet_b0_8_best_afew')

Bases: object

High-level video emotion classification pipeline.

This class orchestrates the process of detecting faces, recognizing emotions per frame, and aggregating predictions to produce a single dominant emotion for the full video.

analyze_video(video_path)

Runs emotion recognition on a full video.

Parameters:: video_path (str) – Path to the video file.
Returns:: Predicted dominant emotion.
Return type:: str

classify_dataframe(df)

Applies video emotion classification to each path in a DataFrame.

Parameters:: df (DataFrame) – Must contain a ‘Path’ column with video paths.
Returns:: Same DataFrame with an added ‘classification’ column.
Return type:: DataFrame

device: str = 'cpu': Device to use (‘cuda’ or ‘cpu’).

emotieff_model: str = 'enet_b0_8_best_afew': Model name for EmotiEffRecognizer.

get_aggregated_prediction(df)

Aggregates frame-level predictions using the selected strategy.

Parameters:: df (DataFrame) – DataFrame of frame-wise emotion probabilities.
Returns:: Final predicted emotion.
Return type:: str
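
Two of the documented strategies have natural readings: 'mode' takes a majority vote over each frame's top label, while 'mean' averages each label's probability across frames before taking the argmax. The sketch below assumes frame predictions arrive as label-to-probability dicts and is one plausible reading of those names, not the package's implementation. The 'abs' strategy is not described in these docs, so the hypothetical aggregate function rejects it rather than guess.

```python
from collections import Counter
from typing import Dict, List

Probs = Dict[str, float]  # emotion label -> probability for one frame


def aggregate(frame_probs: List[Probs], method: str = "mode") -> str:
    """Reduce frame-level probabilities to one video-level label."""
    if not frame_probs:
        raise ValueError("no frame predictions to aggregate")
    if method == "mode":
        # Majority vote over each frame's most probable label.
        votes = Counter(max(p, key=p.get) for p in frame_probs)
        return votes.most_common(1)[0][0]
    if method == "mean":
        # Average each label's probability across frames, then take the argmax.
        labels = frame_probs[0].keys()
        means = {k: sum(p[k] for p in frame_probs) / len(frame_probs) for k in labels}
        return max(means, key=means.get)
    raise ValueError(f"method not covered by this sketch: {method!r}")
```

The two strategies can disagree: with frames {happy: 0.6, sad: 0.4}, {happy: 0.55, sad: 0.45} and {happy: 0.0, sad: 1.0}, 'mode' votes 'happy' (two of three frames) while 'mean' returns 'sad' (about 0.62 vs 0.38).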

get_embeddings(video_path)

Placeholder for extracting emotion embeddings from video.

Parameters:: video_path (str) – Path to video file.
Returns:: Emotion embedding (future implementation).
Return type:: torch.Tensor

method: str = 'mode': Aggregation strategy (‘mode’, ‘mean’, ‘abs’).

mode: str: Recognition model type (‘vit’, ‘fer’, ‘emotieff’).

skips: float = 0.1: Fraction of frames to process.

swap_disgust_fear(emotion)

Optionally swaps the ‘disgust’ and ‘fear’ labels to compensate for a common confusion between these two classes.

Parameters:: emotion (str) – The predicted emotion.
Returns:: Possibly corrected emotion.
Return type:: str
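
The swap itself is a two-entry relabeling that leaves every other emotion untouched. In the sketch below, the enabled flag is a hypothetical stand-in for whatever makes the swap optional; the docs do not name that switch.

```python
# Two-way mapping: each label is replaced by the other.
_SWAP = {"disgust": "fear", "fear": "disgust"}


def swap_disgust_fear(emotion: str, enabled: bool = True) -> str:
    """Exchange 'disgust' and 'fear'; return any other label unchanged."""
    if not enabled:
        return emotion
    return _SWAP.get(emotion, emotion)
```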
