multimodal_fin.processing.multimodal.video package

Submodules

multimodal_fin.processing.multimodal.video.analyzer module

class multimodal_fin.processing.multimodal.video.analyzer.EmotionVideoAnalyzer(recognizer, face_detector, processor)

Bases: object

Coordinates facial emotion analysis over the frames of a video.

analyze_video(video_path)

Extracts frames from the given video file and runs emotion analysis on them.

Parameters:

video_path (str) – Path to the video file.

Returns:

DataFrame of emotion probabilities.

Return type:

DataFrame

analyze_video_frames(frames)

Analyzes each frame to detect faces and classify their emotions.

Parameters:

frames (List) – A list of video frames (np.ndarray or PIL-compatible).

Returns:

Frame-wise emotion probabilities.

Return type:

DataFrame

face_detector: FaceDetector

Face detector to crop faces from frames.

processor: VideoProcessor

Frame processor for sampling frames from video.

recognizer: EmotionRecognizer

Model-specific emotion recognizer.
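A minimal sketch of how the frame-wise analysis might be assembled, assuming the detector returns a list of cropped faces per frame and the recognizer maps one face to a dict of emotion probabilities. The function name `frame_emotion_table`, the `detect` and `recognize` callables, and the one-row-per-face layout are hypothetical stand-ins, not this package's API.

```python
from typing import Callable, Dict, List

import numpy as np
import pandas as pd


def frame_emotion_table(
    frames: List[np.ndarray],
    detect: Callable[[np.ndarray], List[np.ndarray]],     # hypothetical stand-in for FaceDetector
    recognize: Callable[[np.ndarray], Dict[str, float]],  # hypothetical stand-in for EmotionRecognizer
) -> pd.DataFrame:
    """Return one row per detected face: the frame index plus emotion probabilities."""
    rows = []
    for idx, frame in enumerate(frames):
        for face in detect(frame):  # a frame may contain zero, one, or several faces
            rows.append({"frame": idx, **recognize(face)})
    return pd.DataFrame(rows)


# Toy stand-ins: frame i is filled with value i; faces are "found" only in even frames.
frames = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(4)]


def toy_detect(frame: np.ndarray) -> List[np.ndarray]:
    return [frame[2:6, 2:6]] if frame[0, 0, 0] % 2 == 0 else []


def toy_recognize(face: np.ndarray) -> Dict[str, float]:
    return {"happy": 0.7, "neutral": 0.3}


table = frame_emotion_table(frames, toy_detect, toy_recognize)
print(table)
```

Passing the components in as callables mirrors the constructor's dependency injection (recognizer, face_detector, processor), which keeps the coordinator testable without loading a model.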

multimodal_fin.processing.multimodal.video.face_detector module

class multimodal_fin.processing.multimodal.video.face_detector.FaceDetector(device='cpu')

Bases: object

Detects and crops faces from input images or video frames using MTCNN.

detect_faces(image)

Detects a single face in the given PIL image.

Parameters:

image (Image) – Input PIL image.

Returns:

Cropped face image, or None if no face is detected.

Return type:

Optional[Image]

device: str = 'cpu'

Device on which the MTCNN model runs (‘cuda’ or ‘cpu’).

mtcnn: MTCNN

MTCNN model used to detect and crop faces.

recognize_faces(frame)

Detects multiple faces in a video frame.

Parameters:

frame (ndarray) – Input frame (BGR or RGB format).

Returns:

List of cropped face arrays.

Return type:

List[ndarray]
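Cropping several faces from one frame comes down to slicing the array with each bounding box. A minimal sketch, assuming the detector returns boxes as [x1, y1, x2, y2] pixel corners (the convention of facenet-pytorch's `MTCNN.detect`, which returns None when nothing is found) and that OpenCV-style BGR input should be converted to RGB; `crop_faces` is a hypothetical helper, not this package's implementation.

```python
from typing import List, Optional, Sequence

import numpy as np


def crop_faces(frame: np.ndarray, boxes: Optional[Sequence], bgr: bool = True) -> List[np.ndarray]:
    """Crop face regions from an H x W x 3 frame given [x1, y1, x2, y2] boxes."""
    if boxes is None:  # assumed "no faces found" signal from the detector
        return []
    if bgr:
        frame = frame[..., ::-1]  # BGR -> RGB by reversing the channel axis
    height, width = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Detector boxes are floats and may extend past the image border, so clip them.
        left, top = max(int(x1), 0), max(int(y1), 0)
        right, bottom = min(int(x2), width), min(int(y2), height)
        if right > left and bottom > top:  # skip boxes that are empty after clipping
            crops.append(frame[top:bottom, left:right].copy())
    return crops


frame = np.zeros((10, 10, 3), dtype=np.uint8)
frame[..., 0] = 255  # pure blue in BGR order
crops = crop_faces(frame, [[2, 3, 6, 8], [-5, -5, 4, 4], [9, 9, 9, 12]])
print([c.shape for c in crops])
```

The third box collapses to zero width after clipping and is dropped, so only two crops are returned.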

multimodal_fin.processing.multimodal.video.processor module

class multimodal_fin.processing.multimodal.video.processor.VideoProcessor(skips=0.1)

Bases: object

Handles video processing tasks like frame extraction and frame sampling.

extract_frames(video_path)

Extracts frames from a video file. The implementation is currently a placeholder.

Parameters:

video_path (str) – Path to the video file.

Returns:

List of video frames (to be implemented).

Return type:

list

reduce_video_frames(video_data)

Reduces the number of frames in a video by uniform sampling.

Parameters:

video_data (list) – List of all video frames as numpy arrays.

Returns:

Sampled list of frames.

Return type:

list

Raises:

ValueError – If skips is outside the range (0, 1] or sampling would yield zero frames.

skips: float = 0.1

Proportion of frames to retain from the video (0 < skips <= 1).
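A minimal sketch of uniform sampling that follows the documented contract: keep about skips × len(video_data) evenly spaced frames, and raise ValueError when skips is outside (0, 1] or no frame would survive. Rounding the frame count down and taking the first frame of each stride are assumptions; the package may sample differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reduce_video_frames(video_data: Sequence[T], skips: float = 0.1) -> List[T]:
    """Keep roughly a `skips` fraction of frames, evenly spaced (sketch)."""
    if not 0 < skips <= 1:
        raise ValueError(f"skips must be in (0, 1], got {skips}")
    n_keep = int(len(video_data) * skips)  # assumption: round down
    if n_keep == 0:
        raise ValueError("skips is too small for this video: zero frames would be kept")
    step = len(video_data) / n_keep  # stride >= 1, so indices stay in range
    return [video_data[int(i * step)] for i in range(n_keep)]


print(reduce_video_frames(list(range(8)), 0.25))
```

With 8 frames and skips=0.25 this keeps frames 0 and 4. Note that the attribute name is misleading: skips is the fraction retained, not the fraction skipped.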

multimodal_fin.processing.multimodal.video.video_emotion_analyzer module

class multimodal_fin.processing.multimodal.video.video_emotion_analyzer.VideoEmotionAnalyzer(mode, skips=0.1, method='mode', device='cpu', emotieff_model='enet_b0_8_best_afew')

Bases: object

High-level video emotion classification pipeline.

This class orchestrates the process of detecting faces, recognizing emotions per frame, and aggregating predictions to produce a single dominant emotion for the full video.

analyze_video(video_path)

Runs emotion recognition on a full video.

Parameters:

video_path (str) – Path to the video file.

Returns:

Predicted dominant emotion.

Return type:

str

classify_dataframe(df)

Applies video emotion classification to each path in a DataFrame.

Parameters:

df (DataFrame) – Must contain a ‘Path’ column with video paths.

Returns:

Same DataFrame with an added ‘classification’ column.

Return type:

DataFrame
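The per-row classification is a column-wise apply over ‘Path’. A sketch, assuming analyze_video behaves as documented (a path in, a dominant emotion out); the `analyze` parameter is a hypothetical injection point so the example runs without a model.

```python
from typing import Callable

import pandas as pd


def classify_dataframe(df: pd.DataFrame, analyze: Callable[[str], str]) -> pd.DataFrame:
    """Add a 'classification' column by running `analyze` on every path (sketch)."""
    if "Path" not in df.columns:
        raise KeyError("DataFrame must contain a 'Path' column")
    df["classification"] = df["Path"].apply(analyze)  # mutates and returns the same frame
    return df


def toy_analyze(path: str) -> str:
    # Hypothetical stand-in for VideoEmotionAnalyzer.analyze_video.
    return "happy" if "smile" in path else "neutral"


videos = pd.DataFrame({"Path": ["a_smile.mp4", "b.mp4"]})
result = classify_dataframe(videos, toy_analyze)
print(result)
```

Returning the input frame after adding the column matches the documented “same DataFrame” behavior; copy the frame first if the caller's data must stay untouched.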

device: str = 'cpu'

Device to use (‘cuda’ or ‘cpu’).

emotieff_model: str = 'enet_b0_8_best_afew'

Model name for EmotiEffRecognizer.

get_aggregated_prediction(df)

Aggregates frame-level predictions using the selected strategy.

Parameters:

df (DataFrame) – DataFrame of frame-wise emotion probabilities.

Returns:

Final predicted emotion.

Return type:

str
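A sketch of two plausible readings of the aggregation strategies, assuming the DataFrame has one column per emotion and one row per frame: ‘mode’ as a majority vote over each frame's top emotion, and ‘mean’ as the top emotion after averaging probabilities across frames. The documentation does not say what ‘abs’ computes, so it is left out rather than guessed; `aggregate_predictions` is a hypothetical name.

```python
import pandas as pd


def aggregate_predictions(df: pd.DataFrame, method: str = "mode") -> str:
    """Collapse frame-wise emotion probabilities into one label (sketch)."""
    if df.empty:
        raise ValueError("no frame-level predictions to aggregate")
    if method == "mode":
        # Majority vote over each frame's top-scoring emotion.
        per_frame = df.idxmax(axis=1)
        return per_frame.value_counts().idxmax()
    if method == "mean":
        # Average probabilities across frames, then take the top emotion.
        return df.mean(axis=0).idxmax()
    raise ValueError(f"unsupported method: {method!r}")


frame_probs = pd.DataFrame({"happy": [0.6, 0.1, 0.55], "sad": [0.4, 0.9, 0.45]})
print(aggregate_predictions(frame_probs, "mode"), aggregate_predictions(frame_probs, "mean"))
```

On this toy input the strategies disagree: ‘happy’ wins two of three frames, but one confident ‘sad’ frame pulls the mean (0.583 vs 0.417) toward ‘sad’.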

get_embeddings(video_path)

Placeholder for extracting emotion embeddings from video.

Parameters:

video_path (str) – Path to video file.

Returns:

Emotion embedding (future implementation).

Return type:

torch.Tensor

method: str = 'mode'

Aggregation strategy (‘mode’, ‘mean’, ‘abs’).

mode: str

Recognition model type (‘vit’, ‘fer’, ‘emotieff’).

skips: float = 0.1

Fraction of frames to process.

swap_disgust_fear(emotion)

Optionally swaps the ‘disgust’ and ‘fear’ labels to compensate for a common confusion between these two classes.

Parameters:

emotion (str) – The predicted emotion.

Returns:

Possibly corrected emotion.

Return type:

str
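The swap itself is a simple label remap. A minimal sketch, assuming lowercase labels and a hypothetical `enabled` flag standing in for whatever makes the swap optional:

```python
_SWAPPED = {"disgust": "fear", "fear": "disgust"}


def swap_disgust_fear(emotion: str, enabled: bool = True) -> str:
    """Exchange 'disgust' and 'fear'; every other label passes through unchanged."""
    if not enabled:  # hypothetical switch for the "optional" behavior
        return emotion
    return _SWAPPED.get(emotion, emotion)


print(swap_disgust_fear("fear"), swap_disgust_fear("happy"))
```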

Module contents