multimodal_fin.processing.multimodal.video package
Submodules
multimodal_fin.processing.multimodal.video.analyzer module
- class multimodal_fin.processing.multimodal.video.analyzer.EmotionVideoAnalyzer(recognizer, face_detector, processor)[source]
Bases: object
Coordinates the facial emotion analysis from video frames.
- analyze_video(video_path)[source]
Extracts frames and runs analysis on a given video file.
- Parameters:
video_path (str) – Path to the video file.
- Returns:
DataFrame of emotion probabilities.
- Return type:
DataFrame
- analyze_video_frames(frames)[source]
Analyzes each frame to detect faces and classify their emotions.
- Parameters:
frames (List) – A list of video frames (np.ndarray or PIL-compatible).
- Returns:
Frame-wise emotion probabilities.
- Return type:
DataFrame
- face_detector: FaceDetector
Face detector to crop faces from frames.
- processor: VideoProcessor
Frame processor for sampling frames from video.
- recognizer: EmotionRecognizer
Model-specific emotion recognizer.
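The way the three collaborators fit together can be sketched as below. This is a minimal illustration, not the package's implementation: the stub classes, the recognizer's `predict` method, and the choice to skip frames where no face is found are all assumptions, and plain dict rows stand in for the DataFrame so the example has no dependencies.

```python
class StubProcessor:
    """Hypothetical stand-in for VideoProcessor."""

    def extract_frames(self, video_path):
        return ["frame0", "frame1", "frame2"]


class StubFaceDetector:
    """Hypothetical stand-in for FaceDetector; finds no face in frame1."""

    def detect_faces(self, image):
        return None if image == "frame1" else f"face({image})"


class StubRecognizer:
    """Hypothetical recognizer; `predict` is an assumed method name."""

    def predict(self, face):
        return {"happy": 0.7, "sad": 0.3}


def analyze_video_frames(frames, face_detector, recognizer):
    """Return one row of emotion probabilities per frame with a detected face."""
    rows = []
    for index, frame in enumerate(frames):
        face = face_detector.detect_faces(frame)
        if face is None:
            continue  # assumption: frames without a face are skipped
        row = {"frame": index}
        row.update(recognizer.predict(face))
        rows.append(row)
    return rows


def analyze_video(video_path, processor, face_detector, recognizer):
    """Extract frames, then run the frame-wise analysis."""
    frames = processor.extract_frames(video_path)
    return analyze_video_frames(frames, face_detector, recognizer)
```

Passing the collaborators in explicitly mirrors the constructor signature above and makes each stage replaceable in tests.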
multimodal_fin.processing.multimodal.video.face_detector module
- class multimodal_fin.processing.multimodal.video.face_detector.FaceDetector(device='cpu')[source]
Bases: object
Detects and crops faces from input images or video frames using MTCNN.
- detect_faces(image)[source]
Detects a single face in the given PIL image.
- Parameters:
image (Image) – Input PIL image.
- Returns:
Cropped face image or None.
- Return type:
Optional[Image]
- device: str = 'cpu'
- mtcnn: MTCNN
multimodal_fin.processing.multimodal.video.processor module
- class multimodal_fin.processing.multimodal.video.processor.VideoProcessor(skips=0.1)[source]
Bases: object
Handles video processing tasks like frame extraction and frame sampling.
- extract_frames(video_path)[source]
Extracts frames from a video file. The implementation is currently a placeholder.
- Parameters:
video_path (str) – Path to the video file.
- Returns:
List of video frames (to be implemented).
- Return type:
list
- reduce_video_frames(video_data)[source]
Reduces the number of frames in a video by uniform sampling.
- Parameters:
video_data (list) – List of all video frames as numpy arrays.
- Returns:
Sampled list of frames.
- Return type:
list
- Raises:
ValueError – If skips is not within the valid range or results in zero frames.
- skips: float = 0.1
Proportion of frames to retain from the video (0 < skips <= 1).
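A minimal sketch of uniform sampling consistent with the description above. The function name `reduce_frames` and the exact index arithmetic are assumptions; the package's own `reduce_video_frames` may select frames differently.

```python
def reduce_frames(frames, skips=0.1):
    """Keep roughly a `skips` fraction of the frames, evenly spaced."""
    if not 0 < skips <= 1:
        raise ValueError("skips must satisfy 0 < skips <= 1")

    n_keep = int(len(frames) * skips)
    if n_keep == 0:
        raise ValueError("skips results in zero frames for this video")

    # Assumption: pick evenly spaced indices starting at the first frame.
    step = len(frames) / n_keep
    return [frames[int(i * step)] for i in range(n_keep)]
```

With 100 frames and `skips=0.1`, this keeps 10 frames at indices 0, 10, ..., 90. A video too short for the requested proportion raises `ValueError` rather than returning an empty list.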
multimodal_fin.processing.multimodal.video.video_emotion_analyzer module
- class multimodal_fin.processing.multimodal.video.video_emotion_analyzer.VideoEmotionAnalyzer(mode, skips=0.1, method='mode', device='cpu', emotieff_model='enet_b0_8_best_afew')[source]
Bases: object
High-level video emotion classification pipeline.
This class orchestrates the process of detecting faces, recognizing emotions per frame, and aggregating predictions to produce a single dominant emotion for the full video.
- analyze_video(video_path)[source]
Runs emotion recognition on a full video.
- Parameters:
video_path (str) – Path to the video file.
- Returns:
Predicted dominant emotion.
- Return type:
str
- classify_dataframe(df)[source]
Applies video emotion classification to each path in a DataFrame.
- Parameters:
df (
DataFrame) – Must contain a ‘Path’ column with video paths.- Returns:
Same DataFrame with an added ‘classification’ column.
- Return type:
DataFrame
- device: str = 'cpu'
Device to use (‘cuda’ or ‘cpu’).
- emotieff_model: str = 'enet_b0_8_best_afew'
Model name for EmotiEffRecognizer.
- get_aggregated_prediction(df)[source]
Aggregates frame-level predictions using the selected strategy.
- Parameters:
df (DataFrame) – DataFrame of frame-wise emotion probabilities.
- Returns:
Final predicted emotion.
- Return type:
str
- get_embeddings(video_path)[source]
Placeholder for extracting emotion embeddings from video.
- Parameters:
video_path (str) – Path to video file.
- Returns:
Emotion embedding (future implementation).
- Return type:
torch.Tensor
- method: str = 'mode'
Aggregation strategy (‘mode’, ‘mean’, ‘abs’).
- mode: str
Recognition model type (‘vit’, ‘fer’, ‘emotieff’).
- skips: float = 0.1
Fraction of frames to process.
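The three aggregation strategies named for `method` are not defined in this reference, so the sketch below shows one plausible reading of each. The interpretations are assumptions: 'mode' as the most frequent per-frame winner, 'mean' as the label with the highest average probability, and 'abs' as the label with the single highest probability in any frame. A list of dicts stands in for the frame-wise DataFrame to keep the example dependency-free.

```python
from collections import Counter


def aggregate(frame_probs, method="mode"):
    """Reduce per-frame {label: probability} dicts to one label."""
    if not frame_probs:
        raise ValueError("no frame predictions to aggregate")

    if method == "mode":
        # Most frequent top-scoring label across frames.
        winners = [max(p, key=p.get) for p in frame_probs]
        return Counter(winners).most_common(1)[0][0]

    if method == "mean":
        # Label with the highest probability averaged over frames.
        labels = frame_probs[0].keys()
        means = {
            label: sum(p[label] for p in frame_probs) / len(frame_probs)
            for label in labels
        }
        return max(means, key=means.get)

    if method == "abs":
        # Assumed meaning: label holding the single highest probability.
        best_frame = max(frame_probs, key=lambda p: max(p.values()))
        return max(best_frame, key=best_frame.get)

    raise ValueError(f"unknown aggregation method: {method!r}")
```

Under these readings the strategies can disagree on the same video: a label that wins most frames narrowly may lose to one with a higher average, or to one that peaks strongly in a single frame.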