multimodal_fin.utils package

Submodules

multimodal_fin.utils.cli module

CLI-related utility functions for validating user input in multimodal_fin.

This module provides validation logic for the CLI arguments used when embedding enriched JSON files.

multimodal_fin.utils.cli.validate_embed_inputs(json_path=None, json_csv=None)

Validate and resolve the JSON file paths to be embedded.

This function ensures that exactly one of the two options is provided: either a single JSON path or a CSV file containing multiple paths.

Parameters:
  • json_path (Optional[Path]) – Path to a single enriched transcript JSON file.

  • json_csv (Optional[Path]) – Path to a CSV containing a ‘Paths’ column.

Returns:

A list of file paths to be processed.

Return type:

List[str]

Raises:

typer.Exit – If both arguments or neither argument is provided, or if reading the CSV fails.
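The "exactly one of two options" rule can be sketched with the standard library alone. This is an illustration, not the package's implementation: the name resolve_embed_inputs is hypothetical, it raises ValueError where the documented function raises typer.Exit, and reading the ‘Paths’ column with csv.DictReader is an assumption about how the CSV is parsed.

```python
import csv
from pathlib import Path
from typing import List, Optional


def resolve_embed_inputs(
    json_path: Optional[Path] = None,
    json_csv: Optional[Path] = None,
) -> List[str]:
    """Hypothetical stand-in for validate_embed_inputs (ValueError, not typer.Exit)."""
    # Both set or both missing: the two "is None" checks agree, so reject.
    if (json_path is None) == (json_csv is None):
        raise ValueError("Provide exactly one of json_path or json_csv.")

    if json_path is not None:
        return [str(json_path)]

    # Assumed CSV handling: one path per row in a 'Paths' column.
    with open(json_csv, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or "Paths" not in reader.fieldnames:
            raise ValueError("CSV must contain a 'Paths' column.")
        return [row["Paths"] for row in reader]
```

Comparing the two "is None" results catches the both-provided and neither-provided cases in a single check.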

multimodal_fin.utils.files module

File and path utilities for the multimodal_fin package.

Includes reusable helpers for reading CSV and JSON files, and locating specific files within conference directories.

multimodal_fin.utils.files.find_audio_file(directory)

Locate the first audio file in a directory (supports mp3, wav, flac).

Parameters:

directory (Path) – Directory to search in.

Returns:

Full path to the first found audio file.

Return type:

Path

Raises:

FileNotFoundError – If no supported audio file is found.
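A minimal sketch of this kind of lookup, assuming a non-recursive search and a case-insensitive extension match. The name first_audio_file is hypothetical, and sorting the directory entries is an assumption made here so that "first" is deterministic; the documentation does not say how the real function orders candidates.

```python
from pathlib import Path

# Extensions listed in the description above.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def first_audio_file(directory: Path) -> Path:
    """Hypothetical sketch: return the first supported audio file in a directory."""
    # Sorted iteration is an assumption; it only makes the result reproducible.
    for candidate in sorted(directory.iterdir()):
        if candidate.is_file() and candidate.suffix.lower() in AUDIO_SUFFIXES:
            return candidate
    raise FileNotFoundError(f"No supported audio file found in {directory}")
```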

multimodal_fin.utils.files.find_level3_json(directory)

Locate a LEVEL_3.json file in a given directory.

Parameters:

directory (Path) – Directory to search in.

Returns:

Full path to the LEVEL_3.json file.

Return type:

Path

Raises:

FileNotFoundError – If the file is not found.

multimodal_fin.utils.files.make_processed_path(original)

Generate the processed output path from an original conference path.

If the original path contains a folder named ‘companies’, that folder is replaced with ‘processed_companies’. Otherwise, the function appends ‘_processed’ to the directory name under the same parent.

Parameters:

original (Path) – Original input directory path.

Returns:

Transformed path pointing to processed data.

Return type:

Path
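The two cases of the path transformation can be illustrated with pathlib. This is a sketch of the described behaviour, not the package's code: the name processed_path_sketch is hypothetical, and replacing only the first ‘companies’ component is an assumption, since the description does not cover paths that contain the folder name more than once.

```python
from pathlib import Path


def processed_path_sketch(original: Path) -> Path:
    """Hypothetical sketch of the transformation described for make_processed_path."""
    parts = list(original.parts)
    if "companies" in parts:
        # Assumption: only the first 'companies' component is swapped.
        parts[parts.index("companies")] = "processed_companies"
        return Path(*parts)
    # No 'companies' folder: suffix the last component, keeping the same parent.
    return original.with_name(original.name + "_processed")


# On a POSIX system:
print(processed_path_sketch(Path("data/companies/ACME/Q1")))  # data/processed_companies/ACME/Q1
print(processed_path_sketch(Path("data/calls/Q1")))           # data/calls/Q1_processed
```

Matching on whole path components, rather than on the path string, avoids rewriting a folder that merely contains the word, such as ‘old_companies’.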

multimodal_fin.utils.files.read_json_file(json_path)

Read a JSON file and return its parsed content.

Parameters:

json_path (Path) – Full path to a JSON file.

Returns:

Parsed JSON content (usually a dict or list).

Return type:

Any

Raises:
  • FileNotFoundError – If the file does not exist.

  • json.JSONDecodeError – If the file content is not valid JSON.
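Both documented exceptions arise naturally from the standard library, as the following sketch shows. The names load_json and load_json_or_default are hypothetical, and the UTF-8 encoding is an assumption; the wrapper illustrates one way a caller might handle the two failure modes separately.

```python
import json
from pathlib import Path
from typing import Any


def load_json(json_path: Path) -> Any:
    """Hypothetical sketch: parse a JSON file, letting both errors propagate."""
    # open() raises FileNotFoundError; json.load() raises json.JSONDecodeError.
    with open(json_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def load_json_or_default(json_path: Path, default: Any = None) -> Any:
    """Caller-side handling: fall back to a default on either documented error."""
    try:
        return load_json(json_path)
    except FileNotFoundError:
        print(f"Missing file: {json_path}")
    except json.JSONDecodeError as exc:
        print(f"Invalid JSON in {json_path}: {exc.msg} (line {exc.lineno})")
    return default
```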

multimodal_fin.utils.files.read_paths_csv(csv_path)

Read a CSV file with a ‘path’ column and return a list of valid paths.

Parameters:

csv_path (str) – Path to the CSV file containing a ‘path’ column.

Returns:

List of paths to directories or files.

Return type:

List[str]

Raises:

ValueError – If the ‘path’ column is missing from the CSV.
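A sketch of the column check using the standard csv module. The name read_path_column is hypothetical. The documentation does not say what makes a path "valid", so skipping blank entries here is an assumption, as is the use of csv.DictReader rather than another CSV reader.

```python
import csv
from typing import List


def read_path_column(csv_path: str) -> List[str]:
    """Hypothetical sketch: return the non-blank entries of a 'path' column."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # fieldnames is None for an empty file, so treat that as a missing column too.
        if reader.fieldnames is None or "path" not in reader.fieldnames:
            raise ValueError(f"Missing 'path' column in {csv_path}")
        paths = []
        for row in reader:
            value = (row.get("path") or "").strip()
            if value:  # assumption: blank cells are skipped
                paths.append(value)
        return paths
```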

multimodal_fin.utils.logging module

Logging utilities for the multimodal_fin package.

Provides a standardized logger with timestamps and log levels.

multimodal_fin.utils.logging.get_logger(name)

Create and configure a logger for a given module or component.

Ensures a consistent logging format and level across the package.

Parameters:

name (str) – Name of the logger (typically __name__).

Returns:

A configured logger instance.

Return type:

Logger
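A logger factory of this kind is commonly built on the standard logging module along the following lines. This is a sketch under assumptions: the name make_logger, the INFO default level, and the exact format string are illustrative and not taken from the package.

```python
import logging


def make_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Hypothetical sketch: a logger with timestamps and level names."""
    logger = logging.getLogger(name)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


log = make_logger(__name__)
log.info("Logger configured")
```

Because logging.getLogger returns the same object for the same name, the handler guard is what keeps messages from being emitted twice when the factory is called from several modules.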

multimodal_fin.utils.logging.log_ensemble_prediction(model_outputs, final_label, final_confidence, logger=None)

Log the ensemble classification results in a clean, readable format.

Parameters:
  • model_outputs (List[Tuple[str, str, float]]) – Tuples (model_name, predicted_label, confidence).

  • final_label (str) – Final combined prediction label.

  • final_confidence (float) – Final confidence (0 to 100).

  • logger (Optional[Logger]) – Logger to use. If None, uses default package logger.
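The shape of the arguments can be illustrated with a small sketch. The names format_ensemble_lines and log_ensemble_sketch are hypothetical, the output layout is invented for illustration, and treating the per-model confidences as percentages (like final_confidence) is an assumption; the real function's format is not documented here.

```python
import logging
from typing import List, Optional, Tuple


def format_ensemble_lines(
    model_outputs: List[Tuple[str, str, float]],
    final_label: str,
    final_confidence: float,
) -> List[str]:
    """Build one line per model plus a summary line (layout is illustrative)."""
    lines = [
        f"{name}: {label} ({confidence:.1f}%)"
        for name, label, confidence in model_outputs
    ]
    lines.append(f"Final: {final_label} ({final_confidence:.1f}%)")
    return lines


def log_ensemble_sketch(
    model_outputs: List[Tuple[str, str, float]],
    final_label: str,
    final_confidence: float,
    logger: Optional[logging.Logger] = None,
) -> None:
    """Hypothetical sketch: emit the formatted lines at INFO level."""
    if logger is None:
        # Assumption: fall back to a logger named after the package.
        logger = logging.getLogger("multimodal_fin")
    for line in format_ensemble_lines(model_outputs, final_label, final_confidence):
        logger.info(line)
```

Separating formatting from logging keeps the text layout easy to test without capturing log output.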

Module contents