multimodal_fin.processing package

Subpackages

Submodules

multimodal_fin.processing.basics module

class multimodal_fin.processing.basics.LLMClient(model, host='http://127.0.0.1:11500')[source]

Bases: object

Client wrapper for interacting with Ollama models via the chat API.

This class provides:
  • Automatic model name normalization.

  • Automatic model download if not available locally.

  • Configurable Ollama server host.

chat(messages, schema=None)[source]

Send a list of messages to the model and retrieve the response.

Parameters:
  • messages (List[dict]) – List of message dictionaries in Ollama format.

  • schema (Optional[str]) – JSON schema to enforce structured responses.

Returns:

The content string of the model’s response.

Return type:

str

host: str | None = 'http://127.0.0.1:11500'
model: str
class multimodal_fin.processing.basics.UncertaintyMixin[source]

Bases: object

Provides uncertainty estimation via majority voting.

get_result_and_uncertainty(predict_fn, text, n=5)[source]

Estimates category and confidence using majority voting.

Parameters:
  • predict_fn (Callable[[str], str]) – Prediction function to apply repeatedly.

  • text (str) – The input text to classify.

  • n (int) – Number of evaluations to perform.

Returns:

  • The most frequent predicted category.

  • Confidence score as percentage.

Return type:

Tuple[str, float]

multimodal_fin.processing.pipeline module

class multimodal_fin.processing.pipeline.ConferencePipeline(settings)[source]

Bases: object

Orchestrates the full processing pipeline for a financial conference folder:

Steps performed:
  1. Preprocessing of the transcript and section segmentation.

  2. Text classification and question-answer (Q&A) annotation.

  3. Multimodal embedding extraction (text, audio, video).

  4. Metadata enrichment using LLMs (topics, Q&A analysis, coherence).

  5. Result persistence in CSV and enriched JSON format.

run()[source]

Run the processing pipeline on each conference folder path defined in the input CSV.

Return type:

None

multimodal_fin.processing.processor module

class multimodal_fin.processing.processor.Processor(sec10k_model_names, qa_analyzer_models, audio_model_name=None, text_model_name=None, video_model_name=None, num_evaluations=5, device='cpu', verbose=1)[source]

Bases: object

Orchestrates the multimodal analysis pipeline in two main steps:
  1. Embedding extraction for audio, text, and video.

  2. Metadata enrichment (QA analysis, coherence, topics).

  3. JSON serialization of enriched output.

process_and_save(input_csv_path, original_dir, output_json_path)[source]

Executes the full multimodal pipeline and writes enriched results to a JSON file.

Parameters:
  • input_csv_path (str) – Path to classified interventions CSV.

  • original_dir (Path) – Directory containing LEVEL_3.json and audio/video files.

  • output_json_path (str) – Destination path for saving the final JSON.

Return type:

dict

Returns:

A dictionary containing the enriched multimodal results.

Module contents