multimodal_fin.processing package

Subpackages

Submodules

multimodal_fin.processing.basics module

class multimodal_fin.processing.basics.LLMClient(model, host='http://127.0.0.1:11500')[source]

Bases: object

Client wrapper for interacting with Ollama models via the chat API.

This class provides:

Automatic model name normalization.
Automatic model download if not available locally.
Configurable Ollama server host.

chat(messages, schema=None)[source]

Send a list of messages to the model and retrieve the response.

Parameters:

messages (List[dict]) – List of message dictionaries in Ollama format.
schema (Optional[str]) – JSON schema to enforce structured responses.

Returns:

The content string of the model’s response.

Return type:

str

host: str | None = 'http://127.0.0.1:11500'

model: str

class multimodal_fin.processing.basics.UncertaintyMixin[source]

Bases: object

Provides uncertainty estimation via majority voting.

get_result_and_uncertainty(predict_fn, text, n=5)[source]

Estimates category and confidence using majority voting.

Parameters:

predict_fn (Callable[[str], str]) – Prediction function to apply repeatedly.
text (str) – The input text to classify.
n (int) – Number of evaluations to perform.

Returns:

The most frequent predicted category.
Confidence score as percentage.

Return type:

Tuple[str, float]

multimodal_fin.processing.pipeline module

class multimodal_fin.processing.pipeline.ConferencePipeline(settings)[source]

Bases: object

Orchestrates the full processing pipeline for a financial conference folder:

Steps performed:

Preprocessing of the transcript and section segmentation.
Text classification and question-answer (Q&A) annotation.
Multimodal embedding extraction (text, audio, video).
Metadata enrichment using LLMs (topics, Q&A analysis, coherence).
Result persistence in CSV and enriched JSON format.

run()[source]

Run the processing pipeline on each conference folder path defined in the input CSV.

Return type:: None

multimodal_fin.processing.processor module

class multimodal_fin.processing.processor.Processor(sec10k_model_names, qa_analyzer_models, audio_model_name=None, text_model_name=None, video_model_name=None, num_evaluations=5, device='cpu', verbose=1)[source]

Bases: object

Orchestrates the multimodal analysis pipeline in two main steps:

Embedding extraction for audio, text, and video.
Metadata enrichment (QA analysis, coherence, topics).
JSON serialization of enriched output.

process_and_save(input_csv_path, original_dir, output_json_path)[source]

Executes the full multimodal pipeline and writes enriched results to a JSON file.

Parameters:

input_csv_path (str) – Path to classified interventions CSV.
original_dir (Path) – Directory containing LEVEL_3.json and audio/video files.
output_json_path (str) – Destination path for saving the final JSON.

Return type:

dict

Returns:

A dictionary containing the enriched multimodal results.

multimodal_fin.processing package

Subpackages

Submodules

multimodal_fin.processing.basics module

multimodal_fin.processing.pipeline module

multimodal_fin.processing.processor module

Module contents