multimodal_fin.processing package
Subpackages
- multimodal_fin.processing.metadata package
- Submodules
- multimodal_fin.processing.metadata.coherence_analyzer module
- multimodal_fin.processing.metadata.metadata_enricher module
- multimodal_fin.processing.metadata.prompt_builder module
- multimodal_fin.processing.metadata.qa_analyzer module
- multimodal_fin.processing.metadata.sec10k_analyzer module
- Module contents
- multimodal_fin.processing.multimodal package
- Subpackages
- Submodules
- multimodal_fin.processing.multimodal.embeddings_extractor module
- multimodal_fin.processing.multimodal.multimodal_embeddings module
MultimodalEmbeddingsMultimodalEmbeddings.audio_emotion_analyzerMultimodalEmbeddings.audio_file_pathMultimodalEmbeddings.cortar_audio_temporal()MultimodalEmbeddings.cortar_video_temporal()MultimodalEmbeddings.generar_embeddings()MultimodalEmbeddings.path_csvMultimodalEmbeddings.path_jsonMultimodalEmbeddings.text_emotion_analyzerMultimodalEmbeddings.video_emmotion_analyzer
dummy_npwarn_decorator_factory()
- Module contents
- multimodal_fin.processing.preprocessing package
- Submodules
- multimodal_fin.processing.preprocessing.ensemble_classifier module
EnsembleInterventionClassifierEnsembleInterventionClassifier.NUM_EVALUATIONSEnsembleInterventionClassifier.annotate_question_answer_pairs()EnsembleInterventionClassifier.classify_dataframe()EnsembleInterventionClassifier.ensemble_predict()EnsembleInterventionClassifier.monologue_model_namesEnsembleInterventionClassifier.qa_model_namesEnsembleInterventionClassifier.verbose
- multimodal_fin.processing.preprocessing.monologue_classifier module
- multimodal_fin.processing.preprocessing.preprocessor module
PreprocessorPreprocessor.divide_conference()Preprocessor.extract_qna_intro()Preprocessor.monologue_model_namesPreprocessor.num_evaluationsPreprocessor.process()Preprocessor.process_and_save()Preprocessor.qa_model_namesPreprocessor.qna_keyPreprocessor.section_colPreprocessor.text_colPreprocessor.verbose
- multimodal_fin.processing.preprocessing.qa_classifier module
- multimodal_fin.processing.preprocessing.transcript_preprocessor module
- Module contents
Submodules
multimodal_fin.processing.basics module
- class multimodal_fin.processing.basics.LLMClient(model, host='http://127.0.0.1:11500')[source]
Bases:
objectClient wrapper for interacting with Ollama models via the chat API.
- This class provides:
Automatic model name normalization.
Automatic model download if not available locally.
Configurable Ollama server host.
- chat(messages, schema=None)[source]
Send a list of messages to the model and retrieve the response.
- Parameters:
messages (
List[dict]) – List of message dictionaries in Ollama format.schema (
Optional[str]) – JSON schema to enforce structured responses.
- Returns:
The content string of the model’s response.
- Return type:
str
- host: str | None = 'http://127.0.0.1:11500'
- model: str
- class multimodal_fin.processing.basics.UncertaintyMixin[source]
Bases:
objectProvides uncertainty estimation via majority voting.
- get_result_and_uncertainty(predict_fn, text, n=5)[source]
Estimates category and confidence using majority voting.
- Parameters:
predict_fn (
Callable[[str],str]) – Prediction function to apply repeatedly.text (
str) – The input text to classify.n (
int) – Number of evaluations to perform.
- Returns:
The most frequent predicted category.
Confidence score as percentage.
- Return type:
Tuple[str,float]
multimodal_fin.processing.pipeline module
- class multimodal_fin.processing.pipeline.ConferencePipeline(settings)[source]
Bases:
objectOrchestrates the full processing pipeline for a financial conference folder:
- Steps performed:
Preprocessing of the transcript and section segmentation.
Text classification and question-answer (Q&A) annotation.
Multimodal embedding extraction (text, audio, video).
Metadata enrichment using LLMs (topics, Q&A analysis, coherence).
Result persistence in CSV and enriched JSON format.
multimodal_fin.processing.processor module
- class multimodal_fin.processing.processor.Processor(sec10k_model_names, qa_analyzer_models, audio_model_name=None, text_model_name=None, video_model_name=None, num_evaluations=5, device='cpu', verbose=1)[source]
Bases:
object- Orchestrates the multimodal analysis pipeline in two main steps:
Embedding extraction for audio, text, and video.
Metadata enrichment (QA analysis, coherence, topics).
JSON serialization of enriched output.
- process_and_save(input_csv_path, original_dir, output_json_path)[source]
Executes the full multimodal pipeline and writes enriched results to a JSON file.
- Parameters:
input_csv_path (
str) – Path to classified interventions CSV.original_dir (
Path) – Directory containing LEVEL_3.json and audio/video files.output_json_path (
str) – Destination path for saving the final JSON.
- Return type:
dict- Returns:
A dictionary containing the enriched multimodal results.