System Overview

What the project does

An interactive Q&A system that lets users upload PDF, DOCX, or TXT documents and ask natural-language questions; the app extracts the text, creates semantic embeddings, stores them in a vector database, and generates answers using a locally run Mistral LLM via Ollama.

Key features

- Multi-format document upload and text extraction (PDF, Word, plain text)
- Semantic embeddings with Sentence-Transformers (`all-mpnet-base-v2`)
- Vector storage & similarity search using ChromaDB
- AI-generated answers via Mistral LLM (Ollama)
- FastAPI backend with `/query` endpoint and health check
- Streamlit web UI for easy interaction
- Modular utilities for extraction, embedding, and querying

Tech stack

- Frontend: Streamlit
- Backend: FastAPI (Uvicorn)
- LLM: Mistral (hosted locally through Ollama)
- Vector DB: ChromaDB
- Embeddings: Sentence-Transformers (`all-mpnet-base-v2`)
- Document parsing: PyMuPDF, python-docx
- Core libraries: LangChain, HuggingFace, Requests

Use case

Enables businesses, researchers, or anyone with unstructured documents to quickly build a searchable knowledge base and chatbot that can answer domain-specific questions without relying on external APIs.
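The extract → embed → store → retrieve flow described above can be sketched end to end in a few lines. This is a toy illustration, not the project's actual code: the bag-of-words `embed` stands in for Sentence-Transformers dense vectors, and the `VectorStore` class stands in for ChromaDB; all names here are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real project uses
    # Sentence-Transformers (all-mpnet-base-v2) dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity, the same measure ChromaDB can use for ranking.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Stand-in for ChromaDB: stores (embedding, chunk) pairs."""
    def __init__(self):
        self._items = []

    def add(self, chunk: str):
        self._items.append((embed(chunk), chunk))

    def query(self, question: str, k: int = 2):
        q = embed(question)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

store = VectorStore()
for chunk in ["ChromaDB stores vectors.", "Mistral generates answers.", "Streamlit renders the UI."]:
    store.add(chunk)

context = store.query("Which component stores vectors?", k=1)
# In the real system, the retrieved context is then passed to Mistral via Ollama.
```

In the actual stack the retrieved chunks become the grounding context for the LLM; only the embedding model and storage backend differ from this sketch.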
Architecture Details
The system is organized into two layers that together form the question-answering flow:

Backend Infrastructure

FastAPI (served by Uvicorn) handles document uploads and the `/query` endpoint, routing extracted text into ChromaDB and incoming questions to the retrieval layer.

AI / Logic Core

Sentence-Transformers produces the embeddings, ChromaDB returns the most similar chunks for a question, and the locally hosted Mistral model (via Ollama) generates the final answer from that retrieved context.
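Concretely, the AI / Logic Core is retrieval-augmented generation: retrieved chunks are stitched into a prompt and sent to the locally hosted Mistral model through Ollama's HTTP API. A minimal sketch, assuming a default Ollama server on port 11434 with the `mistral` model pulled; the prompt template and function names are illustrative, not the project's actual code:

```python
import json
import urllib.request

def build_prompt(question: str, chunks: list) -> str:
    # Ground the model in the retrieved context so answers stay
    # domain-specific rather than free-form.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_mistral(question: str, chunks: list, host: str = "http://localhost:11434") -> str:
    # Ollama's non-streaming generate endpoint; requires a running
    # Ollama server (`ollama pull mistral` beforehand).
    payload = json.dumps({
        "model": "mistral",
        "prompt": build_prompt(question, chunks),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the model behind a local HTTP endpoint is what lets the whole pipeline run without external APIs, as the use case above emphasizes.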
Key Capabilities
- ▹ End-to-end query workflow: upload, extract, embed, retrieve, answer
- ▹ Per-format text transformation and routing (PyMuPDF for PDF, python-docx for Word) into a shared embedding pipeline
- ▹ Extensible architecture: modular extraction, embedding, and querying utilities that can be swapped independently
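The transformation-and-routing capability amounts to dispatching each upload to the right extractor by file extension. A hedged sketch: the registry pattern and function names are illustrative, not the project's actual utilities, and the PDF and DOCX branches assume PyMuPDF (`fitz`) and python-docx are installed.

```python
from pathlib import Path

EXTRACTORS = {}

def extractor(*exts):
    # Registry decorator: a new format plugs in without touching
    # callers, which is what keeps the architecture extensible.
    def register(fn):
        for ext in exts:
            EXTRACTORS[ext] = fn
        return fn
    return register

@extractor(".txt")
def extract_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8")

@extractor(".pdf")
def extract_pdf(path: Path) -> str:
    import fitz  # PyMuPDF; assumed installed
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)

@extractor(".docx")
def extract_docx(path: Path) -> str:
    import docx  # python-docx; assumed installed
    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)

def extract(path: str) -> str:
    # Route by extension; unknown types fail loudly before ingestion.
    p = Path(path)
    try:
        return EXTRACTORS[p.suffix.lower()](p)
    except KeyError:
        raise ValueError(f"Unsupported file type: {p.suffix}") from None
```

Whatever the extractor, the output is plain text, so the downstream embedding and storage steps never need to know which format the document arrived in.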