Python

AI-Powered-Document-Search-and-Chatbot-Development

System Overview

What the project does

An interactive Q&A system that lets users upload PDF, DOCX, or TXT documents and ask natural‑language questions about them. The app extracts the text, creates semantic embeddings, stores them in a vector database, and generates answers with a Mistral LLM run locally via Ollama.
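
The retrieval step in that flow can be illustrated with a minimal sketch. It is not the project's code: the project uses Sentence‑Transformers and ChromaDB, while this example substitutes hand-written toy vectors and a plain cosine-similarity ranking so it runs with the standard library alone. The names `cosine_similarity`, `top_k`, and the sample chunks are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: in the real system these vectors would come from an embedding
# model and live in a vector database, not in a Python list.
index = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stand-in for the embedded user question

print(top_k(query, index, k=2))
```

The two highest-ranked chunks would then be passed to the LLM as context for the answer.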

Key features

  • Multi‑format document upload and text extraction (PDF, Word, plain text)
  • Semantic embeddings with Sentence‑Transformers (`all-mpnet-base-v2`)
  • Vector storage & similarity search using ChromaDB
  • AI‑generated answers via Mistral LLM (Ollama)
  • FastAPI backend with `/query` endpoint and health check
  • Streamlit web UI for easy interaction
  • Modular utilities for extraction, embedding, and querying
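
As a rough illustration of the answer-generation feature, the sketch below builds a prompt from retrieved chunks and sends it to a local Ollama server. It assumes Ollama's default endpoint (`http://localhost:11434/api/generate`) and a pulled `mistral` model. The prompt wording and the function names `build_payload` and `ask_ollama` are hypothetical, and `urllib` is used here only to keep the example dependency-free (the project lists Requests).

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(question, chunks, model="mistral"):
    """Combine retrieved chunks and the question into a non-streaming generate request."""
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(question, chunks, url=OLLAMA_URL, timeout=120):
    """POST the payload to Ollama and return the generated answer text."""
    data = json.dumps(build_payload(question, chunks)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read())["response"]
```

Calling `ask_ollama("When are invoices due?", retrieved_chunks)` would require the Ollama server to be running; `build_payload` can be inspected on its own without one.
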
Tech stack

  • Frontend: Streamlit
  • Backend: FastAPI (Uvicorn)
  • LLM: Mistral (hosted locally through Ollama)
  • Vector DB: ChromaDB
  • Embeddings: Sentence‑Transformers (`all-mpnet-base-v2`)
  • Document parsing: PyMuPDF, python-docx
  • Core libraries: LangChain, HuggingFace, Requests
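
To show how the document-parsing libraries in this stack might fit together, here is a minimal sketch of an extraction helper that dispatches on file extension. The function name `extract_text` and its structure are assumptions, not the project's actual utility. The PDF and DOCX branches import PyMuPDF (`fitz`) and python-docx lazily, so plain-text files work even when those packages are not installed.

```python
from pathlib import Path


def extract_text(path):
    """Return the text content of a .txt, .pdf, or .docx file."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        import fitz  # PyMuPDF

        with fitz.open(str(path)) as doc:
            # Concatenate the plain text of every page.
            return "\n".join(page.get_text() for page in doc)

    if suffix == ".docx":
        import docx  # python-docx

        document = docx.Document(str(path))
        # Body paragraphs only; tables and headers are not included here.
        return "\n".join(p.text for p in document.paragraphs)

    raise ValueError(f"Unsupported file type: {suffix}")
```

The extracted string would then be split and embedded before being stored in the vector database.
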
Use case

Enables businesses, researchers, or anyone with unstructured documents to quickly build a searchable knowledge base and chatbot that can answer domain‑specific questions without relying on external APIs.

Architecture Details

The system chains document extraction, embedding, vector search, and LLM answer generation behind a FastAPI backend and a Streamlit UI.

Backend Infrastructure

FastAPI, served by Uvicorn, exposes the `/query` endpoint and a health check.

AI / Logic Core

Sentence‑Transformers embeddings and ChromaDB similarity search find the relevant document text, and the locally run Mistral LLM generates the answer.

Tech Stack

Python, Integration, Automation, APIs

Key Capabilities

  • Custom workflow execution
  • Data transformation and routing
  • Extensible architecture