
Written by Ashnik Team

Mar 13, 2025

3 min read

Transforming Presentations and Images into Smart Q&A Systems with Python and AI

Introduction

Have you ever been stuck searching for that one critical slide buried deep inside a hundred PowerPoint presentations or images of whiteboard notes? Imagine a system where you can ask a question and instantly get the right answer, extracted from your presentations and images.

In this article, I’ll walk you through how I built an AI-powered Question Answering (QA) and Document Retrieval system using Python, Transformers, Sentence Transformers, and Elasticsearch. This solution extracts knowledge from PowerPoint files and images, embeds them into a searchable database, and allows you to query it intelligently using extractive and generative AI pipelines.

The Problem We’re Solving

Organizations generate a huge volume of documents—presentations, whiteboard snapshots, and scanned notes. Retrieving valuable information from these scattered resources is often tedious and inefficient.

Solution? Build an intelligent pipeline that:

  • Extracts text from PowerPoint and image files.
  • Embeds and indexes this data in Elasticsearch for fast retrieval.
  • Leverages Transformer models for intelligent Q&A and summarization.

Tools & Technologies

  • python-pptx: Extract text from PowerPoint files.
  • pytesseract & PIL: Extract text from images.
  • transformers (Hugging Face): Perform Question Answering and Summarization.
  • sentence-transformers: Generate sentence embeddings for semantic search.
  • Elasticsearch: Store documents and enable semantic retrieval.
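All of these are available from PyPI (package names below assume the current releases; note that pytesseract additionally requires the Tesseract OCR binary to be installed on the system, and Elasticsearch must be running separately):

```shell
# Install the Python dependencies used throughout this article
pip install python-pptx pytesseract Pillow transformers sentence-transformers elasticsearch
```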

Step-by-Step Implementation

  1. Extract Text from PowerPoint and Images
    First, we collect data from PowerPoint presentations and image files:

    from pptx import Presentation
    import pytesseract
    from PIL import Image

    def extract_text_from_ppt(ppt_path):
        prs = Presentation(ppt_path)
        text = ""
        for slide in prs.slides:
            for shape in slide.shapes:
                if shape.has_text_frame:
                    text += shape.text + " "
        return text

    def extract_text_from_image(image_path):
        img = Image.open(image_path)
        text = pytesseract.image_to_string(img)
        return text

    Why?

    Business presentations and scanned documents often contain critical information that’s hard to search manually.

  2. Combine Extracted Content into a Document
    We loop through all presentations and images to create one consolidated document.

    def create_document(ppt_paths, image_paths):
        document_text = ""
        for ppt_path in ppt_paths:
            document_text += extract_text_from_ppt(ppt_path) + " "
        for image_path in image_paths:
            document_text += extract_text_from_image(image_path) + " "
        return document_text
  3. Semantic Indexing with Sentence Embeddings
    Once we have the combined text, we chunk it and create embeddings for semantic similarity search.

    from sentence_transformers import SentenceTransformer
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])
    embedding_model = SentenceTransformer('all-mpnet-base-v2')

    def index_document(document_text, index_name="document_index1"):
        chunks = document_text.split(".")  # Simple chunking strategy
        for i, chunk in enumerate(chunks):
            embedding = embedding_model.encode(chunk)
            doc = {
                "text": chunk,
                "embedding": embedding.tolist()
            }
            es.index(index=index_name, id=i, document=doc)

    Why Elasticsearch?

    Elasticsearch makes it easy to search and score documents based on semantic similarity, which we calculate with cosine similarity.
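For the `cosineSimilarity` script used in the next step to work, the index needs an explicit mapping with a `dense_vector` field. The article doesn't show this step, so here is a minimal sketch, assuming the index name used throughout and the 768-dimensional output of `all-mpnet-base-v2`:

```python
# Sketch: mapping with a dense_vector field so Elasticsearch's
# script_score cosineSimilarity can run over the stored embeddings.
# 'all-mpnet-base-v2' produces 768-dimensional vectors.
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 768},
        }
    }
}

# With the `es` client created earlier (assumed running locally):
# es.indices.create(index="document_index1", body=mapping)
```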

  4. Ask Questions and Get Answers!
    Now that our data is indexed semantically, we can build a QA pipeline that:

    • Finds the most relevant chunks from Elasticsearch.
    • Uses a QA model to extract precise answers.
    from transformers import pipeline

    qa_pipeline = pipeline("question-answering", model="deepset/roberta-base-squad2")

    def answer_question(question, index_name="document_index1"):
        question_embedding = embedding_model.encode(question)
        search_body = {
            "size": 2,
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                        "params": {"query_vector": question_embedding.tolist()}
                    }
                }
            }
        }
        response = es.search(index=index_name, body=search_body)
        retrieved_chunks = [hit["_source"]["text"] for hit in response["hits"]["hits"]]
        context = " ".join(retrieved_chunks)
        result = qa_pipeline(question=question, context=context)
        return result['answer'], context

    Example:

    question = "Who is Sachin Dabir?"
    answer, _ = answer_question(question)
    print(answer)
  5. Extractive + Generative Summarization
    After finding extractive answers, we generate summarized insights using a summarization model.

    summarization_model = pipeline("summarization", model="facebook/bart-large-cnn")

    # Reuse the extractive answer and its retrieved context from answer_question()
    answer, context = answer_question("Who is Sachin Dabir?")
    summary = summarization_model(
        answer + ". " + context,
        max_length=100,
        min_length=30
    )

    print(summary[0]['summary_text'])

Learnings from the Project

  1. End-to-End NLP Pipeline

    This project demonstrated how to build an end-to-end pipeline combining OCR, semantic search, and question-answering.

  2. Combining Extractive & Generative AI

    Combining extractive models for retrieval with generative models for summarization provides a richer user experience and deeper insights.

  3. Scalable Search with Elasticsearch

    Using sentence embeddings and Elasticsearch, we can scale semantic search for vast amounts of documents in a production-ready environment.

  4. Document AI for Enterprise

    Handling structured (PPT) and unstructured (images) data sources is crucial for real-world Document AI use cases.

Industrial Applications

  1. Enterprise Knowledge Base

    Automatically create a searchable knowledge base from presentations, meeting notes, whiteboard snapshots, and scanned PDFs.

  2. Document Intelligence & Compliance

    Extract critical information from business documents to ensure regulatory compliance or assist audit teams with instant Q&A systems.

  3. Customer Support & Helpdesk Automation

    Enable support agents or chatbots to find accurate answers instantly from technical documents, user manuals, and FAQs.

  4. Competitive Intelligence & Research

    Analyze competitor reports, product decks, and whitepapers, enabling faster decision-making based on extracted insights.

Future Improvements

  • Advanced Chunking: Improve text chunking strategies with sentence boundary detection.
  • Fine-tuned QA Models: Train on domain-specific data for higher accuracy.
  • Multi-modal Retrieval: Integrate audio, video, and PDF processing for richer search capabilities.
  • UI/UX Layer: Build a frontend dashboard for users to interact with the system easily.

Conclusion

This project highlights the potential of combining NLP, OCR, and Search technologies to unlock hidden knowledge in enterprise documents. Whether you’re dealing with corporate presentations, scanned images, or any unstructured data, this pipeline offers a scalable, intelligent solution.

If you’re interested in Document AI and Enterprise Knowledge Automation, this solution is a great starting point!

