

Introduction
Have you ever been stuck searching for that one critical slide buried deep inside a hundred PowerPoint presentations or images of whiteboard notes? Imagine a system where you can ask a question and instantly get the right answer, extracted from your presentations and images.
In this article, I’ll walk you through how I built an AI-powered Question Answering (QA) and Document Retrieval system using Python, Transformers, Sentence Transformers, and Elasticsearch. This solution extracts knowledge from PowerPoint files and images, embeds them into a searchable database, and allows you to query it intelligently using extractive and generative AI pipelines.
The Problem We’re Solving
Organizations generate a huge volume of documents—presentations, whiteboard snapshots, and scanned notes. Retrieving valuable information from these scattered resources is often tedious and inefficient.
Solution? Build an intelligent pipeline that:
- Extracts text from PowerPoint and image files.
- Embeds and indexes this data in Elasticsearch for fast retrieval.
- Leverages Transformer models for intelligent Q&A and summarization.
Tools & Technologies
- python-pptx: Extract text from PowerPoint files.
- pytesseract & PIL: Extract text from images.
- transformers (HuggingFace): Perform Question Answering and Summarization.
- sentence-transformers: Generate sentence embeddings for semantic search.
- Elasticsearch: Store documents and enable semantic retrieval.
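All of the above can be installed from PyPI; note that pytesseract is only a wrapper, so the Tesseract OCR engine itself also needs to be on your system (the commands below assume a Debian/Ubuntu machine):

```shell
# Python packages
pip install python-pptx pytesseract Pillow transformers sentence-transformers elasticsearch

# Tesseract OCR engine (use your platform's package manager otherwise)
sudo apt-get install tesseract-ocr
```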
Step-by-Step Implementation
- Extract Text from PowerPoint and Images
First, we collect data from PowerPoint presentations and image files:

```python
from pptx import Presentation
import pytesseract
from PIL import Image

def extract_text_from_ppt(ppt_path):
    """Pull the text out of every shape on every slide."""
    prs = Presentation(ppt_path)
    text = ""
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                text += shape.text + " "
    return text

def extract_text_from_image(image_path):
    """Run Tesseract OCR on an image and return the recognized text."""
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)
    return text
```

Why?
Business presentations and scanned documents often contain critical information that’s hard to search manually.
- Combine Extracted Content into a Document
We loop through all presentations and images to create one consolidated document.

```python
def create_document(ppt_paths, image_paths):
    """Concatenate the text extracted from every PPT and image."""
    document_text = ""
    for ppt_path in ppt_paths:
        document_text += extract_text_from_ppt(ppt_path) + " "
    for image_path in image_paths:
        document_text += extract_text_from_image(image_path) + " "
    return document_text
```

- Semantic Indexing with Sentence Embeddings
Once we have the combined text, we chunk it and create embeddings for semantic similarity search.

```python
from sentence_transformers import SentenceTransformer
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
embedding_model = SentenceTransformer('all-mpnet-base-v2')

def index_document(document_text, index_name="document_index1"):
    chunks = document_text.split(".")  # Simple chunking strategy
    for i, chunk in enumerate(chunks):
        # Encode each chunk into a dense vector and store it alongside the raw text
        embedding = embedding_model.encode(chunk)
        doc = {
            "text": chunk,
            "embedding": embedding.tolist()
        }
        es.index(index=index_name, id=i, document=doc)
```

Why Elasticsearch?
Elasticsearch makes it easy to search and score documents based on semantic similarity, which we calculate with cosine similarity.
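One detail worth calling out: for the `cosineSimilarity` script query to work, the `embedding` field must be mapped as a `dense_vector` before any documents are indexed. A minimal mapping sketch (the index name matches the code in this article; the 768 dimension comes from all-mpnet-base-v2) might look like:

```python
# all-mpnet-base-v2 produces 768-dimensional embeddings, so the
# 'embedding' field is mapped as a 768-dim dense_vector
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 768},
        }
    }
}

# Run once before calling index_document (es is the client created earlier):
# es.indices.create(index="document_index1", body=mapping)
```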
- Ask Questions and Get Answers!
Now that our data is indexed semantically, we can build a QA pipeline that:
- Finds the most relevant chunks from Elasticsearch.
- Uses a QA model to extract precise answers.

```python
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer_question(question, index_name="document_index1"):
    question_embedding = embedding_model.encode(question)
    search_body = {
        "size": 2,  # Retrieve the two most similar chunks
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, as script_score requires
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": question_embedding.tolist()}
                }
            }
        }
    }
    response = es.search(index=index_name, body=search_body)
    retrieved_chunks = [hit["_source"]["text"] for hit in response["hits"]["hits"]]
    context = " ".join(retrieved_chunks)
    result = qa_pipeline(question=question, context=context)
    return result['answer'], context
```

Example:

```python
question = "Who is Sachin Dabir?"
answer, _ = answer_question(question)
print(answer)
```

- Extractive + Generative Summarization
After finding extractive answers, we generate summarized insights using a summarization model.

```python
summarization_model = pipeline("summarization", model="facebook/bart-large-cnn")

# extractive_result and relevant_context_from_document are the answer and
# context returned by answer_question above
summary = summarization_model(
    extractive_result + ". " + relevant_context_from_document,
    max_length=100,
    min_length=30
)
print(summary[0]['summary_text'])
```
Learnings from the Project
- End-to-End NLP Pipeline
This project demonstrated how to build an end-to-end pipeline combining OCR, semantic search, and question-answering.
- Combining Extractive & Generative AI
Combining extractive models for retrieval with generative models for summarization provides a richer user experience and deeper insights.
- Scalable Search with Elasticsearch
Using sentence embeddings and Elasticsearch, we can scale semantic search across vast document collections in a production environment.
- Document AI for Enterprise
Handling structured (PPT) and unstructured (images) data sources is crucial for real-world Document AI use cases.
Industrial Applications
- Enterprise Knowledge Base
Automatically create a searchable knowledge base from presentations, meeting notes, whiteboard snapshots, and scanned PDFs.
- Document Intelligence & Compliance
Extract critical information from business documents to ensure regulatory compliance or assist audit teams with instant Q&A systems.
- Customer Support & Helpdesk Automation
Enable support agents or chatbots to find accurate answers instantly from technical documents, user manuals, and FAQs.
- Competitive Intelligence & Research
Analyze competitor reports, product decks, and whitepapers, enabling faster decision-making based on extracted insights.
Future Improvements
- Advanced Chunking: Improve text chunking strategies with sentence boundary detection.
- Fine-tuned QA Models: Train on domain-specific data for higher accuracy.
- Multi-modal Retrieval: Integrate audio, video, and PDF processing for richer search capabilities.
- UI/UX Layer: Build a frontend dashboard for users to interact with the system easily.
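As a taste of the first improvement, here is a minimal sketch of sentence-boundary chunking that could replace the naive `split(".")` used earlier. It uses a regex as a stand-in for a proper sentence tokenizer, and the function name and `max_chars` parameter are illustrative, not part of the original project:

```python
import re

def chunk_by_sentences(text, max_chars=500):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Unlike period-splitting, this never cuts a sentence in half, which keeps each embedded chunk semantically coherent.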
Conclusion
This project highlights the potential of combining NLP, OCR, and Search technologies to unlock hidden knowledge in enterprise documents. Whether you’re dealing with corporate presentations, scanned images, or any unstructured data, this pipeline offers a scalable, intelligent solution.
If you’re interested in Document AI and Enterprise Knowledge Automation, this solution is a great starting point!