Learn - Share - Innovate - Elevate

DataOps Newsletter Volume 15 - March 2025

DataOps Labs

Mar 15, 2025

Join Amazing In-Person Sessions

1) Be part of AWS Community Day Bengaluru:

https://acd.awsugblr.in/

Online Events:

Deep Dive: Next-Gen Amazon SageMaker - Unified Platform for Data, Analytics, AI - Monday, March 24, 2025 - 9:00 PM to 9:45 PM IST

The AWS Community Day Bengaluru 2025 Blogathon is an exciting competition designed to encourage AWS enthusiasts, developers, architects, and students to showcase their expertise by writing insightful blog articles. The Blogathon aims to highlight real-world AWS implementations, innovative solutions, and measurable impacts across various domains, including DevOps, FinOps, Big Data, Security, and more.

Topic: Hands-On: Build LangGraph Apps & Run LLAMA & DeepSeek R1 Locally with Ollama

Innovate

Hybrid RAG: Combining Knowledge Graphs and Vector Retrieval for Enhanced Information Extraction

This whitepaper introduces HybridRAG, a novel approach to improve information extraction from unstructured text data, particularly in the finance domain. The authors, from BlackRock and NVIDIA, address the limitations of traditional Retrieval-Augmented Generation (RAG) techniques, especially when dealing with domain-specific terminology and complex document formats common in financial documents like earnings call transcripts.

Problem Statement

Large Language Models (LLMs) struggle with extracting intricate information from unstructured financial text data, even with existing RAG (VectorRAG) techniques.
Financial documents contain domain-specific language, varied data formats, and unique contextual relationships that general-purpose LLMs find difficult to handle.
Traditional VectorRAG methods, which rely on paragraph-level chunking, can miss critical contextual information due to the hierarchical structure of financial statements.

Proposed Solution: HybridRAG

The paper proposes HybridRAG, a combination of two RAG approaches:

VectorRAG: Traditional RAG that uses vector databases for information retrieval based on semantic similarity.
GraphRAG: RAG that leverages Knowledge Graphs (KGs) to represent entities and relationships within the financial documents.

HybridRAG aims to leverage the strengths of both methods to achieve more accurate and contextually relevant answers to questions about financial documents.
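
A minimal sketch of the idea, assuming the simplest possible combination: the context returned by each retriever is concatenated into one prompt before it reaches the LLM. The function names (build_hybrid_prompt, fake_vector_retrieve, fake_graph_retrieve), the company name, and the sample strings are hypothetical and exist only for illustration; they are not taken from the paper.

```python
from typing import Callable, List


def build_hybrid_prompt(
    question: str,
    vector_retrieve: Callable[[str], List[str]],
    graph_retrieve: Callable[[str], List[str]],
) -> str:
    """Merge context from both retrievers into one prompt (illustrative only)."""
    vector_context = vector_retrieve(question)  # text chunks ranked by similarity
    graph_context = graph_retrieve(question)    # facts read from a knowledge graph
    sections = [
        "Context from vector retrieval:",
        *vector_context,
        "Context from knowledge graph:",
        *graph_context,
        f"Question: {question}",
    ]
    return "\n".join(sections)


# Hypothetical stand-ins for real retrievers (assumptions, not a real API).
def fake_vector_retrieve(question: str) -> List[str]:
    return ["Revenue grew year over year, driven by the retail segment."]


def fake_graph_retrieve(question: str) -> List[str]:
    return ["(ExampleCorp) -[REPORTED]-> (revenue growth)"]


prompt = build_hybrid_prompt(
    "What drove revenue growth?", fake_vector_retrieve, fake_graph_retrieve
)
print(prompt)
```

Passing the retrievers in as plain functions keeps the sketch independent of any particular vector database or graph store, so either side can be swapped out without touching the prompt assembly.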

Methodology

The paper outlines the following key steps in their HybridRAG approach:

VectorRAG Implementation: Standard RAG process involving chunking documents, creating embeddings, storing them in a vector database, and retrieving relevant chunks based on query similarity. They explicitly incorporate document metadata to improve performance.
Knowledge Graph Construction: A detailed process for building KGs from unstructured text as well as semi-structured and structured datasets with relationships, involving:
- Knowledge Extraction: Identifying entities and relationships from text using NLP techniques, including entity recognition, relationship extraction, and coreference resolution. A two-tiered LLM chain is used for content refinement and information extraction.
- Knowledge Improvement: Enhancing the quality of the KG by KG completion and fusion.
GraphRAG Implementation: Using the KG to retrieve relevant nodes (entities) and edges (relationships) related to a user query.
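
The two retrieval styles above can be sketched on toy data. This is a simplified illustration, not the authors' implementation: word-count vectors stand in for learned embeddings, a Python list stands in for a vector database, and the knowledge graph is a hand-written list of (subject, relation, object) triples rather than one extracted by an LLM chain. All names and sample data are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """VectorRAG-style step: return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def graph_retrieve(query: str, triples: List[Triple]) -> List[Triple]:
    """GraphRAG-style step: return triples whose subject or object
    appears in the query (naive substring matching)."""
    q = query.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


# Hypothetical sample data.
chunks = [
    "ExampleCorp reported higher revenue in the retail segment.",
    "The board approved a new dividend policy.",
]
triples = [
    ("ExampleCorp", "REPORTED", "higher revenue"),
    ("Board", "APPROVED", "dividend policy"),
]
query = "What did ExampleCorp say about revenue?"

print(vector_retrieve(query, chunks))
print(graph_retrieve(query, triples))
```

The substring match in graph_retrieve is deliberately naive; a real system would more likely link query entities to graph nodes with an entity recognizer or a graph query language.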

Contributions

HybridRAG Approach: The main contribution is the introduction and evaluation of HybridRAG, which combines VectorRAG and GraphRAG to improve information extraction from financial documents.
Evaluation on Financial Data: The paper uses a novel ground-truth Q&A dataset extracted from earnings call transcripts of companies in the Nifty-50 index (an Indian stock market index).
Demonstrated Improvement: The results show that HybridRAG outperforms both VectorRAG and GraphRAG individually in terms of retrieval accuracy and answer generation.

Significance

The HybridRAG approach has the potential to significantly improve the accuracy and efficiency of information extraction from complex financial documents. This can lead to better-informed investment decisions, risk management, and overall financial analysis. The technique's applicability extends beyond the financial domain to any area dealing with complex, unstructured data.

Elevate

Feel free to take a look at these blogs and share your thoughts.

Amazing blog by Rohith Gowtham G - Student at The National Institute of Engineering | AWS Certified Cloud Practitioner

Building a serverless Clickstream Analytics Pipeline on AWS

Based on my understanding of HybridRAG, I wrote a blog post covering a small use case:

Building a HybridRAG System for Financial Document Analysis: An End-to-End Guide