Exposing Privacy Risks in Graph Retrieval-Augmented Generation (GraphRAG)

Posted on Aug 16, 2025

Introduction

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models with external knowledge. However, when RAG is applied to structured data such as knowledge graphs (GraphRAG), new privacy risks emerge. During my six-month remote research internship at Prof. Suhang Wang's Lab, I investigated the vulnerabilities of GraphRAG systems under data extraction attacks.

[Figure: GraphRAG Privacy]

Research Highlights

  1. Novel Study: Conducted the first empirical study revealing privacy risks and data extraction vulnerabilities in GraphRAG systems.
  2. Attack Design: Implemented both targeted and untargeted extraction attacks, achieving up to 73.6% entity leakage and 74.0% relationship leakage per query (see the leakage-metric sketch after this list).
  3. Trade-off Analysis: Identified a critical privacy–utility trade-off inherent in graph-based RAG architectures.
  4. Defense Exploration: Evaluated potential defense mechanisms, including summarization, system prompt enhancement, and similarity thresholds (a similarity-threshold sketch also follows the list).
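
To make the leakage numbers in item 2 concrete, here is a minimal sketch of how per-query leakage could be measured. It assumes the private knowledge graph (or the retrieved subgraph) is available as a set of entities and a set of (head, relation, tail) triples, and that some parser extracts entities and triples from the model's response; all names below are illustrative, not the study's actual implementation.

```python
# Hypothetical leakage metrics: the fraction of private graph content that a
# single attack response reproduces. These definitions are an assumption for
# illustration, not necessarily the exact metrics used in the study.

def entity_leakage(response_entities: set[str], kg_entities: set[str]) -> float:
    """Fraction of knowledge-graph entities recovered in one response."""
    if not kg_entities:
        return 0.0
    return len(response_entities & kg_entities) / len(kg_entities)

def relationship_leakage(response_triples: set[tuple[str, str, str]],
                         kg_triples: set[tuple[str, str, str]]) -> float:
    """Fraction of (head, relation, tail) triples recovered in one response."""
    if not kg_triples:
        return 0.0
    return len(response_triples & kg_triples) / len(kg_triples)
```

Averaging these ratios over many adversarial queries yields per-query leakage figures comparable in spirit to the percentages reported above.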
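For the similarity-threshold defense in item 4, the intuition is that broad, untargeted extraction prompts tend to be semantically distant from any single graph node, so refusing to retrieve below a similarity cutoff limits what an attacker can pull out per query. A minimal sketch, assuming precomputed node embeddings and cosine similarity (the function and parameter names are hypothetical):

```python
import numpy as np

def retrieve_with_threshold(query_emb: np.ndarray,
                            node_embs: np.ndarray,
                            node_texts: list[str],
                            top_k: int = 5,
                            min_sim: float = 0.75) -> list[str]:
    """Return at most top_k node texts whose cosine similarity to the query
    exceeds min_sim; low-similarity (e.g. generic extraction) prompts
    retrieve little or nothing."""
    q = query_emb / np.linalg.norm(query_emb)
    nodes = node_embs / np.linalg.norm(node_embs, axis=1, keepdims=True)
    sims = nodes @ q  # cosine similarity of every node to the query
    top = np.argsort(-sims)[:top_k]
    return [node_texts[i] for i in top if sims[i] >= min_sim]
```

The same threshold also filters out legitimately relevant but loosely phrased queries, which is one face of the privacy–utility trade-off noted in item 3.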

Impact

This work provides foundational insights into the security and privacy challenges of GraphRAG, helping future researchers and practitioners design more robust and trustworthy retrieval-augmented systems.