← Back to Home
ACL 2026 Findings August 2025

Exposing Privacy Risks in Graph Retrieval-Augmented Generation (GraphRAG)

Conducted during a six-month remote research internship at Prof. Suhang Wang's Lab at Penn State University.

GraphRAG Privacy

Introduction

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models with external knowledge. However, when applied to structured data such as knowledge graphs (GraphRAG), new privacy risks may emerge. During my six-month remote research internship at Prof. Suhang Wang's Lab, I investigated the vulnerabilities of GraphRAG systems under data extraction attacks.

Research Highlights

01

Novel Study

Conducted the first empirical study revealing privacy risks and data extraction vulnerabilities in GraphRAG systems.

02

Attack Design

Implemented both targeted and untargeted extraction attacks, achieving up to 73.6% entity leakage and 74.0% relationship leakage per query.

03

Trade-off Analysis

Identified a critical privacy–utility trade-off inherent in graph-based RAG architectures.

04

Defense Exploration

Evaluated potential defense mechanisms, including summarization, system prompt enhancement, and similarity thresholds.

Impact

This work provides foundational insights into the security and privacy challenges of GraphRAG, helping future researchers and practitioners design more robust and trustworthy retrieval-augmented systems.