Exposing Privacy Risks in Graph Retrieval-Augmented Generation (GraphRAG)
Conducted during a six-month remote research internship at Prof. Suhang Wang's Lab at Penn State University.
Introduction
Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models with external knowledge. However, when applied to structured data such as knowledge graphs (GraphRAG), new privacy risks may emerge. During my six-month remote research internship at Prof. Suhang Wang's Lab, I investigated the vulnerabilities of GraphRAG systems under data extraction attacks.
Research Highlights
Novel Study
Conducted the first empirical study revealing privacy risks and data extraction vulnerabilities in GraphRAG systems.
Attack Design
Implemented both targeted and untargeted extraction attacks, achieving up to 73.6% entity leakage and 74.0% relationship leakage per query.
Trade-off Analysis
Identified a critical privacy–utility trade-off inherent in graph-based RAG architectures.
Defense Exploration
Evaluated potential defense mechanisms, including summarization, system prompt enhancement, and similarity thresholds.
Impact
This work provides foundational insights into the security and privacy challenges of GraphRAG, helping future researchers and practitioners design more robust and trustworthy retrieval-augmented systems.