GraphRAG Inference Hackathon: 3-Pipeline Benchmarking System
One query in → three pipelines run → side-by-side responses + metrics out.
Proving that graphs make LLM inference faster, cheaper, and smarter, backed by 12 research papers, 6 novel retrieval techniques, and the full hackathon evaluation stack.
3-Pipeline Architecture · TG GraphRAG Integration · Novelties · Evaluation · Quick Start
What This Is
A 3-pipeline GraphRAG benchmarking system built on top of the TigerGraph GraphRAG repo, with 14 novel techniques from 2024–2025 research, 12 LLM providers, and a production dashboard showing all three pipelines side-by-side with LLM-as-a-Judge + BERTScore evaluation.
| Pipeline 1: LLM-Only | Pipeline 2: Basic RAG | Pipeline 3: GraphRAG |
|---|---|---|
| Query → LLM → Answer | Query → Embed → Top-K Chunks → LLM | Query → TG GraphRAG Service → NoveltyEngine → LLM |
| No retrieval. Worst-case baseline. | Vector embeddings. Industry standard. | Built on tigergraph/graphrag + 6 novelties. |
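A minimal sketch of how one query fans out across all three pipelines and collects answers plus latency; the pipeline callables and their wiring are illustrative placeholders, not the repo's actual orchestration API:

import time

def run_side_by_side(query, pipelines):
    """Run one query through each pipeline and collect answer + latency."""
    results = {}
    for name, pipeline_fn in pipelines.items():
        start = time.perf_counter()
        answer = pipeline_fn(query)  # each pipeline fn: query string -> answer string
        results[name] = {
            "answer": answer,
            "latency_ms": (time.perf_counter() - start) * 1000,
        }
    return results

# Hypothetical wiring (each value is a callable taking the query string):
# pipelines = {
#     "llm_only": lambda q: llm(q),
#     "basic_rag": lambda q: llm(build_prompt(q, vector_top_k(q, k=5))),
#     "graphrag": lambda q: llm(build_prompt(q, tg_graphrag_retrieve(q))),
# }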
TigerGraph GraphRAG Integration
Pipeline 3 is built on top of the official TigerGraph GraphRAG repo (Path B: customize). The integration layer (tg_graphrag_client.py) wraps the official service:
from graphrag.layers.tg_graphrag_client import TGGraphRAGClient
client = TGGraphRAGClient(service_url="http://localhost:8000")
client.connect()
# Official retrievers: Hybrid Search, Community, Sibling
result = client.retrieve(query="What did Einstein discover?",
retriever="hybrid", top_k=5, num_hops=2)
result = client.retrieve(query="Main themes?",
retriever="community", community_level=2)
Modes: REST API (official service) → Direct pyTigerGraph (fallback) → Offline (passage-based).
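A minimal sketch of that fallback order; the /health endpoint and the probing logic are assumptions for illustration, not the official service's documented API:

import requests

def choose_mode(service_url="http://localhost:8000"):
    """Pick retrieval mode: REST service -> direct pyTigerGraph -> offline passages."""
    try:
        # Assumed health endpoint; the official service may expose a different path.
        if requests.get(f"{service_url}/health", timeout=2).ok:
            return "rest"
    except requests.RequestException:
        pass
    try:
        import pyTigerGraph  # direct-connection fallback if installed
        return "pytigergraph"
    except ImportError:
        return "offline"  # passage-based retrieval only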
# Deploy official TG GraphRAG + point our system at it
git clone https://github.com/tigergraph/graphrag && cd graphrag && docker-compose up -d
export GRAPHRAG_SERVICE_URL=http://localhost:8000
python -m graphrag.main benchmark --samples 50
3-Pipeline Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│ LAYER 4: EVALUATION                                                      │
│ LLM-as-a-Judge (PASS/FAIL, ≥90%) · BERTScore F1 (≥0.55) · RAGAS · F1/EM  │
├──────────────────────────────────────────────────────────────────────────┤
│ LAYER 3: UNIVERSAL LLM (12 Providers)                                    │
├──────────────────────────────────────────────────────────────────────────┤
│ LAYER 2: 3-PIPELINE ORCHESTRATION + NOVELTY ENGINE                       │
│ Pipeline 1: LLM-Only │ Pipeline 2: Basic RAG │ Pipeline 3: GraphRAG      │
│ NoveltyEngine: PolyG Router → PPR → Spreading Activation → Token Budget  │
├──────────────────────────────────────────────────────────────────────────┤
│ LAYER 1: GRAPH                                                           │
│ TG GraphRAG Service (official repo) ←→ Direct pyTigerGraph (fallback)    │
│ Retrievers: Hybrid, Community, Sibling │ GSQL: PPR, Paths, Activation    │
└──────────────────────────────────────────────────────────────────────────┘
Pipeline 3 Flow
Query → keyword extraction → TG GraphRAG Service (hybrid retriever)
→ NoveltyEngine: PolyG Router → PPR → Spreading Activation → Token Budget
→ Structured context (entities + relationships + passages) → LLM → Answer
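A compact sketch of that flow, assuming the NoveltyEngine stages are exposed as callables that each take and return the retrieved context; the wiring below is illustrative, not the repo's orchestration code:

def answer_with_graphrag(query, client, llm, novelty_stages=()):
    """Pipeline 3 sketch: retrieve from TG GraphRAG, refine context, then prompt the LLM."""
    # Official hybrid retriever (vector search + graph traversal)
    context = client.retrieve(query=query, retriever="hybrid", top_k=10, num_hops=2)
    # Novelty stages applied in order: PolyG router, PPR scoring,
    # spreading activation, token-budget trimming
    for stage in novelty_stages:
        context = stage(context)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)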
14 Novel Techniques
Graph Retrieval (6 papers, wired into Pipeline 3 via NoveltyEngine)
| # | Technique | Paper | Result | Code |
|---|---|---|---|---|
| 1 | PPR Confidence Retrieval | CatRAG | Best reasoning on 4 benchmarks | PPRConfidenceScorer |
| 2 | Spreading Activation | SA-RAG | +39% correctness | SpreadingActivation |
| 3 | Flow-Pruned Paths | PathRAG | 62–65% win rate | PathPruner |
| 4 | Token Budget Controller | TERAG | 97% token reduction | TokenBudgetController |
| 5 | PolyG Hybrid Router | RAGRouter-Bench | Adaptive > fixed | PolyGRouter |
| 6 | Incremental Updates | TG-RAG | O(new) cost | IncrementalGraphUpdater |
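To make the flavor of these components concrete, here is a minimal, self-contained sketch of a token-budget controller in the spirit of technique #4 above; it is not the repo's TokenBudgetController, and the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer:

def apply_token_budget(passages, budget_tokens=1500):
    """Greedily keep the highest-scored passages until the token budget is spent.

    `passages` is an iterable of (score, text) pairs; token counts are estimated
    at ~4 characters per token.
    """
    kept, used = [], 0
    for score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = max(1, len(text) // 4)
        if used + cost > budget_tokens:
            continue  # skip passages that would overflow the budget
        kept.append(text)
        used += cost
    return "\n\n".join(kept)

# Example: trim a ranked context set before prompting the LLM.
context = apply_token_budget([(0.9, "Einstein published..."), (0.4, "Unrelated text...")],
                             budget_tokens=50)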
Architecture + System (#7–14)
Schema-bounded extraction, dual-level keywords, adaptive routing, graph reasoning explanation, 12-provider LLM, OpenClaw agent, live 3-pipeline dashboard, advanced GSQL queries.
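Most of these are system-level features; as one illustration, dual-level keyword extraction can be sketched as a single LLM call that separates broad themes from specific entities. The prompt and JSON shape below are assumptions for illustration, not the repo's implementation:

import json

def dual_level_keywords(query, llm):
    """Split a query into high-level themes and low-level entities (illustrative sketch)."""
    prompt = (
        "Extract search keywords from the question below.\n"
        'Return JSON only: {"high_level": ["broad themes"], "low_level": ["specific entities"]}\n'
        f"Question: {query}"
    )
    try:
        return json.loads(llm(prompt))  # llm: prompt string -> response text
    except (json.JSONDecodeError, TypeError):
        return {"high_level": [], "low_level": query.split()}  # fallback: raw tokens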
Evaluation Framework
All hackathon-required metrics implemented in evaluation_layer.py:
| Metric | Target | Implementation |
|---|---|---|
| LLM-as-a-Judge (PASS/FAIL) | ≥ 90% pass rate | compute_llm_judge() – reference-guided, CoT, JSON output |
| BERTScore F1 | ≥ 0.55 rescaled / ≥ 0.88 raw | compute_bertscore() – roberta-large with rescaling |
| F1 / Exact Match | – | SQuAD/HotpotQA standard |
| RAGAS | – | Faithfulness, Relevancy, Context Precision/Recall |
| Token Efficiency | – | Per-pipeline per-query tracking |
| Cost per Query | – | tokens × provider_pricing |
| Latency | – | End-to-end ms |
from graphrag.layers.evaluation_layer import compute_llm_judge, compute_bertscore
# LLM-as-a-Judge
result = compute_llm_judge(question, reference, candidate, llm_fn)
# β {"verdict": "PASS", "feedback": "..."}
# BERTScore
results = compute_bertscore(predictions, references, rescale=True)
# β {"mean_f1": 0.62, "pass_rate": 0.85}
Quick Start
git clone https://huggingface.co/muthuk1/graphrag-inference-hackathon
cd graphrag-inference-hackathon && cp .env.example .env
pip install -r requirements.txt
# Setup TigerGraph (schema + core + advanced GSQL queries)
python graphrag/setup_tigergraph.py
# 3-pipeline benchmark
python -m graphrag.main benchmark --samples 50 --output results.json
# 3-column Gradio dashboard
python -m graphrag.main dashboard
# Next.js dashboard
cd web && npm install && npm run dev
# Docker
docker build -t graphrag . && docker run -p 3000:3000 -p 7860:7860 --env-file .env graphrag
# Free (Ollama)
ollama pull llama3.2 && python -m graphrag.main demo
Project Structure
graphrag/layers/
tg_graphrag_client.py # Official TG GraphRAG service integration
orchestration_layer.py # 3-pipeline + NoveltyEngine wiring
evaluation_layer.py # LLM-Judge + BERTScore + RAGAS + F1/EM
novelties.py # 6 novel techniques (PPR, activation, paths, budget, router, incremental)
graph_layer.py # TigerGraph GSQL + schema
gsql_advanced.py # Advanced GSQL (PPR, paths, activation)
llm_layer.py / universal_llm.py # 12-provider LLM
graphrag/
benchmark.py # 3-pipeline HotpotQA benchmark
dashboard.py # 3-column Gradio dashboard
setup_tigergraph.py # Schema + core + advanced query install
ingestion.py / main.py
web/src/app/api/compare/ # 3-pipeline Next.js API
openclaw/ # Agent skills
tests/ # 55 tests
References (12 Papers)
Implemented: CatRAG, SA-RAG, PathRAG, TERAG, RAGRouter-Bench, TG-RAG
Architecture: Microsoft GraphRAG, LightRAG, Youtu-GraphRAG, HippoRAG 2
Evaluation: LLM-as-a-Judge (NeurIPS 2023), BERTScore (ICLR 2020)
Links
TigerGraph GraphRAG · TigerGraph Savanna · TigerGraph MCP · TigerGraph Docs
Built for the GraphRAG Inference Hackathon by TigerGraph
3 Pipelines · 14 Novelties · 12 Papers · 12 LLMs · 55 Tests · LLM-Judge + BERTScore · Docker
Build it. Benchmark it. Prove graph beats tokens.