Your RAG faithfulness check is measuring copy-paste, not faithfulness
I was building an eval harness for a retrieval-augmented generation pipeline, and the first faithfulness check I wrote was quietly wrong. It looked reasonable. It ran on every example for free. It just measured the wrong thing, and I only saw it once I started feeding it edge cases on purpose. The w




