
RAG Authorization

Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant documents from a knowledge base. Without authorization checks, a user can ask a question and receive answers derived from documents they should not have access to. OpenFGA lets you enforce document-level permissions so that RAG pipelines only return content the user is authorized to see.

This guide shows how to model document permissions in OpenFGA and integrate authorization checks into a RAG pipeline, regardless of the framework or vector database you use.

Authorization model

A typical RAG knowledge base contains documents organized in folders, with access controlled at both levels. The following model represents this structure:

model
  schema 1.1

type user

type folder
  relations
    define owner: [user]
    define viewer: [user] or owner

type document
  relations
    define folder: [folder]
    define owner: [user]
    define viewer: [user] or owner or viewer from folder

A folder has owner and viewer relations, and a folder's owner is also one of its viewers. A document belongs to a folder and inherits that folder's viewers: anyone who can view the folder can view every document inside it. You can also grant direct owner or viewer access to individual documents.

Writing tuples

Set up the folder structure, document ownership, and user access:

tuples:
  # anne owns the engineering folder
  - user: user:anne
    relation: owner
    object: folder:engineering

  # beth can view the engineering folder (and all its documents)
  - user: user:beth
    relation: viewer
    object: folder:engineering

  # link documents to their folder
  - user: folder:engineering
    relation: folder
    object: document:api_design
  - user: folder:engineering
    relation: folder
    object: document:architecture
  - user: folder:engineering
    relation: folder
    object: document:roadmap

  # carl can only view the roadmap document
  - user: user:carl
    relation: viewer
    object: document:roadmap

With this setup:

  • anne can view all documents in the engineering folder (as the folder's owner).
  • beth can view all documents in the engineering folder (as a folder viewer).
  • carl can only view the roadmap document.
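To trace how the model produces these outcomes, here is a toy in-memory evaluator of the viewer relation over the tuples above. It is only an illustration, not how OpenFGA resolves relations internally; the hasTuple, related, and canView helpers are hypothetical names.

```javascript
// Toy, in-memory evaluation of the model above, for illustration only.
// OpenFGA resolves relations server-side; this sketch mirrors the rewrite
// rules for `viewer` so the expected outcomes can be traced by hand.
const tuples = [
  { user: "user:anne", relation: "owner", object: "folder:engineering" },
  { user: "user:beth", relation: "viewer", object: "folder:engineering" },
  { user: "folder:engineering", relation: "folder", object: "document:api_design" },
  { user: "folder:engineering", relation: "folder", object: "document:architecture" },
  { user: "folder:engineering", relation: "folder", object: "document:roadmap" },
  { user: "user:carl", relation: "viewer", object: "document:roadmap" },
];

// True if the tuple `user relation object` is stored directly.
function hasTuple(user, relation, object) {
  return tuples.some(
    (t) => t.user === user && t.relation === relation && t.object === object
  );
}

// Subjects linked to `object` through `relation` (e.g. a document's folder).
function related(object, relation) {
  return tuples
    .filter((t) => t.object === object && t.relation === relation)
    .map((t) => t.user);
}

function canView(user, object) {
  // define viewer: [user] or owner ...
  if (hasTuple(user, "viewer", object) || hasTuple(user, "owner", object)) {
    return true;
  }
  // ... or viewer from folder (folders have no `folder` relation, so this is empty for them)
  return related(object, "folder").some((folder) => canView(user, folder));
}

// Expected: anne and beth see all three documents; carl sees only document:roadmap.
for (const user of ["user:anne", "user:beth", "user:carl"]) {
  const docs = ["document:api_design", "document:architecture", "document:roadmap"]
    .filter((doc) => canView(user, doc));
  console.log(user, docs);
}
```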

Filtering approaches

There are two main approaches to integrate OpenFGA into a RAG pipeline. Both ensure that the LLM only sees documents the user is authorized to access.

Retrieve then check

Query the vector database first, then filter results by checking permissions with OpenFGA. This is the most common approach and works well when the vector search returns a manageable number of candidates.

The flow is:

  1. The user sends a query to the RAG pipeline.
  2. The pipeline retrieves candidate documents from the vector database.
  3. For each candidate, call OpenFGA to check whether the user can view it.
  4. Filter out unauthorized documents.
  5. Pass only the authorized documents to the LLM as context.
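The five steps above can be sketched as a single function. This is a minimal sketch under stated assumptions: vectorSearch, checkMany, and generate are hypothetical hooks standing in for your vector store query, a wrapper around OpenFGA BatchCheck that returns one boolean per check in input order, and your LLM call.

```javascript
// Sketch of the "retrieve then check" flow. `deps.vectorSearch`,
// `deps.checkMany`, and `deps.generate` are hypothetical hooks, not SDK APIs.
async function answerWithAuthorizedContext(userId, query, deps, k = 5) {
  // Step 2: retrieve candidates, each carrying a document ID such as "roadmap".
  const candidates = await deps.vectorSearch(query, k);

  // Step 3: check `viewer` on every candidate in one batched call.
  // `checkMany` is assumed to return one boolean per check, in input order.
  const allowed = await deps.checkMany(
    candidates.map((c) => ({
      user: `user:${userId}`,
      relation: "viewer",
      object: `document:${c.docId}`,
    }))
  );

  // Step 4: drop anything not explicitly allowed (missing answers fail closed).
  const authorized = candidates.filter((_, i) => allowed[i] === true);

  // Step 5: only authorized text reaches the LLM.
  return deps.generate(query, authorized.map((c) => c.text));
}
```

Treating anything other than an explicit true as a denial means a partial or failed authorization response can never leak a document into the prompt.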

Use the BatchCheck API to check multiple documents in a single request. For example, if a vector search returns three documents for user:carl:

// Requires >=v0.8.0 for the server-side BatchCheck; earlier versions support a client-side BatchCheck with a slightly different interface.
// Assumes `fgaClient` is an OpenFgaClient (from @openfga/sdk) already configured with your API URL and store ID.
const body = {
  checks: [
    {
      user: 'user:carl',
      relation: 'viewer',
      object: 'document:roadmap',
      correlationId: 'check-roadmap', // unique per check; used to match each result to its request
    },
    {
      user: 'user:carl',
      relation: 'viewer',
      object: 'document:api_design',
      correlationId: 'check-api-design',
    },
    {
      user: 'user:carl',
      relation: 'viewer',
      object: 'document:architecture',
      correlationId: 'check-architecture',
    },
  ],
};

const options = {
  authorizationModelId: '01HVMMBCMGZNT3SED4Z17ECXCA',
  maxBatchSize: 50, // optional, default is 50; limits the number of checks in a single server request
  maxParallelRequests: 10, // optional, default is 10; limits how many BatchCheck chunks run in parallel
};
const { result } = await fgaClient.batchCheck(body, options);

/*
result = [
  {
    "correlationId": "check-roadmap",
    "allowed": true,
    "request": { "user": "user:carl", "relation": "viewer", "object": "document:roadmap" }
  },
  {
    "correlationId": "check-api-design",
    "allowed": false,
    "request": { "user": "user:carl", "relation": "viewer", "object": "document:api_design" }
  },
  {
    "correlationId": "check-architecture",
    "allowed": false,
    "request": { "user": "user:carl", "relation": "viewer", "object": "document:architecture" }
  }
]
*/

Only document:roadmap is returned as allowed. The pipeline filters out the other two documents before passing context to the LLM.
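Matching results back to the vector-search hits by correlationId, rather than by position, keeps the filter correct regardless of the order in which results come back. A minimal sketch, assuming each hit carries its OpenFGA object ID in an object field and that results have the { correlationId, allowed, error } shape shown above; buildChecks and keepAllowed are hypothetical helpers.

```javascript
// Build BatchCheck items from vector-search hits, using each hit's index as
// its correlationId so results can be mapped back to hits.
function buildChecks(user, hits) {
  return hits.map((hit, i) => ({
    user,
    relation: "viewer",
    object: hit.object,
    correlationId: String(i),
  }));
}

// Keep only hits whose check returned allowed: true without an error.
// Hits with no matching result are dropped (fail closed).
function keepAllowed(hits, results) {
  const allowedIds = new Set(
    results
      .filter((r) => r.allowed === true && !r.error)
      .map((r) => r.correlationId)
  );
  return hits.filter((_, i) => allowedIds.has(String(i)));
}
```

In a real pipeline, the output of buildChecks would go in the checks array of the batchCheck body, and the returned result array would be passed to keepAllowed.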

Build an authorized list, then retrieve

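Retrieve the list of documents the user can access first, then pass those IDs as a filter to the vector search. This approach works well when the user has access to a relatively small number of documents: ListObjects responses are capped by server configuration (1000 objects by default), and very large ID lists can make vector database filters expensive.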

The flow is:

  1. Call the ListObjects API to get all document IDs the user can access.
  2. Pass those IDs as a metadata filter to the vector database query.
  3. The vector search only returns results from authorized documents.
  4. Pass the results to the LLM as context.

For example, to get all documents user:carl can view:

const response = await fgaClient.listObjects({
  user: "user:carl",
  relation: "viewer",
  type: "document",
}, {
  authorizationModelId: "01HVMMBCMGZNT3SED4Z17ECXCA",
});
// response.objects = ["document:roadmap"]

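Pass the resulting document IDs as a filter to your vector database. Most vector databases support metadata filtering; use the document ID stored in each vector's metadata to restrict the search. ListObjects returns fully qualified IDs such as document:roadmap, so strip the document: prefix if your metadata stores bare IDs.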
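A sketch of turning ListObjects output into a metadata filter, with an in-memory stand-in for the filtered search. It assumes each vector stores a bare doc_id in its metadata; the { doc_id: { $in: [...] } } filter shape is a hypothetical example because filter syntax differs between vector databases, and toDocIds, buildMetadataFilter, and filteredSearch are illustrative names.

```javascript
// Convert ListObjects output (e.g. ["document:roadmap"]) to bare IDs ("roadmap").
function toDocIds(objects) {
  const prefix = "document:";
  return objects
    .filter((o) => o.startsWith(prefix))
    .map((o) => o.slice(prefix.length));
}

// Hypothetical filter shape; adapt it to your vector database's syntax.
function buildMetadataFilter(objects) {
  return { doc_id: { $in: toDocIds(objects) } };
}

// In-memory stand-in for a filtered vector search: only vectors whose
// metadata matches the filter are ranked, so every result is authorized.
function filteredSearch(vectors, filter, scoreFn, k) {
  const allowed = new Set(filter.doc_id.$in);
  return vectors
    .filter((v) => allowed.has(v.metadata.doc_id))
    .sort((a, b) => scoreFn(b) - scoreFn(a))
    .slice(0, k);
}
```

If ListObjects returns no objects, it is simpler to skip the vector query entirely than to rely on how a given database treats an empty $in list.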

Choosing an approach

| Criteria | Retrieve then check | Build list, then retrieve |
|---|---|---|
| Vector search returns few candidates | Good fit | Works, but adds unnecessary overhead |
| User has access to few documents | Works, but may discard many results | Good fit |
| User has access to most documents | Good fit | Unnecessary overhead (large ID lists) |
| Need exact top-K results | May return fewer than K after filtering | Returns top-K results drawn only from authorized documents |

For detailed guidance on choosing between these approaches and handling more complex scenarios, see Search With Permissions.

Tip: When using "retrieve then check", request more candidates than you need from the vector database (for example, 2-3x your target count) to account for documents that will be filtered out.
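One way to apply this over-fetching is to start at a multiple of the target count and widen the search when too few candidates survive filtering. A minimal sketch: search and filterAllowed are hypothetical hooks for your vector store and an OpenFGA BatchCheck wrapper, and the starting factor of 3 and the cap of three attempts are arbitrary illustrative values.

```javascript
// Over-fetch, filter, and widen the candidate pool until `k` authorized
// results are found, the store runs out of results, or attempts run out.
async function topKAuthorized(
  query,
  k,
  search, // hypothetical: (query, n) => Promise<hits>
  filterAllowed, // hypothetical: (hits) => Promise<authorized hits>
  { factor = 3, maxAttempts = 3 } = {}
) {
  let n = k * factor;
  let authorized = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const hits = await search(query, n);
    authorized = await filterAllowed(hits);
    // Stop when there are enough results or the store has nothing more.
    if (authorized.length >= k || hits.length < n) break;
    n *= 2; // widen the candidate pool and try again
  }
  return authorized.slice(0, k);
}
```

For simplicity this re-checks every hit on each attempt; caching earlier check results would avoid repeating BatchCheck calls for candidates that were already evaluated.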

Framework integration

The filtering patterns above are framework-agnostic. Here is how to apply them in popular RAG frameworks:

  • LangChain (Python/JS): Implement a custom retriever that wraps your vector store retriever. After retrieving candidates, call OpenFGA BatchCheck and filter the results before returning them to the chain.
  • LlamaIndex: Use a post-processing step or a custom node postprocessor that checks permissions against OpenFGA before passing nodes to the response synthesizer.
  • Custom pipelines: Insert the authorization check between the retrieval and generation steps of your pipeline.

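In all cases, the authorization check must take effect before any document content reaches the LLM: either before retrieval, by restricting the vector search to the IDs returned by ListObjects, or between retrieval and generation, by filtering candidates with BatchCheck.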

Further reading

These resources explore RAG authorization patterns with OpenFGA in more detail:

Search With Permissions

Detailed guidance on integrating authorization into search, with trade-off analysis for different approaches

Task-Based Authorization

Grant agents scoped permissions to perform specific actions without permanent access

Conditions

Add time-based expiration or other conditions to document access grants