Running OpenFGA in Production
The following list outlines best practices for running OpenFGA in a production environment:
- Configure Authentication
- Enable HTTP TLS or gRPC TLS or both
- Set the log format to "json" and log level to "info"
- Disable the Playground
- Set Cluster
- Set Database Options
- Set Maximum Results
- Set Concurrency Limits
Cluster recommendations
We recommend:
- Turn on in-memory caching in Check API via flags. This will reduce latency of requests, but it will increase the staleness of OpenFGA's responses. Please see Cache Expiration for details on the flags.
- Prefer having a small pool of servers with high capacity (memory and CPU cores) instead of a big pool of servers, to increase cache hit ratios and simplify pool management.
- Turn on metrics collection via the flags
--metrics-enabled
and--datastore-metrics-enabled
. This will allow you to debug issues. - Turn on tracing via the flag
--trace-enabled
, but set sampling ratio to a low value, for example--trace-sample-ratio=0.3
. This will allow you to debug issues without overwhelming the tracing server. However, keep in mind that enabling tracing comes with a slight performance cost.
Database recommendations
To ensure good performance for OpenFGA, it is recommended that the database be:
- Co-located in the same physical datacenter and network as your OpenFGA servers. This will minimize latency of database calls.
- Used exclusively for OpenFGA and not shared with other applications. This allows scaling the database independently and avoiding contention with your database.
- Bootstrapped and managed with the
openfga migrate
command. This will ensure the appropriate database indexes are created.
It's strongly recommended to fine-tune your server database connection settings to avoid having to re-establish database connections frequently. Establishing database connections is slow and will negatively impact performance, and so here are some guidelines for managing database connection settings:
-
The server setting
OPENFGA_DATASTORE_MAX_OPEN_CONNS
should be set to be equal to your database's max connections. For example, in Postgres, you can see this value via running the SQL querySHOW max_connections;
. If you are running multiple instances of the OpenFGA server, you should divide this setting equally among the instances. For example, if your database'smax_connections
is 100, and you have 2 OpenFGA instances,OPENFGA_DATASTORE_MAX_OPEN_CONNS
should be set to 50 for each instance. -
The
OPENFGA_DATASTORE_MAX_IDLE_CONNS
should be set to a value no greater than the maximum open connections (see the bullet point above), but it should be set sufficiently high enough to avoid having to recreate connections on each request.If, when monitoring your database stats, you see a lot of database connections being closed and subsequently reopened, then you should consider increasing the maximum number of idle connections.
-
If idle connections are getting reaped frequently, then consider increasing the
OPENFGA_DATASTORE_CONN_MAX_IDLE_TIME
to a large value. When in doubt, prioritize keeping connections around for longer rather than shorter, because doing so will drastically improve performance.
Concurrency limits
Before modifying concurrency limits please make sure you've followed the guidance for Database Recommendations
OpenFGA queries such as Check, ListObjects and ListUsers can be quite database and CPU intensive in some cases. If you notice that a single request is consuming a lot of CPU or creating a high degree of database contention, then you may consider setting some concurrency limits to protect other requests from being negatively impacted by overly aggressive queries.
The following table enumerates the server's concurrency specific settings:
flag | env | config |
---|---|---|
--max-concurrent-reads-for-list-objects | OPENFGA_MAX_CONCURRENT_READS_FOR_LIST_OBJECTS | maxConcurrentReadsForListObjects |
--max-concurrent-reads-for-list-users | OPENFGA_MAX_CONCURRENT_READS_FOR_LIST_USERS | maxConcurrentReadsForListUsers |
--max-concurrent-reads-for-check | OPENFGA_MAX_CONCURRENT_READS_FOR_CHECK | maxConcurrentReadsForCheck |
--resolve-node-limit | OPENFGA_RESOLVE_NODE_LIMIT | resolveNodeLimit |
--resolve-node-breadth-limit | OPENFGA_RESOLVE_NODE_BREADTH_LIMIT | resolveNodeBreadthLimit |
--max-concurrent-checks-per-batch-check | OPENFGA_MAX_CONCURRENT_CHECKS_PER_BATCH_CHECK | maxConcurrentChecksPerBatchCheck |
Determining the right values for these settings will be based on a variety of factors including, but not limited to, the database specific deployment topology, the FGA model(s) involved, and the relationship tuples in the system. However, here are some high-level guidelines:
-
If a single ListObjects or ListUsers query is negatively impacting other query endpoints by increasing their latency or their error rate, then consider setting a lower value for
OPENFGA_MAX_CONCURRENT_READS_FOR_LIST_OBJECTS
orOPENFGA_MAX_CONCURRENT_READS_FOR_LIST_USERS
. -
If a single Check query is negatively impacting other query endpoints by increasing their latency or their error rate, then consider setting a lower value for
OPENFGA_MAX_CONCURRENT_READS_FOR_CHECK
.
If you still see high request latencies despite the guidance above, then you may additionally consider setting stricter limits on the query resolution behavior by limiting the resolution depth and resolution breadth. These can be controlled with the OPENFGA_RESOLVE_NODE_LIMIT
and OPENFGA_RESOLVE_NODE_BREADTH_LIMIT
settings, respectively. Consider these guidelines:
-
OPENFGA_RESOLVE_NODE_LIMIT
limits the resolution depth of a single query, and thus it sets an upper bound on how deep a relationship hierarchy may be. A high value will allow a single query to involve more hierarchical resolution and therefore more database queries, while a low value will reduce the number of hierarchical resolutions that will be allowed and thus reduce the number of database queries. -
OPENFGA_RESOLVE_NODE_BREADTH_LIMIT
limits the resolution breadth. It sets an upper bound on the number of in-flight resolutions that can be taking place on one or more usersets. A high value will allow a single query to involve more concurrent evaluations to take place and therefore more database queries and server processes, while a low value will reduce the overall number of concurrent resolutions that will be allowed and thus reduce the number of database queries and server processes.
Maximum results
Both the ListObjects and ListUsers endpoints will continue retrieving results until one of the following conditions is met:
- The maximum number of results is found
- The entire pool of possible results has been searched
- The API times out
By default, both ListObjects and ListUsers have a maximum results limit of 1,000. The higher the quantity of potential results in the system, the more time and resource-intensive it becomes to search for a large number of maximum results. This increased load can impact performance, potentially leading to time-outs in some cases. If your use case allows, consider setting a lower max results value via the OPENFGA_LIST_OBJECTS_MAX_RESULTS
or OPENFGA_LIST_USERS_MAX_RESULTS
configuration properties. This adjustment can lead to immediate improvements in time and resource efficiency.