PostgreSQL RLS Tuning: Resolving a 1.8s Query Latency Spike

A technical case study on how we diagnosed a severe latency spike from 45ms to 1.8 seconds after enabling PostgreSQL RLS, and how we optimized it.

1. The Backstory: The Latency Spike That Stalled Our Client's Dashboard

At CodexiLab, we frequently talk about the benefits of PostgreSQL Row Level Security (RLS) for securing multi-tenant applications. However, implementing RLS is not without operational risks. A few weeks ago, we completed a database security migration for an enterprise logistics client. We enabled RLS across their entire schema, ensuring that tenant data was isolated at the database level. During staging tests with minimal data, performance looked excellent. But the moment we deployed the changes to production under a full database load of over 1.2 million rows, disaster struck.

Their main executive dashboard—which queried order histories, shipment status, and billing metrics—saw query response times spike from a clean 45 milliseconds to a crippling 1.8 seconds. CPU utilization on the database instance spiked to 98%, and connection timeouts began piling up. The client was on the verge of disabling RLS entirely, which would have compromised their security compliance. We had to act fast, locate the query planner bottlenecks, and optimize the RLS policies to restore sub-10ms performance. This case study details the exact EXPLAIN ANALYZE profiles, indexing strategies, and policy refactorings we used to resolve the issue.

2. Deconstructing the Query Planner: How PostgreSQL Evaluates RLS Policies

To understand why RLS can cause severe performance degradation, we must understand how the PostgreSQL query optimizer works. When you write a query like SELECT * FROM orders WHERE status = 'shipped', PostgreSQL runs it through a parser, analyzer, and rewriter to generate an execution plan. If RLS is enabled, the rewriter intercepts the query and appends the RLS policy expression as an additional filter.

If your RLS policy is defined as:
USING (tenant_id = current_setting('app.current_tenant_id'))
PostgreSQL rewrites your query to:
SELECT * FROM orders WHERE status = 'shipped' AND (tenant_id = current_setting('app.current_tenant_id'))

Under normal circumstances, if you have an index on tenant_id, the query planner will perform an index scan, and performance will remain fast. However, if the policy expression is complex—for example, if it contains subqueries, joins, or non-volatile functions—the optimizer cannot pre-evaluate the expression. It is forced to evaluate the policy row-by-row, resulting in a 'Sequential Scan' (Seq Scan) across the entire table. In a table with millions of records, this causes the database to read every single row from disk, creating massive disk I/O and CPU spikes.

3. The Culprit: Subqueries and Non-Leakproof Functions inside Policies

When we ran EXPLAIN (ANALYZE, BUFFERS) on our client's slow dashboard query, the execution plan revealed a massive sequential scan. The policy was defined as follows:

CREATE POLICY tenant_isolation_policy ON orders USING (tenant_id IN (SELECT id FROM tenants WHERE is_active = true))

While this policy looks clean, it contains a critical flaw: a subquery. For every single row in the orders table, PostgreSQL was executing the subquery SELECT id FROM tenants WHERE is_active = true to check if the tenant was active. Even though the database tries to optimize subqueries using hash semi-joins, the complexity of the plan escalated, and the optimizer abandoned the index on orders.tenant_id, falling back to a full table scan.

Furthermore, we discovered they were using a custom PL/pgSQL helper function inside the policy to retrieve the active tenant ID from a session table. Because PostgreSQL treats user-defined functions as 'black boxes' whose return values cannot be predicted, it cannot apply standard index optimizations. To tune RLS, we must follow two golden rules: keep policy expressions extremely simple (avoiding subqueries and custom functions), and ensure all query paths can utilize compound indexes.

sql

-- The Slow Policy containing a subquery (Causes Sequential Scan)
-- CREATE POLICY orders_tenant_policy ON orders 
--    USING (tenant_id IN (SELECT id FROM tenants WHERE is_active = true));

-- The Optimized, Index-Friendly Policy
-- Instead of a subquery, we read directly from a session variable 
-- that is validated at the application middleware level.
CREATE POLICY orders_tenant_policy_optimized ON orders
    FOR ALL
    USING (tenant_id = current_setting('app.current_tenant_id', true));

-- To support the optimized policy, we create a compound index
-- combining the tenant_id with query filtering columns.
CREATE INDEX IF NOT EXISTS idx_orders_tenant_status_created
    ON orders (tenant_id, status, created_at DESC);

4. Step-by-Step Optimization of RLS Indexes and SQL Policies

The SQL block above shows the transition from the slow subquery policy to our optimized session-variable policy. By moving the validation check ('Is this tenant active?') from the database policy to the application middleware layer, we simplified the RLS policy to a basic string comparison: tenant_id = current_setting('app.current_tenant_id'). PostgreSQL can evaluate this comparison instantly during the query compilation phase.

Additionally, we created a compound index on (tenant_id, status, created_at DESC). In a multi-tenant database, almost every query includes a filter on tenant_id (automatically injected by RLS) along with a status or date filter. If you only have single-column indexes on tenant_id and status, PostgreSQL must scan both indexes and merge the bitmaps in memory. By creating a compound index, we allow the database to locate the matching tenant's shipped orders in a single, unified index seek operation.

5. Code Walkthrough: Diagnosing Query Plans with EXPLAIN ANALYZE

To verify that our changes successfully resolved the table scans, we run a diagnostic script in PostgreSQL. We mock the active session variable using the set_config command, run the dashboard query prefixed with EXPLAIN (ANALYZE, COSTS, BUFFERS), and inspect the output. Below is the SQL script we used to validate the query plan optimizations.

sql

-- Step 1: Mock the application session variable
SELECT set_config('app.current_tenant_id', 'tenant_xyz_456', true);

-- Step 2: Run EXPLAIN ANALYZE to profile execution details
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT id, order_number, status, created_at 
FROM orders 
WHERE status = 'shipped' 
ORDER BY created_at DESC 
LIMIT 20;

-- Expect Output to show 'Index Scan' instead of 'Seq Scan':
-- -> Index Scan using idx_orders_tenant_status_created on orders  (cost=0.43..12.54 rows=10 width=45) (actual time=0.042..0.120 rows=20 loops=1)
--      Index Cond: ((tenant_id)::text = 'tenant_xyz_456'::text)
--      Filter: (status = 'shipped'::text)
--      Buffers: shared hit=8

6. The Results: Restoring Sub-10ms Queries and Dropping CPU Load

The results of refactoring the RLS policy and deploying the compound indexes were immediate and dramatic. The main dashboard query latency dropped from 1.8 seconds to a clean 8 milliseconds. Database CPU utilization plummeted from 98% to less than 15%, resolving the connection timeouts and restoring complete stability to the platform.

Furthermore, because the query planner was now utilizing index scans, the volume of data blocks read from disk (Buffers: shared hit) was reduced by 99.8%. This dramatically decreased the load on the database's memory cache, allowing PostgreSQL to handle a significantly higher volume of concurrent queries without requiring a hardware upgrade.

Row Level Security is an incredibly powerful security feature, but it must be treated with engineering respect. By keeping policy expressions simple, avoiding subqueries, and aligning your indexes with the injected RLS filters, you can enjoy all the security benefits of logical tenant isolation without sacrificing application performance.

7. Summary: Best Practices for High-Performance RLS

Tuning PostgreSQL RLS is essential for any enterprise SaaS platform. Remember these three best practices:

Keep Policies Simple: Never put subqueries or complex functions in your USING clause. Use simple config parameters instead.
Always Use Compound Indexes: Ensure your indexes lead with the tenant ID column followed by the fields used for filtering and sorting.
Profile Under Load: Staging environments with small datasets hide RLS bottlenecks. Always test your queries using EXPLAIN ANALYZE against a database seeded with realistic production volumes.

8. Frequently Asked Questions (FAQ)

Q: Does PostgreSQL RLS support prepared statements?
A: Yes, RLS is fully compatible with prepared statements. PostgreSQL will evaluate the RLS policy when the statement is executed, using the session variable value active at that specific moment.

Q: What is the performance impact of enabling RLS on small tables?
A: On small tables (e.g. under 10,000 rows), the overhead of evaluating RLS policies is practically zero. You do not need to worry about advanced tuning for configuration tables, but we still recommend enabling RLS to ensure consistent security.

Q: Can I use PostgreSQL indexes on the session configuration variables?
A: No, session configuration variables (current_setting(...)) are text values resolved at runtime and cannot be indexed directly. Instead, you index the column (e.g. tenant_id) that is compared against the session variable.

Author

Md. Sabbir Al Mamon

Founder of CodexiLab

Md. Sabbir Al Mamon is the founder and lead product engineer of CodexiLab. With over a decade of experience designing and scaling software architectures, he specializes in building high-performance AI integration pipelines, multi-tenant SaaS structures, and robust cross-platform mobile solutions. Passionate about product engineering discipline, design system integration, and semantic web standards, Sabbir guides the technical and product delivery direction at CodexiLab.