A recent outage lead me to investigate Postgres unique constraints more deeply. Postgres implements unique constraints by creating a unique index – an index that can only contain unique values. It turns out that unique indices and concurrent transactions can interact in nasty and surprising ways. Before I get into the “why”, here are the implications:
When two transactions insert the same value into a unique index, one transaction will wait for the other transaction to finish before proceeding. Moreover, you can cause deadlock using only
- Avoid inserting duplicate keys into tables with unique constraints and relying on the unique constraint to save you. This is especially important if multiple transactions do this concurrently.
- Avoid inserting large numbers of records during the same transaction into a table with a unique constraint, especially if you broke rule #1.
- Avoid long transactions involving tables with unique constraints especially if you broke rule #1.
Failure to follow these guidelines can result in your application leaking databases connections and eventually hanging. Here’s how:
application thread 1 in
transaction 1 inserts
application thread 2 in
transaction 2 inserts
transaction 2 and
application thread 2 are now both blocked until
transaction 1 is committed or rolled back
application thread 1 now waits for something from
application thread 2. This seems far fetched, but I’ve seen it multiple times in practice in seemingly reasonable code. A similar behavior can occur if
application thread 1 is blocked for an unrelated reason eg. making a long running API call.
You now have an application level deadlock. The Postgres deadlock detector cannot save you. In most cases you will leak database connections until your application hangs. This is the worst case scenario.
If you were a bit luckier, you could have created deadlock within Postgres. The Postgres deadlock detector will detect the deadlock after
deadlock_timeout (typically one second) and cancel one of the transactions.. It’s not uncommon to be in the worst case scenario described above, until, by sheer luck, insert ordering triggers the deadlock detector and cancels the transactions.
Why does this happen?
In the case where only one transaction is writing to a unique index, the process is straightforward. Postgres simply looks up the tuple we’re attempting to insert in the unique index. If it doesn’t exist, we’re good. If it already exists, our insertion will fail because we violated a unique constraint. Here’s the relevant function in the Postgres source.
If two transactions are writing to the index concurrently, the situation is more complex. For the purposes of our example we’ll have
transaction a and
Both transactions attempt to insert value
v into the same table. Suppose
transaction a has inserted
v first but has not yet committed.
transaction b inserts
v which causes Postgres to check the unique index. A Postgres index stores not only committed data, but also stores data written by ongoing transactions. Postgres will look for the tuple we’re attempting to insert in both the committed and “dirty” (not-yet-committed) sections of the index. In this case, it will find that another in-progress transaction (
transaction a) has already inserted
Postgres handles this situation by having
transaction b wait until
transaction a completes. Every transaction holds an exclusive lock on its own transaction ID while the transaction is in progress. If you want to wait for another transaction to finish, you can attempt to acquire a lock on that transaction ID, which will be granted when that transaction finishes.
In our example, Postgres will determine the transaction ID of the other transaction (
transaction a) and
transaction b will attempt to acquire a lock on the transaction ID of
transaction b is now blocked waiting for
transaction a to finish. There are 3 possible outcomes, ordered best to worst:
1. Benign resolution
transaction a finishes promptly.
transaction b fails with the message
duplicate key v violates unique constraint "..."
2. Postgres detects deadlock
If the two transactions are each inserting multiple rows into the table,
transaction a may attempt to insert a key previously inserted by
transaction b. This causes
transaction a to attempt to acquire a lock on
transaction b. Now
b is waiting on
a is waiting on
b. Deadlock! Postgres detects this after
deadlock_timeout and one of the transactions is aborted with this somewhat confusing deadlock error:
ERROR: deadlock detected
DETAIL: Process 81941 waits for ShareLock on transaction 924071; blocked by process 81944.
Process 81944 waits for ShareLock on transaction 924069; blocked by process 81941.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (0,20) in relation "..."
3. Infinite wait, connection pileup, or other bad things ™
Here’s where things can go downhill. If
transaction a is never committed,
transaction b will remain stalled. This can happen for a few reasons, the most likely being that due to the stalled transactions, some sort of application level, outside-of-Postgres, deadlock occurs. One way this could happen is if the thread running
transaction a is actually waiting for output from the thread running
transaction b. Any future transactions that attempt to insert
v (perhaps a retry?) will join the pileup. If you’re lucky, the situation eventually resolves itself via #1 or #2 before you run out of database connections. If you aren’t lucky, your application will run out of database connections and hang forever.
While unique constraints are great for maintaining application level consistency, they need to be used with care. Besides adding an additional index that must be maintained on-disk and in memory, they can lead to lock contention within Postgres, stalled transactions, and leaked connections.
Thanks to Leah Alpert for being unlucky enough to run into this in production, and figuring out what was going on. Her research was the basis for this post.
Want to get emailed about new blog posts?
I post about once every few weeks on topics like
, language internals
, and recently, deep learning
Do you want to hire me? I’m available for engagements from 1 week to a few months. Hire me!