Posts
Postgres Indexes Under the Hood
Many software engineers use database indexes every day, but few of us really understand how they work. In this post I’ll explain:
How indexing works in Postgres using B-Trees What B-Trees are Why they are a good fit for this problem Indexes in Postgres Postgres actually offers 4 different kinds of indexes for different use cases. In this post I’ll be focusing on the “normal” index, the kind you get by default when you run create index.
Posts
PSA: Python Float Overflow
Python 2 and Python 3 provide arbitrary precision integers. This means that you don’t need to worry about overflowing or underflowing the integer datatypes, ever. 2**10000? No problem. It makes it pretty convenient to write code. Floats, however, are not arbitrary precision:
>>> a = 2 ** 10000 >>> a * .1 Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: int too large to convert to float Luckily, float overflow throws an exception to alert you to the problem.
Posts
My Favorite Algorithm: Linear Time Median Finding
Finding the median in a list seems like a trivial problem, but doing so in linear time turns out to be tricky. In this post I’m going to walk through one of my favorite algorithms, the median-of-medians approach to find the median of a list in deterministic linear time. Although proving that this algorithm runs in linear time is a bit tricky, this post is targeted at readers with only a basic level of algorithmic analysis.
Posts
Resource Pools and Deadlock
In the last 2 months, at two separate clients, I ran into the same type of deadlock, so I figured it was worth writing about to save someone else from the same fate.
In short: When you have a pool of resources, the same thread must not acquire a second resource while already holding a resource. Otherwise, multiple threads can deadlock when they have all acquired their first resource but none can acquire their second resource.
Posts
Demystifying Two Factor Auth
I always wondered how Google Authenticator style 2-factor codes worked. The process of going from QR code to rotating 6-digit pin seemed a bit magical. A few days ago, my curiosity found itself coupled with some free time. Here’s what I found:
What’s in the QR Code I scanned the QR code from Github with a barcode scanning app. Here’s what’s inside:
otpauth://totp/Github:rcoh?secret=onswg4tforrw6zdf&issuer=Github Not too surprising. It tells us the protocol, TOTP, who is issuing this OTP code (Github), and most importantly the secret:1
Posts
An Analysis of Hash Map Implementations in Popular Languages
Few data-structures are more ubiquitous in real-world development than the hash table. Nearly every major programming features an implementation in its standard library or built into the runtime. Yet, there is no conclusive best strategy to implement one and the major programming languages diverge widely in their implementations! I did a survey of the Hash map implementations in Go, Python, Ruby, Java, C#, C++, and Scala to compare and contrast how they were implemented.
Posts
Trip Report: The West Ridge Mount Conness
Even if you go to bed at 9, getting up at 5AM is hard. It’s always darker than you expect and it’s usually freezing. In any case, Calley and I crawled out of our respective vans at 5AM in the Tuolumne Meadows campground, shoveled down some cold cereal, and rolled out for the day’s objective, the West Ridge of Mount Conness. TM Herbert, legendary Tuolumne badass, described the climb as “2 Cathedrals stacked on top of each other.
Posts
How Postgres Unique Constraints Can Cause Deadlock
A recent outage lead me to investigate Postgres unique constraints more deeply. Postgres implements unique constraints by creating a unique index – an index that can only contain unique values.1 It turns out that unique indices and concurrent transactions can interact in nasty and surprising ways. Before I get into the “why”, here are the implications:
When two transactions insert the same value into a unique index, one transaction will wait for the other transaction to finish before proceeding.
Posts
Notes on CPython List Internals
As I was learning to program, Python lists seemed totally magical to me. I imagined them as being implemented by some sort of magical datastructure that was part linked-list, part array that was perfect for everything.
As I grew as an engineer, it occurred that this was unlikely. I guessed (correctly) that rather than some sort of magical implementation, it was just backed by a resizable array. I decided to read the code and find out.
Posts
No Magic: Regular Expressions, Part 3
The code for this post, as well as the post itself, are on github.
This post is part 3 of a 3 part series.
Part 1: Parsing Part 2: Generate an NFA Part 3: Evaluate an NFA
Evaluating the NFA NFAs, DFAs and Regular Expressions Recall from part 2 that there are two types of finite automata: deterministic and non-deterministic. They have one key difference: A non-deterministic finite automata can have multiple paths out of the same node for the same token as well as paths that can be pursued without consuming input.