Posts

Why Writing a Linked List in (safe) Rust is So Damned Hard

Before I start this post, let me preface it by saying that I’m not an experienced Rustacean by any means. The excellent foks at /r/rust gave a lot of helpful feedback and cleared up some misconceptions I had. Futher errata and corrections are appreciated. This post is aimed at helping other fledgling rust-learners avoid my mistake. First, by helping Rust learners pick good introductory projects that will fit naturally in idiomatic rust.

Posts

Maximize Cache Performance with this One Weird Trick: An Introduction to Cache-Oblivious Data Structures

If you read my recent post about Postgres you may have noted that Postgres operates primarily with fixed-size blocks of memory called “pages.” By default, Postgres pages are 8KB. This number is tuned to match operating system page sizes which are tuned to match hardware cache sizes. If you were to run Postgres on hardware with different cache sizes than Postgres was tuned for, you may be able to pick a better page size.

Posts

Postgres Indexes Under the Hood

Many software engineers use database indexes every day, but few of us really understand how they work. In this post I’ll explain: How indexing works in Postgres using B-Trees What B-Trees are Why they are a good fit for this problem Indexes in Postgres Postgres actually offers 4 different kinds of indexes for different use cases. In this post I’ll be focusing on the “normal” index, the kind you get by default when you run create index.

Posts

My Favorite Algorithm: Linear Time Median Finding

Finding the median in a list seems like a trivial problem, but doing so in linear time turns out to be tricky. In this post I’m going to walk through one of my favorite algorithms, the median-of-medians approach to find the median of a list in deterministic linear time. Although proving that this algorithm runs in linear time is a bit tricky, this post is targeted at readers with only a basic level of algorithmic analysis.

Posts

An Analysis of Hash Map Implementations in Popular Languages

Few data-structures are more ubiquitous in real-world development than the hash table. Nearly every major programming features an implementation in its standard library or built into the runtime. Yet, there is no conclusive best strategy to implement one and the major programming languages diverge widely in their implementations! I did a survey of the Hash map implementations in Go, Python, Ruby, Java, C#, C++, and Scala to compare and contrast how they were implemented.

Posts

No Magic: Regular Expressions, Part 3

The code for this post, as well as the post itself, are on github. This post is part 3 of a 3 part series. Part 1: Parsing Part 2: Generate an NFA Part 3: Evaluate an NFA Evaluating the NFA NFAs, DFAs and Regular Expressions Recall from part 2 that there are two types of finite automata: deterministic and non-deterministic. They have one key difference: A non-deterministic finite automata can have multiple paths out of the same node for the same token as well as paths that can be pursued without consuming input.

Posts

No Magic: Regular Expressions, Part 2

The code for this post, as well as the post itself, are on github. This post is part 2 of a 3 part series. Part 1: Parsing Part 2: Generate an NFA Part 3: Evaluate an NFA Converting the Parse Tree to an NFA In the last post, we transformed the flat string representation of a regular expression to the hierarchical parse tree form. In this post we’ll transform the parse tree to a state machine.

Posts

No Magic: Regular Expressions

The code for this post, as well as the post itself, are on github. This post is part 1 of a 3 part series. Part 1: Parsing Part 2: Generate an NFA Part 3: Evaluate an NFA Until recently, regular expressions seemed magical to me. I never understood how you could determine if a string matched a given regular expression. Now I know! Here’s how I implemented a basic regular expression engine in under 200 lines of code.