# Patricia

Patricia is a kind of prefix tree, enhanced based on Trie. In a trie, every bit(integer trie) or character(alphabetic trie) occupies a tree node. This behaviour may be unefficient when lots of node only have one child, as shown in figure 1. Figure 1: integer trie(little-endian) Figure 2: integer patricia(big-endian)

To make trie more compact, we can merge nodes where a parent node only has one child, as shown in figure 2. In Patricia, we have two kinds of nodes:

• Leaf: contains information like key and value.
• Branch node: contains information like the common prefix of its children.

We will use Integer Patricia to demonstrate how to construct a patricia.

## Integer Patricia

Integer patricia is a kind of patricia, whose keys are integers. Similar to integer trie, integer keys are represented using binaries. For example, 2 is represented as `10`, 5 is represented as `101`.

### Insertion

To insert key value pair in patricia, we need to deal with several cases. Suppose we need to insert key value pairs: (4, 4), (6,6), (5,5), (1,1) sequently.

#### case 1

If the root of patricia is empty, we just initiate a new node with key and value as root node, which is (4,4), as shown in figure 3. Note that the binary representation of 4 is `100`. Figure 3: insert key 4 when the root is empty

#### case 2

When insert key 6, we find that key 4 (binary: 100) and key 6 (binary: 110) has a common prefix `100`, so we create a new branch node with prefix 100 and mask 100, as shown in figure 4.

• prefix: the longest common bits of two keys.
• mask: denotes which bit begin to be different of two keys. It's value is exponential of 2, for example: 10, 100, 1000. All bits lower than mask do not belong to common prefix.

Below is python program that calculate prefix and mask.

```def create_prefix_node(self, key1, key2):
xor = key1 ^ key2
while xor &gt; 0:
xor = xor &gt;&gt; 1
prefix = key1 &amp; (~(mask - 1))

```

To calculate the prefix and mask of 100 and 110, we first perform exclusive or between 100 and 110, which is 10. We can think the highest bit of `xor` denotes the bit where `key1` begin to be different from `key2`. Next, we use the `xor` to calculate `mask`, which is 100. Note that mask always has one more bit than `xor`. Last, we use `mask` and `key1`(100/110) to calculate prefix. The key concept of `key1 &amp; (~(mask - 1))` is that we try to keep bits which are higher than or equal to the highest bit of mask, setting others to 0.

The mask 100 means that key 100 and 110 begin to be different at the second bit position. The second bit of key 100 is 0, whereas 1 for key 110. So the key 100 is branched out to left, key 110 is branched out to right. We can write this mask logic in python:

```def match_left(self, mask, key):
return key &amp; (mask &gt;&gt; 1) == 0
``` Figure 4: insert key 6, generate a new branch node

#### case 3

When inserting key 5, we first need to determine whether 5 matches the prefix 100 with mask 100. The matching logic is the same as the prefix generating logic, which can be expressed in python:

```def match_prefix(self, prefix, mask, key):
return key &amp; (~(mask - 1)) == prefix
```

Since the binary of 5 is `101`, it match the prefix 100 with mask 100.

Next, we need to decide which branch(0 or 1) should we branch key 5. We can use the `match_left` function above, which will return True. So 5 will be branched to left.

Finally, we need to calculate the common prefix of 4 and 5, creating a new branch node with prefix 100 and mask 10. Figure 5: insert key 5

#### case 4

The last case is inserting key 1. We first need to check whether 1 match the prefix 100 with mask 100. Obviously, 1 do not match prefix 100, so we need to recalculate a new prefix for key 1 and prefix 100, which yields 0 with mask 1000, and then create a new branch node using the new prefix and mask, as shown in figure 6. figure 6: insert key 1

To search a key in patrica, we begin at root node, and there are 2 cases.

#### case 1

If current node is a leaf, we just compare the key in leaf with search key. If they match, we return the value stored in leaf. Otherwise, we terminate the search process.

#### case 2

If current node is branch node, we need to determine whether the search key match the prefix of branch node. If match, we pass the search key to child of branch node and continue to search its child(left or right child, determined by `match_left` method). Otherwise, we terminate the search process.

The search process can be expressed in python:

```def search(self, key):
if not self.root:
return None

current = self.root
while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key):
current = current.left if self.match_left(current.mask, key) else current.right

if self.is_leaf(current) and current.key == key:
return current.value

return None
```

### Full Python Implementation

```class IntegerPatriciaNode(object):

def __init__(self, key=None, value=None, left=None, right=None, prefix=None, mask=None):
self.key = key
self.value = value
self.left = left
self.right = right
self.prefix = prefix
if self.prefix is not None:
self.is_leaf = False
else:
self.is_leaf = True

def __unicode__(self):
return u'key:%s, prefix:%s, mask:%s' % (self.key, self.prefix, self.mask)

class IntegerPatricia(object):

def __init__(self):
self.root = None

def is_leaf(self, node):
return node.is_leaf

def match_prefix(self, prefix, mask, key):
return key &amp; (~(mask - 1)) == prefix

def match_left(self, mask, key):
return key &amp; (mask &gt;&gt; 1) == 0

def create_prefix_node(self, key1, key2):
xor = key1 ^ key2
while xor &gt; 0:
xor = xor &gt;&gt; 1
prefix = key1 &amp; (~(mask - 1))

def insert(self, key, value):
assert isinstance(key, int)
if not self.root:
self.root = IntegerPatriciaNode(key, value)
return

parent = None
current = self.root
while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key):
parent = current
current = current.left if self.match_left(current.mask, key) else current.right

if self.is_leaf(current) and current.key == key:
current.value = value
return

tmp_key = current.key if self.is_leaf(current) else current.prefix
prefix_node, mask = self.create_prefix_node(tmp_key, key)
prefix_node.left = IntegerPatriciaNode(key, value)
prefix_node.right = current
else:
prefix_node.right = IntegerPatriciaNode(key, value)
prefix_node.left = current

# update root node
if not parent:
self.root = prefix_node
else:
# relink to parent
if parent.left == current:
parent.left = prefix_node
else:
parent.right = prefix_node

def search(self, key):
if not self.root:
return None

current = self.root
while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key):
current = current.left if self.match_left(current.mask, key) else current.right

if self.is_leaf(current) and current.key == key:
return current.value

return None
```
Current rating: 5