The original version of Sailfish, an RNA-Seq quantification tool, used minimal perfect hash functions to replace k-mers with unique integers. (The current version appears to be using a Cuckoo hashmap instead.)
This is my attempt to explain how a minimal perfect hash function could be built. The algorithm described here is not exactly the same as the one Sailfish used, but it follows the same idea.
Sections:
Sailfish and perfect hashing (1:15) Perfect hashing based on binary search or hash tables (4:34) Random hash functions (7:34) Perfect hash function based on an acyclic graph (12:16)Links:
The Sailfish paper The paper describing the perfect hashing algorithm The birthday paradox Integrative Biology & Medicine
Create your
podcast in
minutes
It is Free