De Bruijn Graph Algorithm

De Bruijn Graph Algorithm

1. K-mers

k-mers: A substring with length k, where k is a postive interger.
For example:

String: ATCGAT
4-mers: ATCG 
         TCGA
          CGAT

2. De Bruijn Graph

A De Bruijn Graph is a directed graph that represents the overlap between strings. To generate a De Bruijn Graph, two inputs are required:

  • reads: a list of strings, the target of assembly.
  • k: a positive interger, it is the k-mer size used in the De Bruijn Graph

3. De Bruijn Graph Generation Example

Here is an example of the De Bruijn Graph generation, given the follow inputs.

reads = [ABCDEF, BCDEFG]
k = 4

Step 1
    Copmuter 4-mers from every read.
Step 2
    Spilt each 4-mer to a left 3-mer and a right 3-mer

Step 3
    Draw a directed edge from each left 3-mer to the corresponding right 3-mer.
Step 4
    Construct the De Bruijn Graph by merging every node that have the same 3-mer.