As shown in Additional file 5b, this relationship appeared to follow two distributions: a linear correlation, represented by the line in the figure, and a separate group with no more than ten clusters that retain more than 50% of the total sequence reads.