iRO-PsekGCC: identify DNA replication origins based on Pseudo k-tuple GC Composition
The benchmark dataset for Saccharomyces cerevisiae. It contains 340 replication origins (positive samples) and 342 non-replication origins (negative samples). No sequence has ≥ 80% pairwise sequence identity with any other sequence in the same subset.
The benchmark dataset for Pichia pastoris. It contains 305 replication origins (positive samples) and 302 non-replication origins (negative samples). No sequence has ≥ 80% pairwise sequence identity with any other sequence in the same subset. See Eq. 1 for further explanation.
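The 80% redundancy cutoff applied to both datasets can be illustrated with a minimal sketch. The source does not say how pairwise identity was computed or which tool was used (dedicated programs such as CD-HIT are common for this step), so the ungapped, position-by-position identity below is an assumption made only for illustration. The function names `identity` and `filter_redundant` are hypothetical, as are the toy sequences.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions, divided by the longer length.

    Assumption: ungapped, position-by-position comparison. A real
    pipeline would normally align the sequences first.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def filter_redundant(seqs: List[str], threshold: float = 0.80) -> List[str]:
    """Greedily keep sequences whose identity to every kept one is < threshold."""
    kept: List[str] = []
    for s in seqs:
        if all(identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


# Toy sequences, not taken from the benchmark datasets.
toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
non_redundant = filter_redundant(toy)
print(non_redundant)  # ['ACGTACGTAC', 'TTTTTTTTTT']
```

The second toy sequence matches the first at 9 of 10 positions (90%), so it is dropped. The filter would be run separately on the positive and the negative subset, since the criterion is stated per subset.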