A simple, space-efficient algorithm for detecting start and stop codons in DNA
Schoeller, Scott J.
University of Wisconsin - Whitewater
MetadataShow full item record
The brute force algorithm for string matching has a worst-case execution complexity of O(MN), where N is the length of the entire string and M is the length of the substring to be matched. The efficiency of brute-force matching is insufficient for biological applications due to the sheer size of bioinformatics datasets. This thesis describes an algorithm that uses bit compression techniques to represent multiple codons in a single Python integer. It was proven that with exact matching steps, the search algorithm is as accurate as brute-force matching with significantly higher efficiency, as this algorithm runs in O(N) time and space, with minimal memory overhead. While the bit-based algorithm is not more space-efficient in a theoretical sense, it is more efficient practically as we have observed significant memory savings compared to the size of the original string.
This file was last viewed in Adobe Acrobat Pro.