Find My SEQ

Early screening for existing and novel genetic mutations is indispensable for the diagnosis and treatment of genetic diseases. This research project aims to design an algorithm capable of prioritizing deleterious over non-deleterious mutations for Alzheimer's disease (AD), based on data collected from genome-wide association studies (GWAS). One of the major inputs to this algorithm is the deoxyribonucleic acid (DNA) nucleotide sequence patterns around GWAS-significant single nucleotide polymorphisms (SNPs). Identifying motifs that occur at a higher or lower frequency in the DNA sequences surrounding GWAS-significant SNPs, compared with other GWAS-interrogated SNPs that have not reached significance in any GWAS to date, may reveal DNA motif(s) linked with AD (or with any other disease to which the approach is applied).
To attain this goal, I designed free online software that performs (1) large-scale extraction of DNA sequences from the human genome (February 2009, GRCh37/hg19 assembly from the UCSC Genome Browser), and (2) pattern or motif searching across an input set of sequences. The two programs run independently behind the “Find My SEQ” web interface, which is available on the localhost of the Scientific Cluster Computing (SCC) at the Centre for Addiction and Mental Health (CAMH), Toronto. “Find My SEQ” is coded entirely in Java and served through a PHP web interface, which allows the user to submit jobs to “SEQ Extract” (which performs DNA sequence extraction) and “Motif Analysis” (which counts the occurrences of all possible motifs of a given window size in the input sequences). The tool has aided the efficient extraction of DNA sequences from the human genome and helps illuminate patterns within regions of interest by identifying motifs with a higher or lower frequency in the input data.
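
The two steps described above can be illustrated with a minimal Java sketch. It is not the actual “SEQ Extract” or “Motif Analysis” code: the class name `MotifSketch`, the methods `extractFlank` and `countMotifs`, and the toy sequence are all hypothetical, and the sketch assumes the chromosome sequence is already held in memory as a string rather than read from the hg19 assembly. It also assumes Java 9 or later (for `List.of`).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and behaviour are assumptions,
// not the Find My SEQ implementation.
public class MotifSketch {

    /**
     * Returns the bases within `flank` positions on either side of a
     * 1-based position (for example a SNP), clamped to the sequence ends.
     */
    public static String extractFlank(String chromosome, int position, int flank) {
        if (position < 1 || position > chromosome.length()) {
            throw new IllegalArgumentException("position outside sequence: " + position);
        }
        int start = Math.max(0, position - 1 - flank);              // 0-based, inclusive
        int end = Math.min(chromosome.length(), position + flank);  // 0-based, exclusive
        return chromosome.substring(start, end);
    }

    /**
     * Slides a window of `windowSize` bases over every sequence and counts
     * each motif seen. Windows containing anything other than A, C, G or T
     * (such as N) are skipped. Motifs that never occur are not listed.
     */
    public static Map<String, Integer> countMotifs(List<String> sequences, int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be at least 1");
        }
        Map<String, Integer> counts = new TreeMap<>();  // sorted for stable output
        for (String raw : sequences) {
            String seq = raw.toUpperCase(Locale.ROOT);
            for (int i = 0; i + windowSize <= seq.length(); i++) {
                String motif = seq.substring(i, i + windowSize);
                if (motif.matches("[ACGT]+")) {
                    counts.merge(motif, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String toyChromosome = "ACGTACGTNACGT";            // made-up sequence
        String flank = extractFlank(toyChromosome, 5, 2);  // 2 bases each side of position 5
        System.out.println(flank);
        System.out.println(countMotifs(List.of(flank, "acgta"), 2));
    }
}
```

Running the same count on the sequences around GWAS-significant SNPs and on those around non-significant SNPs would give two tables whose relative motif frequencies can then be compared. To report every possible motif of a given window size, motifs absent from the table would need to be added with a count of zero.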

Project date: December 2014 - May 2015
