Objective: To secure a full-time position in bioinformatics.
Master of Science in Bioinformatics
Operating Systems: Windows XP, Windows Vista, Mac, Linux, Unix
Specialities: Next-Generation Sequencing, PLINKSEQ, PLINK, KING, Annovar, dbSNP, HapMap, UCSC Genome Browser, Circos visualization tool, NCBI, PubMed, and genomic data formats such as VCF, GVF, and BED.
DBMS, MS Office Suite, Perl, BioPerl, Python, Bash, Java, PostgreSQL, MySQL, HTML, XML, R
- Plotted copy number variation (CNV) data, including ploidy type, ploidy score, and CNV score, for four CNV data files (data summarized on the Complete Genomics platform) using the Circos genome visualization tool.
- Developed a Perl converter from GVF to VCF format that works against the individual hg19 chromosome FASTA files downloaded from the UCSC Genome Browser.
- Developed a Perl converter from GVF to VCF format that uses the whole hg19 genome as the reference.
- Developed a Perl parser to split a large VCF file containing autism data for 411 individuals into smaller VCF files, one per individual. Used Annovar to analyze these genome files.
- Developed a Perl script to generate the flanking sequences surrounding all variant types for 17 individuals of a Utah family, sequenced on three platforms: 17 CGI hg19, 17 CGI hg18, and 17 Illumina genomes.
- Developed a Perl script to count the flanking sequences that occur multiple times in the full list of 7-mers, producing output in which each line gives a 7-mer and its count in the individual. Ran this for 17 individuals on the three platforms: CGI hg19, CGI hg18, and Illumina.
- Wrote a Perl script to report the top 15 (or top n) most frequent 7-mers (flanks + SNPs) for 51 genomes.
- Using Perl, applied a quality score threshold of >= 20 to the SNPs sequenced on the Illumina platform for the 17 Utah family genomes. Obtained the lists of overlapping and non-overlapping variants between the CGI hg19 and Illumina files, their flanking sequences, the flank counts, and the top 15 most frequent polymers for the 17 individuals, with and without the quality score threshold on SNPs.
- Wrote a Perl script to generate all possible 7-mers over the four nucleotides, that is, 16384 strings. Compared the lists of overlapping SNPs, non-overlapping (CGI-specific) SNPs, and Illumina-specific SNPs (without the quality score threshold) against these 16384 patterns to obtain three tables, each with the 16384 strings in the first column followed by 17 columns of per-individual counts. Extracted the top 20 most frequent 7-mers from each table to compare the two platforms.
- Developed a Perl script that counts the occurrences of each of the 16384 7-mers (patterns) in the human reference genome hg19.
- Developed a Python script to compare the quality score columns of the overlapping and non-overlapping variant GVF files for both directions of comparison (CGI+Illumina and Illumina+CGI).
- Developed a shell script to automate multiple command-line runs of gsearch for 17 individuals. A single run takes a CGI GVF file as input and an Illumina GVF file as reference (and vice versa) for each individual to obtain the lists of overlapping and non-overlapping variants, which is time-consuming to do by hand.
- Developed a shell script to automate repeated computation of average quality scores from the GVF quality score column for the overlapping and non-overlapping variant lists in both directions of comparison (CGI+Illumina and Illumina+CGI).
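The 7-mer enumeration and counting described in the bullets above can be illustrated with a minimal Python sketch. The original scripts were written in Perl and are not reproduced here. The function names (`all_kmers`, `count_kmers`, `count_table`, `top_n`) and the input layout (a dictionary mapping an individual's label to a list of sequences) are assumptions made only for this illustration.

```python
from collections import Counter
from itertools import product

K = 7
ALPHABET = "ACGT"


def all_kmers(k=K):
    """Return every possible k-mer over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def count_kmers(sequences, k=K):
    """Count k-mers in the given sequences with a sliding window.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped. A sequence of exactly k bases contributes one k-mer.
    """
    counts = Counter()
    valid = set(ALPHABET)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if set(window) <= valid:
                counts[window] += 1
    return counts


def count_table(per_individual, k=K):
    """Build one row per possible k-mer: (kmer, count_ind1, count_ind2, ...)."""
    names = list(per_individual)
    counters = [count_kmers(per_individual[name], k) for name in names]
    rows = [(kmer, *[c[kmer] for c in counters]) for kmer in all_kmers(k)]
    return names, rows


def top_n(counts, n=15):
    """Return the n most frequent k-mers as (kmer, count) pairs."""
    return counts.most_common(n)
```

With four nucleotides and k = 7, `all_kmers()` yields 4^7 = 16384 strings, which matches the pattern count quoted above. `count_table` gives one count column per individual, and `top_n` corresponds to the top 15 or top n report.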
Confidential, Boston, MA
Bioinformatics Research Co-op Summer 2011
- Researched how severe the consequences of missense variations are in causing disease. Obtained Condel scores from the Ensembl Variation database for a list of variants.
- Performed whole-genome analysis with PLINK for identity by descent and inbreeding coefficients, writing supporting Perl scripts.
- Performed GWAS using the KING tool to obtain kinship scores. Compared the PLINK and KING tools and found KING to be more efficient.
- Wrote a Perl script to obtain the list of genes that are least probable in the different types of statistical tests performed to update the company’s tools.
- Wrote a Perl script to obtain PhastCons scores for a list of variants matching those in the UCSC Genome Browser. Wrote a Perl script to find the variant sites overlapping the allele coordinates of miRNAs downloaded from the miRBase database.
- Wrote a Perl script to count the drug responders and non-responders with one or two variants in each gene, from a sorted list of genes with different variant patterns for 25 responders and 25 non-responders.
- Automated PubMed searches by writing a Perl script.
- Developed a Circos genome data visualization tool for cancer using JavaFX in the NetBeans IDE. Also queried the company’s internal database with Perl to obtain 1000 Genomes genotypes.