Bioinformatician Resume
Boston, MA
Objective: To secure a full-time position in bioinformatics.
Education
Master of Science in Bioinformatics
Recipient of Graduate Student scholarship
Master of Science in Genetics
Bachelor of Science in Microbiology, Genetics, Chemistry
Recipient of scholarship
Computer Skills
Operating Systems: Windows XP, Windows Vista, Mac, Linux, Unix
Specialities: Next-Generation Sequencing, PLINK/SEQ, PLINK, KING, ANNOVAR, dbSNP, HapMap, UCSC Genome Browser, Circos visualization tool, NCBI, PubMed, genomic data formats such as VCF, GVF, and BED, DBMS, MS Office Suite, Perl, BioPerl, Python, Bash, Java, Amazon Web Services, PostgreSQL, MySQL, HTML, XML.
Project
Pathway Analysis of P. aeruginosa in Cystic Fibrosis Patients
Worked in Prof. Kim Lewis's lab at the Antimicrobial Discovery Center at Northeastern University. Researched various data visualization tools. Used KEGG tools and ProdoNet 9.12 to obtain the list of metabolic pathways for the high-persister mutants from eight different CF patients. Drew comparisons to analyze the SNPs involved in the same metabolic subpathway within the overall pathway. Studied gene-gene interactions in ProdoNet and found that GlyA1, GcvT2, GuaB, PA0065, PurF, and PykA were key contributors to this disorder.
Work Experience
Confidential, Boston, MA
Bioinformatician Feb 2012-Present
- Plotted copy number variation (CNV) data, including ploidy type, ploidy score, and CNV scores, for four different CNV data files (data summarized on the Complete Genomics platform) using the Circos genome visualization tool.
- Developed a Perl script to convert GVF to VCF format for the individual hg19 chromosome FASTA files downloaded from the UCSC Genome Browser.
- Developed a Perl script to convert GVF to VCF format with the whole hg19 genome as the reference.
- Developed a Perl parser to split a large VCF file containing autism data for 411 individuals into one smaller VCF file per individual. Used ANNOVAR to analyze these genome files.
- Developed a Perl script to generate the flanking sequences surrounding all variant types for 17 individuals of the Utah family sequenced on three different platforms (17 CGI hg19, 17 CGI hg18, and 17 Illumina genomes).
- Developed a Perl script to count the flanking sequences that occur multiple times in the full list of 7-mers, producing output in which each line gives a 7-mer and its count in the individual. Ran this for 17 individuals on three platforms: CGI hg19, CGI hg18, and Illumina.
- Wrote a Perl script to report the top 15 (or top n) most frequent 7-mers (flanks + SNPs) for 51 genomes.
- Using Perl, applied a quality score threshold of >= 20 to the SNPs sequenced on the Illumina platform for the 17 Utah family genomes. Obtained the lists of overlapping and non-overlapping variants between the CGI hg19 and Illumina files, their flanking sequences and counts, and the top 15 most frequent polymers for the 17 individuals, with and without the quality score threshold for SNPs.
- Wrote a Perl script to generate all possible 7-mers, that is, 16384 (4^7) strings. Using Perl, compared the lists of overlapping SNPs, non-overlapping (CGI-specific) SNPs, and Illumina-specific SNPs (without the quality score threshold) against the 16384 patterns to produce three tables, each with the 16384 strings as the first column followed by 17 columns of per-individual counts. Took the top 20 most frequent 7-mers from each table to compare the two platforms.
- Developed a Perl script that counts each of the 16384 7-mers (patterns) in the human reference genome hg19.
- Developed a Python script to compare the quality score columns of the overlapping and non-overlapping variant GVF files in both directions of comparison (CGI+Illumina and Illumina+CGI).
- Developed a shell script to automate multiple command-line runs of gsearch for 17 individuals. A single run takes a CGI GVF file as input and an Illumina GVF file as reference (and vice versa) for each individual to obtain the lists of overlapping and non-overlapping variants, which is time-consuming.
- Developed a shell script to automate computing average quality scores from the GVF file column for the overlapping and non-overlapping variant lists in both directions of comparison (CGI+Illumina and Illumina+CGI).
- Automated multiple command-line runs of gsearch for 17 individuals to obtain the list of overlapping variants. Wrote a shell script to automate 11 trio analyses, using the CGI+Illumina overlapped files for the child, mother, and father in each trio, to obtain the de novo mutations of 11 children. Obtained around 68,000 de novo mutations across all 11 children.
- Wrote a shell script to separate the de novo mutations of the 11 children of the Utah family into homozygous and heterozygous variants.
- Wrote a Perl script to compare quality scores numerically between all overlapping and all non-overlapping variants.
- Applied a second method of finding de novo mutations (DNMs): called the de novo mutations of each trio on the CGI and Illumina platforms separately, then took the overlap between the CGI DNMs and the Illumina DNMs. This method reduced the number of de novo mutations to around 4,700.
- Developed a shell script to automate generating average quality scores for the total, homozygous, and heterozygous de novo files.
- Using a Perl script, filtered the de novo mutations of the 11 children for those on sex chromosomes X and Y, including X-homozygous and Y-homozygous de novo mutations. Found zero heterozygous de novo Y variants in all the males.
- Produced two plots in R. The first shows the ratio of the number of CGI+Illumina overlapping variants to the total number of CGI variants for individual 12879 of the Utah family on the Y-axis against ceiling(quality score/10) on the X-axis. The second shows the ratio of the number of Illumina+CGI overlapping variants to the total number of Illumina variants for individual 12879 on the Y-axis against the quality score range, ceiling(quality score/10), on the X-axis.
- Obtained the top 15 sequence contexts for individual 12879 for six classes: (1) CGI+Illumina overlapped all variants, (2) CGI+Illumina overlapped DNMs, (3) CGI-specific overlapped all variants, (4) CGI-specific overlapped DNMs, (5) Illumina-specific overlapped all variants, (6) Illumina-specific overlapped DNMs.
- Automated computing the transition-to-transversion ratio (Ts/Tv) for six types of variant files ((1) CGI+Illumina overlapped files, (2) unmatched CGI+Illumina files, (3) unmatched Illumina+CGI files, (4) de novo CGI+Illumina overlapped files, (5) unmatched de novo CGI+Illumina files, (6) unmatched de novo Illumina+CGI files) for all 17 members of the Utah family.
- Developed a shell script to obtain average quality scores for six types of variants ((1) CGI+Illumina overlapped all variants, (2) CGI-specific all variants, (3) Illumina-specific all variants, (4) CGI+Illumina overlapped de novo, (5) CGI-specific de novo, (6) Illumina-specific de novo) for all 17 individuals of the Utah family simultaneously.
- Developed a Perl script to find the longest poly-A, poly-T, poly-G, and poly-C homopolymers in the hg19 reference, reporting each homopolymer, its length, and the chromosome on which it is found.
- Converted the homopolymer file from text format to BED format. Using gsearch with the hg19 gene annotation file as reference and the longest homopolymers' chromosome, start, and end coordinates in BED format as input, obtained the gene annotations of all six longest homopolymers of the human genome. Found that two poly-C runs (on chromosomes 12 and 4) and the poly-G run on chromosome 20 lie in introns.
- The poly-C run at different coordinates on hg19 chromosome 4, the poly-T run on chromosome 7, and the poly-A run on chromosome 5 had no annotations and lie in intergenic regions of the genome.
- Ran tRNAscan-SE 1.3.1 locally to scan the human genome for true tRNAs. Wrote a Perl script to match the chromosome, start, and end coordinates in the tRNA file against the original CG 12877 file. Found no matches between the tRNA coordinates and individual 12877's coordinates, and no sequences falling in the tRNA genomic regions.
Confidential, Boston, MA
Bioinformatics Research Co-op Summer 2011
- Researched how severe the consequences of missense variants are in causing disease. Obtained Condel scores from the Ensembl Variation database for a list of variants.
- Performed whole-genome analysis with PLINK to estimate identity by descent and inbreeding coefficients, writing Perl scripts to support it.
- Performed GWAS using the KING tool to obtain kinship scores. Compared PLINK and KING and found KING to be more efficient.
- Wrote a Perl script to obtain the list of genes that are least probable across the different types of statistical tests performed for updating the company's tools.
- Wrote a Perl script to obtain phastCons scores for a list of variants matching those in the UCSC Genome Browser. Wrote a Perl script to get the variant sites overlapping the coordinates of miRNAs downloaded from the miRBase database.
- Wrote a Perl script to parse out the number of drug responders and non-responders with one or two variants for each gene in a sorted list of genes with different patterns of variants, for 25 responders and 25 non-responders respectively.
- Automated PubMed searches by writing a Perl script.
- Developed a Circos genome data visualization tool for cancer using JavaFX in the NetBeans IDE. Also queried the company's internal database to obtain 1000 Genomes genotypes using Perl.
