Senior/Principal Data Scientist, Team Lead Resume
SUMMARY:
- Over 15 years of hands-on experience and leadership in bioinformatics, genomics, big data management, analysis pipelines, interpretation and application development in academia
- Adaptable in a hybrid environment; able to focus and set priorities while maintaining flexibility in changing environments; proficient in implementing appropriate project management methodologies; dynamic, detail-oriented team player with strong interpersonal, communication, planning and execution skills
COMPUTATIONAL SKILLS (BIOINFORMATICS):
Big Data Analysis: Association Studies, Population Stratification, Cohort Analyses; GWAS and Meta-Analyses, Natural Language Processing, Data Mining, Information Extraction, Ontology, Semantic Technology
Next Generation Sequencing: BamTools, BamUtil, BCFTools, BedTools, BFast, Bowtie, BWA, Casava, CGATools, CoNIFER, Cufflinks, DeFuse, FastQC, FusionSeq, FusionMap, Galaxy, GATK, IGV, MACS, Megan, miRanda, MuTect, MutSig, MutSigCV, Phred, Picard, Pindel, SAMtools, TopHat, VarScan, VarSifter, VCFTools
Databases: gatekeeper for 9 datasets hosted by NCBI linked to publications; 1000 Genomes, COSMIC, UCSC, Oncomine, TCGA, ICGC, CaSNP, dbSNP, dbVar, ClinVar, dbGaP, ExAC…
Sequence Analysis: Annovar, BLAST, ClustalW, EMBOSS, FASTA, PolyPhen-2, SIFT, Torrent Variant Caller…
Linkage and Phylogenetic Analysis: Madeline, BEAST, Merlin, LAMP, MACH, PLINK, METAL…
Microarray Analysis: Affymetrix Power Tools, Matlab, BRB-ArrayTools, Bioconductor and R
Systems Biology, Network and Pathway Analysis: Ingenuity Pathway Analysis, Pathway Studio, GeneGO, Cytoscape and its plugins, Spotfire, GOrilla, DAVID, Reactome, KEGG…
Off-the-shelf software: IPA, GeneGO, Pathway Studio, Partek, Genespring, Genomatix, CLCBio…
COMPUTATIONAL SKILLS (PROGRAMMING AND DATA SCIENCE):
Programming: R & Bioconductor, Python, Perl, Matlab, Fortran, C/C++
Relational databases: MySQL, PostgreSQL
Machine Learning and Deep Learning: Probabilistic Graphical Models, Clustering and Classification, Neural Networks, Hidden Markov Models; Message Passing Algorithms, Belief Propagation (PhD Thesis)
Agile Methodology: configured and developed processes, standards and procedures using JIRA and software development methodologies including Agile and Scrum
PROFESSIONAL PROFILE:
Senior/Principal Data Scientist, Team Lead
Confidential
Responsibilities:
- Team performance management and code/project review using GitHub and JIRA; generate documentation and participate in requirements, design and traceability reviews; manage one direct report
- Data analytics, complex statistical score modeling and analysis projects with large and complex data structures
- Analysis, design, development, testing, troubleshooting and documentation of complex data systems using predictive models, data-driven analysis and machine learning and deep learning algorithms
Senior Manager R&D Bioinformatics
Confidential, Carlsbad, CA
Responsibilities:
- Reported on delivery status and progress to the GM/VP and key stakeholders throughout the course of the projects; worked with key stakeholders to prioritize bioinformatics services to best meet customer needs
- Worked with other R&D teams to build leading-edge genomics diagnostic products and web-based software to support internal process automation; built strong relationships with departmental leaders throughout the company and contributed to leadership decisions
- Mentored and managed a group of five scientists, onsite and remotely; developed staffing plans and oversaw recruitment as needed
- Identified the costs and benefits of third-party software tools versus in-house implementations; integrated commonly used open source bioinformatics applications with off-the-shelf tools for variant annotation and interpretation
- Oversaw parts of the execution of bioinformatics and IT infrastructure for national-scale projects including the Saudi Genome Project, Kuwaiti Genome Project, Stratified Medicine Scotland and Million Veteran Program
- Implemented and supported gene-panel, exome, and whole genome re-sequencing pipelines; supported bioinformatics analysis of mapping, detection, annotation and interpretation of variants
- Implemented parts of population-scale genomic medicine projects, performing large-scale association studies to identify disease-causing genes and genetic risk factors; worked with clinicians and software developers in designing, analyzing and preparing customized clinical reports
Senior Bioinformatics Scientist
Confidential, Rockville, MD
Responsibilities:
- Provided scientific and technical leadership to an offshore development team of four software developers and one biologist using Scrum/Agile software methodology
- Performed de novo analysis of the jute genome; developed a new database of predicted orthologs and metabolic pathways (R)
- Developed a standalone module for Variant Calling analysis including easy-to-use customizable workflows for identifying variants from various public databases
- Developed an automated Genomic Data Repository pipeline giving Pathway Studio users access to freely available microarray data, with clinical and pathway information, for disease studies (R/Bioconductor)
Senior Bioinformatics Fellow
Confidential, Bethesda, MD
Responsibilities:
- Served as a liaison between Confidential labs and the Information Technology division to provide IT solutions in accordance with Confidential regulations; designated as essential personnel during potential furloughs
- Participated in recruiting students and postdocs; mentored post-baccalaureate students; trained and taught graduate students and postdocs in analyzing and interpreting various genomic data
- Coordinated teams of postdocs, bench scientists and post-baccalaureate students to ensure solid bioinformatics results were delivered in a timely manner
- Reorganized data handling and acquisition for a lab of 50+ people, making it possible to share the entire collection of genomics data (including output from terabyte-scale data generators like the Illumina GA IIx sequencer) within the lab and with other members of the scientific community; administrator for in-house databases
- Initiated collaborations with investigators across the Confidential campus and universities, resulting in more than 30 projects that led to 10 publications in top peer-reviewed journals
- Set up pipelines and performed statistical analysis for microarray and Next Generation Sequencing data from various platforms including Illumina, 454, Ion Torrent, Affymetrix and Agilent
- Data Analysis: RNA-seq, ChIP-seq, DNA-seq, Exome-seq, miRNA-seq, using open source (Perl, Python, R/Bioconductor, Helix/Biowulf cluster) and off-the-shelf tools (Partek, Genespring, CLCBio, Genomatix)
- Active hands-on member of High Performance Computing (Helix Systems) and the Confidential Biowulf cluster; the first bioinformatics fellow at Confidential who had administrative privileges on Confidential computers, Linux installed on Confidential computers and a Biowulf/Helix account
Postdoc Bioinformatics
Confidential, Ann Arbor, MI
Responsibilities:
- Determined patterns between genes using flow-sorted microarray data from retina by reducing the complex network of gene interactions to a topological model using network flow algorithms (Python, Matlab, C++)
- Worked on gene co-expression network methods to explore the systems-level functionality of genes; key disease genes were identified and validated (Matlab, R/Bioconductor)
Research Assistant
Confidential, East Lansing, MI
Responsibilities:
- Machine learning: (i) developed message-passing algorithms for approximate inference based on passing local messages; (ii) worked on probabilistic graphical models of SAT on a tree (C++, Perl and Python)
- Deep Learning: theoretical calculations for Standard Belief Propagation for graphical models with loops; applications for Pairwise Markov Random Fields
- Combinatorial optimization algorithms and their applications to disordered systems; developed novel algorithms for approximately solving NP-complete problems like Satisfiability (SAT) and Coloring
- Numerical simulations on statistical models with and without quenched disorder using the mean field approximation (C++, Fortran); Molecular Dynamics, Monte Carlo, Simulated Annealing (Fortran)
- Systems approach for genomic data: worked on weighted gene network and pathway reconstruction using minimum spanning tree and network flow methods like MinCut-MaxFlow algorithms (Matlab, Python)
Staff Scientist
Confidential
Responsibilities:
- Developed a database for the Institute’s library; performed numerical calculations (Fortran77) and crystal growth simulations; taught introductory programming classes to physics majors
