
Data Scientist Resume

San Francisco, CA


  • 8+ years of experience in Data Mining, Data Modeling, Machine Learning, Statistics, Big Data technologies, Data Warehousing, Data Analysis and testing of business application systems, and in developing conceptual and logical models and physical database designs for Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) systems.
  • Experienced working with data modeling tools like Erwin, Power Designer and ER Studio.
  • Experienced in designing star and snowflake schemas for data warehouses and ODS architectures.
  • Experienced in data modeling and data analysis using dimensional and relational data modeling, star/snowflake schema modeling, fact and dimension tables, and physical and logical data modeling.
  • Experienced in big data analysis and in developing data models using Hive, Pig, MapReduce and SQL, with strong data architecture skills in designing data-centric solutions.
  • Experienced in data profiling and analysis, following and applying appropriate database standards and processes in the definition and design of enterprise business data hierarchies.
  • Hands-on experience with big data tools such as Hadoop, Spark, Hive, Pig, Impala, PySpark and Spark SQL.
  • Very good knowledge of and experience with AWS, including Redshift, S3 and EMR.
  • Proficient in data analysis, mapping source and target systems for data migration efforts, and resolving issues related to data migration.
  • Excellent development experience in SQL and the procedural languages (PL) of databases such as Oracle, Teradata, Netezza and DB2.
  • Very good knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake and AWS Redshift.
  • Experienced in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, Data Validation in ETL
  • Experienced in creating and documenting Metadata for OLTP and OLAP when designing a system.
  • Expertise in managing the entire data science project life cycle, with active involvement in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization.
  • Excellent knowledge of Ralph Kimball's and Bill Inmon's approaches to data warehousing.
  • Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
  • Extensive experience working with structured data using HiveQL, including join operations and custom UDFs, and in optimizing Hive queries.
  • Extensive experience in development of T-SQL, DTS, OLAP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
  • Experience using various R and Python packages such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, Confidential, NLP, reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup and rpy2.
  • Experienced using query tools like SQL Developer, PLSQL Developer, and Teradata SQL Assistant.
  • Excellent at transferring data between SAS and various databases and file formats such as XLS, CSV, DBF and MDB.
  • Excellent knowledge of machine learning, mathematical modeling and operations research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the big data ecosystem.
  • Expertise in designing complex mappings, in performance tuning, and in slowly changing dimension tables and fact tables.
  • Extensively worked with the Teradata utilities BTEQ, FastExport and MultiLoad to export and load data to and from different source systems, including flat files.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in random forests, decision trees, linear and logistic regression, SVM, clustering, neural networks and Principal Component Analysis.
  • Expertise in extracting, transforming and loading data between homogeneous and heterogeneous systems such as SQL Server, Oracle, DB2, MS Access, Excel and flat files using SSIS packages.
  • Proficient in system analysis, ER/dimensional data modeling, database design and implementing RDBMS-specific features.
  • Experience in UNIX shell scripting, Perl scripting and automation of ETL Processes.
  • Extensively used ETL with Informatica PowerCenter/PowerExchange to load data from source systems such as flat files and Excel files into staging tables and then into the target Oracle database. Analyzed the existing systems and conducted a feasibility study.
  • Strong experience in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
  • Excellent understanding of and working experience with industry-standard methodologies such as the System Development Life Cycle (SDLC), the Rational Unified Process (RUP) and Agile.
  • Experience in source system analysis and data extraction from various sources such as flat files, Oracle 12c/11g/10g/9i, IBM DB2 UDB and XML files.
  • Proficient in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server and Oracle.
  • Experienced in developing entity-relationship diagrams and modeling transactional databases and data warehouses using tools such as Erwin, ER/Studio and Power Designer, including both forward and reverse engineering in Erwin.
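The K-fold cross-validation listed above can be illustrated with a minimal, dependency-free Python sketch. It is not taken from any project described here; the helper name k_fold_indices is hypothetical, and in practice a library class such as scikit-learn's KFold would normally handle the splitting (including shuffling).

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical helper: samples are split into k contiguous folds whose
    sizes differ by at most one, and each fold is the validation set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size
```

A model would be trained on each train split and scored on the matching validation split, and the k scores averaged; shuffling the rows before splitting is usually advisable.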


Data Analytics Tools/Programming: Python (numpy, scipy, pandas, Gensim, Keras), R (Caret, Weka, ggplot), MATLAB, Microsoft SQL Server, Oracle PL/SQL.

Analysis & Modeling Tools: Erwin, Sybase Power Designer, Oracle Designer, Rational Rose, ER/Studio, TOAD, MS Visio, SAS.

Data Visualization: Tableau, Visualization packages, Microsoft Excel.

Big Data Tools: Hadoop, MapReduce, SQOOP, Pig, Hive, NOSQL, Cassandra, MongoDB, Spark, Scala.

ETL Tools: Informatica PowerCenter, DataStage 7.5, Ab Initio, Talend.

OLAP Tools: MS SQL Analysis Manager, DB2 OLAP, Cognos PowerPlay.

Languages: SQL, PL/SQL, T-SQL, XML, HTML, UNIX Shell Scripting, C, C++, AWK, JavaScript.

Databases: Oracle 12c/11g/10g/9i/8i/8.0/7.x, Teradata 14.0, DB2 UDB 8.1, MS SQL Server 2008/2005, Netezza 4.0, Sybase ASE 12.5.3/15, Informix 9, AWS RDS.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant.

Methodologies: Ralph Kimball, COBOL.

Reporting Tools: Business Objects XI R2/6.5/5.0/5.1, Cognos Impromptu 7.0/6.0/5.0, Informatica Analytics Delivery Platform, MicroStrategy, SSRS, Tableau.

Tools: MS-Office suite (Word, Excel, MS Project and Outlook), VSS.

Programming Languages: SQL, T-SQL, Base SAS and SAS/SQL, HTML, XML.

Operating Systems: Windows 2007/8, UNIX (Sun-Solaris, HP-UX), Windows NT/XP/Vista, MSDOS.


Confidential, San Francisco, CA

Data Scientist


  • Design, develop and implement a comprehensive data warehouse solution to extract, clean, transform, load and manage the quality and accuracy of data from various sources into the Enterprise Data Warehouse (EDW).
  • Architect frameworks for data warehouse solutions that bring data from source systems into the EDW, and provide data mart solutions for order/sales operations, Salesforce activity, inventory tracking, and in-depth data mining and analysis for market projection.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications, and executed machine learning use cases with Spark ML and MLlib.
  • Developed and configured the Informatica MDM Hub supporting the Master Data Management (MDM), Business Intelligence (BI) and Data Warehousing platforms to meet business needs.
  • Transformed staging area data into a star schema (hosted on Amazon Redshift), which was then used to develop embedded Tableau dashboards.
  • Wrote SQL across several dialects, including MySQL, PostgreSQL, Redshift, Teradata and Oracle.
  • Responsible for full data loads from production to the AWS Redshift staging environment, and worked on migrating the EDW to AWS using EMR and other technologies.
  • Applied various machine learning algorithms and statistical models, such as decision trees, regression models, neural networks, SVM and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
  • Worked on data preprocessing and cleaning for feature engineering, and applied data imputation techniques to the missing values in the dataset using Python.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Built analytical data pipelines to move data into and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
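The missing-value imputation mentioned above can be sketched as column-mean imputation in plain Python. The resume does not say which imputation technique was used, so mean imputation is an assumption here, and impute_column_means is a hypothetical helper rather than code from this role.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def impute_column_means(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return copies of `rows` with missing (None) values replaced by column means.

    Mean imputation is only one option; median or model-based imputation
    may suit skewed data better. The input rows are not modified.
    """
    means: Dict[str, Optional[float]] = {}
    for col in columns:
        observed = [r[col] for r in rows if r.get(col) is not None]
        # A column with no observed values keeps None (nothing to average).
        means[col] = mean(observed) if observed else None
    imputed = []
    for r in rows:
        new_row = dict(r)
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = means[col]
        imputed.append(new_row)
    return imputed
```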

Environment: Erwin 9.6.4, Oracle 12c, Python, PySpark, Spark, Spark MLlib, Tableau, ODS, PL/SQL, OLAP, OLTP, MDM, Teradata 15, Hadoop, Cassandra, SAP, MS Excel, Flat files, Informatica, SSIS, SSRS.

Confidential - San Francisco, CA

Data Scientist


  • Created new data designs and ensured they fit within the overall Enterprise BI Architecture, and built relationships and trust with key stakeholders to support program delivery and adoption of the enterprise architecture.
  • Used R and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forests, decision trees and support vector machines for estimating risks.
  • Used R for exploratory data analysis, A/B testing, ANOVA and hypothesis testing to compare and identify the effectiveness of creative campaigns.
  • Developed and maintained data models, data dictionaries, data maps and other artifacts across the organization, including the conceptual and physical models and the metadata repository.
  • Performed extensive data validation and verification against the data warehouse, and debugged SQL statements and stored procedures for business scenarios.
  • Worked on a MapR Hadoop platform to implement big data solutions using Hive, MapReduce, shell scripting and Pig.
  • Worked with cloud-based technologies such as Redshift, S3 and EC2 on AWS, and extracted data from Oracle Financials and the Redshift database.
  • Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
  • Designed the schema, and configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
  • Transformed staging area data into a star schema (hosted on Amazon Redshift), which was then used to develop embedded Tableau dashboards.
  • Developed SQL scripts for loading data from the staging area to target tables, and worked on SQL and SAS script mapping.
  • Performed transformations of data using Spark and Hive according to business requirements for generating various analytical datasets.
  • Applied multinomial logistic regression, random forest, decision tree and SVM models to classify whether a package would be delivered on time on a new route, and performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Worked on the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms, and utilized optimization techniques, linear regression, K-means clustering, Naive Bayes and other approaches.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, and loaded and stored data with Pig scripts and R for MapReduce operations.
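The A/B testing described above was done in R. As an illustration only, the following Python sketch shows one common choice for comparing two campaigns, a two-sided two-proportion z-test on conversion counts; the function name and the pooled standard error are assumptions, not details taken from this role.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z statistic, p-value), using the pooled proportion for the
    standard error under the null hypothesis that both rates are equal.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the z-test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 200 conversions out of 1,000 for variant A against 250 out of 1,000 for variant B gives z of about 2.68 and a two-sided p-value below 0.01.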

Environment: Oracle 12c, SQL Plus, Erwin 9.6, MS Visio, SAS, Source Offsite (SOS), Windows XP, AWS, QC Explorer, SSRS, Quick Data, MongoDB, HBase, Hive, Cassandra, JavaScript.

Confidential - Farmington, Connecticut.

Data Modeler


  • Developed a high-performance, scalable data architecture solution incorporating a matrix of technologies to relate architectural decisions to business needs.
  • Participated in the design, development, and support of the corporate operation data store and enterprise data warehouse database environment.
  • Conducted strategy and architecture sessions and delivered artifacts such as the MDM strategy (current, interim and target state) and MDM architecture (conceptual, logical and physical) at a detailed level.
  • Owned and managed all changes to the data models; created data models, solution designs and data architecture documentation for complex information systems.
  • Analyzed change requests for mapping multiple source systems to understand the enterprise-wide information architecture and devise technical solutions.
  • Worked on AWS Redshift and RDS to implement models and data on both platforms.
  • Worked with SMEs and other stakeholders to determine requirements and identify entities and attributes for building conceptual, logical and physical data models.
  • Provided data sourcing methodology, resource management and performance monitoring for data acquisition.
  • Designed and implemented near-real-time ETL and analytics using the Redshift database.
  • Supported and followed the information governance and data standardization procedures established by the organization. Documented the reports library as well as external data imports and exports.
  • Prepared Tableau reports and dashboards with calculated fields, parameters, sets, groups or bins, and published them on the server.
  • Developed mappings to load fact and dimension tables, including SCD Type 1 and SCD Type 2 dimensions and incremental loads, and unit tested the mappings.
  • Performed analysis of data sources and processes to ensure data integrity, completeness and accuracy.
  • Created a logical design and physical design in Erwin.
  • Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
  • Developed data mapping, data governance, and transformation and cleansing rules for the Master Data Management architecture involving OLTP and ODS, and generated ad hoc reports using OBIEE.
  • Responsible for migrating the data and data models from the SQL Server environment to the Oracle environment.
  • Analyzed and designed the ETL architecture, and handled template creation, training, consulting, development, deployment, maintenance and support.
  • Created SSIS packages that load data from the CMS to the EMS library database, and was involved in data modeling and in providing Teradata-related technical solutions to the team.
  • Built a real-time event analytics system using a dynamic Amazon Redshift schema.
  • Designed the physical model for implementation in an Oracle 11g database, and developed SQL queries to retrieve complex data from different tables in Hemisphere using joins and database links.
  • Wrote SQL queries, PL/SQL procedures/packages, triggers and cursors to extract and process data from various source tables of database.
  • Created Hive tables, loaded transactional data from Teradata using Sqoop, and created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
  • Used Erwin to create logical and physical data models for an enterprise-wide OLAP system, and helped map data elements from the user interface to the database and identify gaps.
  • Designed and customized data models for a data warehouse supporting real-time data from multiple sources. Performed requirements elicitation and data analysis, and implemented ETL best practices.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Created data models for AWS Redshift and Hive from dimensional data models.
  • Developed complex SQL scripts for Teradata database for creating BI layer on DW for Tableau reporting.
  • Extensively used ETL methodology to support data extraction, transformation and loading in a complex EDW using Informatica.
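The SCD Type 2 loads above were built as Informatica mappings. Their core logic can be sketched in Python, assuming a hypothetical customer dimension keyed by customer_id with one tracked attribute (city) and validity dates. The names DimRow and apply_scd2 are illustrative only.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: str          # natural (business) key
    city: str                 # tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None while the row is the current version
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[str, str], load_date: date) -> List[DimRow]:
    """Merge {customer_id: city} into a Type 2 dimension and return the new rows.

    A changed attribute expires the current version and appends a new one,
    so history is preserved; unchanged keys are left as they are.
    """
    result = list(dim)
    current = {r.customer_id: i for i, r in enumerate(result) if r.is_current}
    for key, city in incoming.items():
        idx = current.get(key)
        if idx is not None and result[idx].city == city:
            continue  # no change: keep the existing current version
        if idx is not None:
            # Close out the old version as of the load date.
            result[idx] = replace(result[idx], valid_to=load_date, is_current=False)
        result.append(DimRow(key, city, load_date, None, True))
    return result
```

An SCD Type 1 load would instead overwrite city in place and keep no history.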

Environment: Erwin 9.5, MS Visio, Oracle 11g, Oracle Designer, MDM, Power BI, SAS, SSIS, Tableau, Tivoli Job Scheduler, SQL Server 2012, JavaScript, AWS Redshift, PL/SQL, SSRS, PostgreSQL, DataStage, SQL Navigator, Crystal Reports 9, Hive, Netezza, Teradata, T-SQL, Informatica.

Confidential - San Francisco, CA

Data Architect/Data Analyst/Data Modeler


  • Design and develop data warehouse architecture, data modeling/conversion solutions, and ETL mapping solutions within structured data warehouse environments
  • Reconcile data and ensure data integrity and consistency across various organizational operating platforms for business impact.
  • Define best practices for data loading and extraction and ensure architectural alignment of the designs and development.
  • Used Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Involved in preparing logical and physical data models.
  • Worked extensively in both forward engineering and reverse engineering using data modeling tools.
  • Involved in the creation and maintenance of the data warehouse and metadata repositories.
  • Used the ETL tool Informatica to populate the database and transform data from the old database to the new one using Oracle and SQL Server.
  • Identified inconsistencies or issues in incoming HL7 messages, documented them, and worked with clients to resolve the data inconsistencies.
  • Resolved data type inconsistencies between the source systems and the target system using the mapping documents and SQL queries against the database.
  • Extensively used both star schema and snowflake schema methodologies in building and designing the logical data model for Type 1 and Type 2 dimensional models.
  • Worked with the DBA group to create a best-fit physical data model from the logical data model using forward engineering.
  • Worked with the Data Steward team to design, document and configure Informatica Data Director to support management of MDM data.
  • Conducted HL7 integration testing with client systems, testing business scenarios to ensure that information flows correctly between applications.
  • Extensively worked on MySQL and Redshift performance tuning, reducing ETL job load time by 31% and DW space usage by 50%.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities such as BTEQ, FastLoad, MultiLoad, FastExport and TPump in UNIX/Windows environments, and ran the batch process for Teradata.
  • Created Tableau dashboards from multiple sources using data blending across Oracle, SQL Server, MS Access and CSV in a single instance.
  • Used the Agile Scrum methodology across the different phases of the software development life cycle.
  • Documented logical, physical, relational and dimensional data models. Designed the data marts using dimensional data modeling with star and snowflake schemas.
  • Created dimensional model based on star schemas and designed them using ERwin.
  • Used tools such as SAS/Access and SAS/SQL to create and extract Oracle tables.
  • Performed data modeling and design of the data warehouse and data marts using the star schema methodology with conformed and granular dimensions and fact tables.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Enabled SSIS package configurations so that connection strings for connection managers and values for package variables could be passed explicitly per environment.
  • Responsible for implementing HL7 to build Orders, Results, ADT and DFT interfaces for client hospitals.
  • Connected to Amazon Redshift through Tableau to extract live data for real-time analysis.
  • Developed slowly changing dimension mappings for Type 1 and Type 2 SCDs, and used OBIEE to create reports.
  • Worked on data modeling and produced data mapping and data definition specification documentation.
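The star-schema design described above can be illustrated with Python's built-in sqlite3 module. The table names, columns and rows are invented for the example: the fact table holds a measure plus foreign keys, and reporting queries join it to each dimension before aggregating.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical sales star schema in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT NOT NULL);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT NOT NULL);
        -- Fact table: one row per sale, a key to each dimension plus the measure.
        CREATE TABLE fact_sales (
            date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            amount REAL NOT NULL
        );
        INSERT INTO dim_date VALUES (20240101, '2024-01'), (20240201, '2024-02');
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO fact_sales VALUES
            (20240101, 1, 10.0), (20240101, 2, 25.0),
            (20240201, 1, 15.0), (20240201, 1, 5.0);
        """
    )
    return conn


def sales_by_month_and_category(conn: sqlite3.Connection) -> list:
    """Star join: the fact table joined to each dimension, then aggregated."""
    return conn.execute(
        """
        SELECT d.month, p.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
        """
    ).fetchall()
```

A snowflake schema would further normalize a dimension (for example, moving category into its own table referenced by dim_product), at the cost of one more join per query.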

Environment: Erwin, Oracle, SQL Server 2008, Power BI, MS Excel, Netezza, Agile, MS Visio, Rational Rose, Requisite Pro, SAS, SSIS, SSRS, Windows 7, PL/SQL, MDM, Teradata, MS Office, MS Access, SQL, Informatica.


Data Modeler/Data Analyst


  • Designed logical and physical data models for multiple OLTP and analytic applications.
  • Involved in analyzing business requirements, tracking the data available from various data sources, and transforming and loading the data into target tables using Informatica PowerCenter.
  • Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the data mart.
  • Extensively used star schema methodologies in building and designing the logical data model as dimensional models.
  • Created stored procedures using PL/SQL and tuned the databases and backend process.
  • Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
  • Performed database performance tuning, including indexing, SQL statement optimization and server monitoring.
  • Developed Informatica mappings, sessions and workflows, and wrote PL/SQL code for effective, optimized data flows.
  • Wrote SQL queries, dynamic queries, subqueries and complex joins for complex stored procedures, triggers, user-defined functions, views and cursors.
  • Created a new HL7 interface based on the requirements using XML and XSLT.
  • Created UNIX scripts for file transfer and file manipulation, and followed SDLC and Agile methodologies such as Scrum.
  • Scheduled and monitored DataStage jobs, analyzed the performance of individual stages, and ran multiple instances of a job using DataStage Director.
  • Led the successful integration of HL7 lab interfaces, used SQL expertise to integrate HL7 interfaces, and carried out detailed test cases on the newly built HL7 interface.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Collaborated with ETL/Informatica teams to source data, and performed data analysis to identify gaps.
  • Applied an expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific database.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.

Environment: SQL Server, Windows XP, SSIS, SSRS, Embarcadero, ER/Studio, Erwin, DB2, Informatica, Oracle, Query Management Facility (QMF), DataStage, ClearCase forms, SAS, Agile, UNIX and shell scripting.


Data Analyst/Data Modeler


  • Developed Data Mapping, Data Governance and transformation and cleansing rules for the Master Data Management Architecture involving OLTP, ODS.
  • Created new conceptual, logical and physical data models using Erwin and reviewed these models with the application and modeling teams.
  • Performed numerous data pull requests using SQL for analysis, and created databases for OLAP Metadata catalog tables using forward engineering of models in Erwin.
  • Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
  • Proficient in importing and exporting large amounts of data between files and Teradata.
  • Identified and tracked the slowly changing dimensions, heterogeneous sources and determined the hierarchies in dimensions.
  • Utilized ODBC connectivity to Teradata and MS Excel to automate reports and graphical representations of data for business and operational analysts.
  • Extracted data from existing data sources, and developed and executed departmental reports for performance and response purposes using Oracle SQL and MS Excel.
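Enforcing referential integrity in an OLTP model, as described above, comes down to declared foreign keys that the database checks on every write. This sketch uses SQLite through Python's sqlite3 module with hypothetical customer and orders tables. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection, whereas most server databases enforce declared constraints by default.

```python
import sqlite3


def make_oltp_db() -> sqlite3.Connection:
    """Hypothetical two-table OLTP model with a declared, enforced foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
        );
        INSERT INTO customer VALUES (1, 'Acme');
        """
    )
    return conn


def insert_order(conn: sqlite3.Connection, order_id: int, customer_id: int) -> bool:
    """Return True if the order was stored, False if it violated the foreign key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

An order pointing at an existing customer is accepted, while one pointing at a missing customer is rejected and rolled back, so no orphaned rows can enter the table.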

Environment: UNIX scripting, Oracle SQL Developer, SSRS, SSIS, Teradata, Windows XP, SAS datasets.
