Sr. Data Scientist Resume

Sacramento, CA

SUMMARY

  • 7+ years of professional IT experience in Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence and analytics models (such as Decision Trees, Linear & Logistic Regression) using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
  • Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization (a minimal sketch of this workflow follows this list).
  • Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
  • Adept in and possess a deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison and validation.
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
  • Experience in Big Data Tools like Apache Spark, MapReduce, Hadoop, HBase.
  • Worked on different data formats such as JSON, XML, CSV, TXT and XLS, and applied machine learning algorithms in Python using libraries such as Pandas, NumPy, Seaborn, SciPy, Matplotlib and SciKit-learn.
  • Extensive experience in Data Visualization, including producing tables, graphs, listings and data storytelling using tools such as Tableau, MS Excel and Google Data Studio.
  • Expertise in Excel macros, pivot tables, VLOOKUPs and other advanced functions; expert R user with working knowledge of the statistical programming language SAS.
  • Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica Power Center.
  • Experienced in developing Conceptual, Logical and Physical Data Models using UML and IE notations for Online Transactional processing (OLTP) and Online Analytical Processing (OLAP) systems using Erwin, ERStudio, Enterprise Architect and Power Designer.
  • Experienced in developing Physical Data Model for multiple platforms - SQL Server/ DB2/ Oracle/ Teradata.
  • Develop new computer vision algorithms to unlock new ways of human-computer interaction, applying machine learning to computer vision problems.
  • Experience in writing expressions in SSRS and expert in fine-tuning reports; created many drill-through and drill-down reports using SSRS.
  • Extensive experience implementing functionalities such as grouping, sorting and derived report parameters using SSRS.
  • Experience in applying Predictive Modeling and Machine Learning algorithms for Analytical projects.
  • Collaborated with the lead Data Architect to model the data warehouse in accordance with FSLDM subject areas, Snowflake schema and 3NF format.
  • Experience in coding SQL/PLSQL using Procedures, Triggers and Packages.
  • Experience in designing compelling visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Highly skilled in using visualization tools like ggplot2, Tableau and d3.js for creating dashboards.
  • Proficient in statistical and other tools/languages - R, C, C++, Java, Python, SQL, UNIX, the QlikView data visualization tool and the Anaplan forecasting tool.
  • Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, hypothesis testing, normal distribution and other advanced statistical and econometric techniques.
  • Experienced in using ETL tools such as SSIS in MS SQL Server 2016/2014/2012/2008/2005 and DTS in MS SQL Server 2000.
  • Experience in deploying SSIS packages from the development server to the production server.
  • Expert in creating simple and parameterized reports as well as complex reports involving sub-reports, matrix/tabular reports, charts and graphs using SSRS in Business Intelligence Development Studio (BIDS).
  • Actively engage other teams in Active Directory, Office 365 and Azure to identify problems and areas where data mining/machine learning/applied statistics can be used and lead the development of solutions to these problems.
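
A minimal sketch of the project life cycle workflow noted above (feature scaling, PCA-based dimensionality reduction, a statistical model, and K-fold cross validation scored with ROC AUC), using scikit-learn; the data, model choice and parameters are illustrative assumptions, not a specific client implementation:

```python
# Minimal sketch of the modeling workflow described above (placeholder data).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder binary target

pipeline = Pipeline([
    ("scale", StandardScaler()),          # feature scaling
    ("pca", PCA(n_components=5)),         # dimensionality reduction
    ("clf", LogisticRegression()),        # statistical model
])

# K-fold cross validation, scored with area under the ROC curve
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print("Mean ROC AUC:", scores.mean())
```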

TECHNICAL SKILLS

Version Control and Project Tools: JIRA, MS SharePoint, Rally, TFS, GitHub, PVCS.

Operating Systems and Others: Microsoft Windows, Linux (CentOS, Ubuntu & RedHat), Microsoft Office Suite including Visio.

Databases: Oracle, MS SQL Server 2012/2014/2016, MySQL 5.x, MongoDB 3.x, Neo4j

Statistical Methods: Hypothesis Testing, Correspondence Analysis, Cross Validation, Principal Component Analysis (PCA), Exploratory Data Analysis (EDA)

Machine Learning: Linear Regression, Naïve Bayes, Logistic Regression, K-nearest Neighbors (KNN), Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Support Vector Machine (SVM), K-means Clustering.

Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, Hive, Pig

BI Tools: Tableau 10.x/9.x, Power BI, QlikView

Cloud Services: Microsoft Azure Machine Learning Studio/HDInsight/ Data Lake, Amazon Web Services (AWS) EC2/S3/Redshift.

Programming Languages: Python 2.x/3.x (NumPy, Pandas, SciKit-learn, Matplotlib, Seaborn), SQL, R, H2O

Artificial Intelligence: TensorFlow, Convolutional Neural Networks, Artificial Neural Networks, Deep Learning.

PROFESSIONAL EXPERIENCE

Confidential, Sacramento, CA

Sr. Data Scientist

Responsibilities:

  • Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
  • Part of team analyzing database requirements in detail with the project stakeholders through Joint Requirements Development (JRD) sessions.
  • Experience in developing different data models in healthcare systems such as Epic and Cerner.
  • Used TensorFlow to build computational graphs based on client requirements (see the sketch at the end of this list).
  • Experience in installing and building TensorFlow from source.
  • Developed an Object modeling in UML for Conceptual Data Model using Enterprise Architect.
  • Developed logical and physical data models using Erwin to design OLTP systems for different applications.
  • Knowledge of modern computer vision and deep learning approaches.
  • Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop.
  • Experience with the Azure HDInsight service to deploy and provision Apache Hadoop clusters in the Azure cloud.
  • Designed and developed large scale models using Logistic Regression, Random Forest, Time-series models and NLP Models.
  • Developed creative computer vision and tracking software for a variety of products.
  • Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
  • Worked with DBA group to create Best-Fit Physical Data Model with DDL from the Logical Data Model using Forward engineering.
  • Created entity process association matrices using Zachman Framework, functional decomposition diagrams and data flow diagrams from business requirements documents.
  • Involved in detail designing of data marts by using Star Schema and plan data marts involving shared dimensions.
  • Used Model Manager Option in Erwin to synchronize the data models in Model Mart approach.
  • Gathered various reporting requirements from Business Analysts.
  • Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
  • Reverse-engineered the reports and identified the data elements (in the source system), dimensions, facts and measures required for reporting.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW)
  • Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
  • Involved in migration projects to migrate data from data warehouses on Oracle/DB2 and migrated those to Teradata.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Developed and maintained Data Dictionary to create Metadata Reports for technical and business purpose using Erwin report designer.
  • Generated ad-hoc reports using Crystal Reports and SQL Server Reporting Services (SSRS).
  • Experienced in using Node.js, AngularJS, MySQL, SQL, Azure, AWS, NoSQL, MapReduce, Hadoop, Power BI, Azure Data Factory and Bootstrap.
  • Applied advanced statistical and predictive modeling techniques to build, maintain and improve multiple real-time decision systems, working closely with product managers, service development managers and the product development team to productize the algorithms developed.
  • Experience in various Deep Learning and Reinforcement Learning techniques such as DNNs and ANNs.
  • In-depth expertise in statistical procedures such as parametric and non-parametric tests, hypothesis testing, ANOVA, ARIMA and interpreting p-values.
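
A minimal sketch of building a computational graph, assuming the TensorFlow 1.x graph-mode API; the node names and shapes are hypothetical and only illustrate the pattern referenced in the TensorFlow bullet above:

```python
# Minimal sketch of a TensorFlow 1.x computational graph (illustrative only).
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3], name="features")
    w = tf.Variable(tf.zeros([3, 1]), name="weights")
    b = tf.Variable(tf.zeros([1]), name="bias")
    y_hat = tf.sigmoid(tf.matmul(x, w) + b, name="prediction")

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y_hat, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```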

Environment: Erwin r9.6, DB2, Teradata, SQL Server 2008, Informatica 8.1, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word and Access.

Confidential, Princeton, NJ

Sr. Data Scientist

Responsibilities:

  • Provided Configuration Management and Build support for more than 5 different applications, built and deployed to the production and lower environments.
  • Evaluated the performance of Various Classification and Regression algorithms using R language to predict the future power.
  • Worked with several R packages including Knitr, dplyr, SparkR, CausalInfer, spacetime.
  • Involved in detecting patterns with unsupervised learning techniques such as K-means clustering (see the sketch at the end of this list).
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Gathered all required data from multiple data sources and created the datasets used in analysis.
  • Experience with cloud-based development using Cloud Foundry, Docker, AWS, Azure or other container/PaaS environments.
  • Implemented the following algorithms in Jupyter notebooks while solving Coursera assignments: a logistic regression classifier to recognize cats, hyperparameter tuning with regularization and optimization, and an image recognition problem using Convolutional Neural Networks.
  • Performed Exploratory Data Analysis and Data Visualizations using R, and Tableau.
  • Performed thorough EDA, including univariate and bivariate analysis, to understand individual and combined effects of variables.
  • Worked with data governance, data quality and data lineage teams and the data architect to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Developed triggers, stored procedures, functions and packages using cursor and ref cursor concepts associated with the project using PL/SQL.
  • Created various types of data visualizations using R, python and Tableau.
  • Used Python, R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks of welfare dependency.
  • Identified and targeted welfare high-risk groups with Machine learning algorithms.
  • Conducted campaigns and ran real-time trials to quickly determine what works, and tracked the impact of different initiatives.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
  • Utilized Google analytics to understand the user traffic on the Target Website and prepared reports.
  • Utilized recommender systems, collaborative filtering techniques to drive Target business priorities
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis and cluster analysis using SAS programming.
  • Used a metadata tool for importing metadata from the repository, creating new job categories and creating new data elements.
  • Scheduled the task for weekly updates and running the model in workflow. Automated the entire process flow in generating the analysis and reports.
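
A minimal sketch of the K-means pattern-detection step referenced above, using scikit-learn on placeholder data; the cluster count and scaling choice are assumptions:

```python
# Minimal sketch of unsupervised pattern detection with K-means (placeholder data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))                  # placeholder observations

X_scaled = StandardScaler().fit_transform(X)   # scale features before clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)

print("Cluster sizes:", np.bincount(kmeans.labels_))
print("Inertia:", kmeans.inertia_)
```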

Environment: Erwin 8, Teradata 13, SQL Server 2008, Oracle 9i, SQL*Loader, PL/SQL, ODS, OLAP, OLTP, SSAS, Informatica Power Center 8.1.

Confidential, Moline, IL

Data Scientist

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM and Random Forest.
  • Worked with data compliance teams, data governance team to maintain data models, Metadata, data Dictionaries, define source fields and its definitions.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB and Hadoop.
  • Transformed the logical data model into an Erwin physical data model, ensuring primary key and foreign key relationships in the PDM, consistency of data attribute definitions and primary index considerations.
  • Developed Oracle10g stored packages, procedures, functions and database triggers using PL/SQL for ETL process, data handling, logging, archiving and to perform Oracle back-end validations for batch processes.
  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Worked with the UNIX team and installed TIDAL job scheduler on QA and Production Netezza environment.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using UML and Visio.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics, and processed the data using HQL (SQL-like) on top of MapReduce.
  • Hands-on development and maintenance using Oracle SQL, PL/SQL, SQL Loader, and Informatica Power Center9.1.
  • Designed the ETL process to extract, transform and load data from the OLTP Oracle database system to the Teradata data warehouse.
  • Created tables, sequences, synonyms, joins, functions and operators in Netezza database.
  • Created and implemented an MDM data model for Consumer/Provider for the Health Care MDM product from Variant.
  • Built and published customized interactive reports and dashboards, report scheduling using Tableau server.
  • Hands on Oracle External Tables feature to read the data from flat files into Oracle staging tables.
  • Used the K-means clustering algorithm in H2O on different data sets to classify the measurements into clusters; loaded and imported data sets and ran the K-means estimator in H2O (see the sketch at the end of this list).
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website and managed and reviewed Hadoop log files.
  • Used Erwin9.1 for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
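
A minimal sketch of running the K-means estimator in H2O from Python, as referenced above; the file path, column usage and cluster count are hypothetical:

```python
# Minimal sketch of K-means clustering in H2O (hypothetical file and settings).
import h2o
from h2o.estimators import H2OKMeansEstimator

h2o.init()

# Load a data set into an H2OFrame (path is illustrative only)
measurements = h2o.import_file("measurements.csv")

kmeans = H2OKMeansEstimator(k=4, standardize=True, seed=1234)
kmeans.train(x=measurements.columns, training_frame=measurements)

clusters = kmeans.predict(measurements)   # assign each row to a cluster
print(kmeans.centers())                   # inspect the cluster centers
```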

Environment: Erwin 9.x, Teradata, Oracle 10g, Hadoop, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, Informatica Power Center, MDM, SQL Server, Netezza, DB2, Tableau, Aginity, SAS/GRAPH, SAS/SQL, SAS/CONNECT and SAS/ACCESS.

Confidential - Atlanta, GA

Data Scientist

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Performed data ETL by collecting, exporting, merging and massaging data from multiple sources and platforms including SSIS (SQL Server Integration Services) in SQL Server.
  • Worked with cross-functional teams (including the data engineering team) to extract data and rapidly execute from MongoDB through the MongoDB Connector for Hadoop.
  • Performed data cleaning and feature selection using MLlib package in PySpark.
  • Performed partitional clustering into 100 clusters with k-means using the Scikit-learn package in Python, so that similar hotels for a search are grouped together.
  • Used Python to perform an ANOVA test to analyze the differences among hotel clusters (see the sketch at the end of this list).
  • Implemented application of various machine learning algorithms and statistical modeling like Decision Tree, Naive Bayes, Logistic Regression and Linear Regression using Python to determine the accuracy rate of each model.
  • Determined the most accurate prediction model based on the accuracy rate.
  • Used a text-mining process on reviews to determine what customers focus on.
  • Delivered analysis support for hotel recommendations and provided an online A/B test.
  • Designed Tableau bar graphs, scatter plots and geographical maps to create detailed summary reports and dashboards.
  • Developed hybrid model to improve the accuracy rate.
  • Delivered the results to the operations team to support better decisions and feedback.
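
A minimal sketch of the ANOVA test referenced above, comparing a metric across hotel clusters with SciPy; the per-cluster samples are placeholder data:

```python
# Minimal sketch of a one-way ANOVA across hotel clusters (placeholder data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
cluster_a = rng.normal(100, 15, size=50)   # placeholder metric for cluster A
cluster_b = rng.normal(110, 15, size=50)   # placeholder metric for cluster B
cluster_c = rng.normal(95, 15, size=50)    # placeholder metric for cluster C

f_stat, p_value = stats.f_oneway(cluster_a, cluster_b, cluster_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```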

Environment: Python, PySpark, Tableau, MongoDB, Hadoop, SQL Server, SDLC, ETL, SSIS, recommendation systems, Machine Learning Algorithms, text-mining process, A/B test.

Confidential

Data Scientist

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem)
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means and KNN for data analysis.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management in both RDBMS, Big Data environments.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Hands-on experience with database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF) and de-normalization of databases.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods, including classification, regression and dimensionality reduction (see the sketch below).
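
A minimal sketch of a Spark MLlib classification model driven from Python, as referenced in the bullet above; the input file, column names and model choice are hypothetical:

```python
# Minimal sketch of a classification model with Spark MLlib (hypothetical columns).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

df = spark.read.csv("labeled_data.csv", header=True, inferSchema=True)

# Assemble feature columns into a single vector column
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train_df = assembler.transform(df).select("features", "label")

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train_df)
model.transform(train_df).select("label", "prediction").show(5)
```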

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential

Data Analyst

Responsibilities:

  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Applied clustering algorithms such as hierarchical and K-means with the help of Scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot2 and Tableau.
  • Worked on development of data warehouse, Data Lake and ETL systems using relational and non-relational tools like SQL and NoSQL.
  • Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage)
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and Map Reduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN and Naive Bayes (see the sketch at the end of this list).
  • Used Teradata 15 utilities such as FastExport and MultiLoad for handling various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Supported the testing team with system testing, integration testing and UAT.
  • Involved in preparation and design of technical documents such as the Bus Matrix document, PPDM model, LDM and PDM.
  • Understanding the client business problems and analyzing the data by using appropriate Statistical models to generate insights.
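
A minimal sketch of the supervised classification comparison referenced above, using scikit-learn on placeholder data; the models, split and metric are illustrative assumptions:

```python
# Minimal sketch of comparing supervised classifiers (placeholder data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))              # placeholder features
y = (X[:, 0] - X[:, 3] > 0).astype(int)    # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```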

Environment: R 3.0, Erwin, Tableau 8.0, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop, MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive.
