
Data Scientist Resume

Columbus, OH

SUMMARY:

  • 9 years of overall IT experience, with 3.10 years in Data Science and Analytics (including Machine Learning, Data Mining, and Statistical Analysis) and 5.2 years spanning Product Management, Product Development, Product Data Analytics, and Business Analysis
  • Hands-on experience communicating business insights through dashboarding in Tableau. Developed automated Tableau dashboards that helped evaluate and evolve existing user data strategies, including user metrics, measurement frameworks, and measurement methods. Also developed and deployed dashboards in Tableau and RShiny to identify trends and opportunities, surface actionable insights, and help teams set goals, forecasts, and prioritization of initiatives
  • Experience in architecting and building comprehensive analytical solutions in Marketing, Sales and Operations functions across Technology, Retail and Banking industries. Worked closely with functional team leaders (in Product, Operations, Marketing, etc.) to explain analysis, findings, and recommendations
  • Experience in acquiring, merging, cleaning, analyzing and mining structured, semi - structured and unstructured data sets for analysis
  • Strong track record of contributing to successful end-to-end analytic solutions (clarifying business objectives and hypotheses, communicating project deliverables and timelines, and informing action based on findings)
  • Expertise in writing production-quality code in SQL, R, Python, and Spark. Hands-on experience building regression and classification models as well as unsupervised learning algorithms with large datasets in distributed systems and resource-constrained environments
  • Involved in independent research and experimentation of new methodologies to discover insights, improvements for problems. Delivered findings and actionable results to management team through data visualization, presentation, or training sessions. Proactively involved in roadmap discussions, data science initiatives and the optimal approach to apply the underlying algorithms
  • Experience building interpretable machine learning models and end-to-end data pipelines that extract, transform, and combine all incoming data to discover hidden insights, with an eye toward improving business processes, addressing business problems, or generating cost savings
  • Experience working with large data and metadata sources; interpreted and communicated insights and findings from analyses and experiments to both technical and non-technical audiences across advertising, service, and business domains.
  • Experience in statistical programming languages such as R and Python, and in Big Data technologies including Hadoop 2, Hive, HDFS, MapReduce, and Spark; also experienced in Spark 2.1, Spark SQL, and PySpark.
  • Skilled in using dplyr in R and pandas in Python for exploratory data analysis, and experienced with data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
  • Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Experience in Data Modeling using RDBMS concepts, Logical and Physical Data Modeling up to Third Normal Form (3NF), and Multidimensional Data Modeling schemas (Star schema, Snowflake schema, facts and dimensions). Hands-on experience optimizing SQL queries and tuning database performance in Oracle, SQL Server, and Teradata databases
  • Solid background in Statistics, with expertise in Statistical Inference, Regression Models, ANOVA, Linear Regression, Logistic Regression, Credit Risk Management, and SAS programming
  • Extremely efficient at handling multiple projects simultaneously in a fast-paced working environment
  • Good team player with excellent problem solving and communication skills

TECHNICAL SKILLS:

Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau

Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGB, Deep Neural Networks, Bayesian Learning

Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization

Feature Engineering: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods

Statistical Tests: T-test, Chi-square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA

Sampling Methods: Bootstrap sampling methods and Stratified sampling

Model Tuning/Selection: Cross Validation, AUC, Precision/Recall, Walk Forward Estimation, AIC/BIC Criterions, Grid Search and Regularization

Time Series: ARIMA, Holt-Winters, Exponential smoothing, Bayesian structural time series

R: caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot2

Python: pandas, numpy, scikit-learn, scipy, statsmodels, matplotlib, tensorflow

SAS: Forecast server, SAS Procedures and Data Steps

Spark: MLlib, GraphX

SQL: Subqueries, joins, DDL/DML statements

Databases/ETL/Query: Teradata, Netezza, SQL Server, Postgres and Hadoop (MapReduce); SQL, Hive, Pig and Alteryx.

Visualization: Tableau, ggplot2 and RShiny

Prototyping: PowerPoint, RShiny and Tableau

Operating System: Windows, Linux (Red Hat), Unix, macOS.

WORK EXPERIENCE:

Confidential, Columbus, OH

Data Scientist

Responsibilities:

  • Applied regression and other models, compared various initial candidate models, created pipelines for data processing, and presented reports to other teams within the company.
  • Participated in all phases of data mining: data acquisition, data collection, data cleaning, feature engineering, model building, model validation, and visualization; performed gap analysis to deliver data science solutions.
  • Used dplyr, ggplot2, caret, corrplot, xgboost, and cowplot in R to develop various machine learning algorithms, including linear regression, ridge regression, lasso regression, random forests, and XGBoost, for data analysis.
  • Performed Logistic Regression, K-means clustering, multivariate analysis, and Support Vector Machines in R.
  • Developed clustering algorithms and Support Vector Machines that improved customer segmentation and market expansion. Conducted studies and rapid plotting, applying advanced data mining and statistical modelling techniques to build solutions that optimized the quality and performance of data.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management in both RDBMS, Big Data environments.
  • Analyzed large data sets, applied machine learning techniques to develop predictive and statistical models, and enhanced existing statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
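The customer-segmentation work described above can be illustrated with a minimal K-means sketch in Python; the synthetic data, feature names, and cluster count here are hypothetical stand-ins, not the original project's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customer features (illustrative only): annual spend and visit frequency,
# drawn from two separated populations so the segments are recoverable.
rng = np.random.default_rng(42)
spend = np.concatenate([rng.normal(200, 30, 100), rng.normal(800, 80, 100)])
visits = np.concatenate([rng.normal(5, 1, 100), rng.normal(20, 3, 100)])
X = np.column_stack([spend, visits])

# Scale features so both contribute comparably to the Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means with two segments and assign each customer a cluster label
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
labels = km.labels_
```

In practice the number of clusters would be chosen with an elbow or silhouette analysis rather than fixed in advance.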

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • Performed data extraction, validation, and summarization to produce insights for analytical projects from the marketing database.
  • Involved in assessing analytical projects to determine the required data elements to be utilized for desired result.
  • Worked with external database host vendor as needed to complete analytical projects.
  • Used SAS procedures such as PROC FREQ, PROC MEANS, PROC SORT, PROC PRINT, PROC TABULATE, and PROC REPORT to prepare summary data.
  • Implemented Merton structural model using market-based variables
  • Applied JPM 2001 market standard reduced form using market observable CDS spread
  • Developed a hybrid internal rating model comprising 350 reference entities under the A-IRB approach, as recommended by the Basel II/III accords
  • Incorporated Standard & Poor’s public debt ratings with historical probabilities of default to include agency rating PDs into hybrid model
  • Hybrid model spanned investment grade and non-investment grade space and 12 sectors.
  • Completed stress test on the model using DFAST & CCAR scenarios for baseline, adverse & severely adverse economic & financial market conditions
  • Validated hybrid credit model by following guidelines outlined in SR 11-7
  • Developed a conceptually sound theoretical framework in support of chosen approaches.
  • Performed literature review and obtained published research in support of chosen approaches and assumptions
  • Performed sensitivity and outcomes analysis of the hybrid model by shocking parameters, quantifying the impact of each variable's contribution to hybrid model PDs
  • Replicated hybrid model in SAS and matched model outcomes, reducing possibility of coding and implementation errors
  • Performed documentation review and augmented documentation to meet regulatory guidelines.
  • Tested the model out of sample using a stratified holdout sample of 150 companies excluded from the development sample
  • Developed governance and control procedures including change control log to document significant model changes
  • Performed statistical analysis using various SAS procedures such as Proc Univariate, Proc ANOVA, Proc GLM, Proc Logistic, Proc SQL, Proc Freq, Proc Means, Proc Summary, Proc Contents, Proc Compare, and Proc Sort using SAS 9.1
  • Estimated summary statistics, tabulation counts, and correlations, and checked for departures from normality using Proc UNIVARIATE, Proc MEANS, and Proc FREQ.
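The Merton structural model referenced above treats a firm's equity as a call option on its assets, with default occurring when asset value falls below the debt's face value at the horizon. A simplified one-period PD calculation might look like the sketch below; the inputs are made up for illustration, and a real implementation would calibrate asset value and volatility from observed equity data rather than taking them as given:

```python
from math import log, sqrt
from statistics import NormalDist

def merton_pd(V, D, mu, sigma, T=1.0):
    """Simplified Merton-model probability of default.

    Assumes asset value V follows geometric Brownian motion with drift mu
    and volatility sigma; default occurs if assets fall below the debt
    face value D at horizon T (in years).
    """
    # Distance to default: how many standard deviations the expected
    # log asset value sits above the default point log(D)
    d2 = (log(V / D) + (mu - 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    # PD is the probability mass below the default point
    return NormalDist().cdf(-d2)

# Illustrative inputs (not calibrated to any real firm)
pd_1y = merton_pd(V=120.0, D=100.0, mu=0.05, sigma=0.25)
```

A firm whose assets far exceed its debt gets a PD near zero under the same formula, which is the qualitative behavior the structural approach is built on.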

Confidential, Tampa, FL

Senior Data Analyst

Responsibilities:

  • Prepared Business Requirement Documents (BRD), Functional Specifications, System Requirement Specifications (SRS), and data element documents.
  • Expertise in Data Modelling, Data Analysis, Data Warehousing and Data Mapping, Development of Entity Relationship diagram, SQL and reporting process.
  • Worked in coordination with clients on project feasibility studies in Tableau and with DB teams to design the database per project requirements.
  • Performed feasibility analysis, scoped projects, and worked with the project management team to prioritize deliverables and negotiate product functionality.
  • Managed system configuration and system settings processes to ensure fulfillment of business and application requirements.
  • Managed time, resources, and funds for the overall territory in a responsible and effective manner.

Confidential

Associate Applications Developer

Responsibilities:

  • 4+ years' experience in Product Development, Quality Assurance Testing, and Business Analysis for the Oracle banking product Oracle FLEXCUBE Core Banking
  • Worked as Applications Developer (SAS, SQL) for Confidential Data Management, Data Analytics and Statistics regression tests and related activities.
  • Expertise in Black Box Testing, Functional Testing, GUI testing, System Testing, Integration testing, Regression testing, and Performance testing for Oracle FLEXCUBE Core Banking
  • Proficiency in build/ release & configuration management in managing source code repositories using Tortoise SVN tool
  • Ensure comprehensive test coverage by working closely with the product and engineering teams to prioritize testing execution and report on test execution progress and results in JIRA.
  • Analyze business and system requirements, review Business Requirement and Design documents and manage development of specifications to create and execute detailed test plan.
  • Managing a team of 5-7 people along with generating monthly risk metrics and reports for senior management.
  • Demonstrated effective time-management, with ability to handle multiple tasks, meet deadlines in dynamic environment, as well as to set priorities in a complex team and work environment
