Data Scientist Resume
Alpharetta, GA
PROFESSIONAL SUMMARY:
- Over 5 years of professional IT experience in Web Applications, Machine Learning, Deep Learning, Reinforcement Learning, Statistics, transforming large datasets of structured and unstructured data, Data Engineering, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive use of Natural Language Processing (NLP) for text analytics, developing statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using Python 3.4.3/2.7, R 3.2.2, SAS, SQL, PL/SQL, and Tableau 9.4.
- Experience working with the Flask and Django frameworks in Python and the Shiny and gWidgets packages in R.
- Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, Support Vector Machines, Clustering, Bagging, Boosting, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, Support Vector Machine, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor Analysis/Principal Component Analysis, and Ensembles.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Lean Six Sigma and Agile methodologies.
- Adept in statistical programming languages like R 3.2.5, MATLAB, and Python 3.4.3/2.7.3, as well as Big Data technologies like Hadoop, HDFS, Hive, HBase, and Pig.
- Experienced in translating data-driven insights combined with domain knowledge into business stories and dashboards using traditional presentation tools as well as visualization tools like Tableau 9.4, R 3.2.2, and Python 3.4.3.
- Expert in all activities related to the development, implementation, administration, and support of ETL processes for large-scale Data Warehouses using Data Transformation Services (DTS) and SQL Server Integration Services (SSIS) with MS SQL 2014/2012/2008R2.
- Solid understanding of RDBMS database concepts including Normalization, and hands-on experience creating database objects such as tables, views, stored procedures, triggers, row-level audit tables, cursors, indexes, user-defined data types, and functions.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Power PivotTables, Power View, and OLAP reporting using SQL Server Reporting Services (SSRS).
- Good understanding of Microsoft SQL Management Studio and data load / export utilities like BTEQ, Fast Load, Multi Load, Fast Export and Performance Tuning for enhancing extraction speed of data.
TECHNICAL SKILLS
Machine Learning Algorithms: Multivariate regression analysis, SVM, Cluster Analysis, NLP, Deep Learning, Parametric and Non-Parametric Tests, Analysis of Variance (ANOVA), Polynomial Regression, Decision Trees, Random Forest, Apriori, Market Basket Analysis, Factor Analysis, Principal Component Analysis, K-Nearest Neighbors, Reinforcement Learning, and Recommender Systems.
Programming Languages: Python 3.4.3/2.7, R 3.2.2, SQL, PL/SQL, SAS, C, C++, and Java.
Databases: SQL Server 2014/2012/2008R2, Oracle 11g/10g, MS Access, DB2, and Teradata.
Tools: HQL, Pig, Apache Spark, PySpark, Scala, Microsoft Visual Studio, Minitab, Arena, SPSS, Microsoft SQL Integration Services, Microsoft SQL Analysis Services, Microsoft SQL Reporting Services, and Oracle Data Integrator.
Environments and Libraries: Jupyter, Spyder, Python Console, PyCharm, RStudio, TensorFlow, Keras, Theano, NLTK, Gensim, H2O, Scikit-learn, NumPy, Pandas, SciPy, dplyr, tidyr, caret, ggplot2, Matplotlib, and Seaborn.
Operating system: Windows, Linux, and Unix.
Data Warehousing Tools: SQL Server Business Intelligence Development - SSIS, SSAS, SSRS, Informatica, Alteryx and SAP Business Objects and Business Intelligence.
BI and Power BI Tools: Tableau, Tableau server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence and Power Pivot, Power Query, Power Map, and Power View.
PROFESSIONAL EXPERIENCE
Confidential, Alpharetta, GA
Data Scientist
Responsibilities:
- Performed data requirement analysis for transforming data according to business requirements.
- Worked closely with data compliance teams, including Data Analysts and Data Engineers, to gather required raw data and define source fields in Hadoop.
- Applied Forward Elimination and Backward Elimination to data sets to identify the most statistically significant variables and remove insignificant variables for data analysis, yielding better predictive insights.
- Utilized Label Encoders in Python to convert non-numerical significant variables to numerical ones and identified their impact on pre-acquisition and post-acquisition outcomes using a two-sample paired t-test.
- Worked with ETL SQL Server Integration Services (SSIS) for data investigation and mapping to extract data; applied fast parsing and enhanced extraction efficiency by 17%.
- Developed Data Science content involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT and ETL for Data Extraction.
- Created A/B tests and multivariate tests to check the performance of patient and record applications against key indicators such as patient reviews and patient feedback.
- Developed analytical systems and data structures, gathering and manipulating data using statistical techniques.
- Designed a suite of interactive dashboards, providing an opportunity to scale and measure HR department statistics that was not possible earlier, and scheduled and published reports.
- Created data presentations that reduced bias and told an accurate story of the people behind the data, pulling millions of rows with SQL and performing Exploratory Data Analysis.
- Applied a breadth of knowledge in programming (Python, R); Descriptive, Inferential, and Experimental Design statistics; advanced mathematics; and database functionality (SQL, Hadoop).
- Migrated data from Heterogeneous Data Sources and legacy system (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
- Applied descriptive and inferential statistics on various data attributes using SPSS to draw better insights into the products and services provided to patients.
- Developed Machine learning algorithms such as Collaborative filtering, Neural Network models, Hybrid recommendation model and NLP for analyzing most significant variables to get better predictive insights.
- Rapidly evaluated Machine learning algorithms and Deep Learning frameworks, tools, techniques and approaches for deployment in AWS and consumption of data science teams.
- Utilized NLP algorithms with the help of the NLTK and Gensim libraries to recognize text from patient reviews.
- Evaluated model performance using metrics such as Accuracy, F-score, Recall, Precision, Sensitivity, Specificity, and Variance.
- Utilized data reduction techniques such as Factor Analysis to identify the values most correlated with the underlying factors of the data and categorized the variables according to those factors.
- Applied the Wilcoxon signed-rank test in R to pre-acquisition and post-acquisition patient and treatment data across different sectors to find statistical significance.
- Performance Tuning: analyzed requirements and fine-tuned stored procedures/queries to improve application performance.
- Designed and built scalable infrastructure to support real-time analytics with the help of Scala.
- Used Spark Streaming to divide streaming data into batches for batch processing.
- Developed various Tableau 9.4 data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data sources.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS using HQL queries in Hadoop.
- Utilized Amazon Web Services (AWS) S3, EC2, EMR, RDS, and Redshift to set up storage for deploying machine learning models such as one- and two-layer neural network models.
- Set up EC2 instances and deployed patient applications and treatment records.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements; hands-on with technologies such as Oracle, Informatica, and Business Objects.
- Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
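The label-encoding and paired-test workflow described above can be sketched as follows; all data, column names, and values here are invented for illustration and are not the project's actual data:

```python
import pandas as pd
from scipy import stats
from sklearn.preprocessing import LabelEncoder

# Hypothetical patient data; "region", "pre_score", "post_score" are illustrative names.
df = pd.DataFrame({
    "region": ["north", "south", "south", "east", "north", "east"],
    "pre_score":  [72.0, 65.5, 80.1, 70.2, 68.3, 75.0],
    "post_score": [75.4, 66.0, 83.2, 74.8, 70.1, 77.9],
})

# Convert a non-numerical variable to numerical codes.
le = LabelEncoder()
df["region_code"] = le.fit_transform(df["region"])

# Two-sample paired t-test on pre- vs. post-acquisition scores.
t_stat, p_value = stats.ttest_rel(df["pre_score"], df["post_score"])

# Non-parametric alternative: Wilcoxon signed-rank test on the same pairs.
w_stat, w_p = stats.wilcoxon(df["pre_score"], df["post_score"])
```

A p-value below the chosen significance level (e.g. 0.05) would indicate a statistically significant pre/post difference; the Wilcoxon test serves as the non-parametric counterpart when normality of the differences is in doubt.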
Environment: Python, Jupyter, TensorFlow, Keras, Theano, R, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, Spark, Scala, AWS, Microsoft Office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access, Informatica, SAP Business Objects and Business Intelligence.
Confidential, Carmel, IN
Data Scientist
Responsibilities:
- Involved in gathering, analyzing and translating business requirements into analytic approaches.
- Worked with machine learning algorithms like neural network models, regressions (linear, logistic, etc.), SVMs, and decision trees for classifying groups and analyzing the most significant variables.
- Utilized SAS to develop Pareto Charts identifying the highest-impact categories in modules, to find the workforce distribution, and to create various data visualization charts.
- Performed univariate, bivariate, and multivariate analysis of approx. 4,890 tuples using bar charts, box plots, and histograms.
- Participated in feature engineering, including feature creation, feature scaling, and One-Hot encoding with Scikit-learn.
- Converted raw data to processed data by merging and by finding outliers, errors, trends, missing values, and distributions in the data.
- Generated detailed reports after validating the graphs using R and adjusting the variables to fit the model.
- Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
- Developed descriptive and inferential statistics for logistics optimization and value throughput data at a 95% confidence interval.
- Imported data using Power Query in MS Excel from APIs and Web APIs, then created relationships between data tables using Power Pivot.
- Used Power Map and Power View to present data effectively to both technical and non-technical users.
- Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
- Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.
- Used TensorFlow, Keras, Theano, Pandas, NumPy, SciPy, Scikit-learn, and NLTK in Python to develop various machine learning algorithms such as neural network models, linear regression, multivariate regression, Naïve Bayes, Random Forests, decision trees, SVMs, K-means, and KNN for data analysis.
- Responsible for developing a data pipeline with AWS S3 to extract the data, store it in HDFS, and deploy the implemented machine learning models.
- Used Spark and Spark SQL/Streaming for faster testing and processing of data.
- Used packages like dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relations between different variables.
- Worked on business forecasting, segmentation analysis, and data mining; prepared management reports defining the problem, documenting the analysis, and recommending courses of action to determine the best outcomes.
- Experienced in risk analysis, root cause analysis, cluster analysis, correlation, optimization, and the K-means algorithm for clustering data into groups.
- Coordinated with data scientists and senior technical staff to identify client's needs.
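The feature-engineering steps above (One-Hot encoding and feature scaling with Scikit-learn) can be sketched minimally as follows; the column names and values are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data; "dept" and "hours" are illustrative names.
df = pd.DataFrame({
    "dept":  ["ops", "sales", "ops", "hr"],
    "hours": [40.0, 35.0, 45.0, 38.0],
})

# One-Hot encode the categorical column.
dummies = pd.get_dummies(df["dept"], prefix="dept")

# Standardize the numeric column to zero mean and unit variance.
scaler = StandardScaler()
df["hours_scaled"] = scaler.fit_transform(df[["hours"]])

# Final feature matrix: 3 one-hot columns + 1 scaled column.
X = pd.concat([dummies, df[["hours_scaled"]]], axis=1)
```

The same pipeline generalizes to any mix of categorical and numeric columns; scaling matters most for distance-based models like K-means and KNN mentioned above.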
Environment: SQL Server 2012, Python, Jupyter, R 3.1.2, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, SAS, Power Query, Power Pivot, Power Map, Power View, Microsoft office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.
Confidential, Princeton, NJ
Data Analyst
Responsibilities:
- Developed complex SQL queries using GROUP BY, JOIN, and WHERE clauses to answer tester questions.
- Communicated and coordinated with other departments to collect business requirements and developed data flow mapping to load the OLAP server for analysis of policy details.
- Worked on missing value imputation and outlier identification using Random Forest and box plots.
- Tackled a highly imbalanced dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms.
- Improved prediction performance by using Random Forest and gradient boosting for feature selection with the help of the Scikit-learn library in Python.
- Utilized parametric and non-parametric tests in SPSS to draw insights from data for making business decisions.
- Implemented machine learning models (logistic regression, XGBoost) with the help of Scikit-learn in Python.
- Validated and selected models using k-fold cross-validation and confusion matrices, and worked on optimizing models for a high recall rate.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Participated in the Agile planning process and daily scrums, providing details and discussing with the team lead, Data Scientists, Data Analysts, Data Engineers, and others.
- Experienced in routine DBA activities like query optimization, performance tuning, and effective SQL Server configuration for better performance and cost reduction.
- Developed tabular reports, sub-reports, matrix reports, drill-down reports, and charts using SQL Server Reporting Services (SSRS).
- Designed rich data visualizations with Tableau 9.4 and dynamic dashboards for business analysis.
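The imbalance-handling and validation steps described above can be sketched roughly as follows, using a synthetic dataset and class_weight="balanced" as one cost-sensitive choice; the dataset, sizes, and hyperparameters are assumptions, not the project's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic imbalanced dataset (~90/10 split) standing in for the real data.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# class_weight="balanced" is one cost-sensitive way to handle class imbalance.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)

# Stratified k-fold preserves the class ratio within each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_pred = cross_val_predict(clf, X, y, cv=cv)

cm = confusion_matrix(y, y_pred)   # 2x2 confusion matrix over all folds
recall = recall_score(y, y_pred)   # recall on the positive (minority) class
```

Recall is the metric to watch when missing the minority class is costly, which is why the validation above reports it alongside the confusion matrix.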
Environment: SQL Server 2012, R programming, Python, MATLAB, SSRS, SSIS, SSAS, SPSS, Tableau, Minitab, Microsoft office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.
Confidential
Data Analyst
Responsibilities:
- Gathered and documented requirements with use cases in Requisite Pro and created different traceability views using MS Visio.
- Worked with Data Compliance teams to identify the most suitable source of record and outlined the data.
- Functioned with project team representatives to ensure that logical and physical ER/Studio data models were established in line with business standards and guidelines.
- Analyzed clients’ data in depth using SQL Server Analysis Services (SSAS).
- Completed the data cleaning process, handling missing values with strategies such as mean replacement and forward/backward fill, removing whole rows/columns/values, removing outliers and errors, and normalizing and scaling the data.
- Implemented metadata repository, maintained data quality, data cleanup procedures, transformations, data standards, data governance program, scripts, stored procedures, triggers and executed test plans.
- Worked with the team to extract mainframe flat files (fixed-width or CSV) onto a UNIX server and convert them into Teradata tables using SQL Server Integration Services (SSIS).
- Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements, connecting to the Teradata database for daily reports.
- Developed visualizations and dashboards using Tableau to present analysis outcomes in terms of patterns, anomalies, and predictions, using bar charts, line plots, scatter plots, 3D plots, and histograms.
- Documented the complete process flow describing program development, logic, testing, implementation, application integration, and coding using SQL Server Reporting Services (SSRS).
- Created customized SQL queries using SQL Server 2008/2008R2 Enterprise to pull specified data for analysis and report building in conjunction with Crystal Reports.
- Designed and developed various ad hoc reports for different business teams (Teradata, MS Access, MS Excel).
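The cleaning strategies above (mean replacement, forward/backward fill, scaling) can be sketched in pandas; all column names and values below are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical policy data with gaps; names are illustrative only.
df = pd.DataFrame({
    "policy":  ["A", "B", "C", "D", "E"],
    "premium": [100.0, np.nan, 120.0, np.nan, 110.0],
    "claims":  [1.0, 2.0, np.nan, 4.0, 5.0],
})

# Strategy 1: replace missing values with the column mean.
df["premium"] = df["premium"].fillna(df["premium"].mean())

# Strategy 2: forward-fill, then backward-fill any leading gaps.
df["claims"] = df["claims"].ffill().bfill()

# Normalize: min-max scale the premium into [0, 1].
span = df["premium"].max() - df["premium"].min()
df["premium_scaled"] = (df["premium"] - df["premium"].min()) / span
```

Which strategy is appropriate depends on the column: mean replacement suits roughly symmetric numeric data, while fill-based methods suit ordered or time-indexed records.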
Environment: SSRS, SSIS, SSAS, SQL Server 2008/2008 R2 Enterprise, Tableau, MS Visio, MS Excel, MS Project, Teradata, Crystal Reports, ER Studio, and Business Objects.