Data Scientist Resume

Fishers, IN

PROFESSIONAL SUMMARY:

  • Over 6 years of professional IT experience in Web Applications, Machine Learning, Deep Learning, Reinforcement Learning, Statistics, transforming large datasets of structured and unstructured data, Data Engineering, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive use of Natural Language Processing (NLP) for text analytics and web scraping, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using Python 3.4.3/2.7, R 3.2.2, SAS, SQL, PL/SQL, and Tableau 9.4.
  • Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Bagging, Boosting, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in statistical modeling and machine learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, under both Lean Six Sigma and Agile methodologies.
  • Adept in statistical programming languages such as R 3.2.5, MATLAB, and Python 3.4.3/2.7.3, as well as Big Data technologies such as Hadoop and Hive.
  • Experienced in translating data-driven insights, combined with domain knowledge, into business stories and dashboards using traditional presentation tools as well as Tableau 9.4, R 3.2.2 programming, and Python 3.4.3.
  • Expert in all activities related to the development, implementation, administration, and support of ETL processes for large-scale Data Warehouses using Data Transformation Services (DTS) and SQL Server Integration Services (SSIS) with MS SQL 2014/2012/2008 R2.
  • Solid understanding of RDBMS concepts, including normalization, with hands-on experience creating database objects such as tables, views, stored procedures, triggers, row-level audit tables, cursors, indexes, user-defined data types, and functions.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Power PivotTables, Power View, and OLAP reporting using SQL Server Reporting Services (SSRS).
  • Good understanding of Microsoft SQL Server Management Studio and the Teradata data load/export utilities BTEQ, FastLoad, MultiLoad, and FastExport, and of performance tuning to improve data extraction speed.

TECHNICAL SKILLS:

Machine Learning Algorithms: Multivariate regression analysis, SVM, Cluster Analysis, NLP, Deep Learning, Parametric and Non-Parametric Tests, Analysis of Variance (ANOVA), Polynomial regression, Decision trees, Random Forest, Apriori, Market Basket Analysis (MBA), Factor Analysis, Principal Component Analysis, K-Nearest Neighbors, Reinforcement Learning.

Programming Languages: Python 3.4.3/2.7, R 3.2.2, SQL, PL/SQL, SAS, C, C++, and Java.

Databases: SQL Server 2014/2012/2008 R2, Oracle 11g/10g, MS Access, DB2, and Teradata.

Tools: HQL, Pig, Apache Spark, Microsoft Visual Studio, Minitab, Arena, SPSS, Microsoft SQL Server Integration Services, Microsoft SQL Server Analysis Services, Microsoft SQL Server Reporting Services, and Oracle Data Integrator.

Environment: Jupyter, Spyder, Python Console, PyCharm, R Studio, and Anaconda.

Operating Systems: Windows, Linux, and Unix.

SQL Server Business Intelligence Development: SSIS, SSAS, SSRS, Informatica, Alteryx and SAP Business Objects and Business Intelligence.

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Power Pivot, Power Query, Power Map, and Power View.

WORK EXPERIENCE:

Confidential, Fishers, IN

Data Scientist

Responsibilities:

  • Responsible for applying machine learning regression and classification techniques to predict outcomes.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modeling.
  • Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
  • Took part in a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Involved in business process modeling using UML.
  • Identified and executed process improvements, working hands-on with technologies such as Oracle, Informatica, and Business Objects.
  • Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
  • Participated in business and stakeholder meetings to understand business needs and requirements.
  • Participated in solution architecture meetings and provided guidance on dimensional data model design.
  • Designed logical and physical data models using the MS Visio 2003 data modeling tool.
  • Coordinated and communicated with technical teams on data requirements.

Environment: Python, ANN, Regression, Spark MLlib, Oryx 2, Naive Bayes, K-means, SVM, Accord.NET, Flask, ORM, Jinja2, Django, Mako, Amazon Machine Learning (AML), Apache.

Confidential, Alpharetta, GA

Data Scientist

Responsibilities:

  • Performed data requirements analysis to transform data according to business requirements.
  • Applied forward elimination and backward elimination to data sets to identify the most statistically significant variables for data analysis.
  • Utilized label encoders in Python to create dummy variables for geographic locations and measured their impact on pre-acquisition and post-acquisition results using a paired two-sample t-test.
  • Used SQL Server Integration Services (SSIS) ETL for data investigation and mapping to extract data, and applied fast parsing, improving efficiency by 17%.
  • Developed Data Science content involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, and ETL for data extraction.
  • Built analytical systems and data structures, and gathered and manipulated data using statistical techniques.
  • Designed a suite of interactive dashboards that made it possible, for the first time, to scale and measure HR department statistics, and scheduled and published reports.
  • Created data presentations that reduced bias and told an accurate story about people, pulling millions of rows of data with SQL and performing Exploratory Data Analysis.
  • Applied breadth of knowledge in programming (Python, R), Descriptive, Inferential, and Experimental Design statistics, advanced mathematics, and database functionality (SQL, Hadoop).
  • Migrated data from heterogeneous data sources and legacy systems (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
  • Applied descriptive and inferential statistics to various data attributes using SPSS to draw insights about the products and services provided to patients.
  • Developed machine learning models such as collaborative filtering, neural networks, a hybrid recommendation model, and NLP, and applied them to patient data sets to obtain better predictive insights.
  • Rapidly evaluated machine learning algorithms and deep learning frameworks, tools, techniques, and approaches for deployment on AWS services such as S3, EC2, EMR, RDS, and Redshift.
  • Utilized data reduction techniques such as factor analysis to identify the variables most correlated with the underlying factors of the data, and categorized the variables according to those factors.
  • Developed MapReduce/Spark Python modules for Machine Learning & Predictive analytics in Hadoop.
  • Performance tuning: analyzed requirements and fine-tuned stored procedures and queries to improve application performance.
  • Used Spark Streaming to divide streaming data into batches for batch processing.
  • Developed various Tableau 9.4 data models by extracting data from various sources, including DB2, Excel, flat files, and big data sources.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and HBase, and loaded data into HDFS using HQL queries in Hadoop.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.

Environment: Python, R Programming, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, Spark, Business Intelligence Development Studio, MS Access, SAP Business Objects and Business Intelligence.

Confidential, Carmel, IN

Data Scientist

Responsibilities:

  • Involved in gathering, analyzing and translating business requirements into analytic approaches.
  • Worked with machine learning algorithms such as neural network models, regression models (linear, logistic, etc.), SVMs, and decision trees to classify groups and analyze the most significant variables, such as global active power, reactive power, voltage, and global intensity.
  • Utilized SAS to develop Pareto charts identifying the highest-impact categories in modules to determine workforce distribution, and created various data visualization charts.
  • Performed univariate, bivariate, and multivariate analysis of approximately 4,890 tuples using bar charts, box plots, and histograms.
  • Participated in feature engineering, including feature creation, feature scaling, and one-hot encoding with Scikit-learn.
  • Converted raw data to processed data by merging data sets and identifying outliers, errors, trends, missing values, and distributions in the data.
  • Generated detailed reports after validating graphs in R and adjusting variables to fit the model.
  • Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
  • Applied descriptive and inferential statistics to logistics optimization, transmission power, and value throughput data at a 95% confidence interval.
  • Wrote MapReduce code to process and parse data from various sources, and stored the parsed data in HBase and Hive using HBase-Hive integration.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms such as Neural network models, Linear Regression, multivariate regression, naïve Bayes, Random Forests, decision trees, SVMs, K-means and KNN for data analysis.
  • Used Spark and Spark-SQL/Streaming for faster testing and processing of data.
  • Responsible for developing a data pipeline using AWS S3 to extract data and store it in HDFS, and for deploying all implemented machine learning models.
  • Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relationships between variables.
  • Developed Spark Scala code to cleanse and process the data at different stages of the data pipeline.
  • Imported data from APIs and web APIs using Power Query, created relationships between data tables using Power Pivot, and used Power Map and Power View to present the data.
  • Worked on business forecasting, segmentation analysis, and data mining, and prepared management reports that defined the problem, documented the analysis, and recommended courses of action to determine the best outcomes.
  • Experience with risk analysis, root cause analysis, cluster analysis, correlation analysis, optimization, and the K-means algorithm for clustering data into groups.

Environment: SQL Server 2012, Jupyter, R 3.1.2, Python, MATLAB, SSRS, SSIS, SSAS, Spark, MongoDB, HBase, HDFS, Hive, Pig, Microsoft office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.

Confidential, Princeton, NJ

Data Analyst

Responsibilities:

  • Developed complex SQL queries using GROUP BY, JOIN, and WHERE clauses to answer testers' questions.
  • Communicated and coordinated with other departments to gather business requirements, and developed data flow mappings to load the OLAP server for analysis of policy details.
  • Worked on missing value imputation and outlier identification using Random Forest and box plots.
  • Tackled a highly imbalanced dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms.
  • Improved prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Utilized parametric and non-parametric tests in SPSS to draw insights from data for business decisions.
  • Implemented machine learning models (logistic regression, XGBoost) with Python Scikit-learn.
  • Validated and selected models using k-fold cross validation, confusion matrices and worked on optimizing models for high recall rate.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Experience with routine DBA activities like Query Optimization, Performance Tuning and Effective SQL Server configuration for better performance and cost reduction.
  • Developed tabular reports, subreports, matrix reports, drill-down reports, and charts using SQL Server Reporting Services (SSRS).
  • Designed rich data visualizations with Tableau 9.4 and dynamic dashboards for business analysis.

Environment: SQL Server 2012, R programming, Python, MATLAB, SSRS, SSIS, SSAS, SPSS, Minitab, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.

Confidential, NJ

Data Analyst

Responsibilities:

  • Used Microsoft Visio and Rational Rose to design use case diagrams, class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
  • Worked with other teams to analyze customer data and marketing parameters.
  • Conducted design reviews and technical reviews with other project stakeholders.
  • Was part of the complete project life cycle, from requirements through production support.
  • Created test plan documents for all back-end database modules
  • Used MS Excel, MS Access and SQL to write and run various queries.
  • Used a traceability matrix to trace the organization's requirements.
  • Recommended structural changes and enhancements to systems and databases.
  • Worked with the testing team on system testing, integration testing, and UAT.
  • Ensured the quality of deliverables.

Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.