Data Scientist Resume

NJ

SUMMARY:

  • Over 8 years of experience in Machine Learning and Data Mining with large structured and unstructured data sets, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into models, algorithms, and data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Proficient in Machine Learning techniques (Decision Trees, Linear and Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles (see the sketch after this list).
  • Experience in designing visualizations using Tableau software and Storyline on web and desktop platforms, publishing and presenting dashboards.
  • Experience with advanced SAS programming techniques, such as PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Expertise in Python programming with various packages including NumPy, Pandas, SciPy and Scikit-Learn.
  • Proficient in data visualization tools such as Tableau, Plotly, Python Matplotlib and Seaborn.
  • Familiar with Hadoop Ecosystem such as HDFS, HBase, Hive, Pig and Oozie.
  • Experienced in building models using Spark (PySpark, Spark SQL, Spark MLlib, and Spark ML).
  • Experienced in cloud services such as AWS EC2, EMR, RDS, and S3 to support big data tools, solve data storage issues, and work on deployment solutions.
  • Worked on deployment tools such as Azure Machine Learning Studio, Oozie, and AWS Lambda.
  • Proficient in Java, Python, R, C/C++, SQL, and Tableau.
  • Worked and extracted data from various database sources like Oracle, SQL Server and Teradata.
  • Experience in foundational machine learning models and concepts (regression, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning).
  • Skilled in System Analysis, Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Facilitated and helped translate complex quantitative methods into simplified solutions for users.
  • Knowledgeable in proofs of concept and gap analysis; gathered necessary data for analysis from different sources and prepared it for exploration using data munging.
  • Tools: Git, Java, MySQL, MongoDB, Neo4j, AngularJS, SPSS, Tableau.
  • Excellent knowledge of the Hadoop Ecosystem and Big Data tools such as Pig, Hive and Spark.
  • Worked with different data formats such as JSON and XML and performed machine learning algorithms in Python.
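
Below is a minimal sketch of the kind of scikit-learn workflow described in the bullets above; the synthetic dataset and model settings are illustrative assumptions, not code from an actual engagement:

    # Minimal supervised-learning sketch: train and evaluate a Random Forest.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for a real business dataset.
    X, y = make_classification(n_samples=1000, n_features=25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("Holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))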

TECHNICAL SKILLS:

Exploratory Data Analysis: Univariate/Multivariate outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau

Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGBoost, Deep Neural Networks, Bayesian Learning

Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low-Rank Matrix Factorization

Feature Selection: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods

Statistical Tests: t-tests, Chi-square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA

Sampling Methods: Bootstrap sampling and Stratified sampling

Model Tuning/Selection: Cross-validation, Walk-forward estimation, AIC/BIC criteria, Grid search and Regularization (illustrated in the sketch after this list)

Time Series: ARIMA, Holt-Winters, Exponential smoothing, Bayesian structural time series

Machine Learning/Deep Learning (R): caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot

SAS: Forecast Server, SAS procedures and data steps

Spark: MLlib, GraphX

SQL: Subqueries, joins, DDL/DML statements

Databases/ETL/Query: Teradata, SQL Server, Redshift, Postgres and Hadoop (MapReduce); SQL, Hive, Pig and Alteryx

Visualization: Tableau, ggplot2 and RShiny

Prototyping: PowerPoint, RShiny and Tableau
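
As a brief illustration of the Model Tuning/Selection entry above, a minimal sketch of cross-validated grid search in scikit-learn; the parameter grid and synthetic data are illustrative assumptions:

    # Cross-validated grid search over a small gradient-boosting parameter grid.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)
    param_grid = {"n_estimators": [100, 200], "max_depth": [2, 3], "learning_rate": [0.05, 0.1]}

    search = GridSearchCV(GradientBoostingClassifier(random_state=42), param_grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)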

PROFESSIONAL EXPERIENCE:

Confidential, NJ

Data Scientist

Responsibilities:

  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib (see the sketch after this list).
  • Served as solutions architect for transforming business problems into Big Data and Data Science solutions, and defined the Big Data strategy and roadmap.
  • Identified areas of improvement in the existing business by unearthing insights from vast amounts of data using machine learning techniques with TensorFlow, Scala, Spark, MLlib, Python, and other tools and languages as needed.
  • Created and validated machine learning models with Azure Machine Learning.
  • Designed a machine learning pipeline using Microsoft Azure Machine Learning to predict and prescribe, and implemented a machine learning scenario for a given data problem.
  • Used Scala for coding the components in Play and Akka.
  • Worked on different machine learning models such as Logistic Regression, Multilayer Perceptron classifier, and K-Means clustering by creating Scala SBT packages and running them in the Spark shell (Scala), and built an autoencoder model using R.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources for users.
  • Created detailed AWS Security Groups, which behaved as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
  • Wrote scripts and an indexing strategy for a migration to Redshift from Postgres 9.2 and MySQL databases.
  • Wrote Kinesis agents to pipe data from streaming app into S3.
  • Good knowledge of Azure cloud services, Azure Storage, Azure Active Directory, and Azure Service Bus; created and managed Azure AD tenants, configured application integration with Azure AD, and integrated on-premises Windows AD with Azure Active Directory.
  • Working knowledge of Azure Fabric, microservices, IoT and Docker containers in Azure; Azure infrastructure management and PaaS solution architecture (Azure AD, Licenses, Office 365, DR on cloud using Azure Recovery Vault, Azure Web Roles, Worker Roles, SQL Azure, Azure Storage).
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction; utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Designed and developed NLP models for sentiment analysis.
  • Led discussions with users to gather business process and data requirements to develop a variety of conceptual, logical, and physical data models. Expert in Business Intelligence and data visualization tools: Tableau, MicroStrategy.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for a new route, and performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Worked on machine learning over large-scale data using Spark and MapReduce.
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-Means clustering, Naive Bayes, and other approaches.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS and OLAP.
  • Extracted, transformed, and loaded data sources to generate CSV data files with Python programming and SQL queries.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Worked on Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad and FastExport.
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Used data warehousing concepts such as the Ralph Kimball and Bill Inmon methodologies, OLAP, OLTP, Star Schema, Snowflake Schema, Fact Tables and Dimension Tables.
  • Refined time-series data and validated mathematical models using analytical tools like R and SPSS to reduce forecasting errors.
  • Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for the missing values in the dataset using Python.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
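
A minimal sketch of a Spark ML pipeline of the kind described in the first bullet above, assuming PySpark is available; the input path and column names are hypothetical:

    # Assemble feature columns and fit a logistic regression in a Spark ML pipeline.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("ml-example").getOrCreate()
    df = spark.read.csv("data.csv", header=True, inferSchema=True)  # hypothetical input file

    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")  # hypothetical columns
    lr = LogisticRegression(featuresCol="features", labelCol="label")

    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.transform(df).select("label", "prediction").show(5)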

Environment: Hortonworks - Hadoop MapReduce, PySpark, Spark, R, Spark MLlib, Tableau, Informatica, SQL, Excel, VBA, BO, CSV, Erwin, SAS, AWS Redshift, ScalaNLP, Cassandra, Oracle, MongoDB, Cognos, SQL Server 2012, Teradata, DB2, SPSS, T-SQL, PL/SQL, Flat Files, and XML.

Confidential, Boston, MA

Data Scientist

Responsibilities:

  • Worked on machine learning projects based on Python, SQL, Spark, and advanced SAS programming; performed data exploration, data visualization, and feature selection.
  • Applied machine learning algorithms, including random forest, boosted trees, SVM, SGD, neural networks, and deep learning using CNTK and TensorFlow.
  • Performed big data analytics with Hadoop, HiveQL, Spark RDDs, and Spark SQL.
  • Tested Python/SAS on AWS cloud service and CNTK modeling on MS-Azure cloud service.
  • Created UI using JavaScript and HTML5/CSS.
  • Developed and tested many features for dashboard using Python, Bootstrap, CSS, and JavaScript.
  • Interacted with the ETL and BI teams to understand and support various ongoing projects.
  • Extensively used MS Excel for data validation.
  • Extensively used open source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning models.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation (see the sketch after this list).
  • Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
  • Developed Spark/Scala, R, and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Tracked operations using sensors until certain criteria were met using Airflow technology.
  • Responsible for different data mapping activities from source systems to Teradata using utilities like TPump, FastExport, BTEQ, MultiLoad, FastLoad, etc.
  • Developed PL/SQL procedures and functions to automate billing operations, customer barring, and number generation.
  • Redesigned the workflows for Service Requests and Bulk Service Orders using UNIX cron jobs and PL/SQL procedures, thereby reducing order processing time; average slippages per month dropped by 40%.
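
A minimal sketch of an Airflow DAG with a simple task dependency, as referenced in the Airflow bullet above; this assumes Airflow 2.x, and the task callables are hypothetical placeholders:

    # Define a two-task daily DAG: extract, then load.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pulling source data")  # placeholder for a real extraction step

    def load():
        print("loading into the warehouse")  # placeholder for a real load step

    with DAG(dag_id="example_etl", start_date=datetime(2023, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task  # load runs only after extract succeeds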

Environment: Data Governance, SQL Server, Python, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Visio, HP ALM, Agile, Azure, Data Quality, Tableau and Reference Data Management.

Confidential - Providence, RI

Data Analyst

Responsibilities:

  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analysis and data manipulation.
  • Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables. Responsible for data mining.
  • Utilized label encoders in Python to convert non-numerical significant variables to numerical ones and identify their impact on pre-acquisition and post-acquisition metrics using a two-sample paired t-test (see the sketch after this list).
  • Worked with ETL / SQL Server Integration Services (SSIS) for data investigation and mapping to extract data, applied fast parsing, and enhanced efficiency by 17%.
  • Developed data science content involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, and ETL for data extraction.
  • Designed a suite of interactive dashboards, which provided an opportunity to scale and measure HR department statistics that was not possible earlier, and scheduled and published reports.
  • Created data presentations to reduce bias and tell the true story of people, pulling millions of rows of data using SQL and performing exploratory data analysis.
  • Applied breadth of knowledge in programming (Python, R), Descriptive, Inferential, and Experimental Design statistics, advanced mathematics, and database functionality (SQL, Hadoop).
  • Migrated data from Heterogeneous Data Sources and legacy system (DB2, Access, Excel) to centralized SQLServer databases using SQLServer Integration Services (SSIS).
  • Involved in defining source-to-target business rules, data mappings, and data definitions.
  • Performed data validation/reconciliation between disparate source and target systems for various projects.
  • Utilized a diverse array of technologies and tools as needed, to deliver insights such as R, SAS, Matlab, Tableau and more.
  • Built Regression model to understand order fulfillment time lag issue using Scikit-learn in Python.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Used T-SQL queries to pull the data from disparate systems and Data warehouse in different environments.
  • Worked closely with the Data Governance Office team in assessing the source systems for project deliverables.
  • Extracted data from different databases as per the business requirements using SQL Server Management Studio.
  • Interacted with the ETL and BI teams to understand and support various ongoing projects.
  • Extensively used MS Excel for data validation.
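
A minimal sketch of the label-encoding and paired t-test step referenced above, assuming scikit-learn and SciPy; the sample values are hypothetical:

    # Encode a categorical variable, then test pre- vs post-acquisition values.
    import pandas as pd
    from scipy import stats
    from sklearn.preprocessing import LabelEncoder

    df = pd.DataFrame({
        "segment": ["retail", "wholesale", "retail", "online", "wholesale"],
        "pre_acquisition": [10.2, 12.5, 9.8, 11.1, 12.0],
        "post_acquisition": [11.0, 13.1, 10.4, 12.0, 12.6],
    })

    # Convert the non-numerical variable to numeric codes for modeling.
    df["segment_code"] = LabelEncoder().fit_transform(df["segment"])

    # Paired t-test: the same accounts measured before and after acquisition.
    t_stat, p_value = stats.ttest_rel(df["pre_acquisition"], df["post_acquisition"])
    print(t_stat, p_value)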

Environment: Data Governance, SQL Server, Python, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Visio, HP ALM, Agile, Azure, MDM, SharePoint, Data Quality, Tableau and Reference Data Management.

Confidential - Memphis, TN

Data Modeler

Responsibilities:

  • Implemented user interface guidelines and standards throughout the development and maintenance of the website using HTML, CSS, JavaScript and jQuery.
  • Used Django to interface with jQuery UI and manage the storage and deletion of content.
  • Used Hive queries for data analysis to meet the business requirements.
  • Involved with advanced CSS concepts and building table-free layouts.
  • Used advanced packages like Mock, patch, and Beautiful Soup (bs4) to perform unit testing (see the sketch after this list).
  • Used the Pandas library for statistical analysis.
  • Used NumPy for numerical analysis of insurance premiums.
  • Worked on rebranding the existing web pages to clients according to the type of deployment.
  • Created UI using JavaScript and HTML5/CSS.
  • Developed and tested many features for dashboard using Bootstrap, CSS, and JavaScript.
  • Managed a small team of programmers using a modified version of agile development.
  • Worked on Jenkins continuous integration tool for deployment of project.
  • Worked on updating the existing clipboard to have the new features as per the client requirements.
  • Performed Unit testing, Integration Testing, GUI and web application testing using Selenium.
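
A minimal sketch of unit testing with unittest.mock and Beautiful Soup, as referenced above; the fetch_html helper and its patched return value are hypothetical:

    # Patch out the network call and test HTML parsing with Beautiful Soup.
    import unittest
    from unittest.mock import patch
    from bs4 import BeautifulSoup

    def fetch_html(url):
        # Stand-in for a real HTTP call; patched out in the test below.
        raise NotImplementedError

    def extract_title(url):
        return BeautifulSoup(fetch_html(url), "html.parser").title.get_text()

    class ExtractTitleTest(unittest.TestCase):
        @patch(f"{__name__}.fetch_html", return_value="<html><title>Quote</title></html>")
        def test_extract_title(self, mock_fetch):
            self.assertEqual(extract_title("http://example.com"), "Quote")
            mock_fetch.assert_called_once()

    if __name__ == "__main__":
        unittest.main()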

Environment: Django, HTML5, CSS, XML, Kafka, MySQL, JavaScript, Angular JS, Backbone JS, Nginx server, Amazon S3, Jenkins, Beautiful Soup, Eclipse, Git, GitHub, Linux, and Mac OS X.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Analyzed data sources and requirements and business rules to perform logical and physical data modeling.
  • Analyzed and designed best fit logical and physical data models and relational database definitions using DB2. Generated reports of data definitions.
  • Involved in Normalization/De-normalization, Normal Form and database design methodology.
  • Maintained existing ETL procedures, fixed bugs and restored software to production environment.
  • Developed the code as per the client's requirements using SQL, PL/SQL and Data Warehousing concepts.
  • Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Worked with Data Warehouse Extract and load developers to design mappings for Data Capture, Staging, Cleansing, Loading, and Auditing.
  • Developed an enterprise data model management process to manage multiple data models developed by different groups.
  • Designed and created Data Marts as part of a data warehouse.
  • Wrote complex SQL queries to validate the data against different kinds of reports generated by Business Objects XIR2 (see the sketch after this list).
  • Used the Erwin modeling tool to publish a data dictionary, review the model and dictionary with subject matter experts, and generate data definition language.
  • Coordinated with DBAs in implementing database changes and updating data models with changes implemented in development, QA, and production; worked extensively with the DBA and reporting teams to improve report performance through appropriate indexes and partitioning.
  • Developed data mapping, transformation, and cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP.
  • Tuned and optimized code using different techniques such as dynamic SQL, dynamic cursors, SQL query tuning, and writing generic procedures, functions, and packages.
  • Experienced in GUI and Relational Database Management System (RDBMS) work, designing OLAP system environments, and report development.
  • Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
  • Prepared weekly, biweekly, and monthly data analysis reports using MS Excel, SQL and UNIX.
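
A minimal sketch of the kind of SQL validation check referenced above, run from Python via pyodbc; the DSN and table names are hypothetical:

    # Compare row counts between a staging table and its target table.
    import pyodbc

    conn = pyodbc.connect("DSN=warehouse;UID=user;PWD=secret")  # hypothetical DSN
    cursor = conn.cursor()

    cursor.execute("SELECT COUNT(*) FROM stg_customer")
    staging_count = cursor.fetchone()[0]
    cursor.execute("SELECT COUNT(*) FROM dim_customer")
    target_count = cursor.fetchone()[0]

    print("load complete" if staging_count == target_count else "row-count mismatch")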

Environment: ER Studio, Informatica Power Center 8.1/9.1, Power Connect/Power Exchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza, and Aginity.

Confidential

SQL developer

Responsibilities:

  • Responsible for the study of SAS Code, SQL Queries, Analysis enhancements and documentation of the system.
  • Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
  • Participated in brainstorming sessions and proposed hypotheses, approaches, and techniques.
  • Analyzed data collected in stores (JCL jobs, stored procedures, and queries) and provided reports to the business team by storing the data in Excel/SPSS/SAS files.
  • Performed Analysis and Interpretation of the reports on various findings.
  • Responsible for production support, abend resolution, and other production support activities, and compared seasonal trends in the data using Excel.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP in order to analyze the data.
  • Successfully implemented migration of the client's application from the Test/DSS/Model regions to production.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Provided complete analysis of trends in the financial time series data (see the sketch after this list).
  • Performed various statistical tests to give the client a clear understanding of the data.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Provided complete support to all regions (Test/Model/System/Regression/Production).
  • Actively involved in Analysis, Development, and Unit testing of the data.
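
A minimal sketch of a financial time-series trend summary, as referenced above; the monthly series here is synthetic stand-in data:

    # Smooth a monthly series with a rolling mean and compute year-over-year change.
    import numpy as np
    import pandas as pd

    idx = pd.date_range("2015-01-01", periods=36, freq="MS")
    values = np.linspace(100, 160, 36) + np.random.default_rng(0).normal(0, 3, 36)
    series = pd.Series(values, index=idx)

    trend = series.rolling(window=12).mean()   # 12-month rolling mean exposes the trend
    yoy_change = series.pct_change(12) * 100   # year-over-year percent change
    print(trend.dropna().tail(3))
    print(yoy_change.dropna().tail(3))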

Environment: R/RStudio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, and Outlook.
