Data Scientist Resume
Holmdel, New Jersey
SUMMARY:
- Strong hands-on experience in Data Science, transforming business requirements into actionable data models, predictive models and informative reporting solutions across a variety of industries including Banking and Manufacturing.
- Expert in the Data Science process life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation, Visualization and Deployment.
- Strong knowledge in Statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA), Sampling Distributions, ANOVA, Chi-Square tests, Time Series, Factor Analysis, Discriminant Analysis.
- Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib and Seaborn.
- Proficient in preprocessing data in Python using Visualization, Data Cleaning, Correlation Analysis, Imputation, Feature Selection, Scaling and Normalization, and Dimensionality Reduction methods.
- Experienced in building various machine learning predictive models using algorithms such as Linear Regression, Logistic Regression, Naïve Bayes Classifier, Support Vector Machines (SVM), Neural Networks, KNN, K-means Clustering, Decision Trees, Ensemble methods (Random Forest, AdaBoost, Gradient Boosting, and Bagging).
- Knowledge in Text Mining, Topic Modelling, Association Rules, Sentiment Analysis, Market Basket Analysis, Recommendation Systems, Natural Language Processing (NLP).
- Knowledge of Time Series Analysis using AR, MA, ARIMA, ARCH and GARCH models.
- Experienced in tuning models using Grid Search, Randomized Search and K-Fold Cross Validation (see the tuning sketch after this summary).
- Experience working with Big Data tools such as Hadoop (HDFS, MapReduce), HiveQL, Sqoop, Pig Latin and Apache Spark (PySpark).
- Extensive experience working with RDBMS such as SQL Server, MySQL, Oracle and NoSQL databases such as MongoDB, Cassandra, HBase.
- Adept in developing and debugging Stored Procedures, User-defined Functions (UDFs), Triggers, Indexes, Constraints, Views, Transactions and Queries using Transact-SQL (T-SQL).
- Proficient in developing and designing ETL packages and reporting solutions using MS BI Suite (SSIS/SSRS).
- Experience in building and publishing interactive reports and dashboards with design customizations based on the client requirements in Tableau, Looker, Power BI and SSRS.
- Proficient in data visualization tools such as Tableau, Python Matplotlib, Python Seaborn, R Shiny, R ggplot2 to create visually powerful and actionable interactive reports and dashboards.
- Knowledge and experience working in Waterfall as well as Agile environments including the Scrum process and using Project Management tools like ProjectLibre, Jira/Confluence and version control tools such as GitHub/Git.
- Self-motivated fast learner, effective team lead and team player, with strong management and communication skills.
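A minimal sketch of the model tuning workflow mentioned above, pairing scikit-learn's GridSearchCV with K-Fold Cross Validation; the estimator, parameter grid and synthetic dataset are illustrative assumptions rather than values from any specific engagement.

```python
# Illustrative Grid Search with K-Fold Cross Validation (assumed values).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Candidate hyperparameters (illustrative values).
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```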
TECHNICAL SKILLS:
Databases: MS SQL Server 2008/2008 R2/2012/2014/2016, MongoDB 3.x, MySQL 5.x, Oracle, HBase, Amazon Redshift, Teradata
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Chi-square test, Chebyshev's inequality
Machine Learning: Linear Regression, Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), Neural Networks, Sentiment Analysis, K-Means Clustering, K-Nearest Neighbors (KNN), Ensemble Methods, Gradient Boosting Trees, AdaBoost, PCA, LDA
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, HiveQL, HDFS, Sqoop, Pig Latin
BI Reporting Tools: Tableau 10.x/9.x, MS SQL Server Integration Services and Reporting Services (SSIS/SSRS), Power BI
Data Visualization: Tableau, Python (Matplotlib, Seaborn), R(ggplot2), Looker, Power BI, QlikView
Languages: Python 2.x/3.x (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R (dplyr, ggplot2, rpart, caret, randomForest, gbm, neuralnet), SQL (T-SQL, MySQL), C++, MATLAB, Octave
Operating Systems: UNIX/UNIX Shell Scripting (via PuTTY client), Linux and Windows XP/7/8/10, Mac OS
Other tools and technologies: Azure ML Studio, Google TensorFlow, Apache Tomcat Webserver, MS Office Suite, Lucid Chart, StatTools, ProjectLibre, Google Analytics, Google Tag Manager, Salesforce, MS SharePoint, Trello, JIRA, Confluence, GitHub/Git, AWS (EC2/S3/Redshift/EMR/Lambda)
PROFESSIONAL EXPERIENCE:
Confidential, Holmdel, New Jersey
Data Scientist
- Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions
- Built machine learning models to predict the lifetime value of the bank's customers using supervised and unsupervised learning methods.
- Extracted data from source systems such as SQL Server and the Hadoop HDFS file system into an Oracle database.
- Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
- Tackled a highly imbalanced fraud dataset using sampling techniques such as undersampling and oversampling with SMOTE (Synthetic Minority Over-Sampling Technique) via the scikit-learn-compatible imbalanced-learn package in Python (a sketch follows this role's technology stack).
- Applied PCA and other feature engineering techniques to reduce high-dimensional data, along with feature normalization and label encoding, using the Scikit-learn library in Python.
- Used Pandas, NumPy, Seaborn, Matplotlib and Scikit-learn in Python to develop machine learning models such as Gradient Boosting, Lasso/Ridge Regression, Random Forest and stepwise regression.
- Worked with Amazon Web Services cloud offerings to run machine learning on big data, including AWS Lambda functions.
- Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
- Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
- Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
- Deployed the model on AWS EC2 using Flask (see the deployment sketch below).
- Created and maintained Tableau reports to display the status and performance of the deployed model and algorithm.
Technology Stack: Hadoop 2.x, HDFS, Hive, Pig Latin, Oracle, MS SQL Server, Apache Spark/PySpark/MLlib, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), Jupyter Notebook, Spyder, AWS, GitHub, Linux, Tableau.
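A minimal sketch of the SMOTE resampling step from this role, assuming the scikit-learn-compatible imbalanced-learn package; the synthetic dataset merely stands in for the actual fraud data.

```python
# Illustrative rebalancing of an imbalanced dataset with SMOTE.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic, highly imbalanced data standing in for the fraud dataset.
X, y = make_classification(n_samples=10_000, weights=[0.98, 0.02], random_state=42)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolation.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_res))
```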
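And a minimal sketch of serving a trained model with Flask, as in the EC2 deployment bullet; the model artifact name and endpoint path are hypothetical.

```python
# Hypothetical Flask scoring service; "model.pkl" is an assumed artifact name.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the previously trained scikit-learn model from disk.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[...], [...]]}.
    features = request.get_json()["features"]
    preds = model.predict(features).tolist()
    return jsonify({"predictions": preds})

if __name__ == "__main__":
    # On EC2 this would typically run behind a production WSGI server.
    app.run(host="0.0.0.0", port=5000)
```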
Confidential, Arlington, TX
Data Scientist
- Communicated and coordinated with end client for collecting data and performed ETL to define the uniform standard format.
- Queried and retrieved data from Oracle database servers to get the sample dataset.
- In the preprocessing phase, used Pandas to remove or replace missing data and balanced the dataset by oversampling the minority label class and undersampling the majority label class.
- Applied Scikit-learn preprocessing techniques including PCA, feature engineering, feature normalization and label encoding to reduce high-dimensional data (>150 features) built from full patient visit history, proprietary comorbidity flags and comorbidity scores derived from over 12 million EMR and claims records.
- In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into patient admission and discharge data.
- Experimented with predictive models including Logistic Regression, Support Vector Machine (SVC), Gradient Boosting and Random Forest using Python Scikit-learn to predict whether a patient might be readmitted.
- Designed and implemented cross-validation and statistical tests, including ANOVA and Chi-square tests, to verify the models' significance.
- Implemented, tuned and tested the model on AWS EC2 with the best performing algorithm and parameters.
- Set up a data preprocessing pipeline to guarantee consistency between the training data and incoming data (see the pipeline sketch after this role's technology stack).
- Deployed the model on AWS Lambda.
- Collected feedback after deployment and retrained the model to improve its performance.
- Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau Desktop.
- Used Agile methodology and the Scrum process for project development.
Technology Stack: AWS EC2, S3, Oracle DB, AWS Lambda, Linux, Python (Scikit-Learn/NumPy/Pandas/Matplotlib), Machine Learning (Logistic Regression/Support Vector Machine/Gradient Boosting/Random Forest), Tableau
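A minimal sketch of the kind of preprocessing pipeline referenced in this role, using a scikit-learn Pipeline so that transformations fit on training data are applied identically to incoming data; the column names and steps are assumptions for illustration.

```python
# Hypothetical preprocessing + model pipeline; column names are assumed.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["num_prior_visits", "comorbidity_score"]   # assumed names
categorical_cols = ["admission_type"]                      # assumed name

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# pipe.fit(train_df, train_labels)   # fit once on training data
# pipe.predict(incoming_df)          # identical transforms on new data
```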
Confidential, Atlanta, GA
Data Scientist
- Modeled and simulated the warranty and lease operations of an electromechanical RMC plant using machine learning, statistical modelling and WITNESS simulation software.
- Implemented queuing theory concepts to model the system, verified and validated the model using statistical techniques.
- Measured performance statistics such as number of products shipped, WIP and idle time, and analyzed the primary cost required to run the facility.
- Performed cost analysis, identified bottlenecks in the production process and devised effective solutions that reduced facility cost by up to 7%.
- Utilized decision theory and linear programming methods such as simplex and dual-simplex to find optimal solutions for shift schedules and logistics (see the sketch after this list).
- Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
- Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
- Explored and analyzed the customer specific features by using Matplotlib in Python and ggplot2 in R.
- Performed data imputation using Scikit-learn package in Python.
- Participated in feature engineering such as feature generation, PCA, feature normalization and label encoding with Scikit-learn preprocessing.
- Acquired roughly 120k records from various sources and performed querying operations to obtain the data required for the analysis.
- Loaded the data into SAS, analyzed the dataset and prepared prediction models for various target variables.
- Log & (1/x) Transformations were used before creating prediction models and eliminated outliers using partial regression plots.
- Selected the best model from all candidates using techniques such as forward selection, backward elimination and the stepwise approach.
- Used F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate the performance of different models (a metrics sketch also follows this list).
- Used Python 3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) and R (caret, trees, arules) to develop a variety of models and algorithms for analytic purposes.
- Provided delivery recommendations on optimal shift schedules, material handling solutions, staffing and purchases of additional equipment as a function of service demand and production control logic.
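A minimal sketch of the linear programming approach named in the shift scheduling bullet, using scipy.optimize.linprog as a stand-in for the simplex/dual-simplex solvers; every number below is an illustrative assumption.

```python
# Hypothetical shift-scheduling LP; all figures are made-up assumptions.
import numpy as np
from scipy.optimize import linprog

# A[p, s] = 1 if shift s covers period p (three overlapping shifts).
A = np.array([
    [1, 0, 1],   # period 1 covered by shifts 1 and 3
    [1, 1, 0],   # period 2 covered by shifts 1 and 2
    [0, 1, 1],   # period 3 covered by shifts 2 and 3
])
demand = np.array([4, 6, 5])       # staff required per period
cost = np.array([1.0, 1.0, 1.2])   # relative cost per worker per shift

# Minimize total cost subject to A @ x >= demand (linprog expects <=).
res = linprog(c=cost, A_ub=-A, b_ub=-demand, bounds=[(0, None)] * 3)
print(res.x, res.fun)
```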
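And a minimal sketch of the evaluation metrics listed above, computed with scikit-learn on a synthetic train/test split; the classifier choice is arbitrary.

```python
# Illustrative model evaluation with F-score, AUC/ROC, confusion matrix,
# precision and recall on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    confusion_matrix, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]   # scores for the positive class

print("confusion matrix:\n", confusion_matrix(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))
print("recall:", recall_score(y_te, y_pred))
print("F1:", f1_score(y_te, y_pred))
print("ROC AUC:", roc_auc_score(y_te, y_prob))
```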
Confidential
BI Developer
- Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Experimented with and built predictive models, including ensemble models, using machine learning algorithms such as Logistic Regression, Random Forests and KNN to predict output requirements from demand.
- Conducted analysis of operator behaviors and discovered the value of unaccounted time with RFM analysis; applied Six Sigma/Lean implementations with clustering algorithms such as K-Means Clustering, Gaussian Mixture Models and Hierarchical Clustering.
- Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models, utilizing collaborative filtering techniques to recommend courses to different customers (a sketch follows this role's technology stack).
- Designed rich data visualizations to model data into human-readable form with Tableau.
Technology Stack: SQL Server 2008 R2, SQL Server Management Studio, Microsoft BI Suite (SSIS/SSRS), T-SQL, Visual Studio 2010, AWS Redshift, Hadoop, HDFS, Python 3.x (Scikit-learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), R (ggplot2/caret/trees), Tableau 9.x/10.x, Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/Gaussian Mixture Model/Hierarchical Clustering/Ensemble Methods/Collaborative Filtering), JIRA, GitHub, Agile/Scrum
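A minimal sketch of item-based collaborative filtering along the lines of the recommendation system described above; the interaction matrix is synthetic and the recommend helper is a hypothetical name.

```python
# Hypothetical item-based collaborative filtering on a user-course matrix.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = courses; 1 = interacted (synthetic data).
interactions = np.array([
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0],
])

# Course-to-course similarity derived from co-interaction patterns.
item_sim = cosine_similarity(interactions.T)

def recommend(user_idx, top_n=2):
    """Score unseen courses by similarity to the user's courses."""
    seen = interactions[user_idx].astype(bool)
    scores = item_sim[:, seen].sum(axis=1)
    scores[seen] = -np.inf   # never re-recommend a seen course
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))   # top course indices for the first user
```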