Data Scientist Resume
South Plainfield, NJ
SUMMARY:
- A passionate, team-oriented Data Scientist with over 6 years of experience in Statistical Modeling, Data Mining, Data Visualization, and Machine Learning, with rich domain knowledge in the Retail, Healthcare, and Banking industries.
- Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, developing data mining and reporting solutions across a massive volume of structured and unstructured data.
- Involved in the entire data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing, and Deployment.
- Proficient in Machine Learning algorithms and Predictive Modeling including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Neural Networks, Random Forest, Ensemble Models, SVM, KNN and K-means clustering.
- Solid experience in Deep Learning techniques with Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), max pooling, normalization, and architectures such as AlexNet, VGG, and Darknet.
- Excellent proficiency in model validation and optimization with Model selection, Parameter tuning and K-fold cross validation.
- Deep understanding of Statistical Methodologies including Hypothesis Testing, ANOVA, and Chi-Square tests.
- Strong experience with Python (2.x, 3.x) and R Programming to develop analytic models and solutions.
- Extensive experience in RDBMS such as SQL Server 2012 and Oracle 9i/10g.
- Experienced in non-relational databases such as MongoDB 3.x.
- Familiar with the Hadoop ecosystem and Apache Spark framework, including HDFS, MapReduce, Pig Latin, HiveQL, Spark SQL, and PySpark.
- Proficient in data visualization tools such as Tableau, Python Matplotlib/Seaborn, R ggplot2/Shiny to create visually impactful and actionable interactive reports and dashboards.
- Experienced in Amazon Web Services (AWS), such as AWS EC2, EMR, S3, RDS, and Redshift.
- Experienced in designing and developing T-SQL queries, ETL packages and business reports using SQL Server Management Studio (SSMS) and BI Suite (SSIS/SSRS).
- Adept in developing and debugging Stored Procedures, User-defined Functions (UDFs), Triggers, Indexes, Constraints, Transactions and Queries using Transact-SQL (T-SQL).
- Experienced in ticketing systems such as JIRA/Confluence and version control tools such as GitHub.
- Excellent understanding of Systems Development Life Cycle (SDLC) such as Agile and Waterfall.
- Strong business acumen and analytical skills to translate numbers into actionable business decisions. Great passion in learning cutting-edge theories and algorithms for Machine Learning and always looking for new challenges.
TECHNICAL SKILLS:
Databases: MS SQL Server 2008/2008R2/2012/2014, Oracle, HBase, Amazon Redshift, MongoDB 3.x, Teradata
Statistical Methods: Hypothesis Testing, ANOVA, Chi-Square, Exploratory Data Analysis (EDA), Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation
Machine Learning: Regression Analysis, Naïve Bayes, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, Collaborative Filtering, K-Means Clustering, KNN, CNN, RNN, and AdaBoost
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Pig
Cloud Services: Amazon Web Services (AWS) EC2/S3/Redshift
Deep Learning: Keras, TensorFlow, Theano, AlexNet, VGG, CNN, and RNN
Reporting Tools: Tableau Suite of Tools 7.x/8.x/9.x/10.x (Server and Online), SQL Server Reporting Services (SSRS)
Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2
Languages: Python (2.x/3.x), R, Java, SQL
Operating Systems: Microsoft Windows, Linux (Ubuntu)
Other Tools: Microsoft Office Suite (Word, PowerPoint, Excel)
PROFESSIONAL EXPERIENCE:
Confidential, South Plainfield, NJ
Data Scientist
Responsibilities:
- Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
- Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
- Explored and analyzed the customer specific features by using Matplotlib in Python and ggplot2 in R.
- Performed data imputation using Scikit-learn package in Python.
- Participated in features engineering such as feature generating, PCA, feature normalization and label encoding with Scikit-learn preprocessing.
- Used Python 3.x (NumPy, SciPy, Pandas, Scikit-Learn, Seaborn) and R (caret, trees, arules) to develop a variety of models and algorithms for analytic purposes.
- Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests and KNN to predict customer churn.
- Conducted analysis of customer behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering, Gaussian Mixture Model, and Hierarchical Clustering.
- Used F-Score, AUC/ROC, Confusion Matrix, Precision, and Recall to evaluate different models’ performance.
- Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models, utilizing collaborative filtering techniques to recommend items to different customers.
- Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
Environment: AWS RedShift, Hadoop, HDFS, Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas/Matplotlib/Seaborn), R (ggplot2/caret/trees/arules), Tableau (9.x), Machine Learning (Logistic regression/Random Forests/KNN/K-Means Clustering/Gaussian Mixture Model/Hierarchical Clustering/Ensemble methods/Collaborative filtering), JIRA, GitHub, Agile/SCRUM
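The churn-modeling workflow described above (fit a classifier, then score it with AUC/ROC and a confusion matrix) can be sketched roughly as follows. This is a minimal illustration using synthetic data; the feature names (tenure, monthly spend, support calls) are hypothetical, not the original dataset.

```python
# Minimal churn-prediction sketch with scikit-learn, assuming synthetic,
# illustrative features -- not the original customer data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
n = 1000
# Hypothetical customer features: tenure (months), monthly spend, support calls
X = np.column_stack([
    rng.integers(1, 72, n),
    rng.normal(60, 20, n),
    rng.poisson(2, n),
])
# Synthetic churn label loosely tied to short tenure plus frequent support calls
y = ((X[:, 0] < 12) & (X[:, 2] > 2)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate with AUC/ROC on predicted probabilities and a confusion matrix
proba = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
print(confusion_matrix(y_test, model.predict(X_test)))
```

The same pattern extends to the Random Forest and KNN experiments mentioned above by swapping in the corresponding scikit-learn estimator.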
Confidential, Union, NJ
Data Scientist
Responsibilities:
- Oversaw the ETL process; extracted and merged data from SQL Server 2012 using optimized SQL queries.
- Aggregated unstructured data collected in MongoDB 3.3.
- Performed data cleaning, exploratory analysis, and data integrity analysis using Pandas and NumPy.
- Analyzed customer behavior and value using RFM analysis.
- Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, and AdaBoost, using Python Scikit-Learn and evaluated the performance.
- Researched segmentation of customers using Random Forest, K-Means, and Hierarchical Clustering.
- Developed the product recommendation engine using Content-Based Filtering, Collaborative Filtering, and Gradient Boosting Tree algorithms.
- Generated dashboards and reports using Tableau.
- Evaluated the marketing strategy using A/B testing.
- Conducted sentiment analysis of customer service based on survey responses.
Environment: Python 3.X (Scikit-Learn/Numpy/Pandas/Matplotlib/Seaborn), SQL Server 2012, MongoDB 3.X, Tableau 8.X, Git 2.X, AWS EC2, S3
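The RFM (Recency, Frequency, Monetary) analysis mentioned in both roles above can be sketched with Pandas as follows. The transaction table and its column names are illustrative assumptions, not real data; the score bands are an example of one common binning scheme, not the exact one used.

```python
# Hedged RFM-scoring sketch: synthetic transactions, hypothetical column names.
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "days_since_purchase": [5, 40, 90, 120, 200, 10],
    "amount": [50.0, 20.0, 5.0, 15.0, 10.0, 300.0],
})

rfm = tx.groupby("customer_id").agg(
    recency=("days_since_purchase", "min"),  # days since most recent purchase
    frequency=("amount", "size"),            # number of transactions
    monetary=("amount", "sum"),              # total spend
)

# Band each dimension into scores (3 = best); recency is inverted because
# fewer days since the last purchase is better.
rfm["r_score"] = pd.cut(
    rfm["recency"], bins=[0, 30, 90, float("inf")], labels=[3, 2, 1]
).astype(int)
rfm["f_score"] = rfm["frequency"].rank(method="dense").astype(int)
rfm["m_score"] = rfm["monetary"].rank(method="dense").astype(int)
rfm["rfm"] = rfm[["r_score", "f_score", "m_score"]].sum(axis=1)
print(rfm)
```

The resulting per-customer scores can then feed segmentation, e.g. as input features to the K-Means clustering described above.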
Confidential, Paterson, NJ
Data Scientist
Responsibilities:
- Gathered, analyzed, and translated business requirements; communicated with other departments to collect client business requirements and assess available data.
- Collected data in Hadoop and performed data preparation using Pig Latin to convert it into the required format.
- In the preprocessing phase, used Pandas and Scikit-Learn to remove or impute missing values, detect outliers, scale features, and apply feature selection (filtering) to eliminate irrelevant features.
- Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
- Balanced the dataset by over-sampling the minority label class and under-sampling the majority label class.
- Used Python (NumPy, SciPy, Pandas, Scikit-Learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
- Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, and AdaBoost, using Python Scikit-Learn and evaluated the performance.
- Implemented, tuned, and tested the model on AWS EC2 to get the best algorithm and parameters.
- Used F-Score, AUC/ROC, Confusion Matrix, and RMSE to evaluate the performance of different models.
- Tracked model performance on unseen data and retrained the model to improve accuracy.
Environment: AWS EC2, S3, Hadoop, Pig, HDFS, Spark (PySpark/MLlib/Spark SQL), Python 3.x (Numpy/ Pandas/ Matplotlib/ Seaborn/ Scipy/ Scikit-Learn), MS SQL Server 2012
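The class-balancing step described above (over-sampling the minority label class and under-sampling the majority class) can be sketched with scikit-learn's `resample` utility. The data is synthetic and the common target class size is an assumed illustration, not the original setting.

```python
# Hedged sketch of balancing an imbalanced dataset by resampling each class
# to a common size; data and target size are illustrative assumptions.
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.1).astype(int)   # roughly 10% minority class

target = 300                               # assumed common class size
X_min, X_maj = X[y == 1], X[y == 0]

# Over-sample the minority class (with replacement) and under-sample the
# majority class (without replacement) to the same target size.
X_min_up = resample(X_min, replace=True, n_samples=target, random_state=0)
X_maj_down = resample(X_maj, replace=False, n_samples=target, random_state=0)

X_bal = np.vstack([X_min_up, X_maj_down])
y_bal = np.array([1] * target + [0] * target)
print(X_bal.shape, np.bincount(y_bal))   # (600, 4) [300 300]
```

Shuffling the balanced arrays before training avoids a block of identical labels; dedicated libraries (e.g. SMOTE-style synthetic over-sampling) are a common alternative to plain duplication.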
Confidential, New York, NY
SQL BI Developer/Data Analyst
Responsibilities:
- Worked on the company's database and business model and was actively involved in gathering user/project requirements from different stakeholders; worked on documentation required for the project at hand.
- Extracted data using T-SQL in SQL Server, writing queries, stored procedures, triggers, views, temp tables, and User-Defined Functions (UDFs).
- Designed and developed ETL packages using SSIS to create Data Warehouses from different tables and file sources like Flat and Excel files.
- Used SSIS transformations such as Derived Column, Aggregate, Merge Join, Row Count, and Conditional Split to transform the data.
- Developed reporting solutions in SSRS for different stakeholders, from mock-up to deployment, in areas such as Claims, Transactions, Supply, and Assets.
- Optimized T-SQL queries by removing redundancies, retrieving only essential data, and using joins efficiently.
Environment: MS SQL Server 2008/2008R2/2012 (T-SQL), SQL Server Management Studio, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Windows 7, MS Office Suite 2010, Tableau (6.X)