
Data Scientist Resume


South Plainfield, NJ

SUMMARY:

  • A passionate, team-oriented Data Scientist with over 6 years of experience in Statistical Modeling, Data Mining, Data Visualization, and Machine Learning, with rich domain knowledge in the Retail, Healthcare, and Banking industries.
  • Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, developing data mining and reporting solutions across a massive volume of structured and unstructured data.
  • Involved in the entire data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing, and Deployment.
  • Proficient in Machine Learning algorithms and Predictive Modeling including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Neural Networks, Random Forest, Ensemble Models, SVM, KNN and K-means clustering.
  • Solid experience in Deep Learning techniques with Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), max pooling, normalization, and architectures such as AlexNet, VGG, and Darknet.
  • Excellent proficiency in model validation and optimization with Model selection, Parameter tuning and K-fold cross validation.
  • Deep understanding of Statistical Methodologies including Hypothesis Testing, ANOVA, and Chi-Square tests.
  • Strong experience with Python (2.x, 3.x) and R Programming to develop analytic models and solutions.
  • Extensive experience in RDBMS such as SQL Server 2012 and Oracle 9i/10g.
  • Experienced in non-relational databases such as MongoDB 3.x.
  • Familiar with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, Pig Latin, HiveQL, Spark SQL, and PySpark.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib/Seaborn, R ggplot2/Shiny to create visually impactful and actionable interactive reports and dashboards.
  • Experienced in Amazon Web Services (AWS) such as EC2, EMR, S3, RDS, and Redshift.
  • Experienced in designing and developing T-SQL queries, ETL packages and business reports using SQL Server Management Studio (SSMS) and BI Suite (SSIS/SSRS).
  • Adept in developing and debugging Stored Procedures, User-defined Functions (UDFs), Triggers, Indexes, Constraints, Transactions and Queries using Transact-SQL (T-SQL).
  • Experienced in ticketing systems such as JIRA/Confluence and version control tools such as GitHub.
  • Excellent understanding of Systems Development Life Cycle (SDLC) such as Agile and Waterfall.
  • Strong business acumen and analytical skills to translate numbers into actionable business decisions. Great passion for learning cutting-edge theories and algorithms for Machine Learning and always looking for new challenges.
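The model selection and k-fold cross-validation workflow mentioned above can be sketched in a few lines of scikit-learn. This is a minimal illustration on a synthetic dataset; every parameter here is an illustrative assumption, not a production setting:

```python
# Minimal sketch: 5-fold cross-validation for model validation (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Generate a small illustrative classification dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The same pattern extends to parameter tuning by wrapping the estimator in `GridSearchCV` instead of calling `cross_val_score` directly.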

TECHNICAL SKILLS:

Databases: MS SQL Server 2008/2008R2/2012/2014, Oracle, HBase, Amazon Redshift, MongoDB 3.x, Teradata

Statistical Methods: Hypothesis Testing, ANOVA, Chi-Square, Exploratory Data Analysis (EDA), Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation

Machine Learning: Regression analysis, Naïve Bayes, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, Collaborative Filtering, K-Means Clustering, KNN, CNN, RNN and AdaBoost.

Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Pig

Cloud Services: Amazon Web Services (AWS) EC2/S3/Redshift

Deep Learning: Keras, TensorFlow, Theano, AlexNet, VGG, CNN, and RNN

Reporting Tools: Tableau 7.x/8.x/9.x/10.x (Desktop, Server, and Online), SQL Server Reporting Services (SSRS)

Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2

Languages: Python (2.x/3.x), R, Java, SQL

Operating Systems: Microsoft Windows, Linux (Ubuntu)

Office Tools: Microsoft Office Suite (Word, PowerPoint, Excel)

PROFESSIONAL EXPERIENCE:

Confidential, South Plainfield, NJ

Data Scientist

Responsibilities:

  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit analytical requirements.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Explored and analyzed the customer specific features by using Matplotlib in Python and ggplot2 in R.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Used Python 3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) and R (caret, trees, arules) to develop a variety of models and algorithms for analytic purposes.
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests and KNN to predict customer churn.
  • Conducted analysis on customer behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering, Gaussian Mixture Models, and Hierarchical Clustering.
  • Used F-Score, AUC/ROC, Confusion Matrix, Precision, and Recall to evaluate different models’ performance.
  • Designed and implemented a recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend items for different customers.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
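As a rough illustration of the churn-modeling and evaluation steps described above (a Random Forest classifier scored with AUC/ROC, confusion matrix, precision, and recall), here is a minimal scikit-learn sketch on synthetic data. All names and parameters are illustrative assumptions, not the actual production pipeline:

```python
# Hedged sketch: churn-style classifier with the evaluation metrics named above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a churn dataset (~20% positive class).
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Confusion matrix:\n", confusion_matrix(y_test, pred))
print("Precision:", precision_score(y_test, pred))
print("Recall:   ", recall_score(y_test, pred))
print("AUC:      ", roc_auc_score(y_test, proba))
```

Swapping `RandomForestClassifier` for `LogisticRegression` or `KNeighborsClassifier` reproduces the comparative experiments described in the bullets.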

Environment: AWS Redshift, Hadoop, HDFS, Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), R (ggplot2/caret/trees/arules), Tableau (9.x), Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/Gaussian Mixture Model/Hierarchical Clustering/Ensemble methods/Collaborative filtering), JIRA, GitHub, Agile/SCRUM

Confidential, Union, NJ

Data Scientist

Responsibilities:

  • Oversaw the ETL process. Extracted and merged data using optimized SQL queries from SQL Server 2012.
  • Aggregated collected unstructured data in MongoDB 3.3.
  • Performed data cleaning, exploratory analysis, and data integrity checks using Pandas and NumPy.
  • Analyzed customer behavior and value using RFM analysis.
  • Experimented with multiple classification algorithms such as Logistic Regression, Support Vector Machine (SVM), Random Forest, and AdaBoost using Python Scikit-Learn, and evaluated their performance.
  • Researched customer segmentation using Random Forest, K-Means, and Hierarchical Clustering.
  • Developed the product recommendation engine using content-based filtering, collaborative filtering, and Gradient Boosting Tree algorithms.
  • Generated dashboards and reports using Tableau.
  • Evaluated the marketing strategy using A/B testing.
  • Conducted sentiment analysis of customer service feedback based on survey responses.
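The A/B-testing step above can be illustrated with a chi-square test of independence on conversion counts. The figures below are made up purely for demonstration and do not come from the engagement described:

```python
# Illustrative sketch: evaluating an A/B test on (synthetic) conversion counts.
import numpy as np
from scipy.stats import chi2_contingency

# rows: variant A, variant B; columns: converted, not converted
table = np.array([[120, 880],   # A: 12.0% conversion
                  [150, 850]])  # B: 15.0% conversion

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
if p_value < 0.05:
    print("Difference between variants is statistically significant.")
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default; a two-proportion z-test would give a closely related result.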

Environment: Python 3.X (Scikit-Learn/Numpy/Pandas/Matplotlib/Seaborn), SQL Server 2012, MongoDB 2.X, Tableau 8.X, Git 2.X, AWS EC2, S3

Confidential, Paterson, NJ

Data Scientist

Responsibilities:

  • Gathered, analyzed, and translated business requirements; communicated with other departments to collect client business requirements and assess available data.
  • Collected data in Hadoop and performed data preparation using Pig Latin to get it into the right format.
  • In the preprocessing phase, used Pandas and Scikit-Learn to remove or impute missing values, detect outliers, scale features, and apply feature selection (filtering) to eliminate irrelevant features.
  • Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
  • Balanced the dataset by over-sampling the minority label class and under-sampling the majority label class.
  • Used Python (NumPy, SciPy, Pandas, Scikit-Learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Experimented with multiple classification algorithms such as Logistic Regression, Support Vector Machine (SVM), Random Forest, and AdaBoost using Python Scikit-Learn, and evaluated their performance.
  • Implemented, tuned, and tested the model on AWS EC2 to get the best algorithm and parameters.
  • Used F-Score, AUC/ROC, Confusion Matrix, and RMSE to evaluate the performance of different models.
  • Tracked model performance on unseen data and retrained the model to improve accuracy.
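The class-balancing step described above (over-sampling the minority class) can be sketched with `sklearn.utils.resample`. The dataset and the 90/10 imbalance below are synthetic assumptions used only to show the mechanics:

```python
# Rough sketch: over-sampling the minority class by resampling with replacement.
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)  # 90/10 class imbalance

X_maj, X_min = X[y == 0], X[y == 1]

# Duplicate minority-class rows until both classes have equal counts.
X_min_up = resample(X_min, replace=True,
                    n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int),
                        np.ones(len(X_min_up), dtype=int)])
print("Balanced class counts:", np.bincount(y_bal))  # [90 90]
```

Under-sampling the majority class is the mirror image: resample `X_maj` down to `len(X_min)` with `replace=False`.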

Environment: AWS EC2, S3, Hadoop, Pig, HDFS, Spark (PySpark/MLlib/Spark SQL), Python 3.x (Numpy/ Pandas/ Matplotlib/ Seaborn/ Scipy/ Scikit-Learn), MS SQL Server 2012

Confidential, New York, NY

SQL BI Developer/Data Analyst

Responsibilities:

  • Worked on the company's database and business model; actively involved in gathering user/project requirements from different stakeholders and producing the documentation required for the project at hand.
  • Extracted data using T-SQL in SQL Server, writing queries, stored procedures, triggers, views, temp tables, and User-Defined Functions (UDFs).
  • Designed and developed ETL packages using SSIS to create Data Warehouses from different tables and file sources like Flat and Excel files.
  • Used SSIS transformations such as Derived Column, Aggregate, Merge Join, Row Count, Conditional Split, and more to transform the data.
  • Developed reporting solutions for different stakeholders from mock-up till deployment in different areas such as Claims, Transactions, Supply, Assets and others in SSRS.
  • Optimized T-SQL queries by removing redundancies, retrieving only essential data, and using joins efficiently.

Environment: MS SQL Server 2008/2008R2/2012 (T-SQL), SQL Server Management Studio, SQL Server Integration Services, SQL Server Reporting Services, Windows 7, MS Office Suite 2010, Tableau (6.x)
