Data Analyst Resume

NY

SUMMARY

  • 6 years of experience in the IT industry, working across data analysis and modeling using Python and R
  • Hands-on experience across the entire Data Science Life Cycle (DSLC), including Data Acquisition, Data Cleaning, Data Wrangling, Exploratory Data Analysis, Data Mining, applying Machine Learning algorithms, Validation, and Visualization
  • Experienced and highly skilled in the Financial and Retail Marketing domains
  • Experience in model evaluation with various metrics such as Root Mean Square Error (RMSE), Confusion Matrix, Precision-Recall (PR Curve), AUC of the ROC Curve, and F1-Score, among others (see the evaluation sketch after this list)
  • Hands-on experience with SAS and Excel.
  • Experience in creating Dashboards and visualization charts using Tableau, Python (Matplotlib, Seaborn, Bokeh, Plotly) and R (Shiny, ggplot2)
  • Extensively used Pandas (Python) and dplyr (R) for Data Munging, and NumPy and SciPy for numerical computations
  • Well versed in machine learning algorithms: Supervised Learning - Linear, Logistic, and Penalized Linear Regression, Decision Trees, Random Forest, Support Vector Machines, and K-Nearest Neighbors; Unsupervised Learning - Clustering, including K-Means; and Deep Learning models - Multilayer Perceptrons, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) such as LSTM and GRU, and Transformers
  • Experience in conducting Market Basket Analysis using Association Rule Mining and Principal Component Analysis
  • Experience working with various databases: MySQL, SQLite, and PostgreSQL
  • Strong knowledge of and experience in processing structured, semi-structured, and unstructured data
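
As referenced in the evaluation bullet above, a minimal scikit-learn sketch of these metrics; the label, prediction, and score arrays are hypothetical stand-ins, not project data.

    # Minimal sketch of common evaluation metrics; all arrays are hypothetical.
    import numpy as np
    from sklearn.metrics import (confusion_matrix, precision_recall_curve,
                                 f1_score, roc_auc_score, mean_squared_error)

    # Classification: true labels, hard predictions, and predicted scores.
    y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
    y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])
    y_score = np.array([0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95])

    print(confusion_matrix(y_true, y_pred))      # TN, FP / FN, TP counts
    print(f1_score(y_true, y_pred))              # harmonic mean of precision and recall
    print(roc_auc_score(y_true, y_score))        # area under the ROC curve
    precision, recall, _ = precision_recall_curve(y_true, y_score)  # PR curve points

    # Regression: RMSE is the square root of the mean squared error.
    rmse = np.sqrt(mean_squared_error([3.0, 5.0, 2.5], [2.8, 5.4, 2.1]))
    print(rmse)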

TECHNICAL SKILLS

Python: Pandas, NumPy, SciPy (stats), SQLAlchemy, Matplotlib, Seaborn, Bokeh, SQLite, BeautifulSoup, Regular Expressions (re), urllib, JSON, boto3, Redis, Flask, Django, datetime, os, PyQt, Machine Learning Packages - Scikit-Learn, TensorFlow, Keras, PyTorch

IDE: JupyterLab, RStudio, Atom, Notepad++, Sublime Text

R: dplyr, tidyr, Shiny, ggplot2, caret, H2O.

SQL: Analytical Functions, Window Functions, CTE

Dashboard: Tableau

Version Control: Git

Environment: Anaconda, pyenv, virtualenv

PROFESSIONAL EXPERIENCE

Confidential, NY

Data Analyst

Responsibilities:

  • Selected the relevant data from different tables, such as the “Customer”, “Product”, “Sales Transactions”, and “Sales Transactions Promo” tables, in Aginity Workbench using SQL queries.
  • Wrote and ran R scripts to calculate customer KPIs, including Average Dollar Sales (ADS), Average Unit Retail (AUR), Units per Transaction (UPT), Net Sales per Customer (DPC), and Transactions per Customer (TPC); analyzed and produced Lifestyle KPI, E-commerce KPI, and Outlet KPI reports for different customer segments (a pandas sketch of the KPI calculations follows this list).
  • Performed data investigation on customer data using various R libraries, such as “RSQLite”, “sqldf”, “lubridate”, “data.table”, “plyr”, “dplyr”, “tidyr”, “odbc”, “RODBC”, and “openxlsx”.
  • Performed statistical analysis on customer data to find abnormal data in different customer segments and KPI metrics.
  • Wrote and ran R scripts to analyze customer performance in specified time windows; made reports with Waterfall charts to analyze trends among new, reactivated, and retained customers across different time periods.
  • Performed data investigation on promotion items, markdown items, collection items, and non-collection items.
  • Built and deployed basic data analyses to answer specific business questions.
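
As noted above, a minimal pandas sketch of the KPI logic (the original scripts were written in R); the table and its customer_id, transaction_id, units, and dollars columns are hypothetical placeholders.

    # Minimal pandas sketch of the KPI calculations; the actual scripts were in R,
    # and all column names here are hypothetical placeholders.
    import pandas as pd

    sales = pd.DataFrame({
        "customer_id":    [1, 1, 2, 2, 2],
        "transaction_id": ["t1", "t1", "t2", "t3", "t3"],
        "units":          [2, 1, 3, 1, 2],
        "dollars":        [20.0, 15.0, 30.0, 10.0, 25.0],
    })

    # Roll line items up to one row per transaction.
    txn = sales.groupby(["customer_id", "transaction_id"], as_index=False).agg(
        units=("units", "sum"), dollars=("dollars", "sum"))

    ads = txn["dollars"].mean()                          # Average Dollar Sales
    aur = sales["dollars"].sum() / sales["units"].sum()  # Average Unit Retail
    upt = txn["units"].mean()                            # Units per Transaction
    tpc = txn.groupby("customer_id").size().mean()       # Transactions per Customer
    print(ads, aur, upt, tpc)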

Confidential, Cincinnati, OH

Data Scientist

Responsibilities:

  • Worked with e-commerce reviews, leveraging NLP techniques such as tokenization and n-gram hashing.
  • Responsible for analyzing the data and partnering with the business to generate best in class outcomes.
  • Tackled a highly imbalanced safety dataset using sampling techniques such as undersampling and oversampling with NearMiss and SMOTE, via Python scikit-learn and imblearn (see the resampling sketch after this list).
  • Built a recommender based on TF-IDF vectorization and Cosine Similarity, increasing Accessory Sales by 8% and Cross-sell by 15% (a sketch of this approach also follows the list).
  • Used Redis to store and retrieve the Key-value pairs to show the recommender results.
  • Used NLTK Stem class for natural language processing and stemming the product names.
  • Integrated the recommender system into the e-commerce website, which was developed using the Python Flask framework and REST APIs.
  • Performed Market Basket analysis, bundled it with Product Clustering analysis to identify products that are more likely in the same basket and to make product offer selections for cross-sell and up-sell marketing.
  • Ran A/B tests of the Recommender System to find the best-fitting model, analyzing Click-Through Rate and Conversion Rate using Google Analytics.
  • Developed a Proof of Concept (POC) for email campaigning using the Deep Learning frameworks TensorFlow and Keras.
  • Collected Data from historic records, web scraping, web crawling and through other public data records.
  • Extensively used R and Python to extract, clean, transform, impute and analyze the data.
  • Analyzed the Customer Life-Time Value and worked closely with marketing teams to improve the retention rate.
  • Handled ad-hoc requests from the Business, such as extracting required data from MSSQL databases or converting data to understandable formats (Excel).
  • Experience working with web scraping to extract data from various websites and catalogs, parsing through PDPs and site maps using XML, Scrapy, urllib, requests, BeautifulSoup, and other HTML tools.
  • Created dashboards and visualizations using Python Bokeh, wordcloud and Matplotlib to communicate the analysis reports to business and Management teams.
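
As referenced in the resampling bullet above, a minimal sketch using scikit-learn and imbalanced-learn; the dataset is a synthetic stand-in, not the actual safety data.

    # Minimal sketch of the resampling step with scikit-learn and imbalanced-learn;
    # the dataset is a synthetic stand-in for the real safety data.
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import NearMiss

    # Synthetic, highly imbalanced dataset (about 5% positive class).
    X, y = make_classification(n_samples=1000, weights=[0.95], random_state=42)

    # Oversample the minority class with SMOTE...
    X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
    # ...or undersample the majority class with NearMiss.
    X_under, y_under = NearMiss().fit_resample(X, y)
    print(len(y), len(y_over), len(y_under))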
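
And a minimal sketch of the TF-IDF and cosine-similarity recommendation approach; the product texts are hypothetical, and the Redis caching step appears only as a comment.

    # Minimal sketch of a TF-IDF + cosine-similarity recommender; the product
    # texts are hypothetical placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    products = [
        "wireless bluetooth headphones",
        "portable bluetooth speaker",
        "usb-c charging cable",
        "wireless phone charger",
    ]

    vectors = TfidfVectorizer().fit_transform(products)  # one TF-IDF vector per product
    sim = cosine_similarity(vectors)                     # pairwise similarity matrix

    # Most similar product to item 0, excluding itself.
    best = sim[0].argsort()[::-1][1]
    print(products[best])
    # In production, precomputed top-N lists could be cached in Redis as
    # key-value pairs, e.g. redis_client.set(product_id, json.dumps(top_ids)).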

Environment: Anaconda3 4.x, Python 3.x, R 3.x, MSSQL, Redis, Deep Learning, Natural Language Processing, Google Analytics, Linux, Microsoft Excel, Jupyter, Flask.

Confidential, New York, NY

Data Scientist

Responsibilities:

  • Enhanced the performance of a Fraud Detection Model by 0.5% accuracy using Ensemble Models.
  • Worked with different models, including K-Means Clustering and SVMs for Anomaly Detection, to detect fraudulent transactions.
  • Worked with different models to compare the best fit and used Bagging, boosting and stacking of models to get better results.
  • Enhanced model helped in substantial reduction of False-Positives which led to an increase in overall revenue and customer satisfaction.
  • Performed Cohort Analysis and developed a Churn Prediction Model using Logistic Regression and a Naïve Bayes Classifier, thereby identifying factors to improve sales and retain customers.
  • Developed a personalized CLIP Model (Credit Line Increase Program) based on user history and eigenvector values using Back-Propagation Neural Networks.
  • Evaluated the models using RMSE, Confusion Matrix, ROC and AUC in both dev and production environments.
  • Created Big Data Cloudera EDH clusters on AWS and managed the clusters using Cloudera Director.
  • Used HIVE to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on creating RDDs and applying Transformations and Actions on them; skilled at applying Spark filter conditions on data and at performing joins in Spark (see the PySpark sketch after this list).
  • Used the Spark DataFrame API to perform analytics on Hive data and implemented various checkpoints of RDDs to disk to handle job failures and debugging.
  • Developed Spark jobs written in Python to perform operations like aggregation, data processing and data analysis.
  • Hands on experience in handling Hive tables using Spark SQL.
  • Designed and implemented cross-validation and statistical tests, including Hypothesis Testing, ANOVA, Auto-correlation, and checks for Simpson’s Paradox, to verify each predictive model (a SciPy sketch of such tests also follows this list).
  • Conducted Root Cause Analysis and Factor Analysis on digital marketing performance to identify KPIs for email advertising by using Python SciPy.
  • Researched and developed Attribution Models using Python Scikit-Learn to find the correlation between the conversion rate and email advertising campaigns.
  • Evaluated and recommended the optimized time frequency and time duration for email advertising campaigns.
  • Used Tableau 9.x and R Shiny to create detail-level summary reports and dashboards for technical and business stakeholders, using KPIs and visualized trend analyses.
  • Used Git to track changes and file histories and to coordinate work among other programmers.
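
As referenced in the Spark bullet above, a minimal PySpark sketch of DataFrame analytics over Hive data; the table name and columns are hypothetical placeholders.

    # Sketch of Spark DataFrame analytics over a Hive table; "db.transactions"
    # and its columns are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-analytics")
             .enableHiveSupport()
             .getOrCreate())

    txns = spark.table("db.transactions")        # partitioned/bucketed Hive table

    # Filter and aggregate, mirroring the RDD transformations and actions above.
    daily = (txns.filter(F.col("amount") > 0)
                 .groupBy("txn_date")
                 .agg(F.count("*").alias("n_txns"),
                      F.sum("amount").alias("total_amount")))

    daily.persist()                              # keep the result cached across actions
    daily.show(10)                               # an action that triggers execution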
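
And a minimal SciPy sketch of the hypothesis testing and ANOVA mentioned above, using synthetic groups as stand-ins for campaign metrics.

    # Minimal sketch of hypothesis testing and one-way ANOVA with SciPy;
    # the groups are synthetic stand-ins for campaign metrics.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_a = rng.normal(10.0, 2.0, size=50)
    group_b = rng.normal(10.5, 2.0, size=50)
    group_c = rng.normal(12.0, 2.0, size=50)

    t_stat, t_p = stats.ttest_ind(group_a, group_b)          # two-sample t-test
    f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)  # one-way ANOVA
    print(t_p, f_p)   # small p-values suggest the group means differ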

Environment: Anaconda3 2.x, Python 3.x, R 3.x, Hadoop 2.3, Spark, Hive, Impala, Cloudera, AWS, Linux, Jupyter, Tableau, SQL Server 2012

Confidential

Data Scientist

Responsibilities:

  • Worked with different models to compare the best fit and used Bagging, boosting and stacking of models to get better results.
  • Worked on Natural Language Processing (NLP) to categorize the courses based on the course titles, activities and descriptions into various binary and Polytomous Categorical variables.
  • Extracted features by applying standard NLP preprocessing steps such as lemmatization, stemming, stop-word removal, and Levenshtein distance calculation, using the nltk, scipy, StanfordNLP, and fuzzywuzzy libraries.
  • Developed various predictive classification models for customer-facing categories such as PAID vs FREE, EDUCATIONAL vs EXPERIENTIAL, ACTIVITY LEVELS, ACTIVITY TYPES, and SKILL LEVELS using the ExtraTrees Classifier.
  • Use of the predictive model for email campaigning produced a 50% spike in PAID course registrations, although paid courses remain a small percentage of the overall total.
  • PR Curves, Accuracy, and overall customer coverage were the metrics used to evaluate the model.
  • Path to Purchase - a Customer 360 project in which we used all the information known about a customer (transactional, demographic, and behavioral data) to predict the next n things the customer will do in the future.
  • Created a customer 360 dataset that is being used by many analytical teams. Created event catalog table that contains all the events related to the customer.
  • Developed different Recurrent Neural Network (RNN) models (LSTM, GRU) using PyTorch to predict the next sequence of events.
  • Used Embedding layers, Cyclic Learning Rates, and different optimizers to improve model performance (see the PyTorch sketch after this list).
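
As referenced above, a minimal PyTorch sketch of an embedding-plus-LSTM next-event model; the vocabulary size, dimensions, and batch shapes are hypothetical.

    # Minimal sketch of an embedding + LSTM next-event model; vocabulary size,
    # dimensions, and batch shapes are hypothetical.
    import torch
    import torch.nn as nn

    class NextEventLSTM(nn.Module):
        def __init__(self, n_events=500, emb_dim=64, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(n_events, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_events)

        def forward(self, event_ids):               # (batch, seq_len) of event IDs
            h, _ = self.lstm(self.emb(event_ids))   # (batch, seq_len, hidden)
            return self.out(h[:, -1, :])            # logits over the next event

    model = NextEventLSTM()
    batch = torch.randint(0, 500, (8, 20))          # 8 customers, 20 events each
    print(model(batch).shape)                       # torch.Size([8, 500])
    # A cyclic learning-rate schedule could be added with
    # torch.optim.lr_scheduler.CyclicLR around a standard optimizer.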

Environment: Anaconda3 2.x, Python 3.x, R 3.x, Hadoop 2.3, Spark, Hive, Impala, Cloudera, AWS, Linux, Jupyter, Tableau, SQL Server 2012

Confidential

Data Scientist

Responsibilities:

  • Promoted safe monitoring and quick decision-making by adding trend visualizations of key parameters to the daily reports using Tableau.
  • Delivered accurate time-series analyses while ensuring timely processing of client requirements, guided by marketing changes.
  • Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Configured MySQL Database to store Hive metadata.
  • Used Sqoop to import data from various relational data sources like MySQL into HDFS.
  • Responsible to manage customer data coming from different sources.
  • Managing and scheduling jobs on a Hadoop cluster using Oozie.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Analyzed the customer data by performing Hive queries to know user behavior.
  • Optimized Map Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Used different SerDes to convert JSON data into pipe-separated data.
  • Hands-on experience working with different file formats such as Text and Avro.
  • Worked on partitioning and Bucketing the Hive table and running the scripts in parallel to reduce the run time of the scripts.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
  • Hands on experience in configuring cluster on EC2 instances using Cloudera Manager.
  • Experience in creating tables on top of data on AWS S3 obtained from different data sources.
  • Used R for exploratory data analysis and to build machine learning models.
  • Developed a model using multivariate regression and conducted Cohort Analysis, helping to increase the User Retention Rate and thereby total revenue.
  • Used ensemble models built with XGBoost to build the risk prediction application FusionRisk (an XGBoost sketch follows this list).
  • Created dashboards using R Shiny to communicate the results to the Business and Management teams.
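
As referenced in the XGBoost bullet above, a minimal sketch of an ensemble risk model; the features and labels are synthetic placeholders, not FusionRisk's actual data or pipeline.

    # Minimal XGBoost sketch of an ensemble risk model; features and labels are
    # synthetic placeholders.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=7)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

    # Gradient-boosted tree ensemble; hyperparameters are illustrative only.
    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_tr, y_tr)
    risk_scores = model.predict_proba(X_te)[:, 1]   # probability of the risky class
    print(risk_scores[:5])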

Environment: Python 2.x, R 3.x, Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), AWS EC2, Pig, Linux, XML, HBase, Zookeeper, Sqoop
