Data Scientist Resume
SUMMARY:
- Dedicated and self-motivated data scientist seeking a position in data science or data analytics where I can apply my professional and academic experience to advance organizational goals.
- Overall 9+ years of IT experience in Data Science, Business Analytics, Business Intelligence (BI) Systems, Data Mining, Reporting, Marketing Analytics, Data Advisory & Decision support systems.
- 5+ years of hands-on Data Science/Machine Learning experience across a variety of business contexts and data sources. Proficient in quickly extracting hidden insights from data and building useful models by leveraging a repertoire of diverse and deep technical skills.
- Expertise in Python, scikit-learn, R, Tableau, and Java, with working knowledge of TensorFlow, Theano, and Keras.
- Strong data visualization skills in communicating statistical findings to business users and rolling out insights into day-to-day operations using Tableau, Lumira, and Power BI
- Expertise in Hadoop ecosystem as Hadoop Architect
- Expertise in machine learning techniques such as support vector machines, random forests, neural networks, logistic regression, and decision trees
- Strong experience in natural language processing and text analytics techniques
- Experience in development of recommendation engines using collaborative filtering and association rule mining techniques
- Good working experience in advanced statistical analysis techniques including logistic regression, ANOVA, hypothesis testing, discriminant analysis, the chi-squared test, and the F-test
- Development experience in scripting languages like Python and Scala
- Experienced with analyzing large data sets and developing analytics solutions
- Implemented ASP Aggregator Data Warehouse using MS SQL Server Data Tools
- Advanced knowledge in data management and relational databases
- Excellent understanding of HDFS, MapReduce framework and extensive experience in developing MapReduce Jobs
- Strong knowledge of Spark along with Scala and Python for handling large-scale data processing
- In-depth understanding of Hadoop architecture and various components such as HDFS, YARN, Zookeeper, Oozie, Hue, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts
- Hands-on experience and knowledge in streaming systems like Flume and Kafka
TECHNICAL SKILLS:
Programming Languages: R Programming, Python, Scikit-Learn, SQL, Base SAS, Scala, Java, C#, VBA
Machine Learning techniques: Natural Language Processing (Sentiment Analysis, LDA, LSA, PLSA, Named Entity Recognition, Word Vectorization, Document Vectorization, Stemming, Lemmatization), Linear Regression (LASSO, Ridge, Elastic Net), Classification (Logistic Regression, LDA, Naïve Bayes, KNN, SVM), Decision Trees (XGBoost & Random Forest), PCA, Association Rules & Recommendation Engines, Survival Analysis, Time Series Analysis, Clustering, Deep Learning. Excellent skill in conducting exploratory data analysis (EDA)
Big Data Ecosystem: Hadoop 3.0, RHadoop, MongoDB, Spark 1.6, Pig, HBase, Hive, Impala, Flume, Kafka, Solr, NoSQL
Relational Databases: MySQL, MS SQL Server, Oracle 10g
Data Visualization: R ggplot2, Python Matplotlib, Tableau Desktop 9.3, SAS Enterprise Miner
Web Analytics: Google Analytics, Google Adwords, Facebook Ads
Other: Weka, MATLAB, Statistical Modeling, Econometrics, A/B Testing
PROFESSIONAL EXPERIENCE:
Confidential
Data Scientist
Environment: Python (Xgboost, nltk, scikit-learn, pandas, scipy, numpy, seaborn, matplotlib, re, gensim, wordnet), R, Spark, Scala, Hadoop, Solr
Responsibilities:
- Analyzed unstructured textual data from Quora questions and performed exploratory data analysis
- Cleaned and prepared textual data by lemmatization, stemming, removing stopwords, and by regular expression
- Recognized named entities using named entity recognition technique and tagged parts of speech of each sentence
- Used sentence vectorization and cosine distance techniques to check similarity of Quora questions, then classified question pairs with the gradient boosting algorithm XGBoost
- Detected spam SMS messages in a large SMS dataset using Naïve Bayes, decision tree, and neural network classifiers
- Designed and implemented a platform using text mining packages in R and Python (tm, openNLP, NLTK 3.0) for cleansing, harmonizing, classifying, matching, merging, de-duping, and profiling buyer, supplier, and third-party data for use within solutions, and created a business taxonomy model covering millions of rows of customer transaction text data
- Developed customer segmentation analysis and created campaign planning tools for use by marketing managers
- Performed statistical modeling in R and Lumira to predict backorder probability for various products using logistic regression, LDA, and random forest models
- Implemented ensemble models (bagging and boosting) to improve model efficiency and performance
- Built a predictive model based on advanced statistical analysis, hypothesis testing, and machine learning (multivariate regression) to predict future sales, explaining approximately 35% of the variance
- Automated large scale data processing in a distributed environment
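The duplicate-question work above can be sketched with a minimal bag-of-words cosine-similarity step (illustrative only; the actual pipeline used trained sentence vectors and an XGBoost classifier, neither reproduced here, and all data below is made up):

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy question pairs: the first two overlap heavily, the third does not
q1 = "How do I learn machine learning?"
q2 = "What is the best way to learn machine learning?"
q3 = "Where can I buy a used car?"
```

In the full pipeline, similarity scores like these would feed into the classifier as one feature among many.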
Confidential
Data Scientist
Environment: Python (Scikit-learn, Nltk, Pandas, Seaborn, Matplotlib, Beautifulsoup, Numpy, Scipy), R (Arules, RandomForest, Caret, Tree), SAS (Proc SQL, Proc Panel, Proc Logistic, ODS), MySQL, Tableau, R-ggplot2, R-Shiny, Spark, Scala, Hadoop, Hive, Impala, Flume, Solr, Weka
Responsibilities:
- Detected patterns of physician frauds from US Medicare datasets using logistic regression, random forest, and neural network models and generated heat maps, geographic maps, bubble charts, and dashboards in Tableau to show regional and procedural variances in Medicare costs
- Time series forecasting: built univariate and multivariate time series models (ETS, STL, naïve, Holt, Holt-Winters, ARIMA, ARIMAX, VAR, and GARCH) to forecast sales for various products
- Extracted insights from a manufacturing industry dataset using Spark DataFrames, RDDs, Spark SQL, and Scala in the Hadoop ecosystem
- Responsible for quantitative analysis, data mining, and the presentation of data to see beyond the numbers and understand how users interact with consumer products.
- Configured Apache Flume in Hadoop ecosystem to stream Twitter data to HDFS and Apache Solr
- Created sentiment analysis model and complex query model of Twitter data using Hadoop ecosystem, HiveQL, Impala, and regular expression
- Analyzed trends and regional variances of sales of various products of an office supply company and identified top products in terms of seasonal and regional performances. Prescribed ways to improve sales by performing machine learning and regression analysis.
- Worked on large-scale Hadoop YARN clusters for distributed data processing and analysis using Databricks Connectors, Spark core, Spark SQL, Sqoop, Pig, Hive, Impala and NoSQL databases
- Implemented Spark scripts using Scala, Python, and Spark SQL to load Hive tables into Spark for faster data processing
- Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
- Developed multinomial logistic regression model to predict brand choice of US deodorant industry and developed clustering model to identify valuable customers.
- Developed collaborative filtering-based recommendation engines using Python and R to recommend retail products
- Investigated association between physicians’ qualities and their online reviews by scraping physician review data using Python, applying natural language processing techniques, and then using panel regression techniques
- Investigated several natural language techniques such as LDA, LSA, PLSA, and Doc2Vec to evaluate their performances in mining consumer review datasets
- Compared performances of classification algorithms including Naïve Bayes, Decision Tree, Neural Network, kNN, and Logistic regression techniques
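The collaborative-filtering recommendation work above can be illustrated with a small item-based sketch (data, names, and scoring are all illustrative; a production engine would use a real ratings matrix and neighborhood tuning):

```python
import math

# Toy user -> {item: rating} matrix (made-up data)
ratings = {
    "alice": {"bread": 5, "milk": 4, "jam": 1},
    "bob":   {"bread": 4, "milk": 5, "jam": 2},
    "carol": {"jam": 5, "tea": 4},
}

def item_vector(item):
    """Column of the ratings matrix: users who rated this item."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(v1, v2):
    common = set(v1) & set(v2)
    dot = sum(v1[u] * v2[u] for u in common)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def recommend(user, k=1):
    """Score unseen items by similarity to the user's rated items."""
    seen = ratings[user]
    items = {i for r in ratings.values() for i in r}
    scores = {}
    for i in items - set(seen):
        scores[i] = sum(cosine(item_vector(i), item_vector(j)) * seen[j]
                        for j in seen)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Item-based filtering scales well because item-item similarities are more stable over time than user-user ones.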
Confidential
Graduate Teaching Assistant
Environment: Python, R, SAS, MySQL, MS SQL Server, Tableau, R-ggplot2, R-Shiny, MS Access, MS Excel Pivot Table, MS Excel V-lookup
Responsibilities:
- Created and tested skill-based development assignments and projects for Data Visualization, Systems Analysis and Project Management, IT Security, IT for Management, IT Strategy and Management, and IT for Business courses using machine learning techniques, Tableau, R-ggplot2, R-Shiny, MS Excel pivot tables and V-lookup, MS Access, and predictive and prescriptive analytics techniques
- Analyzed and troubleshot issues in machine learning algorithms, SQL, and Tableau
- Worked as a project manager for IT Strategy and Management Course
- Evaluated the assignments submitted by students.
- Mentored students in a weekly development lab for student assignments.
- Responsible for handling technical issues for 300 customers
- Managed paid search campaigns and monitored budgets for same
- Analyzed keyword list for searching website and expanded it as required
- Optimized search engine and landing page experience of client’s website
- Created advertisement campaign for the client through Google Adwords and Facebook Ads and optimized click-through rate for the client. Achieved a click-through rate of over 15%
- Used Google Analytics to create ad hoc reports on business segment performances
- Analyzed Google Adwords and Facebook Ads data of the client and prescribed ways to improve search engine and landing page experiences for users
- Created weekly queries and trending reports on advertisements
- Used statistical modeling to identify and rank top business school programs based on customer perception
Confidential
Senior Data Analyst
Environment: Base SAS, SAS Enterprise Miner, MS SQL Server, SSIS, SSRS, SSAS, Tableau, Excel Pivot Table, R
Responsibilities:
- Evaluated performances of neural network, logistic regression, and decision tree algorithms to predict success of a direct marketing campaign for a European bank, using Base SAS and SAS Enterprise Miner
- Created data warehouse using ETL techniques and then used OLAP cubes to analyze the data
- Developed integrated sales analysis using sales transaction data, customer data, and product data by modeling calculation views with joined analytical views (OLAP cubes) using actual and planned sales data
- Created business intelligence reports of OLAP cube data using Tableau and Pivot Tables
- Created data models, dimension fact model (DFM), and star schema for relational data
- Used SAS procedures such as Proc Datasets and Proc Contents to make data dictionary
- Analyzed large data sets using SAS and Proc SQL & SAS Macro
- Used Proc Compare for comparing datasets from different source systems
- Created datasets to be used by Tableau for visualization
- Used Statistical SAS Procedures such as Proc Freq for univariate statistical analysis
- Developed parametric and non-parametric hazard models for client churn
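The non-parametric churn hazard work above can be sketched with a plain Kaplan-Meier estimator (a standard survival-analysis tool; the data below is invented for illustration, and the parametric models are not reproduced):

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve for client churn.

    durations: time until churn or censoring for each client
    observed:  1 if the client actually churned, 0 if censored
    Returns a list of (time, survival probability) steps.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)   # still subscribed at t
        churned = sum(1 for d, e in zip(durations, observed) if d == t and e)
        surv *= 1 - churned / at_risk                   # product-limit step
        curve.append((t, surv))
    return curve

# Toy cohort: five clients, two of them censored (still active)
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored clients still count in the at-risk denominator until their last observed month, which is exactly what makes the estimator unbiased under right-censoring.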
Confidential
Senior Data Analyst
Environment: Excel Pivot Table, Python, R, Tableau
Responsibilities:
- Developed content analysis models to identify most popular descriptive, predictive, and prescriptive analytics tools and techniques used in different value chain activities of firms
- Developed statistical models to compare performances of online and in-class students of University of Confidential at Greensboro
- Created social media analytics models to analyze Twitter and Facebook data
- Created statistical models to explore relationships between student performance and the following variables: attendance, study time, weekly assignment grades, and instructor’s teaching experience.
- Performed Market Basket analysis to identify customer buying patterns, preferences and behaviors to better manage sales and inventory
- Developed data visualization models in Tableau to develop heat maps, trend reports, dashboards, etc.
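The market basket analysis above rests on support and confidence over transaction sets; a minimal sketch (toy transactions, single-antecedent rules only; a real analysis would use Apriori or FP-Growth over millions of baskets):

```python
from itertools import combinations

# Toy transactions (made-up data)
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules(min_support=0.4, min_confidence=0.6):
    """Enumerate single-antecedent rules x -> y meeting both thresholds."""
    items = {i for t in transactions for i in t}
    out = []
    for a, b in combinations(sorted(items), 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y})
            if s >= min_support:
                conf = s / support({x})
                if conf >= min_confidence:
                    out.append((x, y, round(s, 2), round(conf, 2)))
    return out
```

Support filters out rare itemsets; confidence then measures how reliably the antecedent implies the consequent.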
Confidential
Data Scientist/ Graduate Research Assistant
Environment: R, Python, MATLAB, Linux, FORTRAN
Responsibilities:
- Developed a Markov Chain Monte Carlo (MCMC) based Ensemble Kalman Filter forecasting model in MATLAB with 79% more accuracy compared to existing models to predict contaminant transport in groundwater
- Evaluated performances of singular value decomposition (SVD) and eigen-value decomposition (EVD) techniques for Ensemble Squared-Root Kalman filter (EnSRKF) models
- Compared performances of Kalman filter, Ensemble Kalman filter, and Ensemble Square-Root Kalman filter in groundwater contaminant transport modeling
- Performed unsupervised k-means clustering on Fisher’s Iris dataset and checked performances of k-means algorithm with different parameters
- Performed Bayes’ classification on Fisher’s Iris dataset with Euclidean and Mahalanobis distances
- Developed feature selection models on Fisher’s Iris dataset using Divergence, Transformed Divergence and Bhattacharyya Distance algorithms
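The Kalman-filter comparison above builds on the basic measurement update; a scalar sketch of that single step (the ensemble and square-root variants extend this to state vectors and sampled covariances, which are not reproduced here):

```python
def kalman_update(x, P, z, R):
    """One scalar Kalman filter measurement update.

    x, P: prior state estimate and its variance
    z, R: measurement and its noise variance
    Returns the posterior estimate and variance.
    """
    K = P / (P + R)           # Kalman gain: how much to trust the measurement
    x_post = x + K * (z - x)  # blend prior estimate with the innovation
    P_post = (1 - K) * P      # uncertainty shrinks after each update
    return x_post, P_post
```

Repeated updates drive the posterior variance down monotonically, which is why the filter converges on noisy contaminant-concentration measurements.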
Confidential
Senior Analyst/ Assistant Manager
Environment: Python, R, SQL, Excel, SAS
Responsibilities:
- Developed advanced analytics-based and statistical models to predict market potential of untapped geographic locations
- Developed customer churn analytics models using randomForest in R for business clients to predict probability of churn from financial products
- Investigated performances of parametric and non-parametric hazard model (Cox Proportional Hazard) in churn prediction
- Developed decision tree and logistic regression-based models to predict default probability of small and medium business customers
- Achieved a 32% increase in market reach by suggesting changes based on descriptive and predictive analytics
- Performed cluster analysis to identify important client segments for marketing campaign
- Created business intelligence reports involving v-lookup, pivot table, bubble chart, heat map for top management of the company
- Worked on development of a client fraud detection platform using various machine learning algorithms in R and Python
- Applied discriminant analysis, greedy forward selection, greedy backward selection, and feature reduction algorithms such as Principal Component Analysis (PCA) and Factor Analysis
- Collaborated closely with subject matter experts to identify and define business reporting requirements
- Developed Generalized Linear Models such as Poisson Regression model and Negative Binomial Distribution model for count data of number of accounts for each customer
- Developed multinomial discrete choice models to predict popularity of financial products of the company
- Developed analytics-based solutions based on predictive, behavioral or other models using statistical analysis and relevant modeling techniques
- Participated in all the phases of knowledge discovery: data collection, data cleaning, developing models, validation and visualization
- Analyzed key data points and variables to optimize cross-sell, up-sale, and renewal possibilities of existing clients
- Developed decision tree-based models for cross-sell, up-sales and renewal possibilities
- Performed recency, frequency, monetary (RFM) analysis for customer segmentation using k-means and k-medoids clustering
- Performed t-tests and other statistical models to identify performance differences of two teams
- Developed advanced regression models to predict sales in presence of interaction of advertisements, consumer demographics, and geographic variances.
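The RFM segmentation above can be sketched with a plain k-means pass over (recency, frequency, monetary) vectors (toy data and a fixed seed for illustration; production work would standardize features and choose k by silhouette or elbow analysis):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on RFM feature vectors (recency, frequency, monetary)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep old center
        # if a cluster went empty.
        centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Two obvious customer segments: low-value vs high-value (made-up RFM rows)
rfm = [(1, 1, 1), (1, 2, 1), (2, 1, 1), (9, 9, 9), (9, 8, 9), (8, 9, 9)]
centers, clusters = kmeans(rfm, 2)
```

k-medoids follows the same loop but restricts centers to actual data points, which makes it more robust to monetary-value outliers.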
Confidential
MBA Intern
Environment: SQL, SAS
Responsibilities:
- Worked on a monthly expenditure tracking project; responsibilities included cleaning, aggregating, analyzing, and interpreting data, carrying out quality analysis of the tracker, and preparing a weekly growth analysis report for financial products using advanced statistical techniques (cluster, factor, and tree analysis) in SAS and SQL
- Developed time series ARIMA models to predict trend and seasonality of product sales
- Created data visualization reports involving v-lookup, pivot table, bubble chart, heat map for business leaders of the company
- Developed attrition models based on recency, frequency, monetary (RFM) analysis.
- Built a customer churn model using logistic regression and random forest algorithms to predict the churn probability of customers. The model, with an accuracy of 81%, helped the client retain customers worth $5M.
Confidential
Analyst
Environment: Excel, Python, R, MS SQL Server, MATLAB
Responsibilities:
- Developed databases of company clients and construction projects
- Managed several construction projects as a project manager
- Developed simulation models to determine structure behavior under various loadings
- Developed pivot table, heat maps, bar charts, bubble charts for top management of the company
- Created interactive visualization models for product sales stages
- Developed procurement and spend analytics models to find optimal solution for the client
- Developed Gantt chart, CPM, PERT models for various projects
- Developed BPMN and UML-based business process models
- Created pattern recognition-based models in MATLAB to identify possibility of project delay
- Wrote advanced SQL queries to get subtle insights from data and created dashboards and BI reports.
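The CPM scheduling work above reduces to a forward pass over the task dependency graph; a minimal sketch (task names and durations are invented, and the backward pass for slack is omitted):

```python
def critical_path_length(tasks):
    """Forward-pass CPM: earliest finish per task and total project duration.

    tasks: {name: (duration, [predecessor names])}, assumed acyclic.
    """
    finish = {}

    def earliest_finish(name):
        if name not in finish:
            duration, preds = tasks[name]
            # A task can start only after its latest predecessor finishes.
            finish[name] = duration + max(
                (earliest_finish(p) for p in preds), default=0)
        return finish[name]

    for name in tasks:
        earliest_finish(name)
    return finish, max(finish.values())

# Toy construction project (made-up activities)
project = {
    "design":  (3, []),
    "procure": (2, ["design"]),
    "build":   (5, ["design", "procure"]),
    "inspect": (1, ["build"]),
}
finish, total = critical_path_length(project)
```

The critical path is whichever chain of predecessors realizes the project total; tasks off that chain carry slack, which the omitted backward pass would quantify.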