
Data Scientist Resume

Buffalo Grove, IL


  • 8+ years of experience in the IT industry, focused on working with data using Python and R.
  • Hands-on experience across the entire data science project life cycle, including Data Acquisition, Data Cleaning, Data Wrangling, Data Warehousing, Data Mining, applying Machine Learning algorithms, Validation and Visualization.
  • Experience developing Proofs of Concept (POC).
  • Experienced and highly skilled in the Financial and Retail Marketing domains.
  • In-depth knowledge of the programming languages Python, R and SQL.
  • Experience in building Churn Predictors using Logistic regression and Random Forest.
  • Experience in building Recommender Systems using Collaborative Filtering and Content-based Filtering.
  • Experience in conducting A/B Testing, analyzing Click-through Rate (CTR) and Conversion Rate (CR) to evaluate the best-fit recommender system.
  • Experience in model evaluation with various metrics like Root Mean Square Error (RMSE), Confusion Matrix, Precision-Recall (PR) Curve and AUC of the ROC Curve.
  • Experience in tuning regression models using L1 (Lasso) and L2 (Ridge) regularization to avoid overfitting.
  • Hands-on experience working with NLTK for Natural Language Processing and Text Mining, and with social media APIs.
  • Hands-on experience with Deep Learning frameworks TensorFlow, Keras and Caffe, and with the OpenCV computer vision library.
  • Experience in creating Dashboards and visualization charts using Tableau, Python (Matplotlib, Seaborn, Bokeh, PixieDust) and R (Shiny, ggplot2).
  • Deep and extensive knowledge of HDFS, Spark, MapReduce, Pig, Hive, HBase, Sqoop, Storm, YARN, Flume, Oozie, ZooKeeper, Cassandra, etc.
  • Extensively used Pandas (Python) and dplyr (R) for Data Munging, and NumPy and SciPy for numerical computations.
  • Well versed in machine learning algorithms, including Supervised Learning (Linear, Logistic and Penalized Linear Regression, Decision Trees, Random Forest, Support Vector Machines, K-Nearest Neighbors) and Unsupervised Learning (Clustering, K-Means).
  • Experience in using Bagging and Boosting Ensemble Models, including the boosting methods AdaBoost and XGBoost, to improve model accuracy.
  • Experience in conducting Market Basket analysis using Association Rule Mining and Principal Component Analysis.
  • Experience working with relational databases (MySQL, SQLite, PostgreSQL), NoSQL stores (MongoDB, Redis, Cassandra) and HDFS.
  • Hands-on experience working with Spark Core, Spark SQL, Spark Streaming and Spark machine learning (Spark MLlib).
  • Experienced with Hadoop distributions including Cloudera CDH, Hortonworks HDP and MapR Data Platform.
  • Strong knowledge of and experience in processing structured, semi-structured and unstructured data, and in handling file formats such as delimited CSV, XML, JSON, Sequence files, Avro, Parquet and ORC.
  • Working Knowledge of Cloud computing (AWS and GCP).
  • Experience in handling ad hoc requests and generating reports as needed.


Python: Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, SQLAlchemy, Matplotlib, Seaborn, Bokeh, SQLite, BeautifulSoup, regular expressions (re), urllib, json, boto3, Redis, Flask, Django, datetime, os, PyQt

Tools: dplyr, tidyr, Shiny, ggplot2, caret, H2O.

Database: MySQL, PostgreSQL, MongoDB, Redis, SQLite, Cassandra, HDFS

Big Data: Hadoop Ecosystem - Hive, Pig, MapReduce, Spark, Impala

Cloud: AWS (S3, Redshift, EC2, EDH, Lambda), GCP (Compute Engine, BigQuery, Dataflow, AutoML)

IDE: JupyterLab, RStudio, Eclipse, Spyder, PyCharm, Atom, Notepad++, Sublime Text

Other Technologies: Java, MATLAB, C, C++, Web Technologies (HTML, CSS, JavaScript, Bootstrap)

Environment: Anaconda, pyenv, virtualenv

Version Control: Git, SVN


Confidential, Buffalo Grove, IL

Data Scientist


  • Developed a Recommender System using collaborative filtering and content-based filtering, based on TF-IDF vectorization and Cosine Similarity, increasing Accessory Sales by 8% and Cross-sell by 15%.
  • Used Redis to store and retrieve the key-value pairs that serve the recommender results.
  • Used the NLTK stem module for natural language processing, stemming the product names.
  • Integrated the Recommender System into the eCommerce website, which was developed using the Python Flask framework and a REST API.
  • Performed Market Basket analysis and combined it with Product Clustering analysis to identify products likely to appear in the same basket and to select product offers for cross-sell and up-sell marketing.
  • Conducted A/B Testing of the Recommender System to find the best-suited model by analyzing Click-through Rate and Conversion Rate using Google Analytics.
  • Developed a Proof of Concept (POC) for email campaigning using Deep Learning frameworks TensorFlow and Keras.
  • Collected data from historical records, web scraping, web crawling and other public data records.
  • Extensively used R and Python to extract, clean, transform, impute and analyze the data.
  • Analyzed Customer Lifetime Value and worked closely with marketing teams to improve the retention rate.
  • Handled ad hoc requests from the business, such as extracting required data from MSSQL databases or converting data to accessible formats (Excel).
  • Created dashboards and visualizations using Python Bokeh, wordcloud and Matplotlib to communicate the analysis reports to business and Management teams.

Environment: Anaconda3 4.x, Python 3.x, R 3.x, MSSQL, Redis, Deep Learning, Natural Language Processing, Google Analytics, Linux, Microsoft Excel, Jupyter, Flask.

Confidential, Plano, TX

Data Scientist


  • Improved the accuracy of the Fraud Detection Model by 0.5% using Ensemble Models.
  • Worked with different models, including K-Means Clustering and SVM, for anomaly detection to detect fraudulent transactions.
  • Compared different models to find the best fit and used bagging, boosting and stacking of models to get better results.
  • The enhanced model substantially reduced False Positives, which led to an increase in overall revenue and customer satisfaction.
  • Conducted Cohort Analysis and developed a Churn Prediction Model using Logistic Regression and a Naïve Bayes Classifier, thereby identifying factors to improve sales and retain customers.
  • Developed a personalized CLIP (Credit Line Increase Program) model based on user history and eigenvector values using Backpropagation Neural Networks.
  • Evaluated the models using RMSE, Confusion Matrix, ROC and AUC in both dev and production environments.
  • Created Big Data Cloudera EDH clusters on AWS and managed the clusters using Cloudera Director.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created RDDs and applied Transformations and Actions on them; applied Spark filter conditions on data and worked on joins in Spark.
  • Used the Spark DataFrame API to perform analytics on Hive data and checkpointed RDDs to disk to handle job failures and aid debugging.
  • Developed Spark jobs written in Python to perform operations like aggregation, data processing and data analysis.
  • Hands-on experience in handling Hive tables using Spark SQL.
  • Designed and implemented cross-validation and statistical tests, including Hypothesis Testing, ANOVA, Auto-correlation and checks for Simpson's Paradox, to verify each predictive model.
  • Conducted Root Cause Analysis and Factor Analysis on digital marketing performance to identify KPIs for email advertising by using Python SciPy.
  • Researched and developed Attribution Models using Python Scikit-Learn to find the correlation between the conversion rate and email advertising campaigns.
  • Evaluated and recommended the optimal frequency and duration for email advertising campaigns.
  • Used Tableau 9.x and R Shiny to create detail-level summary reports and dashboards for technical and business stakeholders, using KPIs and visualized trend analysis.

Environment: Anaconda3 2.x, Python 3.x, R 3.x, Hadoop 2.3, Spark, Hive, Impala, Cloudera, AWS, Linux, Jupyter, Tableau, SQL Server 2012

Confidential, Sioux Falls, SD

Data Engineer/ Data Scientist


  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Configured MySQL Database to store Hive metadata.
  • Used Sqoop to import data from various relational data sources like MySQL into HDFS.
  • Responsible for managing customer data coming from different sources.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Analyzed the customer data by running Hive queries to understand user behavior.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Used different SerDes for converting JSON data into pipe-separated data.
  • Hands-on experience in working with different file formats like Text file and Avro file format.
  • Worked on partitioning and bucketing Hive tables and running scripts in parallel to reduce their run time.
  • Involved in Hadoop cluster tasks like adding and removing nodes without affecting running jobs and data.
  • Hands-on experience in configuring clusters on EC2 instances using Cloudera Manager.
  • Experience in creating tables on top of data on AWS S3 obtained from different data sources.
  • Used R for exploratory data analysis and to build machine learning models.
  • Developed a model using multivariate regression, conducted Cohort Analysis and helped in increasing User Retention Rate and thereby increasing the total revenue.
  • Experience working with Time-series Analysis of the data.
  • Used ensemble models built with XGBoost to build the risk prediction application FusionRisk.
  • Created dashboards using R Shiny to communicate the results to business and management teams.

Environment: Python 2.x, R 3.x, Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), AWS EC2, Pig, Linux, XML, HBase, ZooKeeper, Sqoop


Data Analyst/ QA Analyst


  • Migrated the system from the legacy CTL system to OmniPay, a merchant solution-based application.
  • Developed many reusable automated scripts to generate dummy transactions in both the System Integration and User Acceptance Testing phases.
  • Wrote SQL Server mappings from the old system to the new system using SQL Server Management Studio.
  • Developed scripts to validate the batch files which are generated on a daily, weekly, fortnightly and monthly basis.
  • Worked closely with risk teams to identify defaulting users and generated reports grouped by account transactions using Microsoft Excel.
  • Handled 400k+ total merchant accounts and analyzed the most profitable accounts to promote goodwill with the merchants.
  • Worked on development of SQL and stored procedures for normalization and denormalization in MySQL.
  • Built SQL queries for performing CRUD operations (create, read, update and delete).
  • Involved in testing the field-to-field mapping of migrated data.
  • Used HP ALM Quality Center for creating and tracking the defects that are identified during the different phases of testing.

Environment: Python 2.x, Java 1.7, Selenium WebDriver, TestNG, Spring MVC, HTML5, Apache Tomcat 8.0, MySQL, Eclipse, Microsoft Office


Python Developer/ Data Analyst


  • Developed a desktop GUI-based CRM application using wxPython.
  • Assisted in reduction of cost and optimization of supplier selection for the CRM Applications.
  • Ensured high-quality data collection and maintained the integrity of the data.
  • Cleaned data and processed third-party spending data into usable deliverables in specific formats with Excel macros and Python libraries.
  • Used several Python libraries, including wxPython, NumPy and Matplotlib.
  • Was involved in environment setup and code installation, as well as the SVN implementation.
  • Designed and developed a data management system using MySQL.
  • Created a unit test/regression test framework for existing and new code.
  • This project also used other technologies, such as jQuery for JavaScript manipulation and Bootstrap for the front-end HTML layout.
  • Responsible for debugging and troubleshooting the web application.
  • Skilled in using Python collections for manipulating and looping through different user-defined objects.
  • Engaged in Design, Development, Deployment, Testing, Implementation of the application.
  • Worked on development of applications in a UNIX environment and familiar with its commands.

Environment: Python 2.x, wxPython, MySQL, Eclipse, Microsoft Office, Unix, SVN
