- 8+ years of experience in the IT industry, focused on working with data using Python and R.
- Hands-on experience across the entire data science project life cycle, including Data Acquisition, Data Cleaning, Data Wrangling, Data Warehousing, Data Mining, applying Machine Learning algorithms, Validation and Visualization.
- Experience developing Proofs of Concept (POCs).
- Highly skilled and experienced in the Financial and Retail Marketing domains.
- In-depth knowledge of the programming languages Python, R and SQL.
- Experience building churn predictors using Logistic Regression and Random Forest.
- Experience building recommender systems using collaborative filtering and content-based filtering.
- Experience conducting A/B tests, analyzing Click-Through Rate (CTR) and Conversion Rate (CR) to select the best-fit recommender system.
- Experience evaluating models with metrics such as Root Mean Square Error (RMSE), confusion matrix, Precision-Recall (PR) curve and area under the ROC curve (AUC).
- Experience tuning regression models with L1 (Lasso) and L2 (Ridge) regularization to avoid overfitting.
- Hands-on experience with NLTK for Natural Language Processing and Text Mining, and with social media APIs.
- Hands-on experience with the deep learning frameworks TensorFlow, Keras and Caffe, and the computer vision library OpenCV.
- Experience creating dashboards and visualizations using Tableau, Python (Matplotlib, Seaborn, Bokeh, PixieDust) and R (Shiny, ggplot2).
- Extensive knowledge of HDFS, Spark, MapReduce, Pig, Hive, HBase, Sqoop, Storm, YARN, Flume, Oozie, ZooKeeper and Cassandra.
- Extensively used pandas (Python) and dplyr (R) for data munging, and NumPy and SciPy for numerical computation.
- Well versed in machine learning algorithms, including supervised learning (Linear, Logistic and Penalized Linear Regression, Decision Trees, Random Forest, Support Vector Machines, K-Nearest Neighbors) and unsupervised learning (clustering, K-Means).
- Experience using bagging and boosting ensemble models such as AdaBoost and XGBoost to improve model accuracy.
- Experience conducting Market Basket Analysis using Association Rules and Principal Component Analysis (PCA).
- Experience working with relational databases (MySQL, SQLite, PostgreSQL), NoSQL stores (MongoDB, Redis, Cassandra) and HDFS.
- Hands-on experience with Spark Core, Spark SQL, Spark Streaming and Spark MLlib.
- Experienced with Hadoop distributions including Cloudera CDH, Hortonworks HDP and the MapR Data Platform.
- Strong experience processing structured, semi-structured and unstructured data in file formats such as delimited CSV, XML, JSON, SequenceFile, Avro, Parquet and ORC.
- Working knowledge of cloud computing (AWS and GCP).
- Experience handling ad hoc requests and generating reports as needed.
Python: pandas, NumPy, scikit-learn, TensorFlow, Keras, SQLAlchemy, Matplotlib, Seaborn, Bokeh, SQLite, BeautifulSoup, regular expressions (re), urllib, json, boto3, Redis, Flask, Django, datetime, os, PyQt
Tools: dplyr, tidyr, Shiny, ggplot2, caret, H2O
Database: MySQL, PostgreSQL, MongoDB, Redis, SQLite, Cassandra, HDFS
Big Data: Hadoop Ecosystem - Hive, Pig, MapReduce, Spark, Impala
Cloud: AWS (S3, Redshift, EC2, Lambda, Cloudera EDH on AWS), GCP (Compute Engine, BigQuery, Dataflow, AutoML)
IDEs: JupyterLab, RStudio, Eclipse, Spyder, PyCharm, Atom, Notepad++, Sublime Text
Environment: Anaconda, pyenv, virtualenv
Version Control: Git, SVN
Confidential, Buffalo Grove, IL
- Developed a recommender system combining collaborative filtering and content-based filtering (TF-IDF vectorization with cosine similarity), increasing accessory sales by 8% and cross-sell by 15%.
- Used Redis to store and retrieve key-value pairs for serving recommender results.
- Used the NLTK stem module to stem product names during text processing.
- Integrated the recommender system into the e-commerce website, which was built with the Python Flask framework and a REST API.
- Performed Market Basket Analysis combined with product clustering to identify products likely to be purchased together and to select product offers for cross-sell and up-sell marketing.
- Ran A/B tests on the recommender system to find the best-performing model, analyzing Click-Through Rate and Conversion Rate in Google Analytics.
- Developed a Proof of Concept (POC) for email campaigns using the deep learning frameworks TensorFlow and Keras.
- Collected data from historical records, web scraping, web crawling and other public data sources.
- Extensively used R and Python to extract, clean, transform, impute and analyze the data.
- Analyzed Customer Lifetime Value (CLV) and worked closely with marketing teams to improve the customer retention rate.
- Handled ad hoc business requests, such as extracting data from MS SQL Server databases or converting data into accessible formats (Excel).
- Created dashboards and visualizations using Python (Bokeh, wordcloud, Matplotlib) to communicate analysis results to business and management teams.
Environment: Anaconda3 4.x, Python 3.x, R 3.x, MS SQL Server, Redis, Deep Learning, Natural Language Processing, Google Analytics, Linux, Microsoft Excel, Jupyter, Flask.
Confidential, Plano, TX
- Improved the accuracy of the fraud detection model by 0.5% using ensemble models.
- Applied anomaly detection models, including K-Means clustering and SVM, to detect fraudulent transactions.
- Compared multiple models to find the best fit and used bagging, boosting and stacking to improve results.
- The enhanced model substantially reduced false positives, leading to higher overall revenue and customer satisfaction.
- Performed cohort analysis and developed a churn prediction model using Logistic Regression and a Naïve Bayes classifier, identifying factors to improve sales and retain customers.
- Developed a personalized CLIP (Credit Line Increase Program) model based on user history and eigenvector values using backpropagation neural networks.
- Evaluated the models using RMSE, Confusion Matrix, ROC and AUC in both dev and production environments.
- Created Cloudera EDH big data clusters on AWS and managed them using Cloudera Director.
- Used Hive to analyze partitioned and bucketed data and compute various reporting metrics.
- Created RDDs and applied transformations and actions to them; applied Spark filter conditions and performed joins in Spark.
- Used the Spark DataFrame API to perform analytics on Hive data, and checkpointed RDDs to disk to handle job failures and aid debugging.
- Developed Spark jobs in Python for aggregation, data processing and data analysis.
- Handled Hive tables using Spark SQL.
- Designed and implemented cross-validation and statistical tests, including hypothesis testing, ANOVA, autocorrelation analysis and checks for Simpson's Paradox, to verify each predictive model.
- Conducted root cause analysis and factor analysis of digital marketing performance using Python SciPy to identify KPIs for email advertising.
- Researched and developed attribution models using Python scikit-learn to find the correlation between conversion rate and email advertising campaigns.
- Evaluated and recommended the optimal send frequency and duration for email advertising campaigns.
- Used Tableau 9.x and R Shiny to create detailed summary reports and dashboards for technical and business stakeholders, presenting KPIs and trend analyses.
Environment: Anaconda3 2.x, Python 3.x, R 3.x, Hadoop 2.3, Spark, Hive, Impala, Cloudera, AWS, Linux, Jupyter, Tableau, SQL Server 2012
Confidential, Sioux Falls, SD
Data Engineer/ Data Scientist
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Configured a MySQL database as the Hive metastore.
- Used Sqoop to import data from various relational data sources like MySQL into HDFS.
- Managed customer data arriving from multiple sources.
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Analyzed customer data with Hive queries to understand user behavior.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression codecs.
- Used different SerDes to convert JSON data into pipe-delimited data.
- Worked with different file formats, including text files and Avro.
- Partitioned and bucketed Hive tables and ran scripts in parallel to reduce their run time.
- Performed Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Configured clusters on EC2 instances using Cloudera Manager.
- Created tables on top of data stored in AWS S3 from different sources.
- Used R for exploratory data analysis and to build machine learning models.
- Developed a multivariate regression model and conducted cohort analysis, helping increase the user retention rate and thereby total revenue.
- Performed time-series analysis of the data.
- Used ensemble models built with XGBoost to develop FusionRisk, a risk prediction application.
- Created dashboards using R Shiny to communicate results to business and management.
Environment: Python 2.x, R 3.x, Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), AWS EC2, Pig, Linux, XML, HBase, ZooKeeper, Sqoop
Data Analyst/ QA Analyst
- Migrated from the legacy CTL system to OmniPay, a merchant solutions application.
- Developed reusable automated scripts to generate dummy transactions during both the System Integration Testing and User Acceptance Testing phases.
- Wrote SQL Server mappings from the old system to the new system using SQL Server Management Studio.
- Developed scripts to validate batch files generated on a daily, weekly, fortnightly and monthly basis.
- Worked closely with risk teams to identify defaulting users and generated Microsoft Excel reports grouped by account transactions.
- Handled 400K+ merchant accounts and analyzed the most profitable accounts to build goodwill with merchants.
- Developed SQL queries and stored procedures for normalization and denormalization in MySQL.
- Built SQL queries for CRUD operations (create, read, update, delete).
- Tested field-to-field mappings of migrated data.
- Used HP ALM Quality Center to log and track defects identified during the different phases of testing.
Environment: Python 2.x, Java 1.7, Selenium WebDriver, TestNG, Spring MVC, HTML5, Apache Tomcat 8.0, MySQL, Eclipse, Microsoft Office
Python Developer/ Data Analyst
- Developed a desktop GUI-based CRM application using wxPython.
- Helped reduce costs and optimize supplier selection for the CRM applications.
- Ensured high-quality data collection and maintained data integrity.
- Cleaned and processed third-party spending data into usable deliverables in specified formats using Excel macros and Python libraries.
- Used several Python libraries, including wxPython, NumPy and Matplotlib.
- Involved in environment setup, code installation and the SVN implementation.
- Designed and developed a data management system using MySQL.
- Created a unit and regression test framework for existing and new code.
- The project also used jQuery for JavaScript manipulation and Bootstrap for the front-end HTML layout.
- Responsible for debugging and troubleshooting the web application.
- Used Python collections to manipulate and iterate over user-defined objects.
- Engaged in the design, development, testing, deployment and implementation of the application.
- Developed applications in a UNIX environment and am proficient with common UNIX commands.
Environment: Python 2.x, wxPython, MySQL, Eclipse, Microsoft Office, Unix, SVN