Data Scientist Resume Buffalo Grove, IL - Hire IT People

SUMMARY:

8+ years of experience in the IT industry, focused on working with data using Python and R.
Hands-on experience across the entire data science project life cycle, including data acquisition, data cleaning, data wrangling, data warehousing, data mining, applying machine learning algorithms, validation and visualization.
Experience developing Proofs of Concept (POCs).
Experienced and highly skilled in the Financial and Retail Marketing domains.
In-depth knowledge of the programming languages Python, R and SQL.
Experience in building churn predictors using Logistic Regression and Random Forest.
Experience in building recommender systems using collaborative filtering and content-based filtering.
Experience in conducting A/B testing, analyzing Click-Through Rate (CTR) and Conversion Rate (CR) to select the best-performing recommender system.
Experience in model evaluation with metrics such as Root Mean Square Error (RMSE), Confusion Matrix, Precision-Recall (PR) curve and area under the ROC curve (AUC).
Experience in tuning regression models using L1 (Lasso) and L2 (Ridge) regularization to avoid overfitting.
Hands-on experience with NLTK for natural language processing and text mining, and with social media APIs.
Hands-on experience with the deep learning frameworks TensorFlow, Keras and Caffe, and with the OpenCV computer vision library.
Experience in creating dashboards and visualization charts using Tableau, Python (Matplotlib, Seaborn, Bokeh, PixieDust) and R (Shiny, ggplot2).
Deep and extensive knowledge of HDFS, Spark, MapReduce, Pig, Hive, HBase, Sqoop, Storm, YARN, Flume, Oozie, ZooKeeper, Cassandra, etc.
Extensively used Pandas (Python) and dplyr (R) for data munging, and NumPy and SciPy for numerical computations.
Well versed in machine learning algorithms, including supervised learning (Linear, Logistic and Penalized Linear Regression, Decision Trees, Random Forest, Support Vector Machines, K-Nearest Neighbors) and unsupervised learning (clustering, K-means).
Experience in using bagging and boosting ensemble models such as AdaBoost and XGBoost to improve model accuracy.
Experience in conducting Market Basket Analysis using association rule mining and Principal Component Analysis.
Experience working with relational databases (MySQL, SQLite, PostgreSQL) and with NoSQL and distributed stores (HDFS, MongoDB, Redis, Cassandra).
Hands-on experience working with Spark Core, Spark SQL, Spark Streaming and Spark machine learning (Spark MLlib).
Experienced with Hadoop distributions including Cloudera CDH, Hortonworks HDP and MapR Data Platform.
Strong knowledge of and experience in processing structured, semi-structured and unstructured data; handled file formats such as delimited CSV, XML, JSON, Sequence files, Avro, Parquet and ORC.
Working Knowledge of Cloud computing (AWS and GCP).
Experience in handling ad hoc requests and generating reports as needed.

TECHNICAL SKILLS:

Python: Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, SQLAlchemy, Matplotlib, Seaborn, Bokeh, sqlite3, BeautifulSoup, re (regular expressions), urllib, json, boto3, Redis, Flask, Django, datetime, os, PyQt

Tools: dplyr, tidyr, Shiny, ggplot2, caret, H2O

Database: MySQL, PostgreSQL, MongoDB, Redis, SQLite, Cassandra, HDFS

Big Data: Hadoop Ecosystem - Hive, Pig, MapReduce, Spark, Impala

Cloud: AWS (S3, Redshift, EC2, EDH, Lambda), GCP (Compute Engine, BigQuery, Dataflow, AutoML)

IDE: JupyterLab, RStudio, Eclipse, Spyder, PyCharm, Atom, Notepad++, Sublime Text

Other Technologies: Java, MATLAB, C, C++, Web Technologies (HTML, CSS, JavaScript, Bootstrap)

Environment: Anaconda, pyenv, virtualenv

Version Control: Git, SVN

PROFESSIONAL EXPERIENCE:

Confidential, Buffalo Grove, IL

Data Scientist

Responsibilities:

Developed a recommender system using collaborative filtering and content-based filtering, based on TF-IDF vectorization and cosine similarity, increasing accessory sales by 8% and cross-sell by 15%.
Used Redis to store and retrieve key-value pairs for serving the recommender results.
Used the NLTK stem module for natural language processing and for stemming product names.
Integrated the recommender system into the e-commerce website, which was developed using the Python Flask framework and a REST API.
Performed Market Basket Analysis and combined it with product clustering analysis to identify products that are more likely to be in the same basket and to make product offer selections for cross-sell and up-sell marketing.
Ran A/B tests of the recommender system to find the best-suited model by analyzing Click-Through Rate and Conversion Rate using Google Analytics.
Developed a Proof of Concept (POC) for email campaigns using the deep learning frameworks TensorFlow and Keras.
Collected data from historical records, web scraping, web crawling and other public data records.
Extensively used R and Python to extract, clean, transform, impute and analyze the data.
Analyzed Customer Lifetime Value and worked closely with marketing teams to improve the retention rate.
Handled ad hoc requests from the business, such as extracting required data from MSSQL databases or converting data to understandable formats (Excel).
Created dashboards and visualizations using Python Bokeh, wordcloud and Matplotlib to communicate analysis reports to business and management teams.

Environment: Anaconda3 4.x, Python 3.x, R 3.x, MSSQL, Redis, Deep Learning, Natural Language Processing, Google Analytics, Linux, Microsoft Excel, Jupyter, Flask.
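
Illustrative sketch (not the production implementation): the content-based side of a recommender built on TF-IDF vectorization and cosine similarity, as named in the responsibilities above, could look roughly like the following. It uses only the Python standard library; the function names, the smoothed IDF formula and the sample product names are assumptions made for this example, and stemming is omitted for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a fuller pipeline would also stem tokens.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(products, index, top_n=2):
    """Return the top_n product names most similar to products[index]."""
    vectors = tfidf_vectors(products)
    scores = [
        (cosine(vectors[index], v), i)
        for i, v in enumerate(vectors)
        if i != index
    ]
    scores.sort(reverse=True)
    return [products[i] for _, i in scores[:top_n]]


# Hypothetical catalogue, invented for illustration only.
products = [
    "leather phone case black",
    "silicone phone case blue",
    "wireless phone charger",
    "running shoes blue",
]
print(recommend(products, 0, top_n=1))
```

In this sketch, products that share rarer terms with the query product rank higher than products that share only common ones, which is the effect the IDF weighting is meant to give.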

Confidential, Plano, TX

Data Scientist

Responsibilities:

Improved the accuracy of the Fraud Detection Model by 0.5% using ensemble models.
Worked with different models, including K-Means Clustering and SVM, for anomaly detection to detect fraudulent transactions.
Compared different models for best fit and used bagging, boosting and stacking of models to get better results.
The enhanced model substantially reduced false positives, which led to an increase in overall revenue and customer satisfaction.
Performed Cohort Analysis and developed a Churn Prediction Model using Logistic Regression and a Naïve Bayes classifier, thereby identifying factors to improve sales and retain customers.
Developed a personalized CLIP (Credit Line Increase Program) model based on user history and eigenvector values using backpropagation neural networks.
Evaluated the models using RMSE, Confusion Matrix, ROC and AUC in both dev and production environments.
Created Big Data Cloudera EDH clusters on AWS and managed the clusters using Cloudera Director.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Created RDDs and applied transformations and actions on them; applied Spark filter conditions on data and performed joins in Spark.
Used the Spark DataFrame API to perform analytics on Hive data and checkpointed RDDs to disk to handle job failures and aid debugging.
Developed Spark jobs written in Python to perform operations like aggregation, data processing and data analysis.
Hands-on experience in handling Hive tables using Spark SQL.
Designed and implemented cross-validation and statistical tests including Hypothesis Testing, ANOVA, Auto-correlation and Simpson’s Paradox to verify each predictive model.
Conducted Root Cause Analysis and Factor Analysis on digital marketing performance to identify KPIs for email advertising by using Python SciPy.
Researched and developed Attribution Models using Python Scikit-Learn to find the correlation between the conversion rate and email advertising campaigns.
Evaluated and recommended the optimal frequency and duration for email advertising campaigns.
Used Tableau 9.x and R Shiny to create detail-level summary reports and dashboards for technical and business stakeholders, using KPIs and visualized trend analysis.

Environment: Anaconda3 2.x, Python 3.x, R 3.x, Hadoop 2.3, Spark, Hive, Impala, Cloudera, AWS, Linux, Jupyter, Tableau, SQL Server 2012

Confidential, Sioux Falls, SD

Data Engineer/ Data Scientist

Responsibilities:

Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
Configured MySQL Database to store Hive metadata.
Used Sqoop to import data from various relational data sources like MySQL into HDFS.
Managed customer data coming from different sources.
Managed and scheduled jobs on a Hadoop cluster using Oozie.
Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
Analyzed the customer data by running Hive queries to understand user behavior.
Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
Used different SerDes for converting JSON data into pipe-separated data.
Hands-on experience in working with different file formats such as text files and the Avro file format.
Worked on partitioning and bucketing Hive tables and running the scripts in parallel to reduce their run time.
Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs and data.
Hands-on experience in configuring clusters on EC2 instances using Cloudera Manager.
Experience in creating tables on top of data on AWS S3 obtained from different data sources.
Used R for exploratory data analysis and to build machine learning models.
Developed a model using multivariate regression and conducted Cohort Analysis, helping to increase the User Retention Rate and thereby total revenue.
Experience working with time-series analysis of the data.
Used ensemble models built with XGBoost to build the risk prediction application FusionRisk.
Created dashboards using R Shiny to communicate the results to business and management.

Environment: Python 2.x, R 3.x, Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), AWS EC2, Pig, Linux, XML, HBase, ZooKeeper, Sqoop

Confidential

Data Analyst/ QA Analyst

Responsibilities:

Migrated from the legacy CTL system to OmniPay, a merchant solution-based application.
Developed many reusable automated scripts to generate dummy transactions in both the System Integration and User Acceptance Testing phases.
Wrote SQL Server mappings from the old system to the new system using SQL Server Management Studio.
Developed scripts to validate the batch files which are generated on a daily, weekly, fortnightly and monthly basis.
Worked closely with risk teams to identify defaulting users and generated reports grouped by account transactions using Microsoft Excel.
Handled 400k+ total merchant accounts and analyzed the most profitable accounts to promote goodwill with the merchants.
Worked on development of SQL and stored procedures for normalization and denormalization in MySQL.
Built SQL queries for performing CRUD operations (create, read, update and delete).
Tested field-to-field mapping of the migrated data.
Used HP ALM Quality Center for creating and tracking the defects that are identified during the different phases of testing.

Environment: Python 2.x, Java 1.7, Selenium WebDriver, TestNG, Spring MVC, HTML5, Apache Tomcat 8.0, MySQL, Eclipse, Microsoft Office

Confidential

Python Developer/ Data Analyst

Responsibilities:

Developed a desktop GUI-based CRM application using wxPython.
Assisted in reduction of cost and optimization of supplier selection for the CRM applications.
Ensured high-quality data collection and maintained the integrity of the data.
Cleaned data and processed third-party spending data into usable deliverables in specific formats using Excel macros and Python libraries.
Used several Python libraries such as wxPython, NumPy and Matplotlib.
Was involved in environment setup and code installation as well as the SVN implementation.
Designed and developed a data management system using MySQL.
Created a unit test/regression test framework for existing and new code.
Also used jQuery for JavaScript manipulation and Bootstrap for the front-end HTML layout.
Responsible for debugging and troubleshooting the web application.
Skilled in using Collections in Python for manipulating and looping through different user defined objects.
Engaged in Design, Development, Deployment, Testing, Implementation of the application.
Worked on development of applications in a UNIX environment and familiar with its commands.

Environment: Python 2.x, wxPython, MySQL, Eclipse, Microsoft Office, Unix, SVN
