- 8+ years of experience in Data Science with expertise in Descriptive, Predictive and Prescriptive analytics.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn in Python for data manipulation and for developing various machine learning models.
- Experience working with senior stakeholders to understand business requirements and present actionable data insights to senior management.
- Experience working with large data sources (5B+ rows); interpreted and communicated insights and findings from analyses and experiments to both technical and non-technical audiences.
- Strong computational background (complemented by expertise in statistics, math, and algorithms), a healthy portfolio of Big Data projects, a solid understanding of machine learning algorithms, and a love for finding meaning in imperfect, mixed, varied, and inconsistent data sets.
Python: pandas, numpy, scikit-learn, scipy, statsmodels, matplotlib, seaborn
R: caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, ggplot2
Other: Alteryx, KNIME, and Weka
Databases/ETL/Query: Teradata, SQL Server, Postgres and Hadoop; SQL, Hive
Visualization: Tableau, ggplot2 and RShiny
Prototyping/POC/POV: RShiny, Tableau, and PowerPoint
Confidential, Plano, TX
- Explored truck repair and maintenance data to surface hidden insights for the business team.
- Regularly updated the business team on new insights and patterns in the data to validate the analysis and flag outliers.
- Built predictive models to forecast which trucks were due for retirement.
- Certain stores were granted special privileges relative to the others.
- Evaluated the performance of those stores by comparing them against similar stores across the rest of the retail chain.
- Used clustering to identify those comparable stores.
- Reported the results in Tableau and tracked the percentage change in sales for the test stores.
- Evaluated DataRobot, a data science tool, for automating the machine learning workflow and building predictive models.
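As an illustration of the store-similarity clustering described above, a minimal scikit-learn sketch; the store features, figures, and cluster count here are synthetic assumptions, not from the actual engagement:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical store features: weekly sales, foot traffic, square footage
stores = rng.normal(loc=[50_000, 1_200, 8_000],
                    scale=[5_000, 150, 900], size=(100, 3))

# Standardize so no single feature dominates the Euclidean distance
X = StandardScaler().fit_transform(stores)

# Group stores into segments; test stores are compared against peers
# from the same cluster
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(labels[:10])
```

Standardizing first matters: raw sales dollars would otherwise swamp the foot-traffic and square-footage features in the distance calculation.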
Confidential, San Jose, CA
- Developed Python scripts to automate loading data from Rally into an Oracle database.
- Gathered business requirements and collected the required data.
- Extracted data from Oracle and HDFS.
- Cleaned the data in Python, treating missing values and outliers.
- Performed exploratory data analysis to gather insights from customer transactional data.
- Created Tableau dashboards from the data according to business requirements.
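The cleaning steps above (missing-value treatment, outlier handling) can be sketched in pandas; the column names and values below are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "amount": [120.0, np.nan, 95.0, 5000.0, 110.0, 98.0],
    "region": ["E", "W", None, "E", "W", "E"],
})

# Impute missing values: median for numeric, mode for categorical
df["amount"] = df["amount"].fillna(df["amount"].median())
df["region"] = df["region"].fillna(df["region"].mode()[0])

# Cap outliers beyond 1.5 * IQR (the 5000.0 transaction gets clipped)
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["amount"] = df["amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(df)
```

Median imputation and IQR capping are one common choice; depending on the data, dropping rows or model-based imputation may be preferable.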
Confidential, Bohemia, NY
- Developed classification machine learning models in Python that predicted customers' purchase propensity based on attributes such as demographics (education, income, age, geography), historic purchases, and other related attributes. These models helped predict an existing customer's propensity to purchase an iPhone after a new iPhone launch.
- Developed classification models to predict the likelihood of customer churn based on attributes such as customer size, revenue, industry type, competitor products, and growth rates. The models, deployed in a production environment, helped detect churn in advance and aided sales/marketing teams in planning retention strategies such as price discounts and custom licensing plans.
- Projected customer lifetime values from historic usage and churn rates using survival models. Understanding customer lifetime values helped the business establish strategies to selectively attract customers who tend to be more profitable for Apple, and to set marketing strategies appropriate to each customer's value.
- Used K-means clustering to identify groups of similar stores. These segments were used to select test and control stores for various experiments involving compensation.
- Also used clustering techniques to segment the global iTunes population by content type.
- Improved sales/demand forecast accuracy by 20-25% by implementing advanced forecasting algorithms that detected seasonality and trend and incorporated exogenous covariates. The increased accuracy helped the business with budgeting and with sales and operations planning.
- Implemented market basket algorithms on transactional data to identify products frequently ordered together. Discovering frequent product sets unearthed cross-sell and upsell opportunities and led to better pricing, bundling, and promotion strategies for the sales and marketing teams.
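The market basket approach above can be illustrated with a simple frequent-pair count over transactions; the baskets and the support threshold below are made up for demonstration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction baskets (sets of items bought together)
transactions = [
    {"iPhone", "case", "charger"},
    {"iPhone", "case"},
    {"iPad", "case", "charger"},
    {"iPhone", "charger"},
    {"iPad", "pencil"},
]

# Count every item pair that co-occurs within a basket
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs appearing in at least 2 of the 5 baskets (support >= 0.4)
n = len(transactions)
frequent = {p: c / n for p, c in pair_counts.items() if c >= 2}
print(frequent)
```

Production implementations typically use Apriori or FP-Growth to prune the candidate space, but the support computation is the same idea.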
Confidential, Austin, TX
- Developed a personalized coupon recommender system using recommender algorithms (collaborative filtering, low-rank matrix factorization) in Python that recommended the best offers to a user based on similar user profiles. The recommendations drove better user engagement and helped improve overall user retention rates.
- Developed a lead scoring system by modeling users on company size, industry segment, job title, and geographic location using supervised learning algorithms. Scoring leads increased sales efficiency and effectiveness, improved marketing effectiveness, and tightened marketing and sales alignment.
- Designed the conceptual, logical, and physical data models for the Data Warehouse and MDM hub.
- Used normalization methods up to 3NF and de-normalization techniques for effective performance in OLTP and OLAP systems. Generated DDL scripts using forward engineering techniques to create objects and deploy them to the database.
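The low-rank matrix factorization behind the coupon recommender mentioned earlier can be sketched with a truncated SVD; the user-offer engagement matrix below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical user x offer engagement matrix (rows: users, cols: offers)
R = rng.integers(0, 6, size=(8, 5)).astype(float)

# Truncated SVD gives a rank-k approximation of the matrix; scores for
# offers a user has not seen come from the reconstruction
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend each user's top-scoring offer under the low-rank model
top_offer = R_hat.argmax(axis=1)
print(top_offer)
```

Real systems factorize only the observed entries (e.g. via alternating least squares) rather than a dense matrix, but the rank-k reconstruction conveys the core idea.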
Confidential, Charlotte, NC
- Automated the scraping and cleaning of data from various data sources in R and Python. Developed the bank's loss forecasting process using relevant forecasting and regression algorithms in R.
- The projected losses under stress conditions helped the bank reserve sufficient funds per DFAST policies.
- Developed several interactive Tableau dashboards to visualize 8 billion rows (1.2 TB) of credit data by designing a scalable data cube structure.
- Built credit risk scorecards and marketing response models using SQL and SAS. Translated complex technical analysis into easily digestible reports for the bank's top executives.
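A scorecard of the kind described above can be sketched as a logistic regression whose log-odds are mapped to a points scale (shown here in Python rather than SAS); the features, data, and calibration constants are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
# Hypothetical borrower features: utilization, delinquencies, tenure
X = rng.normal(size=(n, 3))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)
pd_hat = model.predict_proba(X)[:, 1]   # estimated probability of default

# Map log-odds to a conventional points scale (assumed calibration:
# base score 600, 20 points to double the odds)
odds = (1 - pd_hat) / pd_hat
score = 600 + 20 * np.log2(odds)
print(score[:5].round(1))
```

Industry scorecards usually also bin features and apply weight-of-evidence transforms before the regression; this sketch shows only the score mapping.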
Data Modeler/Data Analyst
- Designed scalable processes to collect, manipulate, present, and analyze large datasets in a production-ready environment using Akamai's big data platform.
- Achieved a broad spectrum of end results by finding and interpreting rich data sources, merging data sources, ensuring consistency of data sets, creating visualizations to aid understanding, building mathematical models on the data, and presenting and communicating insights and findings to specialists and scientists on the team.
- Implemented the full lifecycle as a Data Modeler/Data Analyst for data warehouses and data marts with star schemas, snowflake schemas, SCDs, and dimensional modeling in Erwin. Performed data mining using complex SQL queries to discover patterns, and used extensive SQL for data profiling and analysis to guide the data model.
Data Analyst /Data Modeler
- Worked with SMEs and other stakeholders to determine requirements and identify entities and attributes for building conceptual, logical, and physical data models.
- Used star schema methodologies extensively in designing the logical data model into dimensional models.
- Developed dimensional models based on star and snowflake schemas to build the data warehouse.
- Designed context flow diagrams, structure charts, and ER diagrams.