Senior Data Architect Resume

OBJECTIVE:

To work as a data scientist and excel in Analytics and Big Data

EXECUTIVE SUMMARY:

Over 7 years of experience as a Data Scientist working with Big Data.
Over 10 years of experience in implementing data warehouse solutions for extremely large telecom and oil and gas companies. Over 20 years of experience with Databases.
Very good communication and presentation skills, together with overall knowledge of the industry, have helped me deliver many successful presentations and speaking engagements at conferences and trade shows.

SKILLS:

Big Data Technology: Cloudera Hadoop, Hive, Pig, Python, Mahout, MongoDB, Amazon EMR

Statistics: R, Matlab, RapidMiner, Weka, SAS

Data Warehouse & Visualization: Oracle, SQL, BO, QlikView and Tableau

EXPERIENCE DETAILS:

Confidential

Senior Data Architect

Responsibilities:

Responsible for managing the predictive analytics program for Commercial Marketing.
Cross-Sell, Retention and Market Basket Models
Led the creation, deployment and adoption of Cross-Sell, Retention and Market Basket Models across the organization.

Confidential

Sr. Data Scientist

Responsibilities:

Created models to predict electricity consumption patterns and detect usage fraud
Prediction of Substation Usage
Designed a multivariate regression model in R to predict substation usage from historical usage, temperature and humidity.
Prediction of Fraud
Designed a decision tree regression model in Python to identify commercial accounts with a high probability of committing fraud.

Confidential

Sr. Data Scientist

Responsibilities:

Created models to predict customer behavior and detect fraud
Prediction of Abandonment
Designed a model in Python to predict whether a visitor to the web site will ultimately purchase the tax product, based on the visitor's activity in the first session.
The Direct Marketing team is currently using the results of the model to contact visitors who are predicted to abandon.
Also created a Tableau dashboard to predict trends and display metrics related to this model.
Prediction of Fraud
Designed a model in Python to predict fraudulent activities.

Confidential

Techno Analytics Chief Technology Officer and Chief Data Scientist

Responsibilities:

Designed a Naïve Bayes model in Python to predict the Confidential category of spend data.
The model was trained using 5 years of data.
The predicted category was accurate, with high confidence, 92% of the time (vs. 93% for manual classification), and 98% of the time when the second most probable outcome was also considered.
This reduced manual classification effort by 90%. It also helped streamline the variables used for creating rules (from 30 down to 5).
The system also detected incorrect rules and reduced redundant ones.
This system was implemented using Hadoop streaming in a Cloudera cluster.
Migrated an existing rule-based process for categorizing spend data from Oracle to Hadoop/Hive on Cloudera (2 master and 4 data nodes).
The execution time was reduced from 12 hours to 10 minutes.
Implemented the same application using Python and Hadoop streaming.
Designed a Python-based application (using NumPy, NLTK and Matplotlib) to extract the document type (Contract, Change Request, Call Offs, etc.) from 10,000 PDF documents.
The application accurately classified 98% of the documents and also detected that 2% of the original classification was incorrect.
The supplier name was extracted from the agreement using n-grams and mapped to existing suppliers.
A credit reporting agency had an abandonment rate of more than 50% at the point where the user was required to provide a Social Security number (so that the FICO score could be obtained).
Created a model to estimate the FICO score based on revolving credit, number of credit card rejections, annual income, housing loans and auto loans.
The model was developed using R.
A leading oil and gas company did not have a Master Data Management system.
Created a model to map similar GL codes and material codes to a master code. This helped increase the accuracy of the categorized spend data.
Text cleansing and distance algorithms were used. The probable similar master data were selected by clustering similar spend. This was developed using R and Python.
Designed an application to capture public sentiment about key vendors.
The system captured the tweets for the selected supplier from Twitter.
Stop words were removed, and the words were then stemmed and tokenized.
These tweets were then classified as positive or negative based on sentiment using Mahout (Naïve Bayes) on Hadoop.
The data used to train Mahout was first classified by a scoring algorithm in R and then further classified manually (thus reducing the overall manual effort).
The data was stored in MongoDB.
BP is planning to use this system for risk analysis of its key vendors.
This application was hosted on Amazon and used Mahout on Amazon Elastic MapReduce.
Redesigned a data warehouse in Oracle for storing spend data. The analytics reporting was done using BO. The data warehouse is being used by BP to track about $100 billion of its yearly global spend. Managed a team of 40.
Analyzed BP's AMEX spend using Tableau.
This analysis has resulted in significant cost savings for BP.
Reduced the number of redundant suppliers by comparing suppliers using text analytics (string distance).
Suppliers whose names differed by small distances were grouped together. This list was verified using a Google API to get the correct supplier name.
Created an algorithm to select the best N matches from the existing patent records for a proposed patent.
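
The supplier de-duplication described above (grouping supplier names that differ by a small string distance) could be sketched roughly as follows. This is a minimal illustration, not the original implementation: the similarity measure (the standard library's difflib.SequenceMatcher ratio), the 0.85 threshold, the cleansing step, the greedy grouping and the sample names are all assumptions, and the external verification of the final supplier name is not shown.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed cleansing step)."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the normalized names are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_suppliers(names, threshold=0.85):
    """Greedily put each name in the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so this name starts a new one.
            groups.append([name])
    return groups


# Hypothetical supplier names, for illustration only.
suppliers = ["Acme Corp.", "ACME Corp", "Acme Corpp", "Globex Ltd", "Globex Ltd.", "Initech"]
groups = group_suppliers(suppliers)
for group in groups:
    print(group)
```

Each resulting group is a candidate set of duplicate suppliers that would still need review before merging; the threshold trades missed duplicates against false merges and would have to be tuned on real data.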

Chief Architect

Confidential

Responsibilities:

Worked with the users to determine the key RA KPIs for Wireless, Wireline and Long Distance business units.
Performed end-to-end process analysis using COSA’s Risk and Control, covering the Network Objects, Mediation, Billing, Interconnect and Payments.
Implemented the KPIs on a DB2 database. The database size was over 1000 TB.
Designed the physical and logical star schema data warehouse.
Created a Business Objects Universe for the users for ad hoc queries.
Migrated the data structure from DB2 to Oracle.
Designed a BI solution for Premium Content Portal.
Implemented Customer Experience Management process that resulted in added revenue and customer retention.
Created CEM dashboards using Qlikview.
Worked with the users to determine the key RA KPIs for the Prepaid business.
Implemented a “Load on Demand” methodology for accessing detailed CDRs.
Designed the database on Oracle.
Defined Backup and Recovery strategy.
Implemented Fraud Alarms for the operator that enabled them to catch a big fraud operation.
Architected the complete fraud solution.
Implemented a physical design that delivers fraud and RA from the same Oracle data warehouse.
Led the migration of the product from a flat schema to a star schema data warehouse.
Played a key role in designing the company's first fraud solution.
Architected a unified solution for Fraud and RA.
Worked on integration of the company’s CEM product with the RA and FMS product line.
