Data Scientist Resume, Irving, TX - Hire IT People

SUMMARY:

Accomplished, high-performing expert with deep expertise in applying data science, machine learning, deep learning, advanced analytics, business intelligence, data mining, and statistics across industries.
Skilled in the conception and execution of strategic plans and the architecture of associated tactics, including performance benchmarking against key operational targets and goals.
Creative problem solver with a unique mix of technical, business, and research proficiency that lends itself to developing key strategies and world-class solutions with significant impact on revenue and ROI.
Excellent written, verbal, interpersonal communicator and negotiator. Skilled in creating and delivering easily understood PowerPoint presentations of complex concepts to audiences of all levels and sizes.
Highly committed to a personal and management style that embraces continuous learning and improvement. Excellent knowledge of, and experience with, a wide range of software applications and computer operating systems.
True leader and mentor who inspires confidence in employees by empowering them to do their jobs and respecting them for their accomplishments. Fosters an encouraging environment for creative thinking and innovative, real-world solutions to complex business and technical challenges.
Accomplished developer of iPhone and Android mobile applications.
Strong software engineering, problem-solving, and analytical skills.

AREAS OF EXPERTISE:

Machine Learning
Statistics
Multivariate Analysis
Data Mining
Predictive Modeling
Custom Algorithms
Social Media Analysis
Social Network Analysis
Segmentation
Clustering
Probability
Predicting next possible word
Spark SQL
HortonWorks
Auto Encoder
Stochastic Gradient Descent
Sensitivity, Specificity
PCA
LDA
TPR
FPR
PPV
Accuracy
R
Python
Natural Language Processing
Text Processing
Business Intelligence
Data Visualization (Tableau)
ScikitLearn, Pandas, Numpy
Speech Synthesis
ScikitFlow, TensorFlow
Spark MLLib
Image Processing
Audio analysis
Video processing
Text Classification
Hive
Keras
RNN
CNN
DNN
DBN
Regularization
ROC curve
AUC
Precision
Recall
LSI
Artificial Intelligence
Experimental Design
Application Development
Statistical Software (R, Python)
Probabilistic Modeling
Distributed Processing Systems
Reports Generation
Deep Learning
Online Machine Learning
Batch Machine Learning
Text Synthesis
Text Analytics
TensorFlow
Theano
GAN, LSTM, GRU
Overfitting, underfitting
Confusion matrix
Computer Vision
Bias-variance trade-off
Spark SQL
Scala
PySpark
HDFS
Unix

TECHNICAL CAPABILITIES:

Roles and domains: data scientist, advanced analytics, business intelligence, big data, artificial intelligence / machine learning.
Languages and tools: R, Python, Scala, Java, SQL, NoSQL, Knime, Tableau, RapidMiner, SAS, SPSS, RServer, PyCharm, Anaconda, HortonWorks, Spark, PySpark, Spark MLlib, Spark ML, Spark SQL, HDFS, Hive, Theano, Keras, TensorFlow.
Machine learning: support vector machines, neural networks, random forests, genetic algorithms, decision trees, association, clustering, supervised and unsupervised learning, Monte Carlo simulations, kNN, linear regression, logistic regression, dimensionality reduction (factor analysis, principal component analysis), time-series analysis, collaborative filtering.
Neural networks: perceptron, artificial neural networks, feed forward, back propagation, CNN (convolutional neural networks), RNN (recurrent neural networks), deep convolutional networks, DBN (deep belief networks), Hopfield networks, Markov chains, autoencoders, LSTM (long short-term memory), RBM (restricted Boltzmann machines), generative adversarial networks.
Statistics and modeling: classification algorithms, cluster analysis, CART, classification trees, multivariate statistics, experimental design, predictive modeling, regression (logistic, linear, non-linear), sampling techniques, segmentation (behavioral, attitudinal, demographic, psychographic, geographic), data mining, profiling, forecasting, conjoint, correspondence analysis, structural equation modeling, discriminant analysis, multidimensional analysis, Bayesian analytics, hypothesis testing (means difference, proportions difference, ANOVA, chi-square, non-parametric), systems modeling.
Marketing and applied analytics: marketing mix modeling, social network analysis, survey design and analysis, text mining, natural language processing, data visualization, dashboards, customized algorithms, customer lifetime value, Google Analytics, market research, network analysis, page tagging, web analytics, social media analytics, warehouse data mart architecture.

PROFESSIONAL EXPERIENCE:

Data Scientist

Confidential, Irving, TX

Responsibilities:

Built a model to predict a customer's propensity to call the customer service center about billing or service issues, framed as a binary classification (yes or no) problem.
Trained the model on customer billing information and call history.
The model predicts how likely a customer is to call (propensity model).
Developed the base model in Python.
Migrated it to a distributed Spark environment by implementing the model in Scala with Spark MLlib.
Hands-on experience with Spark, Hive, R, Python, and PySpark.
Performed extensive exploratory analysis to study data patterns and identify customer call patterns.
Implemented data cleaning, feature transformation, and feature selection on the data.
Designed multiple experiments to find insights from different data sets, including controlled and observed data, and to select the best model among random forest, support vector machine, and logistic regression classifiers.
Built the model using logistic regression, which outputs a probability indicating the likelihood that a customer will call.
Built a confusion matrix to examine classification metrics using TP, TN, FP, and FN values.
Used stochastic gradient descent (SGD) optimization to improve model performance.
To avoid overfitting, tuned the regularization parameters by analyzing bias and variance errors.
Saved the trained model so that it can be reused later.
The objective is to reduce the number of calls to customer support by addressing customers' concerns before they call.
The model is trained on call conversation summary text from the customer support center.
The model predicts how likely a customer is to call for a specific reason; it is a multi-class text classification problem.
Developed the multi-class text classification model using support vector machines (SVM).
The input to this model is text data, which must be transformed before it is passed to the classifier.
The representatives' notes are cleaned and transformed before the model is fitted.
Applied a dictionary and stop-word remover to the text, then applied CountVectorizer and TF-IDF.
Designed experiments with logistic regression and support vector machines to find the best model.
Applied the SVM model to classify this data; all of this was done in Python as a proof of concept before migrating to the Spark cluster as a Spark Scala model.
Built an application in Scala to apply this model to 1 million representative notes on the Spark cluster.
Built metrics from the confusion matrix to study classification model performance.
Implemented SGD optimization to improve model performance.
Implemented hyperparameter tuning to find parameters that enhance model performance.
Saved the model to avoid retraining each day, since the model runs daily for classification.
Classification results are saved back to Hive tables for later retrieval.
Designed multiple experiments with various data sets and machine learning libraries from Python, R, and Spark MLlib.
Developed models in Python and R as POCs, and worked end to end to promote these POCs to distributed systems using Spark in a Hortonworks Hadoop environment, which lets the models learn from the large amounts of data needed to get better results.
Provided guidance and recommendations about machine learning techniques, software tools, 3rd party data, and infrastructure needs for data management.
Led the team and contributed knowledge that helped it create advanced analytics use-case POCs targeting the complete value chain.
Developed customized visualization solutions for the business team to present the model results.
Work product includes: propensity models for customer call patterns and call purpose, predictive and prescriptive models, sentiment analysis, natural language processing.
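
As an illustration of the propensity model described above, the following is a minimal sketch in plain Python of logistic regression trained with stochastic gradient descent and L2 regularization, followed by confusion-matrix counts. The toy features, labels, learning rate, and regularization strength are assumptions made for this example; the resume does not state the actual features or settings, and the production model ran in Spark MLlib rather than hand-written code like this.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    # Probability that the customer calls, given feature vector x.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_sgd(rows, labels, lr=0.1, l2=0.01, epochs=200, seed=0):
    """Fit logistic regression with per-sample SGD and L2 regularization."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            err = predict_proba(w, b, x) - y  # gradient of log loss w.r.t. the logit
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = customer called)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 0:
            counts["TN"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


# Made-up toy data: [billing disputes, service outages] per customer.
X = [[0, 0], [0, 1], [1, 0], [3, 2], [2, 3], [4, 1], [0, 0], [3, 3]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

w, b = train_sgd(X, y)
probs = [predict_proba(w, b, x) for x in X]
preds = [1 if p >= 0.5 else 0 for p in probs]
counts = confusion_counts(y, preds)
print(counts)
```

Raising `l2` shrinks the weights and trades variance for bias, which is the lever the overfitting bullet refers to; the 0.5 decision threshold is likewise an assumption and can be moved to trade false positives against false negatives.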

Technical Requirements: Hortonworks Hadoop, Spark MLLib, Python libraries such as Numpy, ScikitLearn, Pandas

Tools: Maven, Git Repository, Eclipse, Scala IDE, Anaconda, Jupyter Notebook, PyCharm.
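
The text preparation steps named in the responsibilities above (stop-word removal, count vectors, TF-IDF) can be sketched in plain Python as follows. This is only an illustration: the stop-word list and the sample notes are made up, the IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is one common variant and not necessarily the one used in the project, and the SVM classification step is left out.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller dictionary.
STOP_WORDS = {"the", "a", "an", "to", "for", "is", "was", "and", "of", "about", "on"}


def tokenize(note):
    # Lowercase, replace non-alphanumeric characters with spaces, drop stop words.
    words = "".join(ch.lower() if ch.isalnum() else " " for ch in note).split()
    return [w for w in words if w not in STOP_WORDS]


def count_vectors(notes):
    """Return the sorted vocabulary and one term-count row per note."""
    docs = [Counter(tokenize(n)) for n in notes]
    vocab = sorted({term for d in docs for term in d})
    return vocab, [[d[term] for term in vocab] for d in docs]


def tfidf(counts):
    """Weight raw counts by smoothed inverse document frequency."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


# Made-up representative notes.
notes = [
    "Customer called about a billing error on the invoice",
    "Customer reported an internet outage",
    "Billing dispute for a late fee",
]

vocab, counts = count_vectors(notes)
weights = tfidf(counts)
print(vocab)
```

Terms that appear in fewer notes receive a larger weight, so a word such as "outage" (one note) outweighs "billing" (two notes) at equal counts. The resulting rows are the kind of numeric input a linear classifier such as an SVM expects.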

Data Scientist

Confidential, Irving, TX

Responsibilities:

Developed, applied, and managed advanced analytics to find insights from data, using analytics tools integrated into a newly developed cloud-based platform to drive value for the insurance industry.
Built a recommendation system using machine learning libraries from Python, R, and Spark MLlib.
Provided guidance and recommendations on machine learning techniques, software tools, 3rd party data, and infrastructure needs for data management and analytics on the cloud platform.
Led my team in creating various advanced analytics use cases targeting the complete value chain of client operations and services.
Mentored junior team members in artificial intelligence/machine learning techniques.
Developed customized POCs for sales team presentations to financial institutions and insurance companies (life insurance and annuity, and property and casualty).
Work product includes: recommender system, predictive and prescriptive models, clustering/segmentation, forecasting, prospect qualification and targeting, fraud, churn, sentiment analysis, natural language processing, voice-to-text conversion, social media analytics.

Technical Requirements: Hortonworks Hadoop, Spark MLLib, Spark SQL, PySpark, Python libraries such as Numpy, ScikitLearn, Pandas, R Studio

Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.

Data Scientist

Confidential

Responsibilities:

Built machine learning models to predict customers' propensity to buy and to churn.
Built customer clusters through behavioral, attitudinal, demographic, psychographic, and geographic segmentation.
Built a recommender system to encourage customers to buy products from nearby retail stores.
The recommender system was developed in Spark MLlib using its ALS algorithm package.
Recommendations are sent as push notifications to users' mobile phones in the context of their location.
The recommender system runs on a Spark YARN cluster.
Each iBeacon is configured with a unique UID, frequency, and range information.
A companion mobile app collects location and nearby store information.
The mobile app can send information about nearby shops using the mobile device's GPS.
Through the iBeacon sensor infrastructure (IoT), the app sends a more precise customer location to the server.
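
To illustrate the GPS-based "nearby store" lookup mentioned above, here is a minimal sketch that ranks stores by great-circle distance from the user. The haversine formula, the 0.5 km radius, and the store names and coordinates are all assumptions chosen for the example; the resume does not say how proximity was actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_stores(user, stores, radius_km=0.5):
    """Return (distance_km, name) pairs within radius_km, nearest first."""
    hits = []
    for name, (lat, lon) in stores.items():
        d = haversine_km(user[0], user[1], lat, lon)
        if d <= radius_km:
            hits.append((d, name))
    return sorted(hits)


# Hypothetical store coordinates (made up for this example).
stores = {
    "store_a": (32.8140, -96.9489),
    "store_b": (32.8150, -96.9500),
    "store_c": (32.9000, -97.0400),
}
user = (32.8141, -96.9490)

result = nearby_stores(user, stores)
for distance, name in result:
    print(f"{name}: {distance * 1000:.0f} m")
```

GPS alone gives a coarse position; the iBeacon signal described above would refine it once the user is inside or next to a store.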

Technical Requirements: HortonWorks, Spark MLLib, Python libraries such as Numpy, ScikitLearn, Pandas, R Studio

Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.

Data Scientist

Confidential

Responsibilities:

Built a text analytics system that collects user reviews, comments, and feedback from customers through various e-channels such as internet banking, retail banking, mobile banking, and corporate banking.
Constructed models (churn, propensity to buy, segmentation, cross-sell/up-sell) and performed advanced analytics using machine learning and natural language processing techniques per business specifications.
Generated summary reports of customer reviews for business users to address customer demand for various services.
The system is built using NLTK, the Python natural language library, for text processing.
Data was collected using ETL tools, stored in HDFS, and processed using Hive.
Created a product recommender system to identify optimal products and product groupings so that customers can be offered better services.
Reviewed and optimized customized algorithms that identify the better services in competitive and noncompetitive business environments.
Built a world-class advanced analytics platform and capabilities for enterprise and small business marketing.
Identified and assembled talented junior and senior analytics resources for the Advanced Analytics team.
Led and mentored the Analytics and Data Management teams.
Provided vision and strategy for sustainable corporate growth with data science.
Advised senior management on optimal tactics for deploying prescriptive solutions derived from analytics insights.

Technical Requirements: HortonWorks, Spark MLLib, Python NLTK, ScikitLearn, Pandas, RStudio.

Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.

Software Engineer

Confidential

Responsibilities:

Feedback and reviews are collected on a server, where they are processed to extract insights from the text using Hortonworks Hadoop data tools such as Sqoop and Flume.
Developed this system to classify customer feedback using NLTK, the Python natural language processing library.
Text classification and text summarization were developed using NLTK.
Feature extraction, text classification, and summary generation were all done with NLTK.
Because Spark was not stable until the end of 2014, we stayed with Hortonworks Hadoop MapReduce for large-scale text data processing.
Analyzed which major functionalities customers use by gathering the time users spend on each feature screen.
Used Flurry Analytics to learn where users spend most of their time on the mobile device; other measures of app usage behavior are also available.
Generated consolidated reports for business users to guide further enhancement of the banking system.
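
The screen-time analysis described above amounts to summing time per feature screen and ranking the results. A minimal sketch follows, assuming the usage data has already been exported as (screen, seconds) pairs; the event format, screen names, and values are hypothetical, and no Flurry Analytics API is used here.

```python
from collections import defaultdict


def time_per_screen(events):
    """Sum seconds spent per feature screen, most-used screen first."""
    totals = defaultdict(float)
    for screen, seconds in events:
        totals[screen] += seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical (screen, seconds) session events; names and values are made up.
events = [
    ("transfers", 40.0),
    ("balance", 15.0),
    ("transfers", 25.0),
    ("bill_pay", 30.0),
    ("balance", 10.0),
]

ranking = time_per_screen(events)
for screen, seconds in ranking:
    print(f"{screen}: {seconds:.0f} s")
```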

Technical Requirements: HortonWorks, Spark MLLib, Python NLTK, ScikitLearn, Pandas, RStudio.

Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.

Software Engineer

Confidential

Responsibilities:

Users can sign any PDF document on an iPad in different colors and styles and mail it to clients, with no need for pen and paper.
Users can annotate documents by adding text and images wherever needed.
Users can download a PDF from any server by giving the document's URL and can upload it back to Dropbox or any other server.
Users can view email attachments and rotate the PDF document.
The app has more than 60,000 downloads and is available for iPad.
Made extensive use of the Quartz 2D and Core Graphics frameworks and Core Foundation classes in developing this application.
Used the Dropbox API to download files from and upload files to Dropbox.

Environment: Xcode 3 with iOS 3 and iOS 4.2, and Xcode 4.2 with iOS 5, with PhoneGap

Software Engineer

Confidential

Responsibilities:

HomeBase is a social media application for users who want to use social networking sites to advertise their products.
In HomeBase, users log in once and post messages, images, and videos on the fly, with no repeated login burden.
Users can post images, videos, and status messages to Facebook, Facebook Pages, Twitter, Myspace, Tumblr, LinkedIn, Flickr, and FourSquare, and can post to multiple accounts on different social networks simultaneously.
Users log in only once and can then use the app any number of times. The app is available for both iPad and iPhone.
A special feature, available only in this application, lets users maintain as many accounts as they want across multiple social networks.

Environment: Xcode 3 with iOS 3 and iOS 4.2, and Xcode 4.2 with iOS 5
