Data Scientist Resume
Irving, TX
SUMMARY:
- Accomplished, high-performing expert with deep expertise in applying data science, machine learning, deep learning, advanced analytics, business intelligence, data mining, and statistics across industries
- Skilled in the conception and execution of strategic plans, and the architecture of associated tactics including performance benchmarking against key operational targets/goals
- Creative problem solver with a unique mix of technical, business, and research proficiency that lends itself to developing key strategies and world-class solutions with significant impact on revenue and ROI.
- Excellent written, verbal, and interpersonal communicator and negotiator. Skilled in creating and delivering easily understood PowerPoint presentations of complex concepts to audiences of all levels and sizes.
- Highly committed to a personal and management style embracing continuous learning and improvement. Excellent knowledge of, and experience with, a wide range of software applications and computer operating systems.
- True leader and mentor who inspires confidence in employees by empowering them to do their jobs and respecting them for their accomplishments. Fosters an encouraging environment for creative thinking and innovative, real-world solutions to complex business and technical challenges.
- Accomplished mobile app developer for iPhone and Android.
- Strong software engineering, problem-solving, and analytical skills.
AREAS OF EXPERTISE:
- Machine Learning
- Statistics
- Multivariate Analysis
- Data Mining
- Predictive Modeling
- Custom Algorithms
- Social Media Analysis
- Social Network Analysis
- Segmentation
- Clustering
- Probability
- Next-Word Prediction
- Spark SQL
- HortonWorks
- Auto Encoder
- Stochastic Gradient Descent
- Sensitivity, Specificity
- PCA
- LDA
- TPR
- FPR
- PPV
- Accuracy
- R
- Python
- Natural Language Processing
- Text Processing
- Business Intelligence
- Data Visualization (Tableau)
- scikit-learn, Pandas, NumPy
- Speech Synthesis
- ScikitFlow, TensorFlow
- Spark MLLib
- Image Processing
- Audio analysis
- Video processing
- Text Classification
- Hive
- Keras
- RNN
- CNN
- DNN
- DBN
- Regularization
- ROC curve
- AUC
- Precision
- Recall
- LSI
- Artificial Intelligence
- Experimental Design
- Application Development
- Statistical Software (R, Python)
- Probabilistic Modeling
- Distributed Processing Systems
- Reports Generation
- Deep Learning
- Online Machine Learning
- Batch Machine Learning
- Text Synthesis
- Text Analytics
- Theano
- GAN, LSTM, GRU
- Overfitting, underfitting
- Confusion matrix
- Computer Vision
- Bias-variance trade-off
- Scala
- PySpark
- HDFS
- Unix
TECHNICAL CAPABILITIES:
Data science, advanced analytics, business intelligence, big data, R, Python, SQL, KNIME, Tableau, RapidMiner, SAS, SPSS, Hortonworks, Spark, R Server, NoSQL.
Artificial intelligence / machine learning: support vector machines, neural networks, random forests, genetic algorithms, decision trees, association, clustering, supervised and unsupervised learning, Monte Carlo simulations, kNN, linear regression, logistic regression, dimensionality reduction (factor analysis, principal component analysis), time-series analysis, collaborative filtering.
Neural networks: perceptron, artificial neural networks, feed-forward networks, backpropagation, CNN (convolutional neural networks), RNN (recurrent neural networks), deep convolutional networks, DBN (deep belief networks), Hopfield networks, autoencoders, LSTM (long short-term memory), RBM (restricted Boltzmann machines), generative adversarial networks, Markov chains.
Statistics and modeling: classification algorithms, cluster analysis, CART, classification trees, multivariate statistics, experimental design, predictive modeling, regression (logistic, linear, non-linear), sampling techniques, segmentation (behavioral, attitudinal, demographic, psychographic, geographic), data mining, profiling, forecasting, conjoint analysis, correspondence analysis, structural equation modeling, discriminant analysis, multidimensional analysis, marketing mix modeling, Bayesian analytics, hypothesis testing (means difference, proportions difference, ANOVA, chi-square, non-parametric).
Analytics applications: social network analysis, survey design and analysis, systems modeling, text mining, natural language processing, data visualization, dashboards, customized algorithms, customer lifetime value, Google Analytics, market research, network analysis, page tagging, web analytics, social media analytics, data warehouse / data mart architecture.
Languages and tools: Scala, Java, PyCharm, Anaconda, PySpark, Spark MLlib, Spark ML, Spark SQL, HDFS, Hive, Theano, Keras, TensorFlow.
PROFESSIONAL EXPERIENCE:
Data Scientist
Confidential, Irving, TX
Responsibilities:
- Built a propensity model that predicts how likely a customer is to call the customer service center about billing or service issues, framed as a binary (yes/no) classification problem.
- Trained the model on customer billing information and call history.
- Developed the base model in Python, then migrated it to the Spark distributed environment by reimplementing it in Scala with Spark MLlib.
- Hands-on experience with Spark, Hive, R, Python, and PySpark.
- Performed extensive exploratory analysis of the data to identify customer call patterns.
- Implemented data cleaning, feature transformation, and feature selection.
- Designed multiple experiments on controlled and observed data sets, comparing random forest, support vector machine, and logistic regression classifiers to find the best model.
- Built the model with logistic regression, which outputs a probability representing the likelihood that a customer will call.
- Built a confusion matrix to review the classification metrics from the TP, TN, FP, and FN counts.
- Used stochastic gradient descent optimization to improve model performance.
- Optimized regularization parameters by analyzing bias and variance errors to avoid overfitting.
- Saved the trained model for later reuse.
- Objective: reduce the number of calls to customer support by addressing customers' concerns before they call.
- Built a model that predicts how likely a customer is to call for a specific reason, a multi-class text classification problem, trained on call conversation summary text from the customer support center.
- Developed the multi-class text classification model using support vector machines.
- Cleaned and transformed the representatives' notes before fitting, since the text input must be transformed before it is passed to the classification model.
- Applied a dictionary and a stop-word remover to the text, followed by CountVectorizer and TF-IDF.
- Designed experiments with logistic regression and support vector machines to find the best model.
- Applied the SVM model in Python as a proof of concept before migrating it to the Spark cluster as a Spark Scala model.
- Built a Scala application to apply the model to 1 million representative notes on the Spark cluster.
- Built metrics from the confusion matrix to study classification model performance.
- Implemented SGD optimization to improve model performance.
- Implemented hyperparameter tuning to find parameters that enhance model performance.
- Saved the trained model to avoid daily retraining, since the model runs daily for classification.
- Saved classification results back to Hive tables for later retrieval.
- Designed multiple experiments with various data sets and machine learning libraries from Python, R, and Spark MLlib.
- Developed models in Python and R as POCs and worked end to end to promote them to the distributed Spark environment on Hortonworks Hadoop, which lets the models learn from the large volumes of data needed for better results.
- Provided guidance and recommendations about machine learning techniques, software tools, 3rd party data, and infrastructure needs for data management.
- Led the team and contributed knowledge that helped it create advanced analytics use-case POCs targeting the complete value chain.
- Developed customized visualization solutions for the business team to present model results.
- Work product includes: propensity models for customer call patterns and call purpose, predictive and prescriptive models, sentiment analysis, natural language processing.
Technical Requirements: Hortonworks Hadoop, Spark MLlib, Python libraries such as NumPy, scikit-learn, Pandas
Tools: Maven, Git Repository, Eclipse, Scala IDE, Anaconda, Jupyter Notebook, PyCharm.
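Illustrative sketch of the propensity-model workflow described in this role: logistic regression fitted with stochastic gradient descent and an L2 regularization term, then evaluated with a confusion matrix. This is a minimal standard-library Python example, not the production code. The two features, the toy data, the hyperparameter values, and all function names are assumptions chosen for illustration; the actual models were built with Python libraries and Spark MLlib.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row):
    """Probability that the customer calls (class 1)."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_sgd(rows, labels, lr=0.1, l2=0.01, epochs=200, seed=0):
    """Fit logistic regression with per-example SGD and an L2 penalty."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = predict_proba(weights, bias, rows[i]) - labels[i]
            # Log-loss gradient plus the L2 term; the bias is not regularized.
            weights = [w - lr * (err * x + l2 * w)
                       for w, x in zip(weights, rows[i])]
            bias -= lr * err
    return weights, bias


def confusion_matrix(labels, predictions):
    """Count TP, TN, FP, FN for binary labels (1 = called, 0 = did not)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y, p in zip(labels, predictions):
        if y == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def metrics(cm):
    """Derive accuracy, sensitivity, specificity, and precision from counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    total = sum(cm.values())
    return {
        "accuracy": ratio(cm["TP"] + cm["TN"], total),
        "sensitivity": ratio(cm["TP"], cm["TP"] + cm["FN"]),
        "specificity": ratio(cm["TN"], cm["TN"] + cm["FP"]),
        "precision": ratio(cm["TP"], cm["TP"] + cm["FP"]),
    }


# Hypothetical toy data: [scaled bill change, prior calls]; 1 = customer called.
rows = [[0.1, 0], [0.2, 0], [0.0, 1], [0.3, 0],
        [0.9, 2], [0.8, 3], [1.0, 1], [0.7, 2]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_sgd(rows, labels)
predictions = [1 if predict_proba(weights, bias, r) >= 0.5 else 0 for r in rows]
cm = confusion_matrix(labels, predictions)
print(cm)
print(metrics(cm))
```

Raising the `l2` value shrinks the weights and trades variance for bias, which is the regularization tuning step mentioned above; the 0.5 decision threshold is likewise an assumption and can be moved to favor sensitivity or specificity.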
Data Scientist
Confidential, Irving, TX
Responsibilities:
- Developed, applied, and managed advanced analytics to find insights from data, using analytics tools integrated into a newly developed cloud-based platform to drive value for the insurance industry.
- Built a recommendation system using machine learning libraries from Python, R, and Spark MLlib.
- Provided guidance and recommendations on machine learning techniques, software tools, 3rd party data, and infrastructure needs for data management and analytics on the cloud platform.
- Led the team in creating advanced analytics use cases targeting the complete value chain of client operations and services.
- Mentored junior team members in artificial intelligence/machine learning techniques.
- Developed customized POCs for sales team presentations to financial institutions and insurance companies (life insurance and annuity, and property and casualty).
- Work product includes: recommender system, predictive and prescriptive models, clustering/segmentation, forecasting, prospect qualification and targeting, fraud, churn, sentiment analysis, natural language processing, voice to text conversion, social media analytics.
Technical Requirements: Hortonworks Hadoop, Spark MLlib, Spark SQL, PySpark, Python libraries such as NumPy, scikit-learn, Pandas, RStudio
Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.
Data Scientist
Confidential
Responsibilities:
- Built machine learning models to predict customers' propensity to buy and to churn.
- Built customer clusters by segmenting on behavioral, attitudinal, demographic, psychographic, and geographic attributes.
- Built a recommender system to encourage customers to buy products from nearby retail stores.
- Developed the recommender system in Spark MLlib using the ALS algorithm package.
- Sent recommendations as push notifications to users' mobile phones based on their location.
- Implemented the recommender system on a Spark YARN cluster.
- Configured each iBeacon with a unique UID, frequency, and range information.
- The accompanying mobile app collects the user's location and nearby store information.
- The app sends information about nearby shops using the mobile device's GPS.
- Through the iBeacon sensor infrastructure (IoT), the app sends a more precise customer location to the server.
Technical Requirements: Hortonworks, Spark MLlib, Python libraries such as NumPy, scikit-learn, Pandas, RStudio
Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.
Data Scientist
Confidential
Responsibilities:
- Built a text analytics system that collects user reviews, comments, and feedback from customers through eChannels such as internet banking, retail banking, mobile banking, and corporate banking.
- Constructed models (churn, propensity to buy, segmentation, cross-sell/up-sell) and performed advanced analytics using machine learning and natural language processing techniques per business specifications.
- Generated summary reports of customer reviews for business users to address customer demands for various services.
- Built the system using NLTK, the Python natural language library, for text processing.
- Collected data using ETL tools, stored it in HDFS, and processed it using Hive.
- Created a product recommender system to identify optimal products and product groupings for customers to offer better services.
- Reviewed and optimized customized algorithms that identify the better services in competitive and noncompetitive business environments.
- Built a world-class advanced analytics platform and capabilities for enterprise and small business marketing.
- Identified and assembled talented junior and senior analytics resources for the Advanced Analytics team.
- Led and mentored the Analytics and Data Management teams.
- Provided vision and strategy for sustainable corporate growth with data science.
- Advised senior management on optimal tactics for deploying prescriptive solutions derived from analytics insights.
Technical Requirements: Hortonworks, Spark MLlib, Python NLTK, scikit-learn, Pandas, RStudio.
Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.
Software Engineer
Confidential
Responsibilities:
- Collected customer feedback and reviews on the server and processed the text for insights using Hortonworks Hadoop data tools such as Sqoop and Flume.
- Developed the system to classify customer feedback using NLTK, the Python natural language processing library.
- Performed feature extraction, text classification, and text summarization entirely with NLTK.
- Because Spark was not stable until the end of 2014, stayed with Hortonworks Hadoop MapReduce for large-scale text data processing.
- Analyzed which major functionalities customers use by gathering the time users spent on each feature screen.
- Used Flurry Analytics to see where on the mobile device users spend most of their time; other measures are also available for assessing app usage behavior.
- Generated consolidated reports for business users to guide further enhancements to the banking system.
Technical Requirements: HortonWorks, Spark MLLib, Python NLTK, ScikitLearn, Pandas, RStudio.
Tools: Maven, Git Repository, Eclipse, Anaconda, Jupyter Notebook.
Software Engineer
Confidential
Responsibilities:
- Users can sign any PDF document on the iPad in different colors and styles and email it to clients, eliminating pen and paper.
- Users can annotate documents by adding text and images wherever needed.
- Users can download a PDF from any server by entering the document URL and upload it back to Dropbox or any other server.
- Users can view email attachments and rotate PDF documents.
- The app has more than 60,000 downloads and is available for iPad.
- Made extensive use of the Quartz 2D and Core Graphics frameworks and Core Foundation classes during development.
- Used the Dropbox API to download files from and upload files to Dropbox.
Environment: Xcode 3 with iOS 3 and iOS 4.2, and Xcode 4.2 with iOS 5, with PhoneGap
Software Engineer
Confidential
Responsibilities:
- HomeBase is a social media application for users who want to advertise their products on social networking sites.
- Users log in once and can then post messages, images, and videos on the fly as many times as they like, with no repeated logins.
- Users can post images, videos, and status messages to Facebook, Facebook Pages, Twitter, Myspace, Tumblr, LinkedIn, Flickr, and Foursquare, including to multiple accounts on different social networks simultaneously.
- The app is available for both iPad and iPhone.
- A distinguishing feature is that users can maintain as many accounts as they want across multiple social networks; this feature is available only in this application.
Environment: Xcode 3 with iOS 3 and iOS 4.2, and Xcode 4.2 with iOS 5