Machine Learning Engineer, Data Engineer Resume
Sunnyvale, CA
SUMMARY
- 12+ years of experience in Machine Learning, Deep Learning, Reinforcement Learning, Artificial Intelligence, Data Science, and Data Engineering, developing Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, and Data Engineering solutions across various business functions.
- Strong mathematical and statistical knowledge and hands-on experience with traditional machine learning and advanced deep learning algorithms, including K-Nearest Neighbors, Logistic Regression, Linear Regression, Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosted Decision Trees, neural networks, CNN, RNN, autoencoders, deep generative models, GAN, computer vision algorithms, deep reinforcement learning, and dependable and explainable AI solutions.
- Experience with data visualization using tools like ggplot, Matplotlib, Seaborn, and Tableau, including using Tableau to publish and present dashboards and storylines on web and desktop platforms.
- Developed Natural Language Processing solutions for machine translation, language detection, and classification, covering aspects of NLP such as phonology, morphology, document classification, Named Entity Recognition (NER), topic modelling, document summarization, computational linguistics, and advanced and semantic information search, extraction, induction, classification, and exploration.
- Worked on general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, etc.) provided by Transformers (pytorch-transformers/pytorch-pretrained-bert) for Natural Language Understanding (NLU) and Natural Language Generation (NLG).
- Expertise in advanced statistics, mathematical analysis and statistical modeling, time series analysis and forecasting, optimization and simulation, communication, storytelling and visualization, EDA, and data preparation and transformation.
- Extensive experience in the Hadoop ecosystem (HDFS, MapReduce, Spark, Hive, HBase, Sqoop, Flume, ZooKeeper, YARN, Airflow, Oozie, etc.).
- Extensive experience in Big Data batch & streaming processing using Apache Kafka, Spark & Apache NiFi.
- Extensive experience in enterprise search using Elasticsearch and Apache Solr.
- Extensive experience in cloud computing using Microsoft Azure (ADLS, HDInsight, AKS, Log Analytics, etc.), Google Cloud Platform (GCP), and AWS components.
- Extensive experience in leading teams and mentoring junior engineers.
TECHNICAL SKILLS
Programming Languages: Python, Scala, Java
Statistics/Machine Learning
Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau
Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGB, Deep Neural Networks, Bayesian Learning, Heuristics, Neural Nets, Markov Decision Process
Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization
Reinforcement Learning: DRL, Active and passive RL, Deep Q-learning, Gated Recurrent Unit (GRU)
Feature Engineering: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods
Statistical Tests: t-test, Chi-Square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA
Sampling Methods: Bootstrap sampling methods and Stratified sampling
Model Tuning/Selection: Cross Validation, AUC, Precision/Recall, Walk Forward Estimation, AIC/BIC Criteria, Grid Search and Regularization
Time Series: ARIMA, SARIMAX, Holt-Winters, Exponential smoothing, Bayesian structural time series
Deep Learning: CNN, RNN, Model compression, Autoencoder, Neural search, GAN, DBN and many more
Machine Learning / Deep Learning / AI / Natural Language Processing
Python: pandas, numpy, scikit-learn, scipy, statsmodels, matplotlib, PySpark
Spark: MLlib, GraphX
SQL: Subqueries, joins, DDL/DML statements
Deep Learning: CNN, RNN, Model compression, Autoencoder, Neural search, GAN, DBN, computer vision using TensorFlow, PyTorch (Hugging Face), Keras, Tesseract, PyOCR, OpenCV, YOLO
Deep Learning Graph Compilers: Glow, XLA, TVM
NLP: Word embedding (word2vec, doc2vec), topic classification, sentiment analysis, also Image/Video analytics
Multi-Purpose NLP Models: ULMFiT, Transformer, Google’s BERT, Transformer-XL, OpenAI’s GPT-2
Word Embeddings: ELMo, Flair
Other Pretrained Models: StanfordNLP
Big Data technologies: Hadoop Ecosystem (HDFS, MapReduce, Sqoop, Flume, Hive, HBase, ZooKeeper, Oozie and many more), Apache Kafka, Apache Spark, Apache NiFi, Apache Airflow
Search technologies: Elasticsearch, Apache Solr
Visualizations: Tableau, Power BI
Cloud environments: MS Azure, AWS, GCP
DevOps: Jenkins, Docker, Kubernetes, Git, Maven, JIRA, Prometheus, Grafana
Web technologies/Frameworks: Spring, Spring Boot, Web Services, Hibernate, Flask
Databases/Datastore: Oracle, MongoDB, Cassandra, MySQL, Cosmos DB, etc.
PROFESSIONAL EXPERIENCE
Confidential, Sunnyvale, CA
Machine Learning Engineer, Data Engineer
Responsibilities:
- Developed a Data and Machine Learning platform as the central platform for the data and model needs of different departments within Confidential.
- Teams onboarded to the platform can use ready-made data and machine learning pipelines, customize them for their use cases, or create new ones.
- Developed a personalized recommender system using recommender algorithms (collaborative filtering, low rank matrix factorization) that recommended the best service to a user based on similar user profiles.
- The recommendations enabled users to engage better and helped improve overall user retention rates at Confidential.
- Forecasted sales and improved accuracy by 10-20% by implementing advanced forecasting algorithms that were effective in detecting seasonality and trends in the patterns, in addition to incorporating exogenous covariates. The increased accuracy helped the business plan better with respect to budgeting and sales and operations planning.
- Analyzed complex datasets and created models to interpret and predict trends or patterns in the data using time series analysis, forecasting, and regression analysis.
- Segmented customers based on their behavior or specific characteristics like age, region, income, and geographical location, applying clustering algorithms to group customers with similar behavior patterns.
- The segmentation results help to learn the Customer Lifetime Value of every segment, discover high-value and low-value segments, and improve customer service to retain customers.
- Used Principal Component Analysis and t-SNE in feature engineering to analyze high-dimensional data.
Confidential, Pleasanton, CA
Data Engineer, Machine Learning Engineer
Responsibilities:
- Developed a central Data and Machine Learning platform for the data and model needs of different departments within KP.
- Teams onboarded to the platform can use ready-made data and machine learning pipelines, customize them for their use, or create new models.
- Developed ML models to predict usage of high-impact services and to produce time-based, insight-driven predictions for KP services.
- Classified high-impact areas for KP services based on events generated directly by different KP hospitals.
- Analyzed and implemented a few research proof-of-concept models for image classification.
- Developed 11 customer segments using unsupervised learning techniques like K-Means.
- The clusters helped the business simplify complex patterns into a manageable set of 11 patterns, which helped set strategic and tactical objectives pertaining to customer retention, acquisition, and spend.
Confidential, WI
Machine Learning Engineer, Solution Architect
Responsibilities:
- Developed a personalized recommender system using recommender algorithms (collaborative filtering, low rank matrix factorization) that recommended the best candidates to recruiters and the best jobs to candidates.
- Developed NLP-based search services such as conceptual search, contextual search, and image-based search.
- Developed a total of 25 types of text- and image-based search services for the platform.
- Developed insight and sentiment analysis of passive candidates from profiles captured from different social sites (LinkedIn, Twitter, and many more) to predict which candidates can move to the active candidate pool.
Confidential
Java/Scala/Spark Developer
Responsibilities:
- Developed different data pipelines for the seller and subscription portal.
- Developed a search solution in Apache Solr and Elasticsearch for seller and subscription portal services.
Confidential
Java Developer
Responsibilities:
- Developed high-quality Java/J2EE/web services code and unit test cases.
- Developed design and architecture documents.