
Sr. Cloud/DevOps Engineer Resume


SUMMARY

  • Highly motivated professional with 6+ years of experience as a Data Scientist / Machine Learning Engineer in a variety of industries, including Retail, Finance, Healthcare, and Manufacturing.
  • Experienced in developing data-driven solutions to challenging business problems in Python, automating repetitive tasks.
  • Experienced in using Pandas, NumPy, SciPy, Matplotlib, Bokeh, Seaborn, Scikit-Learn in Python at various stages of developing Machine Learning model and utilized machine learning algorithms such as Linear Regression, Naive Bayes, Random Forests, Decision Trees, K-means, SVM, and K-Nearest Neighbor.
  • Proficient in building scalable Machine Learning solutions in Python, applying sound software-engineering principles to design data-driven systems that automate repetitive tasks and improve overall project efficiency and productivity.
  • Expert in all phases of Data Acquisition, Data Cleaning, Machine Learning model development, Validation, and Visualization, delivering data science solutions to complex business problems in the Retail, Finance, and Healthcare industries.
  • Hands-on experience developing and deploying Machine Learning models on Amazon Web Services (AWS) and Azure to productionize and scale applications.
  • Experienced in managing AWS infrastructure to host models, using the S3, EC2, SageMaker, Amazon Machine Learning, and RDS services; well versed in S3 and RDS for storage and SageMaker for deploying code.
  • Hands-on experience with Machine Learning techniques including Classification, Clustering, Regression, Decision Trees, Random Forests, and Artificial Neural Networks, and their applications in the Retail, Finance, Manufacturing, and Healthcare industries.
  • Extensive experience transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data in AWS environments.
  • Experienced in generating actionable insights from data by using advanced analytics and data-mining techniques.
  • Extensive experience in implementing the imbalanced class methods like Random Oversampling, Random Undersampling, and Synthetic Minority Over-sampling Technique (SMOTE) to the imbalanced datasets.
  • Proficient in advanced analytics concepts and Unsupervised Machine Learning algorithms such as K-Means, Hierarchical Clustering (Agglomerative, Divisive), and Recommender Systems.
  • Good knowledge of applying feature engineering techniques to various business problems (One-Hot Encoding, Imputation, Scaling, Log Transformation, and handling outliers and missing values).
  • Extensive experience in implementing cross-validation techniques of Machine Learning models (K-Fold Cross Validation, Leave P-Out Cross Validation, Stratified K-Fold Cross Validation, etc.)
  • Experience in tuning algorithms using methods such as Grid Search, Randomized Search, and K-Fold Cross Validation.
  • Hands-on experience implementing machine learning models in R using caret, dplyr, mlbench, e1071, and randomForest.
  • Proficient in using data visualization tools such as Tableau, Power BI, Matplotlib, Seaborn, and ggplot2 to communicate analysis results.
  • Strong communication and writing skills; proficient in articulating actionable insights to clients and stakeholders from both technical and non-technical backgrounds. Extensive experience collaborating with and presenting results to teams and leaders across a company.
  • Excellent understanding of conceptual foundations and practical hands-on projects related to Supervised Machine Learning (Linear and Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), Artificial Neural Networks (ANN)), Unsupervised Machine Learning (Clustering, Dimensionality Reduction (PCA), Recommender Systems), Probability & Statistics, experiment analysis, confidence intervals, A/B testing, Algorithms, and Data Structures.
  • Strong background in analyzing the data by performing Hive queries (HiveQL), Spark SQL and PySpark. Experience in using Sqoop for importing and exporting data from RDBMS to HDFS and Hive.
  • Proficient in developing ETL applications on large volumes of data using different tools: MapReduce, Spark-Scala, PySpark, Azure Databricks, and Spark SQL.
  • Extensive hands-on experience with distributed computing architectures such as AWS products (e.g., EC2, Redshift, and Elasticsearch), Hadoop, Python, and Spark, and effective use of MapReduce, Hive, SQL, and PySpark to solve big data problems.
  • Demonstrated ability to work with and adapt to Big Data tools such as Azure Databricks, HDFS, MapReduce, Hive, Sqoop, MLlib, Solr, YARN, Zookeeper, Oozie, Spark, etc.
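The hyperparameter-tuning workflow mentioned above (Grid Search over Stratified K-Fold splits) can be sketched in scikit-learn; the dataset and parameter grid below are synthetic placeholders, not a real project's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced binary classification problem (placeholder data).
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.8, 0.2], random_state=0)

# Stratified folds preserve the class ratio in each split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Exhaustive search over a small, illustrative parameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=cv,
    scoring="f1",  # F1 is a reasonable metric for imbalanced classes
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same `cv` object can be reused with `RandomizedSearchCV` when the grid grows too large to search exhaustively.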

TECHNICAL SKILLS

Machine Learning Regression Algorithms: Simple Linear Regression, Ridge Regression, Multiple Linear Regression, Lasso Regression, KNN Regression, Random Forest Regressor, Partial Least Square Regression, Principal Component Regression, Support Vector Machine Regressor, Decision Tree Regressor, Extreme Gradient Boosting Regressor, etc.

Machine Learning Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.

Machine Learning Clustering Algorithms: K-Means, EM, Agglomerative, Affinity, Hierarchical, DBSCAN, Spectral Clustering.

Recommendation Engine Algorithms: Apriori, FP Growth, Collaborative Filtering, Item Based Filtering, User Based Filtering, Content Based Filtering.

Ensemble and Stacking: Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, AutoML - Scikit-Learn, MLjar, etc.

Statistical Methods / Techniques: Null Hypothesis, Sampling, Resampling methods, Hypothesis Testing, Confidence Interval, P-value, Critical value, Confusion Matrix, Z-Test, T-Test, ANOVA, Chi-Square Test, VIF, Correlation, Feature Engineering / Feature Selection techniques, etc.

Programming / Query Languages: Python Programming (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), SQL, NoSQL, PySpark, PySpark SQL, Jupyter Notebook, R Programming (Caret, Glmnet, XGBoost, rpart, ggplot2, sqldf), RStudio.

Big Data Tools: Databricks, Hadoop Distributed File System (HDFS), Sqoop, MapReduce, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Zookeeper, etc.

Cloud / Other Tools: AWS, Azure Databricks, Azure Data Explorer, Salesforce, GCP, Google Cloud Shell, Linux, PuTTY, Bash Shell, Unix, etc.

Visualization: Tableau, Power BI, Matplotlib, Seaborn, Bokeh

PROFESSIONAL EXPERIENCE

Confidential, Boston, MA

Sr. Cloud/DevOps Engineer

Responsibilities:

  • Collaborated with Apple Inc. and NIEHS on the Apple Women’s Health Study, which aims to build a data-driven understanding of the impact of lifestyle habits and menstrual cycle irregularities on women’s health.
  • Used PySpark for ETL, visualization, and machine learning to understand how demographics and lifestyle factors affect menstrual cycles and gynecologic conditions such as infertility, menopause, and PCOS.
  • Managed a team of 2 analysts to build a solution to analyze menstrual cycle tracking data from iPhone and/or Apple Watch, along with participants’ survey responses.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-Learn in Python to develop machine learning models, applying algorithms such as Linear Regression, Multivariate Regression, Naive Bayes, Support Vector Machine, Random Forests, K-Means, and KNN to predict menstrual cycle length.
  • Developed a machine learning model to predict participant churn rate and identify factors contributing to menstrual cycle length by performing Exploratory Data Analysis and statistical tests on participant data in Python and AWS.
  • Performed Univariate Analysis (box plots, violin plots, histograms) and Multivariate Analysis (scatterplots, heatmaps, pairplots) to understand how the data is distributed and how features are correlated.
  • Performed Feature Engineering techniques like One-Hot Encoding, Feature Scaling, Normalization on the study data for balancing the bias-variance trade-off.
  • Implemented the Boruta Python package for feature ranking, removing features that contributed little to the model and retaining the most predictive features.
  • Applied resampling techniques such as Random Oversampling, Random Undersampling, and SMOTE in Python to imbalanced data during data preparation to ensure data quality for downstream analysis.
  • Performed Feature Scaling to normalize the data in range using Min-Max Scaler and Normalizer in Python.
  • Successfully dealt with the bias-variance trade-off by reducing the Misclassification Rate and False Negative Rate after implementing Imbalanced Class techniques.
  • Measured Machine Learning model performance using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC).
  • Evaluated tracking metrics for ML models (Accuracy, Recall, F1-Score, and Precision) and improved overall accuracy from 73% to 96%, helping classify participant data more accurately and improving study quality.
  • Developed participant segmentation using Exploratory Data Analysis, Statistical Analysis, and RFM Clustering in Python on participants’ activity history, grouping participants with similar behavior, needs, and economic value.
  • Built a recommendation engine using KNN, Apriori, and Collaborative Filtering algorithms to target the right participant at the right time, improving study recruitment by 56%.
  • Maintained the codebase in GitHub throughout the project development phase.
  • Mentored junior Data Scientists, guiding them in creating reproducible machine learning models on participant data in the AWS cloud.
  • Managed a team of 2 junior Data Scientists, evaluated their work against statistical and management criteria, guided them toward solutions to complex problems, and led knowledge-transfer sessions.
  • Communicated analysis results to decision makers, presenting actionable insights through visualization charts and dashboards in Tableau.
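The RFM segmentation step described above can be sketched as follows; the activity log, column names, and cluster count are invented for illustration, not the study's actual schema:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical participant activity log (placeholder values).
log = pd.DataFrame({
    "participant": ["a", "a", "b", "b", "b", "c"],
    "day": pd.to_datetime(["2021-01-01", "2021-01-20",
                           "2021-01-05", "2021-01-06", "2021-01-25",
                           "2021-01-02"]),
    "entries": [3, 1, 5, 2, 4, 1],
})

now = log["day"].max()
rfm = log.groupby("participant").agg(
    recency=("day", lambda d: (now - d.max()).days),  # days since last activity
    frequency=("day", "count"),                       # number of logged days
    monetary=("entries", "sum"),                      # total engagement volume
)

# Min-Max scaling puts all three RFM features on [0, 1] before clustering.
scaled = MinMaxScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(rfm)
```

In practice the cluster count would be chosen with an elbow or silhouette analysis rather than fixed at 2.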

Confidential, Boston, MA

Sr. Cloud/DevOps Engineer

Responsibilities:

  • Automated survey analysis by building a tool that performs Key Driver Analysis and Respondent Clustering, using R, Alteryx, and Tableau; reduced time spent on analysis process by 70%.
  • Reduced call center operations cost by 20% for a pharmaceutical giant by performing High-Frequency Caller Analysis.
  • Automated categorization of HR job titles using Support Vector Machines and Natural Language Processing, and created a user-friendly Python Flask application to improve ease of use.
  • Analyzed culture of business sectors of a healthcare giant by building and extracting survey insights from Tableau dashboards.
  • Worked closely with marketing team to deliver actionable insights from huge volume of data coming from different survey campaigns and customer interaction forum data by using Python, R, and Spark SQL.
  • Created SQL, Spark SQL tables and loaded the structured data after performing ETL process and wrote SQL queries to further analyze the product data to identify issues and customer behavioral patterns.
  • Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merged data from SQL databases (MySQL) and integrated it with Python for downstream analysis.
  • Performed the Exploratory Data Analysis and Descriptive Data Analysis in Python using Pandas, Matplotlib, Seaborn, SciPy, NumPy, Azure Data Explorer and Stats Models.
  • Implemented Univariate, Bivariate, and Multivariate Analysis on the cleaned data for getting actionable insights on the 500-product sales data by using visualization techniques in Matplotlib, Seaborn, Bokeh, and created reports in Power BI.
  • Performed Feature Engineering in Python (Pandas, NumPy, Seaborn, Matplotlib) on the products and customers purchase data for creating meaningful new features for making better prediction results.
  • Implemented Big Data Analytics and Advanced Data Science techniques to identify trends, patterns, and discrepancies on petabytes of data by using Hadoop, Python, PySpark, Spark SQL, and MapReduce.
  • Developed models by using distributed framework in Python, Spark (PySpark, Spark SQL, Spark MLlib, and Spark ML) for faster processing and computation power.
  • Developed Machine Learning algorithms such as Logistic Regression, SVM, Decision Tree, Random Forest, and XGBoost to predict the Remaining Useful Life (RUL) of various products.
  • Created visualizations to understand the engine degradation simulation dataset, and performed data cleaning and pre-processing to handle missing values and outliers and apply data transformations.
  • Developed complex PySpark scripts to efficiently extract large volumes of smartphone data from AWS S3.
  • Wrote production customized Machine Learning Regression models and Ensembled Regression models using Python and PySpark to predict the activity of the participants of different cohorts.
  • Performed data manipulation of smartphone data using PySpark and Spark SQL to build Tableau dashboards to analyze enrollment numbers, activity status and amount of data collected across different health studies.
  • Processed smartphone data features such as accelerometer, gyroscope, GPS information to understand change in activity patterns and isolation practices across different health cohorts during COVID-19 lockdown.
  • Performed data preprocessing like cleaning (for outlier, missing values analysis, imputation, etc.) and Data Visualization (Scatter Plots, Box Plots, Histograms, etc.) using Matplotlib, Seaborn, Bokeh libraries in Python.
  • Performed day-to-day Git and GitHub support for version control across projects, and designed and created Git repositories for managing production code.
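The SVM-based job-title categorization described above can be sketched with TF-IDF features feeding a linear SVM; the titles, labels, and categories below are made-up placeholders, not the client's data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled job titles (placeholder training set).
titles = ["senior software engineer", "software developer",
          "hr business partner", "recruiting coordinator",
          "staff engineer", "talent acquisition lead"]
labels = ["engineering", "engineering", "hr", "hr", "engineering", "hr"]

# Pipeline: unigram/bigram TF-IDF vectorization -> linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(titles, labels)

print(clf.predict(["principal software architect", "hr generalist"]))
```

The fitted pipeline can then sit behind a Flask endpoint that accepts a raw title string and returns the predicted category.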

Confidential

Sr. Cloud/DevOps Engineer

Responsibilities:

  • Developed muHPC, a Big Data Machine Learning library consisting of Linear Regression, Logistic Regression, K-Means, KNN, and Random Forests, using Python and Spark SQL.
  • Received a ‘Spot’ award for exceptional contribution in building a Python package to run algorithms in the muHPC product.
  • Created muRx, a data modeling tool built using Python that performs all-round data analysis from data cleaning to data modeling in order to make informed business decisions for 3 healthcare clients.
  • Performed Exploratory Data Analysis on features with the help of subject matter experts to identify multicollinearity in healthcare data across various use cases; removed collinear features, yielding unique records and clean data for downstream analysis.
  • Created various machine learning and statistical models such as Decision Trees, Regression models, Random Forest, Neural Networks, and SVM to identify patients’ diseases using the Scikit-Learn package in Python.
  • Implemented Dimensionality Reduction (PCA) to convert the high-dimensional feature space to a lower dimension and reduce time and space complexity.
  • Performed Feature Selection to retain features useful for making predictions and dropped features that caused model overfitting or underfitting.
  • Evaluated the model with various performance metrics like Confusion matrix, Accuracy, Precision, Recall, Sensitivity, and Specificity.
  • Retrained the machine learning model with optimal parameters by implementing Hyperparameter tuning with Grid Search and Random Search.
  • Implemented Spark/PySpark, Python, R for potential healthcare project in the Hadoop/Hive environment with Linux/Windows for big data resources. Used machine learning clustering technique K-Means to identify outliers and to classify unlabeled data.
  • Communicated and coordinated with other departments to collect business requirements; analyzed and created an action plan for the project by translating the business problem into an analytical solution.
  • Created reports and produced rich data visualizations to model data into human-readable form with the Tableau, Power BI, Matplotlib, and Seaborn to show client how prediction can help the business decisions.
  • Performed statistical analyses and operations research modeling to optimize lab workflow (e.g., finding the optimal order in which to process work to maximize delivery success).
  • Collaborated with healthcare professionals, clinicians, and SMEs, communicating patterns and findings in healthcare data to inform decisions that improve product and service quality for the company’s business success.
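The PCA step described above can be illustrated on synthetic data with one deliberately collinear feature; the shapes and 95% variance threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Make feature 3 nearly collinear with feature 0 (simulated redundancy).
X[:, 3] = X[:, 0] * 2 + rng.normal(scale=0.05, size=200)

# Standardize first so no feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

# n_components=0.95 keeps the smallest number of components that
# together explain at least 95% of the variance.
pca = PCA(n_components=0.95)
X_low = pca.fit_transform(X_scaled)
print(X_low.shape, pca.explained_variance_ratio_.round(2))
```

Because one feature is redundant, PCA can drop at least one dimension while still meeting the 95% variance target.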

Confidential

Sr. Cloud/DevOps Engineer

Responsibilities:

  • Performed Exploratory Data Analysis and Statistical Analysis for getting insights from the data by using Python libraries Pandas, Matplotlib, Seaborn and visualized the results in Tableau dashboard.
  • Created an end-to-end data pipeline and ETL process to facilitate dataflow for analysis using Python, SQL, and R.
  • Implemented the entire data science flow including data collection, data cleaning, data exploration analysis, data visualization, modeling, data validation, and data quality.
  • Created ETL processes using Python, SQL, and Power BI for reporting and analysis and defined data requirements and report layouts for weekly stakeholder review meeting.
  • Integrated SQL with Python Jupyter Notebook and performed Exploratory Data Analysis and Data Visualization by using Pandas, Seaborn, Azure Data Explorer and Matplotlib.
  • Created user-defined functions in Python to automate repetitive tasks and increase the efficiency of data pipeline development.
  • Assisted teams and provided insights on operations and sales data through ad-hoc analysis and reporting.
  • Created and updated SQL tables, database, stored procedures, and queries to modify and/or create reports for respective business units.
  • Performed data visualization and designed dashboards in Tableau, generating complex reports including charts, summaries, and graphs to communicate findings to the team and stakeholders.
  • Collaborated with stakeholders and clients and communicated analysis results with the help of data visualization tools (Power BI, Tableau) and PowerPoint presentations.
  • Provided recommendations and actionable insights from data to support key business decisions on revenue growth and market expansion for promotions.
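A toy example of the kind of reusable, user-defined cleaning function described above; the schema and values are invented for illustration:

```python
import pandas as pd

# Hypothetical raw sales extract with typical data-quality issues.
raw = pd.DataFrame({
    "region": [" East", "west ", "East", None],
    "units":  ["10", "7", "n/a", "3"],
})

def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Reusable cleaning step: trim/normalize text, coerce numerics, drop bad rows."""
    out = df.copy()
    out["region"] = out["region"].str.strip().str.title()
    out["units"] = pd.to_numeric(out["units"], errors="coerce")  # "n/a" -> NaN
    return out.dropna()

# `pipe` lets several such functions chain into one readable pipeline.
tidy = raw.pipe(clean_sales)
print(tidy)
```

Packaging each step as a function like this makes the pipeline testable and reusable across weekly reporting runs.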
