
Data Scientist / Machine Learning Engineer Resume


Bloomfield, CT

SUMMARY

  • Data Scientist / Machine Learning Engineer with around 6 years of experience in handling Structured and Unstructured data, writing advanced SQL queries, Data Wrangling, Data Visualization, Data Acquisition, Predictive modeling, Probabilistic Graphical Models, Inferential statistics, Data Validation.
  • Experience in building robust Machine Learning, Deep Learning and Natural Language Processing Models.
  • Expertise in Statistical analysis, Text mining, Supervised learning, Unsupervised Learning, and Reinforcement learning.
  • Excellent understanding of Software Development Life Cycle (SDLC), Agile, Scrum and waterfall.
  • Experience working in the AWS environment using S3, Athena, Lambda, AWS SageMaker, AWS Lex, AWS Aurora, QuickSight, CloudFormation, CloudWatch, IAM, Glacier, EC2, EMR, Rekognition and API Gateway.
  • Experience working with Oracle, SQL Server, MongoDB, Teradata databases.
  • Experience in advanced scripting using Unix Shell Scripting and Python.
  • Experience with supervised and unsupervised techniques such as Regression, Classification and Clustering across Machine Learning (ML) and Deep Learning (DL).
  • Experience in writing big data queries using Apache Spark with Python and Scala (Spark RDDs and DataFrames) to pull data from different sources, perform big data analytics and build Machine Learning models using Spark MLlib.
  • Experience in using Jupyter Notebook.
  • Experience with Python libraries Pandas, NumPy, Seaborn, Matplotlib, NLTK, spaCy, scikit-learn, Keras and TensorFlow in developing end-to-end Analytics, ML and DNN models.
  • Experience in plotting visuals, building dashboards and storytelling using Tableau, AWS QuickSight, Matplotlib, Seaborn, Plotly and Power BI.
  • Strong mathematical background in Linear algebra, Probability, Statistics, Differentiation and Integration.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, ANOVA, and Time Series Analysis (a minimal sketch follows this list).
  • Hands-on experience with Azure Cloud and serving ML models as APIs via Flask.
  • Deep understanding of state-of-the-art machine learning and deep learning algorithms, techniques and best practices.
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, Ensembles.
  • Strong experience in text mining: cleaning and manipulating text and performing sentiment analysis on the mined data.
  • Hands-on experience with Big Data tools like Hive, Pig, Sqoop, Apache Flume and Kafka.
  • Experience in building models with TensorFlow and top-level frameworks such as Keras, Theano and PyTorch.
  • Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
  • Experience in designing Star schema, Snowflake schema for Data Warehouse, ODS architecture.
  • Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well within a team as well as individually.
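
A minimal sketch of a two-sample hypothesis test for an A/B experiment, of the kind referenced in the statistics bullet above; the conversion arrays are made-up illustrative data, not results from any project.

    # Welch's t-test on two hypothetical conversion samples (illustrative data only).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.binomial(1, 0.10, size=5000)    # hypothetical control conversions
    variant = rng.binomial(1, 0.12, size=5000)    # hypothetical treatment conversions

    t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")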

TECHNICAL SKILLS

Programming Languages/Software libraries: Python, R, Java, Scala, SQL, TensorFlow

Supervised and Unsupervised: XGBoost, LightGBM, Artificial Neural Networks, Autoencoders, Convolutional Neural Networks, Recurrent Neural Networks, LSTM, Bi-directional LSTM, ANN, CNN, Multi-Layer Perceptron, Linear Regression, Polynomial Regression, Logistic Regression, SVM, Random Forests, Decision Trees, K-NN, Naive Bayes, K-Means, Hierarchical clustering, Association Rule Learning, Reinforcement Learning, Self-organizing maps

Dimensionality Reduction Techniques: Principal Component Analysis (PCA), Latent Dirichlet Allocation (LDA), Kernel PCA.

Model Evaluation / Engineering: Cross Validation Technique, Activation Functions, Grid Search, Bayesian Optimization and Regularization (Lasso and Ridge Regression), Feature Selection methods, Feature Scaling.

Natural Language Processing (NLP): Text Analytics, Text Processing (Tokenization, Lemmatization), Text Classification, Text Clustering, Named Entity Recognition (NER), Word Embeddings and Word2Vec, POS Tagging, Speech Analytics, Sentiment Analysis.

Python Programming Skills: Keras, Pandas, NumPy, scikit-learn, NLTK, spaCy, SciPy, PySpark, Plotly, Cufflinks, Seaborn, Theano, Matplotlib, Django, Flask, GloVe, PyTorch, Beautiful Soup (bs4), Web Scraping

R Programming Skills: R Shiny, MICE, rpart, caret, randomForest, Data Preprocessing, Web Scraping, Data Extraction, dplyr, ggplot2, Statistical Analysis, Predictive Analysis, ggplotly, rvest, Data Visualization

Data Visualization: AWS QuickSight, Tableau, MS Power BI, Seaborn, QlikView, Matplotlib, Plotly, Cufflinks, ggplot2, R Shiny.

Big Data: Hadoop, Hive, MongoDB, Apache Spark, Scala, Pig, Sqoop

Database Servers: MySQL, Microsoft SQL Server, SQLite, Redshift, PostgreSQL, MongoDB, Teradata

Amazon Web Services: EC2, Lambda, SageMaker, EMR, S3, QuickSight, API Gateway, Athena, Lex, Rekognition, CI/CD, CodeCommit, DynamoDB, Transcribe, CloudFormation, CloudWatch, Glacier, IAM

Development Environments/ Cloud: AWS, IBM Cloud, Azure

PROFESSIONAL EXPERIENCE

Confidential, Bloomfield, CT

Data Scientist / Machine Learning Engineer

Responsibilities:

  • Responsible for clarifying business objectives, data collection, data wrangling, data preprocessing, exploratory data analysis, feature engineering, machine learning modeling, model tuning, and model deployment.
  • Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partners.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib and scikit-learn in Python for data cleaning, visualization and preprocessing, including checking skewness and normalizing data with log and Box-Cox transformations.
  • Built a robust prescriptive model using various Classification models.
  • Implemented various data preprocessing techniques to handle structured and unstructured data, and tackled highly imbalanced datasets using under-sampling and over-sampling techniques such as SMOTE and ADASYN (a minimal sketch follows this list).
  • Used Spark ML to leverage the computational power of Spark for Machine Learning, improving the performance and optimization of existing algorithms using SparkContext, Spark SQL and Spark DataFrames.
  • Wrote PySpark queries to clean, impute and manipulate over 100 million records of customer data for EDA and modeling (see the PySpark sketch after this list).
  • Implemented various feature selection techniques and SHAP plots to prescribe ranges to the business so the suggestions could be implemented on the shop floor.
  • Built a multi-class text classifier on the business data glossary to classify more than 3,000 attributes, using NLTK and Word2Vec to build word embeddings and then leveraging the BlazingText algorithm.
  • Developed an intent-based chatbot on AWS Lex and wrote a serverless Lambda function to invoke the model endpoint deployed on SageMaker.
  • Performed text analytics on customer feedback from emails and call transcripts; built a text classifier and sentiment analyzer using a recurrent neural network (LSTM) and generated word clouds in Python.
  • Used AWS Transcribe to obtain call transcripts and performed text processing (cleaning, tokenization, lemmatization).
  • Leveraged AWS SageMaker to build, train, tune and deploy state-of-the-art Machine Learning and Deep Learning models.
  • Spun up AWS EMR Spark clusters to process huge data sets stored in S3 buckets and used Spark DataFrames for preprocessing.
  • Worked extensively on AWS services such as SageMaker, Lambda, Lex, EMR, S3, Redshift and QuickSight.
  • Configured and processed streaming ETL data pipelines from the data lake to Azure Databricks and used Cognitive Services APIs in notebooks.
  • Performed k-fold Cross Validation and used log loss, ROC curves and AUC to evaluate model performance.
  • Implemented Dimensionality Reduction techniques such as Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce the number of features for visualization.
  • Developed Spark ML pipelines that automate training and testing of the models.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
  • Built interactive dashboards in AWS QuickSight with charts, graphs, auto-narratives and ML Insights to tell stories to management.
  • Performed model tuning by finding the best parameters using grid search and Bayesian Optimization.
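
A minimal sketch of the imbalance-handling and grid-search tuning workflow referenced above, using scikit-learn and imbalanced-learn; the file name, column names and parameter grid are illustrative assumptions rather than actual project code.

    # Rebalance a hypothetical dataset with SMOTE, then tune a random forest with grid search.
    import pandas as pd
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("claims.csv")                      # hypothetical input file
    X, y = df.drop(columns=["target"]), df["target"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Over-sample only the training split so synthetic samples never leak into evaluation.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    param_grid = {"n_estimators": [200, 400], "max_depth": [6, 12, None]}
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, scoring="roc_auc", cv=5)
    search.fit(X_res, y_res)

    print("best params:", search.best_params_)
    print("test AUC:", roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1]))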
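
And a minimal PySpark sketch of the kind of large-scale cleaning and imputation described for the EMR/S3 work above; the S3 paths and column names are hypothetical.

    # Clean, deduplicate and impute a hypothetical customer dataset stored in S3.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("customer-eda-prep").getOrCreate()

    raw = spark.read.parquet("s3://bucket/customer-data/")   # hypothetical S3 path

    clean = (
        raw.dropDuplicates(["customer_id"])
           # Out-of-range ages become null; valid ages pass through unchanged.
           .withColumn("age", F.when(F.col("age").between(18, 100), F.col("age")))
           .fillna({"tenure_months": 0, "region": "unknown"})
           .filter(F.col("signup_date").isNotNull())
    )

    clean.write.mode("overwrite").parquet("s3://bucket/customer-data-clean/")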

Confidential, Franklin, TN

Data Scientist / Machine Learning Engineer

Responsibilities:

  • Performed data manipulation, data preparation, normalization, and predictive modeling; improved efficiency and accuracy by evaluating models in Python and R.
  • Analyzed and solved business problems and found patterns and insights within structured and unstructured data.
  • Implemented big data processing applications to collect, clean and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Built a Customer Lifetime Value (LTV) prediction model using the XGBoost gradient boosting algorithm on customer attributes such as demographics, tenure, age, revenue and retirement plans (a minimal sketch follows this list).
  • Predicted the probability of customer loan default by building a robust Artificial Neural Network classifier in the same ML pipeline as the LTV model, which helped detect churn in advance.
  • Performed spatial analysis and spatial clustering using Python and visualized the findings as geographical clusters and heat maps using Plotly and Bokeh.
  • Explored and analyzed customer-specific features using Matplotlib, Seaborn and ggplot in Python and built dashboards in Tableau.
  • Conducted data blending and data preparation using SQL for Tableau consumption and published data sources to Tableau Server.
  • Performed data wrangling, data imputation and EDA using Pandas, NumPy, scikit-learn and Matplotlib in Python.
  • Implemented classification algorithms such as Logistic Regression, k-Nearest Neighbors and Random Forests to predict customer churn.
  • Experimented with and built predictive models, including ensemble methods such as gradient boosted trees and neural networks in Keras, to predict insurance rates.
  • Used RMSE/MSE to evaluate the performance of different models and the confusion matrix to ensure the model had a low false positive rate.
  • Addressed overfitting and underfitting by tuning the hyperparameters of the machine learning algorithms using Lasso and Ridge regularization.
  • Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
  • Determined trends and relationships in data by applying advanced statistical methods like T-test, hypothesis testing, ANOVA, Chi-Square test and Correlation analysis.
  • Coordinated with the data scientist team and the BA team to analyze requirements and build a predictive model using various machine learning algorithms.
  • Developed data collection processes and data management systems and maintained data integrity.
  • Loaded data from Hadoop and made it available for modeling in Keras.
  • Performed web crawling and web scraping using Beautiful Soup, Requests and Newspaper3k, and collected the data into Pandas data frames for further analytics.
  • Worked closely with internal stakeholders such as business teams, product managers, engineering teams and partner teams.
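
A minimal sketch of an XGBoost lifetime-value regressor of the kind referenced above, evaluated with RMSE; the input file, feature columns and hyperparameters are illustrative assumptions.

    # Train an XGBoost regressor on hypothetical customer attributes and report held-out RMSE.
    import pandas as pd
    import xgboost as xgb
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    df = pd.read_csv("customers.csv")                        # hypothetical input
    features = ["age", "tenure_months", "annual_revenue", "has_retirement_plan"]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["lifetime_value"], test_size=0.2, random_state=42)

    model = xgb.XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
    model.fit(X_train, y_train)

    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"test RMSE: {rmse:.2f}")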

Confidential

Data Scientist

Responsibilities:

  • Performed Data analysis, Data cleaning, Data transformations and Data modeling in R and Python.
  • Experimented with various predictive models including Logistic Regression, Support Vector Machine (SVM), Random Forest, XGBoost, Decision trees to check the model performances and accuracies.
  • Analyzed and solved business problems and found patterns and insights within structured and unstructured data.
  • Designed logical and physical data models for multiple OLTP and Analytic applications.
  • Analyzed business requirements, kept track of data available from various data sources, and transformed and loaded the data into target tables using Informatica PowerCenter.
  • Worked on outlier identification with box plots and K-means clustering using Pandas, NumPy, Matplotlib and Seaborn.
  • Generated reports and visualizations based on the insights, mainly using Tableau, and developed dashboards.
  • Built a text classifier on the data glossary using TF-IDF to construct a feature space, implemented a Naive Bayes algorithm, and deployed it as a REST API using Flask (a minimal sketch follows this list).
  • Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models.
  • Used Apache Spark to handle huge data sets and built Machine Learning models using Spark MLlib.
  • Performed data pulls to get the data from AWS S3 buckets.
  • Built robust Machine Learning models using bagging and boosting methods.
  • Created stored procedures using PL/SQL and tuned the databases and backend process.
  • Involved with Data Analysis Primarily Identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
  • Performed database performance tuning, including indexing, optimizing SQL statements and monitoring the server.
  • Developed Informatica mappings, sessions and workflows and wrote PL/SQL code for effective and optimized data flow.
  • Wrote SQL Queries, Dynamic-queries, sub-queries and complex joins for generating Complex Stored Procedures, Triggers, User-defined Functions, Views and Cursors.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Applied expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific database.
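
A minimal sketch of a TF-IDF plus Naive Bayes glossary classifier exposed through a Flask REST endpoint, as described above; the toy training data, route name and request fields are assumptions.

    # Train a tiny TF-IDF + Naive Bayes pipeline and serve predictions over HTTP.
    from flask import Flask, request, jsonify
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy training data standing in for the real data-glossary corpus.
    texts = ["customer account number", "policy effective date", "claim payout amount"]
    labels = ["identifier", "date", "amount"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, labels)

    app = Flask(__name__)

    @app.route("/classify", methods=["POST"])
    def classify():
        description = request.get_json()["description"]
        return jsonify({"label": str(clf.predict([description])[0])})

    if __name__ == "__main__":
        app.run(port=5000)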

Confidential

Data Analyst

Responsibilities:

  • Participated in requirements meetings and data mapping sessions to understand business needs.
  • Wrote several efficient queries that delivered accurate results.
  • Completed various data collection and data mining activities from primary and secondary data sources.
  • Designed and implemented a customized linear regression model to predict sales, utilizing diverse data sources to forecast demand, risk and price elasticity (a minimal sketch follows this list).
  • Processed collected data using the Python Pandas and NumPy packages for statistical analysis.
  • Resolved compliance issues and gaps using SQL, analyzing both contractual language and XML coding.
  • Identified and documented detailed business rules and use cases based on requirements analysis.
  • Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS big data technologies.
  • Wrote, executed and performance-tuned SQL queries for data analysis and profiling, and wrote complex SQL queries using joins, subqueries and correlated subqueries.
  • Built analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
  • Assembled large, complex data sets that meet functional/non-functional business requirements; identified and recommended the most appropriate paradigms and technology choices for batch and real-time ML scenarios.
  • Worked on automated data pipelines; led the architecture, design, and development of components and services to enable Machine Learning at scale using distributed systems.
  • Created and maintained optimal data pipeline architecture; transformed project data requirements into project data models.
  • Constructed efficient data infrastructures that are easy to maintain and can be used effectively, seamlessly spotting and resolving any issue within the infrastructure.
  • Kept the code clean and organized with proper documentation in place, logging all activity and setting alarms; documented every step in detail throughout all phases.
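
A minimal sketch of the kind of sales-prediction linear regression described above; the CSV file, feature names and train/test split are illustrative assumptions.

    # Fit a linear regression on hypothetical sales drivers and report held-out R^2.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    df = pd.read_csv("sales_history.csv")                    # hypothetical input
    features = ["price", "promotion_flag", "competitor_price", "seasonality_index"]

    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["units_sold"], test_size=0.2, random_state=0)

    model = LinearRegression().fit(X_train, y_train)
    print("coefficients:", dict(zip(features, model.coef_)))
    print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))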
