Data Scientist/ Machine Learning Engineer Resume

Centerville, MA

SUMMARY

  • 9+ years of experience in machine learning and data science. Evaluated technology stacks for building analytics solutions in the cloud, researching strategies and tools for end-to-end analytics and helping design technology roadmaps for data ingestion, data lakes, data processing, and visualization.
  • Experience with large structured and unstructured datasets, data visualization, data acquisition, predictive modeling, NLP/NLU/NLG, AI, machine learning, deep learning, inferential statistics, Apache Spark, and data validation.
  • Proficient in Machine Learning techniques (LDA, Decision Trees, Linear and Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors, Clustering), Deep Learning techniques (CNNs, RNNs, LSTM, BERT, GPT, T5), and Statistical Modeling for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, and Ensembles.
  • Experience in image processing with deep learning Convolutional Neural Networks (CNNs).
  • Extensive experience in data analytics for marketing campaigns.
  • Experience with LSTM- and RNN-based deep learning for speech recognition using TensorFlow and PyTorch.
  • Experience with data mining algorithms and approaches, applying sound design techniques.
  • Experience in data preprocessing, developing statistical machine learning models and data mining solutions for various business problems, generating data visualizations using Python, R, Tableau, and Microsoft Power BI, and version control with Git.
  • Strong programming expertise in Python and SQL.
  • Extracted data from database sources such as Oracle and SQL Server.
  • Solid coding and engineering skills in Machine Learning.
  • Good knowledge of business process analysis and design, re-engineering, cost control, capacity planning, performance measurement and quality.

TECHNICAL SKILLS

  • Machine Learning
  • Deep Learning
  • Computer Vision
  • Data Science
  • Splunk ML Toolkit
  • AWS Machine Learning
  • Python
  • TensorFlow
  • Pandas
  • Keras
  • Theano
  • PyTorch
  • NumPy
  • SciPy
  • SQL
  • Matplotlib
  • Seaborn
  • AWS
  • R
  • Scikit-learn
  • Project engineering
  • Spark
  • Linux
  • FastAPI
  • MLflow
  • Snowflake

PROFESSIONAL EXPERIENCE

Confidential - Centerville, MA

Data Scientist/ Machine Learning Engineer

Responsibilities:

  • Implemented convolutional and recurrent neural network architectures to analyze spatial-temporal patterns in data.
  • Developed autoregressive integrated moving average (ARIMA) models to capture relationships in temporal data.
  • Used Python, Spark, and R with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Worked on feature engineering, created dummy variables, removed some of the non-significant variables and selected statistically significant variables.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models using best-in-class modeling techniques.
  • Used a variant of the k-nearest-neighbors (KNN) distance technique to identify outliers and to classify unlabeled data.
  • Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
  • Facilitated data collection sessions; analyzed and documented data processes, scenarios, and information flow.
  • Derived high-quality information and significant patterns from textual data sources, using document term frequency and TF-IDF (Term Frequency-Inverse Document Frequency) to find information for topic modeling.
  • Collaborated with data engineers to implement the ETL process, and wrote and optimized SQL queries to extract data from the cloud.
  • Pre-processed a 1.52G parallel time-series categorical sequence dataset, comprehensively evaluated existing classification techniques, and demonstrated the need for a matrix representation of categorical sequences in order to take advantage of image processing techniques. This approach improved accuracy from 45% to 95%.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python 3.x, along with R, and built models using Scikit-learn.
  • Used R, Python, and AWS Redshift for exploratory data analysis, with ANOVA and hypothesis tests to compare the effectiveness of creative campaigns.
  • Replaced missing data and performed EDA, including univariate and bivariate analysis, to understand individual and combined effects.
  • Used Python 3.x and R to develop supervised and unsupervised machine learning models that support decision making, including decision trees, linear/logistic regression, multivariate regression, natural language processing, naive Bayes, random forests, gradient boosting, XGBoost, K-means, and KNN, using Keras, TensorFlow, and Scikit-learn.
  • Performed model validation using test and validation sets via K-fold cross-validation and statistical significance testing.
  • Evaluated metrics for regression (RMSE, R², MSE, etc.) and classification (accuracy, precision, recall, concordance, discordance, etc.), with threshold calculations using ROC plots.
  • Used predictive analytics and machine learning algorithms to forecast key metrics, presented in dashboards designed in Tableau.
  • Provided data and analytical support for the company’s highest-priority initiatives.
  • Created impact documents specifying changes introduced as part of the program and led the business process team.
  • Worked with big data consultants to analyze, extract, normalize, and label relevant data using statistical modeling techniques such as support vector machines and neural networks.
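
To illustrate the TF-IDF weighting mentioned in the list above, here is a minimal sketch. It is not the project's code: the function name `tf_idf`, the toy corpus, and the plain `tf * log(N / df)` formula without smoothing are assumptions chosen for brevity.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency (share of the document) times inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already tokenized on whitespace.
docs = [
    "the model predicts churn".split(),
    "the model detects outliers".split(),
    "the dashboard shows churn".split(),
]
weights = tf_idf(docs)
```

A term that appears in every document (here "the") gets weight zero, while a term unique to one document gets the highest weight in that document, which is why TF-IDF is a common starting point for picking topic keywords.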

Confidential

Data Scientist

Responsibilities:

  • Built reusable data ingestion and data transformation frameworks using Python.
  • Used Python to develop supervised and unsupervised machine learning models that support decision making, including decision trees, linear/logistic regression, multivariate regression, natural language processing, naive Bayes, random forests, gradient boosting, XGBoost, K-means, and KNN.
  • Worked on Natural Language Processing with NLTK module of Python and developed NLP models for sentiment analysis.
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as logistic regression, Random Forests, and KNN to predict customer churn.
  • Responsible for architecting a complex data layer to source raw data from a variety of sources, generate derived data per business requirements, and feed the data to BI reporting and the data science team.
  • Developed Hive queries for data sampling and analysis for the analysts.
  • Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Involved in automating big data jobs on the Microsoft HDInsight platform and managing logs.
  • Responsible for creating design documents, establishing specific solutions, and creating test cases.
  • Responsible for closing defects identified by the QA team and managing the release process for the modules.

Environment: Spark, Python, MS SQL Server, Shell scripting.

Confidential

Data Scientist

Responsibilities:

  • Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Understanding of Snowflake cloud technology.
  • Day-to-day responsibilities included developing ETL pipelines into and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Explored and analyzed the customer-specific features by using Matplotlib in Python and ggplot2 in R.
  • Performed data imputation using the Scikit-learn package in Python.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Used Python (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) and R to develop a variety of models and algorithms for analytic purposes.
  • Worked on Natural Language Processing with NLTK module of Python and developed NLP models for sentiment analysis.
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as logistic regression, Random Forests, and KNN to predict customer churn.
  • Analyzed customer behavior and assessed customer value with RFM (recency, frequency, monetary) analysis; applied customer segmentation with clustering algorithms such as K-means clustering, Gaussian mixture models, and hierarchical clustering.
  • Used F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate the performance of different models.
  • Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models and utilized collaborative filtering techniques to recommend courses for different customers.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Downloaded BigQuery data into pandas or Spark DataFrames for advanced ETL capabilities.
  • Integrated various data sources (SQL Server, PL/SQL, Teradata) into the data staging area.

Environment: Python, R, Tableau, Machine Learning (Logistic regression/Random forests/KNN/K-Means clustering/Hierarchical clustering/Ensemble methods/Collaborative filtering), Jira, GitHub, Agile/SCRUM
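
The customer-value analysis listed for this role (recency, frequency, monetary) can be sketched as follows. This is an illustrative example only: the function name `rfm_table`, the transaction tuple layout, and the sample rows are assumptions, not data or code from the project.

```python
from datetime import date

def rfm_table(transactions, as_of):
    """Aggregate (customer, day, amount) rows into recency, frequency, monetary."""
    table = {}
    for customer, day, amount in transactions:
        row = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], day)  # most recent purchase date
        row["frequency"] += 1                # number of purchases
        row["monetary"] += amount            # total spend
    return {
        customer: {
            "recency_days": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }

# Hypothetical sample transactions.
transactions = [
    ("A", date(2023, 1, 5), 40.0),
    ("A", date(2023, 3, 1), 60.0),
    ("B", date(2023, 2, 10), 25.0),
]
rfm = rfm_table(transactions, as_of=date(2023, 3, 31))
```

The three resulting features per customer are what a clustering algorithm such as K-means would typically take as input for segmentation, usually after scaling.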

Confidential

Data Analyst

Responsibilities:

  • Analyzed, prepared, and summarized financial data for various levels of management.
  • Collected and cleansed structured and unstructured data for modeling and analysis in support of major business initiatives.
  • Automated Driver-Partner enrollment with faster background checks to increase productivity.
  • Collected data for analysis and built a statistical model to optimize Driver-Partner routes and schedules, which increased booking times by 35%.
  • Used Python to manipulate data for loading and extraction, and worked with Python libraries such as Matplotlib, Scikit-learn, NumPy, Seaborn, TensorFlow, Keras, and pandas for data analysis.
  • Performed K-means clustering, Multivariate analysis and Support Vector Machines.
  • Worked on Natural Language Processing with the NLTK module to develop an application for automated customer response.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Worked with large data sets, automated data extraction, and built monitoring/reporting dashboards and high-value, automated Business Intelligence solutions (data warehousing and visualization).
  • Performed data entry and data auditing, created data reports, and monitored all data for accuracy.
  • Wrote ETL scripts in Python/SQL for extraction and validating the data.
  • Interpreted raw data using a variety of tools (Python, R), algorithms, and statistical/econometric models (including regression techniques, decision trees, etc.) to capture the bigger picture of the business.
  • Effectively used the data blending feature in Tableau and defined best practices for Tableau report development.
  • Created personalized monthly reports for the drivers to maximize their profits efficiently.

Environment: PySpark, regression, logistic regression, random forest, neural networks, Metadata, NLTK, Git, JSON, Python, SQL, PL/SQL, MS Access, MS Excel, XML, Unix
