Machine Learning Engineer/Data Engineer Resume


Richmond, Virginia

SUMMARY

  • Over 6 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Microsoft Azure Machine Learning certified for Big Data Analytics.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) for Forecasting/Predictive Analytics.
  • Intensive experience using Machine Learning algorithms (Supervised, Unsupervised, Reinforcement Learning, Recommendation Engines, Active Learning, Deep Learning) and Artificial Intelligence techniques to solve business problems.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Experience with NLP for sentiment analysis, topic modeling, voice recognition, language understanding, language generation, and entity extraction.
  • Experience with mathematical and statistical Python libraries such as NumPy, pandas, scikit-learn, SciPy, Matplotlib, Seaborn, Beautiful Soup, rpy2, NLTK, TensorFlow, Keras, and Theano, and R packages caTools, caret, rpart, dplyr, rjson, RWeka, tidytext, tm (text mining), stringr, SnowballC, plyr, RCurl, gmodels, C50, reshape2, and twitteR.
  • Employed various SDLC methodologies such as Waterfall, Agile, Kanban, and Scrum.
  • Expert in the entire Data Science process life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation, and Visualization (a minimal sketch follows this summary).
  • Hands-on experience in Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, and Boosting.
  • Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib and Seaborn.
  • Experience working with Hadoop Big Data tools such as HDFS, Hive, Pig (Pig Latin), and Spark.
  • Deep knowledge of SQL for writing Queries, Stored Procedures, User-Defined Functions, Views, Triggers, and Indexes.
  • Designed, trained, and validated end-to-end ML models and pipelines for image, video, time series, and seq2seq tasks, using TensorFlow, Keras, and PyTorch to set up deep CNNs.
  • Maintained and monitored Docker in a cloud-based service during production, and set up a system for dynamically adding and removing web services from a server using Docker.
  • Worked on integration of diverse mathematical and statistical procedures, pattern recognition, model building, creating various scientific and industrial packages within R.
  • Knowledge and experience in agile environments such as Scrum and using project management tools like Jira/Confluence, TFS and version control tools such as GitHub/Git.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to perform data extraction from the cloud and merging of data from Oracle.
  • Hands-on experience in importing and exporting data using Relational Database including Oracle, MySQL and MS SQL Server, and NoSQL database like MongoDB.
  • Experience in implementing data analysis with various analytic tools, such as Jupyter Notebook, Spyder, reshape2, ggplot2, dplyr, car, MASS, SAS, MATLAB, and Excel.
  • Good team player and quick learner; highly self-motivated person with good communication and interpersonal skills.
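
A minimal sketch of that end-to-end life cycle (data preparation, feature scaling, model training, validation) in Python with scikit-learn; the bundled dataset and model choices below are illustrative assumptions, not project code.

# Minimal life-cycle sketch: prepare data, scale features, train, and validate.
# Uses a bundled scikit-learn dataset as a stand-in for project data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Preprocessing and model wrapped in one pipeline so validation uses the same steps.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipeline.fit(X_train, y_train)

print(classification_report(y_test, pipeline.predict(X_test)))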

TECHNICAL SKILLS

Programming Languages: Python, SQL, R, Java

Scripting Languages: XML, Python, Bash.

Apps Development: Swift, C#, C++

Natural Language Processing: NLTK, Stanford CoreNLP (tokenization, part-of-speech tagging, parsing, and named entity recognition), TextBlob, Gensim, Apache Spark, Scikit-learn

Data Sources: HDFS, Teradata, Metadata, SQL Server, Excel

Data Visualization: Power BI, Tableau, Matplotlib, Plotly, Seaborn, ggplot, MATLAB, Pentaho

Predictive and Machine Learning: Regression (Linear, Logistic, Bayesian, Polynomial, Ridge, Lasso), Classification (Logistic Regression, two-/multiclass classification, Boosted Decision Tree, Random Forest, Decision Tree, Naïve Bayes, Support Vector Machines, k-Nearest Neighbors, Neural Network, and various other models), Clustering (K-means, Hierarchical), Anomaly Detection, LSTM, RNN.

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark

Database Systems: Oracle, MySQL, MongoDB, DB2, Teradata, Metadata

Operating Systems: Linux, Windows, Unix, macOS

Methodologies: Agile, Scrum, Waterfall

Version control: Git, SVN (subversion)

Other Skills: Data Analysis, Data Transformation and Visualization, Machine Learning Algorithms, ETL, Dashboards, Flask Apps, REST APIs

PROFESSIONAL EXPERIENCE

Machine Learning Engineer/Data Engineer

Confidential, Richmond, Virginia

Responsibilities:

  • Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, product recommendation, and allocation planning.
  • Under the supervision of a Sr. Data Scientist, performed data transformations for rescaling and normalizing variables (see the sketch after this list).
  • Designed and developed an Active Directory integration in Python for the Birst BI tool, using the tool's APIs. Allowed for automatic provisioning of users based on AD groups.
  • Helped architect solutions to improve server stability and reliability
  • Built Spark/Scala data pipelines on AWS.
  • Created playbook and disaster recovery documentation for our apps
  • Performed data extraction using various queries/formulas and manipulated the data with various analytical tools to create reports.
  • Performed data pre-processing and cleaning to prepare data sets for further statistical analysis.
  • Performed ETL operations on raw data from various sources on the mainframe platform and loaded it into the Teradata platform.
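
A minimal sketch of the rescaling and normalization step noted above, assuming pandas and scikit-learn; the column names and values are hypothetical placeholders rather than project data.

# Rescale variables to [0, 1] and standardize them to zero mean / unit variance.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({
    "annual_failure_rate": [0.02, 0.05, 0.11, 0.08],   # hypothetical feature
    "standard_cost": [1200.0, 950.0, 4300.0, 2750.0],  # hypothetical feature
})

# Min-max rescaling to the [0, 1] range.
rescaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# Standardization to zero mean and unit variance.
normalized = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

print(rescaled.round(3))
print(normalized.round(3))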

Data Scientist / Informatics Analytics

Confidential, Dayton, Ohio

Responsibilities:

  • Applied supervised machine learning algorithms (Logistic Regression, Decision Tree, and Random Forest) to predictive modeling of various problems: successful transition from a Skilled Nursing Facility, identifying predictors for Medicare Advantage members, lowering the cost of mitigating homelessness, and issues management.
  • Developed NLP models for Topic Extraction and Sentiment Analysis for the MA disenrollment root cause analysis. Worked with the NLTK library for NLP data processing and pattern finding.
  • Implemented Topic Modeling (LDA: Latent Dirichlet Allocation) to understand the themes members were talking about and to find issues with services (see the topic-modeling sketch after this list).
  • Implemented classification Models including Random Forest and Logistic Regression to quantify the likelihood of each member enrollment for the upcoming enrollment period.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization to deliver data science solutions.
  • Used Linear Regression to estimate member cost for the upcoming enrollment period.
  • Applied various machine learning algorithms and statistical modeling, such as decision trees, text analytics, Natural Language Processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the Scikit-learn package in Python.
  • Used analytical programming languages such as R and Python.
  • Used different IDEs: Jupyter Notebook, Visual Studio Code, and RStudio.
  • Visualized data using MS Power BI, ggplot, Seaborn, Matplotlib, and Plotly.
  • Deployed the model using Flask, Microsoft Azure, and a REST API for the web application.
  • Extracted source data from Oracle tables, SQL Server, sequential files, and Excel sheets.
  • Developed and maintained a Data Dictionary to create metadata reports for technical purposes.
  • Responsible for developing system models, prediction algorithms, solutions to prescriptive analytics problems, data mining techniques, and/or econometric models.
  • Communicated results to the operations team to support decision making, and collected data needs and requirements by interacting with other departments.
  • Demonstrated and built statistical/machine learning systems to solve large-scale customer-focused problems, leveraging statistical methods and applying them to real-world business problems.
  • Performed data profiling to learn about turnover behavior from various features before the hiring decision, when no on-the-job behavioral data is available.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments through a discovery approach.
  • Built an Artificial Neural Network using TensorFlow in Python to identify customers' probability of canceling their connections (churn rate prediction).
  • Understood the business problems and analyzed the data using appropriate statistical models to generate insights.
  • Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Categorized comments into positive and negative clusters from different social networking sites using Sentiment Analysis and Text Analytics.
  • Ensured that models had a low False Positive Rate; performed text classification and sentiment analysis for unstructured and semi-structured data.
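
A minimal sketch of the LDA topic modeling described above, shown here with scikit-learn; the comments are made-up placeholders, while the actual work used member verbatims with NLTK preprocessing.

# Fit an LDA topic model on short comments and print the top terms per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [  # hypothetical examples, not real member data
    "long wait time to reach customer service about my claim",
    "premium increase was not explained before enrollment",
    "pharmacy benefit did not cover my prescription",
    "customer service resolved my billing question quickly",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(comments)

lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(doc_term_matrix)

# Show the five highest-weight terms for each discovered topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top_terms)}")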

Data Scientist

Confidential, Phoenix, Arizona

Responsibilities:

  • Applied Regression, Classification, and Clustering algorithms, Trend Analysis, Forecasting, and NLP to product reviews, fraud detection, customer support, loan/credit application status, default detection, pattern finding in customer spending, investing, and financial decisions, forecasting, trend analysis, product recommendation, and loan offering.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
  • Utilized machine learning algorithms such as logistic regression, multivariate regression, K-means, and recommendation algorithms to extract hidden information from the data.
  • Used Pandas, NumPy, Scikit-learn in Python for developing various machine learning models and utilized algorithms such as Linear regression, Logistic regression, Gradient Boosting, SVM and KNN.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing (see the preprocessing sketch after this list).
  • Experimented with and built predictive models, including ensemble methods such as Gradient Boosting trees and a Neural Network in Keras, to predict sales amount.
  • Analyzed patterns in customers' shopping habits across locations, categories, and months using time series modeling techniques.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Used the TensorFlow library in a dual-GPU environment for training and testing neural networks.
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Worked on deep learning for image recognition using TensorFlow/Keras.
  • Developed NLP deep learning algorithms for analyzing text, improving on existing dictionary-based approaches.
  • Performed data cleaning, feature scaling, and feature engineering using Python and R.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data.
  • Created various types of data visualizations using Python and Tableau.
  • Collected the data required for building models using Hadoop tools such as Hive and Pig.
  • Generated reports to meet regulatory requirements.
  • Extensively involved in SQL tuning to optimize performance.
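
A minimal sketch of the preprocessing steps named above (label encoding, feature scaling, PCA) with scikit-learn; the feature names and values are hypothetical.

# Label-encode a categorical column, scale the features, then reduce with PCA.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA

df = pd.DataFrame({  # hypothetical records
    "product_type": ["loan", "credit", "loan", "deposit"],
    "amount": [15000.0, 2500.0, 8000.0, 12000.0],
    "tenure_months": [36, 12, 24, 60],
})

# Encode the categorical column as integers.
df["product_type_encoded"] = LabelEncoder().fit_transform(df["product_type"])

# Scale features so no single column dominates the principal components.
features = df[["product_type_encoded", "amount", "tenure_months"]]
scaled = StandardScaler().fit_transform(features)

# Keep the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
print(np.round(components, 3))
print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))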

Data Scientist

Confidential, Southfield, MI

Responsibilities:

  • Built models using Linear Regression, Logistic Regression, Decision Tree, k-means clustering, SVM, and Random Forest to analyze customer response behaviors and interaction patterns and to support sales prediction and forecasting.
  • Performed data extraction, manipulation, cleaning, analysis, modeling and data mining.
  • Acquired and cleaned data using R and Python, extracting from various sources including internal and external databases.
  • Compiled and analyzed sales and marketing reports using SQL and statistical methods.
  • Applied Probabilistic Graphical Methods (Bayesian and Gaussian networks) to create machine learning models.
  • Built machine learning classification models using supervised algorithms such as Boosted Decision Trees, Logistic Regression, SVM, Random Forest, Naïve Bayes, and KNN.
  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization, and performed Gap analysis.
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, and Git.
  • Designed 20+ dashboards in Tableau for sales managers, giving them instant access to a personalized analytics portal with key business metrics.
  • Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in features engineering such as feature generating, PCA, feature normalization and label encoding with Scikit-learn preprocessing.
  • Experimented with and built predictive models, including ensemble methods such as Gradient Boosting trees and a Neural Network in Keras, to predict sales amount.
  • Analyzed patterns in customers' shopping habits across locations, categories, and months using time series modeling techniques.
  • Used RMSE/MSE to evaluate different models' performance (see the evaluation sketch after this list).
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Provided level-of-effort estimates and activity status; identified dependencies impacting deliverables, risks and mitigation plans, and issues and their impacts.
  • Designed database solution for applications, including all required database design components and artifacts.
  • Developed and maintained a Data Dictionary to create Metadata Reports for technical and business purposes using the Erwin report designer.
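
A minimal sketch of the RMSE/MSE model comparison mentioned above; synthetic data and off-the-shelf scikit-learn models stand in for the actual sales data and project models.

# Compare two regressors on held-out data using MSE and RMSE.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "LinearRegression": LinearRegression(),
    "GradientBoosting": GradientBoostingRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE={mse:.2f}, RMSE={np.sqrt(mse):.2f}")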
