
Data Scientist/Machine Learning Engineer Resume


SUMMARY:

  • Data Scientist with 7+ years of experience executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing.
  • Extensive experience applying machine learning solutions to business problems and generating data visualizations using Python.
  • Worked with Python tools such as Pandas, NumPy, Matplotlib, and Scikit-learn to prepare data and build machine learning models with concise code.
  • Hands-on experience with Naïve Bayes, Random Forests, Decision Trees, Linear and Logistic Regression, Principal Component Analysis, SVM, Clustering, Neural Networks, and related systems.
  • Passionate about implementing deep learning techniques with frameworks such as Keras and Theano.
  • Experienced in time series forecasting of sales and loan demand using modeling techniques such as autoregressive models, moving averages, and Holt-Winters (see the sketch after this list).
  • Used Python packages such as Seaborn, Matplotlib, ggplot, and Pygal to develop visualizations and plot results.
  • Extracted and worked with data from different databases, including Oracle, SQL Server, DB2, MongoDB, PostgreSQL, Teradata, and NoSQL stores such as Cassandra.
  • Followed the data science life cycle and SDLC, Waterfall, and Agile methodologies to develop software products.
  • Used Python 3.x (NumPy, SciPy, Pandas) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Experience with statistical programming languages such as Python and R.
  • Performed data mining, data analysis, and predictive modeling using the Java machine learning library WEKA.
  • Experienced in cloud automation using AWS CloudFormation, JavaScript, and Python.
  • Worked on automating the provisioning of AWS infrastructure using CloudFormation.
  • Experience with container-based deployment using Docker, working with Docker images and Docker registries.
  • Worked with optimization tools like CPLEX for solving support vector machine classification problems.
  • Actively involved in all phases of the data science project life cycle, including data extraction, data cleaning, data visualization, and model building.
  • Also used t-SNE (t-Distributed Stochastic Neighbor Embedding), and UMAP (Uniform Manifold Approximation and Projection).
  • Expertise in unsupervised machine learning algorithms such as K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering, with good knowledge of Recommender Systems.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including Python, Spark MLlib, SQL, Scikit-learn, and Hadoop.
  • Worked with fluid, IT, and mechanical systems by developing mathematical models using linear, multi-linear, and non-linear regression and performing system fault analysis.
  • Trained and mentored data analysts, data engineers, and junior team members, leveraging my experience and skills to motivate them, improve communication, and deepen my own subject-matter skills.
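
A minimal sketch of a Holt-Winters demand forecast like the one referenced above, using statsmodels; the file name, column names, and 12-month seasonal period are illustrative assumptions, not values from an actual engagement.

    # Illustrative Holt-Winters forecast; file/column names and seasonality are assumptions.
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Monthly loan-demand series (hypothetical CSV with 'date' and 'demand' columns)
    demand = (
        pd.read_csv("loan_demand.csv", parse_dates=["date"], index_col="date")["demand"]
        .asfreq("MS")
    )

    # Additive trend and seasonality with a 12-month cycle
    model = ExponentialSmoothing(demand, trend="add", seasonal="add", seasonal_periods=12)
    fit = model.fit()

    # Forecast the next 6 months of demand
    print(fit.forecast(6))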

TECHNICAL SKILLS:

Programming Languages / Software libraries: Python, R, Java, Scala, SQL, TensorFlow

Supervised and Unsupervised: XGBoost, LightGBM, Artificial Neural Networks, Autoencoders, Convolutional Neural Networks, Recurrent Neural Networks, LSTM, Bi-directional LSTM, ANN, CNN, Multi-Layer Perceptrons, Linear Regression, Polynomial Regression, Logistic Regression, SVM, Random Forests, Decision Trees, K-NN, Naive Bayes, K-Means, Hierarchical Clustering, Association Rule Learning, Reinforcement Learning, Self-Organizing Maps

Dimensionality Reduction Techniques: Principal Component Analysis (PCA), Latent Dirichlet Allocation (LDA), Kernel PCA.

Model Evaluation / Engineering: Cross-Validation Techniques, Activation Functions, Grid Search, Bayesian Optimization and Regularization (Lasso and Ridge Regression), Feature Selection methods, Feature Scaling.

Natural Language Processing (NLP): Text Analytics, Text Processing (Tokenization, Lemmatization), Text Classification, Text Clustering, Named Entity Recognition (NER), Word Embeddings and Word2Vec, POS Tagging, Speech Analytics, Sentiment Analysis.

Python Programming Skills: Keras, Pandas, NumPy, Scikit-learn, NLTK, spaCy, SciPy, PySpark, Plotly, Cufflinks, Seaborn, Theano, Matplotlib, Django, Flask, GloVe, PyTorch, Beautiful Soup (bs4), Web Scraping

R Programming Skills: R Shiny, MICE, rpart, caret, randomForest, Data Preprocessing, Web Scraping, Data Extraction, dplyr, ggplot2, Statistical Analysis, Predictive Analysis, ggplotly, rvest, Data Visualization

Data Visualization: AWS QuickSight, Tableau, MS Power BI, Seaborn, QlikView, Matplotlib, Plotly, Cufflinks, ggplot2, R Shiny.

Big Data: Hadoop, Hive, MongoDB, Apache Spark, Scala, Pig, Sqoop

Database Servers: MySQL, Microsoft SQL Server, SQLite, Redshift, PostgreSQL, MongoDB, Teradata

Amazon Web Services: EC2, Lambda, SageMaker, EMR, S3, QuickSight, API Gateway, Athena, Lex, Rekognition, CI/CD, CodeCommit, DynamoDB, Transcribe, CloudFormation, CloudWatch, Glacier, IAM

Development Environments/ Cloud: AWS, IBM Cloud, Azure

WORK EXPERIENCE:

Confidential

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Worked on the end-to-end machine learning workflow: wrote Python code for gathering data from Snowflake on AWS, data preprocessing, feature extraction, feature engineering, modeling, model evaluation, and deployment. Wrote Python code for exploratory data analysis using Scikit-learn and packages such as NumPy, Pandas, Matplotlib, Seaborn, statsmodels, and pandas-profiling.
  • Trained a Random Forest algorithm on customer web activity data from media applications to predict potential customers. Worked with Google TensorFlow and the Keras API, using convolutional neural networks for classification problems.
  • Wrote code for feature engineering, Principal Component Analysis (PCA), and hyperparameter tuning to improve model accuracy.
  • Worked on various machine learning algorithms such as Linear Regression, Logistic Regression, Decision Trees, Random Forests, K-means clustering, Support Vector Machines, and XGBoost based on client requirements.
  • Developed machine learning models using recurrent neural networks (LSTMs) for time series and predictive analytics.
  • Developed machine learning models using Google TensorFlow and the Keras API with convolutional neural networks for classification problems, and fine-tuned model performance by adjusting the epochs, batch size, and Adam optimizer.
  • Good knowledge of image classification problems using the Keras Models for image classification with weights trained on ImageNet like VGG16, VGG19, ResNet, ResNetV2, and InceptionV3. Knowledge of OpenCV for real-time computer vision.
  • Worked on natural language processing for document classification and text processing using NLTK, spaCy, and TextBlob to find sensitive information in electronically stored files and perform text summarization.
  • Developed a Python automation script for consuming data subject requests from Snowflake tables on AWS and posting the data to the Adobe Analytics Privacy API.
  • Developed a Python script to automate data cataloging in the Alation data catalog tool. Tagged all Personally Identifiable Information (PII) in the Alation enterprise data catalog to identify sensitive consumer information.
  • Consumed the Adobe Analytics web API and wrote a Python script to load Adobe consumer information for digital marketing into Snowflake. Worked on Adobe Analytics ETL jobs.
  • Wrote stored procedures in Snowflake to look for sensitive information across all data sources and hash it with a salt value, anonymizing the sensitive information to meet CCPA requirements.
  • Worked with the AWS Boto3 API to make HTTP calls to AWS services such as S3, Secrets Manager, and SQS.
  • Created an integration to consume HBO consumer subscription information posted to AWS SQS (Simple Queue Service) and load it into Snowflake tables for data processing, storing the metadata in Postgres tables.
  • Worked on generating reports that provide WarnerMedia brands' consumer information to data subjects through Python automation jobs.
  • Implemented AWS Lambda functions in Python that pull privacy files from S3 buckets and post them to the Malibu data privacy endpoints.
  • Involved in different phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver solutions.
  • Worked with Python NumPy, SciPy, Pandas, Matplotlib, and statistics packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering. Built and analyzed datasets using R and Python.
  • Extracted the data required for building models from the Snowflake database on AWS. Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Tackled a highly imbalanced fraud dataset using sampling techniques such as under-sampling and over-sampling with SMOTE in Python Scikit-learn.
  • Utilized PCA and other feature engineering techniques to reduce high-dimensional data, applied feature scaling, and handled categorical attributes using the one-hot encoder of the Scikit-learn library (see the sketch after this list).
  • Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python.
  • Elucidated continuous improvement opportunities for current predictive modeling algorithms. Proactively collaborated with business partners to determine identified population segments and develop actionable plans enabling the identification of patterns related to quality, use, cost, and other variables.
  • Experimented with ensemble methods to increase the accuracy of the model with different bagging and boosting methods and deployed the model on AWS.
  • Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.
  • Developed MapReduce jobs in Python for data cleaning and data processing.
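
A minimal sketch (not the production code) of the imbalanced-fraud workflow outlined above, combining one-hot encoding, SMOTE over-sampling, and PCA in a single pipeline; it assumes the imbalanced-learn package, and the file and column names are hypothetical.

    # Sketch of the imbalanced-data pipeline; file/column names are assumptions.
    import pandas as pd
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.read_csv("transactions.csv")                   # hypothetical Snowflake extract
    X, y = df.drop(columns=["is_fraud"]), df["is_fraud"]

    numeric = ["amount", "account_age_days"]               # hypothetical numeric columns
    categorical = ["channel", "merchant_type"]             # hypothetical categorical columns

    preprocess = ColumnTransformer(
        [("num", StandardScaler(), numeric),
         ("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        sparse_threshold=0.0,                              # keep output dense for PCA
    )

    pipeline = Pipeline([
        ("preprocess", preprocess),
        ("smote", SMOTE(random_state=42)),                 # over-sample the minority (fraud) class
        ("pca", PCA(n_components=0.95)),                   # retain ~95% of the variance
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
    pipeline.fit(X_train, y_train)
    print("Held-out accuracy:", pipeline.score(X_test, y_test))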

Environment: Machine learning, AWS, MS Azure, Cassandra, SAS, Spark, HDFS, Hive, Pig, Linux, Python, MySQL, Eclipse, PL/SQL, SQL connector, SparkML.

Confidential -Tempe, AZ

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Involved in installing Hadoop Ecosystem components.
  • Developed and ran MapReduce jobs on multi-petabyte YARN and Hadoop clusters, which process billions of events every day, to generate daily and monthly reports per users' needs.
  • Managed and reviewed Hadoop log files and was responsible for managing data coming from different sources.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Informatica BDM, T-SQL, Spark SQL, and Azure Data Lake Analytics.
  • Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
  • Participated in the development/implementation of Cloudera Hadoop environment.
  • Used Python Boto3 to configure AWS services such as Glue, EC2, and S3.
  • Developed Spark code in Python for faster processing of data in Hive.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Identified areas of improvement in existing business by unearthing insights by analyzing the vast amounts of data using machine learning techniques.
  • Connected to AWS Redshift through Tableau to extract live data for real-time analysis.
  • Responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark (see the sketch after this list).
  • Interpreted problems and provided solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
  • Designed and developed NLP models for sentiment analysis.
  • Led discussions with users to gather business process requirements and data requirements to develop a variety of Conceptual, Logical, and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, MicroStrategy.
  • Worked on machine learning on large-size data using Spark and MapReduce.
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-means clustering, Naive Bayes, and other approaches.
  • Developed UDFs in Java for Hive and Pig.
  • Data Ingestion to one or more Azure Services - (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Databricks.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
  • Stored and retrieved data from data warehouses using Amazon Redshift.
  • Worked on Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad (MLoad), TPump, FastLoad, and FastExport.
  • Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala, and Sqoop.
  • Used data warehousing concepts such as the Ralph Kimball methodology, Bill Inmon methodology, OLAP, OLTP, Star Schema, Snowflake Schema, Fact Tables, and Dimension Tables.
  • Refined time-series data and validated mathematical models using analytical tools like R and SPSS to reduce forecasting errors. Queried both managed and external tables created by Hive using Impala.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
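
A minimal PySpark sketch of the on-demand S3 table pattern referenced above; the bucket, database, table, and column names are illustrative assumptions, and a real AWS Glue job would add the GlueContext boilerplate around it.

    # Sketch of registering an on-demand table over S3 files with PySpark;
    # bucket, database, table, and column names are assumptions.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3-on-demand-table")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw CSV event files landed in S3
    events = spark.read.option("header", True).csv("s3://example-bucket/raw/events/")

    # Light cleanup before exposing the data downstream
    events = events.dropDuplicates().na.drop(subset=["event_id"])

    # Register the result as a queryable table for reporting and ad-hoc analysis
    events.createOrReplaceTempView("events_staging")
    spark.sql("CREATE TABLE IF NOT EXISTS analytics.events AS SELECT * FROM events_staging")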

Environment: Hadoop, Azure, Map Reduce, Spark, Spark MLLib, Java, Tableau, Azure DevOps SQL, Excel, VBA, SAS, Matlab, AWS, SPSS, Cassandra, Oracle, MongoDB, SQL, DB2, T-SQL, PL/SQL, XML, Tableau.

Confidential -Tampa, FL

Data Analyst/Data Scientist

Responsibilities:

  • Collected data from various data sources, including the Oracle database server and the customer support department, and integrated them into a single dataset.
  • Responsible for data identification, collection, exploration, and cleaning for modeling.
  • Worked with data preprocessing techniques such as checking whether the data is normally distributed and implementing log, Box-Cox, cube root, and square root transformations.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn to visualize the data after removing missing values and outliers so it fits the model.
  • Used Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Detected and treated outliers and missing values using boxplots and Pandas built-in functions.
  • Worked with dimensionality reduction techniques like PCA, LDA, and ICA.
  • Also used t-SNE (t-Distributed Stochastic Neighbor Embedding), and UMAP (Uniform Manifold Approximation and Projection).
  • Worked with regression models, including Random Forest Regression and Lasso Regression.
  • Worked with various classification algorithms including Naïve Bayes, Random Forest, Support Vector Machines, Logistic Regression, etc.
  • Also worked with K-Nearest Neighbors and Apriori algorithms for product recommendations including content-based filtering and collaborative filtering methods.
  • Applied Clustering algorithms such as K-means to categorize customers’ data into certain groups.
  • Involved in time series forecasting models such as VARMAX, ARIMAX, Holt-Winters, and vector autoregression.
  • Worked with content-based and collaborative filtering for recommending products to customers.
  • Used regularization techniques such as L1, L2, and Elastic Net to balance the bias-variance tradeoff (see the sketch after this list).
  • Used Pyplot, ggplot, seaborn, Matplotlib, and Plotly for visualizing the results.
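
A minimal sketch of the Elastic Net regularization mentioned above, using scikit-learn's ElasticNetCV to tune the L1/L2 mix; the data file and column names are hypothetical.

    # Illustrative Elastic Net regression balancing the bias-variance tradeoff;
    # the data file and column names are assumptions.
    import pandas as pd
    from sklearn.linear_model import ElasticNetCV
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("sales.csv")                          # hypothetical dataset
    X, y = df.drop(columns=["target"]), df["target"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # l1_ratio sweeps from mostly-L2 (ridge-like) to pure-L1 (lasso)
    model = make_pipeline(
        StandardScaler(),
        ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, random_state=42),
    )
    model.fit(X_train, y_train)
    print("R^2 on held-out data:", model.score(X_test, y_test))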

Environment: Oracle, Pandas, NumPy, SciPy, Spark 2.0, Matplotlib, Scikit-learn, SQL, Java, Cassandra, MLlib, Tableau, Maven, Git, PySpark, pyplot, ggplot, Seaborn, Plotly.

Confidential

Data Modeler/Data Analyst

Responsibilities:

  • Identified customer segments by performing K-means clustering (see the sketch after this list).
  • Collected and extracted relevant data from 20 public companies’ annual reports over 30 years under the GAAP rule.
  • Studied company’s news highlights to forecast potential financial ratio changes and presented essential findings
  • Predicted customer conversion likelihood to improve marketing efficiency and reduce marketing costs.
  • Created technical reports and visualized data in Tableau to support marketing and project activities
  • Collaborated with business leaders to analyze problems, optimize processes, and build presentation dashboards.
  • Created data connections, projects, and groups in the Tableau server.
  • Created incremental refreshes for data sources on the Tableau server.
  • Created views in Tableau Desktop that were published to the internal team for review and further data analysis and customization using filters and actions.
  • Created reports for users in Tableau by connecting to various data sources (MS SQL Server, Oracle, MS Excel, Netezza, CSV).
  • Created a heat map showing current service subscribers by color, broken into regions, allowing business users to understand where we have the most users vs. the fewest.
  • Blended data from multiple databases into one report by selecting primary keys from each database.
  • Created documentation and test cases, worked with users for new module enhancements and testing.
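
A minimal sketch of the K-means customer segmentation mentioned in the first bullet of this role; the feature columns and cluster count are illustrative assumptions.

    # Illustrative K-means customer segmentation; column names and k are assumptions.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    customers = pd.read_csv("customers.csv")               # hypothetical extract
    features = customers[["annual_revenue", "order_count", "tenure_months"]]

    # Scale features so no single metric dominates the distance calculation
    scaled = StandardScaler().fit_transform(features)

    # Fit K-means with an assumed k=4 segments
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
    customers["segment"] = kmeans.fit_predict(scaled)

    # Profile each segment for reporting (e.g., in Tableau)
    print(customers.groupby("segment").mean(numeric_only=True))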

Environment: Tableau, Python 3.2 (NumPy, Pandas, Scikit-learn, Matplotlib), SQL, Excel
