Data Scientist / Machine Learning Engineer Resume
St. Louis, MO
SUMMARY
- Data Scientist with around 6 years of experience delivering data-driven solutions, with strong knowledge of Data Analytics, Text Mining, Machine Learning (ML), Predictive Modeling, and Natural Language Processing (NLP).
- Proficient in Statistical Modeling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for Forecasting/Predictive Analytics.
- Experience using various R packages such as dplyr, ggplot2, caret, rjson, and plyr, and Python packages such as NumPy, Pandas, SciPy, scikit-learn, Beautiful Soup, NLTK, Matplotlib, and Seaborn.
- Comprehensive knowledge and experience in normalization/denormalization, data extraction, data cleansing, data wrangling, and data manipulation.
- Hands-on experience solving problems that deliver significant business value by building predictive models on structured and unstructured data.
- Experience productionizing Machine Learning pipelines on Google Cloud Platform that perform data extraction, data cleaning, and model training, and update the model based on its performance.
- Expertise in building ML models to predict failure events on store self-checkout machines and provide the root cause of those failures.
- Hands-on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive Modeling, Data Lineage, and Data Visualization across all phases of the Data Science life cycle.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables
- Built a Machine Learning model to predict hourly sales (orders, invoices, and shipments) for an e-commerce platform.
- Hands-on experience with Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, and Boosting.
- Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
- Hands-on experience creating data visualizations and dashboards in Tableau Desktop.
- Expertise in performing time series analysis; built forecasting models to predict temperature and humidity spikes inside cold storage warehouses (see the forecasting sketch after this list).
- Expertise in building monitoring dashboards that visualize the current and predicted health of the cold storage warehouses.
- Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib and Seaborn.
- Experience building data warehouses, data marts, and data cubes for creating Power BI reports that visualize key performance indicators of the business.
- Utilized Python libraries, namely Pandas, Matplotlib, and Plotly, for data analysis and visualization, and predicted unexpected reboot events on store self-checkout machines (POS systems).
- Expertise in building computer vision and deep learning applications.
- Built metrics data pipelines using Python, Telegraf, and Kafka to push virtual infrastructure performance metrics and failure events to a Wavefront tenant, and built monitoring dashboards (see the Kafka producer sketch after this list).
- Using Docker and Ansible, containerized the virtual infrastructure's configuration management tasks, which detect configuration drift and revert hosts to their original configurations.
- Expertise in containerizing applications using Docker Compose.
- Achieved Continuous Integration & Continuous Deployment (CI/CD) for applications using Concourse.
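As context for the cold storage forecasting work above, here is a minimal sketch of that kind of model, assuming hourly sensor readings in a hypothetical warehouse_sensors.csv; the column names, SARIMA orders, and spike threshold are illustrative, not the production configuration:

```python
# Minimal time-series forecasting sketch (hypothetical data and parameters).
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical extract of hourly warehouse sensor readings.
readings = pd.read_csv("warehouse_sensors.csv",
                       parse_dates=["timestamp"], index_col="timestamp")
temps = readings["temperature"].asfreq("H").interpolate()  # regularize to hourly

# Seasonal ARIMA with a 24-hour cycle to capture daily temperature swings.
model = SARIMAX(temps, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24))
fit = model.fit(disp=False)

forecast = fit.forecast(steps=24)  # next 24 hours
# Flag forecast points more than two standard deviations above the mean.
spikes = forecast[forecast > temps.mean() + 2 * temps.std()]
print(spikes)
```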
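And a sketch of the producing side of the metrics pipeline described above, assuming the kafka-python client; the broker address, topic name, and psutil-based metrics source are hypothetical stand-ins (the actual pipeline used Telegraf for collection, with Wavefront on the consuming side):

```python
# Illustrative Kafka metrics-producer sketch (hypothetical broker and topic).
import json
import time

import psutil  # stand-in source of host performance metrics
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",  # hypothetical broker address
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

while True:
    metric = {
        "ts": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=None),
        "mem_percent": psutil.virtual_memory().percent,
    }
    producer.send("vm-performance-metrics", metric)  # hypothetical topic name
    time.sleep(10)
```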
TECHNICAL SKILLS
Programming Languages: R, Python, SQL, Scala, C#
Software Methodologies: SDLC: Waterfall, Agile, Scrum
Data Science Tools: RStudio, PyCharm, Jupyter Notebook, R Shiny, Flask, Docker
Databases: MySQL, SQL Server, T-SQL, PostgreSQL, Hive
Machine Learning Algorithms: Linear Regression, Logistic Regression, SVM, Decision Tree, Random Forest, Boosting, Bagging, KNN, K-means Clustering, Naïve Bayes, GBT, LDA, QDA, PCA
Data Modeling Tools: Star Schema Modeling, Snowflake Schema Modeling, FACT and Dimension Tables, Pivot Tables
Big Data Frameworks: HDFS, MapReduce, Spark, Storm, Kafka, Scala
BI Tools: Power BI, Tableau, Crystal Reports, Azure Data Warehouse, Splunk
Cloud Technologies: AWS (EC2, S3), Microsoft Azure, OpenStack
Hands-on R Packages: tidyr, reshape2, stringr, lubridate, validate, neuralnet, caret, ctree, rpart, tseries, randomForest, forecast, quantmod, tm
Hands-on Python Packages: Pandas, NumPy, Matplotlib, SciPy, scikit-learn, BeautifulSoup, urllib2, NLTK
Pipeline Tools: Azure Data Factory, Informatica, Apache Kafka
Operating Systems: Windows, Mac OS, Ubuntu
Other Tools and Technologies: PL/SQL, ASP, Visual Basic, Django Framework, XML, C#, Java, HTML5, CSS, JavaScript, MS Office, Project Management Skills, SPSS, Minitab, AutoCAD, Factory Flow, Microsoft Project, Primavera 6, Macros, SQL, Pivot Tables, Statistical Analysis, and Database Management
PROFESSIONAL EXPERIENCE
Confidential, St Louis, MO
Data Scientist / Machine Learning Engineer
Responsibilities:
- Applied supervised Machine Learning algorithms such as Logistic Regression, Decision Tree, and Random Forest for predictive modeling across various types of problems.
- Explored and analyzed customer-specific features using Matplotlib and Seaborn in Python, and built dashboards in Tableau.
- Built an ML model to perform customer segmentation, which in turn helped increase sales through strategic email campaigns offering deals to target customers (see the segmentation sketch after this list).
- Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
- Performed K-means clustering, Regression and Decision Trees in R.
- Worked on Naïve Bayes algorithms for Agent Fraud Detection using R.
- Used packages like dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relationships between variables.
- Developed NLP models for topic extraction and sentiment analysis.
- Automated the temperature monitoring system by building a forecasting model to predict temperature and humidity spikes inside cold storage warehouses.
- Developed a pipeline using Hive (HQL) to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
- Derived data from relational databases to perform complex data manipulations and conducted extensive data checks to ensure data quality; performed data wrangling to clean, transform, and reshape the data using the NumPy and Pandas libraries.
- Worked with datasets of varying size and complexity, both structured and unstructured, and participated in all phases of the project: data collection, data cleaning, variable selection, feature engineering, model development, validation, and visualization; also performed gap analysis.
- Used Tableau Desktop for creating data visualizations.
- Wrote UNIX shell scripts and SQL scripts for development, automation of the ETL process, error handling, and auditing.
- Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Installed and configured a multi-node cluster in the cloud on AWS.
- Productionized machine learning pipelines that gather data from BigQuery and build forecasting models to predict temperature and humidity spikes inside the warehouses.
- Built monitoring dashboards that visualize the current and predicted health of the cold storage warehouses.
- Worked with Amazon SageMaker, a fully managed service that helped in preparing, building, training, and deploying high-quality machine learning models quickly by bringing together a broad set of capabilities.
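A minimal sketch of the segmentation step referenced above, assuming a hypothetical customer feature extract; the file name, RFM-style column names, and cluster count are illustrative, not the production values:

```python
# Customer segmentation sketch with k-means (hypothetical data).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customer_features.csv")  # hypothetical extract
features = customers[["recency_days", "frequency", "monetary"]]  # illustrative RFM columns

# Standardize so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(scaled)
customers["segment"] = kmeans.labels_

# Per-segment feature means, the input to the targeted email campaigns.
print(customers.groupby("segment")[features.columns].mean())
```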
Environment: Python, ML.NET, Tableau, SQL Server, Hive, AWS, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, RStudio, Spark, MapReduce, Rational Rose, SQL, MongoDB, Unix/Linux.
Confidential, Portland, OR
Data Scientist / Data Engineer
Responsibilities:
- Queried data from SQL Server, imported data in other formats, and performed data checking, cleansing, manipulation, and reporting using SAS (Base and Macro) and SQL.
- Data warehousing experience with Oracle, Redshift, Teradata, and other MPP databases.
- Extracted the data from different sources like Teradata and DB2 into HDFS and HBase using Sqoop.
- Developed Oozie workflows for daily incremental loads to pull data from Teradata and import it into Hive tables.
- Applied resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) to balance classes in large datasets (see the sketch after this list).
- Performed Ad hoc analysis and reported the data using SAS and Excel.
- Collaborated with database engineers to implement the ETL process; wrote and optimized SQL queries to extract and merge data from the SQL Server database.
- Partnered with the data science team to prepare structured and unstructured data that they can use for predictive and prescriptive modeling.
- Applied algorithms such as Linear Regression, Logistic Regression, and K-NN to predict customer churn and model customer interactions.
- Used Python to transform data from nested JSON and various other formats into usable datasets.
- Developed Spark code using Python and Spark-SQL for faster testing and processing of data in real time.
- Developed the full life cycle of a Data Lake and Data Warehouse with big data technologies like Spark and Hadoop.
- Created Kafka topics, provided ACLs to users, and set up MirrorMaker to transfer data between two Kafka clusters.
- Involved in team meetings, discussions with business teams to understand the business use cases.
- Experienced in using a Linux environment and working within a matrix organizational structure.
- Developed mappings that perform extraction, transformation, and loading of source data into the Derived Masters schema, using various PowerCenter transformations (Source Qualifier, Aggregator, Filter, Router, Sequence Generator, Lookup, Rank, Joiner, Expression, Stored Procedure, SQL, Normalizer, and Update Strategy) to implement the business logic in the mappings.
- Created and ran advanced SQL queries to pinpoint clients' data issues.
- Created an issue-tracking system using a BI dashboard.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Used Apache Falcon for mirroring HDFS and Hive data.
- Used Teradata utilities such as BTEQ, FastLoad, FastExport, and MultiLoad for data conversion.
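A minimal sketch of the SMOTE resampling step referenced above, assuming the imbalanced-learn library; the synthetic dataset here is a stand-in for the real (confidential) imbalanced data:

```python
# SMOTE resampling sketch (synthetic stand-in data).
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic, heavily imbalanced stand-in for the real training data.
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=42)
print("before:", Counter(y))

# Oversample the minority class with synthetic examples.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_res))
```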
Environment: Python, R, Git, GitHub, Analysis ToolPak, Aware framework, SQL, R Studio 3.5.1, AWS S3, NLP, EC2, Neural networks, SVM, Cassandra, Oracle, MongoDB, Flat Files, XML, and Tableau.
Confidential
Data Analyst
Responsibilities:
- Built models using Linear Regression, Logistic Regression, Decision Trees, k-means clustering, SVM, and Random Forest to analyze customer response behaviors and interaction patterns, and for sales prediction and forecasting (see the sketch after this list).
- Supported daily data updates for differentiated instruction, ensuring user input was formatted accurately prior to import into the database.
- Drilled down into submitted data to locate errors and shared findings with management so the school could be contacted for corrections.
- Reviewed and Designed Functional Requirement Specifications and Test Plans.
- Analyzed the System Specification to develop the Test Cases.
- Tested the application's web services to ensure all components worked as per the requirements.
- Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
- Performed data mapping from Access to SQL Server and vice versa.
- Managed data from the database for PowerPoint presentations.
- Developed and executed Test Cases to test data loads, verify application rules and workflows & data security.
- Worked in a Test-Driven Development environment.
- Used complex SQL queries to extract and import data for database analysis.
- Performed database testing using SQL and developed SQL queries for backend testing.
- Performed query operations on Oracle to validate the database using SQL functions and commands.
- Developed and debugged Testing Scripts for J2EE applications using JMeter.
- Identified and reported software defects and test findings using JIRA.
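A minimal sketch of one of the response models described in the first bullet of this list, assuming a hypothetical labeled extract; the file and column names are illustrative:

```python
# Customer-response classification sketch (hypothetical data).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

data = pd.read_csv("customer_responses.csv")  # hypothetical extract
X = data[["visits", "avg_order_value", "days_since_last_order"]]  # illustrative features
y = data["responded"]  # illustrative binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Precision/recall per class on the held-out split.
print(classification_report(y_test, model.predict(X_test)))
```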
Environment: Tableau, MS Access, MS Excel, MS Visio, Oracle, Informatica Power Center, Unix, QlikView, SDLC, SQL Server.