Sr. Data Scientist / Machine Learning Engineer Resume
Florida, TX
SUMMARY
- Passionate, goal-driven Sr. Data Scientist / Machine Learning Engineer with over 5 years of experience performing data science life cycle operations iteratively to solve complex problems at large scale; currently seeking a new assignment to grow as a "problem solver" and contribute to the company's success. Over 5 years of experience in Azure Machine Learning, data mining with large sets of structured and unstructured data, data acquisition, data validation, predictive modeling, data visualization, web crawling, web scraping, statistical modeling, and Natural Language Processing (NLP) using R and Python.
- Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, feature scaling, feature engineering, statistical modeling (Decision Trees, Regression Models, Neural Networks, Support Vector Machines (SVM), Clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, and testing and validation using ROC plots and K-fold cross-validation, with data visualization in Python and R.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Skilled in performing data parsing, data manipulation, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
- Experience in using various packages in Python and R like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, and Rpy2.
- Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis using R and Python.
- Extensive experience in Text Analytics, generating Data Visualization using Python and R, customizing interactive reports, creating dashboards using tools like Tableau, Power BI by producing tables, graphs, listings using various strategies.
- Utilized analytical applications like R, SPSS, Rattle, Python, and Java to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into risk management and marketing strategies that drive value.
- Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from various sources and prepared data for exploration using data munging and Teradata.
- Worked closely with customers, cross-functional teams, research scientists, software developers, and business teams in an Agile/Scrum environment to drive data model implementations and algorithms into practice.
- Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Ability to perform web search and data collection, web data mining, database extraction from websites, data entry, and data processing, with visualization in R, Power BI, and Tableau.
- Good knowledge and understanding of implementing data mining techniques like classification, clustering, regression, and random forests in Python and R programming.
- Experience on advanced SAS programming techniques, such as PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow machine learning in Python.
- Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
- Experience of building machine learning solutions using PySpark for large sets of data on Hadoop ecosystem.
- Experience in developing and designing ETL packages and reporting solutions using MS BI Suite (SSIS/SSRS) and Tableau.
- Experience in tuning algorithms using methods such as Grid Search, Randomized Search, K-Fold Cross Validation, and Error Analysis in both R and Python; a sketch of this workflow appears in the first example after this summary.
- Demonstrated ability working and adapting to Big Data tools such as Azure Databricks, Azure Blob Storage, HDFS, Pig, MapReduce, Hive, Sqoop, Flume, Apache Ambari, MLlib, Mahout, Solr, ZooKeeper, Oozie, Azure HDInsight, Spark, etc.
- Proficient in developing ETL applications on large volumes of data using different tools: MapReduce, Spark-Scala, Hive, PySpark, Azure Databricks, Azure SQL, Azure HDInsight and Pig.
- Demonstrated analytical thinking by implementing and leveraging statistical techniques such as t-tests, p-value analysis, z-score analysis, ANOVA, confidence intervals, confusion matrices, precision, recall, ROC/AUC curve analysis, etc., in both Python and R.
- Seven years of experience working on structured, semi-structured, and unstructured data in the banking, healthcare, finance, and software domains.
- Expertise in Feature Engineering methods such as Filtering (correlation), Wrapper Methods (forward, backward, stepwise selection, etc.), and Embedded Methods (feature importance of decision trees or tree ensembles); a sketch of all three appears in the second example below.
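Below is a minimal, hypothetical sketch of the tuning workflow named above (grid search with k-fold cross-validation in scikit-learn); the synthetic dataset, model choice, and parameter grid are illustrative assumptions rather than details from any specific engagement.

```python
# Hypothetical example: tuning a Random Forest with grid search + 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

# 5-fold cross-validation guards against overfitting to one train/test split.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```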
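And a second minimal sketch of the three feature-selection families listed above (filter by correlation, wrapper by forward selection, embedded by tree-ensemble importance); the data and thresholds are illustrative assumptions.

```python
# Hypothetical example: filter, wrapper, and embedded feature selection.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(15)])

# Filter method: drop one of any pair of highly correlated features.
corr = X.corr().abs()
drop = {c2 for i, c1 in enumerate(corr.columns)
        for c2 in corr.columns[i + 1:] if corr.loc[c1, c2] > 0.9}
X_filtered = X.drop(columns=list(drop))

# Embedded method: rank features by tree-ensemble importance.
rf = RandomForestClassifier(random_state=0).fit(X_filtered, y)
importances = pd.Series(rf.feature_importances_, index=X_filtered.columns)
print(importances.sort_values(ascending=False).head())

# Wrapper method: forward selection of the 5 best features by CV score.
sfs = SequentialFeatureSelector(rf, n_features_to_select=5, direction="forward")
sfs.fit(X_filtered, y)
print(list(X_filtered.columns[sfs.get_support()]))
```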
TECHNICAL SKILLS
ML Regression Algorithms: Simple Linear Regression, Multiple Linear Regression, Ridge Regression, Lasso Regression, Partial Least Squares Regression, Principal Component Regression, KNN Regression, Support Vector Machine Regressor, Decision Tree Regressor, Random Forest Regressor, Extreme Gradient Boosting Regressor, etc., on the AWS platform
ML Classification Algorithms: Logistic Regression, KNN, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Extra Trees Classifier, Support Vector Machine Classifier, Naïve Bayes Classifier, etc., on Windows and AWS
Ensemble and Stacking: Base Learning, Meta Learning, Majority Voting, Averaged Ensembles, Weighted Averaging, Stacked Ensembles, AutoML - Scikit-Learn, MLjar, etc.; a stacking sketch appears after this list
Statistical Methods / Techniques: Sampling, resampling methods, Hypothesis testing, Confidence Interval, P-value, Confusion Matrix, T-Test, ANOVA, VIF, Correlation, Feature Engineering / Feature Selection techniques, anomaly detection, outlier removal, etc.
Programming / Query Languages: R (caret, glmnet, xgboost, rpart, ggplot2, sqldf), Python (numpy, pandas, scikit-learn, seaborn, matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SQL, etc.; RStudio, Jupyter Notebooks
Big Data tools / Cloud / Visualization / Other tools: Hadoop Distributed File System (HDFS), Sqoop, MapReduce, Flume, Pig, Hive, Ambari, MapR, Hortonworks, Cloudera, Mahout, MLlib, Oozie, ZooKeeper, etc.; Google Cloud, Google Cloud Shell, Linux, PuTTY, D3.js, Bash shell, Unix, etc.; Tableau, ggplot2, matplotlib, seaborn, STATA
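As referenced in the Ensemble and Stacking entry above, here is a minimal, hypothetical sketch of the stacked-ensemble pattern in scikit-learn; the base learners, meta-learner, and synthetic data are illustrative assumptions.

```python
# Hypothetical example: stacking base learners under a logistic-regression meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, random_state=1)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=1)),
        ("svm", SVC(probability=True, random_state=1)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner trained on base predictions
    cv=5,  # out-of-fold base predictions keep the meta-learner honest
)
print(cross_val_score(stack, X, y, cv=3).mean())
```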
PROFESSIONAL EXPERIENCE
Confidential - Florida, TX
Sr. Data Scientist /Machine Learning Engineer
Responsibilities:
- This project focused on customer clustering based on an ML and statistical modeling effort, including building predictive models and generating data products to support customer classification and segmentation.
- Worked on Natural Language Processing (NLP) with the NLTK module of Python for application development for automated detection of cyber threats; see the first sketch at the end of this section.
- Worked in large-scale database environments like Hadoop and MapReduce, with working knowledge of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Conducted analysis assessing cyber threats targeting Chevron and predicted future threats with RFM analysis; applied threat segmentation with clustering algorithms such as K-Means and Hierarchical Clustering in Python and R (see the second sketch at the end of this section); used Amazon SageMaker to deploy the models and Git to track changes in deployments.
- Used the Elasticsearch engine to store, search, and analyze massive volumes of records, powered applications with complex search capabilities and requirements, and made extensive use of the AWS platform to deploy, operate, and scale Elasticsearch clusters.
- Developed an estimation model for various bundled product and service offerings to optimize and predict gross margin.
- Designed and developed analytics, machine learning models on the AWS platform, and visualizations that drove performance and provided insights, from prototyping to production deployment, product recommendation, and allocation planning; used Amazon SageMaker to deploy models and Git to track changes in deployments.
- Partnered and collaborated with the cyber threat intelligence team and cross-functional teams to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms on Windows and AWS and integrating them into production systems for different business needs.
- Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs on the AWS cloud.
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of detecting a threat, in Python, R, and Java on the AWS platform.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using Python, R, Tableau, and Power BI.
- Implemented k-means and Bayesian classification techniques, analyzed polynomial algorithms, created non-linear regression models, and defended the need for tree-based decision forest learning approaches.
- Worked with various databases like Oracle and Azure SQL and performed computations, log transformations, feature engineering, and data exploration to identify insights and conclusions from complex data using Python, R programming in RStudio, and machine learning algorithms on the AWS cloud.
- Integrated SAS datasets into Excel using Dynamic Data Exchange, using SAS to analyze data and produce statistical tables, listings, and graphs for reports.
- Implemented predictive models using machine learning algorithms such as linear regression and boosting, performed in-depth analysis of the structure of the models, compared the performance of all the models, and found tree boosting best for this prediction task.
- Performed data analysis on the datasets using PROC PRINT, PROC SORT, PROC TRANSPOSE, PROC MEANS, PROC SUMMARY, PROC TABULATE, PROC UNIVARIATE, and PROC FREQ in SAS.
- Designed dashboards with Tableau 9.2 and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders, and also did so with Power BI.
- Used and created various visualizations in reports like scatter plots, box plots, Sankey charts, bar graphs, Gantt charts, trend lines, waterfall charts, statistical models, heatmaps, and geo-maps in Python and R, allowing end users to utilize the full functionality of the dashboards.
- Integrated data from multiple sources with R and analyzed it with R libraries like ggplot2, Shiny, h2o, dplyr, reshape2, plotly, RMarkdown, knitr, ElemStatLearn, caTools, etc.
- Achieved 92% accuracy with 93% specificity, 83% sensitivity, and 84% AUC in predicting customer preferences using Artificial Neural Networks.
- Partitioned data into k folds to avoid the risk of overfitting and obtain a generalized model that is not bias-affected, which helped manage the bias-variance tradeoff.
- Used Power BI to create visualizations and reports for requested projects and collaborated with teams to integrate systems.
- Performed and implemented various sampling and resampling techniques, such as simple random sampling, stratified sampling, reservoir sampling, and random under-/oversampling, to improve the quality of statistical models.
- Designed, built, and deployed BI solutions, creating tools to store data such as OLAP cubes with the help of Power BI and Tableau.
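First sketch: a minimal, hypothetical illustration of NLTK-based text preprocessing feeding a simple classifier, in the spirit of the threat-detection work above; the sample messages, labels, and pipeline choices are illustrative assumptions.

```python
# Hypothetical example: NLTK preprocessing feeding a simple threat classifier.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    """Tokenize, lowercase, drop stopwords/non-alpha tokens, and lemmatize."""
    tokens = word_tokenize(text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens
                    if t.isalpha() and t not in stop_words)

# Toy labeled messages (1 = suspicious, 0 = benign), purely for illustration.
texts = ["Suspicious login attempts detected from unknown host",
         "Quarterly report attached for your review",
         "Malware payload blocked by endpoint agent",
         "Team lunch scheduled for Friday"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(preprocess(t) for t in texts)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform([preprocess("unknown host login blocked")])))
```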
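Second sketch: a minimal, hypothetical K-Means segmentation over RFM-style features, as referenced above; the synthetic data and the choice of three clusters are illustrative assumptions.

```python
# Hypothetical example: K-Means segmentation on recency/frequency/monetary features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Synthetic RFM table: recency (days), frequency (events), monetary (impact score).
rfm = np.column_stack([
    rng.integers(1, 365, 200),
    rng.integers(1, 50, 200),
    rng.uniform(10, 5000, 200),
])

# Standardize so no single scale dominates the distance metric.
X = StandardScaler().fit_transform(rfm)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)
for label in np.unique(kmeans.labels_):
    print(f"segment {label}: {np.sum(kmeans.labels_ == label)} records")
```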
Confidential - Houston, TX
Sr. Data Scientist / Data Engineer/Machine Learning Engineer
Responsibilities:
- Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms with Python scikit-learn; see the first sketch at the end of this section.
- Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis in Python and R.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Created multiple custom SQL queries in Teradata SQL Workbench on the AWS platform to prepare the right datasets for Tableau dashboards; queries retrieved data from multiple tables using various join conditions, enabling efficiently optimized data extracts for Tableau workbooks.
- Developed personalized product recommendations with machine learning algorithms, including collaborative filtering and Gradient Boosting Trees, to better meet the needs of existing customers and acquire new ones (see the second sketch at the end of this section); used Amazon SageMaker to deploy the models and Git to track changes in deployments.
- Designed dashboards with Tableau 9.2 and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders, and also did so with Power BI.
- Implemented a project using Python libraries such as scikit-learn, NumPy, SciPy, and UNIX tools for effective delivery of results in customer analytics, price optimization, asset utilization, etc., using artificial neural networks (for Natural Language Processing) and convolutional neural networks.
- Carried out regression analysis with R/SAS, investigated the model for problems like goodness of fit, overfitting, multicollinearity, residual normality, etc., and established a linear model for forecasting.
- Performed statistical analysis of data using SAS and SPSS. Applied descriptive and inferential methodologies to identify disease trends & warning signals and to undertake impact assessment.
- Transformed data from MS SQL Server 2008 to MS SQL Server 2012 using OLE DB connections by creating various SSIS packages.
- Created business critical KPIs using SSAS representing aggregations in several different ways - hierarchically and using custom groupings that the company will use to analyze performance.
- Used Power BI to create visualizations and reports for requested projects and collaborated with teams to integrate systems.
- Performed batch ETL and streaming on data ingested from various sources including Kafka, Flume, HDFS, and S3, and saved the transformed data into local RDBMS and NoSQL databases; used Amazon SageMaker to deploy models and Git to track changes in deployments.
- Used Spark and H2O together with the Flow UI to perform various deep learning tasks, implementing classification and regression algorithms; used Amazon SageMaker to deploy the models and Git to track changes in deployments.
- Used the Elasticsearch engine to store, search, and analyze big volumes of data, powered applications with complex search features and requirements, and used the AWS platform to deploy, operate, and scale Elasticsearch clusters.
- Extensive involvement in text analytics, producing data visualizations using Python and R, customizing interactive reports, and creating dashboards using tools like Tableau and Power BI by producing tables, graphs, and listings using various strategies.
- Used the decision tree algorithm to develop a model to optimize profit, asset utilization, etc.
- Built and trained multi-layered neural networks to implement deep learning using TensorFlow, Keras, KNIME, and Azure ML Studio (see the third sketch at the end of this section); used Amazon SageMaker to deploy the models and Git to track changes in deployments.
- Improved operational performance from 60% to 80% by using Random Forest and Gradient Boosting for feature selection with Python scikit-learn.
- Applied predictive analysis and statistical modeling techniques using Python, R, Tableau, and Spotfire to analyze customer behavior, offer customized products, and reduce delinquency and default rates, leading to a fall in default errors from 5% to 2%.
- Used Power BI for evaluating and improving existing BI systems, and developed and executed database queries and conducted analyses.
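First sketch: a minimal, hypothetical illustration of rebalancing an imbalanced fraud-style dataset with SMOTE, as described above; it assumes the imbalanced-learn package, and the synthetic data and 1% positive rate are illustrative.

```python
# Hypothetical example: rebalancing a fraud-style dataset with SMOTE.
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# ~1% positive class mimics the fraud imbalance described above.
X, y = make_classification(n_samples=5000, weights=[0.99], random_state=3)
print("before:", Counter(y))

# The pipeline ensures SMOTE runs only on training folds, never on validation data;
# class_weight="balanced" adds the cost-sensitive angle mentioned above.
pipe = Pipeline([
    ("smote", SMOTE(random_state=3)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
print("CV ROC-AUC:", cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```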
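Second sketch: a minimal, hypothetical item-based collaborative-filtering recommender, in the spirit of the personalized recommendation work above; the toy user-item matrix is an illustrative assumption.

```python
# Hypothetical example: item-based collaborative filtering via cosine similarity.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item interaction matrix (rows: users, cols: products), illustrative only.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

item_sim = cosine_similarity(ratings.T)  # item-to-item similarity matrix

def score_items(user_idx):
    """Weight each item's score by its similarity to items the user interacted with."""
    user = ratings[user_idx]
    scores = item_sim @ user
    scores[user > 0] = -np.inf  # never re-recommend items already consumed
    return scores

print("recommend item", int(np.argmax(score_items(0))), "to user 0")
```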
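Third sketch: a minimal, hypothetical multi-layered network in Keras for binary classification, matching the deep learning work above; the architecture, synthetic data, and training settings are illustrative assumptions.

```python
# Hypothetical example: a small multi-layer network in Keras for binary classification.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")  # synthetic target, illustration only

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.2),          # regularization against overfitting
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))
```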
Confidential - Lowell, AR
Machine Learning Engineer
Responsibilities:
- Analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment. Developed working documents to support findings and assign specific tasks.
- Performed data analysis and data profiling using complex SQL on AWS on various source systems, including Oracle and Teradata.
- Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software programs and tools like Perl, Toad, MS Access, Excel, and SQL.
- Wrote SQL scripts to test the mappings and developed a Traceability Matrix of business requirements mapped to test scripts to ensure any change control in requirements leads to test case updates.
- Assisted in defining business requirements for the IT team and created BRD and functional specifications documents along with mapping documents to assist the developers in their coding.
- Designed and developed database models for the operational data store, data warehouse, and federated databases to support client enterprise Information Management Strategy.
- Extensively used ETL methodology for supporting data extraction, transformations and loading processing in a complex EDW using Informatica.
- Worked on importing and cleansing of high-volume data from various sources like Teradata, Oracle, flat files, and SQL Server 2005.
- Involved in Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
- Tested the ETL process both before and after the data validation process; tested the messages published by the ETL tool and the data loaded into various databases.
- Applied logistic regression in Python (mainly on the AWS platform) and SAS to understand the relationships between different attributes of the dataset and the causal relationships between them; see the sketch at the end of this section.
- Developed Two-Stage Least Squares models to solve systems of equations for predicting the performance of the bikes as the sensor ranges changed from 40 to 100.
- Designed workbooks with bar charts, line charts, stacked bar charts, onion charts, and 3D cylinder charts using the trigonometric functions sine, cosine, and radians in Tableau calculated fields.
- Extensively used the R programming packages dbplyr, dplyr, ggplot2, stringr, lubridate, and tidyverse to solve all kinds of regression and classification problems, and optimized code snippets in machine learning algorithms on a monthly basis.
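A minimal, hypothetical sketch of logistic regression used to inspect attribute relationships, as described above; it uses statsmodels for coefficient and p-value output, and the synthetic attributes (age, balance) are illustrative assumptions.

```python
# Hypothetical example: logistic regression to inspect attribute relationships.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
age = rng.normal(40, 10, n)
balance = rng.normal(1000, 300, n)
# Synthetic outcome whose log-odds depend on both attributes (illustration only).
logit = 0.05 * (age - 40) + 0.002 * (balance - 1000)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(np.column_stack([age, balance]))
model = sm.Logit(y, X).fit(disp=False)
# Coefficients and p-values quantify each attribute's relationship with the outcome.
print(model.params)
print(model.pvalues)
```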
Confidential
Data Analyst
Responsibilities:
- Took part in the extraction, aggregation, and quality assurance of data from multiple sources in support of operational reporting, client reporting, and quantitative analyses of product utilization.
- Executed and verified report data in regard to claims (EDI), insurance, population, treatment plans and other funding/financial data.
- Used the PCE System's fully hosted "SaaS/ASP" software (DASH) for data accessibility, including care management from initial intake assessment and plan development recommendation engines.
- Defined parameters to identify multicollinearity between the interest rate and customer deposits with the help of subject matter experts on various use cases and removed collinear columns, which yielded unique records and clean data.
- Developed test cases for unit testing and prepared spreadsheets of testing criteria, including regression, positive and negative testing, process flow testing, and screenshots of test results documenting expected and actual results.
- Created and updated SQL tables, databases, stored procedures, and queries to modify and/or create reports for respective business units, and also used MongoDB to create queries.
- Created user-defined functions in Python to automate repetitive tasks and increase the efficiency of data pipeline development; see the sketch at the end of this section.
- Performed Data visualization and Designed dashboards with Tableau, and generated complex reports, including Charts, Summaries, and Graphs to communicate the findings to the team and stakeholders.
- Prepared automated code to ensure proper data collection, data access, manipulation and reporting functions with R programming.
- Proactively identified opportunities to automate time and resource intensive procedures associated with data validation and transformation using Python.
- Advised management personnel in the planning of distribution of resources for operational tasks based on analysis of historical data and trends, based on criteria including, but not limited to, geography, economic events, and other exogenous variables.
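A minimal, hypothetical sketch of a reusable Python function automating a repetitive cleaning step, in the spirit of the user-defined functions above; the function name, the claim_date column, and the sample data are illustrative assumptions.

```python
# Hypothetical example: a reusable cleaning function automating a repetitive task.
import pandas as pd

def standardize_claims(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names, trim strings, parse dates, and drop exact duplicates."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    for col in out.select_dtypes(include="object"):
        out[col] = out[col].str.strip()
    if "claim_date" in out.columns:  # assumed column name, for illustration
        out["claim_date"] = pd.to_datetime(out["claim_date"], errors="coerce")
    return out.drop_duplicates()

raw = pd.DataFrame({" Claim Date ": ["2021-01-05", "2021-01-05"],
                    "Member ID": [" A1 ", " A1 "]})
print(standardize_claims(raw))
```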
Confidential
Tableau Developer
Responsibilities:
- Worked with the Business Analyst and business users to understand the user requirements, layout, and look and feel of the application to be developed.
- Implemented data blending on databases and generated interactive dashboards with quick filters, parameters, and sets to handle views more efficiently.
- Created ad-hoc reports for users while connecting to different data sources like Excel sheets, flat files, CSV, etc.
- Wrote SQL on a day-to-day basis to test and extract test data.
- Analyzed, designed, and developed data discovery dashboards in Tableau to present the data story to business stakeholders.
- Restricted data for users using row-level security and user filters.
- Performed duties as a Data Quality Analyst, analyzing and documenting test cases.