Business Data Analyst Resume
Santa Ana, CA
SUMMARY
- Highly efficient Data Scientist/Data Engineer with 7+ years of experience in data analysis, statistical analysis, machine learning, and data mining on large structured and unstructured data sets in the manufacturing and healthcare industries.
- Developed classification models in R and Python: logistic regression (81% accuracy), SVM (76%), and random forest (69% overall accuracy).
- Spent 70% of time on exploratory data analysis and data cleaning to build visualizations using ggplot2 in R and Matplotlib and Seaborn in Python.
- 3+ years of experience as a Data Scientist in data mining, statistical data analysis, exploratory data analysis, and machine learning algorithms.
- Certified R and Python programmer, including machine learning modules.
- Hands-on experience with R packages and libraries such as ggplot2, plyr, dplyr, reshape, plotly, R Markdown, and caTools.
- Built predictive models in R using machine learning algorithms such as KNN, SVM, and logistic regression.
- Experienced with the Python Matplotlib library for generating visualizations, with hands-on experience in Amazon Web Services (AWS) cloud solutions.
- Experienced with relational database queries, supervised and unsupervised learning, and GIS data.
- Develop project outlines, design analytical approaches to answer research questions, and lead the interpretation of statistical results to draw conclusions.
- Collaborate with Talent Reporting Analysts to extract and transform data.
- Performed data cleaning by identifying missing data and outliers, and applied feature scaling and feature engineering.
- Extensive experience with Python notebooks and the Seaborn library for building visualizations.
- Experienced with BI visualization tools such as Tableau, Shiny, and QlikView.
- Developed predictive machine learning models in R and evaluated them on test data sets.
- Handled large data sets efficiently, and identified and treated missing data values.
- 3+ years of experience with ML algorithms such as random forest, SVM, logistic regression, and k-means clustering.
- Hands-on experience with statistical methods such as regression analysis, the binomial distribution, hypothesis testing, ANOVA, and chi-square tests.
- Developed interactive dashboards in Tableau and Excel for the production team, resulting in a 17% process improvement and better decision-making.
- Extracted data efficiently from the ERP system using SQL and designed custom dashboards for the production team.
- Used the Python Scikit-learn, TensorFlow, and Keras packages to train machine learning models.
- Implemented machine learning algorithms such as SVM and ANN (artificial neural networks) for building models.
- Provided support to Natural Language Processing teams and developed machine learning models to predict raw material prices.
- Developed a 12-month rolling time series forecast using machine learning algorithms, including data cleansing and scaling.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R and Python.
- Excellent understanding of machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, decision forests, natural language processing (NLP), and neural networks for deep learning.
- Extensive knowledge of the Apache Spark and Hadoop big data platforms.
- Python libraries: NumPy, Pandas, TensorFlow, Seaborn, Matplotlib, Plotly, and Scikit-learn. IDEs: Notebook and Spyder.
- R IDE: RStudio. Packages: ggplot2, missmap, caTools, e1071 for SVM, and dplyr and plyr for data manipulation.
- Performed gap analysis by conducting document analysis sessions, and led scrum ceremonies to decide project scope for agile projects.
PROFESSIONAL EXPERIENCE
Confidential, Sylmar, CA
Sr. Data Analyst/Sr. Data Scientist
Responsibilities:
- Developed machine utilization plots using the ggplot2 library in R to identify capacity usage of CNC machines.
- Performed exploratory data analysis to identify missing values and understand the structure of the data for building predictive machine learning models.
- Generated and analyzed graphs using the ggplot2 library and Tableau to provide an overview of the analytical models and results.
- Developed a Shiny (R) application showcasing machine learning algorithms for improving business forecasting.
- Developed predictive models using Support Vector Machines, Decision Trees, Random Forest, and Naïve Bayes, collaborating with the marketing and production teams.
- Applied machine learning algorithms to predict raw material price outcomes.
- Created data visualizations with Seaborn in Python to understand annual sales trend patterns.
- Loaded data into RStudio from a directory, performed initial data analysis, and built machine learning models using logistic regression with 78% overall accuracy.
- Used the k-means clustering algorithm to identify regions, with k ranging from k = 2 to k = 4, and used the caTools library to split the data into training and testing data sets.
- Performed data manipulation in Python with libraries such as NumPy, Pandas, and SciPy.
- Designed and developed dashboards/reports in Excel that enable real-time decision-making, improving visibility and optimizing production flow.
- Performed statistical analysis to build predictive models of sales revenue and production capacity, resulting in a 15% increase in productivity flow.
- Extracted data from the ERP system into .csv format and used it for initial data exploration and analysis.
- Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes' theorem, Naive Bayes, function fitting, etc. to data with the help of the scikit-learn, SciPy, NumPy, and Pandas Python modules.
- Used Python and R scripting to wrangle and aggregate large datasets of 1+ million records with inconsistent formats, using functions such as is.na() and median() and filters such as which().
- Reset data frame row indexes in R to fix misaligned data and generated qplot() charts for data visualization.
- Partnered with modelers to develop data frame requirements for projects and converted vector data into matrices using rbind() and cbind().
- Performed ad hoc reporting, customer profiling, and segmentation using R/Python.
- Analyzed large datasets to answer business questions by generating reports and predictions.
- Worked in a team of 3 programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
- Executed SQL queries from R/Python on complex table configurations.
- Retrieved data from databases using SQL according to business requirements.
- Prepared data frames using the gsub() function in R to identify missing data for production data analysis.
- Created, maintained, and optimized SQL Server databases and troubleshot server problems.
- Performed data analysis and statistical analysis, and generated reports, listings, and graphs.
- Used R and Python to assess business performance through classification, tree map, and regression models, and visualized data for interactive understanding and decision-making.
- Documented all programs and procedures to ensure an accurate historical record of work completed on assigned projects, which improved process quality and efficiency by 15%.
- Adhered to best practices for project support and documentation.
- Developed hypothesis models and validated them against the data.
- Performed data analysis using a range of analytic and modeling techniques.
Environment: MS Excel, MS Word, PL/SQL, R, Python, SAS, SQL, Hadoop, and Tableau.
Confidential, North Hollywood, CA
Data Scientist
Responsibilities:
- Developed machine learning models on training and testing data sets to predict future inventory usage of raw materials.
- Created and published multiple dashboards and reports using Tableau Server.
- Developed visualizations in Python using the Seaborn and Matplotlib libraries to show total revenue by region.
- Worked with Spark SQL queries and DataFrames: imported data from data sources, performed transformations and read/write operations, and saved the results to an output directory in HDFS.
- Identified outliers, anomalies, and trends in data sets using R and Python.
- Developed, installed, maintained, and monitored company databases in a high-performance/high-availability environment, handling configuration and performance tuning to ensure optimal resource usage.
- Documented all programs and procedures to ensure an accurate historical record of work completed on assigned projects and to improve quality and efficacy.
- Produced quality reports for management decision-making and participated in all phases of research, including data collection, data cleaning, data mining, model development, and visualization.
- Performed data imputation using the Scikit-learn package in Python.
- Performed data processing using Python libraries such as NumPy and Pandas, and performed data visualization using the ggplot2 library in R to better understand customer behavior.
Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau 14.
Confidential, Santa Ana, CA
Business Data Analyst
Responsibilities:
- Conducted business requirements gathering sessions with the client to capture and prioritize requirements.
- Created business requirements documents and functional requirements documents for the development team.
- Performed gap analysis and conducted document analysis sessions with the client before starting the project.
- Performed root cause analysis sessions using techniques such as 5 Whys and the 8D matrix, resulting in a 12% reduction in non-conformance rates.
- Implemented various EDI transactions, such as EDI 810 for invoices, EDI 855 for purchase order acknowledgments, and EDI 870 for order status reports.
- Implemented Agile methodology techniques such as Scrum and FDD (Feature-Driven Development) to build the models, and conducted scrum ceremonies with the Scrum Master.
- Accessed big data, applied predictive analytics techniques, and visualized analysis outcomes such as patterns, anomalies, and future trends using Tableau.
- Created multiple workbooks, dashboards, and charts using calculated fields, quick table calculations, custom hierarchies, sets, and parameters to meet business needs.
- Designed a technical solution roadmap to deal with noise in sales data.
- Created dashboards and reports in Excel using SQL queries.
- Developed scalable machine learning solutions within distributed computation frameworks (e.g., Hadoop, Spark, Storm).
- Implemented classification using supervised algorithms such as logistic regression, decision trees, KNN, and Naive Bayes.
Confidential
Data Analyst/Scientist
Responsibilities:
- Worked with several R packages, including dplyr, Spark, Causal Infer, and spacetime.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using R and Hadoop.
- Gathered the required data from multiple data sources and created datasets for use in analysis.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Worked with the data governance, data quality, and data lineage teams and the data architect to design various models and processes using gradient boosting.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Worked extensively with Oracle SQL, PL/SQL, SQL*Loader, and query performance tuning; created DDL scripts and database objects such as tables, views, indexes, synonyms, and sequences.
- Designed and implemented machine learning algorithms to enhance existing data mining capabilities.
- Used a variety of analytical tools and techniques (regression, logistic regression, decision trees, SVM, etc.) to carry out analysis and draw conclusions.
- Visualized, interpreted, and reported findings to develop strategic uses of data.
Environment: Unix, Python 3.5, MLLib, SAS, regression, logistic regression, Hadoop 2.7, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML.
Confidential
Data Scientist / Data Modeler
Responsibilities:
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Involved in defining the source-to-target data mappings, business rules, and data definitions. Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Responsible for defining the key identifiers for each mapping/interface.
- Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
- Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
- Involved in defining the business/transformation rules applied for sales and service data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Worked with internal architects, assisting in the development of current and target state data architectures.
