Business Data Analyst Resume Santa Ana CA - Hire IT People

SUMMARY

Highly efficient Data Scientist/Data Engineer with 7+ years of experience in data analysis, statistical analysis, machine learning, and data mining on large structured and unstructured data sets in the manufacturing and healthcare industries.
Developed classification models in R and Python: logistic regression (81% accuracy), SVM (76%), and Random Forest (69% overall accuracy).
Spent 70% of time on exploratory data analysis and cleaning in order to build visualizations using ggplot2 in R and the Matplotlib and Seaborn libraries in Python.
3+ years of extensive experience as a Data Scientist, with experience in data mining, statistical data analysis, exploratory data analysis, and machine learning algorithms.
Certified R and Python programmer with Machine learning Modules.
Hands-on experience with R packages and libraries such as ggplot2, plyr, dplyr, reshape, plotly, R Markdown, and caTools.
Built predictive models in R using machine learning algorithms such as KNN, SVM, and logistic regression.
Experienced with the Python Matplotlib library for generating visualizations, with hands-on experience in Amazon Web Services (AWS) cloud solutions.
Experienced with relational database queries, supervised and unsupervised learning, and GIS.
Develop project outlines and design analytical approaches to answer research questions; lead the interpretation of statistical results to draw conclusions.
Collaborate with Talent Reporting Analysts to extract and transform the data.
Performed data cleaning by identifying missing data and outliers, and by applying feature scaling and feature engineering.
Extensive experience with Python Notebook and the Seaborn library for building visualizations.
Experienced working on BI visualization tools such as Tableau, Shiny & QlikView.
Developed Predictive Machine learning models in R on testing data sets.
Efficiently handled large data sets, and successfully identified and handled missing data values.
3+ years of experience working with ML algorithms such as Random Forest, SVM, logistic regression, and K-means clustering.
Hands-on experience with statistics, including regression analysis, binomial distribution, hypothesis testing, ANOVA, and chi-square tests.
Developed interactive dashboards in Tableau and Excel for the production team, resulting in a 17% process improvement and better decision-making.
Efficiently extracted data from the ERP system using SQL and designed custom dashboards for the production team.
Used the Python Scikit-learn, TensorFlow, and Keras packages to train machine-learning models.
Implemented machine learning algorithms such as SVM and ANN (artificial neural networks) for building models.
Successfully provided support for Natural Language Processing teams and developed machine learning models to predict raw material prices.
Developed a 12-month rolling time series forecast using machine-learning algorithms, after cleansing and scaling the data.
Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R and Python.
Excellent understanding of machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, Decision Forests, and natural language processing (NLP), as well as neural networks as part of deep learning.
Extensive knowledge of the Apache Spark and Hadoop big data platforms.
Python libraries: NumPy, Pandas, TensorFlow, Seaborn, Matplotlib, Plotly, and Scikit-learn. IDE: Notebook and Spyder.
R: RStudio IDE; ggplot2, missmap, caTools, and e1071 for SVM; dplyr and plyr for data manipulation.
Performed gap analysis by conducting document analysis sessions and led scrum ceremonies to decide project scope for agile projects.

PROFESSIONAL EXPERIENCE

Confidential, Sylmar CA

Sr. Data Analyst/Sr. Data Scientist

Responsibilities:

Developed machine utilization plots using the ggplot2 library in R to identify capacity usage for CNC machines.
Performed exploratory data analysis to identify missing values and the structure of the data before building predictive machine-learning models.
Generated and analyzed graphs using the ggplot2 library and Tableau for an overview of the analytical models and results.
Developed an R Shiny application showcasing machine-learning algorithms for improving business forecasting.
Developed predictive models using Support Vector Machines, Decision Tree, Random Forest, and Naïve Bayes, collaborating with the marketing and production teams.
Successfully applied machine-learning algorithms to predict raw material prices.
Created data visualizations with Seaborn in Python to understand annual sales trends.
Loaded data into RStudio from a directory, performed initial data analysis, and built machine learning models using logistic regression with 78% overall accuracy.
Used the K-means clustering algorithm to identify regions, with k ranging from 2 to 4. Used the caTools library to split the data into training and testing sets.
Performed data manipulation in Python with libraries such as NumPy, Pandas, and SciPy.
Designed and developed dashboards and reports in Excel that enabled real-time decision making, improving visibility and optimizing production flow.
Performed statistical analysis to build predictive models measuring sales revenue and production capacity, which increased productivity flow by 15%.
Extracted data from the ERP system into .csv format for initial data exploration and analysis.
Applied linear regression, multiple regression, ordinary least squares, mean-variance, the law of large numbers, logistic regression, dummy variables, residuals, Poisson distribution, Bayes, Naive Bayes, function fitting, etc. to data with the help of the Scikit-learn, SciPy, NumPy, and Pandas modules of Python.
Used Python and R scripting to wrangle and aggregate large datasets of 1+ million records with inconsistent formats, using functions such as is.na and median and filters such as which().
Reset data frame indexes in R for misaligned data and generated qplot() plots for data visualization.
Partnered with modelers to develop data frame requirements for projects, converting vector data into matrices using the rbind() and cbind() functions.
Performed ad hoc reporting, customer profiling, and segmentation using R/Python.
Analyzed large datasets to answer business questions by generating reports and predictions.
Worked in a team of 3 programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
Executed SQL queries from R/Python on complex table configurations.
Retrieved data from the database through SQL as per business requirements.
Prepared data frames using the gsub() function in R to identify missing data for production data analysis.
Created, maintained, and optimized SQL Server databases and troubleshot server problems.
Performed data analysis and statistical analysis, and generated reports, listings, and graphs.
Used R and Python to assess business performance via classification, tree map, and regression models, along with visualizing data for interactive understanding and decision-making.
Documented all programs and procedures to ensure an accurate historical record of work completed on an assigned project, which improved the quality and efficiency of the process by 15%.
Adhered to best practices for project support and documentation.
Developed hypothesis models and validated them against the data.
Involved in data analysis using different analytic and modeling techniques.
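
As an illustration only, the missing-value handling listed in the responsibilities above (is.na, median, and which() in R) can be sketched in plain Python. The function names and the sample values are hypothetical and are not taken from the actual project; missing entries are assumed to be represented as None.

```python
from statistics import median


def missing_positions(values):
    """Return the indexes of missing entries (a rough analogue of which(is.na(x)) in R)."""
    return [i for i, v in enumerate(values) if v is None]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample column with two missing readings.
cycle_times = [12.0, None, 15.0, 11.0, None, 14.0]
print(missing_positions(cycle_times))
print(impute_median(cycle_times))
```

Median imputation is used here because the median is less sensitive to outliers than the mean; other strategies may suit other data.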

Environment: MS Excel, PL/SQL, R, Python, SAS, SQL, MS Word, Hadoop, and Tableau.

Confidential, North Hollywood CA

Data Scientist

Responsibilities:

Developed machine learning models on training and testing data sets to predict future inventory usage of raw material.
Created and published multiple dashboards and reports using Tableau Server.
Developed visualizations in Python using the Seaborn and Matplotlib libraries for total revenue by region.
Worked with Spark SQL queries and DataFrames: imported data from data sources, performed transformations and read/write operations, and saved the results to an output directory in HDFS.
Identified outliers, anomalies, and trends in data sets using R and Python.
Developed, installed, maintained, and monitored company databases in a high-performance, high-availability environment, with supported configuration and performance tuning to ensure optimal resource usage.
Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project and to improve quality and efficacy.
Produced quality reports for management decision-making and participated in all phases of research, including data collection, data cleaning, data mining, model development, and visualization.
Performed data imputation using the Scikit-learn package in Python.
Performed data processing using Python libraries such as NumPy and Pandas, and data visualization using the ggplot2 library in R, to better understand customer behavior.

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau 14.

Confidential, Santa Ana CA

Business Data Analyst

Responsibilities:

Conducted business requirement gathering sessions with client to capture and prioritize requirements.
Created Business Requirement documents and Functional requirement documents for development team.
Performed gap analysis and conducted document analysis sessions with the client before starting the project.
Conducted root cause analysis sessions with techniques such as 5 Whys and the 8D matrix, resulting in a 12% reduction in non-conformance rates.
Implemented various EDI codes, such as EDI 810 for invoices, EDI 855 for purchase order acknowledgments, and EDI 870 for order status reports.
Implemented Agile methodology techniques such as Scrum and FDD to build the models, and conducted scrum ceremonies with the scrum master.
Accessed big data, applied predictive analytic techniques, and visualized analysis outcomes such as patterns, anomalies, and future trends using Tableau.
Created multiple workbooks, dashboards, and charts using calculated fields, quick table calculations, custom hierarchies, sets, and parameters to meet business needs.
Designed a technical solution roadmap to deal with noise in sales data.
Created dashboards and reports in Excel using SQL queries.
Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).
Implemented classification using supervised algorithms such as logistic regression, decision trees, KNN, and Naive Bayes.

Confidential

Data Analyst/Scientist

Responsibilities:

Worked with several R packages including dplyr, Spark, Causal Infer, spacetime.
Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R and Hadoop.
Gathered the required data from multiple data sources and created datasets for use in analysis.
Performed exploratory data analysis and data visualization using R and Tableau.
Worked with data governance, data quality, data lineage, and data architects to design various models and processes using data gradient and boosting.
Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
Designed data models and data flow diagrams using Erwin and MS Visio.
Worked extensively in Oracle SQL, PL/SQL, SQL*Loader, and query performance tuning; created DDL scripts and database objects such as tables, views, indexes, synonyms, and sequences.
Designed and implemented machine learning algorithms to enhance existing data mining capabilities.
Used a variety of analytical tools and techniques (regression, logistic regression, decision trees, SVM, etc.) to carry out analysis and derive conclusions.
Visualized, interpreted, and reported findings to develop strategic uses of data.

Environment: Unix, Python 3.5, MLLib, SAS, regression, logistic regression, Hadoop 2.7, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML.

Confidential

Data Scientist/Data Modeler

Responsibilities:

Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
Involved in defining the source-to-target data mappings, business rules, and data definitions. Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
Responsible for defining the key identifiers for each mapping/interface.
Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
Involved in defining the business/transformation rules applied for sales and service data.
Defined the list codes and code conversions between the source systems and the data mart.
Worked with internal architects, assisting in the development of current and target state data architectures.
