Business Data Analyst Resume

Santa Ana, CA

SUMMARY

  • Highly efficient Data Scientist/Data Engineer with 7+ years of experience in areas including data analysis, statistical analysis, machine learning, and data mining with large sets of structured and unstructured data in the manufacturing and healthcare industries.
  • Developed classification models in R and Python: logistic regression (81% accuracy), SVM (76%), Random Forest (69% overall accuracy).
  • Spent 70% of time on exploratory data analysis and cleaning in order to build visualizations using ggplot2 in R and Matplotlib and Seaborn in Python.
  • 3+ years of extensive experience as a Data Scientist with experience in data mining, statistical data analysis, exploratory data analysis and machine learning algorithms.
  • Certified R and Python programmer with Machine learning Modules.
  • Hands-on experience with R packages and libraries such as ggplot2, plyr, dplyr, reshape, plotly, R Markdown, caTools, etc.
  • Built predictive models in R using machine learning algorithms such as KNN, SVM and logistic regression.
  • Experienced with the Python Matplotlib library for generating visualizations, with hands-on experience with Amazon Web Services (AWS) cloud solutions.
  • Experienced with relational database queries, supervised and unsupervised learning, and GIS.
  • Develop project outlines and design analytical approaches to answer research questions; lead the interpretation of statistical results to draw conclusions.
  • Collaborate with Talent Reporting Analysts to extract and transform the data.
  • Performed data cleaning by identifying missing data and outliers, and applied feature scaling and feature engineering.
  • Extensive experience with Python notebooks and the Seaborn library to build visualizations.
  • Experienced working on BI visualization tools such as Tableau, Shiny & QlikView.
  • Developed Predictive Machine learning models in R on testing data sets.
  • Efficiently handled large data sets, and successfully identified and handled missing data values.
  • 3+ years of experience working on ML algorithms such as Random Forest, SVM, logistic regression and K-means clustering.
  • Hands-on experience with statistical methods such as regression analysis, binomial distribution, hypothesis testing, ANOVA and chi-square tests.
  • Developed interactive dashboards in Tableau and Excel for the production team, resulting in a 17% process improvement and better decision-making.
  • Efficiently extracted data from the ERP system using SQL and designed custom dashboards for the production team.
  • Used the Python scikit-learn, TensorFlow and Keras packages to train machine learning models.
  • Implemented machine learning algorithms such as SVM and ANN (artificial neural networks) for building models.
  • Successfully provided support for Natural Language Processing teams and developed Machine Learning models to predict raw material prices.
  • Developed a 12-month rolling time series forecast using machine learning algorithms, with cleansing and scaling of the data.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python.
  • Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, natural language processing (NLP) as well as neural networks as a part of deep learning.
  • Extensive knowledge of the Apache Spark and Hadoop big data platforms.
  • Python libraries: NumPy, Pandas, TensorFlow, Seaborn, Matplotlib, Plotly and scikit-learn. IDE: Notebook and Spyder.
  • R: RStudio (IDE); ggplot2, missmap, caTools, e1071 for SVM; dplyr and plyr for data manipulation.
  • Performed gap analysis by conducting document analysis sessions and led Scrum ceremonies to decide project scope for agile projects.

PROFESSIONAL EXPERIENCE

Confidential, Sylmar, CA

Sr. Data Analyst/Sr. Data Scientist

Responsibilities:

  • Developed machine utilization plots using the ggplot2 library in R to identify capacity usage for CNC machines.
  • Performed exploratory data analysis to identify missing values and the structure of the data in order to build predictive machine learning models.
  • Generated and analyzed graphs using ggplot2 library and Tableau for an overview of the analytical models and results.
  • Developed an R Shiny application showcasing machine learning algorithms for improving business forecasting.
  • Developed predictive models using Support Vector Machines, Decision Tree, Random Forest and Naïve Bayes, collaborating with marketing and production teams.
  • Successfully applied machine learning algorithms to predict raw material prices.
  • Created data visualizations with Seaborn in Python to understand annual sales trends.
  • Loaded data into RStudio from a directory, performed initial data analysis, and built machine learning models using logistic regression with 78% overall accuracy.
  • Used the K-means clustering algorithm to identify regions, with k ranging from 2 to 4. Used the caTools library to split the data into training and testing sets.
  • Performed data manipulation in Python with libraries such as NumPy, Pandas and SciPy.
  • Designed and developed dashboards and reports in Excel that enabled real-time decision-making, improved visibility and optimized production flow.
  • Performed statistical analysis to build predictive models to measure sales revenue and production capacity, resulting in a 15% increase in productivity.
  • Successfully extracted data from the ERP system into .csv format and used it for initial data exploration and analysis.
  • Applied linear regression, multiple regression, ordinary least squares, mean-variance, the law of large numbers, logistic regression, dummy variables, residuals, Poisson distribution, Bayes, Naive Bayes, function fitting, etc. to data with the help of the scikit-learn, SciPy, NumPy and Pandas modules of Python.
  • Used Python and R scripting to wrangle and aggregate large datasets consisting of 1+ million records in inconsistent formats, using functions such as is.na and median and filters like which().
  • Reset data frame indexes in R for misaligned data and generated plots with qplot() for data visualization.
  • Partnered with modelers to develop data frame requirements for projects and converted vector data into matrices using the rbind() and cbind() functions.
  • Performed ad hoc reporting, customer profiling and segmentation using R/Python.
  • Analyzed large datasets to answer business questions by generating reports and predictions.
  • Worked in a team of 3 programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
  • Executed SQL queries from R/Python on complex table configurations.
  • Retrieved data from databases through SQL as per business requirements.
  • Prepared data frames using the gsub() function in R to identify missing data for production data analysis.
  • Created, maintained and optimized SQL Server databases and troubleshot server problems.
  • Performed data analysis and statistical analysis, and generated reports, listings, and graphs.
  • Used R and Python to assess business performance via classification, tree map, and regression models, along with data visualizations for interactive understanding and decision-making.
  • Documented all programs and procedures to ensure an accurate historical record of work completed on an assigned project, which improved quality and efficiency of process by 15%.
  • Adhered to best practices for project support and documentation.
  • Developed hypothesis models and validated them against the data.
  • Involved in data analysis using different analytic and modeling techniques.

Environment: MS Excel, PL/SQL, R, Python, SAS, SQL, MS Word, Hadoop, and Tableau.

Confidential, North Hollywood, CA

Data Scientist

Responsibilities:

  • Developed Machine learning models on training and testing data sets to predict future inventory usage of the raw material.
  • Created and published multiple dashboards and reports using Tableau Server.
  • Developed visualizations in Python using the Seaborn and Matplotlib libraries for total revenue by region.
  • Worked on Spark SQL queries and DataFrames: imported data from data sources, performed transformations and read/write operations, and saved the results to an output directory in HDFS.
  • Successfully identified outliers, anomalies and trends in data sets using R and Python.
  • Developed, installed, maintained and monitored company databases in a high-performance, high-availability environment, with supported configuration and performance tuning to ensure optimal resource usage.
  • Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project, as well as to improve quality and efficiency.
  • Produced quality reports for management decision-making and participated in all phases of research, including data collection, data cleaning, data mining, and developing models and visualizations.
  • Performed data imputation using the scikit-learn package in Python.
  • Performed data processing using Python libraries such as NumPy and Pandas and performed data visualization using the ggplot2 library in R to better understand customers' behaviors.

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau 14.

Confidential, Santa Ana, CA

Business Data Analyst

Responsibilities:

  • Conducted business requirement gathering sessions with client to capture and prioritize requirements.
  • Created Business Requirement documents and Functional requirement documents for development team.
  • Performed gap analysis and conducted document analysis sessions with the client before starting the project.
  • Performed root cause analysis sessions with techniques such as 5 Whys and 8D matrix, resulting in a 12% reduction in non-conformance rates.
  • Implemented various EDI codes such as EDI 810 for invoices, EDI 855 for purchase order acknowledgments and EDI 870 for order status reports.
  • Implemented Agile methodology techniques such as Scrum and FDD to build the models; conducted Scrum ceremonies with the Scrum master.
  • Accessed big data, applied predictive analytic techniques, and visualized analysis outcomes such as patterns, anomalies and future trends using Tableau.
  • Created multiple workbooks, dashboards, and charts using calculated fields, quick table calculations, custom hierarchies, sets and parameters to meet business needs.
  • Designed a technical solution roadmap to deal with noise in sales data.
  • Created dashboards and reports in Excel using SQL queries.
  • Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.

Confidential

Data Analyst/Scientist

Responsibilities:

  • Worked with several R packages including dplyr, Spark, Causal Infer, spacetime.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R and Hadoop.
  • Gathered required data from multiple data sources and created datasets for use in analysis.
  • Performed exploratory data analysis and data visualization using R and Tableau.
  • Worked with data governance, data quality, data lineage and data architects to design various models and processes using gradient boosting.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Worked extensively in Oracle SQL, PL/SQL, SQL*Loader and query performance tuning; created DDL scripts and database objects such as tables, views, indexes, synonyms and sequences.
  • Designed and implemented machine learning algorithms to enhance existing data mining capabilities.
  • Used variety of analytical tools and techniques (regression, logistic, decision trees, SVM etc.) to carry out analysis and derive conclusions.
  • Visualized, interpreted and reported findings to develop strategic uses of data.

Environment: Unix, Python 3.5, MLLib, SAS, regression, logistic regression, Hadoop 2.7, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML.

Confidential

Data Scientist / Data Modeler

Responsibilities:

  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source-to-target data mappings, business rules and data definitions. Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
