Digital Product Data Analyst/Scientist Resume
Minneapolis, MN
SUMMARY
- Over 9 years of experience in the IT industry with a wide range of skill sets in Data Science, Statistics, Data Analysis, supervised and unsupervised Machine Learning, Data Visualization, and reporting using DOMO, Adobe Analytics, Tableau, Power BI, SQL, Python, and R.
- Proficient in managing the entire data science and analytics project life cycle and actively involved in all phases, including requirement gathering, data collection, data cleaning, data engineering, data analysis, EDA, feature scaling, feature engineering, statistical modeling, model deployment, model validation, and making recommendations to the business.
- Experienced with Data Analytics, Data Reporting, ad-hoc reporting, and developing graphs, scales, and pivot tables using tools like Domo, Power BI, Tableau, Excel, Python, and R.
- Worked on developing complex KPI scorecards, heat maps, tree views, circle views, histogram visualizations, and interactive reports and dashboards to find trends in data using tools like Domo, Adobe Analytics, Kibana, Power BI, and Tableau.
- Worked with packages like Pandas, NumPy, Matplotlib, Seaborn, pyodbc, SharePlum (SharePoint), and scikit-learn in Python, and dplyr, ggplot2, tidyverse, DBI, odbc, and shiny in R for data processing, EDA, and developing Machine Learning models.
- Experienced in analyzing requirements and evaluating technologies for data science and analytics capabilities, including data collection, data analysis, EDA, reporting, Machine Learning, predictive modeling, statistical analysis, forecasting, time series analysis, propensity-of-delay modeling, and hypothesis testing.
- Experienced in designing and reverse engineering enterprise systems using multi-dimensional modeling, metadata modeling, and data modeling (ERD, Star Schema, Snowflake Schema).
- Experienced in resolving over-fitting by reducing the loss function either with regularization (LASSO, Ridge, and Elastic Net) or by performing cross-validation (see the illustrative sketch following this summary).
- Experienced in using various databases like SQL Server, Hive, Oracle, Kusto, and PostgreSQL, and NoSQL databases like MongoDB and Cosmos.
- Experienced in using Hadoop, HBase, Spark, Scala, and Hive for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Experience in Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce programming paradigm.
- Experienced in working with financial data such as payment methods, the Confidential wallet, and buy-now-pay-later programs such as Affirm.
- Experienced with Azure, Google Cloud (GCP), and Adobe Experience Cloud environments.
- Experienced in developing Machine Learning models in development environments like Azure Databricks, Jupyter notebooks, Zeppelin notebooks, and RStudio.
- Hands-on experience with file formats like AVRO, sequence files, Parquet, ORC, and RC.
- Worked in Data Models, Database Design Development, and Data Catalog.
- Good Knowledge of Hadoop Cluster architecture and working with Hadoop clusters using Cloudera (CDH5) and Hortonworks Distributions.
- Experienced in developing data pipelines and scheduling tasks using Python, Oozie, Airflow, and Azure Databricks.
- Experienced in problem-solving using hypothesis statements and techniques like hypothesis testing and A/B testing.
- Experienced in an onsite-offshore model of working in a SCRUM team environment, sharing work and tasks daily.
- Experienced in working with web analytics data such as tagging, cookies, variables, and data layers for both web and app.
- Experienced in working with users' PII data in adherence to privacy laws such as CCPA and CPA.
- Adept in Statistical Modeling, Multivariate Analysis, model testing, problem analysis, model comparison, and validation, with a deep understanding of each.
- Experienced in working on Snowflake Warehouse, Databases, Tables, Views, Internal and external stages, Resource monitor, Network policies, Result cache, and Disk cache.
- Experienced in storytelling by building a narrative around machine learning model results with the help of reports and dashboards.
- Experienced in writing custom measures and calculated columns using DAX in Power BI to satisfy business needs.
- Experienced in attending daily stand-up meetings, sprint planning, sprint backlog grooming, and sprint demos to plan, collaborate and deliver the work tasks.
- Experience with developing APIs and custom ETL scripts using Python.
- Experienced in the whole supply chain process from order placement to order delivery.
- Experienced in data imputation with the help of machine learning algorithms like KNNImpute, rfImpute, MICE, autoimpute, and impyute.
- Connected Python with Spark/Scala/Hive to ingest and roll up datasets and perform data analytics.
- Solid knowledge of deep learning methodologies like neural networks, RNNs, CNNs, and LSTMs using Keras and TensorFlow.
- Solid knowledge of Adobe Analytics and Google Analytics.
- Worked on gathering Business requirements from the Business users and converting them into Business requirement documents and Functional documents to build the reporting tools for the end users.
- Solid knowledge of mathematics and statistics, with experience applying regularization and ensemble methods (bagging, boosting) in technical and research fields.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing, and GAP Analysis as per Cycle in both Waterfall and Agile methodologies.
- Good industry knowledge, analytical and problem-solving skills, and the ability to work well within a team as well as individually.
- Highly creative, innovative, committed, intellectually curious, and business savvy with good communication and interpersonal skills.
- Worked in Unix as well as Windows environments. Participated in collaborative workshops like JAD sessions and walkthroughs involving executives, developers, end users, and stakeholders.
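Illustrative sketch (hypothetical, not taken from a specific project above): a minimal example of the regularization and cross-validation approach to over-fitting referenced in this summary, assuming scikit-learn and a synthetic dataset in place of real project data.

```python
# Minimal sketch: comparing L1 / L2 / Elastic Net regularization with
# 5-fold cross-validation to control over-fitting. The synthetic data is
# a placeholder for a prepared feature matrix and target.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=40, noise=10.0, random_state=42)

candidates = {
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

for name, model in candidates.items():
    # Scale features, then score with cross-validation (R^2 by default).
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, X, y, cv=5)
    print(f"{name}: mean R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Cross-validated scores make it easy to compare how each penalty controls over-fitting before settling on a final model.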
TECHNICAL SKILLS
Technology Stack: Python, R programming, MySQL, PostgreSQL, Tableau, Power BI, Advanced Excel, Adobe Analytics, Kibana, Lighthouse.
Spark Components: Apache Spark, DataFrames, Spark SQL
Programming Languages: Python, SQL, Bash, PySpark
Methodologies: Agile, RAD, V-Model, Waterfall, Scrum, Kanban
Databases: Oracle, MySQL, HBase, MS SQL Server, MongoDB, Teradata, Snowflake, Google BigQuery, Hive, Cosmos, Kusto, DAX
IDEs: Visual Studio
Cloud technologies: Azure, GCP
Operating Systems: Windows, Ubuntu, Red Hat Linux, CentOS.
BI Tools: Domo, Power BI, Tableau, Excel, Kibana, Adobe Analytics
Data Analytics Tools: Python, R programming, Azure Databricks
Packages: Pandas, NumPy, SciPy, XGBoost, OpenCV, scikit-image, Matplotlib, scikit-learn, Seaborn, Beautiful Soup, Gurobi, Plotly, Bokeh, ggplot2, tidyverse, skimr, dlookr.
Workflow: Apache Airflow, Oozie
ERP: Oracle EBS
Machine Learning: Linear Regression, Logistic Regression, K-Nearest Neighbors, Random Forest
Statistics: Predictive Modeling, Pandas, Scikit-learn, Matplotlib, Hypothesis Testing, Web Scraping, Linear Modeling, Descriptive Modeling
PROFESSIONAL EXPERIENCE
Digital Product Data Analyst/Scientist
Confidential, Minneapolis, MN
Responsibilities:
- Responsible for testing business strategies in stores and digital channels driven by big data.
- Worked on understanding and applying best practices for requirement gathering, data wrangling, analysis, EDA, and visualization by sharing well-documented solutions with clients and peers while upskilling on new and emerging technologies with a mindset for continuous improvement.
- Worked on understanding the customer by building a deep understanding of the domain, the processes they own, the tools they use, and their business goals, and contributed to measuring key business outcomes.
- Worked on creating A/B test plans and implementations and collaborated with product teams to assist with data collection, reporting, and A/B test development; partnered with the product owner to define the null and alternate hypotheses of an A/B test, calculate sample size, and determine audience creation, traffic allocation, and test duration.
- Worked on running statistical analysis on A/B tests to determine whether there is a positive, negative, or no impact on business metrics between the control and treatment groups by calculating the p-value and r-squared value (see the illustrative sketch following this list).
- Worked on the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Spark for managing data processing and storage for big data applications running on clustered systems.
- Worked on performing regular and ad-hoc analysis of data by identifying proper analytic and visualization methodology to optimize response accuracy and prioritize identified test improvements.
- Worked on gathering, understanding, and documenting business objectives, specifications, and requirements of multiple products in the product group.
- Worked on migrating data from Teradata to Google BigQuery (GCP).
- Worked on identifying new data to be collected for measuring the success of new products and features, defining KPIs, and working with engineering on tracking and instrumentation of data.
- Worked on defining and articulating business rules required for data accuracy and consistency and worked on defining and resolving data quality issues.
- Worked on supporting and developing a variety of analytics and Data Science initiatives including metric identification, requirement gathering, reason coding, hypothesis testing, time series analysis, cluster analysis, root cause analysis, scale of impact quantification, predictive analysis, business process improvement, or scenario analysis.
- Worked on understanding the project Scope with business and documenting what is “In Scope” and what is “NOT in Scope” to address the exact customer needs and opportunities.
- Worked on learning and applying best practices for data wrangling, analysis, compelling visualization, descriptive statistics, and intuitive story-telling using the data.
- Worked on designing and building solutions to deliver pipeline patterns using Google Cloud Platform (GCP) services such as BigQuery and Dataflow.
- Worked on learning specific business domains and understanding business metrics/processes/decisions and applying data analysis best practices to the context.
- Contributed during meetings with task progress updates, KPI updates, and active learnings from the agile team regularly.
- Maintained multiple reports and dashboards in Domo.
- Worked on multiple Magic transformations in DOMO, such as Magic ETL, SQL Dataflows, and Data Blends, and created Beast Mode segments.
- Worked on creating pipelines to upload data to DOMO from Hive using the DOMO API and created DOMO alerts to identify abnormalities in the data.
- Worked on defining acceptance criteria through use cases, user stories, and defect logs.
- Worked on the Global supply chain and Logistics process including Inventory Management, Process Engineering, Supply chain process, and operations.
- Worked on driving execution toward outcomes to meet timelines: understanding business interdependencies, surfacing obstacles, using independent judgment and decision-making, leveraging the team to deliver per product scope, and providing input to establish product/project timelines.
- Worked on measuring how successfully dashboards, analyses, and insights are adopted into decision-making and created feedback loops to enhance analysis.
- Worked on scheduling and workflow design tools like Oozie, Control-M, and Airflow.
- Worked on designing and maintaining Oozie workflows to manage jobs at the cluster level.
- Worked on Collaborating with multi-functional stakeholders to determine the best approach to test (POC/pilot) and implement solutions that align with business strategy.
- Worked on conducting general data management tasks of investigating data anomalies and reporting accuracies in addition to filtering and cleaning data.
- Worked on conducting data profiling, analysis, and data visualization to identify data redundancies and measure and maintain the efficiency of product data.
- Worked on supporting the problem statement, approach, metrics, and measurement strategy, including success criteria, to determine and improve the value of the work through an understanding of the business, business process, data ecosystem, and rapid data mining.
- Worked on educating stakeholders on the application of Data Science best practices by enabling self-service, advocating for how data is consumed, interpreted, and turned into action through compelling visualizations and A/B testing.
- Worked on supporting and improving data-driven decision-making by stakeholders through descriptive statistics and in-depth analysis to turn complex or disparate data into clear stories, helping to guide decisions and identify opportunities for deeper analysis with increased mathematical rigor.
- Worked on pulling the right information from the Enterprise Big Data Warehouse containing massive amounts of structured and unstructured data. Designed complex SQL queries in Teradata, Oracle, Google BigQuery, and Hive to extract data from multiple sources, transforming the data according to key business rules and loading it into HDFS servers.
- Worked with POs to proactively assess internal and external forces and factors impacting Confidential and the retail industry to identify trends, risks, and opportunities for the business.
- Worked on analyzing requirements/user stories at business meetings and strategizing the impact of requirements on different platforms/applications. Understanding the current production state of an application and determining the impact and risks of new implementation on existing business processes.
- Worked on identifying and seeking key stakeholders across the enterprise to support the identification, assessment, aggregation, and overall management of risks and controls.
- Worked on tracking and reporting web analytics data to report KPIs such as user engagement, page views, average time spent, click-through rate, conversion rate, and cart abandonment rate for both web and app using Adobe Analytics and Lighthouse.
- Worked with financial data such as sales, payment methods, wallet, and buy-now-pay-later features such as Affirm.
- Worked with users' PII data in adherence to privacy laws such as CCPA and CPA.
- Worked on different phases of reporting development like requirement gathering, data collection, data cleaning, data preprocessing, and analysis for developing the reports, dashboards, visualizations, data shepherding, data standardization, data gap analysis, and data validation for business users.
- Worked on creating visualizations in Adobe Analytics, DOMO, Tableau, and Kibana to understand current and past business trends and forecast future trends.
- Worked on creating pivot tables, VLOOKUPs, custom filters, calculations, year-over-year trend lines, pie charts, histograms, line charts, scatter plots, etc. to create visually impactful dashboards.
- Worked on learning and adopting a continuous improvement mindset, including personal learning through interactions with platform teams to understand new capabilities, utilizing self-paced learning, and attending training opportunities of BI tools and technologies (e.g., Greenfield, APIs) and descriptive and inferential analytics.
- Ensured all projects have proper documentation considering potential regulatory, legal, or business concerns.
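Illustrative sketch: a minimal example of the A/B test evaluation described above, using a two-proportion z-test from statsmodels; the conversion counts below are hypothetical placeholders, not results from an actual test.

```python
# Minimal sketch of A/B test significance testing: compare conversion rates
# between control and treatment with a two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest

conversions = [1150, 1230]   # control, treatment conversions (hypothetical)
visitors = [25000, 25000]    # control, treatment sample sizes (hypothetical)

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")

# Reject the null hypothesis (no difference between control and treatment)
# at the 5% significance level when p < 0.05.
if p_value < 0.05:
    print("Statistically significant difference between control and treatment.")
else:
    print("No statistically significant difference detected.")
```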
Environment: SQL, Windows, requirement gathering, Data Mining, Data Collection, Data Cleaning, Data Visualization, Data Standardization, Data Shepherding, Data Mapping, Data Modeling, Dashboards, Reporting, Data Profiling, SQL Server, PostgreSQL, Agile, Scrum, Waterfall, GitHub, Supply Chain, SCM, Inventory Management, MS Excel, Hive, Domo, A/B Testing, Python, Jira, Kanban, PII, Confluence, BRD, Adobe Analytics, KPI, Big Data, Hadoop, HDFS, MapReduce, p-value, r-squared value, Kibana, Oozie, Airflow, Teradata, Oracle, OLAP, OLTP, API, Greenfield, ad-hoc analysis, pivot tables, VLOOKUPs, Time Series Analysis, EDA, pipelines, Figma, Google BigQuery, Tableau, Web Analytics, Lighthouse, Financial Data.
Data Scientist
Confidential, Redmond, WA
Responsibilities:
- Responsible for data identification, collection, exploration & cleaning for modeling, and participation in model development.
- Developed different machine learning algorithms like SVM, clustering, linear regression, and Random Forest classifiers and regressors for forecasting the duration of data center deployments.
- Worked on fine-tuning the machine learning models to get better results using methods like Grid Search.
- Developed and deployed Machine Learning models to predict new data center deployments based on factors such as location, weather, previous deployment history in the same location, and inventory availability.
- Worked on all phases of a machine learning model development like Data Collection, Data Cleaning, Feature selection, Model development, and tuning of the model.
- Worked with different Python libraries like SciPy, Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn to develop machine learning algorithms.
- Worked on Azure Data Factory to develop pipelines for data refreshes and to deploy Machine Learning models to production.
- Worked with databases like on-premises SQL Server, Azure SQL Server, Azure Cosmos DB, and Azure Data Explorer (Kusto) for data collection.
- Developed a metadata model for data lineage, tracking data sources and targets, and query statistics.
- Developed a Machine Learning algorithm to detect anomalies in data using the Isolation Forest algorithm (see the illustrative sketch following this list).
- Developed different visualizations using R Shiny and Python.
- Developed reports in Power BI using data from SQL Server, CSV files, Cosmos, and Kusto databases.
- Worked on writing custom measures and calculated columns using DAX in Power BI to satisfy business needs.
- Developed and scheduled ETL jobs using Azure Data Factory.
- Developed Python scripts to perform ETL jobs as per client requirements.
- Optimized and debugged the Machine Learning code with Python and REST APIs for databases.
- Handled imbalanced data using resampling methods like oversampling and undersampling.
- Developed unsupervised machine learning models to get insights from data using clustering models like K-Modes and K-Prototype clustering and Natural Language Processing.
- Developed Python code to automate the data processing and Data cleaning as per the customer requirements.
- Worked on NLP techniques and libraries like Word2Vec, autoencoders, NLTK, spaCy, fastText, GloVe, and Gensim.
- Developed a phase-wise machine learning model for the prediction of duration for different phases of the deployment as per the feature availability.
- Worked on Azure Databricks for the development of Machine Learning models using SparkR.
- Worked on feature selection for the ML model development by using methods like Information Value, Weight of Evidence, and PCA.
- Worked on developing strategies, tools, and methodologies to measure, monitor, and report risks.
- Worked on machine learning models like KNN Impute and Random Forest Impute for imputing the NULL values in the data.
- Performed gap analysis to find gaps between different data models.
- Developed hypotheses and performed hypothesis testing and A/B testing.
- Interacted with other departments to understand and identify data needs and requirements and worked with other members of the IT organization to deliver Machine Learning models and reporting solutions addressing those needs.
- Ample knowledge of the full software life cycle in Agile and Scrum methodologies.
- Interacted with SMEs and Data Architects to understand business needs and functionality for various project solutions.
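Illustrative sketch: a minimal example of Isolation Forest anomaly detection as referenced above, assuming scikit-learn; the synthetic data stands in for real deployment telemetry.

```python
# Minimal sketch: flag anomalous rows with an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))     # typical records
outliers = rng.uniform(low=6.0, high=9.0, size=(10, 4))    # injected anomalies
X = np.vstack([normal, outliers])

# contamination is the assumed share of anomalies; tune it to the data.
model = IsolationForest(n_estimators=200, contamination=0.02, random_state=7)
labels = model.fit_predict(X)   # -1 = anomaly, 1 = normal

print(f"Flagged {int((labels == -1).sum())} of {len(X)} rows as anomalies.")
```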
Environment: Machine Learning, Supervised Learning, Unsupervised Learning, Reinforcement Learning, K-Means Clustering, K-Modes Clustering, K-Prototype Clustering, SVM, Decision Trees, Isolation Forest, Random Forest, KNN Impute, rfImpute, Python, R, SQL, Windows, Data Mining, SQL Server, Agile, Scrum, Waterfall, GitHub, MS Excel, Anaconda, Jupyter Notebook, RStudio, Python Shell, R Shell, TFS, Azure, Power BI, DAX, Kusto (Azure Data Explorer), Azure Databricks, Azure Data Factory, Azure Cosmos, R Shiny.
Oracle Applications Database Analyst
Confidential
Responsibilities:
- Worked on projects with machine learning, big data, and data visualization using environments like R and Python.
- Used lambda functions with filter(), map(), and reduce() on pandas DataFrames to perform various operations.
- Used Python and R packages to perform Exploratory Data Analysis and connected to Tableau Desktop to visualize the same.
- Proficient in handling huge datasets and performing create, read, update, and delete (CRUD) operations on Oracle and MongoDB.
- Participated in JAD sessions with the project managers, business analysis team, finance, and development teams to gather, analyze, and document the business and reporting requirements.
- Used SQL to retrieve data from the Oracle database for data analysis and visualization.
- Performed Market Basket Analysis to identify customer buying patterns, preferences, and behaviors to better manage sales and inventory (see the illustrative sketch following this list).
- Understood and articulated business requirements from user interviews and then converted the requirements into technical specifications, communicating effectively with SMEs to gather the requirements.
- Worked on the client-server model, gathered the requirements, and documented accordingly.
- Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.
- Identified various Dimensions like date/time, services, customers, and FACT tables to support various measures.
- Organized the data to the required type and format for further manipulation.
- Used advanced Microsoft Excel functions and the Data Analysis add-in to create pivot tables.
- Used statistical techniques for hypothesis testing to validate data and interpretations and presented findings and data to the team to improve strategies and operations.
- Used SVN for version control and coordinating with the team.
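Illustrative sketch: a minimal pandas-based example of the pair support and lift calculations behind a market basket analysis; the transactions below are hypothetical placeholders, not actual sales data.

```python
# Minimal sketch: compute support and lift for one item pair with pandas.
import pandas as pd

transactions = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2, 3, 3, 4],
    "item":     ["milk", "bread", "milk", "bread", "eggs", "bread", "eggs", "milk"],
})

# One-hot encode items per order: rows = orders, columns = items.
basket = pd.crosstab(transactions["order_id"], transactions["item"]).astype(bool)

support = basket.mean()                                    # P(item)
pair_support = (basket["milk"] & basket["bread"]).mean()   # P(milk and bread)
lift = pair_support / (support["milk"] * support["bread"])

print(f"support(milk, bread) = {pair_support:.2f}, lift = {lift:.2f}")
```

A lift above 1 indicates the two items are bought together more often than chance would suggest, which is the signal used to guide sales and inventory decisions.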
Environment: Python, R, Machine Learning, Oracle, Tableau, MongoDB, SQL Server, Git, SVN, Data Collection, Statistical Analysis, Excel, Data Cleansing, MySQL, Data Analysis.