Senior QA Analyst Resume
Data Analyst / Data Engineer, Kansas City, MO
SUMMARY
- 7+ years of extensive experience working as a Data Analyst/Data Modeler on Data Warehouse/Data Mart development, Online Transaction Processing (OLTP), and Online Analytical Processing (OLAP).
- Experience with production-quality Python scripting, data wrangling, data visualization, and presenting results to business strategy teams. Experience with machine learning algorithms such as logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Experienced with common software development life cycle (SDLC) methodologies such as Waterfall, Agile, and hybrid Waterfall/Agile.
- Experienced in business requirements confirmation, data analysis, data modeling, and logical and physical database design and implementation, including designing star and snowflake schemas for data warehouses.
- Bridge the gap between the latest business technology trends and executive leadership to drive business and mission success.
- Hands-on experience with big data tools such as Hadoop, Spark, Hive, Impala, PySpark, and Spark SQL.
- Experience using various packages in R and Python, including ggplot2, caret, dplyr, RWeka, rjson, plyr, SciPy, scikit-learn, Beautiful Soup, and rpy2.
- A go-to resource for problem solving beyond simple delivery of facts, known for thoroughness and for commitment to the accurate portrayal of the consumer voice.
- Highly passionate about learning and acquiring new skills; inquisitive and curious about new trends in data, the market, customers, and data analytics.
- Experienced in the Hadoop big data ecosystem for ingestion, storage, querying, processing, and analysis; designed big data architectures with multiple zones.
- Analyzed different touch points in the infrastructure lifecycle from a customer experience and engagement perspective.
- Developed scripts in Teradata, Redshift, and Snowflake SQL workbenches and in Python; documented the results, completed secondary validation with the validation DA and the validation checklist, posted the results and documentation on DART's secured file share drive, and notified testers that the results were ready for review.
- Involved in installing Birst, Tableau Desktop 8.0, and Tableau Server application software, and in the Tableau Server migration from 8.3 to 9.0.
- Strong SQL coding ability; exposure to data mining and statistical analysis (predictive analysis, hypothesis testing, regression, time series analysis) using pandas and matplotlib, and multivariate analysis using Python.
- Experience in star and snowflake schemas
- Strong knowledge of database architecture, OLAP features, BI best practices, standards, and data models; optimized data warehouse and reporting performance through tuning.
- Extensive experience in data warehousing and data modeling (ER, dimensional) techniques.
- Experienced with Agile/Scrum SDLC practices, CI/CD and AWS server technology.
- Performed data analysis using Python and machine learning. Intermediate knowledge of Python programming for software coding and upgrades.
- Expertise in RDBMS concepts and running SQL queries.
- Excellent SQL programming skills; developed stored procedures, triggers, functions, and packages using SQL/PL/SQL, with performance tuning and query optimization in transactional and data warehouse environments.
- Experience in dashboard reports using SQL Server reporting services (SSRS).
TECHNICAL SKILLS
ETL Tools: Confidential InfoSphere DataStage, SSIS, DTS.
Programming Languages: SQL, PL/SQL, HTML, Python, XML, Unix shell scripting, AWS Redshift, Snowflake, S3
Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS Excel, QlikView, TIBCO Spotfire
Machine Learning Methods: Linear regression, logistic regression, decision trees, random forest, k-nearest neighbors, k-means, NLP, ARIMA, Apriori, ETS, SVM
Packages: pandas, NumPy, SciPy, scikit-learn, statsmodels, NLTK, Plotly, Matplotlib, Seaborn, plyr, dplyr, data.table, sqldf, tidyr, reshape2
Cloud Technologies: Amazon Web Services (AWS), Microsoft Azure (familiar), Amazon EC2
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, Snowflake, ER Studio, and Power Designer.
OLAP Tools: Microsoft Analysis Services, Business Objects, and Crystal Reports 9/10
Databases: MS Access, Microsoft SQL Server, Oracle, DB2, MySQL, Guidewire GStudio, Teradata
Project Management: Microsoft Project, Microsoft Office, Team Foundation Server (TFS), TSYS, Agile and Scrum Methodologies, Project scheduling, ROI Analysis
PROFESSIONAL EXPERIENCE
Data Analyst/Data Engineer, Confidential, Kansas City, MO
Responsibilities:
- Gathered business requirements, working closely with business users, project leaders and developers. Analyzed the business requirements and designed Conceptual and Logical Data models.
- Worked in AWS Environment for loading data files from Legacy UNIX Systems to EC2 Instances.
- Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
- Worked with Python packages such as pandas, Matplotlib, XlsxWriter, and pyodbc.
- Developed Python programs in both Linux and Windows environments (IPython Notebook).
- Experience in working with complex SQL queries like Joins, Date functions, Inline functions, and Sub-queries to generate reports.
- Developed dashboards from multiple data sources (Excel, MS Access, Oracle, SAS) using Business Objects and TIBCO Spotfire.
- Expert in designing Spotfire dashboards/reports with complex and multiple data sources.
- Integrated Spotfire visualizations into the client's Salesforce environment.
- Developed Spotfire data visualizations using cross-tab reports, summary tables, line charts, bar charts, scatter plots, pie charts, and density charts.
- Experience in developing and debugging Stored procedures and Triggers
- Extensively used SQL statements to query the Oracle Database for Data Validation and Data Integrity.
- Performed detailed data analysis to analyze the duration of claim processes and created the cubes with Star Schemas using facts and dimensions through SQL Server Analysis Services (SSAS).
- Analyzed a complex system thoroughly and created a system document without any existing input documentation, which helped win the project from competitors.
- Instrumental in providing data to various vendors to ensure precise attribution studies for driving customers to stores using display ads, social media, and digital video campaigns (SQL, SAS).
- Proactively designed scripts to quality-control marketing and CRM databases, providing weekly feedback on data quality in the data marts.
- Collected requirements, generated Functional Requirement Document and constructed data flow diagrams for customer data platform project
- Created descriptive, visual, and intuitive dashboards using Tableau to enable executives to make data-driven decisions.
- Served as a liaison between IT and Marketing team to scope, gather requirements, document, test and understand risks concerning IT initiatives impacting marketing data marts
- Generated, tracked, and ran barcode and email-capture reports (Tableau, SQL).
- Used big data technologies (Hadoop and Hive) to efficiently retrieve data from data lake
- Designed ER diagrams and the logical model (relationships, cardinality, attributes, and candidate keys) and converted these to a physical data model, including capacity planning, object creation and aggregation strategies, partitioning strategies, and purging strategies according to business requirements.
- Developed and implemented predictive models with hyperparameter tuning, such as linear regression, classification, multivariate regression, Naive Bayes, random forests, and k-means clustering.
- Used time series analysis to gain insight into stocks and derivative investments based on the available historical data.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Used big data technologies to access and extract data with tools such as Apache Hadoop HDFS, Hive and Pig
- Analyzed existing databases with SQL to understand the data flow and the business rules used across different databases.
- Involved in all phases of projects, including requirements gathering, analysis, design, coding, testing, documentation, and the warranty period.
- Developed scripts using pandas to easily read from and write to CSV files and to manipulate and compare data by column (see the pandas sketch after this list).
- Cleaned data and processed third-party spending data into maneuverable deliverables within specific formats with Excel macros and Python libraries.
- Wrote SQL queries for filtering data from multiple interlinked tables using Django's ORM (see the ORM sketch after this list).
- Designed and implemented CRUD operations for the views and used class-based views.
- Created and updated views with Python view controllers and the template language to add new functionality to the website.
- Set up the development environment using issue-tracking tools such as Jira and Confluence and version control systems such as Git and SVN.
- Created insightful reports and metrics using Tableau to help marketing teams make data-driven decisions.
- Improved accuracy of direct mail underwriting by 5% by designing an accurate direct mail attribution logic using SQL
- Analyzed and optimized direct mail program audience selection.
- Used sound statistical methods for data-driven marketing channel attribution.
- Designed and programmed a direct mail activation-code A/B test and improved direct mail conversion lift by 0.20%.
- Automated the email ETL process from Salesforce and reduced analysis time by almost 30% (SSIS, Snowflake, Salesforce).
- Created pipelines to take data from various disparate systems and build a unified dimensional data model for the marketing analytics team.
- Reduced direct mail costs by 5% by creating a custom response model based on credit bureau data (logistic regression, Python).
- Increased withdrawal dollars by 2% by predicting the performance of key business variables (multiple linear regression, Python).
- Acted as liaison between marketing databases and data architects to identify requirements and develop data processes and models.
- Performed data mapping between Source data systems and marketing CRM systems to better execute marketing campaigns
- Designed and developed the ETL process between the marketing database and MTA databases for marketing attribution science (SSIS).
- Extensively handled documentation of the data model, mappings, transformations, and scheduling jobs.
- Responsible for data mapping and writing transformation rules.
- Developed the data mart with base data in star, snowflake, and multi-star schemas and tied it to the development of the data warehouse and its database.
- Expertise in Data Analysis by writing complex SQL Queries.
- Coordinated with customers and provided support on data analysis.
- Performed data analysis on all results and prepared presentations for clients.
- Performed audit on data and resolved business related issues for subscriber base.
- Used forward engineering to further improve the data structures of the existing database and to create a physical data model with DDL that most closely fits the requirements derived from the logical data model.
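A minimal sketch of the pandas CSV work referenced above (reading, comparing by column, and writing results); the file names and column names are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical input extracts and columns; substitute the real files and fields.
CURRENT_FILE = "claims_current.csv"
PRIOR_FILE = "claims_prior.csv"
KEY_COL = "claim_id"
COMPARE_COLS = ["status", "paid_amount"]

# Read both extracts into DataFrames.
current = pd.read_csv(CURRENT_FILE)
prior = pd.read_csv(PRIOR_FILE)

# Join the two extracts on the key column so rows can be compared side by side.
merged = current.merge(prior, on=KEY_COL, suffixes=("_curr", "_prior"))

# Flag rows where any compared column changed between the two extracts.
changed_mask = pd.Series(False, index=merged.index)
for col in COMPARE_COLS:
    changed_mask |= merged[f"{col}_curr"] != merged[f"{col}_prior"]

# Write only the changed rows back out for review.
merged[changed_mask].to_csv("claims_changed.csv", index=False)
```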
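A small sketch of the Django ORM filtering referenced above; the Customer/Claim models and their fields are hypothetical stand-ins for the project's own interlinked tables.

```python
from django.db import models


# Hypothetical interlinked tables; the real project defines its own models.
class Customer(models.Model):
    name = models.CharField(max_length=100)
    state = models.CharField(max_length=2)


class Claim(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    opened_at = models.DateTimeField()


# Filter open claims for Missouri customers; select_related joins the
# customer table in the same SQL query instead of issuing one query per row.
open_mo_claims = (
    Claim.objects
    .select_related("customer")
    .filter(status="OPEN", customer__state="MO")
    .order_by("-opened_at")
)
```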
Environment: SQL Server, XML, AWS, Redshift, Teradata, MS Office suite, Python 3.2 (NumPy, pandas, scikit-learn, Matplotlib), Tableau, Unix, Perl scripting, SQL, Master Data Management, SAS, Apache Hadoop 2, Hive, Pig Latin, Linux
Data Management Data Analyst/Data Engineer, Confidential, Houston, TX
Responsibilities:
- Used Agile software development methodology in defining the problem, gathering requirements, development iterations, business modeling, and communicating with the technical team for development of the system.
- Created regression and classification models for clients using Python and displayed the results in Birst and other frameworks. The visualization of the data can be static or more exploratory depending on the audience, enabling teams to use the information for sales, construction, and workflows.
- Moved data from AWS S3 buckets to AWS Redshift and Snowflake clusters using CLI commands.
- Also worked with Amazon EC2 clusters to deploy files into S3 buckets.
- Worked with project team members to deliver data models that meet data requirements and to create all documentation and deliverables in accordance with enterprise guidelines and standards. Worked on metadata transfer among various proprietary systems. Used SSIS to create ETL processes to validate, extract, transform, and load data into the data warehouse and data marts.
- Improved Anti-Money Laundering prediction by developing machine learning algorithms such as random forest (RF) and gradient boosting machines for feature selection with Python Scikit-learn.
- Developed KNN, Logistic Regression, SVM, and Deep Neural Networks for rare event cases and suspicious activities.
- Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms in Python with scikit-learn (see the resampling sketch after this list).
- Explored optimized sampling methodologies for different types of datasets.
- Explored and analyzed suspicious-transaction features using Spark SQL (see the PySpark sketch after this list).
- Used Spark big data tools (PySpark, Spark SQL, MLlib) on AWS to conduct real-time analysis of transaction defaults.
- Designed and implemented a recommendation system that used collaborative filtering techniques to recommend courses to different customers and deployed it to an AWS EMR cluster.
- Designed rich data visualizations to model data into human-readable form with Tableau, Shiny App, and Matplotlib
- Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes, occasionally in real time.
- Performed rough image recognition using CNNs and the Python TensorFlow package to identify common problem types such as bone fracture categorization, tumor detection, and blood flow evaluation.
- Extracted data tables using custom-generated queries from Excel files and Access/SQL databases, and provided the extracted data as input to Birst.
- Ensured the use of proper constraints, naming standards, and data types and lengths, and generated DDL from the physical data model for database implementation using the Teradata SQL client. Created, tested, and implemented Teradata DDL scripts.
- Data modeling for data warehouse/data mart development, data analysis, and SQL.
- Mapped company enterprise requirements and the new database to logical data models that define the project delivery needs.
- Evaluated provider-level data to assess provider data quality, profiling the data and building data quality reports using DQ Analyzer, SQL Server, and MS Excel.
- Reviewed basic SQL queries and edited inner, left, and right joins in Birst Desktop by connecting live and extract datasets; used VLOOKUP in Excel.
- Designed customized interactive dashboards in Birst using marks, actions, filters, parameters, calculations, and relationships.
- Built models in Spark using Python in Databricks.
- Worked on migrating SQL scripts from Teradata to Redshift and Snowflake.
- Worked on Databricks and Crescendo for Automation
- Created custom time dimensions and worked with role-playing and degenerate dimensions.
- Responsibilities included developing complex SQL to extract data within a streaming big data environment, applying advanced data cleaning techniques to ensure data quality and integrity, leveraging machine learning techniques to extract insights, building centralized visualization and reporting capabilities, and providing actionable solutions to maximize business ROI.
- Built automated ETL data pipelines with python and complex SQL queries joining and aggregating data from multiple tables and sources.
- Applied advanced data cleaning and feature engineering techniques to handle missing values, outliers, and distorted data distributions, ensuring data integrity and consistency and improving data accuracy by 47%.
- Processed and analyzed point-of-sale data using SQL and Python to identify customer shopping behaviors, developed a predictive analysis model, and provided corresponding strategies and recommendations that helped the business drive sales up by 6%.
- Ingested flat files received via tooling and files received from Sqoop into the data lake (Hive) using Data Fabric functionality.
- Built end-to-end machine learning models, including logistic regression and random forest, to score and identify potential new business cases with Python scikit-learn, and tested model quality for high accuracy by automating the infrastructure.
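One way the imbalanced-data handling above could look; note that SMOTE itself lives in the imbalanced-learn package rather than scikit-learn proper, and the synthetic dataset below is only a stand-in for the real fraud data.

```python
from imblearn.over_sampling import SMOTE  # SMOTE comes from imbalanced-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Stand-in imbalanced dataset (~1% positive class) in place of the real fraud data.
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Oversample only the training split so the test set keeps the true class balance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Cost-sensitive model: class_weight="balanced" further penalizes minority-class errors.
model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
model.fit(X_res, y_res)

print(classification_report(y_test, model.predict(X_test)))
```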
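A short PySpark/Spark SQL sketch in the spirit of the suspicious-transaction analysis above; the S3 path, table, and columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("suspicious-transactions").getOrCreate()

# Hypothetical transactions extract; in practice this would come from the data lake.
transactions = spark.read.csv("s3://example-bucket/transactions.csv",
                              header=True, inferSchema=True)
transactions.createOrReplaceTempView("transactions")

# Aggregate high-value activity per account with Spark SQL to surface outliers.
suspicious = spark.sql("""
    SELECT account_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM transactions
    WHERE amount > 10000
    GROUP BY account_id
    HAVING COUNT(*) > 5
    ORDER BY total_amount DESC
""")

suspicious.show(20)
```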
Environment: SQL Server, SSIS, AWS (EC2, S3, RedShift, EMR), Python (Scikit-learn), Machine Learning (KNN, Logistic Regression, SVM), Tableau, JIRA, GitHub, ETL, Spark, Hadoop, R, Shiny App, and Linux.
Data Scientist, Confidential, Tampa, FL
Responsibilities:
- Implemented important features (Brown clusters, word2vec clusters, word windowing, LDA topic of the word) for a CRF model to correctly tag words with their corresponding NER tags.
- Implemented spell check with edit-distance and phonetic matching using the Damerau-Levenshtein and Soundex algorithms, respectively (see the sketch after this list).
- Worked on deep learning model building and optimization, applying regularization techniques such as recurrent dropout, early stopping, gradient clipping, and dynamic batching to overcome problems like overfitting and slow network convergence.
- Trained a bidirectional LSTM-CNN network for named entity recognition.
- Worked on Natural Language pipeline to structure text from various sources.
- Wrote different shallow parsing rules to extract various entities from user questions.
- Ensembled the deep learning model, the CRF model, and NLP techniques to improve model results.
- Wrote complex Elasticsearch aggregation queries from user questions using the Python Query DSL.
- Worked on data ingestion from different sources (S3, Hive) into Elasticsearch using Logstash and the ES-Hadoop connector.
- Implemented a language translation utility in the chatbot to convert German questions and answers to English.
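A compact sketch of the edit-distance plus phonetic spell-check idea above; it assumes the jellyfish library for Damerau-Levenshtein and Soundex and uses a toy vocabulary.

```python
import jellyfish

# Toy vocabulary; the real system would load a domain dictionary.
VOCAB = ["fracture", "tumor", "patient", "diagnosis", "treatment"]


def suggest(word, max_edit_distance=2):
    """Return candidate corrections: exact Soundex (phonetic) matches first,
    then the closest Damerau-Levenshtein matches within the distance budget."""
    word = word.lower()
    target_code = jellyfish.soundex(word)

    phonetic = [w for w in VOCAB if jellyfish.soundex(w) == target_code]
    by_edit = sorted(
        (w for w in VOCAB
         if jellyfish.damerau_levenshtein_distance(word, w) <= max_edit_distance),
        key=lambda w: jellyfish.damerau_levenshtein_distance(word, w),
    )

    # Merge the two candidate lists, keeping order and dropping duplicates.
    seen, suggestions = set(), []
    for w in phonetic + by_edit:
        if w not in seen:
            seen.add(w)
            suggestions.append(w)
    return suggestions


print(suggest("fracure"))  # e.g. ['fracture']
```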
Environment: Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas/Matplotlib/Seaborn), R Studio (ggplot2/ shiny/ httr), Tensorflow, AWS RedShift, EC2, EMR, Hadoop Framework, HDFS, Spark (Pyspark, MLlib, Spark SQL), Agile/SCRUM
Data Specialist, Confidential
Responsibilities:
- Involved with Business Analysts team in requirements gathering and in preparing functional specifications and changing them into technical specifications.
- Created incremental refreshes for data sources on Tableau server.
- Responsible for resolving errors, generating client payment data, transmitting data to clients and ensuring conflict resolution utilizing MS Access, MySQL, Perl scripting and other various tools.
- Wrote PL/SQL statements and stored procedures in DB2 for extracting as well as writing data.
- Worked extensively on SQL querying using Joins, Alias, Functions, Triggers, and Indexes.
- Developed mappings in Confidential InfoSphere DataStage to load data from various sources, including SQL Server, DB2, Oracle, and flat files, into the data warehouse.
- Constructed triage script to troubleshoot problems when business users request support tickets from help desk for SAP BW and SAP BO teams to achieve 85% first call resolution
- Generated reports on SAP ERP to obtain critical business data
- Performed Transformation on the data obtained from SAP ERP and Web Intelligence Reports to maintain uniformity within the data before loading it onto the cube
- Created knowledge base for critical SAP BW errors to improve the mean time for resolution
- Structured and executed all aspects of the case, customer, provider, and finance modules.
- Extended support for developing business cases; aligned with the Customer Engagement, Loyalty, Marketing, and Revenue Management (Finance) teams on data and analyses as needed to help determine priorities based on business impact.
- Streamlined the business processes and procedures by addressing identified problems and gaps to improve efficiency.
- Performed source data analysis, cleansing, and profiling across multiple source systems and identified key data elements and structures; designed and implemented analytics functionality for business end users.
- Worked on Data Warehousing, data modeling, schema development for analytics.
- Integrated multiple business areas to achieve consolidated view.
- Took ownership of cross-team and delivery issues, proactively communicating across teams to coordinate activities.
- Created DDL scripts for implementing data modeling changes; created naming convention files and coordinated with DBAs to apply the data model changes.
- Used forward engineering to create a Physical Data Model with DDL that best suits the requirements from the Logical Data Model.
- Maintained and implemented data models for the enterprise data warehouse using Erwin r9.
- Created and maintained metadata, including table and column definitions.
- Synced the models by reverse engineering, comparing, and merging models from the database into the original models.
- Responsible for defining naming standards for the data warehouse and the data governance team.
- Used Erwin for reverse engineering to connect to the existing database and ODS, creating a graphical representation in the form of entity relationships and eliciting more information.
- Designed ER diagrams and the logical model (relationships, cardinality, attributes, and candidate keys) and converted these to a physical data model, including capacity planning, object creation and aggregation strategies, partitioning strategies, and purging strategies according to business requirements.
- Developed Data models and ERD diagrams using Erwin.
- Analyzed existing databases with SQL to understand the data flow and the business rules used across different databases.
- Involved in all phases of projects, including requirements gathering, analysis, design, coding, testing, documentation, and the warranty period.
- Built a customer-focused Ticket Assistant system, a tool that leverages machine learning and natural language processing techniques (see the sketch after this list).
- Good experience in text data mining using natural language processing (NLP).
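A minimal sketch of the kind of NLP pipeline a ticket-assistant tool could use, here TF-IDF features feeding a scikit-learn logistic regression classifier; the sample tickets and routing labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up training tickets and routing labels.
tickets = [
    "cannot log in to my account",
    "password reset link not working",
    "invoice amount is wrong for last month",
    "charged twice for the same order",
]
labels = ["account", "account", "billing", "billing"]

# TF-IDF text features feeding a logistic regression classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(tickets, labels)

print(pipeline.predict(["I was billed twice this month"]))  # expected: ['billing']
```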
Environment: SQL Server, SSIS, AWS (EC2, S3, RedShift, EMR), Python (Scikit-learn), Machine Learning (KNN, Logistic Regression, SVM), Tableau, JIRA, GitHub, ETL, Spark, Hadoop, R, Shiny App, and Linux.