Data Analyst Resume
Branchburg, NJ
SUMMARY
- Data Analyst with 7+ years of professional IT experience in Data Modeling, Ingestion, Processing, ETL, storage, Data Integration and Resource utilization in the Big Data ecosystem.
- Performed data extraction from external sources using ETL tools such as Informatica, Sqoop, Spark, and AWS Glue.
- Experienced in Data Modeling (Conceptual, Physical, and Logical) and Schema Modeling using Erwin.
- Proficient in using Microsoft SQL Server and Hive to extract data with multiple types of SQL queries, including CREATE statements, joins, conditions, and constraints.
- Possess strong logical and analytical skills in relational databases, SQL, data warehousing, ETL, data lakes, data marts, data architecture, data interpretation, and data cleaning.
- Experienced in handling petabytes of data using Apache Spark (PySpark).
- Hands-on experience with various Spark APIs such as Spark SQL, Spark Streaming, Spark MLlib, Spark ML, and GraphX, and with Spark data abstractions such as DataFrames, RDDs, and Datasets (illustrative sketch at the end of this summary).
- Proficient in using Amazon Web Services such as S3, EC2, EMR, Redshift, Glue, Kinesis, and Athena.
- Experience using cloud environments such as Amazon Web Services (EC2, SageMaker, EMR, S3, Glue, Athena, Lambda, Redshift), Google Cloud Platform, and Microsoft Azure to train models, along with big data frameworks such as Hadoop, Spark, PySpark, Spark SQL, Hive, Pig, Apache Kafka, Sqoop, Oozie, Flume, Storm, and YARN. Proficient in using NoSQL databases such as Cassandra, HBase, and MongoDB.
- Worked with Data Engineers on deployment and serving tools such as Flask, Docker, and Kubernetes to validate and maintain model performance.
- Proficient in using PostgreSQL, Microsoft SQL Server, and SQLite to extract data using left joins, inner joins, and self joins along with advanced sub-queries across multiple tables. Expertise in writing stored procedures, triggers, functions, sub-queries, constraints, indexes, views, and Common Table Expressions, and in normalization using T-SQL.
- Solid experience in using various file formats like CSV, TSV, Parquet, ORC, JSON, and AVRO.
- Experienced working with various Hadoop distributions (Cloudera, Hortonworks, Amazon EMR) to fully implement and leverage various Hadoop services.
- Proficient in applying performance tuning concepts to Informatica mappings and session properties.
- Extensively used cloud transformations: Aggregator, Expression, Filter, Joiner, Lookup (connected and unconnected), Rank, Router, Sequence Generator, Sorter, Update Strategy, and Union.
- Developed complex Informatica Cloud task flows (parallel) with multiple mapping tasks and task flows.
- Expert in using different Informatica Intelligent Cloud Services such as Application Integration, Data Integration, and Administration.
- Developed complex calculated measures in TIBCO Spotfire using Data Analysis Expressions (DAX).
- Experience working on AWS platforms (EMR, EC2, RDS, EBS, S3, Lambda, Glue, Elasticsearch, Kinesis, SQS, DynamoDB, Redshift, API Gateway, Athena, ECS).
- Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena (illustrative sketch at the end of this summary).
- Extensive experience with Docker and Kubernetes on multiple cloud providers; helped developers build and containerize their application pipelines (CI/CD) for deployment to the cloud, and used kOps for managing Kubernetes clusters.
- Sound knowledge and hands-on experience with MapR, Ansible, Presto, Amazon Kinesis, Storm, Flink, StreamSets, Star Schema, Snowflake Schema, ER Modeling, and Talend.
- Used Bulk Collections for better performance and easier retrieval of data by reducing context switching between the SQL and PL/SQL engines.
- Good experience in developing ETL procedures and data conversion scripts using Pre-Stage, Stage, Pre-Target, and Target tables.
- Developed Tableau data visualizations using Cross Tabs, Heat Maps, Box and Whisker Charts, Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, and Density Charts.
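A minimal PySpark sketch of the DataFrame and Spark SQL usage described above; the bucket path, table, and column names are hypothetical placeholders, not a specific project implementation.

```python
from pyspark.sql import SparkSession

# Start a Spark session (on EMR this is typically provided by the cluster).
spark = SparkSession.builder.appName("orders-analysis").getOrCreate()

# Read a Parquet dataset from S3 into a DataFrame (hypothetical path).
orders = spark.read.parquet("s3://example-bucket/warehouse/orders/")

# Register the DataFrame as a temporary view and query it with Spark SQL.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")

daily_totals.show(10)
```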
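A hedged sketch of the serverless pattern mentioned above (a Lambda function writing data to S3, registering it in the Glue Data Catalog, and querying it from Athena); the bucket, database, crawler, and query names are assumptions for illustration only.

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")
athena = boto3.client("athena")

def handler(event, context):
    # Write an incoming record batch to S3 as a CSV object (hypothetical bucket/key).
    body = "id,amount\n1,10.5\n2,20.0\n"
    s3.put_object(Bucket="example-data-lake", Key="raw/orders/part-0001.csv", Body=body)

    # Trigger a Glue crawler (assumed to exist) so the new data is
    # registered or refreshed in the Glue Data Catalog.
    glue.start_crawler(Name="orders_crawler")

    # Query the cataloged table from Athena, writing results back to S3.
    athena.start_query_execution(
        QueryString="SELECT COUNT(*) FROM orders",
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://example-data-lake/athena-results/"},
    )
    return {"status": "ok"}
```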
TECHNICAL SKILLS
ETL: Hadoop (Sqoop, Hive), AWS (Glue), Spark, Informatica Power Center, Informatica Intelligent Cloud Services (IICS), Snowflake
Scripting Languages: T-SQL, Trino/Presto SQL, Python (NumPy, Pandas, Scikit-Learn, Seaborn, Matplotlib), R, Scala, Shell Scripting, Pig Latin, HiveQL.
Databases: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Amazon DynamoDB, Redshift, Cassandra, MongoDB, HBase.
Data Visualization: TIBCO Spotfire, Tableau, Microsoft Power BI, Plotly, Matplotlib, Seaborn.
Big Data Tools: Apache Hadoop, Hive, Spark, Sqoop, Oozie, Pig, Kafka, Flask, HDFS, YARN, Zeppelin Notebook, Impala, HBase, Flume, Ambari, NiFi.
Cloud Services & VCS: AWS (EC2, EMR, SageMaker, S3, Glue, Redshift), Microsoft Azure, Git, GitHub, Bitbucket
CI/CD: Airflow, Jenkins, Docker, Kubernetes, Ansible, Jira
PROFESSIONAL EXPERIENCE
Confidential, Branchburg, NJ
Data Analyst
Responsibilities:
- Worked with business stakeholders, application developers and production teams across functional units to identify business needs and discuss solution options.
- Actively involved in design and development of Star Schema data model and implemented slowly changing and rapidly changing dimension methodologies.
- Created aggregate and fact tables for the creation of ad-hoc reports.
- Wrote Trino/Presto SQL (DDL, DML, and DCL) and developed new database objects such as tables, views, indexes, complex stored procedures, user-defined functions (UDFs), cursors, triggers, and common table expressions (CTEs), and resolved locking issues (illustrative query sketch after this list).
- Worked with users to define business requirements and analytical needs; identified and recommended potential data sources; compiled and mined data from a variety of sources.
- Responsible for designing deep data architecture in AWS Cloud environment by developing context diagrams and data flow architecture diagrams.
- Created Hive tables in AWS environments using Presto/Trino SQL Data Definition Language (DDL) and inserted data into tables using Data Manipulation Language (DML).
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Performed data extraction, data validation, data cleaning, and data profiling using complex Trino/Presto SQL.
- Reverse engineered the data models, identified the data elements in the source systems, and added new data elements to the existing data models.
- Performed daily activities such as refreshing tables and raising, tracking, and reporting defects through Jira.
- Implemented SQL database schemas and scripted 200+ SQL queries using Trino in collaboration with the application integration team.
- Designed and developed Pivot tables and cross tab tables using Trino/Presto SQL.
- Reviewed the designs, code, and test plans of other developers and provided recommendations for SQL code improvements and optimizations.
- Performed project management and version control using Git; pushed, pulled, reviewed, and merged code.
- Maintained CI/CD pipelines using tools like Git, Jenkins and Jira.
- Created, debugged, scheduled, and monitored jobs using Airflow for ETL batch processing that loads data into S3 for analytical processes.
- Scheduled all jobs with Python-based Airflow scripts, adding tasks to DAGs and defining dependencies between them (illustrative DAG sketch after this list).
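A minimal sketch of running a CTE-based Trino/Presto analytical query from Python, assuming the `trino` client package; the coordinator host, catalog, schema, table, and columns are hypothetical placeholders.

```python
import trino

# Connect to a Trino coordinator (host, catalog, and schema are hypothetical).
conn = trino.dbapi.connect(
    host="trino.example.internal",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="analytics",
)

# A common table expression that aggregates daily order totals before filtering,
# illustrating the analytical queries described above (table and columns assumed).
sql = """
WITH daily_totals AS (
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
)
SELECT order_date, total_amount
FROM daily_totals
WHERE total_amount > 1000
ORDER BY order_date
"""

cur = conn.cursor()
cur.execute(sql)
for row in cur.fetchall():
    print(row)
```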
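A minimal Airflow sketch of the DAG and task-dependency pattern described in the last bullet; the DAG id, schedule, and the extract/load callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_data(**context):
    # Placeholder: pull a batch of data from the source system.
    pass

def load_to_s3(**context):
    # Placeholder: write the extracted batch to S3 for downstream analytics.
    pass

with DAG(
    dag_id="daily_etl_to_s3",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_data", python_callable=extract_data)
    load = PythonOperator(task_id="load_to_s3", python_callable=load_to_s3)

    # Task dependency: extract must finish before the load runs.
    extract >> load
```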
Environment: Amazon S3, Trino/Presto SQL, Git, Jenkins, Jira, PySpark, Amazon EMR, Airflow.
Confidential, Oakland, CA
Data Analyst
Responsibilities:
- Worked with business stakeholders, application developers, and production teams across functional units to identify business needs and discuss solution options.
- Developed data visualizations such as pie charts, tree maps, heat maps, and bar charts using TIBCO Spotfire.
- Developed reports indicating KPIs (Throughput, Cycle Time, Production Attainment, etc.) for decision making by management and stakeholders.
- Created and reviewed logical and physical data models in Erwin and performed data integration.
- Created Source-to-Target Mappings (STTMs) for a data migration project in a cloud environment.
- Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables into target tables.
- Extensively used Informatica Intelligent Cloud Services (IICS) cloud transformations: Aggregator, Expression, Filter, Joiner, Lookup (connected and unconnected), Rank, Router, Sequence Generator, Sorter, Update Strategy, and Union.
- Developed complex Informatica Cloud task flows (parallel) with multiple mapping tasks and task flows.
- Created IICS connections using various cloud connectors in the IICS administrator.
- Created ETL and data warehouse standards documents: naming standards, ETL methodologies and strategies, standard input file formats, and data cleansing and preprocessing strategies.
- Developed Spotfire reports using SQL, HTML, CSS, JavaScript, IronPython, and R, and optimized DXP design with in-memory and in-database analytics.
- Created cross tables, charts (pie chart, tree map, heat map, bar chart, line chart), calculated columns, hierarchies, bookmarks, and complex reports with multiple filters in Spotfire Professional.
- Created cross tables from multiple data tables within text areas using HTML, and implemented JavaScript to add a calendar icon and synchronize it with the property control.
- Added on-demand data tables and created materialized views to improve report performance, and used Web Player to verify that reports were accessible without formatting issues.
- Tested Business Objects and Spotfire reports, validated data between the reports and the database, and updated testing documentation.
- Developed complex calculated measures using Data Analysis Expressions (DAX).
- Created an e-mail notification service that alerts the requesting team upon job completion (illustrative sketch after this list).
- Devised PL/SQL stored procedures, functions, triggers, views, and packages; made use of indexing, aggregation, and materialized views to optimize query performance.
- Wrote SQL (DDL, DML, and DCL) and developed new database objects such as tables, views, indexes, complex stored procedures, user-defined functions (UDFs), cursors, triggers, and common table expressions (CTEs), and resolved locking issues.
- Developed Tableau data visualizations using Cross Tabs, Heat Maps, Box and Whisker Charts, Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, and Density Charts.
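A minimal Python sketch of a job-completion e-mail notification like the one described above; the SMTP relay, sender and recipient addresses, and job name are assumed placeholders.

```python
import smtplib
from email.mime.text import MIMEText

def notify_job_completion(job_name, recipients):
    """Send a simple completion notice to the team that requested the data."""
    msg = MIMEText(f"The job '{job_name}' has completed and the data is ready.")
    msg["Subject"] = f"Job completed: {job_name}"
    msg["From"] = "etl-alerts@example.com"
    msg["To"] = ", ".join(recipients)

    # SMTP relay host/port are hypothetical; authentication omitted for brevity.
    with smtplib.SMTP("smtp.example.com", 25) as server:
        server.send_message(msg)

notify_job_completion("nightly_refresh", ["analytics-team@example.com"])
```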
Environment: Informatica Intelligent Cloud Services (IICS), Oracle RDBMS, TIBCO Spotfire, SQL, Pipeline Pilot, Erwin, Microsoft Visio.
Confidential, Jersey City, NJ
Data Analyst
Responsibilities:
- Defined the churn scope and target metric for the project, and created feature requirements by working with a team of data engineers and business analysts.
- Gathered data from databases such as SQL Server, DB2, and Teradata and performed ETL and data integration using tools like Informatica.
- Performed Data Cleaning, Data Screening, Data Exploration, Data visualization, Feature Selection and Engineering using python libraries such as Pandas, NumPy, Scikit-learn (Random Forests), Matplotlib and Plotly.
- Performed Variable Identification and checked for the percentage of missing values, data types, outliers, etc.
- Performed Univariate Analysis: analyzed descriptive statistics such as mean, median, mode, range, standard deviation, and variance; checked for missing data; detected outliers; checked normality with skewness and kurtosis; and presented the results with histograms and box plots.
- Performed Bivariate Analysis using correlation and inferential statistical tests (Z-test, t-test, Chi-Square, ANOVA) to check for multicollinearity and singularity, and presented the results using scatter plots, bar charts, and line charts.
- Performed Outlier Detection and Treatment in Python using different techniques like Median Absolute Deviation (MAD), Minimum Covariance Determinant, Histograms and Box plots.
- Performed Feature Selection using the Python scikit-learn library, applying techniques such as Filter methods (Z-test, t-test, ANOVA, F-test), Wrapper methods (Step Forward Selection, Step Backward Selection, Exhaustive Selection), and Embedded methods (Random Forests, LASSO, Ridge Regression).
- Performed Feature Engineering such as Missing Value Imputation, Normalization and Scaling, Outlier Detection and Treatment, One-Hot Encoding, and Feature Splitting, and used Label Encoder to convert categorical variables to numerical values with the Python scikit-learn library.
- Used Amazon S3 (Simple Storage Service) for distributed storage and processed the data with the parallel processing framework Apache Spark on an Amazon EMR (Elastic MapReduce) cluster.
- Performed Exploratory Data Analysis (EDA) to visualize and discover patterns in the data through various plots and graphs using the Matplotlib, NumPy, Pandas, Scikit-learn, and Seaborn libraries; calculated the Pearson Correlation Coefficient to deal with multicollinearity.
- Performed SMOTE (Synthetic Minority Over-Sampling Technique) to create synthetic samples of the minority class (churned customers) and evaluated classification performance using ROC (Receiver Operating Characteristic) AUC (Area Under the Curve).
- Applied various classification models such as Naïve Bayes, Logistic Regression, Random Forests, and Support Vector Classifiers from the scikit-learn library, and improved model performance using ensemble learning methods such as Random Forests, XGBoost, and Gradient Boosting (illustrative modeling sketch after this list).
- Addressed Overfitting and Underfitting by using K-fold Cross Validation.
- Performed Recursive Feature Selection using Step Forward Selection (SFS) and Step Backward Selection.
- Performed data visualization with Tableau dashboards and data stories using line and scatter plots, bar charts, histograms, pie charts, and box plots.
- Created Dockerfiles, deployed Docker images to AWS ECR, and built tasks and clusters in AWS ECS.
- Wrote SQL (DDL, DML, and DCL) and developed new database objects such as tables, views, indexes, complex stored procedures, user-defined functions (UDFs), cursors, triggers, and common table expressions (CTEs), and resolved locking issues.
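A condensed scikit-learn/imbalanced-learn sketch of the churn-modeling workflow described above (imputation, scaling, SMOTE oversampling, a Random Forest classifier, ROC AUC, and K-fold cross-validation); the dataset path, column names, and parameters are assumptions, not the original project's code.

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Load the churn dataset (hypothetical file and target column).
df = pd.read_csv("churn.csv")
X = pd.get_dummies(df.drop(columns=["churned"]))  # one-hot encode categoricals
y = df["churned"]

# Impute missing values and scale the features.
X = SimpleImputer(strategy="median").fit_transform(X)
X = StandardScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority (churned) class on the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_res, y_res)

# Evaluate with ROC AUC on the held-out test set.
probs = model.predict_proba(X_test)[:, 1]
print("Test ROC AUC:", roc_auc_score(y_test, probs))

# K-fold cross-validation as a check against over/underfitting.
cv_scores = cross_val_score(model, X_res, y_res, cv=5, scoring="roc_auc")
print("5-fold CV ROC AUC:", cv_scores.mean())
```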
Environment: Tableau, SQL, AWS (S3, EMR, EC2), Python (Scikit-Learn, Pandas, NumPy, Matplotlib), Machine Learning.
Confidential
Data Analyst
Responsibilities:
- Performed Data Visualization and Exploratory Data Analysis (EDA) to find correlations and patterns within the data using Python and Tableau, and built reports summarizing the patterns from EDA.
- Worked on IBM DB2 RDBMS to store, query and update the data.
- Implemented different kinds of visualizations in Tableau, including pie charts, bar graphs, and tree maps; created data stories and dashboards and presented the results in innovative, informative visualizations for stakeholders.
- Created tailor-made reports in Tableau using ranking, top and bottom filters, and metric transformations, and combined multiple data sources into a single dashboard with graphical charts.
- Delivered intelligence and insights to stakeholders with interactive Tableau dashboards, stories, and worksheets.
- Implemented additional visualization types in Tableau, such as text tables, tree maps, packed bubbles, and horizontal and stacked bars, and presented the results in informative visualizations for stakeholders.
- Coordinated with cross-functional teams to implement models and monitor outcomes.
- Created triggers, tables, stored procedures, functions, views, joins, and user profiles using SQL in MySQL and SQL Server.
- Performed various statistical tests on the data, such as Chi-Square tests, t-tests, z-tests, F-tests, and correlation analysis (illustrative sketch after this list).
- Performed performance tuning, tested and debugged applications, and generated reports using SQL.
- Worked on importing and exporting data between Snowflake, Oracle, and MySQL databases and HDFS/Hive using Sqoop for analysis, visualization, and report generation.
- Developed stored functions, procedures, packages, and triggers using SQL and scripts as part of application needs.
- Generated SQL scripts for creating tables, views, primary keys, indexes, constraints, sequences, and synonyms.
- Experienced in software development methodologies such as Agile (Scrum) and Waterfall, along with use cases and UML diagrams.
- Wrote SQL (DDL, DML, and DCL) and developed new database objects such as tables, views, indexes, complex stored procedures, user-defined functions (UDFs), cursors, triggers, and common table expressions (CTEs), and resolved locking issues.
- Involved in developing new functionality and in designing and implementing Oracle PL/SQL solutions that satisfy business requirements.
- Responsible for developing and modifying several PL/SQL packages, procedures, functions, views, and triggers; provided SQL and PL/SQL code tuning to improve database response time and performance for several applications; worked on data extraction, transformation, and loading using Bulk Collections for bulk load processing.
- Created and modified Oracle procedures, packages, and functions to move changed data across multiple environments using database links.
- Worked with other developers to repair and enhance the existing base of PL/SQL packages, fix production issues, build new functionality, and improve processing time through code optimizations and indexes.
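A small SciPy sketch of the kinds of statistical tests listed above (two-sample t-test, chi-square test of independence, and Pearson correlation); the sample data is randomly generated for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two-sample t-test: compare a numeric metric between two groups (illustrative data).
group_a = rng.normal(loc=100, scale=15, size=200)
group_b = rng.normal(loc=105, scale=15, size=200)
t_stat, t_p = stats.ttest_ind(group_a, group_b)
print("t-test:", t_stat, t_p)

# Chi-square test of independence on a 2x2 contingency table.
contingency = np.array([[120, 80], [90, 110]])
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)
print("chi-square:", chi2, chi_p)

# Pearson correlation between two numeric variables.
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.5, size=200)
r, r_p = stats.pearsonr(x, y)
print("Pearson r:", r, r_p)
```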
Environment: Tableau, PL/SQL, Oracle